Log Summary

XPK Start: Tue Apr 21 06:39:35 UTC 2026
2026-04-21 06:39:52.371922: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
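The `rope_scaling` warnings above say the YaRN config carries an unrecognized `rope_theta` key and integer values (`factor: 40`, `beta_fast: 32`, `beta_slow: 1`) where floats are expected. A minimal sketch of a pre-flight check that would silence these warnings — `coerce_rope_scaling` and its allowed key set are hypothetical, not part of MaxText or transformers:

```python
# Hypothetical helper mirroring the validation messages in the log:
# drop unrecognized keys and cast numeric fields to float.
def coerce_rope_scaling(cfg: dict) -> dict:
    # Assumed key set for 'rope_type'='yarn'; adjust to the actual schema.
    allowed = {"rope_type", "factor", "beta_fast", "beta_slow",
               "original_max_position_embeddings"}
    cleaned = {k: v for k, v in cfg.items() if k in allowed}
    for key in ("factor", "beta_fast", "beta_slow"):
        if key in cleaned:
            cleaned[key] = float(cleaned[key])  # e.g. 40 -> 40.0
    if cleaned.get("factor", 1.0) < 1.0:
        raise ValueError("`rope_scaling` factor must be >= 1")
    return cleaned

raw = {"rope_type": "yarn", "factor": 40, "beta_fast": 32,
       "beta_slow": 1, "rope_theta": 10000.0}
print(coerce_rope_scaling(raw))
# {'rope_type': 'yarn', 'factor': 40.0, 'beta_fast': 32.0, 'beta_slow': 1.0}
```

These are warnings, not errors, so the run continues either way; the cast only matters if downstream code does strict type checks on the scaling fields.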
I0421 06:39:56.359722 136604529964864 max_utils.py:273] Attempting to initialize the jax distributed system...
I0421 06:40:05.397896 136604529964864 distributed.py:149] Starting JAX distributed service on [::]:8482
I0421 06:40:05.400094 136604529964864 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-7d8ye-slice-job-0-0.mt-07-distill-smoke-7d8ye:8482
I0421 06:40:06.621108 136604529964864 max_utils.py:284] Jax distributed system initialized!
I0421 06:40:13.512078 136604529964864 max_utils.py:244] Jax distributed system is already initialized.
W0421 06:40:13.644431 136604529964864 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0421 06:40:13.705611 136604529964864 max_utils.py:244] Jax distributed system is already initialized.
I0421 06:40:13.706802 136604529964864 pyconfig.py:471] Config param abort_on_inf_loss: True
I0421 06:40:13.706850 136604529964864 pyconfig.py:471] Config param abort_on_nan_loss: True
I0421 06:40:13.706877 136604529964864 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0421 06:40:13.706899 136604529964864 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0421 06:40:13.706919 136604529964864 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0421 06:40:13.706938 136604529964864 pyconfig.py:471] Config param activations_in_float32: False
I0421 06:40:13.706956 136604529964864 pyconfig.py:471] Config param adam_b1: 0.9
I0421 06:40:13.706974 136604529964864 pyconfig.py:471] Config param adam_b2: 0.95
I0421 06:40:13.706992 136604529964864 pyconfig.py:471] Config param adam_eps: 1e-08
I0421 06:40:13.707014 136604529964864 pyconfig.py:471] Config param adam_eps_root: 0.0
I0421 06:40:13.707031 136604529964864 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0421 06:40:13.707050 136604529964864 pyconfig.py:471] Config param adamw_mask: []
I0421 06:40:13.707067 136604529964864 pyconfig.py:471] Config param add_bos: True
I0421 06:40:13.707085 136604529964864 pyconfig.py:471] Config param add_eos: True
I0421 06:40:13.707102 136604529964864 pyconfig.py:471] Config param allow_split_physical_axes: False
I0421 06:40:13.707118 136604529964864 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0421 06:40:13.707135 136604529964864 pyconfig.py:471] Config param async_checkpointing: True
I0421 06:40:13.707156 136604529964864 pyconfig.py:471] Config param async_scheduling: False
I0421 06:40:13.707172 136604529964864 pyconfig.py:471] Config param attention: dot_product
I0421 06:40:13.707189 136604529964864 pyconfig.py:471] Config param attention_bias: False
I0421 06:40:13.707206 136604529964864 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0421 06:40:13.707222 136604529964864 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0421 06:40:13.707243 136604529964864 pyconfig.py:471] Config param attention_output_dim: -1
I0421 06:40:13.707260 136604529964864 pyconfig.py:471] Config param attention_sink: False
I0421 06:40:13.707277 136604529964864 pyconfig.py:471] Config param attention_type: global
I0421 06:40:13.707294 136604529964864 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0421 06:40:13.707311 136604529964864 pyconfig.py:471] Config param audio_path: 
I0421 06:40:13.707326 136604529964864 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0421 06:40:13.707343 136604529964864 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0421 06:40:13.707358 136604529964864 pyconfig.py:471] Config param base_config: base.yml
I0421 06:40:13.707374 136604529964864 pyconfig.py:471] Config param base_emb_dim: 16
I0421 06:40:13.707391 136604529964864 pyconfig.py:471] Config param base_mlp_dim: 64
I0421 06:40:13.707407 136604529964864 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0421 06:40:13.707423 136604529964864 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0421 06:40:13.707440 136604529964864 pyconfig.py:471] Config param base_num_kv_heads: 2
I0421 06:40:13.707455 136604529964864 pyconfig.py:471] Config param base_num_query_heads: 2
I0421 06:40:13.707471 136604529964864 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0421 06:40:13.707488 136604529964864 pyconfig.py:471] Config param batch_size: 1
I0421 06:40:13.707505 136604529964864 pyconfig.py:471] Config param batch_split_factor: 1
I0421 06:40:13.707521 136604529964864 pyconfig.py:471] Config param beta_fast: 32
I0421 06:40:13.707538 136604529964864 pyconfig.py:471] Config param beta_slow: 1
I0421 06:40:13.707553 136604529964864 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0421 06:40:13.707569 136604529964864 pyconfig.py:471] Config param capacity_factor: -1.0
I0421 06:40:13.707586 136604529964864 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0421 06:40:13.707603 136604529964864 pyconfig.py:471] Config param chat_template: 
I0421 06:40:13.707619 136604529964864 pyconfig.py:471] Config param chat_template_path: 
I0421 06:40:13.707635 136604529964864 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0421 06:40:13.707653 136604529964864 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/checkpoints/
I0421 06:40:13.707670 136604529964864 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0421 06:40:13.707685 136604529964864 pyconfig.py:471] Config param checkpoint_period: 2000
I0421 06:40:13.707710 136604529964864 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0421 06:40:13.707728 136604529964864 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0421 06:40:13.707744 136604529964864 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0421 06:40:13.707761 136604529964864 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0421 06:40:13.707776 136604529964864 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0421 06:40:13.707791 136604529964864 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0421 06:40:13.707807 136604529964864 pyconfig.py:471] Config param chips_per_vm: 4
I0421 06:40:13.707821 136604529964864 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0421 06:40:13.707836 136604529964864 pyconfig.py:471] Config param collect_stack_trace: False
I0421 06:40:13.707852 136604529964864 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0421 06:40:13.707866 136604529964864 pyconfig.py:471] Config param colocated_python_data_input: False
I0421 06:40:13.707883 136604529964864 pyconfig.py:471] Config param compile_topology: 
I0421 06:40:13.707899 136604529964864 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0421 06:40:13.707915 136604529964864 pyconfig.py:471] Config param compile_xla_flags: 
I0421 06:40:13.707930 136604529964864 pyconfig.py:471] Config param compiled_trainstep_file: 
I0421 06:40:13.707946 136604529964864 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0421 06:40:13.707962 136604529964864 pyconfig.py:471] Config param constant_bound_config: []
I0421 06:40:13.707978 136604529964864 pyconfig.py:471] Config param context: RematLocation.REMAT
I0421 06:40:13.707995 136604529964864 pyconfig.py:471] Config param context_parallel_load_balance: True
I0421 06:40:13.708011 136604529964864 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0421 06:40:13.708028 136604529964864 pyconfig.py:471] Config param context_parallel_size: 1
I0421 06:40:13.708043 136604529964864 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0421 06:40:13.708059 136604529964864 pyconfig.py:471] Config param context_sharding: context
I0421 06:40:13.708075 136604529964864 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0421 06:40:13.708090 136604529964864 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0421 06:40:13.708106 136604529964864 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0421 06:40:13.708120 136604529964864 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0421 06:40:13.708135 136604529964864 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0421 06:40:13.708150 136604529964864 pyconfig.py:471] Config param custom_mesh: 
I0421 06:40:13.708171 136604529964864 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0421 06:40:13.708186 136604529964864 pyconfig.py:471] Config param d_model_for_audio: 256
I0421 06:40:13.708202 136604529964864 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0421 06:40:13.708222 136604529964864 pyconfig.py:471] Config param data_shuffle_seed: 0
I0421 06:40:13.708237 136604529964864 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0421 06:40:13.708253 136604529964864 pyconfig.py:471] Config param dataset_path: 
I0421 06:40:13.708270 136604529964864 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0421 06:40:13.708292 136604529964864 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0421 06:40:13.708308 136604529964864 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0421 06:40:13.708324 136604529964864 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0421 06:40:13.708339 136604529964864 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0421 06:40:13.708355 136604529964864 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0421 06:40:13.708370 136604529964864 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0421 06:40:13.708386 136604529964864 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0421 06:40:13.708400 136604529964864 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0421 06:40:13.708415 136604529964864 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 06:40:13.708433 136604529964864 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0421 06:40:13.708447 136604529964864 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0421 06:40:13.708464 136604529964864 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0421 06:40:13.708478 136604529964864 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0421 06:40:13.708494 136604529964864 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0421 06:40:13.708510 136604529964864 pyconfig.py:471] Config param debug: {'rl': False}
I0421 06:40:13.708527 136604529964864 pyconfig.py:471] Config param debug_sharding: False
I0421 06:40:13.708543 136604529964864 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0421 06:40:13.708559 136604529964864 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0421 06:40:13.708577 136604529964864 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0421 06:40:13.708594 136604529964864 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0421 06:40:13.708609 136604529964864 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0421 06:40:13.708626 136604529964864 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0421 06:40:13.708643 136604529964864 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0421 06:40:13.708658 136604529964864 pyconfig.py:471] Config param degenerate_group_masking: True
I0421 06:40:13.708675 136604529964864 pyconfig.py:471] Config param dense_init_scale: 1.0
I0421 06:40:13.708690 136604529964864 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0421 06:40:13.708774 136604529964864 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0421 06:40:13.708812 136604529964864 pyconfig.py:471] Config param diloco_sync_period: 36
I0421 06:40:13.708838 136604529964864 pyconfig.py:471] Config param distill_alpha: 0.5
I0421 06:40:13.708860 136604529964864 pyconfig.py:471] Config param distill_alpha_end: None
I0421 06:40:13.708878 136604529964864 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0421 06:40:13.708894 136604529964864 pyconfig.py:471] Config param distill_beta: 0.0
I0421 06:40:13.708910 136604529964864 pyconfig.py:471] Config param distill_beta_end: None
I0421 06:40:13.708927 136604529964864 pyconfig.py:471] Config param distill_beta_schedule: constant
I0421 06:40:13.708943 136604529964864 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0421 06:40:13.708960 136604529964864 pyconfig.py:471] Config param distill_layer_indices: None
I0421 06:40:13.708974 136604529964864 pyconfig.py:471] Config param distill_temperature: 1.0
I0421 06:40:13.708991 136604529964864 pyconfig.py:471] Config param distill_temperature_end: None
I0421 06:40:13.709006 136604529964864 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0421 06:40:13.709022 136604529964864 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0421 06:40:13.709039 136604529964864 pyconfig.py:471] Config param dpo_beta: 0.1
I0421 06:40:13.709056 136604529964864 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0421 06:40:13.709074 136604529964864 pyconfig.py:471] Config param dq_reduction_steps: 0
I0421 06:40:13.709090 136604529964864 pyconfig.py:471] Config param dropout_rate: 0.0
I0421 06:40:13.709137 136604529964864 pyconfig.py:471] Config param dtype: bfloat16
I0421 06:40:13.709188 136604529964864 pyconfig.py:471] Config param dtype_mm: float32
I0421 06:40:13.709207 136604529964864 pyconfig.py:471] Config param dump_hlo: False
I0421 06:40:13.709223 136604529964864 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0421 06:40:13.709239 136604529964864 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/xla_dump
I0421 06:40:13.709254 136604529964864 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0421 06:40:13.709270 136604529964864 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0421 06:40:13.709285 136604529964864 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0421 06:40:13.709300 136604529964864 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0421 06:40:13.709316 136604529964864 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0421 06:40:13.709330 136604529964864 pyconfig.py:471] Config param dump_jaxpr: False
I0421 06:40:13.709346 136604529964864 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0421 06:40:13.709362 136604529964864 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/jaxpr_dump
I0421 06:40:13.709376 136604529964864 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0421 06:40:13.709392 136604529964864 pyconfig.py:471] Config param dump_step: -1
I0421 06:40:13.709407 136604529964864 pyconfig.py:471] Config param elastic_enabled: False
I0421 06:40:13.709422 136604529964864 pyconfig.py:471] Config param elastic_max_retries: 10
I0421 06:40:13.709438 136604529964864 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0421 06:40:13.709453 136604529964864 pyconfig.py:471] Config param emb_dim: 16
I0421 06:40:13.709468 136604529964864 pyconfig.py:471] Config param enable_autocheckpoint: False
I0421 06:40:13.709483 136604529964864 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0421 06:40:13.709499 136604529964864 pyconfig.py:471] Config param enable_checkpointing: True
I0421 06:40:13.709513 136604529964864 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0421 06:40:13.709529 136604529964864 pyconfig.py:471] Config param enable_data_shuffling: True
I0421 06:40:13.709544 136604529964864 pyconfig.py:471] Config param enable_diloco: False
I0421 06:40:13.709558 136604529964864 pyconfig.py:471] Config param enable_dp_attention: False
I0421 06:40:13.709574 136604529964864 pyconfig.py:471] Config param enable_dropout: False
I0421 06:40:13.709588 136604529964864 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0421 06:40:13.709604 136604529964864 pyconfig.py:471] Config param enable_expert_parallel: False
I0421 06:40:13.709620 136604529964864 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0421 06:40:13.709635 136604529964864 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0421 06:40:13.709650 136604529964864 pyconfig.py:471] Config param enable_goodput_recording: False
I0421 06:40:13.709665 136604529964864 pyconfig.py:471] Config param enable_jax_profiler: False
I0421 06:40:13.709680 136604529964864 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0421 06:40:13.709706 136604529964864 pyconfig.py:471] Config param enable_model_warmup: False
I0421 06:40:13.709727 136604529964864 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0421 06:40:13.709741 136604529964864 pyconfig.py:471] Config param enable_nnx: False
I0421 06:40:13.709757 136604529964864 pyconfig.py:471] Config param enable_orbax_v1: False
I0421 06:40:13.709772 136604529964864 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0421 06:40:13.709788 136604529964864 pyconfig.py:471] Config param enable_pathways_goodput: False
I0421 06:40:13.709802 136604529964864 pyconfig.py:471] Config param enable_prefix_caching: False
I0421 06:40:13.709816 136604529964864 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0421 06:40:13.709832 136604529964864 pyconfig.py:471] Config param enable_single_controller: False
I0421 06:40:13.709846 136604529964864 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0421 06:40:13.709863 136604529964864 pyconfig.py:471] Config param enable_tensorboard: True
I0421 06:40:13.709878 136604529964864 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0421 06:40:13.709894 136604529964864 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0421 06:40:13.709910 136604529964864 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0421 06:40:13.709925 136604529964864 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0421 06:40:13.709940 136604529964864 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0421 06:40:13.709957 136604529964864 pyconfig.py:471] Config param engram_head_dim: 1280
I0421 06:40:13.709973 136604529964864 pyconfig.py:471] Config param engram_kernel_size: 4
I0421 06:40:13.709988 136604529964864 pyconfig.py:471] Config param engram_layers: []
I0421 06:40:13.710004 136604529964864 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0421 06:40:13.710019 136604529964864 pyconfig.py:471] Config param engram_num_heads: 8
I0421 06:40:13.710035 136604529964864 pyconfig.py:471] Config param engram_seed: 0
I0421 06:40:13.710049 136604529964864 pyconfig.py:471] Config param engram_vocab_bases: []
I0421 06:40:13.710065 136604529964864 pyconfig.py:471] Config param epsilon_high: None
I0421 06:40:13.710081 136604529964864 pyconfig.py:471] Config param eval_corr_lst: False
I0421 06:40:13.710098 136604529964864 pyconfig.py:471] Config param eval_data_columns: ['text']
I0421 06:40:13.710114 136604529964864 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0421 06:40:13.710130 136604529964864 pyconfig.py:471] Config param eval_image_column: image
I0421 06:40:13.710146 136604529964864 pyconfig.py:471] Config param eval_interval: -1
I0421 06:40:13.710167 136604529964864 pyconfig.py:471] Config param eval_make_lst: False
I0421 06:40:13.710184 136604529964864 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0421 06:40:13.710200 136604529964864 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0421 06:40:13.710216 136604529964864 pyconfig.py:471] Config param eval_split: validation
I0421 06:40:13.710232 136604529964864 pyconfig.py:471] Config param eval_steps: -1
I0421 06:40:13.710247 136604529964864 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0421 06:40:13.710264 136604529964864 pyconfig.py:471] Config param final_logits_soft_cap: None
I0421 06:40:13.710280 136604529964864 pyconfig.py:471] Config param first_num_dense_layers: 0
I0421 06:40:13.710296 136604529964864 pyconfig.py:471] Config param float32_gate_logits: False
I0421 06:40:13.710311 136604529964864 pyconfig.py:471] Config param float32_logits: False
I0421 06:40:13.710328 136604529964864 pyconfig.py:471] Config param float32_qk_product: False
I0421 06:40:13.710343 136604529964864 pyconfig.py:471] Config param float32_weight_sum: True
I0421 06:40:13.710359 136604529964864 pyconfig.py:471] Config param force_q_layout: False
I0421 06:40:13.710375 136604529964864 pyconfig.py:471] Config param force_unroll: False
I0421 06:40:13.710390 136604529964864 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0421 06:40:13.710405 136604529964864 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0421 06:40:13.710419 136604529964864 pyconfig.py:471] Config param fused_mlp: False
I0421 06:40:13.710435 136604529964864 pyconfig.py:471] Config param fused_qkv: True
I0421 06:40:13.710449 136604529964864 pyconfig.py:471] Config param gcs_metrics: False
I0421 06:40:13.710465 136604529964864 pyconfig.py:471] Config param gdn_chunk_size: 64
I0421 06:40:13.710479 136604529964864 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0421 06:40:13.710494 136604529964864 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0421 06:40:13.710508 136604529964864 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0421 06:40:13.710524 136604529964864 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0421 06:40:13.710538 136604529964864 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0421 06:40:13.710554 136604529964864 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0421 06:40:13.710569 136604529964864 pyconfig.py:471] Config param generate_padding_batch_train: False
I0421 06:40:13.710583 136604529964864 pyconfig.py:471] Config param generate_slice: v5e-16
I0421 06:40:13.710599 136604529964864 pyconfig.py:471] Config param generation_configs: {}
I0421 06:40:13.710615 136604529964864 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0421 06:40:13.710630 136604529964864 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0421 06:40:13.710645 136604529964864 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0421 06:40:13.710660 136604529964864 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0421 06:40:13.710675 136604529964864 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0421 06:40:13.710690 136604529964864 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0421 06:40:13.710714 136604529964864 pyconfig.py:471] Config param global_head_dim: 0
I0421 06:40:13.710730 136604529964864 pyconfig.py:471] Config param global_num_kv_heads: 0
I0421 06:40:13.710744 136604529964864 pyconfig.py:471] Config param global_parameter_scale: 1
I0421 06:40:13.710759 136604529964864 pyconfig.py:471] Config param global_rampup_samples: 500
I0421 06:40:13.710776 136604529964864 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0421 06:40:13.710790 136604529964864 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0421 06:40:13.710807 136604529964864 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0421 06:40:13.710822 136604529964864 pyconfig.py:471] Config param grad_dtype: float32
I0421 06:40:13.710858 136604529964864 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0421 06:40:13.710876 136604529964864 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0421 06:40:13.710893 136604529964864 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0421 06:40:13.710910 136604529964864 pyconfig.py:471] Config param grain_eval_files: 
I0421 06:40:13.710927 136604529964864 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0421 06:40:13.710943 136604529964864 pyconfig.py:471] Config param grain_num_threads: 16
I0421 06:40:13.710959 136604529964864 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0421 06:40:13.710974 136604529964864 pyconfig.py:471] Config param grain_packing_type: first_fit
I0421 06:40:13.710990 136604529964864 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0421 06:40:13.711004 136604529964864 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0421 06:40:13.711020 136604529964864 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0421 06:40:13.711035 136604529964864 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0421 06:40:13.711051 136604529964864 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0421 06:40:13.711066 136604529964864 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0421 06:40:13.711082 136604529964864 pyconfig.py:471] Config param grain_train_files: 
I0421 06:40:13.711096 136604529964864 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0421 06:40:13.711112 136604529964864 pyconfig.py:471] Config param grain_worker_count: 1
I0421 06:40:13.711127 136604529964864 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0421 06:40:13.711143 136604529964864 pyconfig.py:471] Config param grpo_beta: 0.08
I0421 06:40:13.711161 136604529964864 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0421 06:40:13.711178 136604529964864 pyconfig.py:471] Config param hardware: tpu
I0421 06:40:13.711194 136604529964864 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0421 06:40:13.711209 136604529964864 pyconfig.py:471] Config param head_dim: 8
I0421 06:40:13.711224 136604529964864 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0421 06:40:13.711240 136604529964864 pyconfig.py:471] Config param hf_data_dir: None
I0421 06:40:13.711254 136604529964864 pyconfig.py:471] Config param hf_eval_files: None
I0421 06:40:13.711270 136604529964864 pyconfig.py:471] Config param hf_eval_split: None
I0421 06:40:13.711284 136604529964864 pyconfig.py:471] Config param hf_name: None
I0421 06:40:13.711299 136604529964864 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0421 06:40:13.711313 136604529964864 pyconfig.py:471] Config param hf_train_files: None
I0421 06:40:13.711329 136604529964864 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0421 06:40:13.711345 136604529964864 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0421 06:40:13.711359 136604529964864 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0421 06:40:13.711374 136604529964864 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0421 06:40:13.711390 136604529964864 pyconfig.py:471] Config param ici_context_parallelism: 1
I0421 06:40:13.711405 136604529964864 pyconfig.py:471] Config param ici_data_parallelism: 1
I0421 06:40:13.711420 136604529964864 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0421 06:40:13.711435 136604529964864 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0421 06:40:13.711450 136604529964864 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0421 06:40:13.711465 136604529964864 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0421 06:40:13.711479 136604529964864 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 06:40:13.711496 136604529964864 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0421 06:40:13.711512 136604529964864 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0421 06:40:13.711527 136604529964864 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0421 06:40:13.711543 136604529964864 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0421 06:40:13.711557 136604529964864 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0421 06:40:13.711573 136604529964864 pyconfig.py:471] Config param image_path: 
I0421 06:40:13.711589 136604529964864 pyconfig.py:471] Config param image_placeholder: <|image|>
I0421 06:40:13.711604 136604529964864 pyconfig.py:471] Config param image_size_for_vit: 896
I0421 06:40:13.711619 136604529964864 pyconfig.py:471] Config param indexer_head_dim: 128
I0421 06:40:13.711634 136604529964864 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0421 06:40:13.711649 136604529964864 pyconfig.py:471] Config param indexer_n_heads: 64
I0421 06:40:13.711665 136604529964864 pyconfig.py:471] Config param indexer_sparse_training: False
I0421 06:40:13.711680 136604529964864 pyconfig.py:471] Config param indexer_topk: 2048
I0421 06:40:13.711705 136604529964864 pyconfig.py:471] Config param inference_benchmark_test: False
I0421 06:40:13.711721 136604529964864 pyconfig.py:471] Config param inference_metadata_file: 
I0421 06:40:13.711736 136604529964864 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0421 06:40:13.711752 136604529964864 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0421 06:40:13.711767 136604529964864 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0421 06:40:13.711783 136604529964864 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0421 06:40:13.711798 136604529964864 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0421 06:40:13.711813 136604529964864 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0421 06:40:13.711828 136604529964864 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0421 06:40:13.711844 136604529964864 pyconfig.py:471] Config param init_weights_seed: 0
I0421 06:40:13.711858 136604529964864 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0421 06:40:13.711874 136604529964864 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0421 06:40:13.711890 136604529964864 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0421 06:40:13.711905 136604529964864 pyconfig.py:471] Config param internal_compile: False
I0421 06:40:13.711921 136604529964864 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0421 06:40:13.711936 136604529964864 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0421 06:40:13.711952 136604529964864 pyconfig.py:471] Config param jax_debug_log_modules: 
I0421 06:40:13.711966 136604529964864 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0421 06:40:13.711981 136604529964864 pyconfig.py:471] Config param jax_profiler_port: 9999
I0421 06:40:13.711997 136604529964864 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0421 06:40:13.712013 136604529964864 pyconfig.py:471] Config param kv_cache_buffer: 256
I0421 06:40:13.712028 136604529964864 pyconfig.py:471] Config param kv_lora_rank: 512
I0421 06:40:13.712044 136604529964864 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0421 06:40:13.712063 136604529964864 pyconfig.py:471] Config param kv_quant_dtype: int8
I0421 06:40:13.712079 136604529964864 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0421 06:40:13.712095 136604529964864 pyconfig.py:471] Config param learning_rate: 0.0002
I0421 06:40:13.712112 136604529964864 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0421 06:40:13.712128 136604529964864 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0421 06:40:13.712144 136604529964864 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0421 06:40:13.712161 136604529964864 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0421 06:40:13.712177 136604529964864 pyconfig.py:471] Config param load_from_prefill_dir: False
I0421 06:40:13.712192 136604529964864 pyconfig.py:471] Config param load_full_state_path: 
I0421 06:40:13.712208 136604529964864 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 06:40:13.712224 136604529964864 pyconfig.py:471] Config param local_checkpoint_directory: 
I0421 06:40:13.712242 136604529964864 pyconfig.py:471] Config param local_checkpoint_period: 0
I0421 06:40:13.712257 136604529964864 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0421 06:40:13.712273 136604529964864 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0421 06:40:13.712288 136604529964864 pyconfig.py:471] Config param log_config: True
I0421 06:40:13.712304 136604529964864 pyconfig.py:471] Config param log_period: 10
I0421 06:40:13.712320 136604529964864 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 
'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0421 06:40:13.712393 136604529964864 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0421 06:40:13.712410 136604529964864 pyconfig.py:471] Config param logits_via_embedding: True
I0421 06:40:13.712426 136604529964864 pyconfig.py:471] Config param lora_input_adapters_path: 
I0421 06:40:13.712440 136604529964864 pyconfig.py:471] Config param loss_algo: grpo
I0421 06:40:13.712456 136604529964864 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0421 06:40:13.712474 136604529964864 pyconfig.py:471] Config param managed_mldiagnostics: False
I0421 06:40:13.712490 136604529964864 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/managed-mldiagnostics
I0421 06:40:13.712505 136604529964864 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0421 06:40:13.712521 136604529964864 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0421 06:40:13.712539 136604529964864 pyconfig.py:471] Config param max_checkify: False
I0421 06:40:13.712553 136604529964864 pyconfig.py:471] Config param max_concurrency: 256
I0421 06:40:13.712569 136604529964864 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0421 06:40:13.712585 136604529964864 pyconfig.py:471] Config param max_num_batched_tokens: None
I0421 06:40:13.712600 136604529964864 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0421 06:40:13.712616 136604529964864 pyconfig.py:471] Config param max_num_images_per_example: -1
I0421 06:40:13.712630 136604529964864 pyconfig.py:471] Config param max_num_seqs: None
I0421 06:40:13.712646 136604529964864 pyconfig.py:471] Config param max_position_embeddings: 163840
I0421 06:40:13.712660 136604529964864 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0421 06:40:13.712676 136604529964864 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0421 06:40:13.712692 136604529964864 pyconfig.py:471] Config param max_segments_per_seq: -1
I0421 06:40:13.712718 136604529964864 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0421 06:40:13.712734 136604529964864 pyconfig.py:471] Config param max_target_length: 2048
I0421 06:40:13.712748 136604529964864 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0421 06:40:13.712764 136604529964864 pyconfig.py:471] Config param megablox: True
I0421 06:40:13.712780 136604529964864 pyconfig.py:471] Config param merge_gating_gmm: False
I0421 06:40:13.712796 136604529964864 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0421 06:40:13.712814 136604529964864 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/metrics/
I0421 06:40:13.712830 136604529964864 pyconfig.py:471] Config param metrics_file: 
I0421 06:40:13.712846 136604529964864 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0421 06:40:13.712861 136604529964864 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0421 06:40:13.712877 136604529964864 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0421 06:40:13.712893 136604529964864 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0421 06:40:13.712908 136604529964864 pyconfig.py:471] Config param mla_naive_kvcache: True
I0421 06:40:13.712924 136604529964864 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0421 06:40:13.712940 136604529964864 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0421 06:40:13.712956 136604529964864 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0421 06:40:13.712972 136604529964864 pyconfig.py:471] Config param mlp_bias: False
I0421 06:40:13.712988 136604529964864 pyconfig.py:471] Config param mlp_dim: 64
I0421 06:40:13.713004 136604529964864 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0421 06:40:13.713020 136604529964864 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0421 06:40:13.713034 136604529964864 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0421 06:40:13.713050 136604529964864 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0421 06:40:13.713066 136604529964864 pyconfig.py:471] Config param moba: False
I0421 06:40:13.713082 136604529964864 pyconfig.py:471] Config param moba_chunk_size: 1024
I0421 06:40:13.713097 136604529964864 pyconfig.py:471] Config param moba_topk: 8
I0421 06:40:13.713113 136604529964864 pyconfig.py:471] Config param model_call_mode: 
I0421 06:40:13.713129 136604529964864 pyconfig.py:471] Config param model_name: gpt3-52k
I0421 06:40:13.713145 136604529964864 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0421 06:40:13.713164 136604529964864 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0421 06:40:13.713179 136604529964864 pyconfig.py:471] Config param moe_mlp_dim: -1
I0421 06:40:13.713194 136604529964864 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0421 06:40:13.713209 136604529964864 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0421 06:40:13.713226 136604529964864 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0421 06:40:13.713242 136604529964864 pyconfig.py:471] Config param monitor_goodput: False
I0421 06:40:13.713258 136604529964864 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0421 06:40:13.713274 136604529964864 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0421 06:40:13.713290 136604529964864 pyconfig.py:471] Config param mscale: 1.0
I0421 06:40:13.713306 136604529964864 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0421 06:40:13.713321 136604529964864 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0421 06:40:13.713338 136604529964864 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0421 06:40:13.713354 136604529964864 pyconfig.py:471] Config param mtp_num_layers: 0
I0421 06:40:13.713370 136604529964864 pyconfig.py:471] Config param mu_dtype: float32
I0421 06:40:13.713395 136604529964864 pyconfig.py:471] Config param multi_sampling: False
I0421 06:40:13.713411 136604529964864 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0421 06:40:13.713425 136604529964864 pyconfig.py:471] Config param muon_beta: 0.95
I0421 06:40:13.713442 136604529964864 pyconfig.py:471] Config param muon_consistent_rms: None
I0421 06:40:13.713458 136604529964864 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0421 06:40:13.713474 136604529964864 pyconfig.py:471] Config param n_routing_groups: -1
I0421 06:40:13.713490 136604529964864 pyconfig.py:471] Config param n_window_for_audio: 50
I0421 06:40:13.713504 136604529964864 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0421 06:40:13.713521 136604529964864 pyconfig.py:471] Config param nope_layer_interval: -1
I0421 06:40:13.713536 136604529964864 pyconfig.py:471] Config param norm_topk_prob: False
I0421 06:40:13.713552 136604529964864 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0421 06:40:13.713570 136604529964864 pyconfig.py:471] Config param normalize_embedding_logits: False
I0421 06:40:13.713586 136604529964864 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0421 06:40:13.713603 136604529964864 pyconfig.py:471] Config param num_batches: 4
I0421 06:40:13.713618 136604529964864 pyconfig.py:471] Config param num_channels_for_vit: 3
I0421 06:40:13.713634 136604529964864 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0421 06:40:13.713649 136604529964864 pyconfig.py:471] Config param num_decoder_layers: 1
I0421 06:40:13.713665 136604529964864 pyconfig.py:471] Config param num_diloco_replicas: 1
I0421 06:40:13.713679 136604529964864 pyconfig.py:471] Config param num_epoch: 1
I0421 06:40:13.713701 136604529964864 pyconfig.py:471] Config param num_eval_passes: 1
I0421 06:40:13.713717 136604529964864 pyconfig.py:471] Config param num_experts: 1
I0421 06:40:13.713733 136604529964864 pyconfig.py:471] Config param num_experts_per_tok: 1
I0421 06:40:13.713748 136604529964864 pyconfig.py:471] Config param num_generations: 2
I0421 06:40:13.713764 136604529964864 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0421 06:40:13.713779 136604529964864 pyconfig.py:471] Config param num_iterations: 1
I0421 06:40:13.713796 136604529964864 pyconfig.py:471] Config param num_kv_heads: 2
I0421 06:40:13.713811 136604529964864 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0421 06:40:13.713827 136604529964864 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0421 06:40:13.713841 136604529964864 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0421 06:40:13.713856 136604529964864 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0421 06:40:13.713872 136604529964864 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0421 06:40:13.713886 136604529964864 pyconfig.py:471] Config param num_query_heads: 2
I0421 06:40:13.713902 136604529964864 pyconfig.py:471] Config param num_samplers_slices: -1
I0421 06:40:13.713917 136604529964864 pyconfig.py:471] Config param num_slices: 1
I0421 06:40:13.713932 136604529964864 pyconfig.py:471] Config param num_target_devices: 32
I0421 06:40:13.713946 136604529964864 pyconfig.py:471] Config param num_test_batches: 5
I0421 06:40:13.713962 136604529964864 pyconfig.py:471] Config param num_trainer_slices: -1
I0421 06:40:13.713978 136604529964864 pyconfig.py:471] Config param num_vocab_tiling: 1
I0421 06:40:13.713993 136604529964864 pyconfig.py:471] Config param off_policy_steps: 0
I0421 06:40:13.714009 136604529964864 pyconfig.py:471] Config param offline_data_dir: None
I0421 06:40:13.714024 136604529964864 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0421 06:40:13.714042 136604529964864 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0421 06:40:13.714057 136604529964864 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0421 06:40:13.714072 136604529964864 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0421 06:40:13.714087 136604529964864 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0421 06:40:13.714102 136604529964864 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0421 06:40:13.714119 136604529964864 pyconfig.py:471] Config param output_dim_for_audio: 512
I0421 06:40:13.714134 136604529964864 pyconfig.py:471] Config param override_logical_axis_rules: False
I0421 06:40:13.714148 136604529964864 pyconfig.py:471] Config param override_model_config: True
I0421 06:40:13.714167 136604529964864 pyconfig.py:471] Config param packing: True
I0421 06:40:13.714182 136604529964864 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0421 06:40:13.714198 136604529964864 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0421 06:40:13.714212 136604529964864 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0421 06:40:13.714227 136604529964864 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0421 06:40:13.714242 136604529964864 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0421 06:40:13.714258 136604529964864 pyconfig.py:471] Config param param_scan_axis: 1
I0421 06:40:13.714274 136604529964864 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0421 06:40:13.714289 136604529964864 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0421 06:40:13.714303 136604529964864 pyconfig.py:471] Config param patch_size_for_vit: 14
I0421 06:40:13.714319 136604529964864 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0421 06:40:13.714335 136604529964864 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0421 06:40:13.714350 136604529964864 pyconfig.py:471] Config param per_device_batch_size: 2
I0421 06:40:13.714365 136604529964864 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0421 06:40:13.714382 136604529964864 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0421 06:40:13.714397 136604529964864 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0421 06:40:13.714413 136604529964864 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0421 06:40:13.714428 136604529964864 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0421 06:40:13.714443 136604529964864 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0421 06:40:13.714458 136604529964864 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0421 06:40:13.714474 136604529964864 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0421 06:40:13.714488 136604529964864 pyconfig.py:471] Config param position_id_per_seconds: 25
I0421 06:40:13.714504 136604529964864 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0421 06:40:13.714520 136604529964864 pyconfig.py:471] Config param prefill_cache_dir: 
I0421 06:40:13.714536 136604529964864 pyconfig.py:471] Config param prefill_chunk_size: 256
I0421 06:40:13.714573 136604529964864 pyconfig.py:471] Config param prefill_slice: v5e-16
I0421 06:40:13.714587 136604529964864 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0421 06:40:13.714614 136604529964864 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0421 06:40:13.714630 136604529964864 pyconfig.py:471] Config param profile_cleanly: True
I0421 06:40:13.714644 136604529964864 pyconfig.py:471] Config param profile_periodically_period: -1
I0421 06:40:13.714660 136604529964864 pyconfig.py:471] Config param profile_power_events: False
I0421 06:40:13.714674 136604529964864 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0421 06:40:13.714692 136604529964864 pyconfig.py:471] Config param profiler_steps: 5
I0421 06:40:13.714717 136604529964864 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0421 06:40:13.714732 136604529964864 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0421 06:40:13.714748 136604529964864 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0421 06:40:13.714762 136604529964864 pyconfig.py:471] Config param prometheus_port: 0
I0421 06:40:13.714778 136604529964864 pyconfig.py:471] Config param prompt: I love to
I0421 06:40:13.714793 136604529964864 pyconfig.py:471] Config param pure_nnx: False
I0421 06:40:13.714809 136604529964864 pyconfig.py:471] Config param pure_nnx_decoder: False
I0421 06:40:13.714824 136604529964864 pyconfig.py:471] Config param q_lora_rank: 0
I0421 06:40:13.714839 136604529964864 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0421 06:40:13.714855 136604529964864 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0421 06:40:13.714871 136604529964864 pyconfig.py:471] Config param qk_norm_with_scale: True
I0421 06:40:13.714885 136604529964864 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0421 06:40:13.714901 136604529964864 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0421 06:40:13.714917 136604529964864 pyconfig.py:471] Config param quant_cfg_path: 
I0421 06:40:13.714932 136604529964864 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0421 06:40:13.714950 136604529964864 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0421 06:40:13.714966 136604529964864 pyconfig.py:471] Config param quantize_kvcache: False
I0421 06:40:13.714981 136604529964864 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0421 06:40:13.714997 136604529964864 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0421 06:40:13.715011 136604529964864 pyconfig.py:471] Config param ragged_block_size: 256
I0421 06:40:13.715027 136604529964864 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0421 06:40:13.715043 136604529964864 pyconfig.py:471] Config param rampup_end_step: 0
I0421 06:40:13.715059 136604529964864 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0421 06:40:13.715076 136604529964864 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0421 06:40:13.715091 136604529964864 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0421 06:40:13.715105 136604529964864 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0421 06:40:13.715121 136604529964864 pyconfig.py:471] Config param remat_policy: full
I0421 06:40:13.715138 136604529964864 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0421 06:40:13.715156 136604529964864 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0421 06:40:13.715172 136604529964864 pyconfig.py:471] Config param replicate_quant_scale: False
I0421 06:40:13.715188 136604529964864 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0421 06:40:13.715203 136604529964864 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0421 06:40:13.715217 136604529964864 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0421 06:40:13.715233 136604529964864 pyconfig.py:471] Config param reshape_q: False
I0421 06:40:13.715247 136604529964864 pyconfig.py:471] Config param return_log_prob: False
I0421 06:40:13.715263 136604529964864 pyconfig.py:471] Config param reuse_example_batch: 0
I0421 06:40:13.715277 136604529964864 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0421 06:40:13.715294 136604529964864 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0421 06:40:13.715309 136604529964864 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0421 06:40:13.715325 136604529964864 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0421 06:40:13.715340 136604529964864 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0421 06:40:13.715356 136604529964864 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0421 06:40:13.715371 136604529964864 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0421 06:40:13.715392 136604529964864 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0421 06:40:13.715408 136604529964864 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0421 06:40:13.715424 136604529964864 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0421 06:40:13.715438 136604529964864 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0421 06:40:13.715454 136604529964864 pyconfig.py:471] Config param rope_attention_scaling: False
I0421 06:40:13.715468 136604529964864 pyconfig.py:471] Config param rope_factor: 40
I0421 06:40:13.715484 136604529964864 pyconfig.py:471] Config param rope_interleave: True
I0421 06:40:13.715499 136604529964864 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0421 06:40:13.715515 136604529964864 pyconfig.py:471] Config param rope_max_timescale: 10000
I0421 06:40:13.715531 136604529964864 pyconfig.py:471] Config param rope_min_timescale: 1
I0421 06:40:13.715546 136604529964864 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0421 06:40:13.715560 136604529964864 pyconfig.py:471] Config param rope_truncate: True
I0421 06:40:13.715576 136604529964864 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0421 06:40:13.715594 136604529964864 pyconfig.py:471] Config param rope_use_scale: True
I0421 06:40:13.715610 136604529964864 pyconfig.py:471] Config param routed_bias: False
I0421 06:40:13.715625 136604529964864 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0421 06:40:13.715641 136604529964864 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0421 06:40:13.715656 136604529964864 pyconfig.py:471] Config param routed_score_func: 
I0421 06:40:13.715672 136604529964864 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-21-06-40
I0421 06:40:13.715688 136604529964864 pyconfig.py:471] Config param sa_block_kv: 512
I0421 06:40:13.715714 136604529964864 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0421 06:40:13.715731 136604529964864 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0421 06:40:13.715749 136604529964864 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0421 06:40:13.715765 136604529964864 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0421 06:40:13.715781 136604529964864 pyconfig.py:471] Config param sa_block_q: 512
I0421 06:40:13.715795 136604529964864 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0421 06:40:13.715811 136604529964864 pyconfig.py:471] Config param sa_block_q_dq: 512
I0421 06:40:13.715827 136604529964864 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0421 06:40:13.715841 136604529964864 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0421 06:40:13.715856 136604529964864 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0421 06:40:13.715872 136604529964864 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0421 06:40:13.715887 136604529964864 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0421 06:40:13.715903 136604529964864 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0421 06:40:13.715918 136604529964864 pyconfig.py:471] Config param save_config_to_gcs: False
I0421 06:40:13.715934 136604529964864 pyconfig.py:471] Config param save_quantized_params_path: 
I0421 06:40:13.715949 136604529964864 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0421 06:40:13.715963 136604529964864 pyconfig.py:471] Config param scan_layers: True
I0421 06:40:13.715979 136604529964864 pyconfig.py:471] Config param scan_layers_per_stage: False
I0421 06:40:13.715995 136604529964864 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0421 06:40:13.716010 136604529964864 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0421 06:40:13.716024 136604529964864 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0421 06:40:13.716040 136604529964864 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0421 06:40:13.716056 136604529964864 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0421 06:40:13.716071 136604529964864 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0421 06:40:13.716086 136604529964864 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0421 06:40:13.716104 136604529964864 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0421 06:40:13.716119 136604529964864 pyconfig.py:471] Config param sharding_strategy: None
I0421 06:40:13.716135 136604529964864 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0421 06:40:13.716150 136604529964864 pyconfig.py:471] Config param shardy: True
I0421 06:40:13.716168 136604529964864 pyconfig.py:471] Config param share_kv_projections: False
I0421 06:40:13.716183 136604529964864 pyconfig.py:471] Config param shared_experts: 0
I0421 06:40:13.716199 136604529964864 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0421 06:40:13.716214 136604529964864 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0421 06:40:13.716229 136604529964864 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0421 06:40:13.716245 136604529964864 pyconfig.py:471] Config param skip_step_interval: 128
I0421 06:40:13.716261 136604529964864 pyconfig.py:471] Config param skip_step_on_spikes: False
I0421 06:40:13.716305 136604529964864 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0421 06:40:13.716321 136604529964864 pyconfig.py:471] Config param sliding_window_size: 0
I0421 06:40:13.716336 136604529964864 pyconfig.py:471] Config param solution_end_token: </answer>
I0421 06:40:13.716352 136604529964864 pyconfig.py:471] Config param solution_start_token: <answer>
I0421 06:40:13.716367 136604529964864 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0421 06:40:13.716382 136604529964864 pyconfig.py:471] Config param sparse_matmul: True
I0421 06:40:13.716398 136604529964864 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0421 06:40:13.716413 136604529964864 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0421 06:40:13.716428 136604529964864 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0421 06:40:13.716444 136604529964864 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0421 06:40:13.716460 136604529964864 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0421 06:40:13.716476 136604529964864 pyconfig.py:471] Config param steps: 200000
I0421 06:40:13.716491 136604529964864 pyconfig.py:471] Config param stop_strings: None
I0421 06:40:13.716507 136604529964864 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0421 06:40:13.716524 136604529964864 pyconfig.py:471] Config param student_params_to_update: None
I0421 06:40:13.716538 136604529964864 pyconfig.py:471] Config param subslice_shape: 
I0421 06:40:13.716553 136604529964864 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0421 06:40:13.716568 136604529964864 pyconfig.py:471] Config param system_prompt: 
I0421 06:40:13.716583 136604529964864 pyconfig.py:471] Config param target_eval_loss: 0.0
I0421 06:40:13.716599 136604529964864 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0421 06:40:13.716614 136604529964864 pyconfig.py:471] Config param temperature_tuning: False
I0421 06:40:13.716629 136604529964864 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0421 06:40:13.716644 136604529964864 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-40/tensorboard/
I0421 06:40:13.716659 136604529964864 pyconfig.py:471] Config param tensors_on_device: None
I0421 06:40:13.716675 136604529964864 pyconfig.py:471] Config param tensors_to_offload: None
I0421 06:40:13.716690 136604529964864 pyconfig.py:471] Config param test_batch_start_index: 0
I0421 06:40:13.716714 136604529964864 pyconfig.py:471] Config param tile_size_for_vit: 336
I0421 06:40:13.716729 136604529964864 pyconfig.py:471] Config param tokenize_eval_data: True
I0421 06:40:13.716743 136604529964864 pyconfig.py:471] Config param tokenize_train_data: True
I0421 06:40:13.716758 136604529964864 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0421 06:40:13.716773 136604529964864 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0421 06:40:13.716791 136604529964864 pyconfig.py:471] Config param topk_routing_group: -1
I0421 06:40:13.716807 136604529964864 pyconfig.py:471] Config param train_data_columns: ['text']
I0421 06:40:13.716823 136604529964864 pyconfig.py:471] Config param train_fraction: 1.0
I0421 06:40:13.716837 136604529964864 pyconfig.py:471] Config param train_image_column: image
I0421 06:40:13.716853 136604529964864 pyconfig.py:471] Config param train_micro_batch_size: -1
I0421 06:40:13.716868 136604529964864 pyconfig.py:471] Config param train_split: train
I0421 06:40:13.716884 136604529964864 pyconfig.py:471] Config param trainable_parameters_mask: []
I0421 06:40:13.716899 136604529964864 pyconfig.py:471] Config param trainable_position_size: 2048
I0421 06:40:13.716914 136604529964864 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0421 06:40:13.716929 136604529964864 pyconfig.py:471] Config param upload_all_profiler_results: False
I0421 06:40:13.716944 136604529964864 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0421 06:40:13.716959 136604529964864 pyconfig.py:471] Config param use_agentic_rollout: False
I0421 06:40:13.716975 136604529964864 pyconfig.py:471] Config param use_audio: False
I0421 06:40:13.716990 136604529964864 pyconfig.py:471] Config param use_audio_in_video: False
I0421 06:40:13.717005 136604529964864 pyconfig.py:471] Config param use_batch_split_schedule: False
I0421 06:40:13.717021 136604529964864 pyconfig.py:471] Config param use_chat_template: False
I0421 06:40:13.717037 136604529964864 pyconfig.py:471] Config param use_chunked_prefill: False
I0421 06:40:13.717052 136604529964864 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0421 06:40:13.717068 136604529964864 pyconfig.py:471] Config param use_dpo: False
I0421 06:40:13.717083 136604529964864 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0421 06:40:13.717098 136604529964864 pyconfig.py:471] Config param use_grpo: True
I0421 06:40:13.717114 136604529964864 pyconfig.py:471] Config param use_indexer: False
I0421 06:40:13.717129 136604529964864 pyconfig.py:471] Config param use_iota_embed: True
I0421 06:40:13.717144 136604529964864 pyconfig.py:471] Config param use_jax_splash: False
I0421 06:40:13.717162 136604529964864 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0421 06:40:13.717178 136604529964864 pyconfig.py:471] Config param use_mrope: False
I0421 06:40:13.717192 136604529964864 pyconfig.py:471] Config param use_multimodal: False
I0421 06:40:13.717206 136604529964864 pyconfig.py:471] Config param use_pathways: True
I0421 06:40:13.717222 136604529964864 pyconfig.py:471] Config param use_post_attn_norm: False
I0421 06:40:13.717237 136604529964864 pyconfig.py:471] Config param use_post_ffw_norm: False
I0421 06:40:13.717253 136604529964864 pyconfig.py:471] Config param use_qk_clip: False
I0421 06:40:13.717267 136604529964864 pyconfig.py:471] Config param use_qk_norm: False
I0421 06:40:13.717283 136604529964864 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0421 06:40:13.717297 136604529964864 pyconfig.py:471] Config param use_qwix_quantization: False
I0421 06:40:13.717313 136604529964864 pyconfig.py:471] Config param use_ragged_attention: False
I0421 06:40:13.717328 136604529964864 pyconfig.py:471] Config param use_random_routing: False
I0421 06:40:13.717343 136604529964864 pyconfig.py:471] Config param use_replicator_service: False
I0421 06:40:13.717358 136604529964864 pyconfig.py:471] Config param use_ring_of_experts: False
I0421 06:40:13.717372 136604529964864 pyconfig.py:471] Config param use_sft: False
I0421 06:40:13.717388 136604529964864 pyconfig.py:471] Config param use_splash_scheduler: False
I0421 06:40:13.717403 136604529964864 pyconfig.py:471] Config param use_tokamax_gmm: False
I0421 06:40:13.717418 136604529964864 pyconfig.py:471] Config param use_tokamax_splash: False
I0421 06:40:13.717434 136604529964864 pyconfig.py:471] Config param use_truncation: True
I0421 06:40:13.717449 136604529964864 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0421 06:40:13.717466 136604529964864 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0421 06:40:13.717480 136604529964864 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0421 06:40:13.717496 136604529964864 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0421 06:40:13.717511 136604529964864 pyconfig.py:471] Config param v_head_dim: 128
I0421 06:40:13.717526 136604529964864 pyconfig.py:471] Config param v_norm_with_scale: True
I0421 06:40:13.717540 136604529964864 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0421 06:40:13.717556 136604529964864 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0421 06:40:13.717572 136604529964864 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0421 06:40:13.717586 136604529964864 pyconfig.py:471] Config param video_path: 
I0421 06:40:13.717602 136604529964864 pyconfig.py:471] Config param video_placeholder: <|video|>
I0421 06:40:13.717616 136604529964864 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0421 06:40:13.717632 136604529964864 pyconfig.py:471] Config param vision_output_length: -1
I0421 06:40:13.717646 136604529964864 pyconfig.py:471] Config param vllm_additional_config: {}
I0421 06:40:13.717662 136604529964864 pyconfig.py:471] Config param vllm_hf_config_path: 
I0421 06:40:13.717676 136604529964864 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0421 06:40:13.717692 136604529964864 pyconfig.py:471] Config param vocab_size: 32000
I0421 06:40:13.717715 136604529964864 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0421 06:40:13.717731 136604529964864 pyconfig.py:471] Config param weight_dtype: float32
I0421 06:40:13.717756 136604529964864 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0421 06:40:13.717772 136604529964864 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0421 06:40:13.717788 136604529964864 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0421 06:40:13.717803 136604529964864 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0421 06:40:13.717819 136604529964864 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0421 06:40:13.717835 136604529964864 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0421 06:40:13.717850 136604529964864 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0421 06:40:13.717866 136604529964864 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0421 06:40:13.717881 136604529964864 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0421 06:40:13.717896 136604529964864 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0421 06:40:13.717910 136604529964864 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0421 06:40:13.717926 136604529964864 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0421 06:40:13.717941 136604529964864 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0421 06:40:13.717956 136604529964864 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0421 06:40:13.717970 136604529964864 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0421 06:40:13.717986 136604529964864 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0421 06:40:13.717999 136604529964864 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0421 06:40:13.718015 136604529964864 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0421 06:40:13.718031 136604529964864 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0421 06:40:13.718045 136604529964864 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0421 06:40:13.718061 136604529964864 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0421 06:40:13.718079 136604529964864 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0421 06:40:13.718094 136604529964864 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0421 06:40:13.718110 136604529964864 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0421 06:40:13.718124 136604529964864 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0421 06:40:13.718142 136604529964864 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0421 06:40:13.718454 136604529964864 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0421 06:40:13.718491 136604529964864 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0421 06:40:17.406306 136604529964864 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0421 06:40:17.409444 136604529964864 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0421 06:40:17.409570 136604529964864 train_distill.py:596] Applying logical axis rules for model initialization and training...
I0421 06:40:17.409643 136604529964864 train_distill.py:600] Loading Student from ...
I0421 06:40:17.409673 136604529964864 train_distill.py:169] --- Student Configuration ---
I0421 06:40:17.409708 136604529964864 train_distill.py:170]   Model Name:      gpt3-52k
I0421 06:40:17.409735 136604529964864 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 06:40:17.409753 136604529964864 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 06:40:17.409774 136604529964864 train_distill.py:175]   Vocab Size:      32000
I0421 06:40:17.409791 136604529964864 train_distill.py:176]   Checkpoint:      
I0421 06:40:17.409810 136604529964864 train_distill.py:465] Initializing model: gpt3-52k...
I0421 06:40:18.676213 136604529964864 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0421 06:40:18.676325 136604529964864 train_distill.py:169] --- Teacher Configuration ---
I0421 06:40:18.676355 136604529964864 train_distill.py:170]   Model Name:      gpt3-52k
I0421 06:40:18.676378 136604529964864 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 06:40:18.676399 136604529964864 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 06:40:18.676419 136604529964864 train_distill.py:175]   Vocab Size:      32000
I0421 06:40:18.676437 136604529964864 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 06:40:18.676456 136604529964864 train_distill.py:465] Initializing model: gpt3-52k...
I0421 06:40:19.810647 136604529964864 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 06:40:19.811110 136604529964864 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c3d00d55d60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 06:40:19.811174 136604529964864 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0421 06:40:20.361922 136604529964864 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0421 06:40:21.331003    2144 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0421 06:40:22.656516 136604529964864 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0421 06:40:24.711857 136604529964864 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0421 06:40:24.712222 136604529964864 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0421 06:40:26.095342 136604529964864 checkpointer.py:318] Finished restoring checkpoint in 3.82 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0421 06:40:26.800579 136604529964864 train_distill.py:640] Initializing Data Iterators via MaxText pipeline...
I0421 06:40:26.864213 136604529964864 config.py:112] TensorFlow version 2.20.0 available.
I0421 06:40:26.864803 136604529964864 config.py:125] JAX version 0.8.3 available.
E0421 06:40:29.008310 136604529964864 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0421 06:40:29.008543 136604529964864 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0421 06:40:29.011574 136604529964864 train_distill.py:410] Input Pipeline Checkpointing: DISABLED
I0421 06:40:29.011639 136604529964864 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0421 06:40:29.011715 136604529964864 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 06:40:29.011795 136604529964864 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c3d00d55d60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 06:40:29.011837 136604529964864 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 06:40:29.011868 136604529964864 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c3d00d55d60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 06:40:29.011912 136604529964864 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26702cd1c0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ef590>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ef500>}, handler_registry=None
I0421 06:40:29.012105 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26702cd1c0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 06:40:29.012146 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ef590>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 06:40:29.012172 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ef500>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 06:40:29.012195 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26702e1790>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 06:40:29.012222 136604529964864 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26702cd1c0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26702cd1c0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ef590>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ef590>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ef500>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ef500>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26702e1790>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26702e1790>}).
I0421 06:40:29.012649 136604529964864 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c26168b45e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 06:40:31.314228 136604529964864 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_nnx_xpk_main_20260421_061409_07_distill_smoke/checkpoints
I0421 06:40:31.744680 136604529964864 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_nnx_xpk_main_20260421_061409_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c26169ef4d0>
I0421 06:40:31.744882 136604529964864 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 06:40:31.744950 136604529964864 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c3d00d55d60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 06:40:31.744986 136604529964864 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 06:40:31.745019 136604529964864 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c3d00d55d60>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 06:40:31.745065 136604529964864 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 06:40:31.745131 136604529964864 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136604529964864 count=1 at 0x7c26702f2940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c26169ef350>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c26169ef320>, _write_futures=[])
I0421 06:40:31.745593 136604529964864 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136604529964864 count=1 at 0x7c26702f2940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c26169ef350>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c26169ef320>, _write_futures=[])
I0421 06:40:31.745626 136604529964864 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136604529964864 count=1 at 0x7c26702f2940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c26169ef350>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c26169ef320>, _write_futures=[])
I0421 06:40:31.745657 136604529964864 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169eedb0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ee780>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ee810>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c26169ed730>}, handler_registry=None
I0421 06:40:31.745789 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169eedb0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 06:40:31.745830 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ee780>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 06:40:31.745855 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ee810>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 06:40:31.745884 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c26169ed730>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0421 06:40:31.745908 136604529964864 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ecfe0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 06:40:31.745933 136604529964864 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169eedb0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169eedb0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ee780>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c26169ee780>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ee810>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ee810>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c26169ed730>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c26169ed730>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ecfe0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c26169ecfe0>}).
I0421 06:40:31.746003 136604529964864 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c26168b4860> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 06:40:32.148411 136604529964864 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_nnx_xpk_main_20260421_061409_07_distill_smoke/checkpoints
I0421 06:40:32.160535 136604529964864 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_nnx_xpk_main_20260421_061409_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c2671047b60>
I0421 06:40:32.161050 136604529964864 train_distill.py:691] Starting Distillation Training...
I0421 06:40:32.161156 136604529964864 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0421 06:40:33.257256 136604529964864 peft_trainer.py:600] Compiled train_step cache size: 0

Training:   0%|          | 0/5 [00:00<?, ?step/s]
I0421 06:40:33.259014 136460204226304 grain_pool.py:367] Grain pool will use 1 processes.
I0421 06:40:33.285445 136460204226304 grain_pool.py:440] Grain pool will start child processes.
I0421 06:40:33.290567 136460204226304 grain_pool.py:448] Grain pool started all child processes.
2026-04-21 06:40:39.314926: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
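[Annotation] The four validation warnings above point at the model's `rope_scaling` config: an unrecognized `rope_theta` key under `rope_type='yarn'`, and integer values where floats are required. A hedged sketch of a corrected config follows; the key names are taken from the warnings, while the concrete values (40.0, 32.0, 1.0) merely echo the "got ..." messages and are assumptions about what the config intended.

```python
# Hypothetical corrected rope_scaling dict implied by the warnings above.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 40.0,     # must be a float >= 1 (the log saw the int 40)
    "beta_fast": 32.0,  # must be a float (the log saw the int 32)
    "beta_slow": 1.0,   # must be a float (the log saw the int 1)
    # "rope_theta" dropped: unrecognized under rope_type='yarn'; the base
    # theta typically lives at the top level of the model config instead.
}
print(type(rope_scaling["factor"]).__name__)  # → float
```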
I0421 06:40:42.518233 136604529964864 utils.py:86] Train loop finished in: 9.2603 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0)}
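[Annotation] The traceback above shows `tunix`'s `shard_input` calling `jax.make_array_from_process_local_data`, which internally uses `device_put`. That call expects plain host-local data (e.g. a NumPy batch) from each process; the "not fully addressable" error indicates it was instead handed a `jax.Array` already spread over devices on other hosts. A hedged single-process sketch of the correct usage, runnable on CPU (where local data equals global data), is below; the variable names are illustrative, not from the source.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Single-process sketch of the call that failed above. Each process should
# pass its own host-local NumPy batch; JAX then assembles the global array.
mesh = Mesh(np.array(jax.devices()[:1]), ("data",))
sharding = NamedSharding(mesh, PartitionSpec("data"))
local_batch = np.arange(8.0).reshape(8, 1)  # host-local numpy input: OK
global_arr = jax.make_array_from_process_local_data(sharding, local_batch)
print(global_arr.shape)  # with one process, global shape == local shape
```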
I0421 06:40:42.863907 136460204226304 grain_pool.py:542] Grain pool is exiting.
I0421 06:40:42.864007 136460204226304 grain_pool.py:547] Shutting down multiprocessing system.
I0421 06:40:44.326070 136460204226304 grain_pool.py:547] Shutting down multiprocessing system.

Training:   0%|          | 0/5 [00:12<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Tue Apr 21 06:40:55 UTC 2026
EXIT_CODE=1