Log Summary

XPK Start: Wed Apr 22 21:37:33 UTC 2026
2026-04-22 21:37:50.275077: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
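The validation messages above spell out the constraints a `rope_scaling` config must satisfy for `rope_type='yarn'`: `factor` must be a float >= 1, `beta_fast` and `beta_slow` must be floats, and `rope_theta` is not a recognized key. A minimal sketch of a dict that would pass those checks, using the same example values the log flags (the corrected values and the exact schema are assumptions inferred from the messages, not taken from the run's actual config):

```python
# Hypothetical `rope_scaling` dict satisfying the checks logged above.
# The log flagged the ints 40, 32, 1 where floats are required, and the
# extra key "rope_theta" as unrecognized for rope_type="yarn".
rope_scaling = {
    "rope_type": "yarn",
    "factor": 40.0,     # must be a float >= 1 (log flagged the int 40)
    "beta_fast": 32.0,  # must be a float (log flagged the int 32)
    "beta_slow": 1.0,   # must be a float (log flagged the int 1)
    # "rope_theta" omitted: unrecognized for rope_type="yarn"
}

# The same checks the loader appears to perform:
assert isinstance(rope_scaling["factor"], float) and rope_scaling["factor"] >= 1
assert isinstance(rope_scaling["beta_fast"], float)
assert isinstance(rope_scaling["beta_slow"], float)
assert "rope_theta" not in rope_scaling
```

Since these are warnings rather than errors, the run proceeds anyway; cleaning up the config would only silence them.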
I0422 21:37:54.420316 140036574910272 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-22 21:38:03,459:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0422 21:38:03.459161 140036574910272 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-22 21:38:03,461:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-t1n1b-slice-job-0-0.mt-07-distill-smoke-t1n1b:8482
I0422 21:38:03.461479 140036574910272 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-t1n1b-slice-job-0-0.mt-07-distill-smoke-t1n1b:8482
I0422 21:38:04.812991 140036574910272 max_utils.py:284] Jax distributed system initialized!
I0422 21:38:11.429179 140036574910272 max_utils.py:244] Jax distributed system is already initialized.
W0422 21:38:11.560319 140036574910272 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0422 21:38:11.620766 140036574910272 max_utils.py:244] Jax distributed system is already initialized.
I0422 21:38:11.622009 140036574910272 pyconfig.py:471] Config param abort_on_inf_loss: True
I0422 21:38:11.622057 140036574910272 pyconfig.py:471] Config param abort_on_nan_loss: True
I0422 21:38:11.622082 140036574910272 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0422 21:38:11.622116 140036574910272 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0422 21:38:11.622135 140036574910272 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0422 21:38:11.622154 140036574910272 pyconfig.py:471] Config param activations_in_float32: False
I0422 21:38:11.622171 140036574910272 pyconfig.py:471] Config param adam_b1: 0.9
I0422 21:38:11.622190 140036574910272 pyconfig.py:471] Config param adam_b2: 0.95
I0422 21:38:11.622207 140036574910272 pyconfig.py:471] Config param adam_eps: 1e-08
I0422 21:38:11.622231 140036574910272 pyconfig.py:471] Config param adam_eps_root: 0.0
I0422 21:38:11.622247 140036574910272 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0422 21:38:11.622263 140036574910272 pyconfig.py:471] Config param adamw_mask: []
I0422 21:38:11.622280 140036574910272 pyconfig.py:471] Config param add_bos: True
I0422 21:38:11.622297 140036574910272 pyconfig.py:471] Config param add_eos: True
I0422 21:38:11.622315 140036574910272 pyconfig.py:471] Config param allow_split_physical_axes: False
I0422 21:38:11.622332 140036574910272 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0422 21:38:11.622348 140036574910272 pyconfig.py:471] Config param async_checkpointing: True
I0422 21:38:11.622364 140036574910272 pyconfig.py:471] Config param async_scheduling: False
I0422 21:38:11.622381 140036574910272 pyconfig.py:471] Config param attention: dot_product
I0422 21:38:11.622397 140036574910272 pyconfig.py:471] Config param attention_bias: False
I0422 21:38:11.622415 140036574910272 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0422 21:38:11.622441 140036574910272 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0422 21:38:11.622462 140036574910272 pyconfig.py:471] Config param attention_output_dim: -1
I0422 21:38:11.622524 140036574910272 pyconfig.py:471] Config param attention_sink: False
I0422 21:38:11.622555 140036574910272 pyconfig.py:471] Config param attention_type: global
I0422 21:38:11.622571 140036574910272 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0422 21:38:11.622594 140036574910272 pyconfig.py:471] Config param audio_path: 
I0422 21:38:11.622609 140036574910272 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0422 21:38:11.622625 140036574910272 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0422 21:38:11.622647 140036574910272 pyconfig.py:471] Config param base_config: base.yml
I0422 21:38:11.622664 140036574910272 pyconfig.py:471] Config param base_emb_dim: 16
I0422 21:38:11.622687 140036574910272 pyconfig.py:471] Config param base_mlp_dim: 64
I0422 21:38:11.622704 140036574910272 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0422 21:38:11.622720 140036574910272 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0422 21:38:11.622736 140036574910272 pyconfig.py:471] Config param base_num_kv_heads: 2
I0422 21:38:11.622751 140036574910272 pyconfig.py:471] Config param base_num_query_heads: 2
I0422 21:38:11.622767 140036574910272 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0422 21:38:11.622783 140036574910272 pyconfig.py:471] Config param batch_size: 1
I0422 21:38:11.622799 140036574910272 pyconfig.py:471] Config param batch_split_factor: 1
I0422 21:38:11.622816 140036574910272 pyconfig.py:471] Config param beta_fast: 32
I0422 21:38:11.622832 140036574910272 pyconfig.py:471] Config param beta_slow: 1
I0422 21:38:11.622848 140036574910272 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0422 21:38:11.622865 140036574910272 pyconfig.py:471] Config param capacity_factor: -1.0
I0422 21:38:11.622882 140036574910272 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0422 21:38:11.622899 140036574910272 pyconfig.py:471] Config param chat_template: 
I0422 21:38:11.622915 140036574910272 pyconfig.py:471] Config param chat_template_path: 
I0422 21:38:11.622933 140036574910272 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0422 21:38:11.622949 140036574910272 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/checkpoints/
I0422 21:38:11.622966 140036574910272 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0422 21:38:11.622982 140036574910272 pyconfig.py:471] Config param checkpoint_period: 2000
I0422 21:38:11.622998 140036574910272 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0422 21:38:11.623015 140036574910272 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0422 21:38:11.623030 140036574910272 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0422 21:38:11.623045 140036574910272 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0422 21:38:11.623061 140036574910272 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0422 21:38:11.623077 140036574910272 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0422 21:38:11.623091 140036574910272 pyconfig.py:471] Config param chips_per_vm: 4
I0422 21:38:11.623115 140036574910272 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0422 21:38:11.623129 140036574910272 pyconfig.py:471] Config param collect_stack_trace: False
I0422 21:38:11.623149 140036574910272 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0422 21:38:11.623164 140036574910272 pyconfig.py:471] Config param colocated_python_data_input: False
I0422 21:38:11.623178 140036574910272 pyconfig.py:471] Config param compile_topology: 
I0422 21:38:11.623192 140036574910272 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0422 21:38:11.623208 140036574910272 pyconfig.py:471] Config param compile_xla_flags: 
I0422 21:38:11.623223 140036574910272 pyconfig.py:471] Config param compiled_trainstep_file: 
I0422 21:38:11.623237 140036574910272 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0422 21:38:11.623262 140036574910272 pyconfig.py:471] Config param constant_bound_config: []
I0422 21:38:11.623277 140036574910272 pyconfig.py:471] Config param context: RematLocation.REMAT
I0422 21:38:11.623295 140036574910272 pyconfig.py:471] Config param context_parallel_load_balance: True
I0422 21:38:11.623309 140036574910272 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0422 21:38:11.623327 140036574910272 pyconfig.py:471] Config param context_parallel_size: 1
I0422 21:38:11.623345 140036574910272 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0422 21:38:11.623361 140036574910272 pyconfig.py:471] Config param context_sharding: context
I0422 21:38:11.623376 140036574910272 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0422 21:38:11.623399 140036574910272 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0422 21:38:11.623415 140036574910272 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0422 21:38:11.623430 140036574910272 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0422 21:38:11.623446 140036574910272 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0422 21:38:11.623462 140036574910272 pyconfig.py:471] Config param custom_mesh: 
I0422 21:38:11.623477 140036574910272 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0422 21:38:11.623491 140036574910272 pyconfig.py:471] Config param d_model_for_audio: 256
I0422 21:38:11.623513 140036574910272 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0422 21:38:11.623540 140036574910272 pyconfig.py:471] Config param data_shuffle_seed: 0
I0422 21:38:11.623564 140036574910272 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0422 21:38:11.623589 140036574910272 pyconfig.py:471] Config param dataset_path: 
I0422 21:38:11.623616 140036574910272 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0422 21:38:11.623636 140036574910272 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0422 21:38:11.623651 140036574910272 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0422 21:38:11.623667 140036574910272 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0422 21:38:11.623684 140036574910272 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0422 21:38:11.623700 140036574910272 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0422 21:38:11.623716 140036574910272 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0422 21:38:11.623730 140036574910272 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0422 21:38:11.623746 140036574910272 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0422 21:38:11.623761 140036574910272 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0422 21:38:11.623779 140036574910272 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0422 21:38:11.623793 140036574910272 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0422 21:38:11.623809 140036574910272 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0422 21:38:11.623826 140036574910272 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0422 21:38:11.623842 140036574910272 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0422 21:38:11.623858 140036574910272 pyconfig.py:471] Config param debug: {'rl': False}
I0422 21:38:11.623874 140036574910272 pyconfig.py:471] Config param debug_sharding: False
I0422 21:38:11.623891 140036574910272 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0422 21:38:11.623905 140036574910272 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0422 21:38:11.623923 140036574910272 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0422 21:38:11.623940 140036574910272 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0422 21:38:11.623956 140036574910272 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0422 21:38:11.623972 140036574910272 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0422 21:38:11.623989 140036574910272 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0422 21:38:11.624003 140036574910272 pyconfig.py:471] Config param degenerate_group_masking: True
I0422 21:38:11.624019 140036574910272 pyconfig.py:471] Config param dense_init_scale: 1.0
I0422 21:38:11.624036 140036574910272 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0422 21:38:11.624053 140036574910272 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0422 21:38:11.624077 140036574910272 pyconfig.py:471] Config param diloco_sync_period: 36
I0422 21:38:11.624125 140036574910272 pyconfig.py:471] Config param distill_alpha: 0.5
I0422 21:38:11.624144 140036574910272 pyconfig.py:471] Config param distill_alpha_end: None
I0422 21:38:11.624159 140036574910272 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0422 21:38:11.624191 140036574910272 pyconfig.py:471] Config param distill_beta: 0.0
I0422 21:38:11.624208 140036574910272 pyconfig.py:471] Config param distill_beta_end: None
I0422 21:38:11.624233 140036574910272 pyconfig.py:471] Config param distill_beta_schedule: constant
I0422 21:38:11.624250 140036574910272 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0422 21:38:11.624264 140036574910272 pyconfig.py:471] Config param distill_layer_indices: None
I0422 21:38:11.624289 140036574910272 pyconfig.py:471] Config param distill_temperature: 1.0
I0422 21:38:11.624305 140036574910272 pyconfig.py:471] Config param distill_temperature_end: None
I0422 21:38:11.624325 140036574910272 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0422 21:38:11.624340 140036574910272 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0422 21:38:11.624354 140036574910272 pyconfig.py:471] Config param dpo_beta: 0.1
I0422 21:38:11.624380 140036574910272 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0422 21:38:11.624396 140036574910272 pyconfig.py:471] Config param dq_reduction_steps: 0
I0422 21:38:11.624410 140036574910272 pyconfig.py:471] Config param dropout_rate: 0.0
I0422 21:38:11.624432 140036574910272 pyconfig.py:471] Config param dtype: bfloat16
I0422 21:38:11.624464 140036574910272 pyconfig.py:471] Config param dtype_mm: float32
I0422 21:38:11.624493 140036574910272 pyconfig.py:471] Config param dump_hlo: False
I0422 21:38:11.624514 140036574910272 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0422 21:38:11.624541 140036574910272 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/xla_dump
I0422 21:38:11.624557 140036574910272 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0422 21:38:11.624577 140036574910272 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0422 21:38:11.624593 140036574910272 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0422 21:38:11.624608 140036574910272 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0422 21:38:11.624629 140036574910272 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0422 21:38:11.624645 140036574910272 pyconfig.py:471] Config param dump_jaxpr: False
I0422 21:38:11.624660 140036574910272 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0422 21:38:11.624680 140036574910272 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/jaxpr_dump
I0422 21:38:11.624695 140036574910272 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0422 21:38:11.624710 140036574910272 pyconfig.py:471] Config param dump_step: -1
I0422 21:38:11.624731 140036574910272 pyconfig.py:471] Config param elastic_enabled: False
I0422 21:38:11.624745 140036574910272 pyconfig.py:471] Config param elastic_max_retries: 10
I0422 21:38:11.624760 140036574910272 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0422 21:38:11.624781 140036574910272 pyconfig.py:471] Config param emb_dim: 16
I0422 21:38:11.624795 140036574910272 pyconfig.py:471] Config param enable_autocheckpoint: False
I0422 21:38:11.624811 140036574910272 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0422 21:38:11.624825 140036574910272 pyconfig.py:471] Config param enable_checkpointing: True
I0422 21:38:11.624841 140036574910272 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0422 21:38:11.624855 140036574910272 pyconfig.py:471] Config param enable_data_shuffling: True
I0422 21:38:11.624871 140036574910272 pyconfig.py:471] Config param enable_diloco: False
I0422 21:38:11.624886 140036574910272 pyconfig.py:471] Config param enable_dp_attention: False
I0422 21:38:11.624901 140036574910272 pyconfig.py:471] Config param enable_dropout: False
I0422 21:38:11.624916 140036574910272 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0422 21:38:11.624931 140036574910272 pyconfig.py:471] Config param enable_expert_parallel: False
I0422 21:38:11.624946 140036574910272 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0422 21:38:11.624961 140036574910272 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0422 21:38:11.624975 140036574910272 pyconfig.py:471] Config param enable_goodput_recording: False
I0422 21:38:11.624991 140036574910272 pyconfig.py:471] Config param enable_jax_profiler: False
I0422 21:38:11.625005 140036574910272 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0422 21:38:11.625021 140036574910272 pyconfig.py:471] Config param enable_model_warmup: False
I0422 21:38:11.625035 140036574910272 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0422 21:38:11.625055 140036574910272 pyconfig.py:471] Config param enable_nnx: False
I0422 21:38:11.625070 140036574910272 pyconfig.py:471] Config param enable_orbax_v1: False
I0422 21:38:11.625084 140036574910272 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0422 21:38:11.625113 140036574910272 pyconfig.py:471] Config param enable_pathways_goodput: False
I0422 21:38:11.625128 140036574910272 pyconfig.py:471] Config param enable_prefix_caching: False
I0422 21:38:11.625143 140036574910272 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0422 21:38:11.625164 140036574910272 pyconfig.py:471] Config param enable_single_controller: False
I0422 21:38:11.625180 140036574910272 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0422 21:38:11.625195 140036574910272 pyconfig.py:471] Config param enable_tensorboard: True
I0422 21:38:11.625223 140036574910272 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0422 21:38:11.625238 140036574910272 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0422 21:38:11.625255 140036574910272 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0422 21:38:11.625269 140036574910272 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0422 21:38:11.625284 140036574910272 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0422 21:38:11.625300 140036574910272 pyconfig.py:471] Config param engram_head_dim: 1280
I0422 21:38:11.625315 140036574910272 pyconfig.py:471] Config param engram_kernel_size: 4
I0422 21:38:11.625329 140036574910272 pyconfig.py:471] Config param engram_layers: []
I0422 21:38:11.625345 140036574910272 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0422 21:38:11.625359 140036574910272 pyconfig.py:471] Config param engram_num_heads: 8
I0422 21:38:11.625375 140036574910272 pyconfig.py:471] Config param engram_seed: 0
I0422 21:38:11.625396 140036574910272 pyconfig.py:471] Config param engram_vocab_bases: []
I0422 21:38:11.625411 140036574910272 pyconfig.py:471] Config param epsilon_high: None
I0422 21:38:11.625425 140036574910272 pyconfig.py:471] Config param eval_corr_lst: False
I0422 21:38:11.625453 140036574910272 pyconfig.py:471] Config param eval_data_columns: ['text']
I0422 21:38:11.625470 140036574910272 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0422 21:38:11.625484 140036574910272 pyconfig.py:471] Config param eval_image_column: image
I0422 21:38:11.625510 140036574910272 pyconfig.py:471] Config param eval_interval: -1
I0422 21:38:11.625526 140036574910272 pyconfig.py:471] Config param eval_make_lst: False
I0422 21:38:11.625552 140036574910272 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0422 21:38:11.625568 140036574910272 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0422 21:38:11.625584 140036574910272 pyconfig.py:471] Config param eval_split: validation
I0422 21:38:11.625603 140036574910272 pyconfig.py:471] Config param eval_steps: -1
I0422 21:38:11.625619 140036574910272 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0422 21:38:11.625635 140036574910272 pyconfig.py:471] Config param final_logits_soft_cap: None
I0422 21:38:11.625661 140036574910272 pyconfig.py:471] Config param first_num_dense_layers: 0
I0422 21:38:11.625677 140036574910272 pyconfig.py:471] Config param float32_gate_logits: False
I0422 21:38:11.625704 140036574910272 pyconfig.py:471] Config param float32_logits: False
I0422 21:38:11.625721 140036574910272 pyconfig.py:471] Config param float32_qk_product: False
I0422 21:38:11.625737 140036574910272 pyconfig.py:471] Config param float32_weight_sum: True
I0422 21:38:11.625757 140036574910272 pyconfig.py:471] Config param force_q_layout: False
I0422 21:38:11.625771 140036574910272 pyconfig.py:471] Config param force_unroll: False
I0422 21:38:11.625787 140036574910272 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0422 21:38:11.625802 140036574910272 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0422 21:38:11.625818 140036574910272 pyconfig.py:471] Config param fused_mlp: False
I0422 21:38:11.625833 140036574910272 pyconfig.py:471] Config param fused_qkv: True
I0422 21:38:11.625848 140036574910272 pyconfig.py:471] Config param gcs_metrics: False
I0422 21:38:11.625862 140036574910272 pyconfig.py:471] Config param gdn_chunk_size: 64
I0422 21:38:11.625878 140036574910272 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0422 21:38:11.625893 140036574910272 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0422 21:38:11.625908 140036574910272 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0422 21:38:11.625924 140036574910272 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0422 21:38:11.625938 140036574910272 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0422 21:38:11.625954 140036574910272 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0422 21:38:11.625968 140036574910272 pyconfig.py:471] Config param generate_padding_batch_train: False
I0422 21:38:11.625984 140036574910272 pyconfig.py:471] Config param generate_slice: v5e-16
I0422 21:38:11.625999 140036574910272 pyconfig.py:471] Config param generation_configs: {}
I0422 21:38:11.626014 140036574910272 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0422 21:38:11.626029 140036574910272 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0422 21:38:11.626044 140036574910272 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0422 21:38:11.626060 140036574910272 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0422 21:38:11.626075 140036574910272 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0422 21:38:11.626091 140036574910272 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0422 21:38:11.626117 140036574910272 pyconfig.py:471] Config param global_head_dim: 0
I0422 21:38:11.626132 140036574910272 pyconfig.py:471] Config param global_num_kv_heads: 0
I0422 21:38:11.626148 140036574910272 pyconfig.py:471] Config param global_parameter_scale: 1
I0422 21:38:11.626163 140036574910272 pyconfig.py:471] Config param global_rampup_samples: 500
I0422 21:38:11.626178 140036574910272 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0422 21:38:11.626195 140036574910272 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0422 21:38:11.626212 140036574910272 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0422 21:38:11.626226 140036574910272 pyconfig.py:471] Config param grad_dtype: float32
I0422 21:38:11.626263 140036574910272 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0422 21:38:11.626278 140036574910272 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0422 21:38:11.626295 140036574910272 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0422 21:38:11.626310 140036574910272 pyconfig.py:471] Config param grain_eval_files: 
I0422 21:38:11.626326 140036574910272 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0422 21:38:11.626341 140036574910272 pyconfig.py:471] Config param grain_num_threads: 16
I0422 21:38:11.626357 140036574910272 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0422 21:38:11.626372 140036574910272 pyconfig.py:471] Config param grain_packing_type: first_fit
I0422 21:38:11.626386 140036574910272 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0422 21:38:11.626403 140036574910272 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0422 21:38:11.626417 140036574910272 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0422 21:38:11.626433 140036574910272 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0422 21:38:11.626449 140036574910272 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0422 21:38:11.626464 140036574910272 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0422 21:38:11.626479 140036574910272 pyconfig.py:471] Config param grain_train_files: 
I0422 21:38:11.626494 140036574910272 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0422 21:38:11.626514 140036574910272 pyconfig.py:471] Config param grain_worker_count: 1
I0422 21:38:11.626529 140036574910272 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0422 21:38:11.626544 140036574910272 pyconfig.py:471] Config param grpo_beta: 0.08
I0422 21:38:11.626561 140036574910272 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0422 21:38:11.626577 140036574910272 pyconfig.py:471] Config param hardware: tpu
I0422 21:38:11.626593 140036574910272 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0422 21:38:11.626608 140036574910272 pyconfig.py:471] Config param head_dim: 8
I0422 21:38:11.626624 140036574910272 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0422 21:38:11.626639 140036574910272 pyconfig.py:471] Config param hf_data_dir: None
I0422 21:38:11.626655 140036574910272 pyconfig.py:471] Config param hf_eval_files: None
I0422 21:38:11.626671 140036574910272 pyconfig.py:471] Config param hf_eval_split: None
I0422 21:38:11.626685 140036574910272 pyconfig.py:471] Config param hf_name: None
I0422 21:38:11.626701 140036574910272 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0422 21:38:11.626715 140036574910272 pyconfig.py:471] Config param hf_train_files: None
I0422 21:38:11.626731 140036574910272 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0422 21:38:11.626746 140036574910272 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0422 21:38:11.626761 140036574910272 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0422 21:38:11.626775 140036574910272 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0422 21:38:11.626792 140036574910272 pyconfig.py:471] Config param ici_context_parallelism: 1
I0422 21:38:11.626807 140036574910272 pyconfig.py:471] Config param ici_data_parallelism: 1
I0422 21:38:11.626822 140036574910272 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0422 21:38:11.626837 140036574910272 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0422 21:38:11.626853 140036574910272 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0422 21:38:11.626869 140036574910272 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0422 21:38:11.626883 140036574910272 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0422 21:38:11.626900 140036574910272 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0422 21:38:11.626921 140036574910272 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0422 21:38:11.626935 140036574910272 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0422 21:38:11.626962 140036574910272 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0422 21:38:11.626979 140036574910272 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0422 21:38:11.626994 140036574910272 pyconfig.py:471] Config param image_path: 
I0422 21:38:11.627021 140036574910272 pyconfig.py:471] Config param image_placeholder: <|image|>
I0422 21:38:11.627036 140036574910272 pyconfig.py:471] Config param image_size_for_vit: 896
I0422 21:38:11.627061 140036574910272 pyconfig.py:471] Config param indexer_head_dim: 128
I0422 21:38:11.627077 140036574910272 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0422 21:38:11.627102 140036574910272 pyconfig.py:471] Config param indexer_n_heads: 64
I0422 21:38:11.627129 140036574910272 pyconfig.py:471] Config param indexer_sparse_training: False
I0422 21:38:11.627145 140036574910272 pyconfig.py:471] Config param indexer_topk: 2048
I0422 21:38:11.627170 140036574910272 pyconfig.py:471] Config param inference_benchmark_test: False
I0422 21:38:11.627185 140036574910272 pyconfig.py:471] Config param inference_metadata_file: 
I0422 21:38:11.627201 140036574910272 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0422 21:38:11.627228 140036574910272 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0422 21:38:11.627244 140036574910272 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0422 21:38:11.627274 140036574910272 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0422 21:38:11.627290 140036574910272 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0422 21:38:11.627315 140036574910272 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0422 21:38:11.627332 140036574910272 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0422 21:38:11.627346 140036574910272 pyconfig.py:471] Config param init_weights_seed: 0
I0422 21:38:11.627367 140036574910272 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0422 21:38:11.627383 140036574910272 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0422 21:38:11.627399 140036574910272 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0422 21:38:11.627420 140036574910272 pyconfig.py:471] Config param internal_compile: False
I0422 21:38:11.627437 140036574910272 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0422 21:38:11.627451 140036574910272 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0422 21:38:11.627477 140036574910272 pyconfig.py:471] Config param jax_debug_log_modules: 
I0422 21:38:11.627492 140036574910272 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0422 21:38:11.627521 140036574910272 pyconfig.py:471] Config param jax_profiler_port: 9999
I0422 21:38:11.627537 140036574910272 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0422 21:38:11.627562 140036574910272 pyconfig.py:471] Config param kv_cache_buffer: 256
I0422 21:38:11.627578 140036574910272 pyconfig.py:471] Config param kv_lora_rank: 512
I0422 21:38:11.627593 140036574910272 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0422 21:38:11.627620 140036574910272 pyconfig.py:471] Config param kv_quant_dtype: int8
I0422 21:38:11.627635 140036574910272 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0422 21:38:11.627651 140036574910272 pyconfig.py:471] Config param learning_rate: 0.0002
I0422 21:38:11.627680 140036574910272 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0422 21:38:11.627695 140036574910272 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0422 21:38:11.627721 140036574910272 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0422 21:38:11.627736 140036574910272 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0422 21:38:11.627750 140036574910272 pyconfig.py:471] Config param load_from_prefill_dir: False
I0422 21:38:11.627778 140036574910272 pyconfig.py:471] Config param load_full_state_path: 
I0422 21:38:11.627792 140036574910272 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0422 21:38:11.627809 140036574910272 pyconfig.py:471] Config param local_checkpoint_directory: 
I0422 21:38:11.627823 140036574910272 pyconfig.py:471] Config param local_checkpoint_period: 0
I0422 21:38:11.627847 140036574910272 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0422 21:38:11.627864 140036574910272 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0422 21:38:11.627878 140036574910272 pyconfig.py:471] Config param log_config: True
I0422 21:38:11.627900 140036574910272 pyconfig.py:471] Config param log_period: 10
I0422 21:38:11.627914 140036574910272 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0422 21:38:11.628005 140036574910272 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0422 21:38:11.628023 140036574910272 pyconfig.py:471] Config param logits_via_embedding: True
I0422 21:38:11.628045 140036574910272 pyconfig.py:471] Config param lora_input_adapters_path: 
I0422 21:38:11.628061 140036574910272 pyconfig.py:471] Config param loss_algo: grpo
I0422 21:38:11.628077 140036574910272 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0422 21:38:11.628118 140036574910272 pyconfig.py:471] Config param managed_mldiagnostics: False
I0422 21:38:11.628133 140036574910272 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/managed-mldiagnostics
I0422 21:38:11.628154 140036574910272 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0422 21:38:11.628169 140036574910272 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0422 21:38:11.628198 140036574910272 pyconfig.py:471] Config param max_checkify: False
I0422 21:38:11.628214 140036574910272 pyconfig.py:471] Config param max_concurrency: 256
I0422 21:38:11.628228 140036574910272 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0422 21:38:11.628254 140036574910272 pyconfig.py:471] Config param max_num_batched_tokens: None
I0422 21:38:11.628268 140036574910272 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0422 21:38:11.628294 140036574910272 pyconfig.py:471] Config param max_num_images_per_example: -1
I0422 21:38:11.628310 140036574910272 pyconfig.py:471] Config param max_num_seqs: None
I0422 21:38:11.628324 140036574910272 pyconfig.py:471] Config param max_position_embeddings: 163840
I0422 21:38:11.628351 140036574910272 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0422 21:38:11.628367 140036574910272 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0422 21:38:11.628382 140036574910272 pyconfig.py:471] Config param max_segments_per_seq: -1
I0422 21:38:11.628403 140036574910272 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0422 21:38:11.628420 140036574910272 pyconfig.py:471] Config param max_target_length: 2048
I0422 21:38:11.628436 140036574910272 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0422 21:38:11.628452 140036574910272 pyconfig.py:471] Config param megablox: True
I0422 21:38:11.628467 140036574910272 pyconfig.py:471] Config param merge_gating_gmm: False
I0422 21:38:11.628483 140036574910272 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0422 21:38:11.628500 140036574910272 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/metrics/
I0422 21:38:11.628519 140036574910272 pyconfig.py:471] Config param metrics_file: 
I0422 21:38:11.628535 140036574910272 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0422 21:38:11.628551 140036574910272 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0422 21:38:11.628567 140036574910272 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0422 21:38:11.628582 140036574910272 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0422 21:38:11.628598 140036574910272 pyconfig.py:471] Config param mla_naive_kvcache: True
I0422 21:38:11.628613 140036574910272 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0422 21:38:11.628629 140036574910272 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0422 21:38:11.628644 140036574910272 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0422 21:38:11.628660 140036574910272 pyconfig.py:471] Config param mlp_bias: False
I0422 21:38:11.628675 140036574910272 pyconfig.py:471] Config param mlp_dim: 64
I0422 21:38:11.628691 140036574910272 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0422 21:38:11.628706 140036574910272 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0422 21:38:11.628721 140036574910272 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0422 21:38:11.628737 140036574910272 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0422 21:38:11.628752 140036574910272 pyconfig.py:471] Config param moba: False
I0422 21:38:11.628767 140036574910272 pyconfig.py:471] Config param moba_chunk_size: 1024
I0422 21:38:11.628782 140036574910272 pyconfig.py:471] Config param moba_topk: 8
I0422 21:38:11.628798 140036574910272 pyconfig.py:471] Config param model_call_mode: 
I0422 21:38:11.628814 140036574910272 pyconfig.py:471] Config param model_name: gpt3-52k
I0422 21:38:11.628828 140036574910272 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0422 21:38:11.628844 140036574910272 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0422 21:38:11.628860 140036574910272 pyconfig.py:471] Config param moe_mlp_dim: -1
I0422 21:38:11.628874 140036574910272 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0422 21:38:11.628890 140036574910272 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0422 21:38:11.628905 140036574910272 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0422 21:38:11.628921 140036574910272 pyconfig.py:471] Config param monitor_goodput: False
I0422 21:38:11.628936 140036574910272 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0422 21:38:11.628952 140036574910272 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0422 21:38:11.628967 140036574910272 pyconfig.py:471] Config param mscale: 1.0
I0422 21:38:11.628983 140036574910272 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0422 21:38:11.628997 140036574910272 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0422 21:38:11.629012 140036574910272 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0422 21:38:11.629029 140036574910272 pyconfig.py:471] Config param mtp_num_layers: 0
I0422 21:38:11.629043 140036574910272 pyconfig.py:471] Config param mu_dtype: float32
I0422 21:38:11.629068 140036574910272 pyconfig.py:471] Config param multi_sampling: False
I0422 21:38:11.629083 140036574910272 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0422 21:38:11.629118 140036574910272 pyconfig.py:471] Config param muon_beta: 0.95
I0422 21:38:11.629137 140036574910272 pyconfig.py:471] Config param muon_consistent_rms: None
I0422 21:38:11.629153 140036574910272 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0422 21:38:11.629169 140036574910272 pyconfig.py:471] Config param n_routing_groups: -1
I0422 21:38:11.629183 140036574910272 pyconfig.py:471] Config param n_window_for_audio: 50
I0422 21:38:11.629198 140036574910272 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0422 21:38:11.629213 140036574910272 pyconfig.py:471] Config param nope_layer_interval: -1
I0422 21:38:11.629229 140036574910272 pyconfig.py:471] Config param norm_topk_prob: False
I0422 21:38:11.629244 140036574910272 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0422 21:38:11.629262 140036574910272 pyconfig.py:471] Config param normalize_embedding_logits: False
I0422 21:38:11.629278 140036574910272 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0422 21:38:11.629293 140036574910272 pyconfig.py:471] Config param num_batches: 4
I0422 21:38:11.629309 140036574910272 pyconfig.py:471] Config param num_channels_for_vit: 3
I0422 21:38:11.629325 140036574910272 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0422 21:38:11.629339 140036574910272 pyconfig.py:471] Config param num_decoder_layers: 1
I0422 21:38:11.629369 140036574910272 pyconfig.py:471] Config param num_diloco_replicas: 1
I0422 21:38:11.629384 140036574910272 pyconfig.py:471] Config param num_epoch: 1
I0422 21:38:11.629409 140036574910272 pyconfig.py:471] Config param num_eval_passes: 1
I0422 21:38:11.629425 140036574910272 pyconfig.py:471] Config param num_experts: 1
I0422 21:38:11.629441 140036574910272 pyconfig.py:471] Config param num_experts_per_tok: 1
I0422 21:38:11.629462 140036574910272 pyconfig.py:471] Config param num_generations: 2
I0422 21:38:11.629478 140036574910272 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0422 21:38:11.629503 140036574910272 pyconfig.py:471] Config param num_iterations: 1
I0422 21:38:11.629523 140036574910272 pyconfig.py:471] Config param num_kv_heads: 2
I0422 21:38:11.629538 140036574910272 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0422 21:38:11.629564 140036574910272 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0422 21:38:11.629578 140036574910272 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0422 21:38:11.629601 140036574910272 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0422 21:38:11.629616 140036574910272 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0422 21:38:11.629632 140036574910272 pyconfig.py:471] Config param num_query_heads: 2
I0422 21:38:11.629658 140036574910272 pyconfig.py:471] Config param num_samplers_slices: -1
I0422 21:38:11.629674 140036574910272 pyconfig.py:471] Config param num_slices: 1
I0422 21:38:11.629688 140036574910272 pyconfig.py:471] Config param num_target_devices: 32
I0422 21:38:11.629714 140036574910272 pyconfig.py:471] Config param num_test_batches: 5
I0422 21:38:11.629728 140036574910272 pyconfig.py:471] Config param num_trainer_slices: -1
I0422 21:38:11.629752 140036574910272 pyconfig.py:471] Config param num_vocab_tiling: 1
I0422 21:38:11.629768 140036574910272 pyconfig.py:471] Config param off_policy_steps: 0
I0422 21:38:11.629782 140036574910272 pyconfig.py:471] Config param offline_data_dir: None
I0422 21:38:11.629809 140036574910272 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0422 21:38:11.629827 140036574910272 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0422 21:38:11.629852 140036574910272 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0422 21:38:11.629868 140036574910272 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0422 21:38:11.629883 140036574910272 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0422 21:38:11.629903 140036574910272 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0422 21:38:11.629920 140036574910272 pyconfig.py:471] Config param output_dim_for_audio: 512
I0422 21:38:11.629934 140036574910272 pyconfig.py:471] Config param override_logical_axis_rules: False
I0422 21:38:11.629955 140036574910272 pyconfig.py:471] Config param override_model_config: True
I0422 21:38:11.629971 140036574910272 pyconfig.py:471] Config param packing: True
I0422 21:38:11.629985 140036574910272 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0422 21:38:11.630010 140036574910272 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0422 21:38:11.630027 140036574910272 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0422 21:38:11.630041 140036574910272 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0422 21:38:11.630068 140036574910272 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0422 21:38:11.630084 140036574910272 pyconfig.py:471] Config param param_scan_axis: 1
I0422 21:38:11.630122 140036574910272 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0422 21:38:11.630139 140036574910272 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0422 21:38:11.630159 140036574910272 pyconfig.py:471] Config param patch_size_for_vit: 14
I0422 21:38:11.630175 140036574910272 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0422 21:38:11.630191 140036574910272 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0422 21:38:11.630217 140036574910272 pyconfig.py:471] Config param per_device_batch_size: 2
I0422 21:38:11.630232 140036574910272 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0422 21:38:11.630259 140036574910272 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0422 21:38:11.630276 140036574910272 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0422 21:38:11.630290 140036574910272 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0422 21:38:11.630316 140036574910272 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0422 21:38:11.630330 140036574910272 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0422 21:38:11.630351 140036574910272 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0422 21:38:11.630367 140036574910272 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0422 21:38:11.630384 140036574910272 pyconfig.py:471] Config param position_id_per_seconds: 25
I0422 21:38:11.630398 140036574910272 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0422 21:38:11.630414 140036574910272 pyconfig.py:471] Config param prefill_cache_dir: 
I0422 21:38:11.630429 140036574910272 pyconfig.py:471] Config param prefill_chunk_size: 256
I0422 21:38:11.630444 140036574910272 pyconfig.py:471] Config param prefill_slice: v5e-16
I0422 21:38:11.630459 140036574910272 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0422 21:38:11.630475 140036574910272 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0422 21:38:11.630490 140036574910272 pyconfig.py:471] Config param profile_cleanly: True
I0422 21:38:11.630510 140036574910272 pyconfig.py:471] Config param profile_periodically_period: -1
I0422 21:38:11.630526 140036574910272 pyconfig.py:471] Config param profile_power_events: False
I0422 21:38:11.630541 140036574910272 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0422 21:38:11.630560 140036574910272 pyconfig.py:471] Config param profiler_steps: 5
I0422 21:38:11.630575 140036574910272 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0422 21:38:11.630591 140036574910272 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0422 21:38:11.630606 140036574910272 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0422 21:38:11.630622 140036574910272 pyconfig.py:471] Config param prometheus_port: 0
I0422 21:38:11.630636 140036574910272 pyconfig.py:471] Config param prompt: I love to
I0422 21:38:11.630652 140036574910272 pyconfig.py:471] Config param pure_nnx: False
I0422 21:38:11.630667 140036574910272 pyconfig.py:471] Config param pure_nnx_decoder: False
I0422 21:38:11.630683 140036574910272 pyconfig.py:471] Config param q_lora_rank: 0
I0422 21:38:11.630697 140036574910272 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0422 21:38:11.630712 140036574910272 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0422 21:38:11.630727 140036574910272 pyconfig.py:471] Config param qk_norm_with_scale: True
I0422 21:38:11.630742 140036574910272 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0422 21:38:11.630757 140036574910272 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0422 21:38:11.630774 140036574910272 pyconfig.py:471] Config param quant_cfg_path: 
I0422 21:38:11.630789 140036574910272 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0422 21:38:11.630808 140036574910272 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0422 21:38:11.630822 140036574910272 pyconfig.py:471] Config param quantize_kvcache: False
I0422 21:38:11.630838 140036574910272 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0422 21:38:11.630853 140036574910272 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0422 21:38:11.630870 140036574910272 pyconfig.py:471] Config param ragged_block_size: 256
I0422 21:38:11.630884 140036574910272 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0422 21:38:11.630901 140036574910272 pyconfig.py:471] Config param rampup_end_step: 0
I0422 21:38:11.630915 140036574910272 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0422 21:38:11.630931 140036574910272 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0422 21:38:11.630946 140036574910272 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0422 21:38:11.630962 140036574910272 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0422 21:38:11.630976 140036574910272 pyconfig.py:471] Config param remat_policy: full
I0422 21:38:11.630991 140036574910272 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0422 21:38:11.631006 140036574910272 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0422 21:38:11.631021 140036574910272 pyconfig.py:471] Config param replicate_quant_scale: False
I0422 21:38:11.631035 140036574910272 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0422 21:38:11.631051 140036574910272 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0422 21:38:11.631066 140036574910272 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0422 21:38:11.631080 140036574910272 pyconfig.py:471] Config param reshape_q: False
I0422 21:38:11.631107 140036574910272 pyconfig.py:471] Config param return_log_prob: False
I0422 21:38:11.631123 140036574910272 pyconfig.py:471] Config param reuse_example_batch: 0
I0422 21:38:11.631138 140036574910272 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0422 21:38:11.631153 140036574910272 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0422 21:38:11.631168 140036574910272 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0422 21:38:11.631184 140036574910272 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0422 21:38:11.631200 140036574910272 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0422 21:38:11.631216 140036574910272 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0422 21:38:11.631231 140036574910272 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0422 21:38:11.631251 140036574910272 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0422 21:38:11.631266 140036574910272 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0422 21:38:11.631282 140036574910272 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0422 21:38:11.631297 140036574910272 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0422 21:38:11.631313 140036574910272 pyconfig.py:471] Config param rope_attention_scaling: False
I0422 21:38:11.631328 140036574910272 pyconfig.py:471] Config param rope_factor: 40
I0422 21:38:11.631344 140036574910272 pyconfig.py:471] Config param rope_interleave: True
I0422 21:38:11.631358 140036574910272 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0422 21:38:11.631374 140036574910272 pyconfig.py:471] Config param rope_max_timescale: 10000
I0422 21:38:11.631390 140036574910272 pyconfig.py:471] Config param rope_min_timescale: 1
I0422 21:38:11.631405 140036574910272 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0422 21:38:11.631421 140036574910272 pyconfig.py:471] Config param rope_truncate: True
I0422 21:38:11.631437 140036574910272 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0422 21:38:11.631453 140036574910272 pyconfig.py:471] Config param rope_use_scale: True
I0422 21:38:11.631469 140036574910272 pyconfig.py:471] Config param routed_bias: False
I0422 21:38:11.631483 140036574910272 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0422 21:38:11.631500 140036574910272 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0422 21:38:11.631519 140036574910272 pyconfig.py:471] Config param routed_score_func: 
I0422 21:38:11.631534 140036574910272 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-22-21-38
I0422 21:38:11.631550 140036574910272 pyconfig.py:471] Config param sa_block_kv: 512
I0422 21:38:11.631564 140036574910272 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0422 21:38:11.631580 140036574910272 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0422 21:38:11.631594 140036574910272 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0422 21:38:11.631610 140036574910272 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0422 21:38:11.631625 140036574910272 pyconfig.py:471] Config param sa_block_q: 512
I0422 21:38:11.631641 140036574910272 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0422 21:38:11.631655 140036574910272 pyconfig.py:471] Config param sa_block_q_dq: 512
I0422 21:38:11.631671 140036574910272 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0422 21:38:11.631686 140036574910272 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0422 21:38:11.631701 140036574910272 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0422 21:38:11.631716 140036574910272 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0422 21:38:11.631732 140036574910272 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0422 21:38:11.631748 140036574910272 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0422 21:38:11.631763 140036574910272 pyconfig.py:471] Config param save_config_to_gcs: False
I0422 21:38:11.631779 140036574910272 pyconfig.py:471] Config param save_quantized_params_path: 
I0422 21:38:11.631800 140036574910272 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0422 21:38:11.631814 140036574910272 pyconfig.py:471] Config param scan_layers: True
I0422 21:38:11.631830 140036574910272 pyconfig.py:471] Config param scan_layers_per_stage: False
I0422 21:38:11.631856 140036574910272 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0422 21:38:11.631871 140036574910272 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0422 21:38:11.631895 140036574910272 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0422 21:38:11.631912 140036574910272 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0422 21:38:11.631926 140036574910272 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0422 21:38:11.631947 140036574910272 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0422 21:38:11.631962 140036574910272 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0422 21:38:11.631979 140036574910272 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0422 21:38:11.632001 140036574910272 pyconfig.py:471] Config param sharding_strategy: None
I0422 21:38:11.632017 140036574910272 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0422 21:38:11.632031 140036574910272 pyconfig.py:471] Config param shardy: True
I0422 21:38:11.632056 140036574910272 pyconfig.py:471] Config param share_kv_projections: False
I0422 21:38:11.632071 140036574910272 pyconfig.py:471] Config param shared_experts: 0
I0422 21:38:11.632102 140036574910272 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0422 21:38:11.632119 140036574910272 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0422 21:38:11.632142 140036574910272 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0422 21:38:11.632158 140036574910272 pyconfig.py:471] Config param skip_step_interval: 128
I0422 21:38:11.632173 140036574910272 pyconfig.py:471] Config param skip_step_on_spikes: False
I0422 21:38:11.632192 140036574910272 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0422 21:38:11.632209 140036574910272 pyconfig.py:471] Config param sliding_window_size: 0
I0422 21:38:11.632224 140036574910272 pyconfig.py:471] Config param solution_end_token: </answer>
I0422 21:38:11.632247 140036574910272 pyconfig.py:471] Config param solution_start_token: <answer>
I0422 21:38:11.632264 140036574910272 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0422 21:38:11.632279 140036574910272 pyconfig.py:471] Config param sparse_matmul: True
I0422 21:38:11.632304 140036574910272 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0422 21:38:11.632318 140036574910272 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0422 21:38:11.632343 140036574910272 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0422 21:38:11.632359 140036574910272 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0422 21:38:11.632373 140036574910272 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0422 21:38:11.632397 140036574910272 pyconfig.py:471] Config param steps: 200000
I0422 21:38:11.632413 140036574910272 pyconfig.py:471] Config param stop_strings: None
I0422 21:38:11.632430 140036574910272 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0422 21:38:11.632455 140036574910272 pyconfig.py:471] Config param student_params_to_update: None
I0422 21:38:11.632470 140036574910272 pyconfig.py:471] Config param subslice_shape: 
I0422 21:38:11.632494 140036574910272 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0422 21:38:11.632513 140036574910272 pyconfig.py:471] Config param system_prompt: 
I0422 21:38:11.632530 140036574910272 pyconfig.py:471] Config param target_eval_loss: 0.0
I0422 21:38:11.632555 140036574910272 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0422 21:38:11.632572 140036574910272 pyconfig.py:471] Config param temperature_tuning: False
I0422 21:38:11.632591 140036574910272 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0422 21:38:11.632607 140036574910272 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-21-38/tensorboard/
I0422 21:38:11.632624 140036574910272 pyconfig.py:471] Config param tensors_on_device: None
I0422 21:38:11.632645 140036574910272 pyconfig.py:471] Config param tensors_to_offload: None
I0422 21:38:11.632659 140036574910272 pyconfig.py:471] Config param test_batch_start_index: 0
I0422 21:38:11.632675 140036574910272 pyconfig.py:471] Config param tile_size_for_vit: 336
I0422 21:38:11.632700 140036574910272 pyconfig.py:471] Config param tokenize_eval_data: True
I0422 21:38:11.632716 140036574910272 pyconfig.py:471] Config param tokenize_train_data: True
I0422 21:38:11.632730 140036574910272 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0422 21:38:11.632751 140036574910272 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0422 21:38:11.632769 140036574910272 pyconfig.py:471] Config param topk_routing_group: -1
I0422 21:38:11.632795 140036574910272 pyconfig.py:471] Config param train_data_columns: ['text']
I0422 21:38:11.632812 140036574910272 pyconfig.py:471] Config param train_fraction: 1.0
I0422 21:38:11.632826 140036574910272 pyconfig.py:471] Config param train_image_column: image
I0422 21:38:11.632856 140036574910272 pyconfig.py:471] Config param train_micro_batch_size: -1
I0422 21:38:11.632871 140036574910272 pyconfig.py:471] Config param train_split: train
I0422 21:38:11.632890 140036574910272 pyconfig.py:471] Config param trainable_parameters_mask: []
I0422 21:38:11.632906 140036574910272 pyconfig.py:471] Config param trainable_position_size: 2048
I0422 21:38:11.632920 140036574910272 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0422 21:38:11.632943 140036574910272 pyconfig.py:471] Config param upload_all_profiler_results: False
I0422 21:38:11.632959 140036574910272 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0422 21:38:11.632974 140036574910272 pyconfig.py:471] Config param use_agentic_rollout: False
I0422 21:38:11.633000 140036574910272 pyconfig.py:471] Config param use_audio: False
I0422 21:38:11.633014 140036574910272 pyconfig.py:471] Config param use_audio_in_video: False
I0422 21:38:11.633030 140036574910272 pyconfig.py:471] Config param use_batch_split_schedule: False
I0422 21:38:11.633052 140036574910272 pyconfig.py:471] Config param use_chat_template: False
I0422 21:38:11.633066 140036574910272 pyconfig.py:471] Config param use_chunked_prefill: False
I0422 21:38:11.633100 140036574910272 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0422 21:38:11.633116 140036574910272 pyconfig.py:471] Config param use_dpo: False
I0422 21:38:11.633131 140036574910272 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0422 21:38:11.633152 140036574910272 pyconfig.py:471] Config param use_grpo: True
I0422 21:38:11.633166 140036574910272 pyconfig.py:471] Config param use_indexer: False
I0422 21:38:11.633182 140036574910272 pyconfig.py:471] Config param use_iota_embed: True
I0422 21:38:11.633197 140036574910272 pyconfig.py:471] Config param use_jax_splash: False
I0422 21:38:11.633213 140036574910272 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0422 21:38:11.633228 140036574910272 pyconfig.py:471] Config param use_mrope: False
I0422 21:38:11.633244 140036574910272 pyconfig.py:471] Config param use_multimodal: False
I0422 21:38:11.633258 140036574910272 pyconfig.py:471] Config param use_nnx_pipeline: False
I0422 21:38:11.633274 140036574910272 pyconfig.py:471] Config param use_pathways: True
I0422 21:38:11.633288 140036574910272 pyconfig.py:471] Config param use_post_attn_norm: False
I0422 21:38:11.633304 140036574910272 pyconfig.py:471] Config param use_post_ffw_norm: False
I0422 21:38:11.633320 140036574910272 pyconfig.py:471] Config param use_qk_clip: False
I0422 21:38:11.633336 140036574910272 pyconfig.py:471] Config param use_qk_norm: False
I0422 21:38:11.633350 140036574910272 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0422 21:38:11.633366 140036574910272 pyconfig.py:471] Config param use_qwix_quantization: False
I0422 21:38:11.633381 140036574910272 pyconfig.py:471] Config param use_ragged_attention: False
I0422 21:38:11.633395 140036574910272 pyconfig.py:471] Config param use_random_routing: False
I0422 21:38:11.633411 140036574910272 pyconfig.py:471] Config param use_replicator_service: False
I0422 21:38:11.633436 140036574910272 pyconfig.py:471] Config param use_ring_of_experts: False
I0422 21:38:11.633452 140036574910272 pyconfig.py:471] Config param use_sft: False
I0422 21:38:11.633467 140036574910272 pyconfig.py:471] Config param use_splash_scheduler: False
I0422 21:38:11.633492 140036574910272 pyconfig.py:471] Config param use_tokamax_gmm: False
I0422 21:38:11.633512 140036574910272 pyconfig.py:471] Config param use_tokamax_splash: False
I0422 21:38:11.633533 140036574910272 pyconfig.py:471] Config param use_truncation: True
I0422 21:38:11.633548 140036574910272 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0422 21:38:11.633563 140036574910272 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0422 21:38:11.633584 140036574910272 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0422 21:38:11.633600 140036574910272 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0422 21:38:11.633615 140036574910272 pyconfig.py:471] Config param v_head_dim: 128
I0422 21:38:11.633640 140036574910272 pyconfig.py:471] Config param v_norm_with_scale: True
I0422 21:38:11.633656 140036574910272 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0422 21:38:11.633673 140036574910272 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0422 21:38:11.633699 140036574910272 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0422 21:38:11.633714 140036574910272 pyconfig.py:471] Config param video_path: 
I0422 21:38:11.633734 140036574910272 pyconfig.py:471] Config param video_placeholder: <|video|>
I0422 21:38:11.633750 140036574910272 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0422 21:38:11.633766 140036574910272 pyconfig.py:471] Config param vision_output_length: -1
I0422 21:38:11.633780 140036574910272 pyconfig.py:471] Config param vllm_additional_config: {}
I0422 21:38:11.633796 140036574910272 pyconfig.py:471] Config param vllm_hf_config_path: 
I0422 21:38:11.633810 140036574910272 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0422 21:38:11.633826 140036574910272 pyconfig.py:471] Config param vocab_size: 32000
I0422 21:38:11.633841 140036574910272 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0422 21:38:11.633857 140036574910272 pyconfig.py:471] Config param weight_dtype: float32
I0422 21:38:11.633882 140036574910272 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0422 21:38:11.633897 140036574910272 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0422 21:38:11.633914 140036574910272 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0422 21:38:11.633929 140036574910272 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0422 21:38:11.633944 140036574910272 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0422 21:38:11.633959 140036574910272 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0422 21:38:11.633975 140036574910272 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0422 21:38:11.633991 140036574910272 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0422 21:38:11.634006 140036574910272 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0422 21:38:11.634021 140036574910272 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0422 21:38:11.634037 140036574910272 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0422 21:38:11.634053 140036574910272 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0422 21:38:11.634067 140036574910272 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0422 21:38:11.634083 140036574910272 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0422 21:38:11.634108 140036574910272 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0422 21:38:11.634124 140036574910272 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0422 21:38:11.634139 140036574910272 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0422 21:38:11.634155 140036574910272 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0422 21:38:11.634169 140036574910272 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0422 21:38:11.634185 140036574910272 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0422 21:38:11.634202 140036574910272 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0422 21:38:11.634220 140036574910272 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0422 21:38:11.634236 140036574910272 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0422 21:38:11.634252 140036574910272 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0422 21:38:11.634268 140036574910272 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0422 21:38:11.634286 140036574910272 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0422 21:38:11.634609 140036574910272 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0422 21:38:11.634642 140036574910272 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0422 21:38:15.170317 140036574910272 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
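The `_schedule.py` warning above means the configured learning-rate schedule degenerates to a constant. A minimal stand-in illustrating that behavior (this is a sketch of the semantics, not the actual optax implementation):

```python
def polynomial_schedule(init_value, end_value, power, transition_steps, transition_begin=0):
    """Illustrative stand-in for an optax-style polynomial schedule."""
    if transition_steps <= 0:
        # The degenerate case the warning refers to: with a non-positive
        # `transition_steps`, the schedule is a constant equal to init_value.
        return lambda step: init_value

    def schedule(step):
        # Clip progress to [0, 1], then interpolate polynomially
        # from init_value down to end_value.
        frac = min(max((step - transition_begin) / transition_steps, 0.0), 1.0)
        return (init_value - end_value) * (1.0 - frac) ** power + end_value

    return schedule
```

With `transition_steps=0` every step returns `init_value`, which is why the run trains at a fixed learning rate despite a decay schedule being configured.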
I0422 21:38:15.173372 140036574910272 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0422 21:38:15.173504 140036574910272 train_distill.py:596] Applying logical axis rules for model initialization and training...
I0422 21:38:15.173575 140036574910272 train_distill.py:600] Loading Student from ...
I0422 21:38:15.173603 140036574910272 train_distill.py:169] --- Student Configuration ---
I0422 21:38:15.173624 140036574910272 train_distill.py:170]   Model Name:      gpt3-52k
I0422 21:38:15.173646 140036574910272 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0422 21:38:15.173666 140036574910272 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0422 21:38:15.173683 140036574910272 train_distill.py:175]   Vocab Size:      32000
I0422 21:38:15.173701 140036574910272 train_distill.py:176]   Checkpoint:      
I0422 21:38:15.173719 140036574910272 train_distill.py:465] Initializing model: gpt3-52k...
I0422 21:38:16.586039 140036574910272 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0422 21:38:16.586163 140036574910272 train_distill.py:169] --- Teacher Configuration ---
I0422 21:38:16.586192 140036574910272 train_distill.py:170]   Model Name:      gpt3-52k
I0422 21:38:16.586216 140036574910272 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0422 21:38:16.586236 140036574910272 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0422 21:38:16.586256 140036574910272 train_distill.py:175]   Vocab Size:      32000
I0422 21:38:16.586276 140036574910272 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0422 21:38:16.586295 140036574910272 train_distill.py:465] Initializing model: gpt3-52k...
I0422 21:38:17.659429 140036574910272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 21:38:17.659887 140036574910272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f5c16b354c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 21:38:17.659947 140036574910272 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0422 21:38:18.199316 140036574910272 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0422 21:38:18.743621    2143 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0422 21:38:20.340075 140036574910272 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0422 21:38:22.495800 140036574910272 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0422 21:38:22.496178 140036574910272 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0422 21:38:23.090080 140036574910272 checkpointer.py:318] Finished restoring checkpoint in 3.57 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0422 21:38:23.791862 140036574910272 train_distill.py:640] Initializing Data Iterators via MaxText pipeline...
I0422 21:38:23.854978 140036574910272 config.py:112] TensorFlow version 2.20.0 available.
I0422 21:38:23.855540 140036574910272 config.py:125] JAX version 0.8.3 available.
E0422 21:38:25.898768 140036574910272 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0422 21:38:25.898985 140036574910272 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0422 21:38:25.902163 140036574910272 train_distill.py:410] Input Pipeline Checkpointing: DISABLED
I0422 21:38:25.902222 140036574910272 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0422 21:38:25.902295 140036574910272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 21:38:25.902384 140036574910272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f5c16b354c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 21:38:25.902427 140036574910272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 21:38:25.902469 140036574910272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f5c16b354c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 21:38:25.902532 140036574910272 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f45458b2c30>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237350>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f45442372c0>}, handler_registry=None
I0422 21:38:25.902765 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f45458b2c30>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 21:38:25.902812 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237350>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 21:38:25.902862 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f45442372c0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 21:38:25.902905 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f4544236f60>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 21:38:25.902950 140036574910272 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f45458b2c30>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f45458b2c30>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237350>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237350>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f45442372c0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f45442372c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f4544236f60>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f4544236f60>}).
I0422 21:38:25.903411 140036574910272 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f454438b240> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 21:38:28.526608 140036574910272 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260422_212613/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260422_212613_07_distill_smoke/checkpoints
I0422 21:38:28.987190 140036574910272 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260422_212613/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260422_212613_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f4544237290>
I0422 21:38:28.987373 140036574910272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 21:38:28.987440 140036574910272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f5c16b354c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 21:38:28.987478 140036574910272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 21:38:28.987510 140036574910272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f5c16b354c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 21:38:28.987547 140036574910272 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 21:38:28.987604 140036574910272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140036574910272 count=1 at 0x7f454444cdc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f45454de570>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f45462d92b0>, _write_futures=[])
I0422 21:38:28.987972 140036574910272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140036574910272 count=1 at 0x7f454444cdc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f45454de570>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f45462d92b0>, _write_futures=[])
I0422 21:38:28.987999 140036574910272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140036574910272 count=1 at 0x7f454444cdc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f45454de570>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f45462d92b0>, _write_futures=[])
I0422 21:38:28.988032 140036574910272 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237260>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f454427e360>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427e1e0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f454427eba0>}, handler_registry=None
I0422 21:38:28.988157 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237260>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 21:38:28.988192 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f454427e360>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 21:38:28.988214 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427e1e0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 21:38:28.988242 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f454427eba0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0422 21:38:28.988272 140036574910272 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427f710>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 21:38:28.988296 140036574910272 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237260>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f4544237260>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f454427e360>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f454427e360>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427e1e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427e1e0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f454427eba0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f454427eba0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427f710>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f454427f710>}).
I0422 21:38:28.988367 140036574910272 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f454438b380> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 21:38:29.789692 140036574910272 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260422_212613/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260422_212613_07_distill_smoke/checkpoints
I0422 21:38:30.230744 140036574910272 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260422_212613/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260422_212613_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f4546252720>
I0422 21:38:30.231387 140036574910272 train_distill.py:691] Starting Distillation Training...
I0422 21:38:30.231498 140036574910272 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
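The mesh logged above is consistent with the earlier `Num_devices: 32, shape (1, 4, 1, 8, ...)` line: the product of all mesh axis sizes must equal the device count. A quick check of that arithmetic (axis names and sizes copied from the log; the dict itself is just for illustration):

```python
from math import prod

# Mesh axis sizes as logged by peft_trainer.py:590.
mesh = {
    "diloco": 1, "data": 4, "stage": 1, "fsdp": 8,
    "fsdp_transpose": 1, "sequence": 1, "context": 1,
    "context_autoregressive": 1, "tensor": 1, "tensor_transpose": 1,
    "tensor_sequence": 1, "expert": 1, "autoregressive": 1,
}

# The product of all axis sizes equals the 32 devices reported earlier.
assert prod(mesh.values()) == 32
```

Only `data` and `fsdp` are non-trivial here, so the 32 devices are arranged as 4-way data parallelism with 8-way FSDP sharding inside each data replica.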
I0422 21:38:30.580068 140036574910272 peft_trainer.py:600] Compiled train_step cache size: 0

Training:   0%|          | 0/5 [00:00<?, ?step/s]
I0422 21:38:30.581980 139891983775488 grain_pool.py:367] Grain pool will use 1 processes.
I0422 21:38:30.608375 139891983775488 grain_pool.py:440] Grain pool will start child processes.
I0422 21:38:30.613892 139891983775488 grain_pool.py:448] Grain pool started all child processes.
2026-04-22 21:38:36.663615: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0422 21:38:40.068830 140036574910272 utils.py:86] Train loop finished in: 9.4882 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0)}
I0422 21:38:40.414344 139891983775488 grain_pool.py:542] Grain pool is exiting.
I0422 21:38:40.414450 139891983775488 grain_pool.py:547] Shutting down multiprocessing system.
I0422 21:38:41.862984 139891983775488 grain_pool.py:547] Shutting down multiprocessing system.

Training:   0%|          | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Wed Apr 22 21:38:53 UTC 2026
EXIT_CODE=1