MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  main

Metric  ·  Linen 586e69205  ·  NNX 586e69205  ·  Diff (NNX − Linen)

Diff = NNX value − Linen value. Green = NNX improved. Red = NNX regressed.

Linen  ·  586e69205  ·  main_20260423_071551  ·  full log
XPK Start: Thu Apr 23 07:30:09 UTC 2026
2026-04-23 07:30:27.434652: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
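The three `rope_parameters` warnings above are type checks: the integer values 40, 32, and 1 fail because the fields must be Python floats (and `factor` must additionally be >= 1). A minimal sketch of the implied validation — `check_rope_parameters` is a hypothetical helper for illustration, not the actual library validator:

```python
def check_rope_parameters(params: dict) -> list[str]:
    """Return warning strings for RoPE scaling fields that fail validation.

    Hypothetical re-creation of the checks behind the log warnings above:
    every field must be a float, and `factor` must also be >= 1.
    """
    warnings = []
    factor = params.get("factor")
    if not isinstance(factor, float) or factor < 1:
        warnings.append(f"`rope_parameters`'s factor field must be a float >= 1, got {factor}")
    for field in ("beta_fast", "beta_slow"):
        value = params.get(field)
        if not isinstance(value, float):
            warnings.append(f"`rope_parameters`'s {field} field must be a float, got {value}")
    return warnings

# Integer values (as in the log above) trigger all three warnings;
# writing them as floats in the config silences the validator.
bad = {"factor": 40, "beta_fast": 32, "beta_slow": 1}
good = {"factor": 40.0, "beta_fast": 32.0, "beta_slow": 1.0}
```

The warnings are non-fatal here: the run continues and initializes the JAX distributed system below.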
I0423 07:30:33.691767 137367741581120 max_utils.py:273] Attempting to initialize the jax distributed system...
I0423 07:30:42.731318 137367741581120 distributed.py:149] Starting JAX distributed service on [::]:8482
I0423 07:30:42.733544 137367741581120 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-zkpib-slice-job-0-0.mt-07-distill-smoke-zkpib:8482
I0423 07:30:43.863112 137367741581120 max_utils.py:284] Jax distributed system initialized!
I0423 07:30:50.173268 137367741581120 max_utils.py:244] Jax distributed system is already initialized.
W0423 07:30:50.304201 137367741581120 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0423 07:30:50.727890 137367741581120 max_utils.py:244] Jax distributed system is already initialized.
I0423 07:30:50.729113 137367741581120 pyconfig.py:471] Config param abort_on_inf_loss: True
I0423 07:30:50.729161 137367741581120 pyconfig.py:471] Config param abort_on_nan_loss: True
I0423 07:30:50.729189 137367741581120 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0423 07:30:50.729214 137367741581120 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0423 07:30:50.729236 137367741581120 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0423 07:30:50.729253 137367741581120 pyconfig.py:471] Config param activations_in_float32: False
I0423 07:30:50.729272 137367741581120 pyconfig.py:471] Config param adam_b1: 0.9
I0423 07:30:50.729290 137367741581120 pyconfig.py:471] Config param adam_b2: 0.95
I0423 07:30:50.729308 137367741581120 pyconfig.py:471] Config param adam_eps: 1e-08
I0423 07:30:50.729332 137367741581120 pyconfig.py:471] Config param adam_eps_root: 0.0
I0423 07:30:50.729350 137367741581120 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0423 07:30:50.729367 137367741581120 pyconfig.py:471] Config param adamw_mask: []
I0423 07:30:50.729382 137367741581120 pyconfig.py:471] Config param add_bos: True
I0423 07:30:50.729399 137367741581120 pyconfig.py:471] Config param add_eos: True
I0423 07:30:50.729415 137367741581120 pyconfig.py:471] Config param allow_split_physical_axes: False
I0423 07:30:50.729431 137367741581120 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0423 07:30:50.729447 137367741581120 pyconfig.py:471] Config param async_checkpointing: True
I0423 07:30:50.729463 137367741581120 pyconfig.py:471] Config param async_scheduling: False
I0423 07:30:50.729478 137367741581120 pyconfig.py:471] Config param attention: dot_product
I0423 07:30:50.729494 137367741581120 pyconfig.py:471] Config param attention_bias: False
I0423 07:30:50.729511 137367741581120 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0423 07:30:50.729526 137367741581120 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0423 07:30:50.729546 137367741581120 pyconfig.py:471] Config param attention_output_dim: -1
I0423 07:30:50.729561 137367741581120 pyconfig.py:471] Config param attention_sink: False
I0423 07:30:50.729579 137367741581120 pyconfig.py:471] Config param attention_type: global
I0423 07:30:50.729604 137367741581120 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0423 07:30:50.729635 137367741581120 pyconfig.py:471] Config param audio_path: 
I0423 07:30:50.729654 137367741581120 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0423 07:30:50.729668 137367741581120 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0423 07:30:50.729685 137367741581120 pyconfig.py:471] Config param base_config: base.yml
I0423 07:30:50.729699 137367741581120 pyconfig.py:471] Config param base_emb_dim: 16
I0423 07:30:50.729715 137367741581120 pyconfig.py:471] Config param base_mlp_dim: 64
I0423 07:30:50.729730 137367741581120 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0423 07:30:50.729746 137367741581120 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0423 07:30:50.729760 137367741581120 pyconfig.py:471] Config param base_num_kv_heads: 2
I0423 07:30:50.729776 137367741581120 pyconfig.py:471] Config param base_num_query_heads: 2
I0423 07:30:50.729792 137367741581120 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0423 07:30:50.729807 137367741581120 pyconfig.py:471] Config param batch_size: 1
I0423 07:30:50.729822 137367741581120 pyconfig.py:471] Config param batch_split_factor: 1
I0423 07:30:50.729837 137367741581120 pyconfig.py:471] Config param beta_fast: 32
I0423 07:30:50.729853 137367741581120 pyconfig.py:471] Config param beta_slow: 1
I0423 07:30:50.729867 137367741581120 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0423 07:30:50.729884 137367741581120 pyconfig.py:471] Config param capacity_factor: -1.0
I0423 07:30:50.729901 137367741581120 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0423 07:30:50.729918 137367741581120 pyconfig.py:471] Config param chat_template: 
I0423 07:30:50.729934 137367741581120 pyconfig.py:471] Config param chat_template_path: 
I0423 07:30:50.729950 137367741581120 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0423 07:30:50.729967 137367741581120 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/checkpoints/
I0423 07:30:50.729983 137367741581120 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0423 07:30:50.729998 137367741581120 pyconfig.py:471] Config param checkpoint_period: 2000
I0423 07:30:50.730013 137367741581120 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0423 07:30:50.730030 137367741581120 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0423 07:30:50.730044 137367741581120 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0423 07:30:50.730059 137367741581120 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0423 07:30:50.730074 137367741581120 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0423 07:30:50.730089 137367741581120 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0423 07:30:50.730129 137367741581120 pyconfig.py:471] Config param chips_per_vm: 4
I0423 07:30:50.730144 137367741581120 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0423 07:30:50.730160 137367741581120 pyconfig.py:471] Config param collect_stack_trace: False
I0423 07:30:50.730175 137367741581120 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0423 07:30:50.730190 137367741581120 pyconfig.py:471] Config param colocated_python_data_input: False
I0423 07:30:50.730205 137367741581120 pyconfig.py:471] Config param compile_topology: 
I0423 07:30:50.730224 137367741581120 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0423 07:30:50.730240 137367741581120 pyconfig.py:471] Config param compile_xla_flags: 
I0423 07:30:50.730255 137367741581120 pyconfig.py:471] Config param compiled_trainstep_file: 
I0423 07:30:50.730269 137367741581120 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0423 07:30:50.730285 137367741581120 pyconfig.py:471] Config param constant_bound_config: []
I0423 07:30:50.730300 137367741581120 pyconfig.py:471] Config param context: RematLocation.REMAT
I0423 07:30:50.730316 137367741581120 pyconfig.py:471] Config param context_parallel_load_balance: True
I0423 07:30:50.730332 137367741581120 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0423 07:30:50.730350 137367741581120 pyconfig.py:471] Config param context_parallel_size: 1
I0423 07:30:50.730366 137367741581120 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0423 07:30:50.730380 137367741581120 pyconfig.py:471] Config param context_sharding: context
I0423 07:30:50.730395 137367741581120 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0423 07:30:50.730410 137367741581120 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0423 07:30:50.730425 137367741581120 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0423 07:30:50.730440 137367741581120 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0423 07:30:50.730455 137367741581120 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0423 07:30:50.730472 137367741581120 pyconfig.py:471] Config param custom_mesh: 
I0423 07:30:50.730490 137367741581120 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0423 07:30:50.730504 137367741581120 pyconfig.py:471] Config param d_model_for_audio: 256
I0423 07:30:50.730520 137367741581120 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0423 07:30:50.730540 137367741581120 pyconfig.py:471] Config param data_shuffle_seed: 0
I0423 07:30:50.730555 137367741581120 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0423 07:30:50.730570 137367741581120 pyconfig.py:471] Config param dataset_path: 
I0423 07:30:50.730584 137367741581120 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0423 07:30:50.730601 137367741581120 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0423 07:30:50.730617 137367741581120 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0423 07:30:50.730632 137367741581120 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0423 07:30:50.730647 137367741581120 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0423 07:30:50.730662 137367741581120 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0423 07:30:50.730676 137367741581120 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0423 07:30:50.730691 137367741581120 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0423 07:30:50.730705 137367741581120 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0423 07:30:50.730721 137367741581120 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 07:30:50.730737 137367741581120 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0423 07:30:50.730753 137367741581120 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0423 07:30:50.730767 137367741581120 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0423 07:30:50.730782 137367741581120 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0423 07:30:50.730798 137367741581120 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0423 07:30:50.730813 137367741581120 pyconfig.py:471] Config param debug: {'rl': False}
I0423 07:30:50.730830 137367741581120 pyconfig.py:471] Config param debug_sharding: False
I0423 07:30:50.730844 137367741581120 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0423 07:30:50.730859 137367741581120 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0423 07:30:50.730875 137367741581120 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0423 07:30:50.730891 137367741581120 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0423 07:30:50.730906 137367741581120 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0423 07:30:50.730922 137367741581120 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0423 07:30:50.730939 137367741581120 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0423 07:30:50.730953 137367741581120 pyconfig.py:471] Config param degenerate_group_masking: True
I0423 07:30:50.730969 137367741581120 pyconfig.py:471] Config param dense_init_scale: 1.0
I0423 07:30:50.730985 137367741581120 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0423 07:30:50.731001 137367741581120 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0423 07:30:50.731016 137367741581120 pyconfig.py:471] Config param diloco_sync_period: 36
I0423 07:30:50.731032 137367741581120 pyconfig.py:471] Config param distill_alpha: 0.5
I0423 07:30:50.731049 137367741581120 pyconfig.py:471] Config param distill_alpha_end: None
I0423 07:30:50.731063 137367741581120 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0423 07:30:50.731078 137367741581120 pyconfig.py:471] Config param distill_beta: 0.0
I0423 07:30:50.731093 137367741581120 pyconfig.py:471] Config param distill_beta_end: None
I0423 07:30:50.731118 137367741581120 pyconfig.py:471] Config param distill_beta_schedule: constant
I0423 07:30:50.731133 137367741581120 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0423 07:30:50.731148 137367741581120 pyconfig.py:471] Config param distill_layer_indices: None
I0423 07:30:50.731164 137367741581120 pyconfig.py:471] Config param distill_temperature: 1.0
I0423 07:30:50.731178 137367741581120 pyconfig.py:471] Config param distill_temperature_end: None
I0423 07:30:50.731194 137367741581120 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0423 07:30:50.731209 137367741581120 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0423 07:30:50.731227 137367741581120 pyconfig.py:471] Config param dpo_beta: 0.1
I0423 07:30:50.731243 137367741581120 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0423 07:30:50.731257 137367741581120 pyconfig.py:471] Config param dq_reduction_steps: 0
I0423 07:30:50.731273 137367741581120 pyconfig.py:471] Config param dropout_rate: 0.0
I0423 07:30:50.731287 137367741581120 pyconfig.py:471] Config param dtype: bfloat16
I0423 07:30:50.731318 137367741581120 pyconfig.py:471] Config param dtype_mm: float32
I0423 07:30:50.731333 137367741581120 pyconfig.py:471] Config param dump_hlo: False
I0423 07:30:50.731349 137367741581120 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0423 07:30:50.731363 137367741581120 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/xla_dump
I0423 07:30:50.731379 137367741581120 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0423 07:30:50.731394 137367741581120 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0423 07:30:50.731409 137367741581120 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0423 07:30:50.731424 137367741581120 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0423 07:30:50.731440 137367741581120 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0423 07:30:50.731455 137367741581120 pyconfig.py:471] Config param dump_jaxpr: False
I0423 07:30:50.731471 137367741581120 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0423 07:30:50.731486 137367741581120 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/jaxpr_dump
I0423 07:30:50.731500 137367741581120 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0423 07:30:50.731515 137367741581120 pyconfig.py:471] Config param dump_step: -1
I0423 07:30:50.731531 137367741581120 pyconfig.py:471] Config param elastic_enabled: False
I0423 07:30:50.731545 137367741581120 pyconfig.py:471] Config param elastic_max_retries: 10
I0423 07:30:50.731561 137367741581120 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0423 07:30:50.731578 137367741581120 pyconfig.py:471] Config param emb_dim: 16
I0423 07:30:50.731594 137367741581120 pyconfig.py:471] Config param enable_autocheckpoint: False
I0423 07:30:50.731608 137367741581120 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0423 07:30:50.731623 137367741581120 pyconfig.py:471] Config param enable_checkpointing: True
I0423 07:30:50.731639 137367741581120 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0423 07:30:50.731654 137367741581120 pyconfig.py:471] Config param enable_data_shuffling: True
I0423 07:30:50.731668 137367741581120 pyconfig.py:471] Config param enable_diloco: False
I0423 07:30:50.731683 137367741581120 pyconfig.py:471] Config param enable_dp_attention: False
I0423 07:30:50.731698 137367741581120 pyconfig.py:471] Config param enable_dropout: False
I0423 07:30:50.731713 137367741581120 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0423 07:30:50.731727 137367741581120 pyconfig.py:471] Config param enable_expert_parallel: False
I0423 07:30:50.731743 137367741581120 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0423 07:30:50.731757 137367741581120 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0423 07:30:50.731772 137367741581120 pyconfig.py:471] Config param enable_goodput_recording: False
I0423 07:30:50.731786 137367741581120 pyconfig.py:471] Config param enable_jax_profiler: False
I0423 07:30:50.731801 137367741581120 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0423 07:30:50.731817 137367741581120 pyconfig.py:471] Config param enable_model_warmup: False
I0423 07:30:50.731831 137367741581120 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0423 07:30:50.731846 137367741581120 pyconfig.py:471] Config param enable_nnx: False
I0423 07:30:50.731862 137367741581120 pyconfig.py:471] Config param enable_orbax_v1: False
I0423 07:30:50.731877 137367741581120 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0423 07:30:50.731891 137367741581120 pyconfig.py:471] Config param enable_pathways_goodput: False
I0423 07:30:50.731907 137367741581120 pyconfig.py:471] Config param enable_prefix_caching: False
I0423 07:30:50.731921 137367741581120 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0423 07:30:50.731936 137367741581120 pyconfig.py:471] Config param enable_single_controller: False
I0423 07:30:50.731950 137367741581120 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0423 07:30:50.731966 137367741581120 pyconfig.py:471] Config param enable_tensorboard: True
I0423 07:30:50.731980 137367741581120 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0423 07:30:50.731996 137367741581120 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0423 07:30:50.732010 137367741581120 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0423 07:30:50.732026 137367741581120 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0423 07:30:50.732040 137367741581120 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0423 07:30:50.732056 137367741581120 pyconfig.py:471] Config param engram_head_dim: 1280
I0423 07:30:50.732070 137367741581120 pyconfig.py:471] Config param engram_kernel_size: 4
I0423 07:30:50.732086 137367741581120 pyconfig.py:471] Config param engram_layers: []
I0423 07:30:50.732113 137367741581120 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0423 07:30:50.732129 137367741581120 pyconfig.py:471] Config param engram_num_heads: 8
I0423 07:30:50.732143 137367741581120 pyconfig.py:471] Config param engram_seed: 0
I0423 07:30:50.732158 137367741581120 pyconfig.py:471] Config param engram_vocab_bases: []
I0423 07:30:50.732172 137367741581120 pyconfig.py:471] Config param epsilon_high: None
I0423 07:30:50.732188 137367741581120 pyconfig.py:471] Config param eval_corr_lst: False
I0423 07:30:50.732202 137367741581120 pyconfig.py:471] Config param eval_data_columns: ['text']
I0423 07:30:50.732218 137367741581120 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0423 07:30:50.732238 137367741581120 pyconfig.py:471] Config param eval_image_column: image
I0423 07:30:50.732253 137367741581120 pyconfig.py:471] Config param eval_interval: -1
I0423 07:30:50.732268 137367741581120 pyconfig.py:471] Config param eval_make_lst: False
I0423 07:30:50.732284 137367741581120 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0423 07:30:50.732300 137367741581120 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0423 07:30:50.732314 137367741581120 pyconfig.py:471] Config param eval_split: validation
I0423 07:30:50.732331 137367741581120 pyconfig.py:471] Config param eval_steps: -1
I0423 07:30:50.732346 137367741581120 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0423 07:30:50.732361 137367741581120 pyconfig.py:471] Config param final_logits_soft_cap: None
I0423 07:30:50.732376 137367741581120 pyconfig.py:471] Config param first_num_dense_layers: 0
I0423 07:30:50.732391 137367741581120 pyconfig.py:471] Config param float32_gate_logits: False
I0423 07:30:50.732405 137367741581120 pyconfig.py:471] Config param float32_logits: False
I0423 07:30:50.732421 137367741581120 pyconfig.py:471] Config param float32_qk_product: False
I0423 07:30:50.732436 137367741581120 pyconfig.py:471] Config param float32_weight_sum: True
I0423 07:30:50.732451 137367741581120 pyconfig.py:471] Config param force_q_layout: False
I0423 07:30:50.732466 137367741581120 pyconfig.py:471] Config param force_unroll: False
I0423 07:30:50.732480 137367741581120 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0423 07:30:50.732496 137367741581120 pyconfig.py:471] Config param formatting_func_path: 
I0423 07:30:50.732511 137367741581120 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0423 07:30:50.732526 137367741581120 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0423 07:30:50.732542 137367741581120 pyconfig.py:471] Config param fused_mlp: False
I0423 07:30:50.732558 137367741581120 pyconfig.py:471] Config param fused_qkv: True
I0423 07:30:50.732574 137367741581120 pyconfig.py:471] Config param gcs_metrics: False
I0423 07:30:50.732588 137367741581120 pyconfig.py:471] Config param gdn_chunk_size: 64
I0423 07:30:50.732603 137367741581120 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0423 07:30:50.732618 137367741581120 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0423 07:30:50.732634 137367741581120 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0423 07:30:50.732648 137367741581120 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0423 07:30:50.732663 137367741581120 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0423 07:30:50.732679 137367741581120 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0423 07:30:50.732693 137367741581120 pyconfig.py:471] Config param generate_padding_batch_train: False
I0423 07:30:50.732708 137367741581120 pyconfig.py:471] Config param generate_slice: v5e-16
I0423 07:30:50.732724 137367741581120 pyconfig.py:471] Config param generation_configs: {}
I0423 07:30:50.732738 137367741581120 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0423 07:30:50.732753 137367741581120 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0423 07:30:50.732768 137367741581120 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0423 07:30:50.732783 137367741581120 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0423 07:30:50.732799 137367741581120 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0423 07:30:50.732812 137367741581120 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0423 07:30:50.732828 137367741581120 pyconfig.py:471] Config param global_head_dim: 0
I0423 07:30:50.732844 137367741581120 pyconfig.py:471] Config param global_num_kv_heads: 0
I0423 07:30:50.732859 137367741581120 pyconfig.py:471] Config param global_parameter_scale: 1
I0423 07:30:50.732875 137367741581120 pyconfig.py:471] Config param global_rampup_samples: 500
I0423 07:30:50.732891 137367741581120 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0423 07:30:50.732905 137367741581120 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0423 07:30:50.732921 137367741581120 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0423 07:30:50.732936 137367741581120 pyconfig.py:471] Config param grad_dtype: float32
I0423 07:30:50.732969 137367741581120 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0423 07:30:50.732985 137367741581120 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0423 07:30:50.733000 137367741581120 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0423 07:30:50.733017 137367741581120 pyconfig.py:471] Config param grain_eval_files: 
I0423 07:30:50.733032 137367741581120 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0423 07:30:50.733047 137367741581120 pyconfig.py:471] Config param grain_num_threads: 16
I0423 07:30:50.733062 137367741581120 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0423 07:30:50.733077 137367741581120 pyconfig.py:471] Config param grain_packing_type: first_fit
I0423 07:30:50.733092 137367741581120 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0423 07:30:50.733121 137367741581120 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0423 07:30:50.733136 137367741581120 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0423 07:30:50.733151 137367741581120 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0423 07:30:50.733165 137367741581120 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0423 07:30:50.733181 137367741581120 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0423 07:30:50.733196 137367741581120 pyconfig.py:471] Config param grain_train_files: 
I0423 07:30:50.733211 137367741581120 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0423 07:30:50.733229 137367741581120 pyconfig.py:471] Config param grain_worker_count: 1
I0423 07:30:50.733245 137367741581120 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0423 07:30:50.733260 137367741581120 pyconfig.py:471] Config param grpo_beta: 0.08
I0423 07:30:50.733276 137367741581120 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0423 07:30:50.733292 137367741581120 pyconfig.py:471] Config param hardware: tpu
I0423 07:30:50.733306 137367741581120 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0423 07:30:50.733322 137367741581120 pyconfig.py:471] Config param head_dim: 8
I0423 07:30:50.733337 137367741581120 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0423 07:30:50.733352 137367741581120 pyconfig.py:471] Config param hf_data_dir: None
I0423 07:30:50.733366 137367741581120 pyconfig.py:471] Config param hf_eval_files: None
I0423 07:30:50.733381 137367741581120 pyconfig.py:471] Config param hf_eval_split: None
I0423 07:30:50.733397 137367741581120 pyconfig.py:471] Config param hf_name: None
I0423 07:30:50.733411 137367741581120 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0423 07:30:50.733427 137367741581120 pyconfig.py:471] Config param hf_train_files: None
I0423 07:30:50.733441 137367741581120 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0423 07:30:50.733455 137367741581120 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0423 07:30:50.733471 137367741581120 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0423 07:30:50.733485 137367741581120 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0423 07:30:50.733500 137367741581120 pyconfig.py:471] Config param ici_context_parallelism: 1
I0423 07:30:50.733515 137367741581120 pyconfig.py:471] Config param ici_data_parallelism: 1
I0423 07:30:50.733530 137367741581120 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0423 07:30:50.733544 137367741581120 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0423 07:30:50.733559 137367741581120 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0423 07:30:50.733574 137367741581120 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0423 07:30:50.733588 137367741581120 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 07:30:50.733605 137367741581120 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0423 07:30:50.733620 137367741581120 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0423 07:30:50.733634 137367741581120 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0423 07:30:50.733650 137367741581120 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0423 07:30:50.733665 137367741581120 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0423 07:30:50.733681 137367741581120 pyconfig.py:471] Config param image_path: 
I0423 07:30:50.733696 137367741581120 pyconfig.py:471] Config param image_placeholder: <|image|>
I0423 07:30:50.733711 137367741581120 pyconfig.py:471] Config param image_size_for_vit: 896
I0423 07:30:50.733726 137367741581120 pyconfig.py:471] Config param indexer_head_dim: 128
I0423 07:30:50.733740 137367741581120 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0423 07:30:50.733756 137367741581120 pyconfig.py:471] Config param indexer_n_heads: 64
I0423 07:30:50.733770 137367741581120 pyconfig.py:471] Config param indexer_sparse_training: False
I0423 07:30:50.733786 137367741581120 pyconfig.py:471] Config param indexer_topk: 2048
I0423 07:30:50.733799 137367741581120 pyconfig.py:471] Config param inference_benchmark_test: False
I0423 07:30:50.733815 137367741581120 pyconfig.py:471] Config param inference_metadata_file: 
I0423 07:30:50.733830 137367741581120 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0423 07:30:50.733844 137367741581120 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0423 07:30:50.733860 137367741581120 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0423 07:30:50.733875 137367741581120 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0423 07:30:50.733890 137367741581120 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0423 07:30:50.733906 137367741581120 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0423 07:30:50.733920 137367741581120 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0423 07:30:50.733935 137367741581120 pyconfig.py:471] Config param init_weights_seed: 0
I0423 07:30:50.733950 137367741581120 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0423 07:30:50.733966 137367741581120 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0423 07:30:50.733980 137367741581120 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0423 07:30:50.733995 137367741581120 pyconfig.py:471] Config param internal_compile: False
I0423 07:30:50.734010 137367741581120 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0423 07:30:50.734024 137367741581120 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0423 07:30:50.734040 137367741581120 pyconfig.py:471] Config param jax_debug_log_modules: 
I0423 07:30:50.734054 137367741581120 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0423 07:30:50.734070 137367741581120 pyconfig.py:471] Config param jax_profiler_port: 9999
I0423 07:30:50.734085 137367741581120 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0423 07:30:50.734110 137367741581120 pyconfig.py:471] Config param kv_cache_buffer: 256
I0423 07:30:50.734126 137367741581120 pyconfig.py:471] Config param kv_lora_rank: 512
I0423 07:30:50.734142 137367741581120 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0423 07:30:50.734159 137367741581120 pyconfig.py:471] Config param kv_quant_dtype: int8
I0423 07:30:50.734174 137367741581120 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0423 07:30:50.734189 137367741581120 pyconfig.py:471] Config param learning_rate: 0.0002
I0423 07:30:50.734205 137367741581120 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0423 07:30:50.734224 137367741581120 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0423 07:30:50.734240 137367741581120 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0423 07:30:50.734254 137367741581120 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0423 07:30:50.734269 137367741581120 pyconfig.py:471] Config param load_from_prefill_dir: False
I0423 07:30:50.734283 137367741581120 pyconfig.py:471] Config param load_full_state_path: 
I0423 07:30:50.734299 137367741581120 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 07:30:50.734313 137367741581120 pyconfig.py:471] Config param local_checkpoint_directory: 
I0423 07:30:50.734329 137367741581120 pyconfig.py:471] Config param local_checkpoint_period: 0
I0423 07:30:50.734343 137367741581120 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0423 07:30:50.734358 137367741581120 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0423 07:30:50.734373 137367741581120 pyconfig.py:471] Config param log_config: True
I0423 07:30:50.734388 137367741581120 pyconfig.py:471] Config param log_period: 10
I0423 07:30:50.734403 137367741581120 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), 
('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), 
('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0423 07:30:50.734481 137367741581120 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0423 07:30:50.734498 137367741581120 pyconfig.py:471] Config param logits_via_embedding: True
I0423 07:30:50.734515 137367741581120 pyconfig.py:471] Config param lora_input_adapters_path: 
I0423 07:30:50.734531 137367741581120 pyconfig.py:471] Config param loss_algo: grpo
I0423 07:30:50.734546 137367741581120 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0423 07:30:50.734564 137367741581120 pyconfig.py:471] Config param managed_mldiagnostics: False
I0423 07:30:50.734580 137367741581120 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/managed-mldiagnostics
I0423 07:30:50.734594 137367741581120 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0423 07:30:50.734609 137367741581120 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0423 07:30:50.734626 137367741581120 pyconfig.py:471] Config param max_checkify: False
I0423 07:30:50.734641 137367741581120 pyconfig.py:471] Config param max_concurrency: 256
I0423 07:30:50.734657 137367741581120 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0423 07:30:50.734671 137367741581120 pyconfig.py:471] Config param max_num_batched_tokens: None
I0423 07:30:50.734687 137367741581120 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0423 07:30:50.734702 137367741581120 pyconfig.py:471] Config param max_num_images_per_example: -1
I0423 07:30:50.734716 137367741581120 pyconfig.py:471] Config param max_num_seqs: None
I0423 07:30:50.734732 137367741581120 pyconfig.py:471] Config param max_position_embeddings: 163840
I0423 07:30:50.734748 137367741581120 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0423 07:30:50.734762 137367741581120 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0423 07:30:50.734777 137367741581120 pyconfig.py:471] Config param max_segments_per_seq: -1
I0423 07:30:50.734793 137367741581120 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0423 07:30:50.734807 137367741581120 pyconfig.py:471] Config param max_target_length: 2048
I0423 07:30:50.734822 137367741581120 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0423 07:30:50.734838 137367741581120 pyconfig.py:471] Config param megablox: True
I0423 07:30:50.734852 137367741581120 pyconfig.py:471] Config param merge_gating_gmm: False
I0423 07:30:50.734868 137367741581120 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0423 07:30:50.734884 137367741581120 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/metrics/
I0423 07:30:50.734899 137367741581120 pyconfig.py:471] Config param metrics_file: 
I0423 07:30:50.734915 137367741581120 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0423 07:30:50.734929 137367741581120 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0423 07:30:50.734945 137367741581120 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0423 07:30:50.734959 137367741581120 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0423 07:30:50.734975 137367741581120 pyconfig.py:471] Config param mla_naive_kvcache: True
I0423 07:30:50.734991 137367741581120 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0423 07:30:50.735005 137367741581120 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0423 07:30:50.735021 137367741581120 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0423 07:30:50.735036 137367741581120 pyconfig.py:471] Config param mlp_bias: False
I0423 07:30:50.735052 137367741581120 pyconfig.py:471] Config param mlp_dim: 64
I0423 07:30:50.735068 137367741581120 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0423 07:30:50.735082 137367741581120 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0423 07:30:50.735105 137367741581120 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0423 07:30:50.735121 137367741581120 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0423 07:30:50.735136 137367741581120 pyconfig.py:471] Config param moba: False
I0423 07:30:50.735150 137367741581120 pyconfig.py:471] Config param moba_chunk_size: 1024
I0423 07:30:50.735165 137367741581120 pyconfig.py:471] Config param moba_topk: 8
I0423 07:30:50.735180 137367741581120 pyconfig.py:471] Config param model_call_mode: 
I0423 07:30:50.735195 137367741581120 pyconfig.py:471] Config param model_name: gpt3-52k
I0423 07:30:50.735209 137367741581120 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0423 07:30:50.735228 137367741581120 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0423 07:30:50.735242 137367741581120 pyconfig.py:471] Config param moe_mlp_dim: -1
I0423 07:30:50.735257 137367741581120 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0423 07:30:50.735273 137367741581120 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0423 07:30:50.735289 137367741581120 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0423 07:30:50.735303 137367741581120 pyconfig.py:471] Config param monitor_goodput: False
I0423 07:30:50.735319 137367741581120 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0423 07:30:50.735333 137367741581120 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0423 07:30:50.735348 137367741581120 pyconfig.py:471] Config param mscale: 1.0
I0423 07:30:50.735363 137367741581120 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0423 07:30:50.735378 137367741581120 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0423 07:30:50.735393 137367741581120 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0423 07:30:50.735409 137367741581120 pyconfig.py:471] Config param mtp_num_layers: 0
I0423 07:30:50.735424 137367741581120 pyconfig.py:471] Config param mu_dtype: float32
I0423 07:30:50.735447 137367741581120 pyconfig.py:471] Config param multi_sampling: False
I0423 07:30:50.735462 137367741581120 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0423 07:30:50.735477 137367741581120 pyconfig.py:471] Config param muon_beta: 0.95
I0423 07:30:50.735491 137367741581120 pyconfig.py:471] Config param muon_consistent_rms: None
I0423 07:30:50.735507 137367741581120 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0423 07:30:50.735523 137367741581120 pyconfig.py:471] Config param n_routing_groups: -1
I0423 07:30:50.735538 137367741581120 pyconfig.py:471] Config param n_window_for_audio: 50
I0423 07:30:50.735553 137367741581120 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0423 07:30:50.735569 137367741581120 pyconfig.py:471] Config param nope_layer_interval: -1
I0423 07:30:50.735583 137367741581120 pyconfig.py:471] Config param norm_topk_prob: False
I0423 07:30:50.735599 137367741581120 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0423 07:30:50.735615 137367741581120 pyconfig.py:471] Config param normalize_embedding_logits: False
I0423 07:30:50.735630 137367741581120 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0423 07:30:50.735645 137367741581120 pyconfig.py:471] Config param num_batches: 4
I0423 07:30:50.735660 137367741581120 pyconfig.py:471] Config param num_channels_for_vit: 3
I0423 07:30:50.735675 137367741581120 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0423 07:30:50.735691 137367741581120 pyconfig.py:471] Config param num_decoder_layers: 1
I0423 07:30:50.735706 137367741581120 pyconfig.py:471] Config param num_diloco_replicas: 1
I0423 07:30:50.735721 137367741581120 pyconfig.py:471] Config param num_epoch: 1
I0423 07:30:50.735736 137367741581120 pyconfig.py:471] Config param num_eval_passes: 1
I0423 07:30:50.735751 137367741581120 pyconfig.py:471] Config param num_experts: 1
I0423 07:30:50.735765 137367741581120 pyconfig.py:471] Config param num_experts_per_tok: 1
I0423 07:30:50.735781 137367741581120 pyconfig.py:471] Config param num_generations: 2
I0423 07:30:50.735795 137367741581120 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0423 07:30:50.735810 137367741581120 pyconfig.py:471] Config param num_iterations: 1
I0423 07:30:50.735825 137367741581120 pyconfig.py:471] Config param num_kv_heads: 2
I0423 07:30:50.735839 137367741581120 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0423 07:30:50.735854 137367741581120 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0423 07:30:50.735868 137367741581120 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0423 07:30:50.735883 137367741581120 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0423 07:30:50.735898 137367741581120 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0423 07:30:50.735914 137367741581120 pyconfig.py:471] Config param num_query_heads: 2
I0423 07:30:50.735928 137367741581120 pyconfig.py:471] Config param num_samplers_slices: -1
I0423 07:30:50.735943 137367741581120 pyconfig.py:471] Config param num_slices: 1
I0423 07:30:50.735957 137367741581120 pyconfig.py:471] Config param num_target_devices: 32
I0423 07:30:50.735972 137367741581120 pyconfig.py:471] Config param num_test_batches: 5
I0423 07:30:50.735988 137367741581120 pyconfig.py:471] Config param num_trainer_slices: -1
I0423 07:30:50.736002 137367741581120 pyconfig.py:471] Config param num_vocab_tiling: 1
I0423 07:30:50.736017 137367741581120 pyconfig.py:471] Config param off_policy_steps: 0
I0423 07:30:50.736032 137367741581120 pyconfig.py:471] Config param offline_data_dir: None
I0423 07:30:50.736047 137367741581120 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0423 07:30:50.736064 137367741581120 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0423 07:30:50.736079 137367741581120 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0423 07:30:50.736104 137367741581120 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0423 07:30:50.736120 137367741581120 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0423 07:30:50.736134 137367741581120 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0423 07:30:50.736150 137367741581120 pyconfig.py:471] Config param output_dim_for_audio: 512
I0423 07:30:50.736164 137367741581120 pyconfig.py:471] Config param override_logical_axis_rules: False
I0423 07:30:50.736180 137367741581120 pyconfig.py:471] Config param override_model_config: True
I0423 07:30:50.736194 137367741581120 pyconfig.py:471] Config param packing: True
I0423 07:30:50.736209 137367741581120 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0423 07:30:50.736226 137367741581120 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0423 07:30:50.736242 137367741581120 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0423 07:30:50.736255 137367741581120 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0423 07:30:50.736271 137367741581120 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0423 07:30:50.736285 137367741581120 pyconfig.py:471] Config param param_scan_axis: 1
I0423 07:30:50.736307 137367741581120 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0423 07:30:50.736333 137367741581120 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0423 07:30:50.736359 137367741581120 pyconfig.py:471] Config param patch_size_for_vit: 14
I0423 07:30:50.736385 137367741581120 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0423 07:30:50.736409 137367741581120 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0423 07:30:50.736434 137367741581120 pyconfig.py:471] Config param per_device_batch_size: 2
I0423 07:30:50.736459 137367741581120 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0423 07:30:50.736483 137367741581120 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0423 07:30:50.736509 137367741581120 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0423 07:30:50.736535 137367741581120 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0423 07:30:50.736561 137367741581120 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0423 07:30:50.736586 137367741581120 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0423 07:30:50.736609 137367741581120 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0423 07:30:50.736636 137367741581120 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0423 07:30:50.736653 137367741581120 pyconfig.py:471] Config param position_id_per_seconds: 25
I0423 07:30:50.736670 137367741581120 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0423 07:30:50.736684 137367741581120 pyconfig.py:471] Config param prefill_cache_dir: 
I0423 07:30:50.736700 137367741581120 pyconfig.py:471] Config param prefill_chunk_size: 256
I0423 07:30:50.736715 137367741581120 pyconfig.py:471] Config param prefill_slice: v5e-16
I0423 07:30:50.736729 137367741581120 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0423 07:30:50.736745 137367741581120 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0423 07:30:50.736760 137367741581120 pyconfig.py:471] Config param prefuse_moe_weights: False
I0423 07:30:50.736774 137367741581120 pyconfig.py:471] Config param profile_cleanly: True
I0423 07:30:50.736789 137367741581120 pyconfig.py:471] Config param profile_periodically_period: -1
I0423 07:30:50.736805 137367741581120 pyconfig.py:471] Config param profile_power_events: False
I0423 07:30:50.736819 137367741581120 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0423 07:30:50.736837 137367741581120 pyconfig.py:471] Config param profiler_steps: 5
I0423 07:30:50.736853 137367741581120 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0423 07:30:50.736867 137367741581120 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0423 07:30:50.736882 137367741581120 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0423 07:30:50.736901 137367741581120 pyconfig.py:471] Config param prometheus_port: 0
I0423 07:30:50.736927 137367741581120 pyconfig.py:471] Config param prompt: I love to
I0423 07:30:50.736952 137367741581120 pyconfig.py:471] Config param pure_nnx: False
I0423 07:30:50.736979 137367741581120 pyconfig.py:471] Config param pure_nnx_decoder: False
I0423 07:30:50.737004 137367741581120 pyconfig.py:471] Config param q_lora_rank: 0
I0423 07:30:50.737029 137367741581120 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0423 07:30:50.737054 137367741581120 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0423 07:30:50.737079 137367741581120 pyconfig.py:471] Config param qk_norm_with_scale: True
I0423 07:30:50.737117 137367741581120 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0423 07:30:50.737145 137367741581120 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0423 07:30:50.737168 137367741581120 pyconfig.py:471] Config param quant_cfg_path: 
I0423 07:30:50.737184 137367741581120 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0423 07:30:50.737203 137367741581120 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0423 07:30:50.737219 137367741581120 pyconfig.py:471] Config param quantize_kvcache: False
I0423 07:30:50.737237 137367741581120 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0423 07:30:50.737254 137367741581120 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0423 07:30:50.737270 137367741581120 pyconfig.py:471] Config param ragged_block_size: 256
I0423 07:30:50.737284 137367741581120 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0423 07:30:50.737300 137367741581120 pyconfig.py:471] Config param rampup_end_step: 0
I0423 07:30:50.737315 137367741581120 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0423 07:30:50.737330 137367741581120 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0423 07:30:50.737346 137367741581120 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0423 07:30:50.737361 137367741581120 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0423 07:30:50.737378 137367741581120 pyconfig.py:471] Config param remat_policy: full
I0423 07:30:50.737403 137367741581120 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0423 07:30:50.737429 137367741581120 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0423 07:30:50.737452 137367741581120 pyconfig.py:471] Config param replicate_quant_scale: False
I0423 07:30:50.737469 137367741581120 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0423 07:30:50.737486 137367741581120 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0423 07:30:50.737502 137367741581120 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0423 07:30:50.737517 137367741581120 pyconfig.py:471] Config param reshape_q: False
I0423 07:30:50.737532 137367741581120 pyconfig.py:471] Config param return_log_prob: False
I0423 07:30:50.737546 137367741581120 pyconfig.py:471] Config param reuse_example_batch: 0
I0423 07:30:50.737562 137367741581120 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0423 07:30:50.737578 137367741581120 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0423 07:30:50.737594 137367741581120 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0423 07:30:50.737611 137367741581120 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0423 07:30:50.737626 137367741581120 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0423 07:30:50.737642 137367741581120 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0423 07:30:50.737656 137367741581120 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0423 07:30:50.737677 137367741581120 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0423 07:30:50.737693 137367741581120 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0423 07:30:50.737707 137367741581120 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0423 07:30:50.737722 137367741581120 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0423 07:30:50.737739 137367741581120 pyconfig.py:471] Config param rope_attention_scaling: False
I0423 07:30:50.737753 137367741581120 pyconfig.py:471] Config param rope_factor: 40
I0423 07:30:50.737771 137367741581120 pyconfig.py:471] Config param rope_interleave: True
I0423 07:30:50.737798 137367741581120 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0423 07:30:50.737827 137367741581120 pyconfig.py:471] Config param rope_max_timescale: 10000
I0423 07:30:50.737853 137367741581120 pyconfig.py:471] Config param rope_min_timescale: 1
I0423 07:30:50.737878 137367741581120 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0423 07:30:50.737903 137367741581120 pyconfig.py:471] Config param rope_truncate: True
I0423 07:30:50.737927 137367741581120 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0423 07:30:50.737954 137367741581120 pyconfig.py:471] Config param rope_use_scale: True
I0423 07:30:50.737978 137367741581120 pyconfig.py:471] Config param routed_bias: False
I0423 07:30:50.738003 137367741581120 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0423 07:30:50.738028 137367741581120 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0423 07:30:50.738053 137367741581120 pyconfig.py:471] Config param routed_score_func: 
I0423 07:30:50.738079 137367741581120 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-07-30
I0423 07:30:50.738107 137367741581120 pyconfig.py:471] Config param sa_block_kv: 512
I0423 07:30:50.738124 137367741581120 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0423 07:30:50.738138 137367741581120 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0423 07:30:50.738154 137367741581120 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0423 07:30:50.738169 137367741581120 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0423 07:30:50.738184 137367741581120 pyconfig.py:471] Config param sa_block_q: 512
I0423 07:30:50.738198 137367741581120 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0423 07:30:50.738214 137367741581120 pyconfig.py:471] Config param sa_block_q_dq: 512
I0423 07:30:50.738232 137367741581120 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0423 07:30:50.738247 137367741581120 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0423 07:30:50.738263 137367741581120 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0423 07:30:50.738279 137367741581120 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0423 07:30:50.738293 137367741581120 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0423 07:30:50.738309 137367741581120 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0423 07:30:50.738325 137367741581120 pyconfig.py:471] Config param save_config_to_gcs: False
I0423 07:30:50.738339 137367741581120 pyconfig.py:471] Config param save_quantized_params_path: 
I0423 07:30:50.738355 137367741581120 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0423 07:30:50.738369 137367741581120 pyconfig.py:471] Config param scan_layers: True
I0423 07:30:50.738388 137367741581120 pyconfig.py:471] Config param scan_layers_per_stage: False
I0423 07:30:50.738412 137367741581120 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0423 07:30:50.738439 137367741581120 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0423 07:30:50.738461 137367741581120 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0423 07:30:50.738475 137367741581120 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0423 07:30:50.738490 137367741581120 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0423 07:30:50.738505 137367741581120 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0423 07:30:50.738521 137367741581120 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0423 07:30:50.738537 137367741581120 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0423 07:30:50.738553 137367741581120 pyconfig.py:471] Config param sharding_strategy: None
I0423 07:30:50.738567 137367741581120 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0423 07:30:50.738584 137367741581120 pyconfig.py:471] Config param shardy: True
I0423 07:30:50.738598 137367741581120 pyconfig.py:471] Config param share_kv_projections: False
I0423 07:30:50.738614 137367741581120 pyconfig.py:471] Config param shared_experts: 0
I0423 07:30:50.738628 137367741581120 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0423 07:30:50.738643 137367741581120 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0423 07:30:50.738659 137367741581120 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0423 07:30:50.738673 137367741581120 pyconfig.py:471] Config param skip_step_interval: 128
I0423 07:30:50.738689 137367741581120 pyconfig.py:471] Config param skip_step_on_spikes: False
I0423 07:30:50.738703 137367741581120 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0423 07:30:50.738718 137367741581120 pyconfig.py:471] Config param sliding_window_size: 0
I0423 07:30:50.738733 137367741581120 pyconfig.py:471] Config param solution_end_token: </answer>
I0423 07:30:50.738748 137367741581120 pyconfig.py:471] Config param solution_start_token: <answer>
I0423 07:30:50.738763 137367741581120 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0423 07:30:50.738778 137367741581120 pyconfig.py:471] Config param sparse_matmul: True
I0423 07:30:50.738798 137367741581120 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0423 07:30:50.738823 137367741581120 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0423 07:30:50.738850 137367741581120 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0423 07:30:50.738875 137367741581120 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0423 07:30:50.738899 137367741581120 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0423 07:30:50.738924 137367741581120 pyconfig.py:471] Config param steps: 200000
I0423 07:30:50.738948 137367741581120 pyconfig.py:471] Config param stop_strings: None
I0423 07:30:50.738972 137367741581120 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0423 07:30:50.738999 137367741581120 pyconfig.py:471] Config param student_params_to_update: None
I0423 07:30:50.739023 137367741581120 pyconfig.py:471] Config param subslice_shape: 
I0423 07:30:50.739047 137367741581120 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0423 07:30:50.739073 137367741581120 pyconfig.py:471] Config param system_prompt: 
I0423 07:30:50.739110 137367741581120 pyconfig.py:471] Config param target_eval_loss: 0.0
I0423 07:30:50.739132 137367741581120 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0423 07:30:50.739149 137367741581120 pyconfig.py:471] Config param temperature_tuning: False
I0423 07:30:50.739168 137367741581120 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0423 07:30:50.739194 137367741581120 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-30/tensorboard/
I0423 07:30:50.739224 137367741581120 pyconfig.py:471] Config param tensors_on_device: None
I0423 07:30:50.739244 137367741581120 pyconfig.py:471] Config param tensors_to_offload: None
I0423 07:30:50.739260 137367741581120 pyconfig.py:471] Config param test_batch_start_index: 0
I0423 07:30:50.739274 137367741581120 pyconfig.py:471] Config param tile_size_for_vit: 336
I0423 07:30:50.739289 137367741581120 pyconfig.py:471] Config param tokenize_eval_data: True
I0423 07:30:50.739305 137367741581120 pyconfig.py:471] Config param tokenize_train_data: True
I0423 07:30:50.739320 137367741581120 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0423 07:30:50.739335 137367741581120 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0423 07:30:50.739353 137367741581120 pyconfig.py:471] Config param topk_routing_group: -1
I0423 07:30:50.739369 137367741581120 pyconfig.py:471] Config param train_data_columns: ['text']
I0423 07:30:50.739383 137367741581120 pyconfig.py:471] Config param train_fraction: 1.0
I0423 07:30:50.739399 137367741581120 pyconfig.py:471] Config param train_image_column: image
I0423 07:30:50.739413 137367741581120 pyconfig.py:471] Config param train_micro_batch_size: -1
I0423 07:30:50.739429 137367741581120 pyconfig.py:471] Config param train_split: train
I0423 07:30:50.739445 137367741581120 pyconfig.py:471] Config param trainable_parameters_mask: []
I0423 07:30:50.739460 137367741581120 pyconfig.py:471] Config param trainable_position_size: 2048
I0423 07:30:50.739476 137367741581120 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0423 07:30:50.739491 137367741581120 pyconfig.py:471] Config param upload_all_profiler_results: False
I0423 07:30:50.739506 137367741581120 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0423 07:30:50.739522 137367741581120 pyconfig.py:471] Config param use_agentic_rollout: False
I0423 07:30:50.739537 137367741581120 pyconfig.py:471] Config param use_audio: False
I0423 07:30:50.739552 137367741581120 pyconfig.py:471] Config param use_audio_in_video: False
I0423 07:30:50.739568 137367741581120 pyconfig.py:471] Config param use_batch_split_schedule: False
I0423 07:30:50.739592 137367741581120 pyconfig.py:471] Config param use_chat_template: False
I0423 07:30:50.739619 137367741581120 pyconfig.py:471] Config param use_chunked_prefill: False
I0423 07:30:50.739646 137367741581120 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0423 07:30:50.739671 137367741581120 pyconfig.py:471] Config param use_dpo: False
I0423 07:30:50.739696 137367741581120 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0423 07:30:50.739719 137367741581120 pyconfig.py:471] Config param use_grpo: True
I0423 07:30:50.739743 137367741581120 pyconfig.py:471] Config param use_indexer: False
I0423 07:30:50.739767 137367741581120 pyconfig.py:471] Config param use_iota_embed: True
I0423 07:30:50.739791 137367741581120 pyconfig.py:471] Config param use_jax_splash: False
I0423 07:30:50.739816 137367741581120 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0423 07:30:50.739841 137367741581120 pyconfig.py:471] Config param use_mrope: False
I0423 07:30:50.739864 137367741581120 pyconfig.py:471] Config param use_multimodal: False
I0423 07:30:50.739879 137367741581120 pyconfig.py:471] Config param use_pathways: True
I0423 07:30:50.739895 137367741581120 pyconfig.py:471] Config param use_post_attn_norm: False
I0423 07:30:50.739908 137367741581120 pyconfig.py:471] Config param use_post_ffw_norm: False
I0423 07:30:50.739924 137367741581120 pyconfig.py:471] Config param use_qk_clip: False
I0423 07:30:50.739938 137367741581120 pyconfig.py:471] Config param use_qk_norm: False
I0423 07:30:50.739954 137367741581120 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0423 07:30:50.739974 137367741581120 pyconfig.py:471] Config param use_qwix_quantization: False
I0423 07:30:50.740000 137367741581120 pyconfig.py:471] Config param use_ragged_attention: False
I0423 07:30:50.740024 137367741581120 pyconfig.py:471] Config param use_random_routing: False
I0423 07:30:50.740043 137367741581120 pyconfig.py:471] Config param use_replicator_service: False
I0423 07:30:50.740057 137367741581120 pyconfig.py:471] Config param use_ring_of_experts: False
I0423 07:30:50.740073 137367741581120 pyconfig.py:471] Config param use_sft: False
I0423 07:30:50.740087 137367741581120 pyconfig.py:471] Config param use_splash_scheduler: False
I0423 07:30:50.740113 137367741581120 pyconfig.py:471] Config param use_tokamax_gmm: False
I0423 07:30:50.740127 137367741581120 pyconfig.py:471] Config param use_tokamax_splash: False
I0423 07:30:50.740142 137367741581120 pyconfig.py:471] Config param use_truncation: True
I0423 07:30:50.740156 137367741581120 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0423 07:30:50.740171 137367741581120 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0423 07:30:50.740185 137367741581120 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0423 07:30:50.740200 137367741581120 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0423 07:30:50.740214 137367741581120 pyconfig.py:471] Config param v_head_dim: 128
I0423 07:30:50.740233 137367741581120 pyconfig.py:471] Config param v_norm_with_scale: True
I0423 07:30:50.740247 137367741581120 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0423 07:30:50.740264 137367741581120 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0423 07:30:50.740278 137367741581120 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0423 07:30:50.740293 137367741581120 pyconfig.py:471] Config param video_path: 
I0423 07:30:50.740307 137367741581120 pyconfig.py:471] Config param video_placeholder: <|video|>
I0423 07:30:50.740322 137367741581120 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0423 07:30:50.740336 137367741581120 pyconfig.py:471] Config param vision_output_length: -1
I0423 07:30:50.740360 137367741581120 pyconfig.py:471] Config param vllm_additional_config: {}
I0423 07:30:50.740387 137367741581120 pyconfig.py:471] Config param vllm_hf_config_path: 
I0423 07:30:50.740414 137367741581120 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0423 07:30:50.740440 137367741581120 pyconfig.py:471] Config param vocab_size: 32000
I0423 07:30:50.740465 137367741581120 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0423 07:30:50.740490 137367741581120 pyconfig.py:471] Config param weight_dtype: float32
I0423 07:30:50.740531 137367741581120 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0423 07:30:50.740556 137367741581120 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0423 07:30:50.740580 137367741581120 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0423 07:30:50.740607 137367741581120 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0423 07:30:50.740625 137367741581120 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0423 07:30:50.740640 137367741581120 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0423 07:30:50.740655 137367741581120 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0423 07:30:50.740670 137367741581120 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0423 07:30:50.740686 137367741581120 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0423 07:30:50.740711 137367741581120 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0423 07:30:50.740738 137367741581120 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0423 07:30:50.740762 137367741581120 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0423 07:30:50.740777 137367741581120 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0423 07:30:50.740793 137367741581120 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0423 07:30:50.740808 137367741581120 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0423 07:30:50.740822 137367741581120 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0423 07:30:50.740837 137367741581120 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0423 07:30:50.740853 137367741581120 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0423 07:30:50.740867 137367741581120 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0423 07:30:50.740882 137367741581120 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0423 07:30:50.740897 137367741581120 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0423 07:30:50.740915 137367741581120 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0423 07:30:50.740929 137367741581120 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0423 07:30:50.740945 137367741581120 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0423 07:30:50.740958 137367741581120 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0423 07:30:50.740976 137367741581120 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0423 07:30:50.741448 137367741581120 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0423 07:30:50.741499 137367741581120 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0423 07:30:50.926573 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:30:51.038172 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:30:51.149415 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:30:51.272727 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:30:51.386734 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0423 07:30:51.493147 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0423 07:30:51.600013 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0423 07:30:51.711998 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0423 07:30:52.500526 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0423 07:30:52.628233 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0423 07:30:52.835793 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0423 07:30:52.946760 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0423 07:30:53.055449 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0423 07:30:53.169205 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0423 07:30:53.259505 137367741581120 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0423 07:30:53.266740 137367741581120 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 07:30:53.266880 137367741581120 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0423 07:30:53.266975 137367741581120 train_distill.py:586] Loading Student from ...
I0423 07:30:53.267022 137367741581120 train_distill.py:170] --- Student Configuration ---
I0423 07:30:53.267062 137367741581120 train_distill.py:171]   Model Name:      gpt3-52k
I0423 07:30:53.267111 137367741581120 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 07:30:53.267146 137367741581120 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0423 07:30:53.267178 137367741581120 train_distill.py:176]   Vocab Size:      32000
I0423 07:30:53.267207 137367741581120 train_distill.py:177]   Checkpoint:      
I0423 07:30:53.267244 137367741581120 train_distill.py:451] Initializing model: gpt3-52k...
I0423 07:30:54.656754 137367741581120 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0423 07:30:54.656865 137367741581120 train_distill.py:170] --- Teacher Configuration ---
I0423 07:30:54.656894 137367741581120 train_distill.py:171]   Model Name:      gpt3-52k
I0423 07:30:54.656919 137367741581120 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 07:30:54.656940 137367741581120 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0423 07:30:54.656960 137367741581120 train_distill.py:176]   Vocab Size:      32000
I0423 07:30:54.656978 137367741581120 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 07:30:54.656996 137367741581120 train_distill.py:451] Initializing model: gpt3-52k...
I0423 07:30:55.659353 137367741581120 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:30:55.659516 137367741581120 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ceeb350df40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:30:55.659578 137367741581120 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0423 07:30:56.199372 137367741581120 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0423 07:30:56.765111    1967 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0423 07:30:58.047269 137367741581120 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0423 07:31:00.384142 137367741581120 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0423 07:31:00.384500 137367741581120 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0423 07:31:02.940896 137367741581120 checkpointer.py:318] Finished restoring checkpoint in 5.27 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0423 07:31:03.678789 137367741581120 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0423 07:31:03.742671 137367741581120 config.py:112] TensorFlow version 2.20.0 available.
I0423 07:31:03.743195 137367741581120 config.py:125] JAX version 0.9.2 available.
I0423 07:31:04.185069 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0423 07:31:04.193977 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0423 07:31:04.203513 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0423 07:31:04.305877 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0423 07:31:04.627338 137367741581120 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0423 07:31:04.857893 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0423 07:31:04.960238 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0423 07:31:05.134482 137367741581120 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0423 07:31:05.247736 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0423 07:31:05.356936 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0423 07:31:05.492459 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0423 07:31:05.660361 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:31:05.779233 137367741581120 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:31:05.887618 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0423 07:31:05.992044 137367741581120 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0423 07:31:06.082575 137367741581120 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0423 07:31:06.082786 137367741581120 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0423 07:31:06.085791 137367741581120 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0423 07:31:06.085851 137367741581120 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0423 07:31:06.085915 137367741581120 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:31:06.085993 137367741581120 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ceeb350df40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:31:06.086035 137367741581120 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:31:06.086068 137367741581120 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ceeb350df40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:31:06.086125 137367741581120 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd1f829c5f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd39afa7950>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd398073440>}, handler_registry=None
I0423 07:31:06.086321 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd1f829c5f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:31:06.086363 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd39afa7950>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:31:06.086388 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd398073440>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:31:06.086411 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd1f8306150>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:31:06.086437 137367741581120 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd1f829c5f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd1f829c5f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd39afa7950>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd39afa7950>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd398073440>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd398073440>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd1f8306150>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd1f8306150>}).
I0423 07:31:06.086802 137367741581120 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ccdd02b85e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 07:31:08.203563 137367741581120 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_linen_xpk_main_20260423_071551_07_distill_smoke/checkpoints
I0423 07:31:08.228477 137367741581120 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_linen_xpk_main_20260423_071551_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7cd398073410>
I0423 07:31:08.228600 137367741581120 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:31:08.228680 137367741581120 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ceeb350df40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:31:08.228734 137367741581120 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:31:08.228783 137367741581120 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ceeb350df40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:31:08.228825 137367741581120 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 07:31:08.228878 137367741581120 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137367741581120 count=1 at 0x7cd39af19680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cd398073230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cd398073200>, _write_futures=[])
I0423 07:31:08.229286 137367741581120 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137367741581120 count=1 at 0x7cd39af19680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cd398073230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cd398073200>, _write_futures=[])
I0423 07:31:08.229319 137367741581120 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137367741581120 count=1 at 0x7cd39af19680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cd398073230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cd398073200>, _write_futures=[])
I0423 07:31:08.229359 137367741581120 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd3980733e0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd398072540>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd3980719a0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cd751df40e0>}, handler_registry=None
I0423 07:31:08.229478 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd3980733e0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:31:08.229516 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd398072540>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:31:08.229541 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd3980719a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:31:08.229570 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cd751df40e0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0423 07:31:08.229593 137367741581120 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cceac046330>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:31:08.229617 137367741581120 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd3980733e0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd3980733e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd398072540>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cd398072540>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd3980719a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cd3980719a0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cd751df40e0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cd751df40e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cceac046330>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): 
<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cceac046330>}).
I0423 07:31:08.229689 137367741581120 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ccdd02b8720> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 07:31:08.650950 137367741581120 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_linen_xpk_main_20260423_071551_07_distill_smoke/checkpoints
I0423 07:31:09.076161 137367741581120 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_linen_xpk_main_20260423_071551_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7cd398072c00>
I0423 07:31:09.076744 137367741581120 train_distill.py:677] Starting Distillation Training...
I0423 07:31:09.076858 137367741581120 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0423 07:31:09.534421 137367741581120 peft_trainer.py:594] Compiled train_step cache size: 0
I0423 07:31:09.536238 137212704311040 grain_pool.py:367] Grain pool will use 1 processes.
I0423 07:31:09.593679 137212704311040 grain_pool.py:440] Grain pool will start child processes.
I0423 07:31:09.599041 137212704311040 grain_pool.py:448] Grain pool started all child processes.
2026-04-23 07:31:16.089071: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), 
core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0)}
I0423 07:31:20.242706 137212704311040 grain_pool.py:542] Grain pool is exiting.
I0423 07:31:20.242809 137212704311040 grain_pool.py:547] Shutting down multiprocessing system.
I0423 07:31:21.935147 137212704311040 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Thu Apr 23 07:31:30 UTC 2026
EXIT_CODE=1
NNX  ·  586e69205  ·  main_20260423_071551  ·  full log
XPK Start: Thu Apr 23 07:40:48 UTC 2026
2026-04-23 07:41:06.239903: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0423 07:41:12.573308 134431136712512 max_utils.py:273] Attempting to initialize the jax distributed system...
I0423 07:41:21.613436 134431136712512 distributed.py:149] Starting JAX distributed service on [::]:8482
I0423 07:41:21.615808 134431136712512 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-zg9kk-slice-job-0-0.mt-07-distill-smoke-zg9kk:8482
I0423 07:41:22.722996 134431136712512 max_utils.py:284] Jax distributed system initialized!
I0423 07:41:27.945087 134431136712512 max_utils.py:244] Jax distributed system is already initialized.
W0423 07:41:28.076818 134431136712512 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0423 07:41:28.496958 134431136712512 max_utils.py:244] Jax distributed system is already initialized.
I0423 07:41:28.498208 134431136712512 pyconfig.py:471] Config param abort_on_inf_loss: True
I0423 07:41:28.498257 134431136712512 pyconfig.py:471] Config param abort_on_nan_loss: True
I0423 07:41:28.498288 134431136712512 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0423 07:41:28.498310 134431136712512 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0423 07:41:28.498331 134431136712512 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0423 07:41:28.498352 134431136712512 pyconfig.py:471] Config param activations_in_float32: False
I0423 07:41:28.498371 134431136712512 pyconfig.py:471] Config param adam_b1: 0.9
I0423 07:41:28.498389 134431136712512 pyconfig.py:471] Config param adam_b2: 0.95
I0423 07:41:28.498405 134431136712512 pyconfig.py:471] Config param adam_eps: 1e-08
I0423 07:41:28.498427 134431136712512 pyconfig.py:471] Config param adam_eps_root: 0.0
I0423 07:41:28.498442 134431136712512 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0423 07:41:28.498459 134431136712512 pyconfig.py:471] Config param adamw_mask: []
I0423 07:41:28.498474 134431136712512 pyconfig.py:471] Config param add_bos: True
I0423 07:41:28.498490 134431136712512 pyconfig.py:471] Config param add_eos: True
I0423 07:41:28.498505 134431136712512 pyconfig.py:471] Config param allow_split_physical_axes: False
I0423 07:41:28.498524 134431136712512 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0423 07:41:28.498542 134431136712512 pyconfig.py:471] Config param async_checkpointing: True
I0423 07:41:28.498558 134431136712512 pyconfig.py:471] Config param async_scheduling: False
I0423 07:41:28.498572 134431136712512 pyconfig.py:471] Config param attention: dot_product
I0423 07:41:28.498589 134431136712512 pyconfig.py:471] Config param attention_bias: False
I0423 07:41:28.498604 134431136712512 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0423 07:41:28.498620 134431136712512 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0423 07:41:28.498641 134431136712512 pyconfig.py:471] Config param attention_output_dim: -1
I0423 07:41:28.498657 134431136712512 pyconfig.py:471] Config param attention_sink: False
I0423 07:41:28.498673 134431136712512 pyconfig.py:471] Config param attention_type: global
I0423 07:41:28.498688 134431136712512 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0423 07:41:28.498704 134431136712512 pyconfig.py:471] Config param audio_path: 
I0423 07:41:28.498718 134431136712512 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0423 07:41:28.498733 134431136712512 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0423 07:41:28.498749 134431136712512 pyconfig.py:471] Config param base_config: base.yml
I0423 07:41:28.498764 134431136712512 pyconfig.py:471] Config param base_emb_dim: 16
I0423 07:41:28.498780 134431136712512 pyconfig.py:471] Config param base_mlp_dim: 64
I0423 07:41:28.498795 134431136712512 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0423 07:41:28.498810 134431136712512 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0423 07:41:28.498826 134431136712512 pyconfig.py:471] Config param base_num_kv_heads: 2
I0423 07:41:28.498841 134431136712512 pyconfig.py:471] Config param base_num_query_heads: 2
I0423 07:41:28.498857 134431136712512 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0423 07:41:28.498872 134431136712512 pyconfig.py:471] Config param batch_size: 1
I0423 07:41:28.498888 134431136712512 pyconfig.py:471] Config param batch_split_factor: 1
I0423 07:41:28.498903 134431136712512 pyconfig.py:471] Config param beta_fast: 32
I0423 07:41:28.498919 134431136712512 pyconfig.py:471] Config param beta_slow: 1
I0423 07:41:28.498934 134431136712512 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0423 07:41:28.498950 134431136712512 pyconfig.py:471] Config param capacity_factor: -1.0
I0423 07:41:28.498966 134431136712512 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0423 07:41:28.498981 134431136712512 pyconfig.py:471] Config param chat_template: 
I0423 07:41:28.498996 134431136712512 pyconfig.py:471] Config param chat_template_path: 
I0423 07:41:28.499012 134431136712512 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0423 07:41:28.499030 134431136712512 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/checkpoints/
I0423 07:41:28.499045 134431136712512 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0423 07:41:28.499092 134431136712512 pyconfig.py:471] Config param checkpoint_period: 2000
I0423 07:41:28.499147 134431136712512 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0423 07:41:28.499176 134431136712512 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0423 07:41:28.499195 134431136712512 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0423 07:41:28.499210 134431136712512 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0423 07:41:28.499226 134431136712512 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0423 07:41:28.499240 134431136712512 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0423 07:41:28.499256 134431136712512 pyconfig.py:471] Config param chips_per_vm: 4
I0423 07:41:28.499274 134431136712512 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0423 07:41:28.499290 134431136712512 pyconfig.py:471] Config param collect_stack_trace: False
I0423 07:41:28.499304 134431136712512 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0423 07:41:28.499319 134431136712512 pyconfig.py:471] Config param colocated_python_data_input: False
I0423 07:41:28.499334 134431136712512 pyconfig.py:471] Config param compile_topology: 
I0423 07:41:28.499349 134431136712512 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0423 07:41:28.499365 134431136712512 pyconfig.py:471] Config param compile_xla_flags: 
I0423 07:41:28.499381 134431136712512 pyconfig.py:471] Config param compiled_trainstep_file: 
I0423 07:41:28.499395 134431136712512 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0423 07:41:28.499411 134431136712512 pyconfig.py:471] Config param constant_bound_config: []
I0423 07:41:28.499426 134431136712512 pyconfig.py:471] Config param context: RematLocation.REMAT
I0423 07:41:28.499442 134431136712512 pyconfig.py:471] Config param context_parallel_load_balance: True
I0423 07:41:28.499457 134431136712512 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0423 07:41:28.499474 134431136712512 pyconfig.py:471] Config param context_parallel_size: 1
I0423 07:41:28.499488 134431136712512 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0423 07:41:28.499515 134431136712512 pyconfig.py:471] Config param context_sharding: context
I0423 07:41:28.499541 134431136712512 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0423 07:41:28.499564 134431136712512 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0423 07:41:28.499622 134431136712512 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0423 07:41:28.499646 134431136712512 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0423 07:41:28.499664 134431136712512 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0423 07:41:28.499680 134431136712512 pyconfig.py:471] Config param custom_mesh: 
I0423 07:41:28.499695 134431136712512 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0423 07:41:28.499710 134431136712512 pyconfig.py:471] Config param d_model_for_audio: 256
I0423 07:41:28.499725 134431136712512 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0423 07:41:28.499744 134431136712512 pyconfig.py:471] Config param data_shuffle_seed: 0
I0423 07:41:28.499759 134431136712512 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0423 07:41:28.499775 134431136712512 pyconfig.py:471] Config param dataset_path: 
I0423 07:41:28.499789 134431136712512 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0423 07:41:28.499807 134431136712512 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0423 07:41:28.499822 134431136712512 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0423 07:41:28.499838 134431136712512 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0423 07:41:28.499852 134431136712512 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0423 07:41:28.499867 134431136712512 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0423 07:41:28.499882 134431136712512 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0423 07:41:28.499897 134431136712512 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0423 07:41:28.499911 134431136712512 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0423 07:41:28.499927 134431136712512 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 07:41:28.499944 134431136712512 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0423 07:41:28.499958 134431136712512 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0423 07:41:28.499973 134431136712512 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0423 07:41:28.499989 134431136712512 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0423 07:41:28.500003 134431136712512 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0423 07:41:28.500018 134431136712512 pyconfig.py:471] Config param debug: {'rl': False}
I0423 07:41:28.500035 134431136712512 pyconfig.py:471] Config param debug_sharding: False
I0423 07:41:28.500050 134431136712512 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0423 07:41:28.500064 134431136712512 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0423 07:41:28.500082 134431136712512 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0423 07:41:28.500106 134431136712512 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0423 07:41:28.500122 134431136712512 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0423 07:41:28.500140 134431136712512 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0423 07:41:28.500156 134431136712512 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0423 07:41:28.500172 134431136712512 pyconfig.py:471] Config param degenerate_group_masking: True
I0423 07:41:28.500187 134431136712512 pyconfig.py:471] Config param dense_init_scale: 1.0
I0423 07:41:28.500203 134431136712512 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0423 07:41:28.500218 134431136712512 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0423 07:41:28.500234 134431136712512 pyconfig.py:471] Config param diloco_sync_period: 36
I0423 07:41:28.500250 134431136712512 pyconfig.py:471] Config param distill_alpha: 0.5
I0423 07:41:28.500265 134431136712512 pyconfig.py:471] Config param distill_alpha_end: None
I0423 07:41:28.500284 134431136712512 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0423 07:41:28.500300 134431136712512 pyconfig.py:471] Config param distill_beta: 0.0
I0423 07:41:28.500315 134431136712512 pyconfig.py:471] Config param distill_beta_end: None
I0423 07:41:28.500331 134431136712512 pyconfig.py:471] Config param distill_beta_schedule: constant
I0423 07:41:28.500346 134431136712512 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0423 07:41:28.500361 134431136712512 pyconfig.py:471] Config param distill_layer_indices: None
I0423 07:41:28.500377 134431136712512 pyconfig.py:471] Config param distill_temperature: 1.0
I0423 07:41:28.500391 134431136712512 pyconfig.py:471] Config param distill_temperature_end: None
I0423 07:41:28.500407 134431136712512 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0423 07:41:28.500427 134431136712512 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0423 07:41:28.500453 134431136712512 pyconfig.py:471] Config param dpo_beta: 0.1
I0423 07:41:28.500478 134431136712512 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0423 07:41:28.500500 134431136712512 pyconfig.py:471] Config param dq_reduction_steps: 0
I0423 07:41:28.500516 134431136712512 pyconfig.py:471] Config param dropout_rate: 0.0
I0423 07:41:28.500530 134431136712512 pyconfig.py:471] Config param dtype: bfloat16
I0423 07:41:28.500561 134431136712512 pyconfig.py:471] Config param dtype_mm: float32
I0423 07:41:28.500577 134431136712512 pyconfig.py:471] Config param dump_hlo: False
I0423 07:41:28.500592 134431136712512 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0423 07:41:28.500607 134431136712512 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/xla_dump
I0423 07:41:28.500622 134431136712512 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0423 07:41:28.500637 134431136712512 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0423 07:41:28.500651 134431136712512 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0423 07:41:28.500667 134431136712512 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0423 07:41:28.500681 134431136712512 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0423 07:41:28.500697 134431136712512 pyconfig.py:471] Config param dump_jaxpr: False
I0423 07:41:28.500712 134431136712512 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0423 07:41:28.500727 134431136712512 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/jaxpr_dump
I0423 07:41:28.500742 134431136712512 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0423 07:41:28.500757 134431136712512 pyconfig.py:471] Config param dump_step: -1
I0423 07:41:28.500772 134431136712512 pyconfig.py:471] Config param elastic_enabled: False
I0423 07:41:28.500787 134431136712512 pyconfig.py:471] Config param elastic_max_retries: 10
I0423 07:41:28.500803 134431136712512 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0423 07:41:28.500818 134431136712512 pyconfig.py:471] Config param emb_dim: 16
I0423 07:41:28.500833 134431136712512 pyconfig.py:471] Config param enable_autocheckpoint: False
I0423 07:41:28.500847 134431136712512 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0423 07:41:28.500869 134431136712512 pyconfig.py:471] Config param enable_checkpointing: True
I0423 07:41:28.500893 134431136712512 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0423 07:41:28.500914 134431136712512 pyconfig.py:471] Config param enable_data_shuffling: True
I0423 07:41:28.500929 134431136712512 pyconfig.py:471] Config param enable_diloco: False
I0423 07:41:28.500945 134431136712512 pyconfig.py:471] Config param enable_dp_attention: False
I0423 07:41:28.500968 134431136712512 pyconfig.py:471] Config param enable_dropout: False
I0423 07:41:28.500993 134431136712512 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0423 07:41:28.501008 134431136712512 pyconfig.py:471] Config param enable_expert_parallel: False
I0423 07:41:28.501041 134431136712512 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0423 07:41:28.501056 134431136712512 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0423 07:41:28.501072 134431136712512 pyconfig.py:471] Config param enable_goodput_recording: False
I0423 07:41:28.501087 134431136712512 pyconfig.py:471] Config param enable_jax_profiler: False
I0423 07:41:28.501116 134431136712512 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0423 07:41:28.501132 134431136712512 pyconfig.py:471] Config param enable_model_warmup: False
I0423 07:41:28.501148 134431136712512 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0423 07:41:28.501162 134431136712512 pyconfig.py:471] Config param enable_nnx: False
I0423 07:41:28.501177 134431136712512 pyconfig.py:471] Config param enable_orbax_v1: False
I0423 07:41:28.501193 134431136712512 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0423 07:41:28.501207 134431136712512 pyconfig.py:471] Config param enable_pathways_goodput: False
I0423 07:41:28.501222 134431136712512 pyconfig.py:471] Config param enable_prefix_caching: False
I0423 07:41:28.501236 134431136712512 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0423 07:41:28.501252 134431136712512 pyconfig.py:471] Config param enable_single_controller: False
I0423 07:41:28.501266 134431136712512 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0423 07:41:28.501286 134431136712512 pyconfig.py:471] Config param enable_tensorboard: True
I0423 07:41:28.501300 134431136712512 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0423 07:41:28.501315 134431136712512 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0423 07:41:28.501331 134431136712512 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0423 07:41:28.501346 134431136712512 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0423 07:41:28.501362 134431136712512 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0423 07:41:28.501377 134431136712512 pyconfig.py:471] Config param engram_head_dim: 1280
I0423 07:41:28.501393 134431136712512 pyconfig.py:471] Config param engram_kernel_size: 4
I0423 07:41:28.501407 134431136712512 pyconfig.py:471] Config param engram_layers: []
I0423 07:41:28.501423 134431136712512 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0423 07:41:28.501438 134431136712512 pyconfig.py:471] Config param engram_num_heads: 8
I0423 07:41:28.501454 134431136712512 pyconfig.py:471] Config param engram_seed: 0
I0423 07:41:28.501468 134431136712512 pyconfig.py:471] Config param engram_vocab_bases: []
I0423 07:41:28.501482 134431136712512 pyconfig.py:471] Config param epsilon_high: None
I0423 07:41:28.501498 134431136712512 pyconfig.py:471] Config param eval_corr_lst: False
I0423 07:41:28.501512 134431136712512 pyconfig.py:471] Config param eval_data_columns: ['text']
I0423 07:41:28.501530 134431136712512 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0423 07:41:28.501544 134431136712512 pyconfig.py:471] Config param eval_image_column: image
I0423 07:41:28.501560 134431136712512 pyconfig.py:471] Config param eval_interval: -1
I0423 07:41:28.501574 134431136712512 pyconfig.py:471] Config param eval_make_lst: False
I0423 07:41:28.501590 134431136712512 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0423 07:41:28.501604 134431136712512 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0423 07:41:28.501620 134431136712512 pyconfig.py:471] Config param eval_split: validation
I0423 07:41:28.501633 134431136712512 pyconfig.py:471] Config param eval_steps: -1
I0423 07:41:28.501648 134431136712512 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0423 07:41:28.501663 134431136712512 pyconfig.py:471] Config param final_logits_soft_cap: None
I0423 07:41:28.501679 134431136712512 pyconfig.py:471] Config param first_num_dense_layers: 0
I0423 07:41:28.501693 134431136712512 pyconfig.py:471] Config param float32_gate_logits: False
I0423 07:41:28.501709 134431136712512 pyconfig.py:471] Config param float32_logits: False
I0423 07:41:28.501723 134431136712512 pyconfig.py:471] Config param float32_qk_product: False
I0423 07:41:28.501739 134431136712512 pyconfig.py:471] Config param float32_weight_sum: True
I0423 07:41:28.501753 134431136712512 pyconfig.py:471] Config param force_q_layout: False
I0423 07:41:28.501768 134431136712512 pyconfig.py:471] Config param force_unroll: False
I0423 07:41:28.501782 134431136712512 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0423 07:41:28.501797 134431136712512 pyconfig.py:471] Config param formatting_func_path: 
I0423 07:41:28.501813 134431136712512 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0423 07:41:28.501829 134431136712512 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0423 07:41:28.501844 134431136712512 pyconfig.py:471] Config param fused_mlp: False
I0423 07:41:28.501859 134431136712512 pyconfig.py:471] Config param fused_qkv: True
I0423 07:41:28.501873 134431136712512 pyconfig.py:471] Config param gcs_metrics: False
I0423 07:41:28.501888 134431136712512 pyconfig.py:471] Config param gdn_chunk_size: 64
I0423 07:41:28.501902 134431136712512 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0423 07:41:28.501918 134431136712512 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0423 07:41:28.501932 134431136712512 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0423 07:41:28.501947 134431136712512 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0423 07:41:28.501961 134431136712512 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0423 07:41:28.501977 134431136712512 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0423 07:41:28.501991 134431136712512 pyconfig.py:471] Config param generate_padding_batch_train: False
I0423 07:41:28.502007 134431136712512 pyconfig.py:471] Config param generate_slice: v5e-16
I0423 07:41:28.502022 134431136712512 pyconfig.py:471] Config param generation_configs: {}
I0423 07:41:28.502037 134431136712512 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0423 07:41:28.502051 134431136712512 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0423 07:41:28.502066 134431136712512 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0423 07:41:28.502080 134431136712512 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0423 07:41:28.502104 134431136712512 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0423 07:41:28.502120 134431136712512 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0423 07:41:28.502134 134431136712512 pyconfig.py:471] Config param global_head_dim: 0
I0423 07:41:28.502149 134431136712512 pyconfig.py:471] Config param global_num_kv_heads: 0
I0423 07:41:28.502173 134431136712512 pyconfig.py:471] Config param global_parameter_scale: 1
I0423 07:41:28.502198 134431136712512 pyconfig.py:471] Config param global_rampup_samples: 500
I0423 07:41:28.502221 134431136712512 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0423 07:41:28.502247 134431136712512 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0423 07:41:28.502280 134431136712512 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0423 07:41:28.502303 134431136712512 pyconfig.py:471] Config param grad_dtype: float32
I0423 07:41:28.502351 134431136712512 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0423 07:41:28.502374 134431136712512 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0423 07:41:28.502397 134431136712512 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0423 07:41:28.502421 134431136712512 pyconfig.py:471] Config param grain_eval_files: 
I0423 07:41:28.502443 134431136712512 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0423 07:41:28.502464 134431136712512 pyconfig.py:471] Config param grain_num_threads: 16
I0423 07:41:28.502486 134431136712512 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0423 07:41:28.502507 134431136712512 pyconfig.py:471] Config param grain_packing_type: first_fit
I0423 07:41:28.502530 134431136712512 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0423 07:41:28.502551 134431136712512 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0423 07:41:28.502573 134431136712512 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0423 07:41:28.502595 134431136712512 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0423 07:41:28.502617 134431136712512 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0423 07:41:28.502639 134431136712512 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0423 07:41:28.502661 134431136712512 pyconfig.py:471] Config param grain_train_files: 
I0423 07:41:28.502682 134431136712512 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0423 07:41:28.502703 134431136712512 pyconfig.py:471] Config param grain_worker_count: 1
I0423 07:41:28.502724 134431136712512 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0423 07:41:28.502745 134431136712512 pyconfig.py:471] Config param grpo_beta: 0.08
I0423 07:41:28.502767 134431136712512 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0423 07:41:28.502789 134431136712512 pyconfig.py:471] Config param hardware: tpu
I0423 07:41:28.502811 134431136712512 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0423 07:41:28.502832 134431136712512 pyconfig.py:471] Config param head_dim: 8
I0423 07:41:28.502854 134431136712512 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0423 07:41:28.502875 134431136712512 pyconfig.py:471] Config param hf_data_dir: None
I0423 07:41:28.502897 134431136712512 pyconfig.py:471] Config param hf_eval_files: None
I0423 07:41:28.502918 134431136712512 pyconfig.py:471] Config param hf_eval_split: None
I0423 07:41:28.502939 134431136712512 pyconfig.py:471] Config param hf_name: None
I0423 07:41:28.502961 134431136712512 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0423 07:41:28.502982 134431136712512 pyconfig.py:471] Config param hf_train_files: None
I0423 07:41:28.503003 134431136712512 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0423 07:41:28.503025 134431136712512 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0423 07:41:28.503047 134431136712512 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0423 07:41:28.503069 134431136712512 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0423 07:41:28.503092 134431136712512 pyconfig.py:471] Config param ici_context_parallelism: 1
I0423 07:41:28.503137 134431136712512 pyconfig.py:471] Config param ici_data_parallelism: 1
I0423 07:41:28.503159 134431136712512 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0423 07:41:28.503182 134431136712512 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0423 07:41:28.503205 134431136712512 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0423 07:41:28.503227 134431136712512 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0423 07:41:28.503249 134431136712512 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 07:41:28.503280 134431136712512 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0423 07:41:28.503304 134431136712512 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0423 07:41:28.503329 134431136712512 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0423 07:41:28.503351 134431136712512 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0423 07:41:28.503376 134431136712512 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0423 07:41:28.503402 134431136712512 pyconfig.py:471] Config param image_path: 
I0423 07:41:28.503425 134431136712512 pyconfig.py:471] Config param image_placeholder: <|image|>
I0423 07:41:28.503440 134431136712512 pyconfig.py:471] Config param image_size_for_vit: 896
I0423 07:41:28.503456 134431136712512 pyconfig.py:471] Config param indexer_head_dim: 128
I0423 07:41:28.503471 134431136712512 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0423 07:41:28.503488 134431136712512 pyconfig.py:471] Config param indexer_n_heads: 64
I0423 07:41:28.503502 134431136712512 pyconfig.py:471] Config param indexer_sparse_training: False
I0423 07:41:28.503517 134431136712512 pyconfig.py:471] Config param indexer_topk: 2048
I0423 07:41:28.503533 134431136712512 pyconfig.py:471] Config param inference_benchmark_test: False
I0423 07:41:28.503548 134431136712512 pyconfig.py:471] Config param inference_metadata_file: 
I0423 07:41:28.503562 134431136712512 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0423 07:41:28.503578 134431136712512 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0423 07:41:28.503592 134431136712512 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0423 07:41:28.503609 134431136712512 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0423 07:41:28.503625 134431136712512 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0423 07:41:28.503642 134431136712512 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0423 07:41:28.503657 134431136712512 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0423 07:41:28.503672 134431136712512 pyconfig.py:471] Config param init_weights_seed: 0
I0423 07:41:28.503687 134431136712512 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0423 07:41:28.503703 134431136712512 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0423 07:41:28.503718 134431136712512 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0423 07:41:28.503733 134431136712512 pyconfig.py:471] Config param internal_compile: False
I0423 07:41:28.503747 134431136712512 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0423 07:41:28.503768 134431136712512 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0423 07:41:28.503793 134431136712512 pyconfig.py:471] Config param jax_debug_log_modules: 
I0423 07:41:28.503810 134431136712512 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0423 07:41:28.503826 134431136712512 pyconfig.py:471] Config param jax_profiler_port: 9999
I0423 07:41:28.503840 134431136712512 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0423 07:41:28.503856 134431136712512 pyconfig.py:471] Config param kv_cache_buffer: 256
I0423 07:41:28.503870 134431136712512 pyconfig.py:471] Config param kv_lora_rank: 512
I0423 07:41:28.503885 134431136712512 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0423 07:41:28.503903 134431136712512 pyconfig.py:471] Config param kv_quant_dtype: int8
I0423 07:41:28.503918 134431136712512 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0423 07:41:28.503933 134431136712512 pyconfig.py:471] Config param learning_rate: 0.0002
I0423 07:41:28.503949 134431136712512 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0423 07:41:28.503965 134431136712512 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0423 07:41:28.503980 134431136712512 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0423 07:41:28.503995 134431136712512 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0423 07:41:28.504010 134431136712512 pyconfig.py:471] Config param load_from_prefill_dir: False
I0423 07:41:28.504026 134431136712512 pyconfig.py:471] Config param load_full_state_path: 
I0423 07:41:28.504040 134431136712512 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 07:41:28.504057 134431136712512 pyconfig.py:471] Config param local_checkpoint_directory: 
I0423 07:41:28.504071 134431136712512 pyconfig.py:471] Config param local_checkpoint_period: 0
I0423 07:41:28.504086 134431136712512 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0423 07:41:28.504118 134431136712512 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0423 07:41:28.504135 134431136712512 pyconfig.py:471] Config param log_config: True
I0423 07:41:28.504149 134431136712512 pyconfig.py:471] Config param log_period: 10
I0423 07:41:28.504165 134431136712512 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), 
('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), 
('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0423 07:41:28.504240 134431136712512 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0423 07:41:28.504274 134431136712512 pyconfig.py:471] Config param logits_via_embedding: True
I0423 07:41:28.504290 134431136712512 pyconfig.py:471] Config param lora_input_adapters_path: 
I0423 07:41:28.504304 134431136712512 pyconfig.py:471] Config param loss_algo: grpo
I0423 07:41:28.504318 134431136712512 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0423 07:41:28.504336 134431136712512 pyconfig.py:471] Config param managed_mldiagnostics: False
I0423 07:41:28.504350 134431136712512 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/managed-mldiagnostics
I0423 07:41:28.504366 134431136712512 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0423 07:41:28.504380 134431136712512 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0423 07:41:28.504397 134431136712512 pyconfig.py:471] Config param max_checkify: False
I0423 07:41:28.504411 134431136712512 pyconfig.py:471] Config param max_concurrency: 256
I0423 07:41:28.504426 134431136712512 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0423 07:41:28.504441 134431136712512 pyconfig.py:471] Config param max_num_batched_tokens: None
I0423 07:41:28.504456 134431136712512 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0423 07:41:28.504471 134431136712512 pyconfig.py:471] Config param max_num_images_per_example: -1
I0423 07:41:28.504485 134431136712512 pyconfig.py:471] Config param max_num_seqs: None
I0423 07:41:28.504501 134431136712512 pyconfig.py:471] Config param max_position_embeddings: 163840
I0423 07:41:28.504515 134431136712512 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0423 07:41:28.504531 134431136712512 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0423 07:41:28.504547 134431136712512 pyconfig.py:471] Config param max_segments_per_seq: -1
I0423 07:41:28.504584 134431136712512 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0423 07:41:28.504601 134431136712512 pyconfig.py:471] Config param max_target_length: 2048
I0423 07:41:28.504615 134431136712512 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0423 07:41:28.504631 134431136712512 pyconfig.py:471] Config param megablox: True
I0423 07:41:28.504645 134431136712512 pyconfig.py:471] Config param merge_gating_gmm: False
I0423 07:41:28.504660 134431136712512 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0423 07:41:28.504677 134431136712512 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/metrics/
I0423 07:41:28.504692 134431136712512 pyconfig.py:471] Config param metrics_file: 
I0423 07:41:28.504706 134431136712512 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0423 07:41:28.504721 134431136712512 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0423 07:41:28.504737 134431136712512 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0423 07:41:28.504752 134431136712512 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0423 07:41:28.504768 134431136712512 pyconfig.py:471] Config param mla_naive_kvcache: True
I0423 07:41:28.504782 134431136712512 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0423 07:41:28.504797 134431136712512 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0423 07:41:28.504812 134431136712512 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0423 07:41:28.504827 134431136712512 pyconfig.py:471] Config param mlp_bias: False
I0423 07:41:28.504843 134431136712512 pyconfig.py:471] Config param mlp_dim: 64
I0423 07:41:28.504858 134431136712512 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0423 07:41:28.504874 134431136712512 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0423 07:41:28.504890 134431136712512 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0423 07:41:28.504904 134431136712512 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0423 07:41:28.504919 134431136712512 pyconfig.py:471] Config param moba: False
I0423 07:41:28.504933 134431136712512 pyconfig.py:471] Config param moba_chunk_size: 1024
I0423 07:41:28.504949 134431136712512 pyconfig.py:471] Config param moba_topk: 8
I0423 07:41:28.504964 134431136712512 pyconfig.py:471] Config param model_call_mode: 
I0423 07:41:28.504978 134431136712512 pyconfig.py:471] Config param model_name: gpt3-52k
I0423 07:41:28.504993 134431136712512 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0423 07:41:28.505008 134431136712512 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0423 07:41:28.505022 134431136712512 pyconfig.py:471] Config param moe_mlp_dim: -1
I0423 07:41:28.505038 134431136712512 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0423 07:41:28.505060 134431136712512 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0423 07:41:28.505085 134431136712512 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0423 07:41:28.505120 134431136712512 pyconfig.py:471] Config param monitor_goodput: False
I0423 07:41:28.505138 134431136712512 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0423 07:41:28.505152 134431136712512 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0423 07:41:28.505168 134431136712512 pyconfig.py:471] Config param mscale: 1.0
I0423 07:41:28.505185 134431136712512 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0423 07:41:28.505199 134431136712512 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0423 07:41:28.505214 134431136712512 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0423 07:41:28.505229 134431136712512 pyconfig.py:471] Config param mtp_num_layers: 0
I0423 07:41:28.505243 134431136712512 pyconfig.py:471] Config param mu_dtype: float32
I0423 07:41:28.505272 134431136712512 pyconfig.py:471] Config param multi_sampling: False
I0423 07:41:28.505288 134431136712512 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0423 07:41:28.505303 134431136712512 pyconfig.py:471] Config param muon_beta: 0.95
I0423 07:41:28.505319 134431136712512 pyconfig.py:471] Config param muon_consistent_rms: None
I0423 07:41:28.505333 134431136712512 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0423 07:41:28.505349 134431136712512 pyconfig.py:471] Config param n_routing_groups: -1
I0423 07:41:28.505363 134431136712512 pyconfig.py:471] Config param n_window_for_audio: 50
I0423 07:41:28.505378 134431136712512 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0423 07:41:28.505393 134431136712512 pyconfig.py:471] Config param nope_layer_interval: -1
I0423 07:41:28.505407 134431136712512 pyconfig.py:471] Config param norm_topk_prob: False
I0423 07:41:28.505422 134431136712512 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0423 07:41:28.505439 134431136712512 pyconfig.py:471] Config param normalize_embedding_logits: False
I0423 07:41:28.505453 134431136712512 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0423 07:41:28.505467 134431136712512 pyconfig.py:471] Config param num_batches: 4
I0423 07:41:28.505481 134431136712512 pyconfig.py:471] Config param num_channels_for_vit: 3
I0423 07:41:28.505497 134431136712512 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0423 07:41:28.505511 134431136712512 pyconfig.py:471] Config param num_decoder_layers: 1
I0423 07:41:28.505527 134431136712512 pyconfig.py:471] Config param num_diloco_replicas: 1
I0423 07:41:28.505543 134431136712512 pyconfig.py:471] Config param num_epoch: 1
I0423 07:41:28.505557 134431136712512 pyconfig.py:471] Config param num_eval_passes: 1
I0423 07:41:28.505572 134431136712512 pyconfig.py:471] Config param num_experts: 1
I0423 07:41:28.505588 134431136712512 pyconfig.py:471] Config param num_experts_per_tok: 1
I0423 07:41:28.505602 134431136712512 pyconfig.py:471] Config param num_generations: 2
I0423 07:41:28.505616 134431136712512 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0423 07:41:28.505630 134431136712512 pyconfig.py:471] Config param num_iterations: 1
I0423 07:41:28.505645 134431136712512 pyconfig.py:471] Config param num_kv_heads: 2
I0423 07:41:28.505660 134431136712512 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0423 07:41:28.505675 134431136712512 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0423 07:41:28.505689 134431136712512 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0423 07:41:28.505704 134431136712512 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0423 07:41:28.505718 134431136712512 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0423 07:41:28.505732 134431136712512 pyconfig.py:471] Config param num_query_heads: 2
I0423 07:41:28.505748 134431136712512 pyconfig.py:471] Config param num_samplers_slices: -1
I0423 07:41:28.505762 134431136712512 pyconfig.py:471] Config param num_slices: 1
I0423 07:41:28.505776 134431136712512 pyconfig.py:471] Config param num_target_devices: 32
I0423 07:41:28.505791 134431136712512 pyconfig.py:471] Config param num_test_batches: 5
I0423 07:41:28.505805 134431136712512 pyconfig.py:471] Config param num_trainer_slices: -1
I0423 07:41:28.505819 134431136712512 pyconfig.py:471] Config param num_vocab_tiling: 1
I0423 07:41:28.505842 134431136712512 pyconfig.py:471] Config param off_policy_steps: 0
I0423 07:41:28.505867 134431136712512 pyconfig.py:471] Config param offline_data_dir: None
I0423 07:41:28.505888 134431136712512 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0423 07:41:28.505906 134431136712512 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0423 07:41:28.505921 134431136712512 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0423 07:41:28.505937 134431136712512 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0423 07:41:28.505953 134431136712512 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0423 07:41:28.505967 134431136712512 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0423 07:41:28.505983 134431136712512 pyconfig.py:471] Config param output_dim_for_audio: 512
I0423 07:41:28.505998 134431136712512 pyconfig.py:471] Config param override_logical_axis_rules: False
I0423 07:41:28.506014 134431136712512 pyconfig.py:471] Config param override_model_config: True
I0423 07:41:28.506029 134431136712512 pyconfig.py:471] Config param packing: True
I0423 07:41:28.506043 134431136712512 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0423 07:41:28.506059 134431136712512 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0423 07:41:28.506073 134431136712512 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0423 07:41:28.506088 134431136712512 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0423 07:41:28.506122 134431136712512 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0423 07:41:28.506137 134431136712512 pyconfig.py:471] Config param param_scan_axis: 1
I0423 07:41:28.506156 134431136712512 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0423 07:41:28.506180 134431136712512 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0423 07:41:28.506199 134431136712512 pyconfig.py:471] Config param patch_size_for_vit: 14
I0423 07:41:28.506232 134431136712512 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0423 07:41:28.506249 134431136712512 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0423 07:41:28.506265 134431136712512 pyconfig.py:471] Config param per_device_batch_size: 2
I0423 07:41:28.506284 134431136712512 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0423 07:41:28.506299 134431136712512 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0423 07:41:28.506314 134431136712512 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0423 07:41:28.506329 134431136712512 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0423 07:41:28.506343 134431136712512 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0423 07:41:28.506359 134431136712512 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0423 07:41:28.506373 134431136712512 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0423 07:41:28.506389 134431136712512 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0423 07:41:28.506403 134431136712512 pyconfig.py:471] Config param position_id_per_seconds: 25
I0423 07:41:28.506419 134431136712512 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0423 07:41:28.506433 134431136712512 pyconfig.py:471] Config param prefill_cache_dir: 
I0423 07:41:28.506448 134431136712512 pyconfig.py:471] Config param prefill_chunk_size: 256
I0423 07:41:28.506463 134431136712512 pyconfig.py:471] Config param prefill_slice: v5e-16
I0423 07:41:28.506477 134431136712512 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0423 07:41:28.506493 134431136712512 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0423 07:41:28.506507 134431136712512 pyconfig.py:471] Config param prefuse_moe_weights: False
I0423 07:41:28.506524 134431136712512 pyconfig.py:471] Config param profile_cleanly: True
I0423 07:41:28.506539 134431136712512 pyconfig.py:471] Config param profile_periodically_period: -1
I0423 07:41:28.506554 134431136712512 pyconfig.py:471] Config param profile_power_events: False
I0423 07:41:28.506568 134431136712512 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0423 07:41:28.506586 134431136712512 pyconfig.py:471] Config param profiler_steps: 5
I0423 07:41:28.506601 134431136712512 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0423 07:41:28.506616 134431136712512 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0423 07:41:28.506630 134431136712512 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0423 07:41:28.506645 134431136712512 pyconfig.py:471] Config param prometheus_port: 0
I0423 07:41:28.506661 134431136712512 pyconfig.py:471] Config param prompt: I love to
I0423 07:41:28.506675 134431136712512 pyconfig.py:471] Config param pure_nnx: False
I0423 07:41:28.506690 134431136712512 pyconfig.py:471] Config param pure_nnx_decoder: False
I0423 07:41:28.506704 134431136712512 pyconfig.py:471] Config param q_lora_rank: 0
I0423 07:41:28.506719 134431136712512 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0423 07:41:28.506735 134431136712512 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0423 07:41:28.506751 134431136712512 pyconfig.py:471] Config param qk_norm_with_scale: True
I0423 07:41:28.506781 134431136712512 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0423 07:41:28.506796 134431136712512 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0423 07:41:28.506811 134431136712512 pyconfig.py:471] Config param quant_cfg_path: 
I0423 07:41:28.506827 134431136712512 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0423 07:41:28.506846 134431136712512 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0423 07:41:28.506861 134431136712512 pyconfig.py:471] Config param quantize_kvcache: False
I0423 07:41:28.506876 134431136712512 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0423 07:41:28.506891 134431136712512 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0423 07:41:28.506908 134431136712512 pyconfig.py:471] Config param ragged_block_size: 256
I0423 07:41:28.506922 134431136712512 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0423 07:41:28.506938 134431136712512 pyconfig.py:471] Config param rampup_end_step: 0
I0423 07:41:28.506953 134431136712512 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0423 07:41:28.506968 134431136712512 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0423 07:41:28.506983 134431136712512 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0423 07:41:28.506997 134431136712512 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0423 07:41:28.507011 134431136712512 pyconfig.py:471] Config param remat_policy: full
I0423 07:41:28.507026 134431136712512 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0423 07:41:28.507040 134431136712512 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0423 07:41:28.507054 134431136712512 pyconfig.py:471] Config param replicate_quant_scale: False
I0423 07:41:28.507068 134431136712512 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0423 07:41:28.507084 134431136712512 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0423 07:41:28.507120 134431136712512 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0423 07:41:28.507138 134431136712512 pyconfig.py:471] Config param reshape_q: False
I0423 07:41:28.507152 134431136712512 pyconfig.py:471] Config param return_log_prob: False
I0423 07:41:28.507166 134431136712512 pyconfig.py:471] Config param reuse_example_batch: 0
I0423 07:41:28.507182 134431136712512 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0423 07:41:28.507196 134431136712512 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0423 07:41:28.507213 134431136712512 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0423 07:41:28.507228 134431136712512 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0423 07:41:28.507244 134431136712512 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0423 07:41:28.507261 134431136712512 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0423 07:41:28.507279 134431136712512 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0423 07:41:28.507300 134431136712512 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0423 07:41:28.507314 134431136712512 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0423 07:41:28.507329 134431136712512 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0423 07:41:28.507344 134431136712512 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0423 07:41:28.507360 134431136712512 pyconfig.py:471] Config param rope_attention_scaling: False
I0423 07:41:28.507374 134431136712512 pyconfig.py:471] Config param rope_factor: 40
I0423 07:41:28.507389 134431136712512 pyconfig.py:471] Config param rope_interleave: True
I0423 07:41:28.507403 134431136712512 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0423 07:41:28.507417 134431136712512 pyconfig.py:471] Config param rope_max_timescale: 10000
I0423 07:41:28.507433 134431136712512 pyconfig.py:471] Config param rope_min_timescale: 1
I0423 07:41:28.507447 134431136712512 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0423 07:41:28.507462 134431136712512 pyconfig.py:471] Config param rope_truncate: True
I0423 07:41:28.507476 134431136712512 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0423 07:41:28.507494 134431136712512 pyconfig.py:471] Config param rope_use_scale: True
I0423 07:41:28.507508 134431136712512 pyconfig.py:471] Config param routed_bias: False
I0423 07:41:28.507525 134431136712512 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0423 07:41:28.507540 134431136712512 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0423 07:41:28.507554 134431136712512 pyconfig.py:471] Config param routed_score_func: 
I0423 07:41:28.507570 134431136712512 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-07-41
I0423 07:41:28.507584 134431136712512 pyconfig.py:471] Config param sa_block_kv: 512
I0423 07:41:28.507600 134431136712512 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0423 07:41:28.507615 134431136712512 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0423 07:41:28.507630 134431136712512 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0423 07:41:28.507644 134431136712512 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0423 07:41:28.507658 134431136712512 pyconfig.py:471] Config param sa_block_q: 512
I0423 07:41:28.507674 134431136712512 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0423 07:41:28.507688 134431136712512 pyconfig.py:471] Config param sa_block_q_dq: 512
I0423 07:41:28.507703 134431136712512 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0423 07:41:28.507717 134431136712512 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0423 07:41:28.507731 134431136712512 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0423 07:41:28.507746 134431136712512 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0423 07:41:28.507760 134431136712512 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0423 07:41:28.507775 134431136712512 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0423 07:41:28.507790 134431136712512 pyconfig.py:471] Config param save_config_to_gcs: False
I0423 07:41:28.507804 134431136712512 pyconfig.py:471] Config param save_quantized_params_path: 
I0423 07:41:28.507820 134431136712512 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0423 07:41:28.507834 134431136712512 pyconfig.py:471] Config param scan_layers: True
I0423 07:41:28.507849 134431136712512 pyconfig.py:471] Config param scan_layers_per_stage: False
I0423 07:41:28.507863 134431136712512 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0423 07:41:28.507878 134431136712512 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0423 07:41:28.507892 134431136712512 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0423 07:41:28.507908 134431136712512 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0423 07:41:28.507922 134431136712512 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0423 07:41:28.507936 134431136712512 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0423 07:41:28.507950 134431136712512 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0423 07:41:28.507967 134431136712512 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0423 07:41:28.507981 134431136712512 pyconfig.py:471] Config param sharding_strategy: None
I0423 07:41:28.507996 134431136712512 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0423 07:41:28.508011 134431136712512 pyconfig.py:471] Config param shardy: True
I0423 07:41:28.508025 134431136712512 pyconfig.py:471] Config param share_kv_projections: False
I0423 07:41:28.508040 134431136712512 pyconfig.py:471] Config param shared_experts: 0
I0423 07:41:28.508056 134431136712512 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0423 07:41:28.508070 134431136712512 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0423 07:41:28.508085 134431136712512 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0423 07:41:28.508114 134431136712512 pyconfig.py:471] Config param skip_step_interval: 128
I0423 07:41:28.508131 134431136712512 pyconfig.py:471] Config param skip_step_on_spikes: False
I0423 07:41:28.508145 134431136712512 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0423 07:41:28.508160 134431136712512 pyconfig.py:471] Config param sliding_window_size: 0
I0423 07:41:28.508175 134431136712512 pyconfig.py:471] Config param solution_end_token: </answer>
I0423 07:41:28.508190 134431136712512 pyconfig.py:471] Config param solution_start_token: <answer>
I0423 07:41:28.508205 134431136712512 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0423 07:41:28.508220 134431136712512 pyconfig.py:471] Config param sparse_matmul: True
I0423 07:41:28.508234 134431136712512 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0423 07:41:28.508249 134431136712512 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0423 07:41:28.508264 134431136712512 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0423 07:41:28.508282 134431136712512 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0423 07:41:28.508297 134431136712512 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0423 07:41:28.508316 134431136712512 pyconfig.py:471] Config param steps: 200000
I0423 07:41:28.508329 134431136712512 pyconfig.py:471] Config param stop_strings: None
I0423 07:41:28.508351 134431136712512 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0423 07:41:28.508371 134431136712512 pyconfig.py:471] Config param student_params_to_update: None
I0423 07:41:28.508394 134431136712512 pyconfig.py:471] Config param subslice_shape: 
I0423 07:41:28.508419 134431136712512 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0423 07:41:28.508441 134431136712512 pyconfig.py:471] Config param system_prompt: 
I0423 07:41:28.508457 134431136712512 pyconfig.py:471] Config param target_eval_loss: 0.0
I0423 07:41:28.508473 134431136712512 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0423 07:41:28.508489 134431136712512 pyconfig.py:471] Config param temperature_tuning: False
I0423 07:41:28.508503 134431136712512 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0423 07:41:28.508518 134431136712512 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-07-41/tensorboard/
I0423 07:41:28.508535 134431136712512 pyconfig.py:471] Config param tensors_on_device: None
I0423 07:41:28.508550 134431136712512 pyconfig.py:471] Config param tensors_to_offload: None
I0423 07:41:28.508565 134431136712512 pyconfig.py:471] Config param test_batch_start_index: 0
I0423 07:41:28.508581 134431136712512 pyconfig.py:471] Config param tile_size_for_vit: 336
I0423 07:41:28.508596 134431136712512 pyconfig.py:471] Config param tokenize_eval_data: True
I0423 07:41:28.508611 134431136712512 pyconfig.py:471] Config param tokenize_train_data: True
I0423 07:41:28.508626 134431136712512 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0423 07:41:28.508642 134431136712512 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0423 07:41:28.508659 134431136712512 pyconfig.py:471] Config param topk_routing_group: -1
I0423 07:41:28.508674 134431136712512 pyconfig.py:471] Config param train_data_columns: ['text']
I0423 07:41:28.508689 134431136712512 pyconfig.py:471] Config param train_fraction: 1.0
I0423 07:41:28.508705 134431136712512 pyconfig.py:471] Config param train_image_column: image
I0423 07:41:28.508720 134431136712512 pyconfig.py:471] Config param train_micro_batch_size: -1
I0423 07:41:28.508735 134431136712512 pyconfig.py:471] Config param train_split: train
I0423 07:41:28.508749 134431136712512 pyconfig.py:471] Config param trainable_parameters_mask: []
I0423 07:41:28.508764 134431136712512 pyconfig.py:471] Config param trainable_position_size: 2048
I0423 07:41:28.508780 134431136712512 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0423 07:41:28.508795 134431136712512 pyconfig.py:471] Config param upload_all_profiler_results: False
I0423 07:41:28.508811 134431136712512 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0423 07:41:28.508825 134431136712512 pyconfig.py:471] Config param use_agentic_rollout: False
I0423 07:41:28.508841 134431136712512 pyconfig.py:471] Config param use_audio: False
I0423 07:41:28.508856 134431136712512 pyconfig.py:471] Config param use_audio_in_video: False
I0423 07:41:28.508870 134431136712512 pyconfig.py:471] Config param use_batch_split_schedule: False
I0423 07:41:28.508886 134431136712512 pyconfig.py:471] Config param use_chat_template: False
I0423 07:41:28.508901 134431136712512 pyconfig.py:471] Config param use_chunked_prefill: False
I0423 07:41:28.508922 134431136712512 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0423 07:41:28.508945 134431136712512 pyconfig.py:471] Config param use_dpo: False
I0423 07:41:28.508961 134431136712512 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0423 07:41:28.508975 134431136712512 pyconfig.py:471] Config param use_grpo: True
I0423 07:41:28.508990 134431136712512 pyconfig.py:471] Config param use_indexer: False
I0423 07:41:28.509005 134431136712512 pyconfig.py:471] Config param use_iota_embed: True
I0423 07:41:28.509020 134431136712512 pyconfig.py:471] Config param use_jax_splash: False
I0423 07:41:28.509035 134431136712512 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0423 07:41:28.509051 134431136712512 pyconfig.py:471] Config param use_mrope: False
I0423 07:41:28.509065 134431136712512 pyconfig.py:471] Config param use_multimodal: False
I0423 07:41:28.509080 134431136712512 pyconfig.py:471] Config param use_pathways: True
I0423 07:41:28.509126 134431136712512 pyconfig.py:471] Config param use_post_attn_norm: False
I0423 07:41:28.509141 134431136712512 pyconfig.py:471] Config param use_post_ffw_norm: False
I0423 07:41:28.509157 134431136712512 pyconfig.py:471] Config param use_qk_clip: False
I0423 07:41:28.509171 134431136712512 pyconfig.py:471] Config param use_qk_norm: False
I0423 07:41:28.509186 134431136712512 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0423 07:41:28.509202 134431136712512 pyconfig.py:471] Config param use_qwix_quantization: False
I0423 07:41:28.509217 134431136712512 pyconfig.py:471] Config param use_ragged_attention: False
I0423 07:41:28.509233 134431136712512 pyconfig.py:471] Config param use_random_routing: False
I0423 07:41:28.509248 134431136712512 pyconfig.py:471] Config param use_replicator_service: False
I0423 07:41:28.509264 134431136712512 pyconfig.py:471] Config param use_ring_of_experts: False
I0423 07:41:28.509283 134431136712512 pyconfig.py:471] Config param use_sft: False
I0423 07:41:28.509297 134431136712512 pyconfig.py:471] Config param use_splash_scheduler: False
I0423 07:41:28.509313 134431136712512 pyconfig.py:471] Config param use_tokamax_gmm: False
I0423 07:41:28.509327 134431136712512 pyconfig.py:471] Config param use_tokamax_splash: False
I0423 07:41:28.509342 134431136712512 pyconfig.py:471] Config param use_truncation: True
I0423 07:41:28.509356 134431136712512 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0423 07:41:28.509372 134431136712512 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0423 07:41:28.509386 134431136712512 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0423 07:41:28.509401 134431136712512 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0423 07:41:28.509415 134431136712512 pyconfig.py:471] Config param v_head_dim: 128
I0423 07:41:28.509429 134431136712512 pyconfig.py:471] Config param v_norm_with_scale: True
I0423 07:41:28.509445 134431136712512 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0423 07:41:28.509460 134431136712512 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0423 07:41:28.509475 134431136712512 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0423 07:41:28.509490 134431136712512 pyconfig.py:471] Config param video_path: 
I0423 07:41:28.509504 134431136712512 pyconfig.py:471] Config param video_placeholder: <|video|>
I0423 07:41:28.509521 134431136712512 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0423 07:41:28.509535 134431136712512 pyconfig.py:471] Config param vision_output_length: -1
I0423 07:41:28.509550 134431136712512 pyconfig.py:471] Config param vllm_additional_config: {}
I0423 07:41:28.509565 134431136712512 pyconfig.py:471] Config param vllm_hf_config_path: 
I0423 07:41:28.509579 134431136712512 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0423 07:41:28.509594 134431136712512 pyconfig.py:471] Config param vocab_size: 32000
I0423 07:41:28.509609 134431136712512 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0423 07:41:28.509625 134431136712512 pyconfig.py:471] Config param weight_dtype: float32
I0423 07:41:28.509650 134431136712512 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0423 07:41:28.509665 134431136712512 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0423 07:41:28.509681 134431136712512 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0423 07:41:28.509696 134431136712512 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0423 07:41:28.509711 134431136712512 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0423 07:41:28.509726 134431136712512 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0423 07:41:28.509742 134431136712512 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0423 07:41:28.509757 134431136712512 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0423 07:41:28.509773 134431136712512 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0423 07:41:28.509788 134431136712512 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0423 07:41:28.509807 134431136712512 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0423 07:41:28.509829 134431136712512 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0423 07:41:28.509844 134431136712512 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0423 07:41:28.509859 134431136712512 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0423 07:41:28.509873 134431136712512 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0423 07:41:28.509888 134431136712512 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0423 07:41:28.509903 134431136712512 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0423 07:41:28.509919 134431136712512 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0423 07:41:28.509934 134431136712512 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0423 07:41:28.509949 134431136712512 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0423 07:41:28.509964 134431136712512 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0423 07:41:28.509983 134431136712512 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0423 07:41:28.509997 134431136712512 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0423 07:41:28.510013 134431136712512 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0423 07:41:28.510027 134431136712512 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0423 07:41:28.510044 134431136712512 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0423 07:41:28.510428 134431136712512 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0423 07:41:28.510480 134431136712512 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0423 07:41:28.707670 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:41:28.822865 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:41:28.928375 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:41:29.034873 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:41:29.147930 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0423 07:41:29.257657 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0423 07:41:29.362047 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0423 07:41:29.472993 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0423 07:41:30.106933 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0423 07:41:30.305296 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0423 07:41:30.432663 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0423 07:41:30.546939 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0423 07:41:30.668837 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0423 07:41:30.780781 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0423 07:41:30.872034 134431136712512 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0423 07:41:30.878922 134431136712512 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 07:41:30.879052 134431136712512 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0423 07:41:30.879141 134431136712512 train_distill.py:586] Loading Student from ...
I0423 07:41:30.879172 134431136712512 train_distill.py:170] --- Student Configuration ---
I0423 07:41:30.879194 134431136712512 train_distill.py:171]   Model Name:      gpt3-52k
I0423 07:41:30.879215 134431136712512 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 07:41:30.879234 134431136712512 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0423 07:41:30.879254 134431136712512 train_distill.py:176]   Vocab Size:      32000
I0423 07:41:30.879273 134431136712512 train_distill.py:177]   Checkpoint:      
I0423 07:41:30.879292 134431136712512 train_distill.py:451] Initializing model: gpt3-52k...
I0423 07:41:32.158404 134431136712512 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0423 07:41:32.158514 134431136712512 train_distill.py:170] --- Teacher Configuration ---
I0423 07:41:32.158544 134431136712512 train_distill.py:171]   Model Name:      gpt3-52k
I0423 07:41:32.158574 134431136712512 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 07:41:32.158596 134431136712512 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0423 07:41:32.158616 134431136712512 train_distill.py:176]   Vocab Size:      32000
I0423 07:41:32.158633 134431136712512 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 07:41:32.158650 134431136712512 train_distill.py:451] Initializing model: gpt3-52k...
I0423 07:41:33.210732 134431136712512 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:41:33.210886 134431136712512 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a42f8099f40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:41:33.210942 134431136712512 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0423 07:41:33.748063 134431136712512 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0423 07:41:34.286663    1932 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0423 07:41:35.416793 134431136712512 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0423 07:41:37.578843 134431136712512 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0423 07:41:37.579214 134431136712512 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0423 07:41:40.986524 134431136712512 checkpointer.py:318] Finished restoring checkpoint in 5.93 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0423 07:41:41.732027 134431136712512 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0423 07:41:41.795578 134431136712512 config.py:112] TensorFlow version 2.20.0 available.
I0423 07:41:41.796054 134431136712512 config.py:125] JAX version 0.9.2 available.
I0423 07:41:42.213725 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0423 07:41:42.243825 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0423 07:41:42.371151 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0423 07:41:42.489608 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0423 07:41:42.794385 134431136712512 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0423 07:41:42.939374 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0423 07:41:43.045383 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0423 07:41:43.203683 134431136712512 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0423 07:41:43.313184 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0423 07:41:43.421551 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0423 07:41:43.565761 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0423 07:41:43.725537 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0423 07:41:43.830624 134431136712512 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0423 07:41:43.936450 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0423 07:41:44.037894 134431136712512 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0423 07:41:44.132752 134431136712512 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0423 07:41:44.132963 134431136712512 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0423 07:41:44.135981 134431136712512 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0423 07:41:44.136040 134431136712512 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0423 07:41:44.136123 134431136712512 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:41:44.136205 134431136712512 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a42f8099f40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:41:44.136246 134431136712512 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:41:44.136277 134431136712512 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a42f8099f40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:41:44.136320 134431136712512 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a2eac1f3440>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a3d227f0bc0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a24f419b920>}, handler_registry=None
I0423 07:41:44.136510 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a2eac1f3440>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:41:44.136552 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a3d227f0bc0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:41:44.136583 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a24f419b920>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:41:44.136608 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3d2224d910>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:41:44.136634 134431136712512 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a2eac1f3440>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a2eac1f3440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a3d227f0bc0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a3d227f0bc0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a24f419b920>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a24f419b920>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3d2224d910>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3d2224d910>}).
I0423 07:41:44.137012 134431136712512 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a24f4121800> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 07:41:46.261958 134431136712512 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_nnx_xpk_main_20260423_071551_07_distill_smoke/checkpoints
I0423 07:41:46.264280 134431136712512 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_nnx_xpk_main_20260423_071551_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a2eac1d6a50>
I0423 07:41:46.264390 134431136712512 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:41:46.264455 134431136712512 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a42f8099f40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:41:46.264489 134431136712512 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 07:41:46.264519 134431136712512 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a42f8099f40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 07:41:46.264551 134431136712512 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 07:41:46.264607 134431136712512 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134431136712512 count=1 at 0x7a2aa42fc240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a24f419b710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a24f419b6e0>, _write_futures=[])
I0423 07:41:46.264973 134431136712512 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134431136712512 count=1 at 0x7a2aa42fc240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a24f419b710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a24f419b6e0>, _write_futures=[])
I0423 07:41:46.264997 134431136712512 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134431136712512 count=1 at 0x7a2aa42fc240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a24f419b710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a24f419b6e0>, _write_futures=[])
I0423 07:41:46.265028 134431136712512 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f419b8f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f41f44a0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a30ac248500>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a22f81d3aa0>}, handler_registry=None
I0423 07:41:46.265137 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f419b8f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:41:46.265172 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f41f44a0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 07:41:46.265195 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a30ac248500>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:41:46.265222 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a22f81d3aa0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0423 07:41:46.265248 134431136712512 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a22f81d34a0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 07:41:46.265273 134431136712512 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f419b8f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f419b8f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f41f44a0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a24f41f44a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a30ac248500>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a30ac248500>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a22f81d3aa0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a22f81d3aa0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a22f81d34a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a22f81d34a0>}).
I0423 07:41:46.265341 134431136712512 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a24f4121a80> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 07:41:47.093975 134431136712512 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_nnx_xpk_main_20260423_071551_07_distill_smoke/checkpoints
I0423 07:41:47.097201 134431136712512 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260423_071551/pt_distill_nnx_xpk_main_20260423_071551_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a3d224eda90>
I0423 07:41:47.097608 134431136712512 train_distill.py:677] Starting Distillation Training...
I0423 07:41:47.097713 134431136712512 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0423 07:41:47.567675 134431136712512 peft_trainer.py:594] Compiled train_step cache size: 0
I0423 07:41:47.569371 134278838994688 grain_pool.py:367] Grain pool will use 1 processes.
I0423 07:41:47.627738 134278838994688 grain_pool.py:440] Grain pool will start child processes.
I0423 07:41:47.633507 134278838994688 grain_pool.py:448] Grain pool started all child processes.
2026-04-23 07:41:54.142748: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0)}
I0423 07:41:58.307933 134278838994688 grain_pool.py:542] Grain pool is exiting.
I0423 07:41:58.308036 134278838994688 grain_pool.py:547] Shutting down multiprocessing system.
I0423 07:41:59.970507 134278838994688 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Thu Apr 23 07:42:09 UTC 2026
EXIT_CODE=1
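Annotation (not part of the log): the run dies in `tunix.sft.sharding_utils.shard_input`, whose `jax.make_array_from_process_local_data` path ends in a `jax.device_put(shard, device)` call. Per the traceback, JAX rejects a bare `Device` target unless the input is fully addressable, which fails here because the target sharding spans all eight processes' TPUs. The sketch below is a minimal single-host illustration of that rule, not a reproduction of this multi-host failure: the `XLA_FLAGS` trick, the mesh axis name `"data"`, and the batch shape are all illustrative assumptions.

```python
import os
# Illustrative flag: emulate 4 XLA devices on one CPU host so the sharded
# path runs anywhere (must be set before importing jax).
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=4"

import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()[:4]), ("data",))
sharding = NamedSharding(mesh, P("data"))

# Single host: the whole global batch is process-local, so this succeeds.
# In the multi-host run above, the same call produced shards whose target
# sharding included other processes' devices, triggering the ValueError.
batch = np.arange(8.0)
global_batch = jax.make_array_from_process_local_data(sharding, batch)
assert len(global_batch.sharding.device_set) == 4

# The rule the error message states: device_put with a bare Device target
# requires fully addressable input (true here, so this succeeds); a
# Sharding target is the supported way to express multi-device layouts.
committed = jax.device_put(batch, jax.devices()[0])
assert committed.sharding.device_set == {jax.devices()[0]}
```

Under this reading, the fix belongs on the caller's side (pass a `Sharding`, or shard only process-addressable data), but confirming that requires the tunix source, which is outside this log.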