MaxView

Log Summary

XPK Start: Thu Apr 23 09:49:08 UTC 2026
2026-04-23 09:49:25.763269: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
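
Note on the `rope_scaling` warnings above: they appear to come from a config validator that wants `factor`, `beta_fast`, and `beta_slow` as floats and does not accept `rope_theta` inside the `rope_scaling` dict. A minimal sketch of cleaning such a dict before it is loaded follows. The helper name `normalize_rope_scaling` and the `rope_theta` value in the example are hypothetical; the log shows neither the original config file nor the actual `rope_theta` value.

```python
def normalize_rope_scaling(rope_scaling):
    """Return (cleaned_dict, rope_theta) without mutating the input.

    Hypothetical helper: casts the integer YaRN fields named in the
    warnings to float and moves `rope_theta` out of the dict.
    """
    cleaned = dict(rope_scaling)
    # `rope_theta` is reported as an unrecognized key, so pull it out.
    rope_theta = cleaned.pop("rope_theta", None)
    for key in ("factor", "beta_fast", "beta_slow"):
        value = cleaned.get(key)
        # bool is a subclass of int in Python; leave booleans untouched.
        if isinstance(value, int) and not isinstance(value, bool):
            cleaned[key] = float(value)
    return cleaned, rope_theta


# Values for factor, beta_fast, and beta_slow are taken from the warnings;
# the rope_theta value is an assumed placeholder.
example = {
    "rope_type": "yarn",
    "factor": 40,
    "beta_fast": 32,
    "beta_slow": 1,
    "rope_theta": 10000,
}
cleaned, rope_theta = normalize_rope_scaling(example)
print(cleaned)
print(rope_theta)
```

Whether `rope_theta` belongs at the top level of the model config depends on the library version in use, so treat the split as an assumption to check against that library's documentation.
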
I0423 09:49:29.915775 135852897781568 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-23 09:49:38,956:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0423 09:49:38.956430 135852897781568 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-23 09:49:38,958:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-903ji-slice-job-0-0.mt-07-distill-smoke-903ji:8482
I0423 09:49:38.958784 135852897781568 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-903ji-slice-job-0-0.mt-07-distill-smoke-903ji:8482
I0423 09:49:40.334381 135852897781568 max_utils.py:284] Jax distributed system initialized!
I0423 09:49:46.989748 135852897781568 max_utils.py:244] Jax distributed system is already initialized.
W0423 09:49:47.119377 135852897781568 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0423 09:49:47.178395 135852897781568 max_utils.py:244] Jax distributed system is already initialized.
I0423 09:49:47.179603 135852897781568 pyconfig.py:471] Config param abort_on_inf_loss: True
I0423 09:49:47.179651 135852897781568 pyconfig.py:471] Config param abort_on_nan_loss: True
I0423 09:49:47.179678 135852897781568 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0423 09:49:47.179700 135852897781568 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0423 09:49:47.179720 135852897781568 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0423 09:49:47.179739 135852897781568 pyconfig.py:471] Config param activations_in_float32: False
I0423 09:49:47.179757 135852897781568 pyconfig.py:471] Config param adam_b1: 0.9
I0423 09:49:47.179777 135852897781568 pyconfig.py:471] Config param adam_b2: 0.95
I0423 09:49:47.179795 135852897781568 pyconfig.py:471] Config param adam_eps: 1e-08
I0423 09:49:47.179817 135852897781568 pyconfig.py:471] Config param adam_eps_root: 0.0
I0423 09:49:47.179833 135852897781568 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0423 09:49:47.179851 135852897781568 pyconfig.py:471] Config param adamw_mask: []
I0423 09:49:47.179867 135852897781568 pyconfig.py:471] Config param add_bos: True
I0423 09:49:47.179884 135852897781568 pyconfig.py:471] Config param add_eos: True
I0423 09:49:47.179901 135852897781568 pyconfig.py:471] Config param allow_split_physical_axes: False
I0423 09:49:47.179916 135852897781568 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0423 09:49:47.179932 135852897781568 pyconfig.py:471] Config param async_checkpointing: True
I0423 09:49:47.179949 135852897781568 pyconfig.py:471] Config param async_scheduling: False
I0423 09:49:47.179965 135852897781568 pyconfig.py:471] Config param attention: dot_product
I0423 09:49:47.179982 135852897781568 pyconfig.py:471] Config param attention_bias: False
I0423 09:49:47.179998 135852897781568 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0423 09:49:47.180014 135852897781568 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0423 09:49:47.180035 135852897781568 pyconfig.py:471] Config param attention_output_dim: -1
I0423 09:49:47.180052 135852897781568 pyconfig.py:471] Config param attention_sink: False
I0423 09:49:47.180068 135852897781568 pyconfig.py:471] Config param attention_type: global
I0423 09:49:47.180084 135852897781568 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0423 09:49:47.180113 135852897781568 pyconfig.py:471] Config param audio_path: 
I0423 09:49:47.180130 135852897781568 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0423 09:49:47.180145 135852897781568 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0423 09:49:47.180161 135852897781568 pyconfig.py:471] Config param base_config: base.yml
I0423 09:49:47.180177 135852897781568 pyconfig.py:471] Config param base_emb_dim: 16
I0423 09:49:47.180194 135852897781568 pyconfig.py:471] Config param base_mlp_dim: 64
I0423 09:49:47.180210 135852897781568 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0423 09:49:47.180225 135852897781568 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0423 09:49:47.180242 135852897781568 pyconfig.py:471] Config param base_num_kv_heads: 2
I0423 09:49:47.180258 135852897781568 pyconfig.py:471] Config param base_num_query_heads: 2
I0423 09:49:47.180282 135852897781568 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0423 09:49:47.180307 135852897781568 pyconfig.py:471] Config param batch_size: 1
I0423 09:49:47.180346 135852897781568 pyconfig.py:471] Config param batch_split_factor: 1
I0423 09:49:47.180363 135852897781568 pyconfig.py:471] Config param beta_fast: 32
I0423 09:49:47.180380 135852897781568 pyconfig.py:471] Config param beta_slow: 1
I0423 09:49:47.180395 135852897781568 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0423 09:49:47.180413 135852897781568 pyconfig.py:471] Config param capacity_factor: -1.0
I0423 09:49:47.180429 135852897781568 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0423 09:49:47.180446 135852897781568 pyconfig.py:471] Config param chat_template: 
I0423 09:49:47.180461 135852897781568 pyconfig.py:471] Config param chat_template_path: 
I0423 09:49:47.180477 135852897781568 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0423 09:49:47.180494 135852897781568 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/checkpoints/
I0423 09:49:47.180511 135852897781568 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0423 09:49:47.180528 135852897781568 pyconfig.py:471] Config param checkpoint_period: 2000
I0423 09:49:47.180544 135852897781568 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0423 09:49:47.180560 135852897781568 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0423 09:49:47.180576 135852897781568 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0423 09:49:47.180591 135852897781568 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0423 09:49:47.180608 135852897781568 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0423 09:49:47.180623 135852897781568 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0423 09:49:47.180639 135852897781568 pyconfig.py:471] Config param chips_per_vm: 4
I0423 09:49:47.180656 135852897781568 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0423 09:49:47.180673 135852897781568 pyconfig.py:471] Config param collect_stack_trace: False
I0423 09:49:47.180689 135852897781568 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0423 09:49:47.180704 135852897781568 pyconfig.py:471] Config param colocated_python_data_input: False
I0423 09:49:47.180720 135852897781568 pyconfig.py:471] Config param compile_topology: 
I0423 09:49:47.180734 135852897781568 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0423 09:49:47.180749 135852897781568 pyconfig.py:471] Config param compile_xla_flags: 
I0423 09:49:47.180765 135852897781568 pyconfig.py:471] Config param compiled_trainstep_file: 
I0423 09:49:47.180779 135852897781568 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0423 09:49:47.180795 135852897781568 pyconfig.py:471] Config param constant_bound_config: []
I0423 09:49:47.180810 135852897781568 pyconfig.py:471] Config param context: RematLocation.REMAT
I0423 09:49:47.180826 135852897781568 pyconfig.py:471] Config param context_parallel_load_balance: True
I0423 09:49:47.180842 135852897781568 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0423 09:49:47.180860 135852897781568 pyconfig.py:471] Config param context_parallel_size: 1
I0423 09:49:47.180876 135852897781568 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0423 09:49:47.180890 135852897781568 pyconfig.py:471] Config param context_sharding: context
I0423 09:49:47.180906 135852897781568 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0423 09:49:47.180920 135852897781568 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0423 09:49:47.180936 135852897781568 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0423 09:49:47.180950 135852897781568 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0423 09:49:47.180966 135852897781568 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0423 09:49:47.180981 135852897781568 pyconfig.py:471] Config param custom_mesh: 
I0423 09:49:47.180996 135852897781568 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0423 09:49:47.181011 135852897781568 pyconfig.py:471] Config param d_model_for_audio: 256
I0423 09:49:47.181025 135852897781568 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0423 09:49:47.181045 135852897781568 pyconfig.py:471] Config param data_shuffle_seed: 0
I0423 09:49:47.181060 135852897781568 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0423 09:49:47.181075 135852897781568 pyconfig.py:471] Config param dataset_path: 
I0423 09:49:47.181090 135852897781568 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0423 09:49:47.181143 135852897781568 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0423 09:49:47.181157 135852897781568 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0423 09:49:47.181173 135852897781568 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0423 09:49:47.181189 135852897781568 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0423 09:49:47.181204 135852897781568 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0423 09:49:47.181219 135852897781568 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0423 09:49:47.181234 135852897781568 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0423 09:49:47.181250 135852897781568 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0423 09:49:47.181266 135852897781568 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 09:49:47.181282 135852897781568 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0423 09:49:47.181297 135852897781568 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0423 09:49:47.181312 135852897781568 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0423 09:49:47.181328 135852897781568 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0423 09:49:47.181348 135852897781568 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0423 09:49:47.181362 135852897781568 pyconfig.py:471] Config param debug: {'rl': False}
I0423 09:49:47.181378 135852897781568 pyconfig.py:471] Config param debug_sharding: False
I0423 09:49:47.181393 135852897781568 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0423 09:49:47.181408 135852897781568 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0423 09:49:47.181427 135852897781568 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0423 09:49:47.181442 135852897781568 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0423 09:49:47.181458 135852897781568 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0423 09:49:47.181476 135852897781568 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0423 09:49:47.181492 135852897781568 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0423 09:49:47.181508 135852897781568 pyconfig.py:471] Config param degenerate_group_masking: True
I0423 09:49:47.181524 135852897781568 pyconfig.py:471] Config param dense_init_scale: 1.0
I0423 09:49:47.181538 135852897781568 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0423 09:49:47.181554 135852897781568 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0423 09:49:47.181570 135852897781568 pyconfig.py:471] Config param diloco_sync_period: 36
I0423 09:49:47.181585 135852897781568 pyconfig.py:471] Config param distill_alpha: 0.5
I0423 09:49:47.181601 135852897781568 pyconfig.py:471] Config param distill_alpha_end: None
I0423 09:49:47.181616 135852897781568 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0423 09:49:47.181631 135852897781568 pyconfig.py:471] Config param distill_beta: 0.0
I0423 09:49:47.181647 135852897781568 pyconfig.py:471] Config param distill_beta_end: None
I0423 09:49:47.181661 135852897781568 pyconfig.py:471] Config param distill_beta_schedule: constant
I0423 09:49:47.181676 135852897781568 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0423 09:49:47.181692 135852897781568 pyconfig.py:471] Config param distill_layer_indices: None
I0423 09:49:47.181708 135852897781568 pyconfig.py:471] Config param distill_temperature: 1.0
I0423 09:49:47.181722 135852897781568 pyconfig.py:471] Config param distill_temperature_end: None
I0423 09:49:47.181738 135852897781568 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0423 09:49:47.181753 135852897781568 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0423 09:49:47.181769 135852897781568 pyconfig.py:471] Config param dpo_beta: 0.1
I0423 09:49:47.181784 135852897781568 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0423 09:49:47.181800 135852897781568 pyconfig.py:471] Config param dq_reduction_steps: 0
I0423 09:49:47.181814 135852897781568 pyconfig.py:471] Config param dropout_rate: 0.0
I0423 09:49:47.181829 135852897781568 pyconfig.py:471] Config param dtype: bfloat16
I0423 09:49:47.181859 135852897781568 pyconfig.py:471] Config param dtype_mm: float32
I0423 09:49:47.181875 135852897781568 pyconfig.py:471] Config param dump_hlo: False
I0423 09:49:47.181891 135852897781568 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0423 09:49:47.181907 135852897781568 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/xla_dump
I0423 09:49:47.181923 135852897781568 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0423 09:49:47.181938 135852897781568 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0423 09:49:47.181954 135852897781568 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0423 09:49:47.181970 135852897781568 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0423 09:49:47.181985 135852897781568 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0423 09:49:47.181999 135852897781568 pyconfig.py:471] Config param dump_jaxpr: False
I0423 09:49:47.182015 135852897781568 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0423 09:49:47.182029 135852897781568 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/jaxpr_dump
I0423 09:49:47.182045 135852897781568 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0423 09:49:47.182059 135852897781568 pyconfig.py:471] Config param dump_step: -1
I0423 09:49:47.182075 135852897781568 pyconfig.py:471] Config param elastic_enabled: False
I0423 09:49:47.182090 135852897781568 pyconfig.py:471] Config param elastic_max_retries: 10
I0423 09:49:47.182115 135852897781568 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0423 09:49:47.182131 135852897781568 pyconfig.py:471] Config param emb_dim: 16
I0423 09:49:47.182145 135852897781568 pyconfig.py:471] Config param enable_autocheckpoint: False
I0423 09:49:47.182161 135852897781568 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0423 09:49:47.182177 135852897781568 pyconfig.py:471] Config param enable_checkpointing: True
I0423 09:49:47.182191 135852897781568 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0423 09:49:47.182206 135852897781568 pyconfig.py:471] Config param enable_data_shuffling: True
I0423 09:49:47.182222 135852897781568 pyconfig.py:471] Config param enable_diloco: False
I0423 09:49:47.182237 135852897781568 pyconfig.py:471] Config param enable_dp_attention: False
I0423 09:49:47.182252 135852897781568 pyconfig.py:471] Config param enable_dropout: False
I0423 09:49:47.182268 135852897781568 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0423 09:49:47.182282 135852897781568 pyconfig.py:471] Config param enable_expert_parallel: False
I0423 09:49:47.182298 135852897781568 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0423 09:49:47.182312 135852897781568 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0423 09:49:47.182328 135852897781568 pyconfig.py:471] Config param enable_goodput_recording: False
I0423 09:49:47.182347 135852897781568 pyconfig.py:471] Config param enable_jax_profiler: False
I0423 09:49:47.182362 135852897781568 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0423 09:49:47.182378 135852897781568 pyconfig.py:471] Config param enable_model_warmup: False
I0423 09:49:47.182393 135852897781568 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0423 09:49:47.182408 135852897781568 pyconfig.py:471] Config param enable_nnx: False
I0423 09:49:47.182424 135852897781568 pyconfig.py:471] Config param enable_orbax_v1: False
I0423 09:49:47.182440 135852897781568 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0423 09:49:47.182454 135852897781568 pyconfig.py:471] Config param enable_pathways_goodput: False
I0423 09:49:47.182469 135852897781568 pyconfig.py:471] Config param enable_prefix_caching: False
I0423 09:49:47.182485 135852897781568 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0423 09:49:47.182499 135852897781568 pyconfig.py:471] Config param enable_single_controller: False
I0423 09:49:47.182514 135852897781568 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0423 09:49:47.182531 135852897781568 pyconfig.py:471] Config param enable_tensorboard: True
I0423 09:49:47.182545 135852897781568 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0423 09:49:47.182560 135852897781568 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0423 09:49:47.182576 135852897781568 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0423 09:49:47.182591 135852897781568 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0423 09:49:47.182607 135852897781568 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0423 09:49:47.182623 135852897781568 pyconfig.py:471] Config param engram_head_dim: 1280
I0423 09:49:47.182637 135852897781568 pyconfig.py:471] Config param engram_kernel_size: 4
I0423 09:49:47.182653 135852897781568 pyconfig.py:471] Config param engram_layers: []
I0423 09:49:47.182669 135852897781568 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0423 09:49:47.182684 135852897781568 pyconfig.py:471] Config param engram_num_heads: 8
I0423 09:49:47.182699 135852897781568 pyconfig.py:471] Config param engram_seed: 0
I0423 09:49:47.182714 135852897781568 pyconfig.py:471] Config param engram_vocab_bases: []
I0423 09:49:47.182728 135852897781568 pyconfig.py:471] Config param epsilon_high: None
I0423 09:49:47.182743 135852897781568 pyconfig.py:471] Config param eval_corr_lst: False
I0423 09:49:47.182759 135852897781568 pyconfig.py:471] Config param eval_data_columns: ['text']
I0423 09:49:47.182775 135852897781568 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0423 09:49:47.182790 135852897781568 pyconfig.py:471] Config param eval_image_column: image
I0423 09:49:47.182806 135852897781568 pyconfig.py:471] Config param eval_interval: -1
I0423 09:49:47.182821 135852897781568 pyconfig.py:471] Config param eval_make_lst: False
I0423 09:49:47.182836 135852897781568 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0423 09:49:47.182851 135852897781568 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0423 09:49:47.182867 135852897781568 pyconfig.py:471] Config param eval_split: validation
I0423 09:49:47.182882 135852897781568 pyconfig.py:471] Config param eval_steps: -1
I0423 09:49:47.182898 135852897781568 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0423 09:49:47.182914 135852897781568 pyconfig.py:471] Config param final_logits_soft_cap: None
I0423 09:49:47.182928 135852897781568 pyconfig.py:471] Config param first_num_dense_layers: 0
I0423 09:49:47.182944 135852897781568 pyconfig.py:471] Config param float32_gate_logits: False
I0423 09:49:47.182958 135852897781568 pyconfig.py:471] Config param float32_logits: False
I0423 09:49:47.182974 135852897781568 pyconfig.py:471] Config param float32_qk_product: False
I0423 09:49:47.182990 135852897781568 pyconfig.py:471] Config param float32_weight_sum: True
I0423 09:49:47.183004 135852897781568 pyconfig.py:471] Config param force_q_layout: False
I0423 09:49:47.183020 135852897781568 pyconfig.py:471] Config param force_unroll: False
I0423 09:49:47.183037 135852897781568 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0423 09:49:47.183053 135852897781568 pyconfig.py:471] Config param formatting_func_path: 
I0423 09:49:47.183067 135852897781568 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0423 09:49:47.183083 135852897781568 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0423 09:49:47.183108 135852897781568 pyconfig.py:471] Config param fused_mlp: False
I0423 09:49:47.183123 135852897781568 pyconfig.py:471] Config param fused_qkv: True
I0423 09:49:47.183139 135852897781568 pyconfig.py:471] Config param gcs_metrics: False
I0423 09:49:47.183154 135852897781568 pyconfig.py:471] Config param gdn_chunk_size: 64
I0423 09:49:47.183170 135852897781568 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0423 09:49:47.183185 135852897781568 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0423 09:49:47.183201 135852897781568 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0423 09:49:47.183216 135852897781568 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0423 09:49:47.183232 135852897781568 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0423 09:49:47.183247 135852897781568 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0423 09:49:47.183262 135852897781568 pyconfig.py:471] Config param generate_padding_batch_train: False
I0423 09:49:47.183277 135852897781568 pyconfig.py:471] Config param generate_slice: v5e-16
I0423 09:49:47.183291 135852897781568 pyconfig.py:471] Config param generation_configs: {}
I0423 09:49:47.183306 135852897781568 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0423 09:49:47.183322 135852897781568 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0423 09:49:47.183341 135852897781568 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0423 09:49:47.183357 135852897781568 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0423 09:49:47.183371 135852897781568 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0423 09:49:47.183387 135852897781568 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0423 09:49:47.183403 135852897781568 pyconfig.py:471] Config param global_head_dim: 0
I0423 09:49:47.183418 135852897781568 pyconfig.py:471] Config param global_num_kv_heads: 0
I0423 09:49:47.183434 135852897781568 pyconfig.py:471] Config param global_parameter_scale: 1
I0423 09:49:47.183449 135852897781568 pyconfig.py:471] Config param global_rampup_samples: 500
I0423 09:49:47.183464 135852897781568 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0423 09:49:47.183480 135852897781568 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0423 09:49:47.183496 135852897781568 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0423 09:49:47.183511 135852897781568 pyconfig.py:471] Config param grad_dtype: float32
I0423 09:49:47.183546 135852897781568 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0423 09:49:47.183562 135852897781568 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0423 09:49:47.183578 135852897781568 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0423 09:49:47.183594 135852897781568 pyconfig.py:471] Config param grain_eval_files: 
I0423 09:49:47.183610 135852897781568 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0423 09:49:47.183626 135852897781568 pyconfig.py:471] Config param grain_num_threads: 16
I0423 09:49:47.183641 135852897781568 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0423 09:49:47.183657 135852897781568 pyconfig.py:471] Config param grain_packing_type: first_fit
I0423 09:49:47.183671 135852897781568 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0423 09:49:47.183687 135852897781568 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0423 09:49:47.183703 135852897781568 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0423 09:49:47.183718 135852897781568 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0423 09:49:47.183734 135852897781568 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0423 09:49:47.183748 135852897781568 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0423 09:49:47.183764 135852897781568 pyconfig.py:471] Config param grain_train_files: 
I0423 09:49:47.183778 135852897781568 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0423 09:49:47.183794 135852897781568 pyconfig.py:471] Config param grain_worker_count: 1
I0423 09:49:47.183809 135852897781568 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0423 09:49:47.183824 135852897781568 pyconfig.py:471] Config param grpo_beta: 0.08
I0423 09:49:47.183840 135852897781568 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0423 09:49:47.183854 135852897781568 pyconfig.py:471] Config param hardware: tpu
I0423 09:49:47.183870 135852897781568 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0423 09:49:47.183885 135852897781568 pyconfig.py:471] Config param head_dim: 8
I0423 09:49:47.183900 135852897781568 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0423 09:49:47.183916 135852897781568 pyconfig.py:471] Config param hf_data_dir: None
I0423 09:49:47.183932 135852897781568 pyconfig.py:471] Config param hf_eval_files: None
I0423 09:49:47.183948 135852897781568 pyconfig.py:471] Config param hf_eval_split: None
I0423 09:49:47.183964 135852897781568 pyconfig.py:471] Config param hf_name: None
I0423 09:49:47.183979 135852897781568 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0423 09:49:47.183994 135852897781568 pyconfig.py:471] Config param hf_train_files: None
I0423 09:49:47.184010 135852897781568 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0423 09:49:47.184024 135852897781568 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0423 09:49:47.184040 135852897781568 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0423 09:49:47.184054 135852897781568 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0423 09:49:47.184070 135852897781568 pyconfig.py:471] Config param ici_context_parallelism: 1
I0423 09:49:47.184085 135852897781568 pyconfig.py:471] Config param ici_data_parallelism: 1
I0423 09:49:47.184110 135852897781568 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0423 09:49:47.184126 135852897781568 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0423 09:49:47.184141 135852897781568 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0423 09:49:47.184157 135852897781568 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0423 09:49:47.184171 135852897781568 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 09:49:47.184188 135852897781568 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0423 09:49:47.184202 135852897781568 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0423 09:49:47.184217 135852897781568 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0423 09:49:47.184231 135852897781568 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0423 09:49:47.184247 135852897781568 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0423 09:49:47.184262 135852897781568 pyconfig.py:471] Config param image_path: 
I0423 09:49:47.184278 135852897781568 pyconfig.py:471] Config param image_placeholder: <|image|>
I0423 09:49:47.184293 135852897781568 pyconfig.py:471] Config param image_size_for_vit: 896
I0423 09:49:47.184309 135852897781568 pyconfig.py:471] Config param indexer_head_dim: 128
I0423 09:49:47.184323 135852897781568 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0423 09:49:47.184343 135852897781568 pyconfig.py:471] Config param indexer_n_heads: 64
I0423 09:49:47.184357 135852897781568 pyconfig.py:471] Config param indexer_sparse_training: False
I0423 09:49:47.184372 135852897781568 pyconfig.py:471] Config param indexer_topk: 2048
I0423 09:49:47.184386 135852897781568 pyconfig.py:471] Config param inference_benchmark_test: False
I0423 09:49:47.184402 135852897781568 pyconfig.py:471] Config param inference_metadata_file: 
I0423 09:49:47.184417 135852897781568 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0423 09:49:47.184432 135852897781568 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0423 09:49:47.184447 135852897781568 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0423 09:49:47.184463 135852897781568 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0423 09:49:47.184478 135852897781568 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0423 09:49:47.184493 135852897781568 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0423 09:49:47.184509 135852897781568 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0423 09:49:47.184523 135852897781568 pyconfig.py:471] Config param init_weights_seed: 0
I0423 09:49:47.184539 135852897781568 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0423 09:49:47.184555 135852897781568 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0423 09:49:47.184571 135852897781568 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0423 09:49:47.184587 135852897781568 pyconfig.py:471] Config param internal_compile: False
I0423 09:49:47.184603 135852897781568 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0423 09:49:47.184619 135852897781568 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0423 09:49:47.184635 135852897781568 pyconfig.py:471] Config param jax_debug_log_modules: 
I0423 09:49:47.184649 135852897781568 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0423 09:49:47.184665 135852897781568 pyconfig.py:471] Config param jax_profiler_port: 9999
I0423 09:49:47.184679 135852897781568 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0423 09:49:47.184696 135852897781568 pyconfig.py:471] Config param kv_cache_buffer: 256
I0423 09:49:47.184711 135852897781568 pyconfig.py:471] Config param kv_lora_rank: 512
I0423 09:49:47.184727 135852897781568 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0423 09:49:47.184743 135852897781568 pyconfig.py:471] Config param kv_quant_dtype: int8
I0423 09:49:47.184759 135852897781568 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0423 09:49:47.184775 135852897781568 pyconfig.py:471] Config param learning_rate: 0.0002
I0423 09:49:47.184791 135852897781568 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0423 09:49:47.184806 135852897781568 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0423 09:49:47.184822 135852897781568 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0423 09:49:47.184838 135852897781568 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0423 09:49:47.184852 135852897781568 pyconfig.py:471] Config param load_from_prefill_dir: False
I0423 09:49:47.184867 135852897781568 pyconfig.py:471] Config param load_full_state_path: 
I0423 09:49:47.184882 135852897781568 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 09:49:47.184898 135852897781568 pyconfig.py:471] Config param local_checkpoint_directory: 
I0423 09:49:47.184914 135852897781568 pyconfig.py:471] Config param local_checkpoint_period: 0
I0423 09:49:47.184929 135852897781568 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0423 09:49:47.184944 135852897781568 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0423 09:49:47.184960 135852897781568 pyconfig.py:471] Config param log_config: True
I0423 09:49:47.184975 135852897781568 pyconfig.py:471] Config param log_period: 10
I0423 09:49:47.184991 135852897781568 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0423 09:49:47.185062 135852897781568 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0423 09:49:47.185077 135852897781568 pyconfig.py:471] Config param logits_via_embedding: True
I0423 09:49:47.185101 135852897781568 pyconfig.py:471] Config param lora_input_adapters_path: 
I0423 09:49:47.185117 135852897781568 pyconfig.py:471] Config param loss_algo: grpo
I0423 09:49:47.185133 135852897781568 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0423 09:49:47.185151 135852897781568 pyconfig.py:471] Config param managed_mldiagnostics: False
I0423 09:49:47.185167 135852897781568 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/managed-mldiagnostics
I0423 09:49:47.185181 135852897781568 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0423 09:49:47.185197 135852897781568 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0423 09:49:47.185214 135852897781568 pyconfig.py:471] Config param max_checkify: False
I0423 09:49:47.185229 135852897781568 pyconfig.py:471] Config param max_concurrency: 256
I0423 09:49:47.185245 135852897781568 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0423 09:49:47.185260 135852897781568 pyconfig.py:471] Config param max_num_batched_tokens: None
I0423 09:49:47.185275 135852897781568 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0423 09:49:47.185290 135852897781568 pyconfig.py:471] Config param max_num_images_per_example: -1
I0423 09:49:47.185306 135852897781568 pyconfig.py:471] Config param max_num_seqs: None
I0423 09:49:47.185322 135852897781568 pyconfig.py:471] Config param max_position_embeddings: 163840
I0423 09:49:47.185341 135852897781568 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0423 09:49:47.185357 135852897781568 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0423 09:49:47.185372 135852897781568 pyconfig.py:471] Config param max_segments_per_seq: -1
I0423 09:49:47.185387 135852897781568 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0423 09:49:47.185402 135852897781568 pyconfig.py:471] Config param max_target_length: 2048
I0423 09:49:47.185417 135852897781568 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0423 09:49:47.185433 135852897781568 pyconfig.py:471] Config param megablox: True
I0423 09:49:47.185449 135852897781568 pyconfig.py:471] Config param merge_gating_gmm: False
I0423 09:49:47.185465 135852897781568 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0423 09:49:47.185483 135852897781568 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/metrics/
I0423 09:49:47.185497 135852897781568 pyconfig.py:471] Config param metrics_file: 
I0423 09:49:47.185513 135852897781568 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0423 09:49:47.185527 135852897781568 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0423 09:49:47.185543 135852897781568 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0423 09:49:47.185557 135852897781568 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0423 09:49:47.185573 135852897781568 pyconfig.py:471] Config param mla_naive_kvcache: True
I0423 09:49:47.185587 135852897781568 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0423 09:49:47.185603 135852897781568 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0423 09:49:47.185618 135852897781568 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0423 09:49:47.185634 135852897781568 pyconfig.py:471] Config param mlp_bias: False
I0423 09:49:47.185648 135852897781568 pyconfig.py:471] Config param mlp_dim: 64
I0423 09:49:47.185663 135852897781568 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0423 09:49:47.185678 135852897781568 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0423 09:49:47.185693 135852897781568 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0423 09:49:47.185708 135852897781568 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0423 09:49:47.185722 135852897781568 pyconfig.py:471] Config param moba: False
I0423 09:49:47.185738 135852897781568 pyconfig.py:471] Config param moba_chunk_size: 1024
I0423 09:49:47.185753 135852897781568 pyconfig.py:471] Config param moba_topk: 8
I0423 09:49:47.185767 135852897781568 pyconfig.py:471] Config param model_call_mode: 
I0423 09:49:47.185783 135852897781568 pyconfig.py:471] Config param model_name: gpt3-52k
I0423 09:49:47.185798 135852897781568 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0423 09:49:47.185813 135852897781568 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0423 09:49:47.185829 135852897781568 pyconfig.py:471] Config param moe_mlp_dim: -1
I0423 09:49:47.185844 135852897781568 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0423 09:49:47.185859 135852897781568 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0423 09:49:47.185874 135852897781568 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0423 09:49:47.185889 135852897781568 pyconfig.py:471] Config param monitor_goodput: False
I0423 09:49:47.185903 135852897781568 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0423 09:49:47.185918 135852897781568 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0423 09:49:47.185934 135852897781568 pyconfig.py:471] Config param mscale: 1.0
I0423 09:49:47.185950 135852897781568 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0423 09:49:47.185965 135852897781568 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0423 09:49:47.185981 135852897781568 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0423 09:49:47.185996 135852897781568 pyconfig.py:471] Config param mtp_num_layers: 0
I0423 09:49:47.186012 135852897781568 pyconfig.py:471] Config param mu_dtype: float32
I0423 09:49:47.186035 135852897781568 pyconfig.py:471] Config param multi_sampling: False
I0423 09:49:47.186052 135852897781568 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0423 09:49:47.186068 135852897781568 pyconfig.py:471] Config param muon_beta: 0.95
I0423 09:49:47.186083 135852897781568 pyconfig.py:471] Config param muon_consistent_rms: None
I0423 09:49:47.186108 135852897781568 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0423 09:49:47.186124 135852897781568 pyconfig.py:471] Config param n_routing_groups: -1
I0423 09:49:47.186138 135852897781568 pyconfig.py:471] Config param n_window_for_audio: 50
I0423 09:49:47.186154 135852897781568 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0423 09:49:47.186168 135852897781568 pyconfig.py:471] Config param nope_layer_interval: -1
I0423 09:49:47.186184 135852897781568 pyconfig.py:471] Config param norm_topk_prob: False
I0423 09:49:47.186198 135852897781568 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0423 09:49:47.186215 135852897781568 pyconfig.py:471] Config param normalize_embedding_logits: False
I0423 09:49:47.186230 135852897781568 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0423 09:49:47.186245 135852897781568 pyconfig.py:471] Config param num_batches: 4
I0423 09:49:47.186259 135852897781568 pyconfig.py:471] Config param num_channels_for_vit: 3
I0423 09:49:47.186275 135852897781568 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0423 09:49:47.186289 135852897781568 pyconfig.py:471] Config param num_decoder_layers: 1
I0423 09:49:47.186305 135852897781568 pyconfig.py:471] Config param num_diloco_replicas: 1
I0423 09:49:47.186319 135852897781568 pyconfig.py:471] Config param num_epoch: 1
I0423 09:49:47.186339 135852897781568 pyconfig.py:471] Config param num_eval_passes: 1
I0423 09:49:47.186353 135852897781568 pyconfig.py:471] Config param num_experts: 1
I0423 09:49:47.186369 135852897781568 pyconfig.py:471] Config param num_experts_per_tok: 1
I0423 09:49:47.186383 135852897781568 pyconfig.py:471] Config param num_generations: 2
I0423 09:49:47.186399 135852897781568 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0423 09:49:47.186413 135852897781568 pyconfig.py:471] Config param num_iterations: 1
I0423 09:49:47.186429 135852897781568 pyconfig.py:471] Config param num_kv_heads: 2
I0423 09:49:47.186443 135852897781568 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0423 09:49:47.186458 135852897781568 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0423 09:49:47.186474 135852897781568 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0423 09:49:47.186489 135852897781568 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0423 09:49:47.186504 135852897781568 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0423 09:49:47.186520 135852897781568 pyconfig.py:471] Config param num_query_heads: 2
I0423 09:49:47.186534 135852897781568 pyconfig.py:471] Config param num_samplers_slices: -1
I0423 09:49:47.186549 135852897781568 pyconfig.py:471] Config param num_slices: 1
I0423 09:49:47.186563 135852897781568 pyconfig.py:471] Config param num_target_devices: 32
I0423 09:49:47.186578 135852897781568 pyconfig.py:471] Config param num_test_batches: 5
I0423 09:49:47.186593 135852897781568 pyconfig.py:471] Config param num_trainer_slices: -1
I0423 09:49:47.186609 135852897781568 pyconfig.py:471] Config param num_vocab_tiling: 1
I0423 09:49:47.186623 135852897781568 pyconfig.py:471] Config param off_policy_steps: 0
I0423 09:49:47.186639 135852897781568 pyconfig.py:471] Config param offline_data_dir: None
I0423 09:49:47.186654 135852897781568 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0423 09:49:47.186672 135852897781568 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0423 09:49:47.186687 135852897781568 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0423 09:49:47.186703 135852897781568 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0423 09:49:47.186719 135852897781568 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0423 09:49:47.186733 135852897781568 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0423 09:49:47.186750 135852897781568 pyconfig.py:471] Config param output_dim_for_audio: 512
I0423 09:49:47.186766 135852897781568 pyconfig.py:471] Config param override_logical_axis_rules: False
I0423 09:49:47.186781 135852897781568 pyconfig.py:471] Config param override_model_config: True
I0423 09:49:47.186795 135852897781568 pyconfig.py:471] Config param packing: True
I0423 09:49:47.186811 135852897781568 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0423 09:49:47.186825 135852897781568 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0423 09:49:47.186841 135852897781568 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0423 09:49:47.186856 135852897781568 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0423 09:49:47.186872 135852897781568 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0423 09:49:47.186886 135852897781568 pyconfig.py:471] Config param param_scan_axis: 1
I0423 09:49:47.186900 135852897781568 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0423 09:49:47.186916 135852897781568 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0423 09:49:47.186930 135852897781568 pyconfig.py:471] Config param patch_size_for_vit: 14
I0423 09:49:47.186946 135852897781568 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0423 09:49:47.186963 135852897781568 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0423 09:49:47.186979 135852897781568 pyconfig.py:471] Config param per_device_batch_size: 2
I0423 09:49:47.186995 135852897781568 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0423 09:49:47.187010 135852897781568 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0423 09:49:47.187025 135852897781568 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0423 09:49:47.187040 135852897781568 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0423 09:49:47.187055 135852897781568 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0423 09:49:47.187071 135852897781568 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0423 09:49:47.187086 135852897781568 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0423 09:49:47.187113 135852897781568 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0423 09:49:47.187127 135852897781568 pyconfig.py:471] Config param position_id_per_seconds: 25
I0423 09:49:47.187143 135852897781568 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0423 09:49:47.187157 135852897781568 pyconfig.py:471] Config param prefill_cache_dir: 
I0423 09:49:47.187172 135852897781568 pyconfig.py:471] Config param prefill_chunk_size: 256
I0423 09:49:47.187187 135852897781568 pyconfig.py:471] Config param prefill_slice: v5e-16
I0423 09:49:47.187202 135852897781568 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0423 09:49:47.187216 135852897781568 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0423 09:49:47.187232 135852897781568 pyconfig.py:471] Config param prefuse_moe_weights: False
I0423 09:49:47.187246 135852897781568 pyconfig.py:471] Config param profile_cleanly: True
I0423 09:49:47.187262 135852897781568 pyconfig.py:471] Config param profile_periodically_period: -1
I0423 09:49:47.187277 135852897781568 pyconfig.py:471] Config param profile_power_events: False
I0423 09:49:47.187293 135852897781568 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0423 09:49:47.187309 135852897781568 pyconfig.py:471] Config param profiler_steps: 5
I0423 09:49:47.187325 135852897781568 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0423 09:49:47.187344 135852897781568 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0423 09:49:47.187359 135852897781568 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0423 09:49:47.187374 135852897781568 pyconfig.py:471] Config param prometheus_port: 0
I0423 09:49:47.187389 135852897781568 pyconfig.py:471] Config param prompt: I love to
I0423 09:49:47.187404 135852897781568 pyconfig.py:471] Config param pure_nnx: False
I0423 09:49:47.187420 135852897781568 pyconfig.py:471] Config param pure_nnx_decoder: False
I0423 09:49:47.187435 135852897781568 pyconfig.py:471] Config param q_lora_rank: 0
I0423 09:49:47.187450 135852897781568 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0423 09:49:47.187467 135852897781568 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0423 09:49:47.187482 135852897781568 pyconfig.py:471] Config param qk_norm_with_scale: True
I0423 09:49:47.187498 135852897781568 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0423 09:49:47.187514 135852897781568 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0423 09:49:47.187530 135852897781568 pyconfig.py:471] Config param quant_cfg_path: 
I0423 09:49:47.187546 135852897781568 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0423 09:49:47.187564 135852897781568 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0423 09:49:47.187578 135852897781568 pyconfig.py:471] Config param quantize_kvcache: False
I0423 09:49:47.187593 135852897781568 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0423 09:49:47.187610 135852897781568 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0423 09:49:47.187625 135852897781568 pyconfig.py:471] Config param ragged_block_size: 256
I0423 09:49:47.187641 135852897781568 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0423 09:49:47.187655 135852897781568 pyconfig.py:471] Config param rampup_end_step: 0
I0423 09:49:47.187671 135852897781568 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0423 09:49:47.187685 135852897781568 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0423 09:49:47.187701 135852897781568 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0423 09:49:47.187717 135852897781568 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0423 09:49:47.187731 135852897781568 pyconfig.py:471] Config param remat_policy: full
I0423 09:49:47.187747 135852897781568 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0423 09:49:47.187763 135852897781568 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0423 09:49:47.187779 135852897781568 pyconfig.py:471] Config param replicate_quant_scale: False
I0423 09:49:47.187794 135852897781568 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0423 09:49:47.187809 135852897781568 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0423 09:49:47.187824 135852897781568 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0423 09:49:47.187839 135852897781568 pyconfig.py:471] Config param reshape_q: False
I0423 09:49:47.187854 135852897781568 pyconfig.py:471] Config param return_log_prob: False
I0423 09:49:47.187869 135852897781568 pyconfig.py:471] Config param reuse_example_batch: 0
I0423 09:49:47.187884 135852897781568 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0423 09:49:47.187900 135852897781568 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0423 09:49:47.187914 135852897781568 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0423 09:49:47.187930 135852897781568 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0423 09:49:47.187946 135852897781568 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0423 09:49:47.187961 135852897781568 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0423 09:49:47.187977 135852897781568 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0423 09:49:47.187997 135852897781568 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0423 09:49:47.188014 135852897781568 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0423 09:49:47.188030 135852897781568 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0423 09:49:47.188046 135852897781568 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0423 09:49:47.188061 135852897781568 pyconfig.py:471] Config param rope_attention_scaling: False
I0423 09:49:47.188076 135852897781568 pyconfig.py:471] Config param rope_factor: 40
I0423 09:49:47.188091 135852897781568 pyconfig.py:471] Config param rope_interleave: True
I0423 09:49:47.188118 135852897781568 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0423 09:49:47.188134 135852897781568 pyconfig.py:471] Config param rope_max_timescale: 10000
I0423 09:49:47.188149 135852897781568 pyconfig.py:471] Config param rope_min_timescale: 1
I0423 09:49:47.188164 135852897781568 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0423 09:49:47.188180 135852897781568 pyconfig.py:471] Config param rope_truncate: True
I0423 09:49:47.188194 135852897781568 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0423 09:49:47.188212 135852897781568 pyconfig.py:471] Config param rope_use_scale: True
I0423 09:49:47.188227 135852897781568 pyconfig.py:471] Config param routed_bias: False
I0423 09:49:47.188242 135852897781568 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0423 09:49:47.188258 135852897781568 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0423 09:49:47.188274 135852897781568 pyconfig.py:471] Config param routed_score_func: 
I0423 09:49:47.188288 135852897781568 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-09-49
I0423 09:49:47.188304 135852897781568 pyconfig.py:471] Config param sa_block_kv: 512
I0423 09:49:47.188320 135852897781568 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0423 09:49:47.188338 135852897781568 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0423 09:49:47.188353 135852897781568 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0423 09:49:47.188369 135852897781568 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0423 09:49:47.188383 135852897781568 pyconfig.py:471] Config param sa_block_q: 512
I0423 09:49:47.188398 135852897781568 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0423 09:49:47.188413 135852897781568 pyconfig.py:471] Config param sa_block_q_dq: 512
I0423 09:49:47.188429 135852897781568 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0423 09:49:47.188443 135852897781568 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0423 09:49:47.188458 135852897781568 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0423 09:49:47.188472 135852897781568 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0423 09:49:47.188488 135852897781568 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0423 09:49:47.188505 135852897781568 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0423 09:49:47.188520 135852897781568 pyconfig.py:471] Config param save_config_to_gcs: False
I0423 09:49:47.188536 135852897781568 pyconfig.py:471] Config param save_quantized_params_path: 
I0423 09:49:47.188551 135852897781568 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0423 09:49:47.188566 135852897781568 pyconfig.py:471] Config param scan_layers: True
I0423 09:49:47.188581 135852897781568 pyconfig.py:471] Config param scan_layers_per_stage: False
I0423 09:49:47.188596 135852897781568 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0423 09:49:47.188611 135852897781568 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0423 09:49:47.188627 135852897781568 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0423 09:49:47.188641 135852897781568 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0423 09:49:47.188656 135852897781568 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0423 09:49:47.188672 135852897781568 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0423 09:49:47.188686 135852897781568 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0423 09:49:47.188703 135852897781568 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0423 09:49:47.188717 135852897781568 pyconfig.py:471] Config param sharding_strategy: None
I0423 09:49:47.188732 135852897781568 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0423 09:49:47.188747 135852897781568 pyconfig.py:471] Config param shardy: True
I0423 09:49:47.188763 135852897781568 pyconfig.py:471] Config param share_kv_projections: False
I0423 09:49:47.188778 135852897781568 pyconfig.py:471] Config param shared_experts: 0
I0423 09:49:47.188794 135852897781568 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0423 09:49:47.188809 135852897781568 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0423 09:49:47.188824 135852897781568 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0423 09:49:47.188839 135852897781568 pyconfig.py:471] Config param skip_step_interval: 128
I0423 09:49:47.188855 135852897781568 pyconfig.py:471] Config param skip_step_on_spikes: False
I0423 09:49:47.188871 135852897781568 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0423 09:49:47.188886 135852897781568 pyconfig.py:471] Config param sliding_window_size: 0
I0423 09:49:47.188900 135852897781568 pyconfig.py:471] Config param solution_end_token: </answer>
I0423 09:49:47.188916 135852897781568 pyconfig.py:471] Config param solution_start_token: <answer>
I0423 09:49:47.188932 135852897781568 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0423 09:49:47.188946 135852897781568 pyconfig.py:471] Config param sparse_matmul: True
I0423 09:49:47.188961 135852897781568 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0423 09:49:47.188977 135852897781568 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0423 09:49:47.188992 135852897781568 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0423 09:49:47.189007 135852897781568 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0423 09:49:47.189023 135852897781568 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0423 09:49:47.189038 135852897781568 pyconfig.py:471] Config param steps: 200000
I0423 09:49:47.189054 135852897781568 pyconfig.py:471] Config param stop_strings: None
I0423 09:49:47.189069 135852897781568 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0423 09:49:47.189104 135852897781568 pyconfig.py:471] Config param student_params_to_update: None
I0423 09:49:47.189122 135852897781568 pyconfig.py:471] Config param subslice_shape: 
I0423 09:49:47.189137 135852897781568 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0423 09:49:47.189152 135852897781568 pyconfig.py:471] Config param system_prompt: 
I0423 09:49:47.189166 135852897781568 pyconfig.py:471] Config param target_eval_loss: 0.0
I0423 09:49:47.189182 135852897781568 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0423 09:49:47.189198 135852897781568 pyconfig.py:471] Config param temperature_tuning: False
I0423 09:49:47.189212 135852897781568 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0423 09:49:47.189228 135852897781568 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-09-49/tensorboard/
I0423 09:49:47.189242 135852897781568 pyconfig.py:471] Config param tensors_on_device: None
I0423 09:49:47.189258 135852897781568 pyconfig.py:471] Config param tensors_to_offload: None
I0423 09:49:47.189273 135852897781568 pyconfig.py:471] Config param test_batch_start_index: 0
I0423 09:49:47.189289 135852897781568 pyconfig.py:471] Config param tile_size_for_vit: 336
I0423 09:49:47.189304 135852897781568 pyconfig.py:471] Config param tokenize_eval_data: True
I0423 09:49:47.189320 135852897781568 pyconfig.py:471] Config param tokenize_train_data: True
I0423 09:49:47.189337 135852897781568 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0423 09:49:47.189353 135852897781568 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0423 09:49:47.189370 135852897781568 pyconfig.py:471] Config param topk_routing_group: -1
I0423 09:49:47.189384 135852897781568 pyconfig.py:471] Config param train_data_columns: ['text']
I0423 09:49:47.189399 135852897781568 pyconfig.py:471] Config param train_fraction: 1.0
I0423 09:49:47.189414 135852897781568 pyconfig.py:471] Config param train_image_column: image
I0423 09:49:47.189430 135852897781568 pyconfig.py:471] Config param train_micro_batch_size: -1
I0423 09:49:47.189446 135852897781568 pyconfig.py:471] Config param train_split: train
I0423 09:49:47.189462 135852897781568 pyconfig.py:471] Config param trainable_parameters_mask: []
I0423 09:49:47.189478 135852897781568 pyconfig.py:471] Config param trainable_position_size: 2048
I0423 09:49:47.189492 135852897781568 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0423 09:49:47.189508 135852897781568 pyconfig.py:471] Config param upload_all_profiler_results: False
I0423 09:49:47.189522 135852897781568 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0423 09:49:47.189538 135852897781568 pyconfig.py:471] Config param use_agentic_rollout: False
I0423 09:49:47.189552 135852897781568 pyconfig.py:471] Config param use_audio: False
I0423 09:49:47.189568 135852897781568 pyconfig.py:471] Config param use_audio_in_video: False
I0423 09:49:47.189582 135852897781568 pyconfig.py:471] Config param use_batch_split_schedule: False
I0423 09:49:47.189597 135852897781568 pyconfig.py:471] Config param use_chat_template: False
I0423 09:49:47.189611 135852897781568 pyconfig.py:471] Config param use_chunked_prefill: False
I0423 09:49:47.189627 135852897781568 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0423 09:49:47.189642 135852897781568 pyconfig.py:471] Config param use_dpo: False
I0423 09:49:47.189656 135852897781568 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0423 09:49:47.189672 135852897781568 pyconfig.py:471] Config param use_grpo: True
I0423 09:49:47.189686 135852897781568 pyconfig.py:471] Config param use_indexer: False
I0423 09:49:47.189702 135852897781568 pyconfig.py:471] Config param use_iota_embed: True
I0423 09:49:47.189716 135852897781568 pyconfig.py:471] Config param use_jax_splash: False
I0423 09:49:47.189732 135852897781568 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0423 09:49:47.189746 135852897781568 pyconfig.py:471] Config param use_mrope: False
I0423 09:49:47.189762 135852897781568 pyconfig.py:471] Config param use_multimodal: False
I0423 09:49:47.189776 135852897781568 pyconfig.py:471] Config param use_pathways: True
I0423 09:49:47.189791 135852897781568 pyconfig.py:471] Config param use_post_attn_norm: False
I0423 09:49:47.189807 135852897781568 pyconfig.py:471] Config param use_post_ffw_norm: False
I0423 09:49:47.189823 135852897781568 pyconfig.py:471] Config param use_qk_clip: False
I0423 09:49:47.189838 135852897781568 pyconfig.py:471] Config param use_qk_norm: False
I0423 09:49:47.189852 135852897781568 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0423 09:49:47.189867 135852897781568 pyconfig.py:471] Config param use_qwix_quantization: False
I0423 09:49:47.189881 135852897781568 pyconfig.py:471] Config param use_ragged_attention: False
I0423 09:49:47.189897 135852897781568 pyconfig.py:471] Config param use_random_routing: False
I0423 09:49:47.189913 135852897781568 pyconfig.py:471] Config param use_replicator_service: False
I0423 09:49:47.189927 135852897781568 pyconfig.py:471] Config param use_ring_of_experts: False
I0423 09:49:47.189942 135852897781568 pyconfig.py:471] Config param use_sft: False
I0423 09:49:47.189956 135852897781568 pyconfig.py:471] Config param use_splash_scheduler: False
I0423 09:49:47.189972 135852897781568 pyconfig.py:471] Config param use_tokamax_gmm: False
I0423 09:49:47.189987 135852897781568 pyconfig.py:471] Config param use_tokamax_splash: False
I0423 09:49:47.190001 135852897781568 pyconfig.py:471] Config param use_truncation: True
I0423 09:49:47.190016 135852897781568 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0423 09:49:47.190030 135852897781568 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0423 09:49:47.190045 135852897781568 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0423 09:49:47.190061 135852897781568 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0423 09:49:47.190077 135852897781568 pyconfig.py:471] Config param v_head_dim: 128
I0423 09:49:47.190091 135852897781568 pyconfig.py:471] Config param v_norm_with_scale: True
I0423 09:49:47.190115 135852897781568 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0423 09:49:47.190131 135852897781568 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0423 09:49:47.190145 135852897781568 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0423 09:49:47.190161 135852897781568 pyconfig.py:471] Config param video_path: 
I0423 09:49:47.190176 135852897781568 pyconfig.py:471] Config param video_placeholder: <|video|>
I0423 09:49:47.190191 135852897781568 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0423 09:49:47.190207 135852897781568 pyconfig.py:471] Config param vision_output_length: -1
I0423 09:49:47.190221 135852897781568 pyconfig.py:471] Config param vllm_additional_config: {}
I0423 09:49:47.190237 135852897781568 pyconfig.py:471] Config param vllm_hf_config_path: 
I0423 09:49:47.190252 135852897781568 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0423 09:49:47.190268 135852897781568 pyconfig.py:471] Config param vocab_size: 32000
I0423 09:49:47.190282 135852897781568 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0423 09:49:47.190298 135852897781568 pyconfig.py:471] Config param weight_dtype: float32
I0423 09:49:47.190320 135852897781568 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0423 09:49:47.190338 135852897781568 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0423 09:49:47.190353 135852897781568 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0423 09:49:47.190367 135852897781568 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0423 09:49:47.190383 135852897781568 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0423 09:49:47.190397 135852897781568 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0423 09:49:47.190412 135852897781568 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0423 09:49:47.190428 135852897781568 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0423 09:49:47.190443 135852897781568 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0423 09:49:47.190458 135852897781568 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0423 09:49:47.190473 135852897781568 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0423 09:49:47.190489 135852897781568 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0423 09:49:47.190504 135852897781568 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0423 09:49:47.190520 135852897781568 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0423 09:49:47.190534 135852897781568 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0423 09:49:47.190550 135852897781568 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0423 09:49:47.190565 135852897781568 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0423 09:49:47.190580 135852897781568 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0423 09:49:47.190594 135852897781568 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0423 09:49:47.190610 135852897781568 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0423 09:49:47.190625 135852897781568 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0423 09:49:47.190642 135852897781568 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0423 09:49:47.190657 135852897781568 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0423 09:49:47.190672 135852897781568 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0423 09:49:47.190686 135852897781568 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0423 09:49:47.190704 135852897781568 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0423 09:49:47.191015 135852897781568 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0423 09:49:47.191051 135852897781568 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0423 09:49:50.846153 135852897781568 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0423 09:49:50.849241 135852897781568 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 09:49:50.849375 135852897781568 train_distill.py:580] Applying logical axis rules for model initialization and training...
I0423 09:49:50.849448 135852897781568 train_distill.py:584] Loading Student from ...
I0423 09:49:50.849476 135852897781568 train_distill.py:168] --- Student Configuration ---
I0423 09:49:50.849497 135852897781568 train_distill.py:169]   Model Name:      gpt3-52k
I0423 09:49:50.849518 135852897781568 train_distill.py:170]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 09:49:50.849535 135852897781568 train_distill.py:173]   Attention Heads: 2 Query, 2 KV
I0423 09:49:50.849554 135852897781568 train_distill.py:174]   Vocab Size:      32000
I0423 09:49:50.849570 135852897781568 train_distill.py:175]   Checkpoint:      
I0423 09:49:50.849586 135852897781568 train_distill.py:449] Initializing model: gpt3-52k...
I0423 09:49:52.217488 135852897781568 train_distill.py:598] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0423 09:49:52.217599 135852897781568 train_distill.py:168] --- Teacher Configuration ---
I0423 09:49:52.217628 135852897781568 train_distill.py:169]   Model Name:      gpt3-52k
I0423 09:49:52.217651 135852897781568 train_distill.py:170]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0423 09:49:52.217672 135852897781568 train_distill.py:173]   Attention Heads: 2 Query, 2 KV
I0423 09:49:52.217690 135852897781568 train_distill.py:174]   Vocab Size:      32000
I0423 09:49:52.217708 135852897781568 train_distill.py:175]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 09:49:52.217725 135852897781568 train_distill.py:449] Initializing model: gpt3-52k...
I0423 09:49:53.244612 135852897781568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 09:49:53.245041 135852897781568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8e0038e1b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 09:49:53.245121 135852897781568 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0423 09:49:53.754223 135852897781568 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0423 09:49:54.272962    2125 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0423 09:49:55.399883 135852897781568 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0423 09:49:58.410290 135852897781568 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0423 09:49:58.410676 135852897781568 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0423 09:50:01.711861 135852897781568 checkpointer.py:318] Finished restoring checkpoint in 6.69 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0423 09:50:02.408264 135852897781568 train_distill.py:624] Initializing Data Iterators via MaxText pipeline...
I0423 09:50:02.471661 135852897781568 config.py:112] TensorFlow version 2.20.0 available.
I0423 09:50:02.472182 135852897781568 config.py:125] JAX version 0.8.3 available.
E0423 09:50:04.511296 135852897781568 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0423 09:50:04.511516 135852897781568 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0423 09:50:04.514687 135852897781568 train_distill.py:394] Input Pipeline Checkpointing: DISABLED
I0423 09:50:04.514757 135852897781568 train_distill.py:398] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0423 09:50:04.514822 135852897781568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 09:50:04.514902 135852897781568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8e0038e1b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 09:50:04.514944 135852897781568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 09:50:04.514976 135852897781568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8e0038e1b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 09:50:04.515035 135852897781568 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77257b8410>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77268e5c40>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724ed71a0>}, handler_registry=None
I0423 09:50:04.515274 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77257b8410>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 09:50:04.515329 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77268e5c40>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 09:50:04.515366 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724ed71a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 09:50:04.515403 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b77255c55b0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 09:50:04.515444 135852897781568 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77257b8410>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77257b8410>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77268e5c40>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b77268e5c40>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724ed71a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724ed71a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b77255c55b0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b77255c55b0>}).
I0423 09:50:04.515907 135852897781568 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b7724e5d9e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 09:50:07.404761 135852897781568 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints
I0423 09:50:07.406962 135852897781568 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b7724ed7170>
I0423 09:50:07.407073 135852897781568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 09:50:07.407151 135852897781568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8e0038e1b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 09:50:07.407188 135852897781568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0423 09:50:07.407219 135852897781568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8e0038e1b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 09:50:07.407253 135852897781568 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 09:50:07.407305 135852897781568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135852897781568 count=1 at 0x7b7724f00a80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b7724ed6f60>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b7724ed6f30>, _write_futures=[])
I0423 09:50:07.407658 135852897781568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135852897781568 count=1 at 0x7b7724f00a80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b7724ed6f60>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b7724ed6f30>, _write_futures=[])
I0423 09:50:07.407685 135852897781568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135852897781568 count=1 at 0x7b7724f00a80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b7724ed6f60>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b7724ed6f30>, _write_futures=[])
I0423 09:50:07.407716 135852897781568 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724ed7140>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724cd24e0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd2600>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b7724cd1a90>}, handler_registry=None
I0423 09:50:07.407809 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724ed7140>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 09:50:07.407843 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724cd24e0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 09:50:07.407866 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd2600>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 09:50:07.407892 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b7724cd1a90>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0423 09:50:07.407914 135852897781568 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd1550>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 09:50:07.407937 135852897781568 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724ed7140>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724ed7140>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724cd24e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b7724cd24e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd2600>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd2600>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b7724cd1a90>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b7724cd1a90>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd1550>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b7724cd1550>}).
I0423 09:50:07.408006 135852897781568 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b7724e5db20> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0423 09:50:08.214781 135852897781568 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints
I0423 09:50:08.653808 135852897781568 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b7724ed6600>
I0423 09:50:08.654454 135852897781568 train_distill.py:675] Starting Distillation Training...
I0423 09:50:08.654569 135852897781568 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0423 09:50:09.456498 135852897781568 peft_trainer.py:600] Compiled train_step cache size: 0

Training:   0%|          | 0/5 [00:00<?, ?step/s]
I0423 09:50:09.458325 135709281216256 grain_pool.py:367] Grain pool will use 1 processes.
I0423 09:50:09.485351 135709281216256 grain_pool.py:440] Grain pool will start child processes.
I0423 09:50:09.490514 135709281216256 grain_pool.py:448] Grain pool started all child processes.
2026-04-23 09:50:15.498312: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0423 09:50:18.896497 135852897781568 utils.py:86] Train loop finished in: 9.4394 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 749, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 745, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 677, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0)}
I0423 09:50:19.237888 135709281216256 grain_pool.py:542] Grain pool is exiting.
I0423 09:50:19.238008 135709281216256 grain_pool.py:547] Shutting down multiprocessing system.
I0423 09:50:20.702233 135709281216256 grain_pool.py:547] Shutting down multiprocessing system.

Training:   0%|          | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Thu Apr 23 09:50:28 UTC 2026
EXIT_CODE=1