Log Summary

XPK Start: Fri Apr 24 20:19:23 UTC 2026
2026-04-24 20:19:40.482239: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
I0424 20:19:44.456224 138012459587392 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-24 20:19:53,493:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0424 20:19:53.493682 138012459587392 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-24 20:19:53,498:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-5bwv2-slice-job-0-0.mt-07-distill-smoke-5bwv2:8482
I0424 20:19:53.498256 138012459587392 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-5bwv2-slice-job-0-0.mt-07-distill-smoke-5bwv2:8482
I0424 20:19:54.962059 138012459587392 max_utils.py:284] Jax distributed system initialized!
I0424 20:20:00.476881 138012459587392 max_utils.py:244] Jax distributed system is already initialized.
W0424 20:20:00.607621 138012459587392 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 20:20:00.667216 138012459587392 max_utils.py:244] Jax distributed system is already initialized.
I0424 20:20:00.668395 138012459587392 pyconfig.py:471] Config param abort_on_inf_loss: True
I0424 20:20:00.668441 138012459587392 pyconfig.py:471] Config param abort_on_nan_loss: True
I0424 20:20:00.668468 138012459587392 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0424 20:20:00.668490 138012459587392 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0424 20:20:00.668514 138012459587392 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0424 20:20:00.668533 138012459587392 pyconfig.py:471] Config param activations_in_float32: False
I0424 20:20:00.668549 138012459587392 pyconfig.py:471] Config param adam_b1: 0.9
I0424 20:20:00.668567 138012459587392 pyconfig.py:471] Config param adam_b2: 0.95
I0424 20:20:00.668583 138012459587392 pyconfig.py:471] Config param adam_eps: 1e-08
I0424 20:20:00.668605 138012459587392 pyconfig.py:471] Config param adam_eps_root: 0.0
I0424 20:20:00.668621 138012459587392 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0424 20:20:00.668640 138012459587392 pyconfig.py:471] Config param adamw_mask: []
I0424 20:20:00.668658 138012459587392 pyconfig.py:471] Config param add_bos: True
I0424 20:20:00.668675 138012459587392 pyconfig.py:471] Config param add_eos: True
I0424 20:20:00.668691 138012459587392 pyconfig.py:471] Config param allow_split_physical_axes: False
I0424 20:20:00.668708 138012459587392 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0424 20:20:00.668725 138012459587392 pyconfig.py:471] Config param async_checkpointing: True
I0424 20:20:00.668741 138012459587392 pyconfig.py:471] Config param async_scheduling: False
I0424 20:20:00.668756 138012459587392 pyconfig.py:471] Config param attention: dot_product
I0424 20:20:00.668773 138012459587392 pyconfig.py:471] Config param attention_bias: False
I0424 20:20:00.668790 138012459587392 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0424 20:20:00.668806 138012459587392 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0424 20:20:00.668827 138012459587392 pyconfig.py:471] Config param attention_output_dim: -1
I0424 20:20:00.668843 138012459587392 pyconfig.py:471] Config param attention_sink: False
I0424 20:20:00.668860 138012459587392 pyconfig.py:471] Config param attention_type: global
I0424 20:20:00.668875 138012459587392 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0424 20:20:00.668892 138012459587392 pyconfig.py:471] Config param audio_path: 
I0424 20:20:00.668909 138012459587392 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0424 20:20:00.668924 138012459587392 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0424 20:20:00.668940 138012459587392 pyconfig.py:471] Config param base_config: base.yml
I0424 20:20:00.668956 138012459587392 pyconfig.py:471] Config param base_emb_dim: 16
I0424 20:20:00.668973 138012459587392 pyconfig.py:471] Config param base_mlp_dim: 64
I0424 20:20:00.668989 138012459587392 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0424 20:20:00.669004 138012459587392 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0424 20:20:00.669020 138012459587392 pyconfig.py:471] Config param base_num_kv_heads: 2
I0424 20:20:00.669036 138012459587392 pyconfig.py:471] Config param base_num_query_heads: 2
I0424 20:20:00.669053 138012459587392 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0424 20:20:00.669069 138012459587392 pyconfig.py:471] Config param batch_size: 1
I0424 20:20:00.669086 138012459587392 pyconfig.py:471] Config param batch_split_factor: 1
I0424 20:20:00.669120 138012459587392 pyconfig.py:471] Config param beta_fast: 32
I0424 20:20:00.669136 138012459587392 pyconfig.py:471] Config param beta_slow: 1
I0424 20:20:00.669152 138012459587392 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0424 20:20:00.669170 138012459587392 pyconfig.py:471] Config param capacity_factor: -1.0
I0424 20:20:00.669187 138012459587392 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0424 20:20:00.669203 138012459587392 pyconfig.py:471] Config param chat_template: 
I0424 20:20:00.669220 138012459587392 pyconfig.py:471] Config param chat_template_path: 
I0424 20:20:00.669241 138012459587392 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0424 20:20:00.669259 138012459587392 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/checkpoints/
I0424 20:20:00.669276 138012459587392 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0424 20:20:00.669293 138012459587392 pyconfig.py:471] Config param checkpoint_period: 2000
I0424 20:20:00.669310 138012459587392 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0424 20:20:00.669342 138012459587392 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0424 20:20:00.669360 138012459587392 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0424 20:20:00.669382 138012459587392 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0424 20:20:00.669407 138012459587392 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0424 20:20:00.669433 138012459587392 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0424 20:20:00.669451 138012459587392 pyconfig.py:471] Config param chips_per_vm: 4
I0424 20:20:00.669466 138012459587392 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0424 20:20:00.669482 138012459587392 pyconfig.py:471] Config param collect_stack_trace: False
I0424 20:20:00.669497 138012459587392 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0424 20:20:00.669513 138012459587392 pyconfig.py:471] Config param colocated_python_data_input: False
I0424 20:20:00.669528 138012459587392 pyconfig.py:471] Config param compile_topology: 
I0424 20:20:00.669542 138012459587392 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0424 20:20:00.669558 138012459587392 pyconfig.py:471] Config param compile_xla_flags: 
I0424 20:20:00.669573 138012459587392 pyconfig.py:471] Config param compiled_trainstep_file: 
I0424 20:20:00.669589 138012459587392 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0424 20:20:00.669606 138012459587392 pyconfig.py:471] Config param constant_bound_config: []
I0424 20:20:00.669620 138012459587392 pyconfig.py:471] Config param context: RematLocation.REMAT
I0424 20:20:00.669635 138012459587392 pyconfig.py:471] Config param context_parallel_load_balance: True
I0424 20:20:00.669650 138012459587392 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0424 20:20:00.669666 138012459587392 pyconfig.py:471] Config param context_parallel_size: 1
I0424 20:20:00.669682 138012459587392 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0424 20:20:00.669698 138012459587392 pyconfig.py:471] Config param context_sharding: context
I0424 20:20:00.669713 138012459587392 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0424 20:20:00.669728 138012459587392 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0424 20:20:00.669742 138012459587392 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0424 20:20:00.669756 138012459587392 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0424 20:20:00.669772 138012459587392 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0424 20:20:00.669786 138012459587392 pyconfig.py:471] Config param custom_mesh: 
I0424 20:20:00.669800 138012459587392 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0424 20:20:00.669814 138012459587392 pyconfig.py:471] Config param d_model_for_audio: 256
I0424 20:20:00.669828 138012459587392 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0424 20:20:00.669846 138012459587392 pyconfig.py:471] Config param data_shuffle_seed: 0
I0424 20:20:00.669861 138012459587392 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0424 20:20:00.669877 138012459587392 pyconfig.py:471] Config param dataset_path: 
I0424 20:20:00.669891 138012459587392 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0424 20:20:00.669908 138012459587392 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0424 20:20:00.669923 138012459587392 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0424 20:20:00.669939 138012459587392 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0424 20:20:00.669953 138012459587392 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0424 20:20:00.669969 138012459587392 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0424 20:20:00.669984 138012459587392 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0424 20:20:00.669999 138012459587392 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0424 20:20:00.670013 138012459587392 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0424 20:20:00.670030 138012459587392 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 20:20:00.670047 138012459587392 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0424 20:20:00.670063 138012459587392 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0424 20:20:00.670078 138012459587392 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0424 20:20:00.670102 138012459587392 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0424 20:20:00.670116 138012459587392 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0424 20:20:00.670132 138012459587392 pyconfig.py:471] Config param debug: {'rl': False}
I0424 20:20:00.670148 138012459587392 pyconfig.py:471] Config param debug_sharding: False
I0424 20:20:00.670167 138012459587392 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0424 20:20:00.670183 138012459587392 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0424 20:20:00.670201 138012459587392 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0424 20:20:00.670217 138012459587392 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0424 20:20:00.670236 138012459587392 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0424 20:20:00.670253 138012459587392 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0424 20:20:00.670270 138012459587392 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0424 20:20:00.670286 138012459587392 pyconfig.py:471] Config param degenerate_group_masking: True
I0424 20:20:00.670300 138012459587392 pyconfig.py:471] Config param dense_init_scale: 1.0
I0424 20:20:00.670317 138012459587392 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0424 20:20:00.670333 138012459587392 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0424 20:20:00.670348 138012459587392 pyconfig.py:471] Config param diloco_sync_period: 36
I0424 20:20:00.670364 138012459587392 pyconfig.py:471] Config param distill_alpha: 0.5
I0424 20:20:00.670379 138012459587392 pyconfig.py:471] Config param distill_alpha_end: None
I0424 20:20:00.670393 138012459587392 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0424 20:20:00.670409 138012459587392 pyconfig.py:471] Config param distill_beta: 0.0
I0424 20:20:00.670425 138012459587392 pyconfig.py:471] Config param distill_beta_end: None
I0424 20:20:00.670439 138012459587392 pyconfig.py:471] Config param distill_beta_schedule: constant
I0424 20:20:00.670455 138012459587392 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0424 20:20:00.670469 138012459587392 pyconfig.py:471] Config param distill_layer_indices: None
I0424 20:20:00.670484 138012459587392 pyconfig.py:471] Config param distill_temperature: 1.0
I0424 20:20:00.670499 138012459587392 pyconfig.py:471] Config param distill_temperature_end: None
I0424 20:20:00.670514 138012459587392 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0424 20:20:00.670529 138012459587392 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0424 20:20:00.670544 138012459587392 pyconfig.py:471] Config param dpo_beta: 0.1
I0424 20:20:00.670559 138012459587392 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0424 20:20:00.670575 138012459587392 pyconfig.py:471] Config param dq_reduction_steps: 0
I0424 20:20:00.670590 138012459587392 pyconfig.py:471] Config param dropout_rate: 0.0
I0424 20:20:00.670604 138012459587392 pyconfig.py:471] Config param dtype: bfloat16
I0424 20:20:00.670635 138012459587392 pyconfig.py:471] Config param dtype_mm: float32
I0424 20:20:00.670651 138012459587392 pyconfig.py:471] Config param dump_hlo: False
I0424 20:20:00.670668 138012459587392 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0424 20:20:00.670683 138012459587392 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/xla_dump
I0424 20:20:00.670700 138012459587392 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0424 20:20:00.670715 138012459587392 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0424 20:20:00.670731 138012459587392 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0424 20:20:00.670747 138012459587392 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0424 20:20:00.670762 138012459587392 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0424 20:20:00.670777 138012459587392 pyconfig.py:471] Config param dump_jaxpr: False
I0424 20:20:00.670792 138012459587392 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0424 20:20:00.670808 138012459587392 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/jaxpr_dump
I0424 20:20:00.670824 138012459587392 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0424 20:20:00.670840 138012459587392 pyconfig.py:471] Config param dump_step: -1
I0424 20:20:00.670854 138012459587392 pyconfig.py:471] Config param elastic_enabled: False
I0424 20:20:00.670870 138012459587392 pyconfig.py:471] Config param elastic_max_retries: 10
I0424 20:20:00.670886 138012459587392 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0424 20:20:00.670902 138012459587392 pyconfig.py:471] Config param emb_dim: 16
I0424 20:20:00.670917 138012459587392 pyconfig.py:471] Config param enable_autocheckpoint: False
I0424 20:20:00.670933 138012459587392 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0424 20:20:00.670948 138012459587392 pyconfig.py:471] Config param enable_checkpointing: True
I0424 20:20:00.670964 138012459587392 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0424 20:20:00.670978 138012459587392 pyconfig.py:471] Config param enable_data_shuffling: True
I0424 20:20:00.670992 138012459587392 pyconfig.py:471] Config param enable_diloco: False
I0424 20:20:00.671008 138012459587392 pyconfig.py:471] Config param enable_dp_attention: False
I0424 20:20:00.671022 138012459587392 pyconfig.py:471] Config param enable_dropout: False
I0424 20:20:00.671037 138012459587392 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0424 20:20:00.671052 138012459587392 pyconfig.py:471] Config param enable_expert_parallel: False
I0424 20:20:00.671067 138012459587392 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0424 20:20:00.671082 138012459587392 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0424 20:20:00.671107 138012459587392 pyconfig.py:471] Config param enable_goodput_recording: False
I0424 20:20:00.671123 138012459587392 pyconfig.py:471] Config param enable_jax_profiler: False
I0424 20:20:00.671139 138012459587392 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0424 20:20:00.671153 138012459587392 pyconfig.py:471] Config param enable_model_warmup: False
I0424 20:20:00.671169 138012459587392 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0424 20:20:00.671185 138012459587392 pyconfig.py:471] Config param enable_nnx: False
I0424 20:20:00.671200 138012459587392 pyconfig.py:471] Config param enable_orbax_v1: False
I0424 20:20:00.671215 138012459587392 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0424 20:20:00.671235 138012459587392 pyconfig.py:471] Config param enable_pathways_goodput: False
I0424 20:20:00.671249 138012459587392 pyconfig.py:471] Config param enable_prefix_caching: False
I0424 20:20:00.671265 138012459587392 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0424 20:20:00.671281 138012459587392 pyconfig.py:471] Config param enable_single_controller: False
I0424 20:20:00.671295 138012459587392 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0424 20:20:00.671310 138012459587392 pyconfig.py:471] Config param enable_tensorboard: True
I0424 20:20:00.671326 138012459587392 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0424 20:20:00.671342 138012459587392 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0424 20:20:00.671356 138012459587392 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0424 20:20:00.671370 138012459587392 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0424 20:20:00.671385 138012459587392 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0424 20:20:00.671399 138012459587392 pyconfig.py:471] Config param engram_head_dim: 1280
I0424 20:20:00.671413 138012459587392 pyconfig.py:471] Config param engram_kernel_size: 4
I0424 20:20:00.671430 138012459587392 pyconfig.py:471] Config param engram_layers: []
I0424 20:20:00.671444 138012459587392 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0424 20:20:00.671461 138012459587392 pyconfig.py:471] Config param engram_num_heads: 8
I0424 20:20:00.671477 138012459587392 pyconfig.py:471] Config param engram_seed: 0
I0424 20:20:00.671491 138012459587392 pyconfig.py:471] Config param engram_vocab_bases: []
I0424 20:20:00.671506 138012459587392 pyconfig.py:471] Config param epsilon_high: None
I0424 20:20:00.671520 138012459587392 pyconfig.py:471] Config param eval_corr_lst: False
I0424 20:20:00.671534 138012459587392 pyconfig.py:471] Config param eval_data_columns: ['text']
I0424 20:20:00.671550 138012459587392 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0424 20:20:00.671564 138012459587392 pyconfig.py:471] Config param eval_image_column: image
I0424 20:20:00.671580 138012459587392 pyconfig.py:471] Config param eval_interval: -1
I0424 20:20:00.671595 138012459587392 pyconfig.py:471] Config param eval_make_lst: False
I0424 20:20:00.671611 138012459587392 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0424 20:20:00.671627 138012459587392 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0424 20:20:00.671642 138012459587392 pyconfig.py:471] Config param eval_split: validation
I0424 20:20:00.671656 138012459587392 pyconfig.py:471] Config param eval_steps: -1
I0424 20:20:00.671670 138012459587392 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0424 20:20:00.671686 138012459587392 pyconfig.py:471] Config param final_logits_soft_cap: None
I0424 20:20:00.671700 138012459587392 pyconfig.py:471] Config param first_num_dense_layers: 0
I0424 20:20:00.671716 138012459587392 pyconfig.py:471] Config param float32_gate_logits: False
I0424 20:20:00.671730 138012459587392 pyconfig.py:471] Config param float32_logits: False
I0424 20:20:00.671746 138012459587392 pyconfig.py:471] Config param float32_qk_product: False
I0424 20:20:00.671763 138012459587392 pyconfig.py:471] Config param float32_weight_sum: True
I0424 20:20:00.671777 138012459587392 pyconfig.py:471] Config param force_q_layout: False
I0424 20:20:00.671793 138012459587392 pyconfig.py:471] Config param force_unroll: False
I0424 20:20:00.671810 138012459587392 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0424 20:20:00.671824 138012459587392 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0424 20:20:00.671840 138012459587392 pyconfig.py:471] Config param fused_mlp: False
I0424 20:20:00.671854 138012459587392 pyconfig.py:471] Config param fused_qkv: True
I0424 20:20:00.671869 138012459587392 pyconfig.py:471] Config param gcs_metrics: False
I0424 20:20:00.671885 138012459587392 pyconfig.py:471] Config param gdn_chunk_size: 64
I0424 20:20:00.671899 138012459587392 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0424 20:20:00.671915 138012459587392 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0424 20:20:00.671930 138012459587392 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0424 20:20:00.671945 138012459587392 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0424 20:20:00.671961 138012459587392 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0424 20:20:00.671976 138012459587392 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0424 20:20:00.671991 138012459587392 pyconfig.py:471] Config param generate_padding_batch_train: False
I0424 20:20:00.672007 138012459587392 pyconfig.py:471] Config param generate_slice: v5e-16
I0424 20:20:00.672022 138012459587392 pyconfig.py:471] Config param generation_configs: {}
I0424 20:20:00.672038 138012459587392 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0424 20:20:00.672054 138012459587392 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0424 20:20:00.672070 138012459587392 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0424 20:20:00.672084 138012459587392 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0424 20:20:00.672106 138012459587392 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0424 20:20:00.672121 138012459587392 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0424 20:20:00.672137 138012459587392 pyconfig.py:471] Config param global_head_dim: 0
I0424 20:20:00.672153 138012459587392 pyconfig.py:471] Config param global_num_kv_heads: 0
I0424 20:20:00.672168 138012459587392 pyconfig.py:471] Config param global_parameter_scale: 1
I0424 20:20:00.672184 138012459587392 pyconfig.py:471] Config param global_rampup_samples: 500
I0424 20:20:00.672200 138012459587392 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0424 20:20:00.672215 138012459587392 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0424 20:20:00.672236 138012459587392 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0424 20:20:00.672251 138012459587392 pyconfig.py:471] Config param grad_dtype: float32
I0424 20:20:00.672287 138012459587392 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0424 20:20:00.672303 138012459587392 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0424 20:20:00.672319 138012459587392 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0424 20:20:00.672334 138012459587392 pyconfig.py:471] Config param grain_eval_files: 
I0424 20:20:00.672348 138012459587392 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0424 20:20:00.672369 138012459587392 pyconfig.py:471] Config param grain_num_threads: 16
I0424 20:20:00.672383 138012459587392 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0424 20:20:00.672397 138012459587392 pyconfig.py:471] Config param grain_packing_type: first_fit
I0424 20:20:00.672414 138012459587392 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0424 20:20:00.672428 138012459587392 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0424 20:20:00.672444 138012459587392 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0424 20:20:00.672458 138012459587392 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0424 20:20:00.672474 138012459587392 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0424 20:20:00.672488 138012459587392 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0424 20:20:00.672502 138012459587392 pyconfig.py:471] Config param grain_train_files: 
I0424 20:20:00.672518 138012459587392 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0424 20:20:00.672533 138012459587392 pyconfig.py:471] Config param grain_worker_count: 1
I0424 20:20:00.672549 138012459587392 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0424 20:20:00.672563 138012459587392 pyconfig.py:471] Config param grpo_beta: 0.08
I0424 20:20:00.672580 138012459587392 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0424 20:20:00.672595 138012459587392 pyconfig.py:471] Config param hardware: tpu
I0424 20:20:00.672610 138012459587392 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0424 20:20:00.672627 138012459587392 pyconfig.py:471] Config param head_dim: 8
I0424 20:20:00.672643 138012459587392 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0424 20:20:00.672657 138012459587392 pyconfig.py:471] Config param hf_data_dir: None
I0424 20:20:00.672673 138012459587392 pyconfig.py:471] Config param hf_eval_files: None
I0424 20:20:00.672687 138012459587392 pyconfig.py:471] Config param hf_eval_split: None
I0424 20:20:00.672703 138012459587392 pyconfig.py:471] Config param hf_name: None
I0424 20:20:00.672717 138012459587392 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0424 20:20:00.672733 138012459587392 pyconfig.py:471] Config param hf_train_files: None
I0424 20:20:00.672749 138012459587392 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0424 20:20:00.672765 138012459587392 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0424 20:20:00.672782 138012459587392 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0424 20:20:00.672796 138012459587392 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0424 20:20:00.672811 138012459587392 pyconfig.py:471] Config param ici_context_parallelism: 1
I0424 20:20:00.672827 138012459587392 pyconfig.py:471] Config param ici_data_parallelism: 1
I0424 20:20:00.672841 138012459587392 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0424 20:20:00.672857 138012459587392 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0424 20:20:00.672871 138012459587392 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0424 20:20:00.672887 138012459587392 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0424 20:20:00.672903 138012459587392 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 20:20:00.672918 138012459587392 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0424 20:20:00.672934 138012459587392 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0424 20:20:00.672948 138012459587392 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0424 20:20:00.672964 138012459587392 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0424 20:20:00.672980 138012459587392 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0424 20:20:00.672994 138012459587392 pyconfig.py:471] Config param image_path: 
I0424 20:20:00.673009 138012459587392 pyconfig.py:471] Config param image_placeholder: <|image|>
I0424 20:20:00.673024 138012459587392 pyconfig.py:471] Config param image_size_for_vit: 896
I0424 20:20:00.673040 138012459587392 pyconfig.py:471] Config param indexer_head_dim: 128
I0424 20:20:00.673054 138012459587392 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0424 20:20:00.673070 138012459587392 pyconfig.py:471] Config param indexer_n_heads: 64
I0424 20:20:00.673085 138012459587392 pyconfig.py:471] Config param indexer_sparse_training: False
I0424 20:20:00.673110 138012459587392 pyconfig.py:471] Config param indexer_topk: 2048
I0424 20:20:00.673125 138012459587392 pyconfig.py:471] Config param inference_benchmark_test: False
I0424 20:20:00.673141 138012459587392 pyconfig.py:471] Config param inference_metadata_file: 
I0424 20:20:00.673156 138012459587392 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0424 20:20:00.673171 138012459587392 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0424 20:20:00.673187 138012459587392 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0424 20:20:00.673203 138012459587392 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0424 20:20:00.673217 138012459587392 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0424 20:20:00.673237 138012459587392 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0424 20:20:00.673251 138012459587392 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0424 20:20:00.673266 138012459587392 pyconfig.py:471] Config param init_weights_seed: 0
I0424 20:20:00.673282 138012459587392 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0424 20:20:00.673298 138012459587392 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0424 20:20:00.673313 138012459587392 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0424 20:20:00.673329 138012459587392 pyconfig.py:471] Config param internal_compile: False
I0424 20:20:00.673344 138012459587392 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0424 20:20:00.673359 138012459587392 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0424 20:20:00.673375 138012459587392 pyconfig.py:471] Config param jax_debug_log_modules: 
I0424 20:20:00.673391 138012459587392 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0424 20:20:00.673406 138012459587392 pyconfig.py:471] Config param jax_profiler_port: 9999
I0424 20:20:00.673422 138012459587392 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0424 20:20:00.673437 138012459587392 pyconfig.py:471] Config param kv_cache_buffer: 256
I0424 20:20:00.673453 138012459587392 pyconfig.py:471] Config param kv_lora_rank: 512
I0424 20:20:00.673468 138012459587392 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0424 20:20:00.673486 138012459587392 pyconfig.py:471] Config param kv_quant_dtype: int8
I0424 20:20:00.673501 138012459587392 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0424 20:20:00.673516 138012459587392 pyconfig.py:471] Config param learning_rate: 0.0002
I0424 20:20:00.673532 138012459587392 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0424 20:20:00.673548 138012459587392 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0424 20:20:00.673562 138012459587392 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0424 20:20:00.673578 138012459587392 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0424 20:20:00.673593 138012459587392 pyconfig.py:471] Config param load_from_prefill_dir: False
I0424 20:20:00.673609 138012459587392 pyconfig.py:471] Config param load_full_state_path: 
I0424 20:20:00.673624 138012459587392 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 20:20:00.673640 138012459587392 pyconfig.py:471] Config param local_checkpoint_directory: 
I0424 20:20:00.673655 138012459587392 pyconfig.py:471] Config param local_checkpoint_period: 0
I0424 20:20:00.673671 138012459587392 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0424 20:20:00.673686 138012459587392 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0424 20:20:00.673701 138012459587392 pyconfig.py:471] Config param log_config: True
I0424 20:20:00.673716 138012459587392 pyconfig.py:471] Config param log_period: 10
I0424 20:20:00.673730 138012459587392 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0424 20:20:00.673811 138012459587392 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0424 20:20:00.673832 138012459587392 pyconfig.py:471] Config param logits_via_embedding: True
I0424 20:20:00.673848 138012459587392 pyconfig.py:471] Config param lora_input_adapters_path: 
I0424 20:20:00.673862 138012459587392 pyconfig.py:471] Config param loss_algo: grpo
I0424 20:20:00.673877 138012459587392 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0424 20:20:00.673897 138012459587392 pyconfig.py:471] Config param managed_mldiagnostics: False
I0424 20:20:00.673912 138012459587392 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/managed-mldiagnostics
I0424 20:20:00.673926 138012459587392 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0424 20:20:00.673942 138012459587392 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0424 20:20:00.673959 138012459587392 pyconfig.py:471] Config param max_checkify: False
I0424 20:20:00.673974 138012459587392 pyconfig.py:471] Config param max_concurrency: 256
I0424 20:20:00.673988 138012459587392 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0424 20:20:00.674002 138012459587392 pyconfig.py:471] Config param max_num_batched_tokens: None
I0424 20:20:00.674016 138012459587392 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0424 20:20:00.674032 138012459587392 pyconfig.py:471] Config param max_num_images_per_example: -1
I0424 20:20:00.674047 138012459587392 pyconfig.py:471] Config param max_num_seqs: None
I0424 20:20:00.674062 138012459587392 pyconfig.py:471] Config param max_position_embeddings: 163840
I0424 20:20:00.674076 138012459587392 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0424 20:20:00.674111 138012459587392 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0424 20:20:00.674128 138012459587392 pyconfig.py:471] Config param max_segments_per_seq: -1
I0424 20:20:00.674144 138012459587392 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0424 20:20:00.674158 138012459587392 pyconfig.py:471] Config param max_target_length: 2048
I0424 20:20:00.674174 138012459587392 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0424 20:20:00.674189 138012459587392 pyconfig.py:471] Config param megablox: True
I0424 20:20:00.674204 138012459587392 pyconfig.py:471] Config param merge_gating_gmm: False
I0424 20:20:00.674220 138012459587392 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0424 20:20:00.674243 138012459587392 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/metrics/
I0424 20:20:00.674258 138012459587392 pyconfig.py:471] Config param metrics_file: 
I0424 20:20:00.674273 138012459587392 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0424 20:20:00.674288 138012459587392 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0424 20:20:00.674304 138012459587392 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0424 20:20:00.674318 138012459587392 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0424 20:20:00.674333 138012459587392 pyconfig.py:471] Config param mla_naive_kvcache: True
I0424 20:20:00.674349 138012459587392 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0424 20:20:00.674364 138012459587392 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0424 20:20:00.674380 138012459587392 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0424 20:20:00.674395 138012459587392 pyconfig.py:471] Config param mlp_bias: False
I0424 20:20:00.674409 138012459587392 pyconfig.py:471] Config param mlp_dim: 64
I0424 20:20:00.674425 138012459587392 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0424 20:20:00.674440 138012459587392 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0424 20:20:00.674456 138012459587392 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0424 20:20:00.674472 138012459587392 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0424 20:20:00.674486 138012459587392 pyconfig.py:471] Config param moba: False
I0424 20:20:00.674502 138012459587392 pyconfig.py:471] Config param moba_chunk_size: 1024
I0424 20:20:00.674518 138012459587392 pyconfig.py:471] Config param moba_topk: 8
I0424 20:20:00.674532 138012459587392 pyconfig.py:471] Config param model_call_mode: 
I0424 20:20:00.674548 138012459587392 pyconfig.py:471] Config param model_name: gpt3-52k
I0424 20:20:00.674564 138012459587392 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0424 20:20:00.674578 138012459587392 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0424 20:20:00.674593 138012459587392 pyconfig.py:471] Config param moe_mlp_dim: -1
I0424 20:20:00.674610 138012459587392 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0424 20:20:00.674626 138012459587392 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0424 20:20:00.674640 138012459587392 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0424 20:20:00.674656 138012459587392 pyconfig.py:471] Config param monitor_goodput: False
I0424 20:20:00.674681 138012459587392 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0424 20:20:00.674695 138012459587392 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0424 20:20:00.674712 138012459587392 pyconfig.py:471] Config param mscale: 1.0
I0424 20:20:00.674727 138012459587392 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0424 20:20:00.674743 138012459587392 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0424 20:20:00.674758 138012459587392 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0424 20:20:00.674774 138012459587392 pyconfig.py:471] Config param mtp_num_layers: 0
I0424 20:20:00.674791 138012459587392 pyconfig.py:471] Config param mu_dtype: float32
I0424 20:20:00.674814 138012459587392 pyconfig.py:471] Config param multi_sampling: False
I0424 20:20:00.674829 138012459587392 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0424 20:20:00.674846 138012459587392 pyconfig.py:471] Config param muon_beta: 0.95
I0424 20:20:00.674862 138012459587392 pyconfig.py:471] Config param muon_consistent_rms: None
I0424 20:20:00.674876 138012459587392 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0424 20:20:00.674892 138012459587392 pyconfig.py:471] Config param n_routing_groups: -1
I0424 20:20:00.674908 138012459587392 pyconfig.py:471] Config param n_window_for_audio: 50
I0424 20:20:00.674924 138012459587392 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0424 20:20:00.674938 138012459587392 pyconfig.py:471] Config param nope_layer_interval: -1
I0424 20:20:00.674954 138012459587392 pyconfig.py:471] Config param norm_topk_prob: False
I0424 20:20:00.674968 138012459587392 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0424 20:20:00.674986 138012459587392 pyconfig.py:471] Config param normalize_embedding_logits: False
I0424 20:20:00.675002 138012459587392 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0424 20:20:00.675016 138012459587392 pyconfig.py:471] Config param num_batches: 4
I0424 20:20:00.675032 138012459587392 pyconfig.py:471] Config param num_channels_for_vit: 3
I0424 20:20:00.675047 138012459587392 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0424 20:20:00.675061 138012459587392 pyconfig.py:471] Config param num_decoder_layers: 1
I0424 20:20:00.675078 138012459587392 pyconfig.py:471] Config param num_diloco_replicas: 1
I0424 20:20:00.675102 138012459587392 pyconfig.py:471] Config param num_epoch: 1
I0424 20:20:00.675117 138012459587392 pyconfig.py:471] Config param num_eval_passes: 1
I0424 20:20:00.675133 138012459587392 pyconfig.py:471] Config param num_experts: 1
I0424 20:20:00.675148 138012459587392 pyconfig.py:471] Config param num_experts_per_tok: 1
I0424 20:20:00.675164 138012459587392 pyconfig.py:471] Config param num_generations: 2
I0424 20:20:00.675179 138012459587392 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0424 20:20:00.675195 138012459587392 pyconfig.py:471] Config param num_iterations: 1
I0424 20:20:00.675209 138012459587392 pyconfig.py:471] Config param num_kv_heads: 2
I0424 20:20:00.675229 138012459587392 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0424 20:20:00.675244 138012459587392 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0424 20:20:00.675258 138012459587392 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0424 20:20:00.675273 138012459587392 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0424 20:20:00.675287 138012459587392 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0424 20:20:00.675302 138012459587392 pyconfig.py:471] Config param num_query_heads: 2
I0424 20:20:00.675318 138012459587392 pyconfig.py:471] Config param num_samplers_slices: -1
I0424 20:20:00.675334 138012459587392 pyconfig.py:471] Config param num_slices: 1
I0424 20:20:00.675349 138012459587392 pyconfig.py:471] Config param num_target_devices: 32
I0424 20:20:00.675364 138012459587392 pyconfig.py:471] Config param num_test_batches: 5
I0424 20:20:00.675380 138012459587392 pyconfig.py:471] Config param num_trainer_slices: -1
I0424 20:20:00.675395 138012459587392 pyconfig.py:471] Config param num_vocab_tiling: 1
I0424 20:20:00.675410 138012459587392 pyconfig.py:471] Config param off_policy_steps: 0
I0424 20:20:00.675425 138012459587392 pyconfig.py:471] Config param offline_data_dir: None
I0424 20:20:00.675439 138012459587392 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0424 20:20:00.675457 138012459587392 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0424 20:20:00.675473 138012459587392 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0424 20:20:00.675487 138012459587392 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0424 20:20:00.675503 138012459587392 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0424 20:20:00.675519 138012459587392 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0424 20:20:00.675534 138012459587392 pyconfig.py:471] Config param output_dim_for_audio: 512
I0424 20:20:00.675549 138012459587392 pyconfig.py:471] Config param override_logical_axis_rules: False
I0424 20:20:00.675565 138012459587392 pyconfig.py:471] Config param override_model_config: True
I0424 20:20:00.675579 138012459587392 pyconfig.py:471] Config param packing: True
I0424 20:20:00.675595 138012459587392 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0424 20:20:00.675611 138012459587392 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0424 20:20:00.675625 138012459587392 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0424 20:20:00.675640 138012459587392 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0424 20:20:00.675655 138012459587392 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0424 20:20:00.675671 138012459587392 pyconfig.py:471] Config param param_scan_axis: 1
I0424 20:20:00.675685 138012459587392 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0424 20:20:00.675699 138012459587392 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0424 20:20:00.675716 138012459587392 pyconfig.py:471] Config param patch_size_for_vit: 14
I0424 20:20:00.675730 138012459587392 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0424 20:20:00.675746 138012459587392 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0424 20:20:00.675762 138012459587392 pyconfig.py:471] Config param per_device_batch_size: 2
I0424 20:20:00.675778 138012459587392 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0424 20:20:00.675794 138012459587392 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0424 20:20:00.675808 138012459587392 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0424 20:20:00.675824 138012459587392 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0424 20:20:00.675838 138012459587392 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0424 20:20:00.675853 138012459587392 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0424 20:20:00.675868 138012459587392 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0424 20:20:00.675883 138012459587392 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0424 20:20:00.675897 138012459587392 pyconfig.py:471] Config param position_id_per_seconds: 25
I0424 20:20:00.675912 138012459587392 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0424 20:20:00.675927 138012459587392 pyconfig.py:471] Config param prefill_cache_dir: 
I0424 20:20:00.675943 138012459587392 pyconfig.py:471] Config param prefill_chunk_size: 256
I0424 20:20:00.675959 138012459587392 pyconfig.py:471] Config param prefill_slice: v5e-16
I0424 20:20:00.675973 138012459587392 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0424 20:20:00.675987 138012459587392 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0424 20:20:00.676003 138012459587392 pyconfig.py:471] Config param profile_cleanly: True
I0424 20:20:00.676017 138012459587392 pyconfig.py:471] Config param profile_periodically_period: -1
I0424 20:20:00.676032 138012459587392 pyconfig.py:471] Config param profile_power_events: False
I0424 20:20:00.676047 138012459587392 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0424 20:20:00.676065 138012459587392 pyconfig.py:471] Config param profiler_steps: 5
I0424 20:20:00.676079 138012459587392 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0424 20:20:00.676106 138012459587392 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0424 20:20:00.676121 138012459587392 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0424 20:20:00.676137 138012459587392 pyconfig.py:471] Config param prometheus_port: 0
I0424 20:20:00.676151 138012459587392 pyconfig.py:471] Config param prompt: I love to
I0424 20:20:00.676167 138012459587392 pyconfig.py:471] Config param pure_nnx: False
I0424 20:20:00.676181 138012459587392 pyconfig.py:471] Config param pure_nnx_decoder: False
I0424 20:20:00.676198 138012459587392 pyconfig.py:471] Config param q_lora_rank: 0
I0424 20:20:00.676212 138012459587392 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0424 20:20:00.676229 138012459587392 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0424 20:20:00.676244 138012459587392 pyconfig.py:471] Config param qk_norm_with_scale: True
I0424 20:20:00.676259 138012459587392 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0424 20:20:00.676275 138012459587392 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0424 20:20:00.676290 138012459587392 pyconfig.py:471] Config param quant_cfg_path: 
I0424 20:20:00.676304 138012459587392 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0424 20:20:00.676322 138012459587392 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0424 20:20:00.676337 138012459587392 pyconfig.py:471] Config param quantize_kvcache: False
I0424 20:20:00.676352 138012459587392 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0424 20:20:00.676367 138012459587392 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0424 20:20:00.676382 138012459587392 pyconfig.py:471] Config param ragged_block_size: 256
I0424 20:20:00.676398 138012459587392 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0424 20:20:00.676412 138012459587392 pyconfig.py:471] Config param rampup_end_step: 0
I0424 20:20:00.676429 138012459587392 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0424 20:20:00.676445 138012459587392 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0424 20:20:00.676459 138012459587392 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0424 20:20:00.676475 138012459587392 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0424 20:20:00.676490 138012459587392 pyconfig.py:471] Config param remat_policy: full
I0424 20:20:00.676505 138012459587392 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0424 20:20:00.676521 138012459587392 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0424 20:20:00.676538 138012459587392 pyconfig.py:471] Config param replicate_quant_scale: False
I0424 20:20:00.676552 138012459587392 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0424 20:20:00.676568 138012459587392 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0424 20:20:00.676584 138012459587392 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0424 20:20:00.676600 138012459587392 pyconfig.py:471] Config param reshape_q: False
I0424 20:20:00.676615 138012459587392 pyconfig.py:471] Config param return_log_prob: False
I0424 20:20:00.676630 138012459587392 pyconfig.py:471] Config param reuse_example_batch: 0
I0424 20:20:00.676645 138012459587392 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0424 20:20:00.676661 138012459587392 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0424 20:20:00.676676 138012459587392 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0424 20:20:00.676691 138012459587392 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0424 20:20:00.676706 138012459587392 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0424 20:20:00.676722 138012459587392 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0424 20:20:00.676738 138012459587392 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0424 20:20:00.676759 138012459587392 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0424 20:20:00.676773 138012459587392 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0424 20:20:00.676789 138012459587392 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0424 20:20:00.676803 138012459587392 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0424 20:20:00.676817 138012459587392 pyconfig.py:471] Config param rope_attention_scaling: False
I0424 20:20:00.676831 138012459587392 pyconfig.py:471] Config param rope_factor: 40
I0424 20:20:00.676848 138012459587392 pyconfig.py:471] Config param rope_interleave: True
I0424 20:20:00.676863 138012459587392 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0424 20:20:00.676877 138012459587392 pyconfig.py:471] Config param rope_max_timescale: 10000
I0424 20:20:00.676892 138012459587392 pyconfig.py:471] Config param rope_min_timescale: 1
I0424 20:20:00.676907 138012459587392 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0424 20:20:00.676923 138012459587392 pyconfig.py:471] Config param rope_truncate: True
I0424 20:20:00.676936 138012459587392 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0424 20:20:00.676955 138012459587392 pyconfig.py:471] Config param rope_use_scale: True
I0424 20:20:00.676974 138012459587392 pyconfig.py:471] Config param routed_bias: False
I0424 20:20:00.676988 138012459587392 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0424 20:20:00.677002 138012459587392 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0424 20:20:00.677019 138012459587392 pyconfig.py:471] Config param routed_score_func: 
I0424 20:20:00.677033 138012459587392 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-20-20
I0424 20:20:00.677047 138012459587392 pyconfig.py:471] Config param sa_block_kv: 512
I0424 20:20:00.677063 138012459587392 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0424 20:20:00.677078 138012459587392 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0424 20:20:00.677107 138012459587392 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0424 20:20:00.677123 138012459587392 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0424 20:20:00.677138 138012459587392 pyconfig.py:471] Config param sa_block_q: 512
I0424 20:20:00.677152 138012459587392 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0424 20:20:00.677168 138012459587392 pyconfig.py:471] Config param sa_block_q_dq: 512
I0424 20:20:00.677182 138012459587392 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0424 20:20:00.677196 138012459587392 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0424 20:20:00.677211 138012459587392 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0424 20:20:00.677229 138012459587392 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0424 20:20:00.677244 138012459587392 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0424 20:20:00.677260 138012459587392 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0424 20:20:00.677274 138012459587392 pyconfig.py:471] Config param save_config_to_gcs: False
I0424 20:20:00.677290 138012459587392 pyconfig.py:471] Config param save_quantized_params_path: 
I0424 20:20:00.677306 138012459587392 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0424 20:20:00.677320 138012459587392 pyconfig.py:471] Config param scan_layers: True
I0424 20:20:00.677336 138012459587392 pyconfig.py:471] Config param scan_layers_per_stage: False
I0424 20:20:00.677352 138012459587392 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0424 20:20:00.677367 138012459587392 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0424 20:20:00.677382 138012459587392 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0424 20:20:00.677398 138012459587392 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0424 20:20:00.677412 138012459587392 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0424 20:20:00.677427 138012459587392 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0424 20:20:00.677443 138012459587392 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0424 20:20:00.677460 138012459587392 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0424 20:20:00.677475 138012459587392 pyconfig.py:471] Config param sharding_strategy: None
I0424 20:20:00.677490 138012459587392 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0424 20:20:00.677506 138012459587392 pyconfig.py:471] Config param shardy: True
I0424 20:20:00.677522 138012459587392 pyconfig.py:471] Config param share_kv_projections: False
I0424 20:20:00.677537 138012459587392 pyconfig.py:471] Config param shared_experts: 0
I0424 20:20:00.677553 138012459587392 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0424 20:20:00.677568 138012459587392 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0424 20:20:00.677583 138012459587392 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0424 20:20:00.677598 138012459587392 pyconfig.py:471] Config param skip_step_interval: 128
I0424 20:20:00.677614 138012459587392 pyconfig.py:471] Config param skip_step_on_spikes: False
I0424 20:20:00.677629 138012459587392 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0424 20:20:00.677644 138012459587392 pyconfig.py:471] Config param sliding_window_size: 0
I0424 20:20:00.677659 138012459587392 pyconfig.py:471] Config param solution_end_token: </answer>
I0424 20:20:00.677674 138012459587392 pyconfig.py:471] Config param solution_start_token: <answer>
I0424 20:20:00.677689 138012459587392 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0424 20:20:00.677705 138012459587392 pyconfig.py:471] Config param sparse_matmul: True
I0424 20:20:00.677720 138012459587392 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0424 20:20:00.677734 138012459587392 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0424 20:20:00.677750 138012459587392 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0424 20:20:00.677765 138012459587392 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0424 20:20:00.677780 138012459587392 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0424 20:20:00.677796 138012459587392 pyconfig.py:471] Config param steps: 200000
I0424 20:20:00.677812 138012459587392 pyconfig.py:471] Config param stop_strings: None
I0424 20:20:00.677827 138012459587392 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 20:20:00.677844 138012459587392 pyconfig.py:471] Config param student_params_to_update: None
I0424 20:20:00.677860 138012459587392 pyconfig.py:471] Config param subslice_shape: 
I0424 20:20:00.677876 138012459587392 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 20:20:00.677893 138012459587392 pyconfig.py:471] Config param system_prompt: 
I0424 20:20:00.677908 138012459587392 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 20:20:00.677925 138012459587392 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 20:20:00.677942 138012459587392 pyconfig.py:471] Config param temperature_tuning: False
I0424 20:20:00.677958 138012459587392 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 20:20:00.677973 138012459587392 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-20/tensorboard/
I0424 20:20:00.677989 138012459587392 pyconfig.py:471] Config param tensors_on_device: None
I0424 20:20:00.678004 138012459587392 pyconfig.py:471] Config param tensors_to_offload: None
I0424 20:20:00.678019 138012459587392 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 20:20:00.678035 138012459587392 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 20:20:00.678050 138012459587392 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 20:20:00.678066 138012459587392 pyconfig.py:471] Config param tokenize_train_data: True
I0424 20:20:00.678080 138012459587392 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 20:20:00.678105 138012459587392 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 20:20:00.678123 138012459587392 pyconfig.py:471] Config param topk_routing_group: -1
I0424 20:20:00.678139 138012459587392 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 20:20:00.678155 138012459587392 pyconfig.py:471] Config param train_fraction: 1.0
I0424 20:20:00.678171 138012459587392 pyconfig.py:471] Config param train_image_column: image
I0424 20:20:00.678186 138012459587392 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 20:20:00.678202 138012459587392 pyconfig.py:471] Config param train_split: train
I0424 20:20:00.678219 138012459587392 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 20:20:00.678239 138012459587392 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 20:20:00.678253 138012459587392 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 20:20:00.678269 138012459587392 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 20:20:00.678283 138012459587392 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 20:20:00.678298 138012459587392 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 20:20:00.678314 138012459587392 pyconfig.py:471] Config param use_audio: False
I0424 20:20:00.678328 138012459587392 pyconfig.py:471] Config param use_audio_in_video: False
I0424 20:20:00.678344 138012459587392 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 20:20:00.678360 138012459587392 pyconfig.py:471] Config param use_chat_template: False
I0424 20:20:00.678376 138012459587392 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 20:20:00.678391 138012459587392 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 20:20:00.678407 138012459587392 pyconfig.py:471] Config param use_dpo: False
I0424 20:20:00.678422 138012459587392 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 20:20:00.678438 138012459587392 pyconfig.py:471] Config param use_grpo: True
I0424 20:20:00.678454 138012459587392 pyconfig.py:471] Config param use_indexer: False
I0424 20:20:00.678470 138012459587392 pyconfig.py:471] Config param use_iota_embed: True
I0424 20:20:00.678485 138012459587392 pyconfig.py:471] Config param use_jax_splash: False
I0424 20:20:00.678500 138012459587392 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 20:20:00.678514 138012459587392 pyconfig.py:471] Config param use_mrope: False
I0424 20:20:00.678531 138012459587392 pyconfig.py:471] Config param use_multimodal: False
I0424 20:20:00.678547 138012459587392 pyconfig.py:471] Config param use_nnx_pipeline: False
I0424 20:20:00.678563 138012459587392 pyconfig.py:471] Config param use_pathways: True
I0424 20:20:00.678579 138012459587392 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 20:20:00.678595 138012459587392 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 20:20:00.678610 138012459587392 pyconfig.py:471] Config param use_qk_clip: False
I0424 20:20:00.678625 138012459587392 pyconfig.py:471] Config param use_qk_norm: False
I0424 20:20:00.678641 138012459587392 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 20:20:00.678657 138012459587392 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 20:20:00.678672 138012459587392 pyconfig.py:471] Config param use_ragged_attention: False
I0424 20:20:00.678688 138012459587392 pyconfig.py:471] Config param use_random_routing: False
I0424 20:20:00.678704 138012459587392 pyconfig.py:471] Config param use_replicator_service: False
I0424 20:20:00.678719 138012459587392 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 20:20:00.678735 138012459587392 pyconfig.py:471] Config param use_sft: False
I0424 20:20:00.678750 138012459587392 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 20:20:00.678766 138012459587392 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 20:20:00.678781 138012459587392 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 20:20:00.678796 138012459587392 pyconfig.py:471] Config param use_truncation: True
I0424 20:20:00.678812 138012459587392 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 20:20:00.678828 138012459587392 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 20:20:00.678843 138012459587392 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 20:20:00.678859 138012459587392 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 20:20:00.678875 138012459587392 pyconfig.py:471] Config param v_head_dim: 128
I0424 20:20:00.678890 138012459587392 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 20:20:00.678906 138012459587392 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 20:20:00.678922 138012459587392 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0424 20:20:00.678938 138012459587392 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0424 20:20:00.678955 138012459587392 pyconfig.py:471] Config param video_path: 
I0424 20:20:00.678969 138012459587392 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 20:20:00.678983 138012459587392 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 20:20:00.678999 138012459587392 pyconfig.py:471] Config param vision_output_length: -1
I0424 20:20:00.679015 138012459587392 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 20:20:00.679029 138012459587392 pyconfig.py:471] Config param vllm_hf_config_path: 
I0424 20:20:00.679045 138012459587392 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 20:20:00.679059 138012459587392 pyconfig.py:471] Config param vocab_size: 32000
I0424 20:20:00.679075 138012459587392 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 20:20:00.679090 138012459587392 pyconfig.py:471] Config param weight_dtype: float32
I0424 20:20:00.679122 138012459587392 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 20:20:00.679139 138012459587392 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 20:20:00.679154 138012459587392 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 20:20:00.679170 138012459587392 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 20:20:00.679185 138012459587392 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 20:20:00.679201 138012459587392 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 20:20:00.679215 138012459587392 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 20:20:00.679234 138012459587392 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 20:20:00.679248 138012459587392 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 20:20:00.679264 138012459587392 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 20:20:00.679279 138012459587392 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 20:20:00.679295 138012459587392 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 20:20:00.679310 138012459587392 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 20:20:00.679325 138012459587392 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 20:20:00.679340 138012459587392 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 20:20:00.679355 138012459587392 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 20:20:00.679371 138012459587392 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 20:20:00.679384 138012459587392 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 20:20:00.679399 138012459587392 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 20:20:00.679413 138012459587392 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 20:20:00.679429 138012459587392 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 20:20:00.679447 138012459587392 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 20:20:00.679461 138012459587392 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0424 20:20:00.679477 138012459587392 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 20:20:00.679493 138012459587392 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 20:20:00.679512 138012459587392 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 20:20:00.679826 138012459587392 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 20:20:00.679859 138012459587392 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 20:20:04.692132 138012459587392 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 20:20:04.695225 138012459587392 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 20:20:04.695346 138012459587392 train_distill.py:596] Applying logical axis rules for model initialization and training...
I0424 20:20:04.695416 138012459587392 train_distill.py:600] Loading Student from ...
I0424 20:20:04.695444 138012459587392 train_distill.py:169] --- Student Configuration ---
I0424 20:20:04.695467 138012459587392 train_distill.py:170]   Model Name:      gpt3-52k
I0424 20:20:04.695495 138012459587392 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 20:20:04.695514 138012459587392 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0424 20:20:04.695530 138012459587392 train_distill.py:175]   Vocab Size:      32000
I0424 20:20:04.695548 138012459587392 train_distill.py:176]   Checkpoint:      
I0424 20:20:04.695564 138012459587392 train_distill.py:465] Initializing model: gpt3-52k...
I0424 20:20:06.141020 138012459587392 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0424 20:20:06.141142 138012459587392 train_distill.py:169] --- Teacher Configuration ---
I0424 20:20:06.141172 138012459587392 train_distill.py:170]   Model Name:      gpt3-52k
I0424 20:20:06.141198 138012459587392 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 20:20:06.141220 138012459587392 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0424 20:20:06.141242 138012459587392 train_distill.py:175]   Vocab Size:      32000
I0424 20:20:06.141260 138012459587392 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 20:20:06.141279 138012459587392 train_distill.py:465] Initializing model: gpt3-52k...
I0424 20:20:07.224703 138012459587392 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 20:20:07.225233 138012459587392 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d84d0041a30>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 20:20:07.225302 138012459587392 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 20:20:07.739956 138012459587392 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 20:20:08.265910    2080 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 20:20:09.406466 138012459587392 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 20:20:11.845997 138012459587392 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 20:20:11.846390 138012459587392 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 20:20:11.907218 138012459587392 checkpointer.py:318] Finished restoring checkpoint in 2.90 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 20:20:12.601606 138012459587392 train_distill.py:640] Initializing Data Iterators via MaxText pipeline...
I0424 20:20:12.664412 138012459587392 config.py:112] TensorFlow version 2.20.0 available.
I0424 20:20:12.664904 138012459587392 config.py:125] JAX version 0.8.3 available.
E0424 20:20:14.681351 138012459587392 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0424 20:20:14.681576 138012459587392 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0424 20:20:14.684645 138012459587392 train_distill.py:410] Input Pipeline Checkpointing: DISABLED
I0424 20:20:14.684709 138012459587392 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0424 20:20:14.684775 138012459587392 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 20:20:14.684851 138012459587392 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d84d0041a30>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 20:20:14.684892 138012459587392 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 20:20:14.684924 138012459587392 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d84d0041a30>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 20:20:14.684965 138012459587392 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7ae4784ad0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937974d0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7993797440>}, handler_registry=None
I0424 20:20:14.685170 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7ae4784ad0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 20:20:14.685213 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937974d0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 20:20:14.685241 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7993797440>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 20:20:14.685267 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d733823e750>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 20:20:14.685293 138012459587392 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7ae4784ad0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7ae4784ad0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937974d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937974d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7993797440>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7993797440>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d733823e750>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d733823e750>}).
I0424 20:20:14.685708 138012459587392 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7d7993723240> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 20:20:17.161573 138012459587392 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints
I0424 20:20:18.040724 138012459587392 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7d7993797410>
I0424 20:20:18.040897 138012459587392 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 20:20:18.040966 138012459587392 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d84d0041a30>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 20:20:18.041002 138012459587392 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 20:20:18.041032 138012459587392 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d84d0041a30>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 20:20:18.041068 138012459587392 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 20:20:18.041140 138012459587392 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138012459587392 count=1 at 0x7d799371eec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d7993797230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d7993797200>, _write_futures=[])
I0424 20:20:18.041509 138012459587392 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138012459587392 count=1 at 0x7d799371eec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d7993797230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d7993797200>, _write_futures=[])
I0424 20:20:18.041536 138012459587392 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138012459587392 count=1 at 0x7d799371eec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d7993797230>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d7993797200>, _write_futures=[])
I0424 20:20:18.041572 138012459587392 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937973e0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d6ec816b260>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816b4d0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d6ec816a6c0>}, handler_registry=None
I0424 20:20:18.041678 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937973e0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 20:20:18.041710 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d6ec816b260>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 20:20:18.041733 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816b4d0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 20:20:18.041760 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d6ec816a6c0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 20:20:18.041782 138012459587392 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816aae0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 20:20:18.041806 138012459587392 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937973e0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d79937973e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d6ec816b260>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d6ec816b260>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816b4d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816b4d0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d6ec816a6c0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d6ec816a6c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816aae0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d6ec816aae0>}).
I0424 20:20:18.041877 138012459587392 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7d7993723380> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 20:20:18.845823 138012459587392 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints
I0424 20:20:19.030469 138012459587392 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7d7ae474c680>
I0424 20:20:19.031061 138012459587392 train_distill.py:691] Starting Distillation Training...
I0424 20:20:19.031188 138012459587392 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 20:20:19.383929 138012459587392 peft_trainer.py:600] Compiled train_step cache size: 0

Training:   0%|          | 0/5 [00:00<?, ?step/s]
I0424 20:20:19.385910 137872308942592 grain_pool.py:367] Grain pool will use 1 processes.
I0424 20:20:19.412722 137872308942592 grain_pool.py:440] Grain pool will start child processes.
I0424 20:20:19.418148 137872308942592 grain_pool.py:448] Grain pool started all child processes.
2026-04-24 20:20:25.480565: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0424 20:20:28.869334 138012459587392 utils.py:86] Train loop finished in: 9.4848 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0)}
I0424 20:20:29.214634 137872308942592 grain_pool.py:542] Grain pool is exiting.
I0424 20:20:29.214736 137872308942592 grain_pool.py:547] Shutting down multiprocessing system.
I0424 20:20:30.661886 137872308942592 grain_pool.py:547] Shutting down multiprocessing system.

Training:   0%|          | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 20:20:38 UTC 2026
EXIT_CODE=1