MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  main

Metric | Linen (b117f50cf) | NNX (b117f50cf) | Diff (NNX − Linen)

Diff = NNX value − Linen value. Green = NNX improved. Red = NNX regressed.

Linen  ·  b117f50cf  ·  main_20260424_070237
XPK Start: Fri Apr 24 07:14:45 UTC 2026
2026-04-24 07:15:03.908623: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0424 07:15:10.477046 139516489717568 max_utils.py:273] Attempting to initialize the jax distributed system...
I0424 07:15:19.516084 139516489717568 distributed.py:149] Starting JAX distributed service on [::]:8482
I0424 07:15:19.521602 139516489717568 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-vuqow-slice-job-0-0.mt-07-distill-smoke-vuqow:8482
I0424 07:15:20.216856 139516489717568 max_utils.py:284] Jax distributed system initialized!
I0424 07:15:25.812885 139516489717568 max_utils.py:244] Jax distributed system is already initialized.
W0424 07:15:25.944439 139516489717568 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 07:15:26.370086 139516489717568 max_utils.py:244] Jax distributed system is already initialized.
I0424 07:15:26.371287 139516489717568 pyconfig.py:471] Config param abort_on_inf_loss: True
I0424 07:15:26.371337 139516489717568 pyconfig.py:471] Config param abort_on_nan_loss: True
I0424 07:15:26.371365 139516489717568 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0424 07:15:26.371385 139516489717568 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0424 07:15:26.371405 139516489717568 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0424 07:15:26.371425 139516489717568 pyconfig.py:471] Config param activations_in_float32: False
I0424 07:15:26.371443 139516489717568 pyconfig.py:471] Config param adam_b1: 0.9
I0424 07:15:26.371463 139516489717568 pyconfig.py:471] Config param adam_b2: 0.95
I0424 07:15:26.371479 139516489717568 pyconfig.py:471] Config param adam_eps: 1e-08
I0424 07:15:26.371500 139516489717568 pyconfig.py:471] Config param adam_eps_root: 0.0
I0424 07:15:26.371516 139516489717568 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0424 07:15:26.371533 139516489717568 pyconfig.py:471] Config param adamw_mask: []
I0424 07:15:26.371549 139516489717568 pyconfig.py:471] Config param add_bos: True
I0424 07:15:26.371565 139516489717568 pyconfig.py:471] Config param add_eos: True
I0424 07:15:26.371582 139516489717568 pyconfig.py:471] Config param allow_split_physical_axes: False
I0424 07:15:26.371598 139516489717568 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0424 07:15:26.371615 139516489717568 pyconfig.py:471] Config param async_checkpointing: True
I0424 07:15:26.371631 139516489717568 pyconfig.py:471] Config param async_scheduling: False
I0424 07:15:26.371646 139516489717568 pyconfig.py:471] Config param attention: dot_product
I0424 07:15:26.371674 139516489717568 pyconfig.py:471] Config param attention_bias: False
I0424 07:15:26.371690 139516489717568 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0424 07:15:26.371707 139516489717568 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0424 07:15:26.371727 139516489717568 pyconfig.py:471] Config param attention_output_dim: -1
I0424 07:15:26.371742 139516489717568 pyconfig.py:471] Config param attention_sink: False
I0424 07:15:26.371757 139516489717568 pyconfig.py:471] Config param attention_type: global
I0424 07:15:26.371772 139516489717568 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0424 07:15:26.371788 139516489717568 pyconfig.py:471] Config param audio_path: 
I0424 07:15:26.371805 139516489717568 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0424 07:15:26.371821 139516489717568 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0424 07:15:26.371836 139516489717568 pyconfig.py:471] Config param base_config: base.yml
I0424 07:15:26.371851 139516489717568 pyconfig.py:471] Config param base_emb_dim: 16
I0424 07:15:26.371867 139516489717568 pyconfig.py:471] Config param base_mlp_dim: 64
I0424 07:15:26.371882 139516489717568 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0424 07:15:26.371897 139516489717568 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0424 07:15:26.371913 139516489717568 pyconfig.py:471] Config param base_num_kv_heads: 2
I0424 07:15:26.371928 139516489717568 pyconfig.py:471] Config param base_num_query_heads: 2
I0424 07:15:26.371944 139516489717568 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0424 07:15:26.371958 139516489717568 pyconfig.py:471] Config param batch_size: 1
I0424 07:15:26.371974 139516489717568 pyconfig.py:471] Config param batch_split_factor: 1
I0424 07:15:26.371988 139516489717568 pyconfig.py:471] Config param beta_fast: 32
I0424 07:15:26.372004 139516489717568 pyconfig.py:471] Config param beta_slow: 1
I0424 07:15:26.372022 139516489717568 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0424 07:15:26.372038 139516489717568 pyconfig.py:471] Config param capacity_factor: -1.0
I0424 07:15:26.372053 139516489717568 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0424 07:15:26.372069 139516489717568 pyconfig.py:471] Config param chat_template: 
I0424 07:15:26.372083 139516489717568 pyconfig.py:471] Config param chat_template_path: 
I0424 07:15:26.372100 139516489717568 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0424 07:15:26.372116 139516489717568 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/checkpoints/
I0424 07:15:26.372132 139516489717568 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0424 07:15:26.372148 139516489717568 pyconfig.py:471] Config param checkpoint_period: 2000
I0424 07:15:26.372163 139516489717568 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0424 07:15:26.372180 139516489717568 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0424 07:15:26.372195 139516489717568 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0424 07:15:26.372210 139516489717568 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0424 07:15:26.372226 139516489717568 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0424 07:15:26.372240 139516489717568 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0424 07:15:26.372256 139516489717568 pyconfig.py:471] Config param chips_per_vm: 4
I0424 07:15:26.372270 139516489717568 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0424 07:15:26.372285 139516489717568 pyconfig.py:471] Config param collect_stack_trace: False
I0424 07:15:26.372301 139516489717568 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0424 07:15:26.372316 139516489717568 pyconfig.py:471] Config param colocated_python_data_input: False
I0424 07:15:26.372332 139516489717568 pyconfig.py:471] Config param compile_topology: 
I0424 07:15:26.372347 139516489717568 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0424 07:15:26.372361 139516489717568 pyconfig.py:471] Config param compile_xla_flags: 
I0424 07:15:26.372377 139516489717568 pyconfig.py:471] Config param compiled_trainstep_file: 
I0424 07:15:26.372393 139516489717568 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0424 07:15:26.372408 139516489717568 pyconfig.py:471] Config param constant_bound_config: []
I0424 07:15:26.372422 139516489717568 pyconfig.py:471] Config param context: RematLocation.REMAT
I0424 07:15:26.372439 139516489717568 pyconfig.py:471] Config param context_parallel_load_balance: True
I0424 07:15:26.372454 139516489717568 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0424 07:15:26.372472 139516489717568 pyconfig.py:471] Config param context_parallel_size: 1
I0424 07:15:26.372486 139516489717568 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0424 07:15:26.372501 139516489717568 pyconfig.py:471] Config param context_sharding: context
I0424 07:15:26.372515 139516489717568 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0424 07:15:26.372531 139516489717568 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0424 07:15:26.372547 139516489717568 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0424 07:15:26.372561 139516489717568 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0424 07:15:26.372576 139516489717568 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0424 07:15:26.372591 139516489717568 pyconfig.py:471] Config param custom_mesh: 
I0424 07:15:26.372606 139516489717568 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0424 07:15:26.372622 139516489717568 pyconfig.py:471] Config param d_model_for_audio: 256
I0424 07:15:26.372637 139516489717568 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0424 07:15:26.372680 139516489717568 pyconfig.py:471] Config param data_shuffle_seed: 0
I0424 07:15:26.372696 139516489717568 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0424 07:15:26.372712 139516489717568 pyconfig.py:471] Config param dataset_path: 
I0424 07:15:26.372727 139516489717568 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0424 07:15:26.372746 139516489717568 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0424 07:15:26.372762 139516489717568 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0424 07:15:26.372777 139516489717568 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0424 07:15:26.372792 139516489717568 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0424 07:15:26.372807 139516489717568 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0424 07:15:26.372822 139516489717568 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0424 07:15:26.372837 139516489717568 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0424 07:15:26.372852 139516489717568 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0424 07:15:26.372866 139516489717568 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 07:15:26.372883 139516489717568 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0424 07:15:26.372898 139516489717568 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0424 07:15:26.372912 139516489717568 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0424 07:15:26.372927 139516489717568 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0424 07:15:26.372942 139516489717568 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0424 07:15:26.372957 139516489717568 pyconfig.py:471] Config param debug: {'rl': False}
I0424 07:15:26.372972 139516489717568 pyconfig.py:471] Config param debug_sharding: False
I0424 07:15:26.372987 139516489717568 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0424 07:15:26.373004 139516489717568 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0424 07:15:26.373024 139516489717568 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0424 07:15:26.373040 139516489717568 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0424 07:15:26.373056 139516489717568 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0424 07:15:26.373071 139516489717568 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0424 07:15:26.373087 139516489717568 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0424 07:15:26.373103 139516489717568 pyconfig.py:471] Config param degenerate_group_masking: True
I0424 07:15:26.373119 139516489717568 pyconfig.py:471] Config param dense_init_scale: 1.0
I0424 07:15:26.373134 139516489717568 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0424 07:15:26.373150 139516489717568 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0424 07:15:26.373165 139516489717568 pyconfig.py:471] Config param diloco_sync_period: 36
I0424 07:15:26.373181 139516489717568 pyconfig.py:471] Config param distill_alpha: 0.5
I0424 07:15:26.373196 139516489717568 pyconfig.py:471] Config param distill_alpha_end: None
I0424 07:15:26.373212 139516489717568 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0424 07:15:26.373226 139516489717568 pyconfig.py:471] Config param distill_beta: 0.0
I0424 07:15:26.373242 139516489717568 pyconfig.py:471] Config param distill_beta_end: None
I0424 07:15:26.373256 139516489717568 pyconfig.py:471] Config param distill_beta_schedule: constant
I0424 07:15:26.373271 139516489717568 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0424 07:15:26.373286 139516489717568 pyconfig.py:471] Config param distill_layer_indices: None
I0424 07:15:26.373302 139516489717568 pyconfig.py:471] Config param distill_temperature: 1.0
I0424 07:15:26.373321 139516489717568 pyconfig.py:471] Config param distill_temperature_end: None
I0424 07:15:26.373335 139516489717568 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0424 07:15:26.373351 139516489717568 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0424 07:15:26.373366 139516489717568 pyconfig.py:471] Config param dpo_beta: 0.1
I0424 07:15:26.373381 139516489717568 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0424 07:15:26.373396 139516489717568 pyconfig.py:471] Config param dq_reduction_steps: 0
I0424 07:15:26.373412 139516489717568 pyconfig.py:471] Config param dropout_rate: 0.0
I0424 07:15:26.373426 139516489717568 pyconfig.py:471] Config param dtype: bfloat16
I0424 07:15:26.373457 139516489717568 pyconfig.py:471] Config param dtype_mm: float32
I0424 07:15:26.373474 139516489717568 pyconfig.py:471] Config param dump_hlo: False
I0424 07:15:26.373490 139516489717568 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0424 07:15:26.373504 139516489717568 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/xla_dump
I0424 07:15:26.373519 139516489717568 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0424 07:15:26.373533 139516489717568 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0424 07:15:26.373549 139516489717568 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0424 07:15:26.373565 139516489717568 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0424 07:15:26.373580 139516489717568 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0424 07:15:26.373596 139516489717568 pyconfig.py:471] Config param dump_jaxpr: False
I0424 07:15:26.373611 139516489717568 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0424 07:15:26.373625 139516489717568 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/jaxpr_dump
I0424 07:15:26.373641 139516489717568 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0424 07:15:26.373666 139516489717568 pyconfig.py:471] Config param dump_step: -1
I0424 07:15:26.373681 139516489717568 pyconfig.py:471] Config param elastic_enabled: False
I0424 07:15:26.373697 139516489717568 pyconfig.py:471] Config param elastic_max_retries: 10
I0424 07:15:26.373711 139516489717568 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0424 07:15:26.373726 139516489717568 pyconfig.py:471] Config param emb_dim: 16
I0424 07:15:26.373741 139516489717568 pyconfig.py:471] Config param enable_autocheckpoint: False
I0424 07:15:26.373756 139516489717568 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0424 07:15:26.373771 139516489717568 pyconfig.py:471] Config param enable_checkpointing: True
I0424 07:15:26.373786 139516489717568 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0424 07:15:26.373800 139516489717568 pyconfig.py:471] Config param enable_data_shuffling: True
I0424 07:15:26.373816 139516489717568 pyconfig.py:471] Config param enable_diloco: False
I0424 07:15:26.373831 139516489717568 pyconfig.py:471] Config param enable_dp_attention: False
I0424 07:15:26.373845 139516489717568 pyconfig.py:471] Config param enable_dropout: False
I0424 07:15:26.373860 139516489717568 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0424 07:15:26.373876 139516489717568 pyconfig.py:471] Config param enable_expert_parallel: False
I0424 07:15:26.373890 139516489717568 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0424 07:15:26.373905 139516489717568 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0424 07:15:26.373920 139516489717568 pyconfig.py:471] Config param enable_goodput_recording: False
I0424 07:15:26.373935 139516489717568 pyconfig.py:471] Config param enable_jax_profiler: False
I0424 07:15:26.373950 139516489717568 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0424 07:15:26.373965 139516489717568 pyconfig.py:471] Config param enable_model_warmup: False
I0424 07:15:26.373980 139516489717568 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0424 07:15:26.373994 139516489717568 pyconfig.py:471] Config param enable_nnx: False
I0424 07:15:26.374013 139516489717568 pyconfig.py:471] Config param enable_orbax_v1: False
I0424 07:15:26.374029 139516489717568 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0424 07:15:26.374044 139516489717568 pyconfig.py:471] Config param enable_pathways_goodput: False
I0424 07:15:26.374058 139516489717568 pyconfig.py:471] Config param enable_prefix_caching: False
I0424 07:15:26.374073 139516489717568 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0424 07:15:26.374087 139516489717568 pyconfig.py:471] Config param enable_single_controller: False
I0424 07:15:26.374102 139516489717568 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0424 07:15:26.374118 139516489717568 pyconfig.py:471] Config param enable_tensorboard: True
I0424 07:15:26.374133 139516489717568 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0424 07:15:26.374147 139516489717568 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0424 07:15:26.374163 139516489717568 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0424 07:15:26.374177 139516489717568 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0424 07:15:26.374192 139516489717568 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0424 07:15:26.374208 139516489717568 pyconfig.py:471] Config param engram_head_dim: 1280
I0424 07:15:26.374223 139516489717568 pyconfig.py:471] Config param engram_kernel_size: 4
I0424 07:15:26.374239 139516489717568 pyconfig.py:471] Config param engram_layers: []
I0424 07:15:26.374253 139516489717568 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0424 07:15:26.374268 139516489717568 pyconfig.py:471] Config param engram_num_heads: 8
I0424 07:15:26.374283 139516489717568 pyconfig.py:471] Config param engram_seed: 0
I0424 07:15:26.374299 139516489717568 pyconfig.py:471] Config param engram_vocab_bases: []
I0424 07:15:26.374314 139516489717568 pyconfig.py:471] Config param epsilon_high: None
I0424 07:15:26.374329 139516489717568 pyconfig.py:471] Config param eval_corr_lst: False
I0424 07:15:26.374343 139516489717568 pyconfig.py:471] Config param eval_data_columns: ['text']
I0424 07:15:26.374359 139516489717568 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0424 07:15:26.374374 139516489717568 pyconfig.py:471] Config param eval_image_column: image
I0424 07:15:26.374389 139516489717568 pyconfig.py:471] Config param eval_interval: -1
I0424 07:15:26.374405 139516489717568 pyconfig.py:471] Config param eval_make_lst: False
I0424 07:15:26.374420 139516489717568 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0424 07:15:26.374435 139516489717568 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0424 07:15:26.374450 139516489717568 pyconfig.py:471] Config param eval_split: validation
I0424 07:15:26.374464 139516489717568 pyconfig.py:471] Config param eval_steps: -1
I0424 07:15:26.374480 139516489717568 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0424 07:15:26.374495 139516489717568 pyconfig.py:471] Config param final_logits_soft_cap: None
I0424 07:15:26.374511 139516489717568 pyconfig.py:471] Config param first_num_dense_layers: 0
I0424 07:15:26.374527 139516489717568 pyconfig.py:471] Config param float32_gate_logits: False
I0424 07:15:26.374541 139516489717568 pyconfig.py:471] Config param float32_logits: False
I0424 07:15:26.374556 139516489717568 pyconfig.py:471] Config param float32_qk_product: False
I0424 07:15:26.374570 139516489717568 pyconfig.py:471] Config param float32_weight_sum: True
I0424 07:15:26.374586 139516489717568 pyconfig.py:471] Config param force_q_layout: False
I0424 07:15:26.374601 139516489717568 pyconfig.py:471] Config param force_unroll: False
I0424 07:15:26.374617 139516489717568 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0424 07:15:26.374633 139516489717568 pyconfig.py:471] Config param formatting_func_path: 
I0424 07:15:26.374646 139516489717568 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0424 07:15:26.374674 139516489717568 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0424 07:15:26.374689 139516489717568 pyconfig.py:471] Config param fused_mlp: False
I0424 07:15:26.374704 139516489717568 pyconfig.py:471] Config param fused_qkv: True
I0424 07:15:26.374720 139516489717568 pyconfig.py:471] Config param gcs_metrics: False
I0424 07:15:26.374736 139516489717568 pyconfig.py:471] Config param gdn_chunk_size: 64
I0424 07:15:26.374752 139516489717568 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0424 07:15:26.374766 139516489717568 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0424 07:15:26.374783 139516489717568 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0424 07:15:26.374796 139516489717568 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0424 07:15:26.374812 139516489717568 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0424 07:15:26.374827 139516489717568 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0424 07:15:26.374841 139516489717568 pyconfig.py:471] Config param generate_padding_batch_train: False
I0424 07:15:26.374857 139516489717568 pyconfig.py:471] Config param generate_slice: v5e-16
I0424 07:15:26.374871 139516489717568 pyconfig.py:471] Config param generation_configs: {}
I0424 07:15:26.374887 139516489717568 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0424 07:15:26.374902 139516489717568 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0424 07:15:26.374927 139516489717568 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0424 07:15:26.374941 139516489717568 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0424 07:15:26.374957 139516489717568 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0424 07:15:26.374971 139516489717568 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0424 07:15:26.374986 139516489717568 pyconfig.py:471] Config param global_head_dim: 0
I0424 07:15:26.375001 139516489717568 pyconfig.py:471] Config param global_num_kv_heads: 0
I0424 07:15:26.375017 139516489717568 pyconfig.py:471] Config param global_parameter_scale: 1
I0424 07:15:26.375033 139516489717568 pyconfig.py:471] Config param global_rampup_samples: 500
I0424 07:15:26.375049 139516489717568 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0424 07:15:26.375065 139516489717568 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0424 07:15:26.375081 139516489717568 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0424 07:15:26.375096 139516489717568 pyconfig.py:471] Config param grad_dtype: float32
I0424 07:15:26.375131 139516489717568 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0424 07:15:26.375149 139516489717568 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0424 07:15:26.375165 139516489717568 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0424 07:15:26.375180 139516489717568 pyconfig.py:471] Config param grain_eval_files: 
I0424 07:15:26.375196 139516489717568 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0424 07:15:26.375211 139516489717568 pyconfig.py:471] Config param grain_num_threads: 16
I0424 07:15:26.375225 139516489717568 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0424 07:15:26.375241 139516489717568 pyconfig.py:471] Config param grain_packing_type: first_fit
I0424 07:15:26.375257 139516489717568 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0424 07:15:26.375271 139516489717568 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0424 07:15:26.375286 139516489717568 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0424 07:15:26.375301 139516489717568 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0424 07:15:26.375315 139516489717568 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0424 07:15:26.375331 139516489717568 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0424 07:15:26.375346 139516489717568 pyconfig.py:471] Config param grain_train_files: 
I0424 07:15:26.375362 139516489717568 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0424 07:15:26.375377 139516489717568 pyconfig.py:471] Config param grain_worker_count: 1
I0424 07:15:26.375391 139516489717568 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0424 07:15:26.375410 139516489717568 pyconfig.py:471] Config param grpo_beta: 0.08
I0424 07:15:26.375427 139516489717568 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0424 07:15:26.375443 139516489717568 pyconfig.py:471] Config param hardware: tpu
I0424 07:15:26.375458 139516489717568 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0424 07:15:26.375474 139516489717568 pyconfig.py:471] Config param head_dim: 8
I0424 07:15:26.375489 139516489717568 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0424 07:15:26.375504 139516489717568 pyconfig.py:471] Config param hf_data_dir: None
I0424 07:15:26.375519 139516489717568 pyconfig.py:471] Config param hf_eval_files: None
I0424 07:15:26.375534 139516489717568 pyconfig.py:471] Config param hf_eval_split: None
I0424 07:15:26.375550 139516489717568 pyconfig.py:471] Config param hf_name: None
I0424 07:15:26.375566 139516489717568 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0424 07:15:26.375580 139516489717568 pyconfig.py:471] Config param hf_train_files: None
I0424 07:15:26.375595 139516489717568 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0424 07:15:26.375609 139516489717568 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0424 07:15:26.375625 139516489717568 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0424 07:15:26.375641 139516489717568 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0424 07:15:26.375664 139516489717568 pyconfig.py:471] Config param ici_context_parallelism: 1
I0424 07:15:26.375679 139516489717568 pyconfig.py:471] Config param ici_data_parallelism: 1
I0424 07:15:26.375695 139516489717568 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0424 07:15:26.375709 139516489717568 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0424 07:15:26.375724 139516489717568 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0424 07:15:26.375740 139516489717568 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0424 07:15:26.375756 139516489717568 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 07:15:26.375771 139516489717568 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0424 07:15:26.375787 139516489717568 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0424 07:15:26.375801 139516489717568 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0424 07:15:26.375816 139516489717568 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0424 07:15:26.375832 139516489717568 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0424 07:15:26.375847 139516489717568 pyconfig.py:471] Config param image_path: 
I0424 07:15:26.375861 139516489717568 pyconfig.py:471] Config param image_placeholder: <|image|>
I0424 07:15:26.375877 139516489717568 pyconfig.py:471] Config param image_size_for_vit: 896
I0424 07:15:26.375892 139516489717568 pyconfig.py:471] Config param indexer_head_dim: 128
I0424 07:15:26.375906 139516489717568 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0424 07:15:26.375922 139516489717568 pyconfig.py:471] Config param indexer_n_heads: 64
I0424 07:15:26.375937 139516489717568 pyconfig.py:471] Config param indexer_sparse_training: False
I0424 07:15:26.375951 139516489717568 pyconfig.py:471] Config param indexer_topk: 2048
I0424 07:15:26.375966 139516489717568 pyconfig.py:471] Config param inference_benchmark_test: False
I0424 07:15:26.375982 139516489717568 pyconfig.py:471] Config param inference_metadata_file: 
I0424 07:15:26.375995 139516489717568 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0424 07:15:26.376014 139516489717568 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0424 07:15:26.376029 139516489717568 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0424 07:15:26.376046 139516489717568 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0424 07:15:26.376060 139516489717568 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0424 07:15:26.376076 139516489717568 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0424 07:15:26.376090 139516489717568 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0424 07:15:26.376106 139516489717568 pyconfig.py:471] Config param init_weights_seed: 0
I0424 07:15:26.376120 139516489717568 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0424 07:15:26.376136 139516489717568 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0424 07:15:26.376152 139516489717568 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0424 07:15:26.376168 139516489717568 pyconfig.py:471] Config param internal_compile: False
I0424 07:15:26.376184 139516489717568 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0424 07:15:26.376198 139516489717568 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0424 07:15:26.376214 139516489717568 pyconfig.py:471] Config param jax_debug_log_modules: 
I0424 07:15:26.376228 139516489717568 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0424 07:15:26.376244 139516489717568 pyconfig.py:471] Config param jax_profiler_port: 9999
I0424 07:15:26.376259 139516489717568 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0424 07:15:26.376273 139516489717568 pyconfig.py:471] Config param kv_cache_buffer: 256
I0424 07:15:26.376289 139516489717568 pyconfig.py:471] Config param kv_lora_rank: 512
I0424 07:15:26.376305 139516489717568 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0424 07:15:26.376322 139516489717568 pyconfig.py:471] Config param kv_quant_dtype: int8
I0424 07:15:26.376338 139516489717568 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0424 07:15:26.376353 139516489717568 pyconfig.py:471] Config param learning_rate: 0.0002
I0424 07:15:26.376370 139516489717568 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0424 07:15:26.376385 139516489717568 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0424 07:15:26.376400 139516489717568 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0424 07:15:26.376416 139516489717568 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0424 07:15:26.376432 139516489717568 pyconfig.py:471] Config param load_from_prefill_dir: False
I0424 07:15:26.376446 139516489717568 pyconfig.py:471] Config param load_full_state_path: 
I0424 07:15:26.376461 139516489717568 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 07:15:26.376475 139516489717568 pyconfig.py:471] Config param local_checkpoint_directory: 
I0424 07:15:26.376491 139516489717568 pyconfig.py:471] Config param local_checkpoint_period: 0
I0424 07:15:26.376506 139516489717568 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0424 07:15:26.376520 139516489717568 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0424 07:15:26.376536 139516489717568 pyconfig.py:471] Config param log_config: True
I0424 07:15:26.376550 139516489717568 pyconfig.py:471] Config param log_period: 10
I0424 07:15:26.376566 139516489717568 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), 
('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), 
('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0424 07:15:26.376640 139516489717568 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0424 07:15:26.376666 139516489717568 pyconfig.py:471] Config param logits_via_embedding: True
I0424 07:15:26.376683 139516489717568 pyconfig.py:471] Config param lora_input_adapters_path: 
I0424 07:15:26.376698 139516489717568 pyconfig.py:471] Config param loss_algo: grpo
I0424 07:15:26.376714 139516489717568 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0424 07:15:26.376731 139516489717568 pyconfig.py:471] Config param managed_mldiagnostics: False
I0424 07:15:26.376748 139516489717568 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/managed-mldiagnostics
I0424 07:15:26.376762 139516489717568 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0424 07:15:26.376778 139516489717568 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0424 07:15:26.376795 139516489717568 pyconfig.py:471] Config param max_checkify: False
I0424 07:15:26.376810 139516489717568 pyconfig.py:471] Config param max_concurrency: 256
I0424 07:15:26.376825 139516489717568 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0424 07:15:26.376840 139516489717568 pyconfig.py:471] Config param max_num_batched_tokens: None
I0424 07:15:26.376856 139516489717568 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0424 07:15:26.376870 139516489717568 pyconfig.py:471] Config param max_num_images_per_example: -1
I0424 07:15:26.376885 139516489717568 pyconfig.py:471] Config param max_num_seqs: None
I0424 07:15:26.376901 139516489717568 pyconfig.py:471] Config param max_position_embeddings: 163840
I0424 07:15:26.376915 139516489717568 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0424 07:15:26.376931 139516489717568 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0424 07:15:26.376944 139516489717568 pyconfig.py:471] Config param max_segments_per_seq: -1
I0424 07:15:26.376960 139516489717568 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0424 07:15:26.376974 139516489717568 pyconfig.py:471] Config param max_target_length: 2048
I0424 07:15:26.376989 139516489717568 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0424 07:15:26.377004 139516489717568 pyconfig.py:471] Config param megablox: True
I0424 07:15:26.377023 139516489717568 pyconfig.py:471] Config param merge_gating_gmm: False
I0424 07:15:26.377037 139516489717568 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0424 07:15:26.377054 139516489717568 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/metrics/
I0424 07:15:26.377069 139516489717568 pyconfig.py:471] Config param metrics_file: 
I0424 07:15:26.377085 139516489717568 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0424 07:15:26.377099 139516489717568 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0424 07:15:26.377115 139516489717568 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0424 07:15:26.377129 139516489717568 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0424 07:15:26.377144 139516489717568 pyconfig.py:471] Config param mla_naive_kvcache: True
I0424 07:15:26.377160 139516489717568 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0424 07:15:26.377175 139516489717568 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0424 07:15:26.377191 139516489717568 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0424 07:15:26.377206 139516489717568 pyconfig.py:471] Config param mlp_bias: False
I0424 07:15:26.377222 139516489717568 pyconfig.py:471] Config param mlp_dim: 64
I0424 07:15:26.377237 139516489717568 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0424 07:15:26.377252 139516489717568 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0424 07:15:26.377268 139516489717568 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0424 07:15:26.377282 139516489717568 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0424 07:15:26.377298 139516489717568 pyconfig.py:471] Config param moba: False
I0424 07:15:26.377312 139516489717568 pyconfig.py:471] Config param moba_chunk_size: 1024
I0424 07:15:26.377327 139516489717568 pyconfig.py:471] Config param moba_topk: 8
I0424 07:15:26.377343 139516489717568 pyconfig.py:471] Config param model_call_mode: 
I0424 07:15:26.377357 139516489717568 pyconfig.py:471] Config param model_name: gpt3-52k
I0424 07:15:26.377372 139516489717568 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0424 07:15:26.377387 139516489717568 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0424 07:15:26.377401 139516489717568 pyconfig.py:471] Config param moe_mlp_dim: -1
I0424 07:15:26.377417 139516489717568 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0424 07:15:26.377432 139516489717568 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0424 07:15:26.377447 139516489717568 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0424 07:15:26.377462 139516489717568 pyconfig.py:471] Config param monitor_goodput: False
I0424 07:15:26.377478 139516489717568 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0424 07:15:26.377492 139516489717568 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0424 07:15:26.377508 139516489717568 pyconfig.py:471] Config param mscale: 1.0
I0424 07:15:26.377522 139516489717568 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0424 07:15:26.377537 139516489717568 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0424 07:15:26.377553 139516489717568 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0424 07:15:26.377568 139516489717568 pyconfig.py:471] Config param mtp_num_layers: 0
I0424 07:15:26.377583 139516489717568 pyconfig.py:471] Config param mu_dtype: float32
I0424 07:15:26.377607 139516489717568 pyconfig.py:471] Config param multi_sampling: False
I0424 07:15:26.377623 139516489717568 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0424 07:15:26.377637 139516489717568 pyconfig.py:471] Config param muon_beta: 0.95
I0424 07:15:26.377662 139516489717568 pyconfig.py:471] Config param muon_consistent_rms: None
I0424 07:15:26.377679 139516489717568 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0424 07:15:26.377693 139516489717568 pyconfig.py:471] Config param n_routing_groups: -1
I0424 07:15:26.377708 139516489717568 pyconfig.py:471] Config param n_window_for_audio: 50
I0424 07:15:26.377723 139516489717568 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0424 07:15:26.377739 139516489717568 pyconfig.py:471] Config param nope_layer_interval: -1
I0424 07:15:26.377755 139516489717568 pyconfig.py:471] Config param norm_topk_prob: False
I0424 07:15:26.377771 139516489717568 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0424 07:15:26.377789 139516489717568 pyconfig.py:471] Config param normalize_embedding_logits: False
I0424 07:15:26.377804 139516489717568 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0424 07:15:26.377820 139516489717568 pyconfig.py:471] Config param num_batches: 4
I0424 07:15:26.377834 139516489717568 pyconfig.py:471] Config param num_channels_for_vit: 3
I0424 07:15:26.377850 139516489717568 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0424 07:15:26.377864 139516489717568 pyconfig.py:471] Config param num_decoder_layers: 1
I0424 07:15:26.377880 139516489717568 pyconfig.py:471] Config param num_diloco_replicas: 1
I0424 07:15:26.377893 139516489717568 pyconfig.py:471] Config param num_epoch: 1
I0424 07:15:26.377909 139516489717568 pyconfig.py:471] Config param num_eval_passes: 1
I0424 07:15:26.377924 139516489717568 pyconfig.py:471] Config param num_experts: 1
I0424 07:15:26.377938 139516489717568 pyconfig.py:471] Config param num_experts_per_tok: 1
I0424 07:15:26.377953 139516489717568 pyconfig.py:471] Config param num_generations: 2
I0424 07:15:26.377968 139516489717568 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0424 07:15:26.377983 139516489717568 pyconfig.py:471] Config param num_iterations: 1
I0424 07:15:26.377997 139516489717568 pyconfig.py:471] Config param num_kv_heads: 2
I0424 07:15:26.378015 139516489717568 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0424 07:15:26.378030 139516489717568 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0424 07:15:26.378045 139516489717568 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0424 07:15:26.378060 139516489717568 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0424 07:15:26.378076 139516489717568 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0424 07:15:26.378090 139516489717568 pyconfig.py:471] Config param num_query_heads: 2
I0424 07:15:26.378105 139516489717568 pyconfig.py:471] Config param num_samplers_slices: -1
I0424 07:15:26.378121 139516489717568 pyconfig.py:471] Config param num_slices: 1
I0424 07:15:26.378136 139516489717568 pyconfig.py:471] Config param num_target_devices: 32
I0424 07:15:26.378151 139516489717568 pyconfig.py:471] Config param num_test_batches: 5
I0424 07:15:26.378167 139516489717568 pyconfig.py:471] Config param num_trainer_slices: -1
I0424 07:15:26.378181 139516489717568 pyconfig.py:471] Config param num_vocab_tiling: 1
I0424 07:15:26.378197 139516489717568 pyconfig.py:471] Config param off_policy_steps: 0
I0424 07:15:26.378212 139516489717568 pyconfig.py:471] Config param offline_data_dir: None
I0424 07:15:26.378228 139516489717568 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0424 07:15:26.378245 139516489717568 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0424 07:15:26.378260 139516489717568 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0424 07:15:26.378275 139516489717568 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0424 07:15:26.378290 139516489717568 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0424 07:15:26.378305 139516489717568 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0424 07:15:26.378321 139516489717568 pyconfig.py:471] Config param output_dim_for_audio: 512
I0424 07:15:26.378335 139516489717568 pyconfig.py:471] Config param override_logical_axis_rules: False
I0424 07:15:26.378350 139516489717568 pyconfig.py:471] Config param override_model_config: True
I0424 07:15:26.378365 139516489717568 pyconfig.py:471] Config param packing: True
I0424 07:15:26.378380 139516489717568 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0424 07:15:26.378396 139516489717568 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0424 07:15:26.378411 139516489717568 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0424 07:15:26.378426 139516489717568 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0424 07:15:26.378440 139516489717568 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0424 07:15:26.378456 139516489717568 pyconfig.py:471] Config param param_scan_axis: 1
I0424 07:15:26.378469 139516489717568 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0424 07:15:26.378485 139516489717568 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0424 07:15:26.378501 139516489717568 pyconfig.py:471] Config param patch_size_for_vit: 14
I0424 07:15:26.378515 139516489717568 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0424 07:15:26.378530 139516489717568 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0424 07:15:26.378545 139516489717568 pyconfig.py:471] Config param per_device_batch_size: 2
I0424 07:15:26.378561 139516489717568 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0424 07:15:26.378576 139516489717568 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0424 07:15:26.378591 139516489717568 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0424 07:15:26.378606 139516489717568 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0424 07:15:26.378622 139516489717568 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0424 07:15:26.378636 139516489717568 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0424 07:15:26.378659 139516489717568 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0424 07:15:26.378675 139516489717568 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0424 07:15:26.378690 139516489717568 pyconfig.py:471] Config param position_id_per_seconds: 25
I0424 07:15:26.378705 139516489717568 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0424 07:15:26.378720 139516489717568 pyconfig.py:471] Config param prefill_cache_dir: 
I0424 07:15:26.378734 139516489717568 pyconfig.py:471] Config param prefill_chunk_size: 256
I0424 07:15:26.378749 139516489717568 pyconfig.py:471] Config param prefill_slice: v5e-16
I0424 07:15:26.378765 139516489717568 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0424 07:15:26.378779 139516489717568 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0424 07:15:26.378794 139516489717568 pyconfig.py:471] Config param prefuse_moe_weights: False
I0424 07:15:26.378810 139516489717568 pyconfig.py:471] Config param profile_cleanly: True
I0424 07:15:26.378850 139516489717568 pyconfig.py:471] Config param profile_periodically_period: -1
I0424 07:15:26.378868 139516489717568 pyconfig.py:471] Config param profile_power_events: False
I0424 07:15:26.378884 139516489717568 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0424 07:15:26.378902 139516489717568 pyconfig.py:471] Config param profiler_steps: 5
I0424 07:15:26.378916 139516489717568 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0424 07:15:26.378931 139516489717568 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0424 07:15:26.378946 139516489717568 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0424 07:15:26.378961 139516489717568 pyconfig.py:471] Config param prometheus_port: 0
I0424 07:15:26.378976 139516489717568 pyconfig.py:471] Config param prompt: I love to
I0424 07:15:26.378992 139516489717568 pyconfig.py:471] Config param pure_nnx: False
I0424 07:15:26.379006 139516489717568 pyconfig.py:471] Config param pure_nnx_decoder: False
I0424 07:15:26.379025 139516489717568 pyconfig.py:471] Config param q_lora_rank: 0
I0424 07:15:26.379039 139516489717568 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0424 07:15:26.379055 139516489717568 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0424 07:15:26.379070 139516489717568 pyconfig.py:471] Config param qk_norm_with_scale: True
I0424 07:15:26.379086 139516489717568 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0424 07:15:26.379102 139516489717568 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0424 07:15:26.379117 139516489717568 pyconfig.py:471] Config param quant_cfg_path: 
I0424 07:15:26.379133 139516489717568 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0424 07:15:26.379151 139516489717568 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0424 07:15:26.379167 139516489717568 pyconfig.py:471] Config param quantize_kvcache: False
I0424 07:15:26.379181 139516489717568 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0424 07:15:26.379196 139516489717568 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0424 07:15:26.379212 139516489717568 pyconfig.py:471] Config param ragged_block_size: 256
I0424 07:15:26.379228 139516489717568 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0424 07:15:26.379244 139516489717568 pyconfig.py:471] Config param rampup_end_step: 0
I0424 07:15:26.379260 139516489717568 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0424 07:15:26.379276 139516489717568 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0424 07:15:26.379292 139516489717568 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0424 07:15:26.379306 139516489717568 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0424 07:15:26.379322 139516489717568 pyconfig.py:471] Config param remat_policy: full
I0424 07:15:26.379338 139516489717568 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0424 07:15:26.379353 139516489717568 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0424 07:15:26.379368 139516489717568 pyconfig.py:471] Config param replicate_quant_scale: False
I0424 07:15:26.379382 139516489717568 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0424 07:15:26.379398 139516489717568 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0424 07:15:26.379414 139516489717568 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0424 07:15:26.379427 139516489717568 pyconfig.py:471] Config param reshape_q: False
I0424 07:15:26.379443 139516489717568 pyconfig.py:471] Config param return_log_prob: False
I0424 07:15:26.379457 139516489717568 pyconfig.py:471] Config param reuse_example_batch: 0
I0424 07:15:26.379473 139516489717568 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0424 07:15:26.379488 139516489717568 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0424 07:15:26.379504 139516489717568 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0424 07:15:26.379520 139516489717568 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0424 07:15:26.379535 139516489717568 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0424 07:15:26.379551 139516489717568 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0424 07:15:26.379566 139516489717568 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0424 07:15:26.379587 139516489717568 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0424 07:15:26.379603 139516489717568 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0424 07:15:26.379617 139516489717568 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0424 07:15:26.379632 139516489717568 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0424 07:15:26.379648 139516489717568 pyconfig.py:471] Config param rope_attention_scaling: False
I0424 07:15:26.379671 139516489717568 pyconfig.py:471] Config param rope_factor: 40
I0424 07:15:26.379686 139516489717568 pyconfig.py:471] Config param rope_interleave: True
I0424 07:15:26.379702 139516489717568 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0424 07:15:26.379716 139516489717568 pyconfig.py:471] Config param rope_max_timescale: 10000
I0424 07:15:26.379731 139516489717568 pyconfig.py:471] Config param rope_min_timescale: 1
I0424 07:15:26.379747 139516489717568 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0424 07:15:26.379763 139516489717568 pyconfig.py:471] Config param rope_truncate: True
I0424 07:15:26.379776 139516489717568 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0424 07:15:26.379795 139516489717568 pyconfig.py:471] Config param rope_use_scale: True
I0424 07:15:26.379811 139516489717568 pyconfig.py:471] Config param routed_bias: False
I0424 07:15:26.379827 139516489717568 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0424 07:15:26.379842 139516489717568 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0424 07:15:26.379856 139516489717568 pyconfig.py:471] Config param routed_score_func: 
I0424 07:15:26.379872 139516489717568 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-07-15
I0424 07:15:26.379886 139516489717568 pyconfig.py:471] Config param sa_block_kv: 512
I0424 07:15:26.379901 139516489717568 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0424 07:15:26.379915 139516489717568 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0424 07:15:26.379931 139516489717568 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0424 07:15:26.379946 139516489717568 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0424 07:15:26.379960 139516489717568 pyconfig.py:471] Config param sa_block_q: 512
I0424 07:15:26.379976 139516489717568 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0424 07:15:26.379990 139516489717568 pyconfig.py:471] Config param sa_block_q_dq: 512
I0424 07:15:26.380026 139516489717568 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0424 07:15:26.380042 139516489717568 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0424 07:15:26.380058 139516489717568 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0424 07:15:26.380073 139516489717568 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0424 07:15:26.380089 139516489717568 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0424 07:15:26.380104 139516489717568 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0424 07:15:26.380119 139516489717568 pyconfig.py:471] Config param save_config_to_gcs: False
I0424 07:15:26.380135 139516489717568 pyconfig.py:471] Config param save_quantized_params_path: 
I0424 07:15:26.380148 139516489717568 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0424 07:15:26.380164 139516489717568 pyconfig.py:471] Config param scan_layers: True
I0424 07:15:26.380179 139516489717568 pyconfig.py:471] Config param scan_layers_per_stage: False
I0424 07:15:26.380193 139516489717568 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0424 07:15:26.380209 139516489717568 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0424 07:15:26.380223 139516489717568 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0424 07:15:26.380238 139516489717568 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0424 07:15:26.380254 139516489717568 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0424 07:15:26.380268 139516489717568 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0424 07:15:26.380283 139516489717568 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0424 07:15:26.380299 139516489717568 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0424 07:15:26.380336 139516489717568 pyconfig.py:471] Config param sharding_strategy: None
I0424 07:15:26.380352 139516489717568 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0424 07:15:26.380367 139516489717568 pyconfig.py:471] Config param shardy: True
I0424 07:15:26.380383 139516489717568 pyconfig.py:471] Config param share_kv_projections: False
I0424 07:15:26.380398 139516489717568 pyconfig.py:471] Config param shared_experts: 0
I0424 07:15:26.380413 139516489717568 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0424 07:15:26.380428 139516489717568 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0424 07:15:26.380443 139516489717568 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0424 07:15:26.380458 139516489717568 pyconfig.py:471] Config param skip_step_interval: 128
I0424 07:15:26.380472 139516489717568 pyconfig.py:471] Config param skip_step_on_spikes: False
I0424 07:15:26.380488 139516489717568 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0424 07:15:26.380504 139516489717568 pyconfig.py:471] Config param sliding_window_size: 0
I0424 07:15:26.380519 139516489717568 pyconfig.py:471] Config param solution_end_token: </answer>
I0424 07:15:26.380535 139516489717568 pyconfig.py:471] Config param solution_start_token: <answer>
I0424 07:15:26.380550 139516489717568 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0424 07:15:26.380566 139516489717568 pyconfig.py:471] Config param sparse_matmul: True
I0424 07:15:26.380580 139516489717568 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0424 07:15:26.380596 139516489717568 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0424 07:15:26.380611 139516489717568 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0424 07:15:26.380625 139516489717568 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0424 07:15:26.380640 139516489717568 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0424 07:15:26.380661 139516489717568 pyconfig.py:471] Config param steps: 200000
I0424 07:15:26.380677 139516489717568 pyconfig.py:471] Config param stop_strings: None
I0424 07:15:26.380692 139516489717568 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 07:15:26.380707 139516489717568 pyconfig.py:471] Config param student_params_to_update: None
I0424 07:15:26.380723 139516489717568 pyconfig.py:471] Config param subslice_shape: 
I0424 07:15:26.380738 139516489717568 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 07:15:26.380754 139516489717568 pyconfig.py:471] Config param system_prompt: 
I0424 07:15:26.380768 139516489717568 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 07:15:26.380784 139516489717568 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 07:15:26.380799 139516489717568 pyconfig.py:471] Config param temperature_tuning: False
I0424 07:15:26.380814 139516489717568 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 07:15:26.380830 139516489717568 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-15/tensorboard/
I0424 07:15:26.380844 139516489717568 pyconfig.py:471] Config param tensors_on_device: None
I0424 07:15:26.380860 139516489717568 pyconfig.py:471] Config param tensors_to_offload: None
I0424 07:15:26.380875 139516489717568 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 07:15:26.380889 139516489717568 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 07:15:26.380904 139516489717568 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 07:15:26.380920 139516489717568 pyconfig.py:471] Config param tokenize_train_data: True
I0424 07:15:26.380934 139516489717568 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 07:15:26.380949 139516489717568 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 07:15:26.380965 139516489717568 pyconfig.py:471] Config param topk_routing_group: -1
I0424 07:15:26.380980 139516489717568 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 07:15:26.380996 139516489717568 pyconfig.py:471] Config param train_fraction: 1.0
I0424 07:15:26.381015 139516489717568 pyconfig.py:471] Config param train_image_column: image
I0424 07:15:26.381030 139516489717568 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 07:15:26.381045 139516489717568 pyconfig.py:471] Config param train_split: train
I0424 07:15:26.381060 139516489717568 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 07:15:26.381075 139516489717568 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 07:15:26.381090 139516489717568 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 07:15:26.381105 139516489717568 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 07:15:26.381120 139516489717568 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 07:15:26.381135 139516489717568 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 07:15:26.381151 139516489717568 pyconfig.py:471] Config param use_audio: False
I0424 07:15:26.381165 139516489717568 pyconfig.py:471] Config param use_audio_in_video: False
I0424 07:15:26.381181 139516489717568 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 07:15:26.381195 139516489717568 pyconfig.py:471] Config param use_chat_template: False
I0424 07:15:26.381210 139516489717568 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 07:15:26.381225 139516489717568 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 07:15:26.381239 139516489717568 pyconfig.py:471] Config param use_dpo: False
I0424 07:15:26.381254 139516489717568 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 07:15:26.381269 139516489717568 pyconfig.py:471] Config param use_grpo: True
I0424 07:15:26.381283 139516489717568 pyconfig.py:471] Config param use_indexer: False
I0424 07:15:26.381298 139516489717568 pyconfig.py:471] Config param use_iota_embed: True
I0424 07:15:26.381314 139516489717568 pyconfig.py:471] Config param use_jax_splash: False
I0424 07:15:26.381328 139516489717568 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 07:15:26.381344 139516489717568 pyconfig.py:471] Config param use_mrope: False
I0424 07:15:26.381360 139516489717568 pyconfig.py:471] Config param use_multimodal: False
I0424 07:15:26.381376 139516489717568 pyconfig.py:471] Config param use_pathways: True
I0424 07:15:26.381390 139516489717568 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 07:15:26.381406 139516489717568 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 07:15:26.381421 139516489717568 pyconfig.py:471] Config param use_qk_clip: False
I0424 07:15:26.381436 139516489717568 pyconfig.py:471] Config param use_qk_norm: False
I0424 07:15:26.381450 139516489717568 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 07:15:26.381466 139516489717568 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 07:15:26.381481 139516489717568 pyconfig.py:471] Config param use_ragged_attention: False
I0424 07:15:26.381495 139516489717568 pyconfig.py:471] Config param use_random_routing: False
I0424 07:15:26.381511 139516489717568 pyconfig.py:471] Config param use_replicator_service: False
I0424 07:15:26.381525 139516489717568 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 07:15:26.381541 139516489717568 pyconfig.py:471] Config param use_sft: False
I0424 07:15:26.381555 139516489717568 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 07:15:26.381570 139516489717568 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 07:15:26.381585 139516489717568 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 07:15:26.381600 139516489717568 pyconfig.py:471] Config param use_truncation: True
I0424 07:15:26.381616 139516489717568 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 07:15:26.381630 139516489717568 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 07:15:26.381645 139516489717568 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 07:15:26.381668 139516489717568 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 07:15:26.381683 139516489717568 pyconfig.py:471] Config param v_head_dim: 128
I0424 07:15:26.381698 139516489717568 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 07:15:26.381713 139516489717568 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 07:15:26.381729 139516489717568 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0424 07:15:26.381744 139516489717568 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0424 07:15:26.381758 139516489717568 pyconfig.py:471] Config param video_path: 
I0424 07:15:26.381773 139516489717568 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 07:15:26.381788 139516489717568 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 07:15:26.381803 139516489717568 pyconfig.py:471] Config param vision_output_length: -1
I0424 07:15:26.381818 139516489717568 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 07:15:26.381833 139516489717568 pyconfig.py:471] Config param vllm_hf_config_path: 
I0424 07:15:26.381862 139516489717568 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 07:15:26.381879 139516489717568 pyconfig.py:471] Config param vocab_size: 32000
I0424 07:15:26.381893 139516489717568 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 07:15:26.381909 139516489717568 pyconfig.py:471] Config param weight_dtype: float32
I0424 07:15:26.381934 139516489717568 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 07:15:26.381949 139516489717568 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 07:15:26.381964 139516489717568 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 07:15:26.381979 139516489717568 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 07:15:26.381994 139516489717568 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 07:15:26.382013 139516489717568 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 07:15:26.382027 139516489717568 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 07:15:26.382043 139516489717568 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 07:15:26.382058 139516489717568 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 07:15:26.382072 139516489717568 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 07:15:26.382088 139516489717568 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 07:15:26.382104 139516489717568 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 07:15:26.382118 139516489717568 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 07:15:26.382132 139516489717568 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 07:15:26.382148 139516489717568 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 07:15:26.382163 139516489717568 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 07:15:26.382178 139516489717568 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 07:15:26.382192 139516489717568 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 07:15:26.382207 139516489717568 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 07:15:26.382222 139516489717568 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 07:15:26.382237 139516489717568 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 07:15:26.382255 139516489717568 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 07:15:26.382271 139516489717568 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0424 07:15:26.382285 139516489717568 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 07:15:26.382300 139516489717568 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 07:15:26.382317 139516489717568 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 07:15:26.382641 139516489717568 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 07:15:26.382691 139516489717568 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 07:15:26.581381 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:15:26.702201 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:15:26.816438 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:15:26.935153 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:15:27.045106 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 07:15:27.166584 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0424 07:15:27.277299 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0424 07:15:27.429951 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0424 07:15:28.059551 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 07:15:28.213185 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 07:15:28.496082 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0424 07:15:28.602453 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 07:15:28.730415 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 07:15:28.839742 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0424 07:15:28.936237 139516489717568 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 07:15:28.947409 139516489717568 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 07:15:28.947559 139516489717568 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0424 07:15:28.947640 139516489717568 train_distill.py:586] Loading Student from ...
I0424 07:15:28.947683 139516489717568 train_distill.py:170] --- Student Configuration ---
I0424 07:15:28.947706 139516489717568 train_distill.py:171]   Model Name:      gpt3-52k
I0424 07:15:28.947728 139516489717568 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 07:15:28.947746 139516489717568 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 07:15:28.947763 139516489717568 train_distill.py:176]   Vocab Size:      32000
I0424 07:15:28.947781 139516489717568 train_distill.py:177]   Checkpoint:      
I0424 07:15:28.947800 139516489717568 train_distill.py:451] Initializing model: gpt3-52k...
I0424 07:15:30.343850 139516489717568 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0424 07:15:30.343959 139516489717568 train_distill.py:170] --- Teacher Configuration ---
I0424 07:15:30.343990 139516489717568 train_distill.py:171]   Model Name:      gpt3-52k
I0424 07:15:30.344015 139516489717568 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 07:15:30.344037 139516489717568 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 07:15:30.344056 139516489717568 train_distill.py:176]   Vocab Size:      32000
I0424 07:15:30.344074 139516489717568 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 07:15:30.344094 139516489717568 train_distill.py:451] Initializing model: gpt3-52k...
I0424 07:15:31.350757 139516489717568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:15:31.350917 139516489717568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ee2feb4e570>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:15:31.350978 139516489717568 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 07:15:31.852680 139516489717568 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 07:15:32.399226    1941 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 07:15:33.537008 139516489717568 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 07:15:35.597154 139516489717568 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 07:15:35.597527 139516489717568 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 07:15:37.832464 139516489717568 checkpointer.py:318] Finished restoring checkpoint in 4.66 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 07:15:38.564716 139516489717568 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0424 07:15:38.629807 139516489717568 config.py:112] TensorFlow version 2.20.0 available.
I0424 07:15:38.630309 139516489717568 config.py:125] JAX version 0.9.2 available.
I0424 07:15:39.064920 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0424 07:15:39.072314 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 07:15:39.080373 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 07:15:39.342344 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 07:15:39.661969 139516489717568 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 07:15:39.803086 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0424 07:15:39.909960 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0424 07:15:40.080200 139516489717568 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0424 07:15:40.192996 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0424 07:15:40.300965 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0424 07:15:40.435942 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0424 07:15:40.604352 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:15:40.710286 139516489717568 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:15:40.818473 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 07:15:40.932048 139516489717568 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0424 07:15:41.030350 139516489717568 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0424 07:15:41.030563 139516489717568 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0424 07:15:41.033603 139516489717568 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0424 07:15:41.033681 139516489717568 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0424 07:15:41.033748 139516489717568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:15:41.033826 139516489717568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ee2feb4e570>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:15:41.033867 139516489717568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:15:41.033899 139516489717568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ee2feb4e570>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:15:41.033941 139516489717568 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ecbaffa52b0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140223410>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140223350>}, handler_registry=None
I0424 07:15:41.034139 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ecbaffa52b0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:15:41.034180 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140223410>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:15:41.034207 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140223350>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:15:41.034230 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ec87f738e90>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:15:41.034257 139516489717568 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ecbaffa52b0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ecbaffa52b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140223410>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140223410>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140223350>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140223350>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ec87f738e90>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ec87f738e90>}).
I0424 07:15:41.034631 139516489717568 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ecbacdec5e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 07:15:42.640111 139516489717568 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_linen_xpk_main_20260424_070237_07_distill_smoke/checkpoints
I0424 07:15:42.642310 139516489717568 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_linen_xpk_main_20260424_070237_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7ed140223320>
I0424 07:15:42.642418 139516489717568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:15:42.642484 139516489717568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ee2feb4e570>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:15:42.642519 139516489717568 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:15:42.642550 139516489717568 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ee2feb4e570>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:15:42.642584 139516489717568 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 07:15:42.642632 139516489717568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139516489717568 count=1 at 0x7ecab4356f40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ed140223140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ed140223110>, _write_futures=[])
I0424 07:15:42.643023 139516489717568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139516489717568 count=1 at 0x7ecab4356f40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ed140223140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ed140223110>, _write_futures=[])
I0424 07:15:42.643050 139516489717568 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139516489717568 count=1 at 0x7ecab4356f40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ed140223140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ed140223110>, _write_futures=[])
I0424 07:15:42.643080 139516489717568 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7edd3582d6d0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140221910>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140222c00>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ed140222480>}, handler_registry=None
I0424 07:15:42.643177 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7edd3582d6d0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:15:42.643208 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140221910>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:15:42.643232 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140222c00>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:15:42.643258 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ed140222480>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 07:15:42.643282 139516489717568 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140221820>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:15:42.643306 139516489717568 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7edd3582d6d0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7edd3582d6d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140221910>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ed140221910>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140222c00>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140222c00>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ed140222480>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ed140222480>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140221820>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ed140221820>}).
I0424 07:15:42.643375 139516489717568 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ecbacdec720> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 07:15:43.366770 139516489717568 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_linen_xpk_main_20260424_070237_07_distill_smoke/checkpoints
I0424 07:15:43.374222 139516489717568 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_linen_xpk_main_20260424_070237_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7edd343ef7d0>
I0424 07:15:43.374641 139516489717568 train_distill.py:677] Starting Distillation Training...
I0424 07:15:43.374767 139516489717568 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 07:15:43.830904 139516489717568 peft_trainer.py:594] Compiled train_step cache size: 0
I0424 07:15:43.832625 139362872313600 grain_pool.py:367] Grain pool will use 1 processes.
I0424 07:15:43.889572 139362872313600 grain_pool.py:440] Grain pool will start child processes.
I0424 07:15:43.895274 139362872313600 grain_pool.py:448] Grain pool started all child processes.
2026-04-24 07:15:50.390390: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0)}
I0424 07:15:53.959980 139362872313600 grain_pool.py:542] Grain pool is exiting.
I0424 07:15:53.960084 139362872313600 grain_pool.py:547] Shutting down multiprocessing system.
I0424 07:15:55.621247 139362872313600 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 07:16:06 UTC 2026
EXIT_CODE=1
NNX  ·  b117f50cf  ·  main_20260424_070237  ·  full log
XPK Start: Fri Apr 24 07:25:16 UTC 2026
2026-04-24 07:25:34.743095: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0424 07:25:41.043705 132390722602816 max_utils.py:273] Attempting to initialize the jax distributed system...
I0424 07:25:50.081614 132390722602816 distributed.py:149] Starting JAX distributed service on [::]:8482
I0424 07:25:50.083856 132390722602816 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-esds6-slice-job-0-0.mt-07-distill-smoke-esds6:8482
I0424 07:25:51.348138 132390722602816 max_utils.py:284] Jax distributed system initialized!
I0424 07:25:57.630319 132390722602816 max_utils.py:244] Jax distributed system is already initialized.
W0424 07:25:57.756656 132390722602816 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 07:25:57.813706 132390722602816 max_utils.py:244] Jax distributed system is already initialized.
I0424 07:25:57.814852 132390722602816 pyconfig.py:471] Config param abort_on_inf_loss: True
I0424 07:25:57.814899 132390722602816 pyconfig.py:471] Config param abort_on_nan_loss: True
I0424 07:25:57.814935 132390722602816 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0424 07:25:57.814959 132390722602816 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0424 07:25:57.814981 132390722602816 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0424 07:25:57.815000 132390722602816 pyconfig.py:471] Config param activations_in_float32: False
I0424 07:25:57.815018 132390722602816 pyconfig.py:471] Config param adam_b1: 0.9
I0424 07:25:57.815038 132390722602816 pyconfig.py:471] Config param adam_b2: 0.95
I0424 07:25:57.815056 132390722602816 pyconfig.py:471] Config param adam_eps: 1e-08
I0424 07:25:57.815079 132390722602816 pyconfig.py:471] Config param adam_eps_root: 0.0
I0424 07:25:57.815096 132390722602816 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0424 07:25:57.815113 132390722602816 pyconfig.py:471] Config param adamw_mask: []
I0424 07:25:57.815129 132390722602816 pyconfig.py:471] Config param add_bos: True
I0424 07:25:57.815146 132390722602816 pyconfig.py:471] Config param add_eos: True
I0424 07:25:57.815162 132390722602816 pyconfig.py:471] Config param allow_split_physical_axes: False
I0424 07:25:57.815179 132390722602816 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0424 07:25:57.815196 132390722602816 pyconfig.py:471] Config param async_checkpointing: True
I0424 07:25:57.815212 132390722602816 pyconfig.py:471] Config param async_scheduling: False
I0424 07:25:57.815228 132390722602816 pyconfig.py:471] Config param attention: dot_product
I0424 07:25:57.815245 132390722602816 pyconfig.py:471] Config param attention_bias: False
I0424 07:25:57.815262 132390722602816 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0424 07:25:57.815278 132390722602816 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0424 07:25:57.815299 132390722602816 pyconfig.py:471] Config param attention_output_dim: -1
I0424 07:25:57.815314 132390722602816 pyconfig.py:471] Config param attention_sink: False
I0424 07:25:57.815330 132390722602816 pyconfig.py:471] Config param attention_type: global
I0424 07:25:57.815345 132390722602816 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0424 07:25:57.815361 132390722602816 pyconfig.py:471] Config param audio_path: 
I0424 07:25:57.815376 132390722602816 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0424 07:25:57.815392 132390722602816 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0424 07:25:57.815406 132390722602816 pyconfig.py:471] Config param base_config: base.yml
I0424 07:25:57.815422 132390722602816 pyconfig.py:471] Config param base_emb_dim: 16
I0424 07:25:57.815437 132390722602816 pyconfig.py:471] Config param base_mlp_dim: 64
I0424 07:25:57.815452 132390722602816 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0424 07:25:57.815467 132390722602816 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0424 07:25:57.815483 132390722602816 pyconfig.py:471] Config param base_num_kv_heads: 2
I0424 07:25:57.815497 132390722602816 pyconfig.py:471] Config param base_num_query_heads: 2
I0424 07:25:57.815513 132390722602816 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0424 07:25:57.815528 132390722602816 pyconfig.py:471] Config param batch_size: 1
I0424 07:25:57.815544 132390722602816 pyconfig.py:471] Config param batch_split_factor: 1
I0424 07:25:57.815559 132390722602816 pyconfig.py:471] Config param beta_fast: 32
I0424 07:25:57.815575 132390722602816 pyconfig.py:471] Config param beta_slow: 1
I0424 07:25:57.815591 132390722602816 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0424 07:25:57.815609 132390722602816 pyconfig.py:471] Config param capacity_factor: -1.0
I0424 07:25:57.815626 132390722602816 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0424 07:25:57.815642 132390722602816 pyconfig.py:471] Config param chat_template: 
I0424 07:25:57.815656 132390722602816 pyconfig.py:471] Config param chat_template_path: 
I0424 07:25:57.815674 132390722602816 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0424 07:25:57.815693 132390722602816 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/checkpoints/
I0424 07:25:57.815710 132390722602816 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0424 07:25:57.815725 132390722602816 pyconfig.py:471] Config param checkpoint_period: 2000
I0424 07:25:57.815742 132390722602816 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0424 07:25:57.815757 132390722602816 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0424 07:25:57.815773 132390722602816 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0424 07:25:57.815788 132390722602816 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0424 07:25:57.815804 132390722602816 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0424 07:25:57.815818 132390722602816 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0424 07:25:57.815833 132390722602816 pyconfig.py:471] Config param chips_per_vm: 4
I0424 07:25:57.815848 132390722602816 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0424 07:25:57.815863 132390722602816 pyconfig.py:471] Config param collect_stack_trace: False
I0424 07:25:57.815878 132390722602816 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0424 07:25:57.815894 132390722602816 pyconfig.py:471] Config param colocated_python_data_input: False
I0424 07:25:57.815908 132390722602816 pyconfig.py:471] Config param compile_topology: 
I0424 07:25:57.815933 132390722602816 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0424 07:25:57.815948 132390722602816 pyconfig.py:471] Config param compile_xla_flags: 
I0424 07:25:57.815964 132390722602816 pyconfig.py:471] Config param compiled_trainstep_file: 
I0424 07:25:57.815979 132390722602816 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0424 07:25:57.815994 132390722602816 pyconfig.py:471] Config param constant_bound_config: []
I0424 07:25:57.816009 132390722602816 pyconfig.py:471] Config param context: RematLocation.REMAT
I0424 07:25:57.816026 132390722602816 pyconfig.py:471] Config param context_parallel_load_balance: True
I0424 07:25:57.816040 132390722602816 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0424 07:25:57.816058 132390722602816 pyconfig.py:471] Config param context_parallel_size: 1
I0424 07:25:57.816072 132390722602816 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0424 07:25:57.816088 132390722602816 pyconfig.py:471] Config param context_sharding: context
I0424 07:25:57.816104 132390722602816 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0424 07:25:57.816118 132390722602816 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0424 07:25:57.816133 132390722602816 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0424 07:25:57.816149 132390722602816 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0424 07:25:57.816165 132390722602816 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0424 07:25:57.816179 132390722602816 pyconfig.py:471] Config param custom_mesh: 
I0424 07:25:57.816195 132390722602816 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0424 07:25:57.816220 132390722602816 pyconfig.py:471] Config param d_model_for_audio: 256
I0424 07:25:57.816236 132390722602816 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0424 07:25:57.816256 132390722602816 pyconfig.py:471] Config param data_shuffle_seed: 0
I0424 07:25:57.816272 132390722602816 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0424 07:25:57.816295 132390722602816 pyconfig.py:471] Config param dataset_path: 
I0424 07:25:57.816320 132390722602816 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0424 07:25:57.816350 132390722602816 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0424 07:25:57.816370 132390722602816 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0424 07:25:57.816386 132390722602816 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0424 07:25:57.816401 132390722602816 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0424 07:25:57.816416 132390722602816 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0424 07:25:57.816433 132390722602816 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0424 07:25:57.816448 132390722602816 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0424 07:25:57.816464 132390722602816 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0424 07:25:57.816480 132390722602816 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 07:25:57.816496 132390722602816 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0424 07:25:57.816512 132390722602816 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0424 07:25:57.816528 132390722602816 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0424 07:25:57.816542 132390722602816 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0424 07:25:57.816558 132390722602816 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0424 07:25:57.816574 132390722602816 pyconfig.py:471] Config param debug: {'rl': False}
I0424 07:25:57.816590 132390722602816 pyconfig.py:471] Config param debug_sharding: False
I0424 07:25:57.816606 132390722602816 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0424 07:25:57.816622 132390722602816 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0424 07:25:57.816640 132390722602816 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0424 07:25:57.816655 132390722602816 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0424 07:25:57.816670 132390722602816 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0424 07:25:57.816694 132390722602816 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0424 07:25:57.816711 132390722602816 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0424 07:25:57.816728 132390722602816 pyconfig.py:471] Config param degenerate_group_masking: True
I0424 07:25:57.816743 132390722602816 pyconfig.py:471] Config param dense_init_scale: 1.0
I0424 07:25:57.816759 132390722602816 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0424 07:25:57.816775 132390722602816 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0424 07:25:57.816790 132390722602816 pyconfig.py:471] Config param diloco_sync_period: 36
I0424 07:25:57.816806 132390722602816 pyconfig.py:471] Config param distill_alpha: 0.5
I0424 07:25:57.816823 132390722602816 pyconfig.py:471] Config param distill_alpha_end: None
I0424 07:25:57.816838 132390722602816 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0424 07:25:57.816853 132390722602816 pyconfig.py:471] Config param distill_beta: 0.0
I0424 07:25:57.816869 132390722602816 pyconfig.py:471] Config param distill_beta_end: None
I0424 07:25:57.816884 132390722602816 pyconfig.py:471] Config param distill_beta_schedule: constant
I0424 07:25:57.816899 132390722602816 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0424 07:25:57.816915 132390722602816 pyconfig.py:471] Config param distill_layer_indices: None
I0424 07:25:57.816954 132390722602816 pyconfig.py:471] Config param distill_temperature: 1.0
I0424 07:25:57.816972 132390722602816 pyconfig.py:471] Config param distill_temperature_end: None
I0424 07:25:57.816986 132390722602816 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0424 07:25:57.817002 132390722602816 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0424 07:25:57.817016 132390722602816 pyconfig.py:471] Config param dpo_beta: 0.1
I0424 07:25:57.817032 132390722602816 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0424 07:25:57.817047 132390722602816 pyconfig.py:471] Config param dq_reduction_steps: 0
I0424 07:25:57.817062 132390722602816 pyconfig.py:471] Config param dropout_rate: 0.0
I0424 07:25:57.817077 132390722602816 pyconfig.py:471] Config param dtype: bfloat16
I0424 07:25:57.817110 132390722602816 pyconfig.py:471] Config param dtype_mm: float32
I0424 07:25:57.817127 132390722602816 pyconfig.py:471] Config param dump_hlo: False
I0424 07:25:57.817143 132390722602816 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0424 07:25:57.817158 132390722602816 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/xla_dump
I0424 07:25:57.817175 132390722602816 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0424 07:25:57.817189 132390722602816 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0424 07:25:57.817205 132390722602816 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0424 07:25:57.817220 132390722602816 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0424 07:25:57.817236 132390722602816 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0424 07:25:57.817250 132390722602816 pyconfig.py:471] Config param dump_jaxpr: False
I0424 07:25:57.817266 132390722602816 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0424 07:25:57.817283 132390722602816 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/jaxpr_dump
I0424 07:25:57.817298 132390722602816 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0424 07:25:57.817314 132390722602816 pyconfig.py:471] Config param dump_step: -1
I0424 07:25:57.817329 132390722602816 pyconfig.py:471] Config param elastic_enabled: False
I0424 07:25:57.817345 132390722602816 pyconfig.py:471] Config param elastic_max_retries: 10
I0424 07:25:57.817360 132390722602816 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0424 07:25:57.817376 132390722602816 pyconfig.py:471] Config param emb_dim: 16
I0424 07:25:57.817390 132390722602816 pyconfig.py:471] Config param enable_autocheckpoint: False
I0424 07:25:57.817406 132390722602816 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0424 07:25:57.817420 132390722602816 pyconfig.py:471] Config param enable_checkpointing: True
I0424 07:25:57.817436 132390722602816 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0424 07:25:57.817451 132390722602816 pyconfig.py:471] Config param enable_data_shuffling: True
I0424 07:25:57.817467 132390722602816 pyconfig.py:471] Config param enable_diloco: False
I0424 07:25:57.817481 132390722602816 pyconfig.py:471] Config param enable_dp_attention: False
I0424 07:25:57.817497 132390722602816 pyconfig.py:471] Config param enable_dropout: False
I0424 07:25:57.817511 132390722602816 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0424 07:25:57.817527 132390722602816 pyconfig.py:471] Config param enable_expert_parallel: False
I0424 07:25:57.817542 132390722602816 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0424 07:25:57.817558 132390722602816 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0424 07:25:57.817572 132390722602816 pyconfig.py:471] Config param enable_goodput_recording: False
I0424 07:25:57.817588 132390722602816 pyconfig.py:471] Config param enable_jax_profiler: False
I0424 07:25:57.817604 132390722602816 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0424 07:25:57.817618 132390722602816 pyconfig.py:471] Config param enable_model_warmup: False
I0424 07:25:57.817634 132390722602816 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0424 07:25:57.817650 132390722602816 pyconfig.py:471] Config param enable_nnx: False
I0424 07:25:57.817666 132390722602816 pyconfig.py:471] Config param enable_orbax_v1: False
I0424 07:25:57.817680 132390722602816 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0424 07:25:57.817700 132390722602816 pyconfig.py:471] Config param enable_pathways_goodput: False
I0424 07:25:57.817717 132390722602816 pyconfig.py:471] Config param enable_prefix_caching: False
I0424 07:25:57.817731 132390722602816 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0424 07:25:57.817747 132390722602816 pyconfig.py:471] Config param enable_single_controller: False
I0424 07:25:57.817762 132390722602816 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0424 07:25:57.817777 132390722602816 pyconfig.py:471] Config param enable_tensorboard: True
I0424 07:25:57.817793 132390722602816 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0424 07:25:57.817808 132390722602816 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0424 07:25:57.817822 132390722602816 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0424 07:25:57.817838 132390722602816 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0424 07:25:57.817854 132390722602816 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0424 07:25:57.817870 132390722602816 pyconfig.py:471] Config param engram_head_dim: 1280
I0424 07:25:57.817884 132390722602816 pyconfig.py:471] Config param engram_kernel_size: 4
I0424 07:25:57.817900 132390722602816 pyconfig.py:471] Config param engram_layers: []
I0424 07:25:57.817916 132390722602816 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0424 07:25:57.817943 132390722602816 pyconfig.py:471] Config param engram_num_heads: 8
I0424 07:25:57.817958 132390722602816 pyconfig.py:471] Config param engram_seed: 0
I0424 07:25:57.817974 132390722602816 pyconfig.py:471] Config param engram_vocab_bases: []
I0424 07:25:57.817988 132390722602816 pyconfig.py:471] Config param epsilon_high: None
I0424 07:25:57.818004 132390722602816 pyconfig.py:471] Config param eval_corr_lst: False
I0424 07:25:57.818020 132390722602816 pyconfig.py:471] Config param eval_data_columns: ['text']
I0424 07:25:57.818036 132390722602816 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0424 07:25:57.818054 132390722602816 pyconfig.py:471] Config param eval_image_column: image
I0424 07:25:57.818069 132390722602816 pyconfig.py:471] Config param eval_interval: -1
I0424 07:25:57.818083 132390722602816 pyconfig.py:471] Config param eval_make_lst: False
I0424 07:25:57.818099 132390722602816 pyconfig.py:471] Config param eval_mode: pass
I0424 07:25:57.818115 132390722602816 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0424 07:25:57.818130 132390722602816 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0424 07:25:57.818145 132390722602816 pyconfig.py:471] Config param eval_split: validation
I0424 07:25:57.818161 132390722602816 pyconfig.py:471] Config param eval_steps: -1
I0424 07:25:57.818176 132390722602816 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0424 07:25:57.818192 132390722602816 pyconfig.py:471] Config param final_logits_soft_cap: None
I0424 07:25:57.818207 132390722602816 pyconfig.py:471] Config param first_num_dense_layers: 0
I0424 07:25:57.818222 132390722602816 pyconfig.py:471] Config param float32_gate_logits: False
I0424 07:25:57.818239 132390722602816 pyconfig.py:471] Config param float32_logits: False
I0424 07:25:57.818255 132390722602816 pyconfig.py:471] Config param float32_qk_product: False
I0424 07:25:57.818269 132390722602816 pyconfig.py:471] Config param float32_weight_sum: True
I0424 07:25:57.818284 132390722602816 pyconfig.py:471] Config param force_q_layout: False
I0424 07:25:57.818299 132390722602816 pyconfig.py:471] Config param force_unroll: False
I0424 07:25:57.818315 132390722602816 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0424 07:25:57.818330 132390722602816 pyconfig.py:471] Config param formatting_func_path: 
I0424 07:25:57.818345 132390722602816 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0424 07:25:57.818360 132390722602816 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0424 07:25:57.818376 132390722602816 pyconfig.py:471] Config param fused_mlp: False
I0424 07:25:57.818390 132390722602816 pyconfig.py:471] Config param fused_qkv: True
I0424 07:25:57.818406 132390722602816 pyconfig.py:471] Config param gcs_metrics: False
I0424 07:25:57.818420 132390722602816 pyconfig.py:471] Config param gdn_chunk_size: 64
I0424 07:25:57.818436 132390722602816 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0424 07:25:57.818451 132390722602816 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0424 07:25:57.818466 132390722602816 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0424 07:25:57.818482 132390722602816 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0424 07:25:57.818498 132390722602816 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0424 07:25:57.818513 132390722602816 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0424 07:25:57.818527 132390722602816 pyconfig.py:471] Config param generate_padding_batch_train: False
I0424 07:25:57.818542 132390722602816 pyconfig.py:471] Config param generate_slice: v5e-16
I0424 07:25:57.818558 132390722602816 pyconfig.py:471] Config param generation_configs: {}
I0424 07:25:57.818574 132390722602816 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0424 07:25:57.818589 132390722602816 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0424 07:25:57.818605 132390722602816 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0424 07:25:57.818619 132390722602816 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0424 07:25:57.818635 132390722602816 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0424 07:25:57.818650 132390722602816 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0424 07:25:57.818666 132390722602816 pyconfig.py:471] Config param global_head_dim: 0
I0424 07:25:57.818680 132390722602816 pyconfig.py:471] Config param global_num_kv_heads: 0
I0424 07:25:57.818699 132390722602816 pyconfig.py:471] Config param global_parameter_scale: 1
I0424 07:25:57.818715 132390722602816 pyconfig.py:471] Config param global_rampup_samples: 500
I0424 07:25:57.818729 132390722602816 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0424 07:25:57.818745 132390722602816 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0424 07:25:57.818762 132390722602816 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0424 07:25:57.818776 132390722602816 pyconfig.py:471] Config param grad_dtype: float32
I0424 07:25:57.818812 132390722602816 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0424 07:25:57.818828 132390722602816 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0424 07:25:57.818845 132390722602816 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0424 07:25:57.818859 132390722602816 pyconfig.py:471] Config param grain_eval_files: 
I0424 07:25:57.818873 132390722602816 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0424 07:25:57.818889 132390722602816 pyconfig.py:471] Config param grain_num_threads: 16
I0424 07:25:57.818906 132390722602816 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0424 07:25:57.818921 132390722602816 pyconfig.py:471] Config param grain_packing_type: first_fit
I0424 07:25:57.818947 132390722602816 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0424 07:25:57.818962 132390722602816 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0424 07:25:57.818978 132390722602816 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0424 07:25:57.818993 132390722602816 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0424 07:25:57.819009 132390722602816 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0424 07:25:57.819023 132390722602816 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0424 07:25:57.819040 132390722602816 pyconfig.py:471] Config param grain_train_files: 
I0424 07:25:57.819055 132390722602816 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0424 07:25:57.819070 132390722602816 pyconfig.py:471] Config param grain_worker_count: 1
I0424 07:25:57.819086 132390722602816 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0424 07:25:57.819101 132390722602816 pyconfig.py:471] Config param grpo_beta: 0.08
I0424 07:25:57.819116 132390722602816 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0424 07:25:57.819133 132390722602816 pyconfig.py:471] Config param hardware: tpu
I0424 07:25:57.819148 132390722602816 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0424 07:25:57.819165 132390722602816 pyconfig.py:471] Config param head_dim: 8
I0424 07:25:57.819179 132390722602816 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0424 07:25:57.819195 132390722602816 pyconfig.py:471] Config param hf_data_dir: None
I0424 07:25:57.819209 132390722602816 pyconfig.py:471] Config param hf_eval_files: None
I0424 07:25:57.819225 132390722602816 pyconfig.py:471] Config param hf_eval_split: None
I0424 07:25:57.819240 132390722602816 pyconfig.py:471] Config param hf_name: None
I0424 07:25:57.819255 132390722602816 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0424 07:25:57.819269 132390722602816 pyconfig.py:471] Config param hf_train_files: None
I0424 07:25:57.819285 132390722602816 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0424 07:25:57.819301 132390722602816 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0424 07:25:57.819315 132390722602816 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0424 07:25:57.819331 132390722602816 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0424 07:25:57.819345 132390722602816 pyconfig.py:471] Config param ici_context_parallelism: 1
I0424 07:25:57.819361 132390722602816 pyconfig.py:471] Config param ici_data_parallelism: 1
I0424 07:25:57.819377 132390722602816 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0424 07:25:57.819392 132390722602816 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0424 07:25:57.819406 132390722602816 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0424 07:25:57.819422 132390722602816 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0424 07:25:57.819438 132390722602816 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 07:25:57.819453 132390722602816 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0424 07:25:57.819469 132390722602816 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0424 07:25:57.819485 132390722602816 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0424 07:25:57.819499 132390722602816 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0424 07:25:57.819515 132390722602816 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0424 07:25:57.819530 132390722602816 pyconfig.py:471] Config param image_path: 
I0424 07:25:57.819545 132390722602816 pyconfig.py:471] Config param image_placeholder: <|image|>
I0424 07:25:57.819559 132390722602816 pyconfig.py:471] Config param image_size_for_vit: 896
I0424 07:25:57.819575 132390722602816 pyconfig.py:471] Config param indexer_head_dim: 128
I0424 07:25:57.819589 132390722602816 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0424 07:25:57.819605 132390722602816 pyconfig.py:471] Config param indexer_n_heads: 64
I0424 07:25:57.819621 132390722602816 pyconfig.py:471] Config param indexer_sparse_training: False
I0424 07:25:57.819636 132390722602816 pyconfig.py:471] Config param indexer_topk: 2048
I0424 07:25:57.819652 132390722602816 pyconfig.py:471] Config param inference_benchmark_test: False
I0424 07:25:57.819666 132390722602816 pyconfig.py:471] Config param inference_metadata_file: 
I0424 07:25:57.819680 132390722602816 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0424 07:25:57.819700 132390722602816 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0424 07:25:57.819714 132390722602816 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0424 07:25:57.819730 132390722602816 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0424 07:25:57.819745 132390722602816 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0424 07:25:57.819761 132390722602816 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0424 07:25:57.819775 132390722602816 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0424 07:25:57.819791 132390722602816 pyconfig.py:471] Config param init_weights_seed: 0
I0424 07:25:57.819805 132390722602816 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0424 07:25:57.819822 132390722602816 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0424 07:25:57.819837 132390722602816 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0424 07:25:57.819853 132390722602816 pyconfig.py:471] Config param internal_compile: False
I0424 07:25:57.819867 132390722602816 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0424 07:25:57.819883 132390722602816 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0424 07:25:57.819897 132390722602816 pyconfig.py:471] Config param jax_debug_log_modules: 
I0424 07:25:57.819913 132390722602816 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0424 07:25:57.819935 132390722602816 pyconfig.py:471] Config param jax_profiler_port: 9999
I0424 07:25:57.819951 132390722602816 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0424 07:25:57.819968 132390722602816 pyconfig.py:471] Config param kv_cache_buffer: 256
I0424 07:25:57.819982 132390722602816 pyconfig.py:471] Config param kv_lora_rank: 512
I0424 07:25:57.819998 132390722602816 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0424 07:25:57.820016 132390722602816 pyconfig.py:471] Config param kv_quant_dtype: int8
I0424 07:25:57.820030 132390722602816 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0424 07:25:57.820046 132390722602816 pyconfig.py:471] Config param learning_rate: 0.0002
I0424 07:25:57.820061 132390722602816 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0424 07:25:57.820077 132390722602816 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0424 07:25:57.820092 132390722602816 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0424 07:25:57.820109 132390722602816 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0424 07:25:57.820123 132390722602816 pyconfig.py:471] Config param load_from_prefill_dir: False
I0424 07:25:57.820138 132390722602816 pyconfig.py:471] Config param load_full_state_path: 
I0424 07:25:57.820153 132390722602816 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 07:25:57.820169 132390722602816 pyconfig.py:471] Config param local_checkpoint_directory: 
I0424 07:25:57.820185 132390722602816 pyconfig.py:471] Config param local_checkpoint_period: 0
I0424 07:25:57.820199 132390722602816 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0424 07:25:57.820215 132390722602816 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0424 07:25:57.820230 132390722602816 pyconfig.py:471] Config param log_config: True
I0424 07:25:57.820245 132390722602816 pyconfig.py:471] Config param log_period: 10
I0424 07:25:57.820260 132390722602816 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('q_lora', ('fsdp', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('kv_lora', ('fsdp', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'context')), ('embed_moe', ('fsdp', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('embed', ('fsdp', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('context',)), ('prefill_activation_norm_length', ('tensor_sequence', 'context')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ()), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0424 07:25:57.820328 132390722602816 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0424 07:25:57.820345 132390722602816 pyconfig.py:471] Config param logits_via_embedding: True
I0424 07:25:57.820361 132390722602816 pyconfig.py:471] Config param lora_input_adapters_path: 
I0424 07:25:57.820376 132390722602816 pyconfig.py:471] Config param loss_algo: grpo
I0424 07:25:57.820392 132390722602816 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0424 07:25:57.820409 132390722602816 pyconfig.py:471] Config param managed_mldiagnostics: False
I0424 07:25:57.820425 132390722602816 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/managed-mldiagnostics
I0424 07:25:57.820439 132390722602816 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0424 07:25:57.820455 132390722602816 pyconfig.py:471] Config param math_verify_num_procs: None
I0424 07:25:57.820473 132390722602816 pyconfig.py:471] Config param math_verify_timeout: 300
I0424 07:25:57.820487 132390722602816 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0424 07:25:57.820504 132390722602816 pyconfig.py:471] Config param max_checkify: False
I0424 07:25:57.820520 132390722602816 pyconfig.py:471] Config param max_concurrency: 256
I0424 07:25:57.820534 132390722602816 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0424 07:25:57.820549 132390722602816 pyconfig.py:471] Config param max_num_batched_tokens: None
I0424 07:25:57.820565 132390722602816 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0424 07:25:57.820579 132390722602816 pyconfig.py:471] Config param max_num_images_per_example: -1
I0424 07:25:57.820595 132390722602816 pyconfig.py:471] Config param max_num_seqs: None
I0424 07:25:57.820609 132390722602816 pyconfig.py:471] Config param max_position_embeddings: 163840
I0424 07:25:57.820624 132390722602816 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0424 07:25:57.820638 132390722602816 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0424 07:25:57.820655 132390722602816 pyconfig.py:471] Config param max_segments_per_seq: -1
I0424 07:25:57.820670 132390722602816 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0424 07:25:57.820685 132390722602816 pyconfig.py:471] Config param max_target_length: 2048
I0424 07:25:57.820703 132390722602816 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0424 07:25:57.820719 132390722602816 pyconfig.py:471] Config param megablox: True
I0424 07:25:57.820734 132390722602816 pyconfig.py:471] Config param merge_gating_gmm: False
I0424 07:25:57.820750 132390722602816 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0424 07:25:57.820767 132390722602816 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/metrics/
I0424 07:25:57.820781 132390722602816 pyconfig.py:471] Config param metrics_file: 
I0424 07:25:57.820797 132390722602816 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0424 07:25:57.820811 132390722602816 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0424 07:25:57.820827 132390722602816 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0424 07:25:57.820841 132390722602816 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0424 07:25:57.820856 132390722602816 pyconfig.py:471] Config param mla_naive_kvcache: True
I0424 07:25:57.820871 132390722602816 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0424 07:25:57.820887 132390722602816 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0424 07:25:57.820902 132390722602816 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0424 07:25:57.820918 132390722602816 pyconfig.py:471] Config param mlp_bias: False
I0424 07:25:57.820940 132390722602816 pyconfig.py:471] Config param mlp_dim: 64
I0424 07:25:57.820956 132390722602816 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0424 07:25:57.820971 132390722602816 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0424 07:25:57.820987 132390722602816 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0424 07:25:57.821001 132390722602816 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0424 07:25:57.821017 132390722602816 pyconfig.py:471] Config param moba: False
I0424 07:25:57.821031 132390722602816 pyconfig.py:471] Config param moba_chunk_size: 1024
I0424 07:25:57.821046 132390722602816 pyconfig.py:471] Config param moba_topk: 8
I0424 07:25:57.821062 132390722602816 pyconfig.py:471] Config param model_call_mode: 
I0424 07:25:57.821077 132390722602816 pyconfig.py:471] Config param model_name: gpt3-52k
I0424 07:25:57.821093 132390722602816 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0424 07:25:57.821106 132390722602816 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0424 07:25:57.821122 132390722602816 pyconfig.py:471] Config param moe_mlp_dim: -1
I0424 07:25:57.821136 132390722602816 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0424 07:25:57.821152 132390722602816 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0424 07:25:57.821167 132390722602816 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0424 07:25:57.821182 132390722602816 pyconfig.py:471] Config param monitor_goodput: False
I0424 07:25:57.821197 132390722602816 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0424 07:25:57.821213 132390722602816 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0424 07:25:57.821227 132390722602816 pyconfig.py:471] Config param mscale: 1.0
I0424 07:25:57.821243 132390722602816 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0424 07:25:57.821257 132390722602816 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0424 07:25:57.821273 132390722602816 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0424 07:25:57.821287 132390722602816 pyconfig.py:471] Config param mtp_num_layers: 0
I0424 07:25:57.821303 132390722602816 pyconfig.py:471] Config param mu_dtype: float32
I0424 07:25:57.821326 132390722602816 pyconfig.py:471] Config param multi_sampling: False
I0424 07:25:57.821341 132390722602816 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0424 07:25:57.821357 132390722602816 pyconfig.py:471] Config param muon_beta: 0.95
I0424 07:25:57.821374 132390722602816 pyconfig.py:471] Config param muon_consistent_rms: None
I0424 07:25:57.821390 132390722602816 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0424 07:25:57.821404 132390722602816 pyconfig.py:471] Config param n_routing_groups: -1
I0424 07:25:57.821420 132390722602816 pyconfig.py:471] Config param n_window_for_audio: 50
I0424 07:25:57.821439 132390722602816 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0424 07:25:57.821455 132390722602816 pyconfig.py:471] Config param nope_layer_interval: -1
I0424 07:25:57.821469 132390722602816 pyconfig.py:471] Config param norm_topk_prob: False
I0424 07:25:57.821485 132390722602816 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0424 07:25:57.821502 132390722602816 pyconfig.py:471] Config param normalize_embedding_logits: False
I0424 07:25:57.821518 132390722602816 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0424 07:25:57.821532 132390722602816 pyconfig.py:471] Config param num_batches: 4
I0424 07:25:57.821547 132390722602816 pyconfig.py:471] Config param num_channels_for_vit: 3
I0424 07:25:57.821562 132390722602816 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0424 07:25:57.821577 132390722602816 pyconfig.py:471] Config param num_decoder_layers: 1
I0424 07:25:57.821593 132390722602816 pyconfig.py:471] Config param num_diloco_replicas: 1
I0424 07:25:57.821609 132390722602816 pyconfig.py:471] Config param num_epoch: 1
I0424 07:25:57.821623 132390722602816 pyconfig.py:471] Config param num_eval_passes: 1
I0424 07:25:57.821638 132390722602816 pyconfig.py:471] Config param num_experts: 1
I0424 07:25:57.821654 132390722602816 pyconfig.py:471] Config param num_experts_per_tok: 1
I0424 07:25:57.821669 132390722602816 pyconfig.py:471] Config param num_generations: 2
I0424 07:25:57.821684 132390722602816 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0424 07:25:57.821703 132390722602816 pyconfig.py:471] Config param num_iterations: 1
I0424 07:25:57.821718 132390722602816 pyconfig.py:471] Config param num_kv_heads: 2
I0424 07:25:57.821733 132390722602816 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0424 07:25:57.821748 132390722602816 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0424 07:25:57.821762 132390722602816 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0424 07:25:57.821778 132390722602816 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0424 07:25:57.821793 132390722602816 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0424 07:25:57.821807 132390722602816 pyconfig.py:471] Config param num_query_heads: 2
I0424 07:25:57.821823 132390722602816 pyconfig.py:471] Config param num_samplers_slices: -1
I0424 07:25:57.821837 132390722602816 pyconfig.py:471] Config param num_slices: 1
I0424 07:25:57.821853 132390722602816 pyconfig.py:471] Config param num_target_devices: 32
I0424 07:25:57.821867 132390722602816 pyconfig.py:471] Config param num_test_batches: 5
I0424 07:25:57.821882 132390722602816 pyconfig.py:471] Config param num_trainer_slices: -1
I0424 07:25:57.821897 132390722602816 pyconfig.py:471] Config param num_vocab_tiling: 1
I0424 07:25:57.821912 132390722602816 pyconfig.py:471] Config param off_policy_steps: 0
I0424 07:25:57.821936 132390722602816 pyconfig.py:471] Config param offline_data_dir: None
I0424 07:25:57.821953 132390722602816 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0424 07:25:57.821969 132390722602816 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0424 07:25:57.821985 132390722602816 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0424 07:25:57.822000 132390722602816 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0424 07:25:57.822016 132390722602816 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0424 07:25:57.822030 132390722602816 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0424 07:25:57.822046 132390722602816 pyconfig.py:471] Config param output_dim_for_audio: 512
I0424 07:25:57.822060 132390722602816 pyconfig.py:471] Config param override_logical_axis_rules: False
I0424 07:25:57.822076 132390722602816 pyconfig.py:471] Config param override_model_config: True
I0424 07:25:57.822092 132390722602816 pyconfig.py:471] Config param packing: True
I0424 07:25:57.822106 132390722602816 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0424 07:25:57.822121 132390722602816 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0424 07:25:57.822137 132390722602816 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0424 07:25:57.822151 132390722602816 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0424 07:25:57.822166 132390722602816 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0424 07:25:57.822182 132390722602816 pyconfig.py:471] Config param param_scan_axis: 1
I0424 07:25:57.822196 132390722602816 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0424 07:25:57.822211 132390722602816 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0424 07:25:57.822226 132390722602816 pyconfig.py:471] Config param patch_size_for_vit: 14
I0424 07:25:57.822241 132390722602816 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0424 07:25:57.822256 132390722602816 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0424 07:25:57.822272 132390722602816 pyconfig.py:471] Config param per_device_batch_size: 2
I0424 07:25:57.822287 132390722602816 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0424 07:25:57.822302 132390722602816 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0424 07:25:57.822318 132390722602816 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0424 07:25:57.822334 132390722602816 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0424 07:25:57.822348 132390722602816 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0424 07:25:57.822363 132390722602816 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0424 07:25:57.822378 132390722602816 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0424 07:25:57.822394 132390722602816 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0424 07:25:57.822408 132390722602816 pyconfig.py:471] Config param position_id_per_seconds: 25
I0424 07:25:57.822424 132390722602816 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0424 07:25:57.822438 132390722602816 pyconfig.py:471] Config param prefill_cache_dir: 
I0424 07:25:57.822454 132390722602816 pyconfig.py:471] Config param prefill_chunk_size: 256
I0424 07:25:57.822468 132390722602816 pyconfig.py:471] Config param prefill_slice: v5e-16
I0424 07:25:57.822484 132390722602816 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0424 07:25:57.822498 132390722602816 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0424 07:25:57.822514 132390722602816 pyconfig.py:471] Config param prefuse_moe_weights: False
I0424 07:25:57.822528 132390722602816 pyconfig.py:471] Config param profile_cleanly: True
I0424 07:25:57.822543 132390722602816 pyconfig.py:471] Config param profile_periodically_period: -1
I0424 07:25:57.822558 132390722602816 pyconfig.py:471] Config param profile_power_events: False
I0424 07:25:57.822573 132390722602816 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0424 07:25:57.822591 132390722602816 pyconfig.py:471] Config param profiler_steps: 5
I0424 07:25:57.822605 132390722602816 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0424 07:25:57.822619 132390722602816 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0424 07:25:57.822635 132390722602816 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0424 07:25:57.822649 132390722602816 pyconfig.py:471] Config param prometheus_port: 0
I0424 07:25:57.822665 132390722602816 pyconfig.py:471] Config param prompt: I love to
I0424 07:25:57.822681 132390722602816 pyconfig.py:471] Config param pure_nnx: False
I0424 07:25:57.822700 132390722602816 pyconfig.py:471] Config param pure_nnx_decoder: False
I0424 07:25:57.822717 132390722602816 pyconfig.py:471] Config param q_lora_rank: 0
I0424 07:25:57.822732 132390722602816 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0424 07:25:57.822747 132390722602816 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0424 07:25:57.822762 132390722602816 pyconfig.py:471] Config param qk_norm_with_scale: True
I0424 07:25:57.822777 132390722602816 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0424 07:25:57.822793 132390722602816 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0424 07:25:57.822809 132390722602816 pyconfig.py:471] Config param quant_cfg_path: 
I0424 07:25:57.822823 132390722602816 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0424 07:25:57.822841 132390722602816 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0424 07:25:57.822856 132390722602816 pyconfig.py:471] Config param quantize_kvcache: False
I0424 07:25:57.822872 132390722602816 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0424 07:25:57.822887 132390722602816 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0424 07:25:57.822901 132390722602816 pyconfig.py:471] Config param ragged_block_size: 256
I0424 07:25:57.822917 132390722602816 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0424 07:25:57.822941 132390722602816 pyconfig.py:471] Config param rampup_end_step: 0
I0424 07:25:57.822957 132390722602816 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0424 07:25:57.822973 132390722602816 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0424 07:25:57.822988 132390722602816 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0424 07:25:57.823004 132390722602816 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0424 07:25:57.823018 132390722602816 pyconfig.py:471] Config param remat_policy: full
I0424 07:25:57.823034 132390722602816 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0424 07:25:57.823048 132390722602816 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0424 07:25:57.823064 132390722602816 pyconfig.py:471] Config param replicate_quant_scale: False
I0424 07:25:57.823081 132390722602816 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0424 07:25:57.823097 132390722602816 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0424 07:25:57.823112 132390722602816 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0424 07:25:57.823127 132390722602816 pyconfig.py:471] Config param reshape_q: False
I0424 07:25:57.823142 132390722602816 pyconfig.py:471] Config param return_log_prob: False
I0424 07:25:57.823157 132390722602816 pyconfig.py:471] Config param reuse_example_batch: 0
I0424 07:25:57.823171 132390722602816 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0424 07:25:57.823188 132390722602816 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0424 07:25:57.823203 132390722602816 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0424 07:25:57.823220 132390722602816 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0424 07:25:57.823235 132390722602816 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0424 07:25:57.823251 132390722602816 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0424 07:25:57.823266 132390722602816 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0424 07:25:57.823287 132390722602816 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0424 07:25:57.823302 132390722602816 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0424 07:25:57.823317 132390722602816 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0424 07:25:57.823332 132390722602816 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0424 07:25:57.823347 132390722602816 pyconfig.py:471] Config param rope_attention_scaling: False
I0424 07:25:57.823362 132390722602816 pyconfig.py:471] Config param rope_factor: 40
I0424 07:25:57.823377 132390722602816 pyconfig.py:471] Config param rope_interleave: True
I0424 07:25:57.823392 132390722602816 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0424 07:25:57.823408 132390722602816 pyconfig.py:471] Config param rope_max_timescale: 10000
I0424 07:25:57.823422 132390722602816 pyconfig.py:471] Config param rope_min_timescale: 1
I0424 07:25:57.823436 132390722602816 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0424 07:25:57.823452 132390722602816 pyconfig.py:471] Config param rope_truncate: True
I0424 07:25:57.823467 132390722602816 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0424 07:25:57.823485 132390722602816 pyconfig.py:471] Config param rope_use_scale: True
I0424 07:25:57.823500 132390722602816 pyconfig.py:471] Config param routed_bias: False
I0424 07:25:57.823515 132390722602816 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0424 07:25:57.823530 132390722602816 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0424 07:25:57.823546 132390722602816 pyconfig.py:471] Config param routed_score_func: 
I0424 07:25:57.823561 132390722602816 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-07-25
I0424 07:25:57.823575 132390722602816 pyconfig.py:471] Config param sa_block_kv: 512
I0424 07:25:57.823591 132390722602816 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0424 07:25:57.823605 132390722602816 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0424 07:25:57.823621 132390722602816 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0424 07:25:57.823635 132390722602816 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0424 07:25:57.823651 132390722602816 pyconfig.py:471] Config param sa_block_q: 512
I0424 07:25:57.823665 132390722602816 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0424 07:25:57.823681 132390722602816 pyconfig.py:471] Config param sa_block_q_dq: 512
I0424 07:25:57.823700 132390722602816 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0424 07:25:57.823714 132390722602816 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0424 07:25:57.823730 132390722602816 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0424 07:25:57.823746 132390722602816 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0424 07:25:57.823760 132390722602816 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0424 07:25:57.823776 132390722602816 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0424 07:25:57.823790 132390722602816 pyconfig.py:471] Config param save_config_to_gcs: False
I0424 07:25:57.823806 132390722602816 pyconfig.py:471] Config param save_quantized_params_path: 
I0424 07:25:57.823820 132390722602816 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0424 07:25:57.823835 132390722602816 pyconfig.py:471] Config param scan_layers: True
I0424 07:25:57.823850 132390722602816 pyconfig.py:471] Config param scan_layers_per_stage: False
I0424 07:25:57.823865 132390722602816 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0424 07:25:57.823880 132390722602816 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0424 07:25:57.823896 132390722602816 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0424 07:25:57.823912 132390722602816 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0424 07:25:57.823936 132390722602816 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0424 07:25:57.823952 132390722602816 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0424 07:25:57.823968 132390722602816 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0424 07:25:57.823983 132390722602816 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0424 07:25:57.823999 132390722602816 pyconfig.py:471] Config param sharding_strategy: None
I0424 07:25:57.824014 132390722602816 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0424 07:25:57.824030 132390722602816 pyconfig.py:471] Config param shardy: True
I0424 07:25:57.824045 132390722602816 pyconfig.py:471] Config param share_kv_projections: False
I0424 07:25:57.824061 132390722602816 pyconfig.py:471] Config param shared_experts: 0
I0424 07:25:57.824075 132390722602816 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0424 07:25:57.824091 132390722602816 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0424 07:25:57.824106 132390722602816 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0424 07:25:57.824120 132390722602816 pyconfig.py:471] Config param skip_step_interval: 128
I0424 07:25:57.824136 132390722602816 pyconfig.py:471] Config param skip_step_on_spikes: False
I0424 07:25:57.824152 132390722602816 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0424 07:25:57.824168 132390722602816 pyconfig.py:471] Config param sliding_window_size: 0
I0424 07:25:57.824184 132390722602816 pyconfig.py:471] Config param solution_end_token: </answer>
I0424 07:25:57.824198 132390722602816 pyconfig.py:471] Config param solution_start_token: <answer>
I0424 07:25:57.824214 132390722602816 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0424 07:25:57.824229 132390722602816 pyconfig.py:471] Config param sparse_matmul: True
I0424 07:25:57.824244 132390722602816 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0424 07:25:57.824259 132390722602816 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0424 07:25:57.824274 132390722602816 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0424 07:25:57.824290 132390722602816 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0424 07:25:57.824304 132390722602816 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0424 07:25:57.824320 132390722602816 pyconfig.py:471] Config param steps: 200000
I0424 07:25:57.824334 132390722602816 pyconfig.py:471] Config param stop_strings: None
I0424 07:25:57.824350 132390722602816 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 07:25:57.824367 132390722602816 pyconfig.py:471] Config param student_params_to_update: None
I0424 07:25:57.824382 132390722602816 pyconfig.py:471] Config param subslice_shape: 
I0424 07:25:57.824397 132390722602816 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 07:25:57.824412 132390722602816 pyconfig.py:471] Config param system_prompt: 
I0424 07:25:57.824427 132390722602816 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 07:25:57.824442 132390722602816 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 07:25:57.824459 132390722602816 pyconfig.py:471] Config param temperature_tuning: False
I0424 07:25:57.824473 132390722602816 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 07:25:57.824489 132390722602816 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-07-25/tensorboard/
I0424 07:25:57.824503 132390722602816 pyconfig.py:471] Config param tensors_on_device: None
I0424 07:25:57.824519 132390722602816 pyconfig.py:471] Config param tensors_to_offload: None
I0424 07:25:57.824534 132390722602816 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 07:25:57.824550 132390722602816 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 07:25:57.824565 132390722602816 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 07:25:57.824580 132390722602816 pyconfig.py:471] Config param tokenize_train_data: True
I0424 07:25:57.824596 132390722602816 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 07:25:57.824611 132390722602816 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 07:25:57.824628 132390722602816 pyconfig.py:471] Config param topk_routing_group: -1
I0424 07:25:57.824644 132390722602816 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 07:25:57.824659 132390722602816 pyconfig.py:471] Config param train_fraction: 1.0
I0424 07:25:57.824675 132390722602816 pyconfig.py:471] Config param train_image_column: image
I0424 07:25:57.824692 132390722602816 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 07:25:57.824708 132390722602816 pyconfig.py:471] Config param train_split: train
I0424 07:25:57.824722 132390722602816 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 07:25:57.824738 132390722602816 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 07:25:57.824753 132390722602816 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 07:25:57.824768 132390722602816 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 07:25:57.824784 132390722602816 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 07:25:57.824799 132390722602816 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 07:25:57.824814 132390722602816 pyconfig.py:471] Config param use_audio: False
I0424 07:25:57.824830 132390722602816 pyconfig.py:471] Config param use_audio_in_video: False
I0424 07:25:57.824845 132390722602816 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 07:25:57.824860 132390722602816 pyconfig.py:471] Config param use_chat_template: False
I0424 07:25:57.824875 132390722602816 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 07:25:57.824890 132390722602816 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 07:25:57.824904 132390722602816 pyconfig.py:471] Config param use_dpo: False
I0424 07:25:57.824920 132390722602816 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 07:25:57.824943 132390722602816 pyconfig.py:471] Config param use_grpo: True
I0424 07:25:57.824959 132390722602816 pyconfig.py:471] Config param use_indexer: False
I0424 07:25:57.824975 132390722602816 pyconfig.py:471] Config param use_iota_embed: True
I0424 07:25:57.824989 132390722602816 pyconfig.py:471] Config param use_jax_splash: False
I0424 07:25:57.825005 132390722602816 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 07:25:57.825020 132390722602816 pyconfig.py:471] Config param use_mrope: False
I0424 07:25:57.825035 132390722602816 pyconfig.py:471] Config param use_multimodal: False
I0424 07:25:57.825050 132390722602816 pyconfig.py:471] Config param use_pathways: True
I0424 07:25:57.825065 132390722602816 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 07:25:57.825079 132390722602816 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 07:25:57.825095 132390722602816 pyconfig.py:471] Config param use_qk_clip: False
I0424 07:25:57.825109 132390722602816 pyconfig.py:471] Config param use_qk_norm: False
I0424 07:25:57.825124 132390722602816 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 07:25:57.825139 132390722602816 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 07:25:57.825154 132390722602816 pyconfig.py:471] Config param use_ragged_attention: False
I0424 07:25:57.825170 132390722602816 pyconfig.py:471] Config param use_random_routing: False
I0424 07:25:57.825184 132390722602816 pyconfig.py:471] Config param use_replicator_service: False
I0424 07:25:57.825200 132390722602816 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 07:25:57.825216 132390722602816 pyconfig.py:471] Config param use_sft: False
I0424 07:25:57.825230 132390722602816 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 07:25:57.825245 132390722602816 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 07:25:57.825259 132390722602816 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 07:25:57.825275 132390722602816 pyconfig.py:471] Config param use_truncation: True
I0424 07:25:57.825289 132390722602816 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 07:25:57.825305 132390722602816 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 07:25:57.825320 132390722602816 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 07:25:57.825335 132390722602816 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 07:25:57.825349 132390722602816 pyconfig.py:471] Config param v_head_dim: 128
I0424 07:25:57.825365 132390722602816 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 07:25:57.825379 132390722602816 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 07:25:57.825396 132390722602816 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0424 07:25:57.825411 132390722602816 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0424 07:25:57.825425 132390722602816 pyconfig.py:471] Config param video_path: 
I0424 07:25:57.825441 132390722602816 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 07:25:57.825455 132390722602816 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 07:25:57.825471 132390722602816 pyconfig.py:471] Config param vision_output_length: -1
I0424 07:25:57.825485 132390722602816 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 07:25:57.825501 132390722602816 pyconfig.py:471] Config param vllm_hf_config_path: 
I0424 07:25:57.825517 132390722602816 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 07:25:57.825531 132390722602816 pyconfig.py:471] Config param vocab_size: 32000
I0424 07:25:57.825547 132390722602816 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 07:25:57.825563 132390722602816 pyconfig.py:471] Config param weight_dtype: float32
I0424 07:25:57.825587 132390722602816 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 07:25:57.825603 132390722602816 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 07:25:57.825618 132390722602816 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 07:25:57.825633 132390722602816 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 07:25:57.825648 132390722602816 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 07:25:57.825664 132390722602816 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 07:25:57.825678 132390722602816 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 07:25:57.825698 132390722602816 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 07:25:57.825714 132390722602816 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 07:25:57.825728 132390722602816 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 07:25:57.825744 132390722602816 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 07:25:57.825759 132390722602816 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 07:25:57.825774 132390722602816 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 07:25:57.825789 132390722602816 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 07:25:57.825805 132390722602816 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 07:25:57.825819 132390722602816 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 07:25:57.825834 132390722602816 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 07:25:57.825850 132390722602816 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 07:25:57.825866 132390722602816 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 07:25:57.825880 132390722602816 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 07:25:57.825896 132390722602816 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 07:25:57.825914 132390722602816 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 07:25:57.825938 132390722602816 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0424 07:25:57.825953 132390722602816 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 07:25:57.825968 132390722602816 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 07:25:57.825986 132390722602816 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 07:25:57.826296 132390722602816 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 07:25:57.826331 132390722602816 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 07:25:58.022304 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:25:58.152364 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:25:58.270585 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:25:58.383074 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:25:58.495115 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 07:25:58.599084 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0424 07:25:58.715487 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0424 07:25:58.825349 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0424 07:25:59.440051 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 07:25:59.561654 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 07:25:59.851315 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0424 07:25:59.960515 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 07:26:00.080237 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 07:26:00.198018 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0424 07:26:00.288802 132390722602816 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 07:26:00.295509 132390722602816 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 07:26:00.295649 132390722602816 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0424 07:26:00.295728 132390722602816 train_distill.py:586] Loading Student from ...
I0424 07:26:00.295759 132390722602816 train_distill.py:170] --- Student Configuration ---
I0424 07:26:00.295781 132390722602816 train_distill.py:171]   Model Name:      gpt3-52k
I0424 07:26:00.295803 132390722602816 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 07:26:00.295824 132390722602816 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 07:26:00.295842 132390722602816 train_distill.py:176]   Vocab Size:      32000
I0424 07:26:00.295858 132390722602816 train_distill.py:177]   Checkpoint:      
I0424 07:26:00.295878 132390722602816 train_distill.py:451] Initializing model: gpt3-52k...
I0424 07:26:01.945705 132390722602816 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0424 07:26:01.945815 132390722602816 train_distill.py:170] --- Teacher Configuration ---
I0424 07:26:01.945844 132390722602816 train_distill.py:171]   Model Name:      gpt3-52k
I0424 07:26:01.945870 132390722602816 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 07:26:01.945890 132390722602816 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 07:26:01.945909 132390722602816 train_distill.py:176]   Vocab Size:      32000
I0424 07:26:01.945938 132390722602816 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 07:26:01.945966 132390722602816 train_distill.py:451] Initializing model: gpt3-52k...
I0424 07:26:03.005946 132390722602816 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:26:03.006111 132390722602816 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7867e5c84740>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:26:03.006179 132390722602816 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 07:26:03.515845 132390722602816 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 07:26:04.056134    1969 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 07:26:05.241974 132390722602816 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 07:26:07.412478 132390722602816 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 07:26:07.412855 132390722602816 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 07:26:09.732227 132390722602816 checkpointer.py:318] Finished restoring checkpoint in 4.87 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 07:26:10.476410 132390722602816 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0424 07:26:10.541784 132390722602816 config.py:112] TensorFlow version 2.20.0 available.
I0424 07:26:10.542284 132390722602816 config.py:125] JAX version 0.9.2 available.
I0424 07:26:10.996382 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0424 07:26:11.005558 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 07:26:11.014951 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 07:26:11.125278 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 07:26:11.453836 132390722602816 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 07:26:11.566463 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0424 07:26:11.670682 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0424 07:26:11.833389 132390722602816 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0424 07:26:11.974574 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0424 07:26:12.088776 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0424 07:26:12.229704 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0424 07:26:12.394753 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 07:26:12.504395 132390722602816 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 07:26:12.630991 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 07:26:12.738281 132390722602816 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0424 07:26:12.830696 132390722602816 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0424 07:26:12.830906 132390722602816 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0424 07:26:12.833943 132390722602816 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0424 07:26:12.834006 132390722602816 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0424 07:26:12.834069 132390722602816 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:26:12.834148 132390722602816 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7867e5c84740>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:26:12.834190 132390722602816 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:26:12.834223 132390722602816 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7867e5c84740>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:26:12.834266 132390722602816 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x784f242f32f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847920>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9847860>}, handler_registry=None
I0424 07:26:12.834460 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x784f242f32f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:26:12.834501 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847920>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:26:12.834528 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9847860>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:26:12.834552 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784f2438b290>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:26:12.834578 132390722602816 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x784f242f32f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x784f242f32f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847920>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847920>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9847860>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9847860>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784f2438b290>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784f2438b290>}).
I0424 07:26:12.834966 132390722602816 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x784d08300360> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 07:26:14.362241 132390722602816 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_nnx_xpk_main_20260424_070237_07_distill_smoke/checkpoints
I0424 07:26:14.376972 132390722602816 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_nnx_xpk_main_20260424_070237_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7850f9847830>
I0424 07:26:14.377090 132390722602816 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:26:14.377155 132390722602816 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7867e5c84740>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:26:14.377191 132390722602816 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 07:26:14.377222 132390722602816 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7867e5c84740>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 07:26:14.377257 132390722602816 checkpoint_manager.py:1983] [process=3][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 07:26:14.377307 132390722602816 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132390722602816 count=1 at 0x78621cdc9b00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78621cd8d640>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7850f9844b90>, _write_futures=[])
I0424 07:26:14.377752 132390722602816 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132390722602816 count=1 at 0x78621cdc9b00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78621cd8d640>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7850f9844b90>, _write_futures=[])
I0424 07:26:14.377782 132390722602816 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132390722602816 count=1 at 0x78621cdc9b00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78621cd8d640>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7850f9844b90>, _write_futures=[])
I0424 07:26:14.377814 132390722602816 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847800>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9768bf0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784d08137e00>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x784d08137650>}, handler_registry=None
I0424 07:26:14.377910 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847800>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:26:14.377960 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9768bf0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 07:26:14.377984 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784d08137e00>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:26:14.378011 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x784d08137650>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 07:26:14.378033 132390722602816 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9768110>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 07:26:14.378057 132390722602816 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847800>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9847800>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9768bf0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7850f9768bf0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784d08137e00>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x784d08137e00>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x784d08137650>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x784d08137650>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9768110>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7850f9768110>}).
I0424 07:26:14.378123 132390722602816 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x784d083005e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 07:26:14.758228 132390722602816 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_nnx_xpk_main_20260424_070237_07_distill_smoke/checkpoints
I0424 07:26:14.767046 132390722602816 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260424_070237/pt_distill_nnx_xpk_main_20260424_070237_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7850f9768590>
I0424 07:26:14.767482 132390722602816 train_distill.py:677] Starting Distillation Training...
I0424 07:26:14.767601 132390722602816 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 07:26:15.255474 132390722602816 peft_trainer.py:594] Compiled train_step cache size: 0
I0424 07:26:15.257218 132236766586624 grain_pool.py:367] Grain pool will use 1 processes.
I0424 07:26:15.308034 132236766586624 grain_pool.py:440] Grain pool will start child processes.
I0424 07:26:15.313919 132236766586624 grain_pool.py:448] Grain pool started all child processes.
2026-04-24 07:26:21.801960: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0)}
I0424 07:26:25.909459 132236766586624 grain_pool.py:542] Grain pool is exiting.
I0424 07:26:25.909564 132236766586624 grain_pool.py:547] Shutting down multiprocessing system.
I0424 07:26:27.596578 132236766586624 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 07:26:37 UTC 2026
EXIT_CODE=1