MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  main

Metric | Linen (59e0f1759) | NNX (59e0f1759) | Diff (NNX − Linen)

Diff = NNX value − Linen value. Green = NNX improved. Red = NNX regressed.
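The diff convention above is a plain subtraction; a minimal sketch (function name is illustrative, not part of MaxView):

```python
def metric_diff(nnx_value: float, linen_value: float) -> float:
    """Diff column convention: NNX value minus Linen value.

    Whether a given sign counts as an improvement or a regression
    depends on the metric's direction (e.g. lower loss is better),
    which the dashboard encodes as green/red coloring.
    """
    return nnx_value - linen_value
```

For example, `metric_diff(2.5, 2.0)` yields `0.5`, meaning the NNX run's value is higher than the Linen run's.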

Linen  ·  59e0f1759  ·  main_20260425_071506  ·  full log
XPK Start: Sat Apr 25 07:23:04 UTC 2026
2026-04-25 07:23:22.441942: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
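The three `rope_parameters` warnings above complain that integer values were supplied where floats are required. A minimal sketch of the fix, assuming the values come from a plain config dict (the dict literal below mirrors the values in the warnings; this is not MaxText's or transformers' actual normalization code):

```python
# Values as reported in the warnings: ints where floats are expected.
rope_parameters = {"factor": 40, "beta_fast": 32, "beta_slow": 1}

# Cast every field to float so the type checks pass;
# float(40) == 40, so the numeric values are unchanged.
normalized = {key: float(value) for key, value in rope_parameters.items()}

# The warnings also require factor >= 1.0.
assert normalized["factor"] >= 1.0
```

The separate `rope_scaling` warning is a naming issue rather than a typing one: the config sets the legacy `rope_scaling` key, while the RoPE standardization expects `self.rope_parameters` on the model config.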
I0425 07:23:28.799919 139758723237696 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 07:23:37.840887 139758723237696 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 07:23:37.843219 139758723237696 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-02x39-slice-job-0-0.mt-07-distill-smoke-02x39:8482
I0425 07:23:39.784313 139758723237696 max_utils.py:284] Jax distributed system initialized!
I0425 07:23:46.275007 139758723237696 max_utils.py:244] Jax distributed system is already initialized.
W0425 07:23:46.402840 139758723237696 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 07:23:46.460133 139758723237696 max_utils.py:244] Jax distributed system is already initialized.
I0425 07:23:46.461289 139758723237696 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 07:23:46.461348 139758723237696 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 07:23:46.461371 139758723237696 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 07:23:46.461390 139758723237696 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 07:23:46.461407 139758723237696 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 07:23:46.461425 139758723237696 pyconfig.py:471] Config param activations_in_float32: False
I0425 07:23:46.461441 139758723237696 pyconfig.py:471] Config param adam_b1: 0.9
I0425 07:23:46.461466 139758723237696 pyconfig.py:471] Config param adam_b2: 0.95
I0425 07:23:46.461484 139758723237696 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 07:23:46.461505 139758723237696 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 07:23:46.461523 139758723237696 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 07:23:46.461539 139758723237696 pyconfig.py:471] Config param adamw_mask: []
I0425 07:23:46.461555 139758723237696 pyconfig.py:471] Config param add_bos: True
I0425 07:23:46.461572 139758723237696 pyconfig.py:471] Config param add_eos: True
I0425 07:23:46.461587 139758723237696 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 07:23:46.461604 139758723237696 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 07:23:46.461622 139758723237696 pyconfig.py:471] Config param async_checkpointing: True
I0425 07:23:46.461639 139758723237696 pyconfig.py:471] Config param async_scheduling: False
I0425 07:23:46.461655 139758723237696 pyconfig.py:471] Config param attention: dot_product
I0425 07:23:46.461672 139758723237696 pyconfig.py:471] Config param attention_bias: False
I0425 07:23:46.461688 139758723237696 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 07:23:46.461705 139758723237696 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 07:23:46.461725 139758723237696 pyconfig.py:471] Config param attention_output_dim: -1
I0425 07:23:46.461742 139758723237696 pyconfig.py:471] Config param attention_sink: False
I0425 07:23:46.461757 139758723237696 pyconfig.py:471] Config param attention_type: global
I0425 07:23:46.461774 139758723237696 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 07:23:46.461789 139758723237696 pyconfig.py:471] Config param audio_path: 
I0425 07:23:46.461805 139758723237696 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 07:23:46.461822 139758723237696 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0425 07:23:46.461837 139758723237696 pyconfig.py:471] Config param base_config: base.yml
I0425 07:23:46.461853 139758723237696 pyconfig.py:471] Config param base_emb_dim: 16
I0425 07:23:46.461869 139758723237696 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 07:23:46.461885 139758723237696 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 07:23:46.461901 139758723237696 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 07:23:46.461916 139758723237696 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 07:23:46.461931 139758723237696 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 07:23:46.461947 139758723237696 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 07:23:46.461962 139758723237696 pyconfig.py:471] Config param batch_size: 1
I0425 07:23:46.461979 139758723237696 pyconfig.py:471] Config param batch_split_factor: 1
I0425 07:23:46.461995 139758723237696 pyconfig.py:471] Config param beta_fast: 32
I0425 07:23:46.462012 139758723237696 pyconfig.py:471] Config param beta_slow: 1
I0425 07:23:46.462027 139758723237696 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 07:23:46.462044 139758723237696 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 07:23:46.462061 139758723237696 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 07:23:46.462077 139758723237696 pyconfig.py:471] Config param chat_template: 
I0425 07:23:46.462103 139758723237696 pyconfig.py:471] Config param chat_template_path: 
I0425 07:23:46.462122 139758723237696 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 07:23:46.462141 139758723237696 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/checkpoints/
I0425 07:23:46.462158 139758723237696 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 07:23:46.462175 139758723237696 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 07:23:46.462194 139758723237696 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 07:23:46.462212 139758723237696 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 07:23:46.462228 139758723237696 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 07:23:46.462244 139758723237696 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 07:23:46.462260 139758723237696 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 07:23:46.462276 139758723237696 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 07:23:46.462292 139758723237696 pyconfig.py:471] Config param chips_per_vm: 4
I0425 07:23:46.462311 139758723237696 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 07:23:46.462327 139758723237696 pyconfig.py:471] Config param collect_stack_trace: False
I0425 07:23:46.462343 139758723237696 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 07:23:46.462357 139758723237696 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 07:23:46.462373 139758723237696 pyconfig.py:471] Config param compile_topology: 
I0425 07:23:46.462388 139758723237696 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 07:23:46.462404 139758723237696 pyconfig.py:471] Config param compile_xla_flags: 
I0425 07:23:46.462420 139758723237696 pyconfig.py:471] Config param compiled_trainstep_file: 
I0425 07:23:46.462436 139758723237696 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 07:23:46.462450 139758723237696 pyconfig.py:471] Config param constant_bound_config: []
I0425 07:23:46.462468 139758723237696 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 07:23:46.462484 139758723237696 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 07:23:46.462499 139758723237696 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 07:23:46.462518 139758723237696 pyconfig.py:471] Config param context_parallel_size: 1
I0425 07:23:46.462533 139758723237696 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 07:23:46.462549 139758723237696 pyconfig.py:471] Config param context_sharding: context
I0425 07:23:46.462565 139758723237696 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 07:23:46.462581 139758723237696 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 07:23:46.462596 139758723237696 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 07:23:46.462610 139758723237696 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 07:23:46.462626 139758723237696 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 07:23:46.462641 139758723237696 pyconfig.py:471] Config param custom_mesh: 
I0425 07:23:46.462657 139758723237696 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0425 07:23:46.462673 139758723237696 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 07:23:46.462688 139758723237696 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 07:23:46.462708 139758723237696 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 07:23:46.462724 139758723237696 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 07:23:46.462740 139758723237696 pyconfig.py:471] Config param dataset_path: 
I0425 07:23:46.462756 139758723237696 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 07:23:46.462774 139758723237696 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 07:23:46.462790 139758723237696 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 07:23:46.462831 139758723237696 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 07:23:46.462849 139758723237696 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 07:23:46.462864 139758723237696 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 07:23:46.462880 139758723237696 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 07:23:46.462895 139758723237696 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 07:23:46.462911 139758723237696 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 07:23:46.462926 139758723237696 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 07:23:46.462944 139758723237696 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 07:23:46.462960 139758723237696 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 07:23:46.462974 139758723237696 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 07:23:46.462990 139758723237696 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 07:23:46.463007 139758723237696 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 07:23:46.463022 139758723237696 pyconfig.py:471] Config param debug: {'rl': False}
I0425 07:23:46.463038 139758723237696 pyconfig.py:471] Config param debug_sharding: False
I0425 07:23:46.463053 139758723237696 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 07:23:46.463069 139758723237696 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 07:23:46.463087 139758723237696 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 07:23:46.463111 139758723237696 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 07:23:46.463126 139758723237696 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 07:23:46.463144 139758723237696 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 07:23:46.463160 139758723237696 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 07:23:46.463176 139758723237696 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 07:23:46.463191 139758723237696 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 07:23:46.463207 139758723237696 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 07:23:46.463223 139758723237696 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 07:23:46.463238 139758723237696 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 07:23:46.463255 139758723237696 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 07:23:46.463270 139758723237696 pyconfig.py:471] Config param distill_alpha_end: None
I0425 07:23:46.463287 139758723237696 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 07:23:46.463306 139758723237696 pyconfig.py:471] Config param distill_beta: 0.0
I0425 07:23:46.463322 139758723237696 pyconfig.py:471] Config param distill_beta_end: None
I0425 07:23:46.463337 139758723237696 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 07:23:46.463353 139758723237696 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 07:23:46.463369 139758723237696 pyconfig.py:471] Config param distill_layer_indices: None
I0425 07:23:46.463384 139758723237696 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 07:23:46.463401 139758723237696 pyconfig.py:471] Config param distill_temperature_end: None
I0425 07:23:46.463416 139758723237696 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 07:23:46.463431 139758723237696 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 07:23:46.463447 139758723237696 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 07:23:46.463463 139758723237696 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 07:23:46.463479 139758723237696 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 07:23:46.463495 139758723237696 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 07:23:46.463510 139758723237696 pyconfig.py:471] Config param dtype: bfloat16
I0425 07:23:46.463540 139758723237696 pyconfig.py:471] Config param dtype_mm: float32
I0425 07:23:46.463555 139758723237696 pyconfig.py:471] Config param dump_hlo: False
I0425 07:23:46.463571 139758723237696 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 07:23:46.463587 139758723237696 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/xla_dump
I0425 07:23:46.463602 139758723237696 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 07:23:46.463618 139758723237696 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 07:23:46.463633 139758723237696 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 07:23:46.463648 139758723237696 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 07:23:46.463664 139758723237696 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0425 07:23:46.463680 139758723237696 pyconfig.py:471] Config param dump_jaxpr: False
I0425 07:23:46.463698 139758723237696 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 07:23:46.463714 139758723237696 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/jaxpr_dump
I0425 07:23:46.463730 139758723237696 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 07:23:46.463745 139758723237696 pyconfig.py:471] Config param dump_step: -1
I0425 07:23:46.463760 139758723237696 pyconfig.py:471] Config param elastic_enabled: False
I0425 07:23:46.463777 139758723237696 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 07:23:46.463791 139758723237696 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 07:23:46.463807 139758723237696 pyconfig.py:471] Config param emb_dim: 16
I0425 07:23:46.463823 139758723237696 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 07:23:46.463837 139758723237696 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 07:23:46.463853 139758723237696 pyconfig.py:471] Config param enable_checkpointing: True
I0425 07:23:46.463869 139758723237696 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 07:23:46.463884 139758723237696 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 07:23:46.463900 139758723237696 pyconfig.py:471] Config param enable_diloco: False
I0425 07:23:46.463916 139758723237696 pyconfig.py:471] Config param enable_dp_attention: False
I0425 07:23:46.463930 139758723237696 pyconfig.py:471] Config param enable_dropout: False
I0425 07:23:46.463946 139758723237696 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 07:23:46.463961 139758723237696 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 07:23:46.463977 139758723237696 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 07:23:46.463993 139758723237696 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 07:23:46.464008 139758723237696 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 07:23:46.464024 139758723237696 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 07:23:46.464038 139758723237696 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 07:23:46.464053 139758723237696 pyconfig.py:471] Config param enable_model_warmup: False
I0425 07:23:46.464069 139758723237696 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 07:23:46.464084 139758723237696 pyconfig.py:471] Config param enable_nnx: False
I0425 07:23:46.464110 139758723237696 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 07:23:46.464124 139758723237696 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 07:23:46.464141 139758723237696 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 07:23:46.464155 139758723237696 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 07:23:46.464170 139758723237696 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 07:23:46.464186 139758723237696 pyconfig.py:471] Config param enable_single_controller: False
I0425 07:23:46.464200 139758723237696 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 07:23:46.464215 139758723237696 pyconfig.py:471] Config param enable_tensorboard: True
I0425 07:23:46.464231 139758723237696 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 07:23:46.464246 139758723237696 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 07:23:46.464262 139758723237696 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 07:23:46.464277 139758723237696 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 07:23:46.464293 139758723237696 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 07:23:46.464312 139758723237696 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 07:23:46.464328 139758723237696 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 07:23:46.464344 139758723237696 pyconfig.py:471] Config param engram_layers: []
I0425 07:23:46.464360 139758723237696 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 07:23:46.464375 139758723237696 pyconfig.py:471] Config param engram_num_heads: 8
I0425 07:23:46.464391 139758723237696 pyconfig.py:471] Config param engram_seed: 0
I0425 07:23:46.464406 139758723237696 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 07:23:46.464423 139758723237696 pyconfig.py:471] Config param epsilon_high: None
I0425 07:23:46.464438 139758723237696 pyconfig.py:471] Config param eval_corr_lst: False
I0425 07:23:46.464454 139758723237696 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 07:23:46.464469 139758723237696 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 07:23:46.464485 139758723237696 pyconfig.py:471] Config param eval_image_column: image
I0425 07:23:46.464499 139758723237696 pyconfig.py:471] Config param eval_interval: -1
I0425 07:23:46.464515 139758723237696 pyconfig.py:471] Config param eval_make_lst: False
I0425 07:23:46.464530 139758723237696 pyconfig.py:471] Config param eval_mode: pass
I0425 07:23:46.464546 139758723237696 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 07:23:46.464562 139758723237696 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 07:23:46.464578 139758723237696 pyconfig.py:471] Config param eval_split: validation
I0425 07:23:46.464592 139758723237696 pyconfig.py:471] Config param eval_steps: -1
I0425 07:23:46.464609 139758723237696 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 07:23:46.464625 139758723237696 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 07:23:46.464640 139758723237696 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 07:23:46.464656 139758723237696 pyconfig.py:471] Config param float32_gate_logits: False
I0425 07:23:46.464670 139758723237696 pyconfig.py:471] Config param float32_logits: False
I0425 07:23:46.464686 139758723237696 pyconfig.py:471] Config param float32_qk_product: False
I0425 07:23:46.464701 139758723237696 pyconfig.py:471] Config param float32_weight_sum: True
I0425 07:23:46.464717 139758723237696 pyconfig.py:471] Config param force_q_layout: False
I0425 07:23:46.464733 139758723237696 pyconfig.py:471] Config param force_unroll: False
I0425 07:23:46.464749 139758723237696 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0425 07:23:46.464765 139758723237696 pyconfig.py:471] Config param formatting_func_path: 
I0425 07:23:46.464780 139758723237696 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 07:23:46.464796 139758723237696 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 07:23:46.464811 139758723237696 pyconfig.py:471] Config param fused_mlp: False
I0425 07:23:46.464827 139758723237696 pyconfig.py:471] Config param fused_qkv: True
I0425 07:23:46.464843 139758723237696 pyconfig.py:471] Config param gcs_metrics: False
I0425 07:23:46.464859 139758723237696 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 07:23:46.464873 139758723237696 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 07:23:46.464889 139758723237696 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 07:23:46.464905 139758723237696 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 07:23:46.464921 139758723237696 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 07:23:46.464937 139758723237696 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 07:23:46.464953 139758723237696 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 07:23:46.464967 139758723237696 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 07:23:46.464984 139758723237696 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 07:23:46.464998 139758723237696 pyconfig.py:471] Config param generation_configs: {}
I0425 07:23:46.465015 139758723237696 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 07:23:46.465031 139758723237696 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 07:23:46.465047 139758723237696 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 07:23:46.465061 139758723237696 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 07:23:46.465077 139758723237696 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 07:23:46.465101 139758723237696 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 07:23:46.465117 139758723237696 pyconfig.py:471] Config param global_head_dim: 0
I0425 07:23:46.465133 139758723237696 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 07:23:46.465148 139758723237696 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 07:23:46.465164 139758723237696 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 07:23:46.465180 139758723237696 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 07:23:46.465196 139758723237696 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 07:23:46.465214 139758723237696 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 07:23:46.465229 139758723237696 pyconfig.py:471] Config param grad_dtype: float32
I0425 07:23:46.465263 139758723237696 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 07:23:46.465280 139758723237696 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 07:23:46.465297 139758723237696 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 07:23:46.465317 139758723237696 pyconfig.py:471] Config param grain_eval_files: 
I0425 07:23:46.465333 139758723237696 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 07:23:46.465349 139758723237696 pyconfig.py:471] Config param grain_num_threads: 16
I0425 07:23:46.465364 139758723237696 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 07:23:46.465380 139758723237696 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 07:23:46.465397 139758723237696 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 07:23:46.465413 139758723237696 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 07:23:46.465429 139758723237696 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 07:23:46.465446 139758723237696 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 07:23:46.465462 139758723237696 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 07:23:46.465478 139758723237696 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 07:23:46.465493 139758723237696 pyconfig.py:471] Config param grain_train_files: 
I0425 07:23:46.465509 139758723237696 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0425 07:23:46.465524 139758723237696 pyconfig.py:471] Config param grain_worker_count: 1
I0425 07:23:46.465540 139758723237696 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 07:23:46.465555 139758723237696 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 07:23:46.465571 139758723237696 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 07:23:46.465586 139758723237696 pyconfig.py:471] Config param hardware: tpu
I0425 07:23:46.465602 139758723237696 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 07:23:46.465618 139758723237696 pyconfig.py:471] Config param head_dim: 8
I0425 07:23:46.465634 139758723237696 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 07:23:46.465648 139758723237696 pyconfig.py:471] Config param hf_data_dir: None
I0425 07:23:46.465664 139758723237696 pyconfig.py:471] Config param hf_eval_files: None
I0425 07:23:46.465679 139758723237696 pyconfig.py:471] Config param hf_eval_split: None
I0425 07:23:46.465694 139758723237696 pyconfig.py:471] Config param hf_name: None
I0425 07:23:46.465710 139758723237696 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 07:23:46.465725 139758723237696 pyconfig.py:471] Config param hf_train_files: None
I0425 07:23:46.465740 139758723237696 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 07:23:46.465756 139758723237696 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 07:23:46.465773 139758723237696 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 07:23:46.465788 139758723237696 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 07:23:46.465803 139758723237696 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 07:23:46.465819 139758723237696 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 07:23:46.465834 139758723237696 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 07:23:46.465850 139758723237696 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 07:23:46.465864 139758723237696 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 07:23:46.465881 139758723237696 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 07:23:46.465895 139758723237696 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 07:23:46.465913 139758723237696 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 07:23:46.465928 139758723237696 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 07:23:46.465944 139758723237696 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 07:23:46.465958 139758723237696 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 07:23:46.465975 139758723237696 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 07:23:46.465991 139758723237696 pyconfig.py:471] Config param image_path: 
I0425 07:23:46.466006 139758723237696 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 07:23:46.466022 139758723237696 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 07:23:46.466038 139758723237696 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 07:23:46.466054 139758723237696 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 07:23:46.466069 139758723237696 pyconfig.py:471] Config param indexer_n_heads: 64
I0425 07:23:46.466085 139758723237696 pyconfig.py:471] Config param indexer_sparse_training: False
I0425 07:23:46.466113 139758723237696 pyconfig.py:471] Config param indexer_topk: 2048
I0425 07:23:46.466130 139758723237696 pyconfig.py:471] Config param inference_benchmark_test: False
I0425 07:23:46.466145 139758723237696 pyconfig.py:471] Config param inference_metadata_file: 
I0425 07:23:46.466162 139758723237696 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0425 07:23:46.466178 139758723237696 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0425 07:23:46.466192 139758723237696 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0425 07:23:46.466209 139758723237696 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0425 07:23:46.466225 139758723237696 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0425 07:23:46.466241 139758723237696 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0425 07:23:46.466257 139758723237696 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0425 07:23:46.466274 139758723237696 pyconfig.py:471] Config param init_weights_seed: 0
I0425 07:23:46.466289 139758723237696 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0425 07:23:46.466310 139758723237696 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0425 07:23:46.466325 139758723237696 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0425 07:23:46.466340 139758723237696 pyconfig.py:471] Config param internal_compile: False
I0425 07:23:46.466354 139758723237696 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0425 07:23:46.466371 139758723237696 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0425 07:23:46.466387 139758723237696 pyconfig.py:471] Config param jax_debug_log_modules: 
I0425 07:23:46.466402 139758723237696 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0425 07:23:46.466418 139758723237696 pyconfig.py:471] Config param jax_profiler_port: 9999
I0425 07:23:46.466432 139758723237696 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0425 07:23:46.466449 139758723237696 pyconfig.py:471] Config param kv_cache_buffer: 256
I0425 07:23:46.466464 139758723237696 pyconfig.py:471] Config param kv_lora_rank: 512
I0425 07:23:46.466480 139758723237696 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0425 07:23:46.466498 139758723237696 pyconfig.py:471] Config param kv_quant_dtype: int8
I0425 07:23:46.466512 139758723237696 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0425 07:23:46.466529 139758723237696 pyconfig.py:471] Config param learning_rate: 0.0002
I0425 07:23:46.466544 139758723237696 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0425 07:23:46.466561 139758723237696 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0425 07:23:46.466576 139758723237696 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0425 07:23:46.466592 139758723237696 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0425 07:23:46.466609 139758723237696 pyconfig.py:471] Config param load_from_prefill_dir: False
I0425 07:23:46.466623 139758723237696 pyconfig.py:471] Config param load_full_state_path: 
I0425 07:23:46.466639 139758723237696 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 07:23:46.466655 139758723237696 pyconfig.py:471] Config param local_checkpoint_directory: 
I0425 07:23:46.466670 139758723237696 pyconfig.py:471] Config param local_checkpoint_period: 0
I0425 07:23:46.466686 139758723237696 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0425 07:23:46.466701 139758723237696 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0425 07:23:46.466717 139758723237696 pyconfig.py:471] Config param log_config: True
I0425 07:23:46.466732 139758723237696 pyconfig.py:471] Config param log_period: 10
I0425 07:23:46.466749 139758723237696 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('q_lora', ('fsdp', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 
'expert')), ('kv_lora', ('fsdp', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'context')), ('embed_moe', ('fsdp', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('embed', ('fsdp', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('context',)), ('prefill_activation_norm_length', ('tensor_sequence', 'context')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ()), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', 
('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0425 07:23:46.466819 139758723237696 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0425 07:23:46.466835 139758723237696 pyconfig.py:471] Config param logits_via_embedding: True
I0425 07:23:46.466851 139758723237696 pyconfig.py:471] Config param lora_input_adapters_path: 
I0425 07:23:46.466867 139758723237696 pyconfig.py:471] Config param loss_algo: grpo
I0425 07:23:46.466884 139758723237696 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0425 07:23:46.466902 139758723237696 pyconfig.py:471] Config param managed_mldiagnostics: False
I0425 07:23:46.466918 139758723237696 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/managed-mldiagnostics
I0425 07:23:46.466933 139758723237696 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0425 07:23:46.466949 139758723237696 pyconfig.py:471] Config param math_verify_num_procs: None
I0425 07:23:46.466965 139758723237696 pyconfig.py:471] Config param math_verify_timeout: 300
I0425 07:23:46.466981 139758723237696 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0425 07:23:46.466999 139758723237696 pyconfig.py:471] Config param max_checkify: False
I0425 07:23:46.467015 139758723237696 pyconfig.py:471] Config param max_concurrency: 256
I0425 07:23:46.467030 139758723237696 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0425 07:23:46.467046 139758723237696 pyconfig.py:471] Config param max_num_batched_tokens: None
I0425 07:23:46.467062 139758723237696 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0425 07:23:46.467078 139758723237696 pyconfig.py:471] Config param max_num_images_per_example: -1
I0425 07:23:46.467103 139758723237696 pyconfig.py:471] Config param max_num_seqs: None
I0425 07:23:46.467119 139758723237696 pyconfig.py:471] Config param max_position_embeddings: 163840
I0425 07:23:46.467135 139758723237696 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0425 07:23:46.467151 139758723237696 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0425 07:23:46.467166 139758723237696 pyconfig.py:471] Config param max_segments_per_seq: -1
I0425 07:23:46.467183 139758723237696 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0425 07:23:46.467199 139758723237696 pyconfig.py:471] Config param max_target_length: 2048
I0425 07:23:46.467215 139758723237696 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0425 07:23:46.467229 139758723237696 pyconfig.py:471] Config param megablox: True
I0425 07:23:46.467245 139758723237696 pyconfig.py:471] Config param merge_gating_gmm: False
I0425 07:23:46.467261 139758723237696 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0425 07:23:46.467279 139758723237696 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/metrics/
I0425 07:23:46.467295 139758723237696 pyconfig.py:471] Config param metrics_file: 
I0425 07:23:46.467314 139758723237696 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0425 07:23:46.467329 139758723237696 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0425 07:23:46.467345 139758723237696 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0425 07:23:46.467361 139758723237696 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0425 07:23:46.467377 139758723237696 pyconfig.py:471] Config param mla_naive_kvcache: True
I0425 07:23:46.467392 139758723237696 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0425 07:23:46.467409 139758723237696 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0425 07:23:46.467423 139758723237696 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0425 07:23:46.467440 139758723237696 pyconfig.py:471] Config param mlp_bias: False
I0425 07:23:46.467456 139758723237696 pyconfig.py:471] Config param mlp_dim: 64
I0425 07:23:46.467471 139758723237696 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0425 07:23:46.467487 139758723237696 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0425 07:23:46.467504 139758723237696 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0425 07:23:46.467519 139758723237696 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0425 07:23:46.467536 139758723237696 pyconfig.py:471] Config param moba: False
I0425 07:23:46.467553 139758723237696 pyconfig.py:471] Config param moba_chunk_size: 1024
I0425 07:23:46.467569 139758723237696 pyconfig.py:471] Config param moba_topk: 8
I0425 07:23:46.467585 139758723237696 pyconfig.py:471] Config param model_call_mode: 
I0425 07:23:46.467601 139758723237696 pyconfig.py:471] Config param model_name: gpt3-52k
I0425 07:23:46.467616 139758723237696 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0425 07:23:46.467631 139758723237696 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0425 07:23:46.467648 139758723237696 pyconfig.py:471] Config param moe_mlp_dim: -1
I0425 07:23:46.467663 139758723237696 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0425 07:23:46.467679 139758723237696 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0425 07:23:46.467695 139758723237696 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0425 07:23:46.467710 139758723237696 pyconfig.py:471] Config param monitor_goodput: False
I0425 07:23:46.467726 139758723237696 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0425 07:23:46.467742 139758723237696 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0425 07:23:46.467758 139758723237696 pyconfig.py:471] Config param mscale: 1.0
I0425 07:23:46.467774 139758723237696 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0425 07:23:46.467791 139758723237696 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0425 07:23:46.467805 139758723237696 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0425 07:23:46.467822 139758723237696 pyconfig.py:471] Config param mtp_num_layers: 0
I0425 07:23:46.467838 139758723237696 pyconfig.py:471] Config param mu_dtype: float32
I0425 07:23:46.467861 139758723237696 pyconfig.py:471] Config param multi_sampling: False
I0425 07:23:46.467877 139758723237696 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0425 07:23:46.467892 139758723237696 pyconfig.py:471] Config param muon_beta: 0.95
I0425 07:23:46.467909 139758723237696 pyconfig.py:471] Config param muon_consistent_rms: None
I0425 07:23:46.467925 139758723237696 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0425 07:23:46.467941 139758723237696 pyconfig.py:471] Config param n_routing_groups: -1
I0425 07:23:46.467957 139758723237696 pyconfig.py:471] Config param n_window_for_audio: 50
I0425 07:23:46.467973 139758723237696 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0425 07:23:46.467989 139758723237696 pyconfig.py:471] Config param nope_layer_interval: -1
I0425 07:23:46.468005 139758723237696 pyconfig.py:471] Config param norm_topk_prob: False
I0425 07:23:46.468021 139758723237696 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0425 07:23:46.468039 139758723237696 pyconfig.py:471] Config param normalize_embedding_logits: False
I0425 07:23:46.468055 139758723237696 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0425 07:23:46.468072 139758723237696 pyconfig.py:471] Config param num_batches: 4
I0425 07:23:46.468088 139758723237696 pyconfig.py:471] Config param num_channels_for_vit: 3
I0425 07:23:46.468114 139758723237696 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0425 07:23:46.468131 139758723237696 pyconfig.py:471] Config param num_decoder_layers: 1
I0425 07:23:46.468145 139758723237696 pyconfig.py:471] Config param num_diloco_replicas: 1
I0425 07:23:46.468161 139758723237696 pyconfig.py:471] Config param num_epoch: 1
I0425 07:23:46.468177 139758723237696 pyconfig.py:471] Config param num_eval_passes: 1
I0425 07:23:46.468193 139758723237696 pyconfig.py:471] Config param num_experts: 1
I0425 07:23:46.468209 139758723237696 pyconfig.py:471] Config param num_experts_per_tok: 1
I0425 07:23:46.468225 139758723237696 pyconfig.py:471] Config param num_generations: 2
I0425 07:23:46.468239 139758723237696 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0425 07:23:46.468255 139758723237696 pyconfig.py:471] Config param num_iterations: 1
I0425 07:23:46.468271 139758723237696 pyconfig.py:471] Config param num_kv_heads: 2
I0425 07:23:46.468287 139758723237696 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0425 07:23:46.468306 139758723237696 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0425 07:23:46.468322 139758723237696 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0425 07:23:46.468336 139758723237696 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0425 07:23:46.468353 139758723237696 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0425 07:23:46.468369 139758723237696 pyconfig.py:471] Config param num_query_heads: 2
I0425 07:23:46.468384 139758723237696 pyconfig.py:471] Config param num_samplers_slices: -1
I0425 07:23:46.468400 139758723237696 pyconfig.py:471] Config param num_slices: 1
I0425 07:23:46.468416 139758723237696 pyconfig.py:471] Config param num_target_devices: 32
I0425 07:23:46.468432 139758723237696 pyconfig.py:471] Config param num_test_batches: 5
I0425 07:23:46.468448 139758723237696 pyconfig.py:471] Config param num_trainer_slices: -1
I0425 07:23:46.468464 139758723237696 pyconfig.py:471] Config param num_vocab_tiling: 1
I0425 07:23:46.468479 139758723237696 pyconfig.py:471] Config param off_policy_steps: 0
I0425 07:23:46.468495 139758723237696 pyconfig.py:471] Config param offline_data_dir: None
I0425 07:23:46.468510 139758723237696 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0425 07:23:46.468529 139758723237696 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0425 07:23:46.468544 139758723237696 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0425 07:23:46.468560 139758723237696 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0425 07:23:46.468576 139758723237696 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0425 07:23:46.468592 139758723237696 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0425 07:23:46.468609 139758723237696 pyconfig.py:471] Config param output_dim_for_audio: 512
I0425 07:23:46.468625 139758723237696 pyconfig.py:471] Config param override_logical_axis_rules: False
I0425 07:23:46.468641 139758723237696 pyconfig.py:471] Config param override_model_config: True
I0425 07:23:46.468657 139758723237696 pyconfig.py:471] Config param packing: True
I0425 07:23:46.468673 139758723237696 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0425 07:23:46.468689 139758723237696 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0425 07:23:46.468705 139758723237696 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0425 07:23:46.468721 139758723237696 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0425 07:23:46.468738 139758723237696 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0425 07:23:46.468753 139758723237696 pyconfig.py:471] Config param param_scan_axis: 1
I0425 07:23:46.468768 139758723237696 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0425 07:23:46.468784 139758723237696 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0425 07:23:46.468801 139758723237696 pyconfig.py:471] Config param patch_size_for_vit: 14
I0425 07:23:46.468816 139758723237696 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0425 07:23:46.468832 139758723237696 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0425 07:23:46.468849 139758723237696 pyconfig.py:471] Config param per_device_batch_size: 2
I0425 07:23:46.468864 139758723237696 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0425 07:23:46.468880 139758723237696 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0425 07:23:46.468895 139758723237696 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0425 07:23:46.468911 139758723237696 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0425 07:23:46.468928 139758723237696 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0425 07:23:46.468944 139758723237696 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0425 07:23:46.468960 139758723237696 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0425 07:23:46.468977 139758723237696 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0425 07:23:46.468992 139758723237696 pyconfig.py:471] Config param position_id_per_seconds: 25
I0425 07:23:46.469008 139758723237696 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0425 07:23:46.469025 139758723237696 pyconfig.py:471] Config param prefill_cache_dir: 
I0425 07:23:46.469040 139758723237696 pyconfig.py:471] Config param prefill_chunk_size: 256
I0425 07:23:46.469056 139758723237696 pyconfig.py:471] Config param prefill_slice: v5e-16
I0425 07:23:46.469072 139758723237696 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0425 07:23:46.469103 139758723237696 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0425 07:23:46.469119 139758723237696 pyconfig.py:471] Config param prefuse_moe_weights: False
I0425 07:23:46.469135 139758723237696 pyconfig.py:471] Config param profile_cleanly: True
I0425 07:23:46.469150 139758723237696 pyconfig.py:471] Config param profile_periodically_period: -1
I0425 07:23:46.469165 139758723237696 pyconfig.py:471] Config param profile_power_events: False
I0425 07:23:46.469181 139758723237696 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0425 07:23:46.469200 139758723237696 pyconfig.py:471] Config param profiler_steps: 5
I0425 07:23:46.469216 139758723237696 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0425 07:23:46.469232 139758723237696 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0425 07:23:46.469247 139758723237696 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0425 07:23:46.469263 139758723237696 pyconfig.py:471] Config param prometheus_port: 0
I0425 07:23:46.469279 139758723237696 pyconfig.py:471] Config param prompt: I love to
I0425 07:23:46.469295 139758723237696 pyconfig.py:471] Config param pure_nnx: False
I0425 07:23:46.469317 139758723237696 pyconfig.py:471] Config param pure_nnx_decoder: False
I0425 07:23:46.469332 139758723237696 pyconfig.py:471] Config param q_lora_rank: 0
I0425 07:23:46.469348 139758723237696 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0425 07:23:46.469364 139758723237696 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0425 07:23:46.469380 139758723237696 pyconfig.py:471] Config param qk_norm_with_scale: True
I0425 07:23:46.469395 139758723237696 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0425 07:23:46.469412 139758723237696 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0425 07:23:46.469427 139758723237696 pyconfig.py:471] Config param quant_cfg_path: 
I0425 07:23:46.469443 139758723237696 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0425 07:23:46.469461 139758723237696 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0425 07:23:46.469477 139758723237696 pyconfig.py:471] Config param quantize_kvcache: False
I0425 07:23:46.469493 139758723237696 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0425 07:23:46.469509 139758723237696 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0425 07:23:46.469526 139758723237696 pyconfig.py:471] Config param ragged_block_size: 256
I0425 07:23:46.469541 139758723237696 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0425 07:23:46.469558 139758723237696 pyconfig.py:471] Config param rampup_end_step: 0
I0425 07:23:46.469574 139758723237696 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0425 07:23:46.469590 139758723237696 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0425 07:23:46.469605 139758723237696 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0425 07:23:46.469619 139758723237696 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0425 07:23:46.469635 139758723237696 pyconfig.py:471] Config param remat_policy: full
I0425 07:23:46.469651 139758723237696 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0425 07:23:46.469667 139758723237696 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0425 07:23:46.469683 139758723237696 pyconfig.py:471] Config param replicate_quant_scale: False
I0425 07:23:46.469699 139758723237696 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0425 07:23:46.469715 139758723237696 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0425 07:23:46.469731 139758723237696 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0425 07:23:46.469745 139758723237696 pyconfig.py:471] Config param reshape_q: False
I0425 07:23:46.469762 139758723237696 pyconfig.py:471] Config param return_log_prob: False
I0425 07:23:46.469778 139758723237696 pyconfig.py:471] Config param reuse_example_batch: 0
I0425 07:23:46.469794 139758723237696 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0425 07:23:46.469808 139758723237696 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0425 07:23:46.469825 139758723237696 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0425 07:23:46.469841 139758723237696 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0425 07:23:46.469858 139758723237696 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0425 07:23:46.469874 139758723237696 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0425 07:23:46.469891 139758723237696 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0425 07:23:46.469912 139758723237696 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0425 07:23:46.469928 139758723237696 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0425 07:23:46.469944 139758723237696 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0425 07:23:46.469959 139758723237696 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0425 07:23:46.469975 139758723237696 pyconfig.py:471] Config param rope_attention_scaling: False
I0425 07:23:46.469991 139758723237696 pyconfig.py:471] Config param rope_factor: 40
I0425 07:23:46.470005 139758723237696 pyconfig.py:471] Config param rope_interleave: True
I0425 07:23:46.470020 139758723237696 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0425 07:23:46.470037 139758723237696 pyconfig.py:471] Config param rope_max_timescale: 10000
I0425 07:23:46.470053 139758723237696 pyconfig.py:471] Config param rope_min_timescale: 1
I0425 07:23:46.470069 139758723237696 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0425 07:23:46.470084 139758723237696 pyconfig.py:471] Config param rope_truncate: True
I0425 07:23:46.470106 139758723237696 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0425 07:23:46.470125 139758723237696 pyconfig.py:471] Config param rope_use_scale: True
I0425 07:23:46.470141 139758723237696 pyconfig.py:471] Config param routed_bias: False
I0425 07:23:46.470157 139758723237696 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0425 07:23:46.470173 139758723237696 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0425 07:23:46.470190 139758723237696 pyconfig.py:471] Config param routed_score_func: 
I0425 07:23:46.470204 139758723237696 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-07-23
I0425 07:23:46.470221 139758723237696 pyconfig.py:471] Config param sa_block_kv: 512
I0425 07:23:46.470237 139758723237696 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0425 07:23:46.470252 139758723237696 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0425 07:23:46.470267 139758723237696 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0425 07:23:46.470283 139758723237696 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0425 07:23:46.470303 139758723237696 pyconfig.py:471] Config param sa_block_q: 512
I0425 07:23:46.470318 139758723237696 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0425 07:23:46.470334 139758723237696 pyconfig.py:471] Config param sa_block_q_dq: 512
I0425 07:23:46.470350 139758723237696 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0425 07:23:46.470366 139758723237696 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0425 07:23:46.470381 139758723237696 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0425 07:23:46.470397 139758723237696 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0425 07:23:46.470413 139758723237696 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0425 07:23:46.470429 139758723237696 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0425 07:23:46.470445 139758723237696 pyconfig.py:471] Config param save_config_to_gcs: False
I0425 07:23:46.470461 139758723237696 pyconfig.py:471] Config param save_quantized_params_path: 
I0425 07:23:46.470477 139758723237696 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0425 07:23:46.470493 139758723237696 pyconfig.py:471] Config param scan_layers: True
I0425 07:23:46.470509 139758723237696 pyconfig.py:471] Config param scan_layers_per_stage: False
I0425 07:23:46.470523 139758723237696 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0425 07:23:46.470539 139758723237696 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0425 07:23:46.470556 139758723237696 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0425 07:23:46.470571 139758723237696 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0425 07:23:46.470587 139758723237696 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0425 07:23:46.470602 139758723237696 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0425 07:23:46.470618 139758723237696 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0425 07:23:46.470635 139758723237696 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0425 07:23:46.470651 139758723237696 pyconfig.py:471] Config param sharding_strategy: None
I0425 07:23:46.470667 139758723237696 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0425 07:23:46.470683 139758723237696 pyconfig.py:471] Config param shardy: True
I0425 07:23:46.470699 139758723237696 pyconfig.py:471] Config param share_kv_projections: False
I0425 07:23:46.470715 139758723237696 pyconfig.py:471] Config param shared_experts: 0
I0425 07:23:46.470731 139758723237696 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0425 07:23:46.470746 139758723237696 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0425 07:23:46.470762 139758723237696 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0425 07:23:46.470777 139758723237696 pyconfig.py:471] Config param skip_step_interval: 128
I0425 07:23:46.470794 139758723237696 pyconfig.py:471] Config param skip_step_on_spikes: False
I0425 07:23:46.470810 139758723237696 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0425 07:23:46.470827 139758723237696 pyconfig.py:471] Config param sliding_window_size: 0
I0425 07:23:46.470842 139758723237696 pyconfig.py:471] Config param solution_end_token: </answer>
I0425 07:23:46.470857 139758723237696 pyconfig.py:471] Config param solution_start_token: <answer>
I0425 07:23:46.470873 139758723237696 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0425 07:23:46.470887 139758723237696 pyconfig.py:471] Config param sparse_matmul: True
I0425 07:23:46.470904 139758723237696 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0425 07:23:46.470920 139758723237696 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0425 07:23:46.470936 139758723237696 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0425 07:23:46.470952 139758723237696 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0425 07:23:46.470968 139758723237696 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0425 07:23:46.470983 139758723237696 pyconfig.py:471] Config param steps: 200000
I0425 07:23:46.470999 139758723237696 pyconfig.py:471] Config param stop_strings: None
I0425 07:23:46.471015 139758723237696 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0425 07:23:46.471032 139758723237696 pyconfig.py:471] Config param student_params_to_update: None
I0425 07:23:46.471047 139758723237696 pyconfig.py:471] Config param subslice_shape: 
I0425 07:23:46.471063 139758723237696 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0425 07:23:46.471078 139758723237696 pyconfig.py:471] Config param system_prompt: 
I0425 07:23:46.471102 139758723237696 pyconfig.py:471] Config param target_eval_loss: 0.0
I0425 07:23:46.471119 139758723237696 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0425 07:23:46.471135 139758723237696 pyconfig.py:471] Config param temperature_tuning: False
I0425 07:23:46.471151 139758723237696 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0425 07:23:46.471167 139758723237696 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-23/tensorboard/
I0425 07:23:46.471182 139758723237696 pyconfig.py:471] Config param tensors_on_device: None
I0425 07:23:46.471198 139758723237696 pyconfig.py:471] Config param tensors_to_offload: None
I0425 07:23:46.471214 139758723237696 pyconfig.py:471] Config param test_batch_start_index: 0
I0425 07:23:46.471230 139758723237696 pyconfig.py:471] Config param tile_size_for_vit: 336
I0425 07:23:46.471246 139758723237696 pyconfig.py:471] Config param tokenize_eval_data: True
I0425 07:23:46.471262 139758723237696 pyconfig.py:471] Config param tokenize_train_data: True
I0425 07:23:46.471278 139758723237696 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0425 07:23:46.471294 139758723237696 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0425 07:23:46.471314 139758723237696 pyconfig.py:471] Config param topk_routing_group: -1
I0425 07:23:46.471330 139758723237696 pyconfig.py:471] Config param train_data_columns: ['text']
I0425 07:23:46.471347 139758723237696 pyconfig.py:471] Config param train_fraction: 1.0
I0425 07:23:46.471363 139758723237696 pyconfig.py:471] Config param train_image_column: image
I0425 07:23:46.471379 139758723237696 pyconfig.py:471] Config param train_micro_batch_size: -1
I0425 07:23:46.471395 139758723237696 pyconfig.py:471] Config param train_split: train
I0425 07:23:46.471411 139758723237696 pyconfig.py:471] Config param trainable_parameters_mask: []
I0425 07:23:46.471428 139758723237696 pyconfig.py:471] Config param trainable_position_size: 2048
I0425 07:23:46.471444 139758723237696 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0425 07:23:46.471461 139758723237696 pyconfig.py:471] Config param upload_all_profiler_results: False
I0425 07:23:46.471477 139758723237696 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0425 07:23:46.471493 139758723237696 pyconfig.py:471] Config param use_agentic_rollout: False
I0425 07:23:46.471508 139758723237696 pyconfig.py:471] Config param use_audio: False
I0425 07:23:46.471523 139758723237696 pyconfig.py:471] Config param use_audio_in_video: False
I0425 07:23:46.471539 139758723237696 pyconfig.py:471] Config param use_batch_split_schedule: False
I0425 07:23:46.471554 139758723237696 pyconfig.py:471] Config param use_chat_template: False
I0425 07:23:46.471572 139758723237696 pyconfig.py:471] Config param use_chunked_prefill: False
I0425 07:23:46.471588 139758723237696 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0425 07:23:46.471604 139758723237696 pyconfig.py:471] Config param use_dpo: False
I0425 07:23:46.471620 139758723237696 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0425 07:23:46.471634 139758723237696 pyconfig.py:471] Config param use_grpo: True
I0425 07:23:46.471650 139758723237696 pyconfig.py:471] Config param use_indexer: False
I0425 07:23:46.471666 139758723237696 pyconfig.py:471] Config param use_iota_embed: True
I0425 07:23:46.471682 139758723237696 pyconfig.py:471] Config param use_jax_splash: False
I0425 07:23:46.471698 139758723237696 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0425 07:23:46.471712 139758723237696 pyconfig.py:471] Config param use_mrope: False
I0425 07:23:46.471728 139758723237696 pyconfig.py:471] Config param use_multimodal: False
I0425 07:23:46.471744 139758723237696 pyconfig.py:471] Config param use_pathways: True
I0425 07:23:46.471760 139758723237696 pyconfig.py:471] Config param use_post_attn_norm: False
I0425 07:23:46.471776 139758723237696 pyconfig.py:471] Config param use_post_ffw_norm: False
I0425 07:23:46.471792 139758723237696 pyconfig.py:471] Config param use_qk_clip: False
I0425 07:23:46.471807 139758723237696 pyconfig.py:471] Config param use_qk_norm: False
I0425 07:23:46.471822 139758723237696 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0425 07:23:46.471838 139758723237696 pyconfig.py:471] Config param use_qwix_quantization: False
I0425 07:23:46.471854 139758723237696 pyconfig.py:471] Config param use_ragged_attention: False
I0425 07:23:46.471870 139758723237696 pyconfig.py:471] Config param use_random_routing: False
I0425 07:23:46.471886 139758723237696 pyconfig.py:471] Config param use_replicator_service: False
I0425 07:23:46.471902 139758723237696 pyconfig.py:471] Config param use_ring_of_experts: False
I0425 07:23:46.471917 139758723237696 pyconfig.py:471] Config param use_sft: False
I0425 07:23:46.471933 139758723237696 pyconfig.py:471] Config param use_splash_scheduler: False
I0425 07:23:46.471949 139758723237696 pyconfig.py:471] Config param use_tokamax_gmm: False
I0425 07:23:46.471965 139758723237696 pyconfig.py:471] Config param use_tokamax_splash: False
I0425 07:23:46.471981 139758723237696 pyconfig.py:471] Config param use_truncation: True
I0425 07:23:46.471996 139758723237696 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0425 07:23:46.472012 139758723237696 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0425 07:23:46.472028 139758723237696 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0425 07:23:46.472044 139758723237696 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0425 07:23:46.472060 139758723237696 pyconfig.py:471] Config param v_head_dim: 128
I0425 07:23:46.472076 139758723237696 pyconfig.py:471] Config param v_norm_with_scale: True
I0425 07:23:46.472100 139758723237696 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0425 07:23:46.472116 139758723237696 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0425 07:23:46.472132 139758723237696 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0425 07:23:46.472148 139758723237696 pyconfig.py:471] Config param video_path: 
I0425 07:23:46.472163 139758723237696 pyconfig.py:471] Config param video_placeholder: <|video|>
I0425 07:23:46.472179 139758723237696 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0425 07:23:46.472195 139758723237696 pyconfig.py:471] Config param vision_output_length: -1
I0425 07:23:46.472211 139758723237696 pyconfig.py:471] Config param vllm_additional_config: {}
I0425 07:23:46.472227 139758723237696 pyconfig.py:471] Config param vllm_hf_config_path: 
I0425 07:23:46.472244 139758723237696 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0425 07:23:46.472259 139758723237696 pyconfig.py:471] Config param vocab_size: 32000
I0425 07:23:46.472276 139758723237696 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0425 07:23:46.472292 139758723237696 pyconfig.py:471] Config param weight_dtype: float32
I0425 07:23:46.472319 139758723237696 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0425 07:23:46.472336 139758723237696 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0425 07:23:46.472352 139758723237696 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0425 07:23:46.472366 139758723237696 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0425 07:23:46.472381 139758723237696 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0425 07:23:46.472396 139758723237696 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0425 07:23:46.472411 139758723237696 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0425 07:23:46.472427 139758723237696 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0425 07:23:46.472441 139758723237696 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0425 07:23:46.472457 139758723237696 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0425 07:23:46.472472 139758723237696 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0425 07:23:46.472488 139758723237696 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0425 07:23:46.472502 139758723237696 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0425 07:23:46.472517 139758723237696 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0425 07:23:46.472533 139758723237696 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0425 07:23:46.472549 139758723237696 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0425 07:23:46.472565 139758723237696 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0425 07:23:46.472579 139758723237696 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0425 07:23:46.472595 139758723237696 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0425 07:23:46.472609 139758723237696 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0425 07:23:46.472626 139758723237696 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0425 07:23:46.472645 139758723237696 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0425 07:23:46.472660 139758723237696 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0425 07:23:46.472675 139758723237696 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0425 07:23:46.472691 139758723237696 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0425 07:23:46.472707 139758723237696 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0425 07:23:46.473015 139758723237696 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0425 07:23:46.473049 139758723237696 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0425 07:23:46.661007 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:23:46.766265 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:23:46.879936 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:23:46.986825 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:23:47.097620 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 07:23:47.201726 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0425 07:23:47.317066 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0425 07:23:47.427922 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0425 07:23:48.051024 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 07:23:48.174417 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 07:23:48.551342 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0425 07:23:48.660066 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 07:23:48.770079 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 07:23:48.884574 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0425 07:23:48.975935 139758723237696 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0425 07:23:48.982641 139758723237696 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1)
I0425 07:23:48.982784 139758723237696 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0425 07:23:48.982856 139758723237696 train_distill.py:586] Loading Student from ...
I0425 07:23:48.982884 139758723237696 train_distill.py:170] --- Student Configuration ---
I0425 07:23:48.982907 139758723237696 train_distill.py:171]   Model Name:      gpt3-52k
I0425 07:23:48.982928 139758723237696 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 07:23:48.982948 139758723237696 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 07:23:48.982967 139758723237696 train_distill.py:176]   Vocab Size:      32000
I0425 07:23:48.982985 139758723237696 train_distill.py:177]   Checkpoint:      
I0425 07:23:48.983004 139758723237696 train_distill.py:451] Initializing model: gpt3-52k...
I0425 07:23:50.773281 139758723237696 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0425 07:23:50.773391 139758723237696 train_distill.py:170] --- Teacher Configuration ---
I0425 07:23:50.773420 139758723237696 train_distill.py:171]   Model Name:      gpt3-52k
I0425 07:23:50.773444 139758723237696 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 07:23:50.773463 139758723237696 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 07:23:50.773483 139758723237696 train_distill.py:176]   Vocab Size:      32000
I0425 07:23:50.773501 139758723237696 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 07:23:50.773520 139758723237696 train_distill.py:451] Initializing model: gpt3-52k...
I0425 07:23:51.793349 139758723237696 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:23:51.793505 139758723237696 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f1b64d503e0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:23:51.793566 139758723237696 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0425 07:23:52.298979 139758723237696 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0425 07:23:52.856783    1971 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0425 07:23:53.872400 139758723237696 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0425 07:23:55.830293 139758723237696 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0425 07:23:55.830654 139758723237696 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0425 07:23:58.666985 139758723237696 checkpointer.py:318] Finished restoring checkpoint in 5.18 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0425 07:23:59.401551 139758723237696 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0425 07:23:59.466891 139758723237696 config.py:112] TensorFlow version 2.20.0 available.
I0425 07:23:59.467426 139758723237696 config.py:125] JAX version 0.9.2 available.
I0425 07:23:59.905328 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0425 07:23:59.912576 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 07:23:59.920604 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 07:24:00.022638 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 07:24:00.338578 139758723237696 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 07:24:00.448519 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0425 07:24:00.574214 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0425 07:24:00.737619 139758723237696 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0425 07:24:00.847800 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0425 07:24:00.977261 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0425 07:24:01.115448 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0425 07:24:01.286349 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:24:01.394585 139758723237696 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:24:01.504628 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 07:24:01.608055 139758723237696 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0425 07:24:01.699295 139758723237696 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0425 07:24:01.699503 139758723237696 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0425 07:24:01.702436 139758723237696 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0425 07:24:01.702493 139758723237696 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0425 07:24:01.702555 139758723237696 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:24:01.702638 139758723237696 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f1b64d503e0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:24:01.702677 139758723237696 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:24:01.702709 139758723237696 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f1b64d503e0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:24:01.702751 139758723237696 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5b1d0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f1b6ad0ae70>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5b110>}, handler_registry=None
I0425 07:24:01.702946 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5b1d0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:24:01.702988 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f1b6ad0ae70>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:24:01.703014 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5b110>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:24:01.703038 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f158eceb590>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:24:01.703066 139758723237696 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5b1d0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5b1d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f1b6ad0ae70>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f1b6ad0ae70>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5b110>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5b110>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f158eceb590>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f158eceb590>}).
I0425 07:24:01.703453 139758723237696 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f158ee82ac0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 07:24:03.296893 139758723237696 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_linen_xpk_main_20260425_071506_07_distill_smoke/checkpoints
I0425 07:24:03.313332 139758723237696 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_linen_xpk_main_20260425_071506_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f0389d5b0e0>
I0425 07:24:03.313450 139758723237696 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:24:03.313518 139758723237696 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f1b64d503e0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:24:03.313554 139758723237696 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:24:03.313592 139758723237696 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f1b64d503e0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:24:03.313628 139758723237696 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0425 07:24:03.313681 139758723237696 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139758723237696 count=1 at 0x7f158ec85800>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f0389d5aed0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f0389d5aea0>, _write_futures=[])
I0425 07:24:03.314040 139758723237696 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139758723237696 count=1 at 0x7f158ec85800>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f0389d5aed0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f0389d5aea0>, _write_futures=[])
I0425 07:24:03.314066 139758723237696 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139758723237696 count=1 at 0x7f158ec85800>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f0389d5aed0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f0389d5aea0>, _write_futures=[])
I0425 07:24:03.314109 139758723237696 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f026c2c20f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5a900>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389de1fd0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f0389de2d80>}, handler_registry=None
I0425 07:24:03.314210 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f026c2c20f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:24:03.314243 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5a900>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:24:03.314266 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389de1fd0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:24:03.314294 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f0389de2d80>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0425 07:24:03.314316 139758723237696 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5a300>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:24:03.314340 139758723237696 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f026c2c20f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f026c2c20f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5a900>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f0389d5a900>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389de1fd0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389de1fd0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f0389de2d80>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f0389de2d80>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5a300>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): 
<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f0389d5a300>}).
I0425 07:24:03.314407 139758723237696 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f158ee82c00> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 07:24:03.689455 139758723237696 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_linen_xpk_main_20260425_071506_07_distill_smoke/checkpoints
I0425 07:24:03.701480 139758723237696 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_linen_xpk_main_20260425_071506_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f158ecc80e0>
I0425 07:24:03.702037 139758723237696 train_distill.py:677] Starting Distillation Training...
I0425 07:24:03.702166 139758723237696 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0425 07:24:04.185139 139758723237696 peft_trainer.py:594] Compiled train_step cache size: 0
I0425 07:24:04.186681 139602761324288 grain_pool.py:367] Grain pool will use 1 processes.
I0425 07:24:04.243024 139602761324288 grain_pool.py:440] Grain pool will start child processes.
I0425 07:24:04.248849 139602761324288 grain_pool.py:448] Grain pool started all child processes.
2026-04-25 07:24:10.755090: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
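The four RoPE warnings above report integer values where the config validator expects floats (`factor`, `beta_fast`, `beta_slow`). A minimal sketch of the coercion those messages ask for — the dict shape and field names here are assumptions read off the warning text, not the actual DeepseekV32Config code:

```python
# Hypothetical rope_parameters dict, holding the integer values the warnings report.
rope_parameters = {"factor": 40, "beta_fast": 32, "beta_slow": 1}

# Coerce every field to float so checks like "factor must be a float >= 1" pass.
rope_parameters = {k: float(v) for k, v in rope_parameters.items()}

assert all(isinstance(v, float) for v in rope_parameters.values())
assert rope_parameters["factor"] >= 1.0
```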
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), 
core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0)}
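The ValueError above fires because `device_put` received a single `Device` as its target while the input array carried a sharding spanning devices on all eight processes, so no one process can fully address it. For contrast, a minimal single-device call that satisfies the contract (runs on the default backend, e.g. CPU; this illustrates the API constraint, it is not a fix for the trainer):

```python
import jax
import jax.numpy as jnp

# A process-local array is fully addressable, so placing it on one concrete
# device is permitted by device_put's contract.
x = jnp.arange(8)
y = jax.device_put(x, jax.devices()[0])

assert bool((y == x).all())
```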
I0425 07:24:14.944451 139602761324288 grain_pool.py:542] Grain pool is exiting.
I0425 07:24:14.944553 139602761324288 grain_pool.py:547] Shutting down multiprocessing system.
I0425 07:24:16.658219 139602761324288 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 07:24:24 UTC 2026
EXIT_CODE=1
NNX  ·  59e0f1759  ·  main_20260425_071506  ·  full log
XPK Start: Sat Apr 25 07:33:28 UTC 2026
2026-04-25 07:33:47.013561: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0425 07:33:53.301823 138943686608704 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 07:34:02.342883 138943686608704 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 07:34:02.345161 138943686608704 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-w1fuf-slice-job-0-0.mt-07-distill-smoke-w1fuf:8482
I0425 07:34:03.432073 138943686608704 max_utils.py:284] Jax distributed system initialized!
I0425 07:34:08.647843 138943686608704 max_utils.py:244] Jax distributed system is already initialized.
W0425 07:34:08.775028 138943686608704 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 07:34:08.832459 138943686608704 max_utils.py:244] Jax distributed system is already initialized.
I0425 07:34:08.833603 138943686608704 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 07:34:08.833650 138943686608704 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 07:34:08.833673 138943686608704 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 07:34:08.833694 138943686608704 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 07:34:08.833715 138943686608704 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 07:34:08.833736 138943686608704 pyconfig.py:471] Config param activations_in_float32: False
I0425 07:34:08.833755 138943686608704 pyconfig.py:471] Config param adam_b1: 0.9
I0425 07:34:08.833775 138943686608704 pyconfig.py:471] Config param adam_b2: 0.95
I0425 07:34:08.833792 138943686608704 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 07:34:08.833815 138943686608704 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 07:34:08.833830 138943686608704 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 07:34:08.833848 138943686608704 pyconfig.py:471] Config param adamw_mask: []
I0425 07:34:08.833865 138943686608704 pyconfig.py:471] Config param add_bos: True
I0425 07:34:08.833881 138943686608704 pyconfig.py:471] Config param add_eos: True
I0425 07:34:08.833898 138943686608704 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 07:34:08.833914 138943686608704 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 07:34:08.833932 138943686608704 pyconfig.py:471] Config param async_checkpointing: True
I0425 07:34:08.833949 138943686608704 pyconfig.py:471] Config param async_scheduling: False
I0425 07:34:08.833964 138943686608704 pyconfig.py:471] Config param attention: dot_product
I0425 07:34:08.833981 138943686608704 pyconfig.py:471] Config param attention_bias: False
I0425 07:34:08.833998 138943686608704 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 07:34:08.834013 138943686608704 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 07:34:08.834034 138943686608704 pyconfig.py:471] Config param attention_output_dim: -1
I0425 07:34:08.834050 138943686608704 pyconfig.py:471] Config param attention_sink: False
I0425 07:34:08.834066 138943686608704 pyconfig.py:471] Config param attention_type: global
I0425 07:34:08.834081 138943686608704 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 07:34:08.834115 138943686608704 pyconfig.py:471] Config param audio_path: 
I0425 07:34:08.834132 138943686608704 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 07:34:08.834147 138943686608704 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0425 07:34:08.834163 138943686608704 pyconfig.py:471] Config param base_config: base.yml
I0425 07:34:08.834193 138943686608704 pyconfig.py:471] Config param base_emb_dim: 16
I0425 07:34:08.834210 138943686608704 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 07:34:08.834224 138943686608704 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 07:34:08.834240 138943686608704 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 07:34:08.834257 138943686608704 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 07:34:08.834277 138943686608704 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 07:34:08.834293 138943686608704 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 07:34:08.834308 138943686608704 pyconfig.py:471] Config param batch_size: 1
I0425 07:34:08.834323 138943686608704 pyconfig.py:471] Config param batch_split_factor: 1
I0425 07:34:08.834339 138943686608704 pyconfig.py:471] Config param beta_fast: 32
I0425 07:34:08.834354 138943686608704 pyconfig.py:471] Config param beta_slow: 1
I0425 07:34:08.834371 138943686608704 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 07:34:08.834387 138943686608704 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 07:34:08.834404 138943686608704 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 07:34:08.834419 138943686608704 pyconfig.py:471] Config param chat_template: 
I0425 07:34:08.834435 138943686608704 pyconfig.py:471] Config param chat_template_path: 
I0425 07:34:08.834453 138943686608704 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 07:34:08.834470 138943686608704 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/checkpoints/
I0425 07:34:08.834486 138943686608704 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 07:34:08.834502 138943686608704 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 07:34:08.834519 138943686608704 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 07:34:08.834536 138943686608704 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 07:34:08.834550 138943686608704 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 07:34:08.834566 138943686608704 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 07:34:08.834582 138943686608704 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 07:34:08.834597 138943686608704 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 07:34:08.834613 138943686608704 pyconfig.py:471] Config param chips_per_vm: 4
I0425 07:34:08.834629 138943686608704 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 07:34:08.834645 138943686608704 pyconfig.py:471] Config param collect_stack_trace: False
I0425 07:34:08.834661 138943686608704 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 07:34:08.834676 138943686608704 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 07:34:08.834691 138943686608704 pyconfig.py:471] Config param compile_topology: 
I0425 07:34:08.834706 138943686608704 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 07:34:08.834722 138943686608704 pyconfig.py:471] Config param compile_xla_flags: 
I0425 07:34:08.834738 138943686608704 pyconfig.py:471] Config param compiled_trainstep_file: 
I0425 07:34:08.834752 138943686608704 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 07:34:08.834768 138943686608704 pyconfig.py:471] Config param constant_bound_config: []
I0425 07:34:08.834783 138943686608704 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 07:34:08.834799 138943686608704 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 07:34:08.834814 138943686608704 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 07:34:08.834832 138943686608704 pyconfig.py:471] Config param context_parallel_size: 1
I0425 07:34:08.834848 138943686608704 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 07:34:08.834862 138943686608704 pyconfig.py:471] Config param context_sharding: context
I0425 07:34:08.834878 138943686608704 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 07:34:08.834892 138943686608704 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 07:34:08.834909 138943686608704 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 07:34:08.834923 138943686608704 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 07:34:08.834939 138943686608704 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 07:34:08.834955 138943686608704 pyconfig.py:471] Config param custom_mesh: 
I0425 07:34:08.834969 138943686608704 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0425 07:34:08.834985 138943686608704 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 07:34:08.835000 138943686608704 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 07:34:08.835020 138943686608704 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 07:34:08.835036 138943686608704 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 07:34:08.835052 138943686608704 pyconfig.py:471] Config param dataset_path: 
I0425 07:34:08.835066 138943686608704 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 07:34:08.835083 138943686608704 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 07:34:08.835110 138943686608704 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 07:34:08.835127 138943686608704 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 07:34:08.835142 138943686608704 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 07:34:08.835158 138943686608704 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 07:34:08.835172 138943686608704 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 07:34:08.835188 138943686608704 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 07:34:08.835203 138943686608704 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 07:34:08.835219 138943686608704 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 07:34:08.835237 138943686608704 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 07:34:08.835253 138943686608704 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 07:34:08.835272 138943686608704 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 07:34:08.835288 138943686608704 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 07:34:08.835304 138943686608704 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 07:34:08.835321 138943686608704 pyconfig.py:471] Config param debug: {'rl': False}
I0425 07:34:08.835339 138943686608704 pyconfig.py:471] Config param debug_sharding: False
I0425 07:34:08.835353 138943686608704 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 07:34:08.835370 138943686608704 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 07:34:08.835388 138943686608704 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 07:34:08.835404 138943686608704 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 07:34:08.835420 138943686608704 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 07:34:08.835436 138943686608704 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 07:34:08.835451 138943686608704 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 07:34:08.835467 138943686608704 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 07:34:08.835483 138943686608704 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 07:34:08.835500 138943686608704 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 07:34:08.835516 138943686608704 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 07:34:08.835531 138943686608704 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 07:34:08.835546 138943686608704 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 07:34:08.835562 138943686608704 pyconfig.py:471] Config param distill_alpha_end: None
I0425 07:34:08.835578 138943686608704 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 07:34:08.835594 138943686608704 pyconfig.py:471] Config param distill_beta: 0.0
I0425 07:34:08.835610 138943686608704 pyconfig.py:471] Config param distill_beta_end: None
I0425 07:34:08.835626 138943686608704 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 07:34:08.835642 138943686608704 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 07:34:08.835656 138943686608704 pyconfig.py:471] Config param distill_layer_indices: None
I0425 07:34:08.835671 138943686608704 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 07:34:08.835687 138943686608704 pyconfig.py:471] Config param distill_temperature_end: None
I0425 07:34:08.835703 138943686608704 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 07:34:08.835719 138943686608704 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 07:34:08.835733 138943686608704 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 07:34:08.835749 138943686608704 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 07:34:08.835764 138943686608704 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 07:34:08.835780 138943686608704 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 07:34:08.835795 138943686608704 pyconfig.py:471] Config param dtype: bfloat16
I0425 07:34:08.835827 138943686608704 pyconfig.py:471] Config param dtype_mm: float32
I0425 07:34:08.835844 138943686608704 pyconfig.py:471] Config param dump_hlo: False
I0425 07:34:08.835859 138943686608704 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 07:34:08.835875 138943686608704 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/xla_dump
I0425 07:34:08.835890 138943686608704 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 07:34:08.835905 138943686608704 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 07:34:08.835921 138943686608704 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 07:34:08.835935 138943686608704 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 07:34:08.835951 138943686608704 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0425 07:34:08.835967 138943686608704 pyconfig.py:471] Config param dump_jaxpr: False
I0425 07:34:08.835982 138943686608704 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 07:34:08.835997 138943686608704 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/jaxpr_dump
I0425 07:34:08.836013 138943686608704 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 07:34:08.836029 138943686608704 pyconfig.py:471] Config param dump_step: -1
I0425 07:34:08.836043 138943686608704 pyconfig.py:471] Config param elastic_enabled: False
I0425 07:34:08.836059 138943686608704 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 07:34:08.836075 138943686608704 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 07:34:08.836091 138943686608704 pyconfig.py:471] Config param emb_dim: 16
I0425 07:34:08.836114 138943686608704 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 07:34:08.836130 138943686608704 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 07:34:08.836146 138943686608704 pyconfig.py:471] Config param enable_checkpointing: True
I0425 07:34:08.836162 138943686608704 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 07:34:08.836176 138943686608704 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 07:34:08.836192 138943686608704 pyconfig.py:471] Config param enable_diloco: False
I0425 07:34:08.836207 138943686608704 pyconfig.py:471] Config param enable_dp_attention: False
I0425 07:34:08.836223 138943686608704 pyconfig.py:471] Config param enable_dropout: False
I0425 07:34:08.836239 138943686608704 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 07:34:08.836255 138943686608704 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 07:34:08.836273 138943686608704 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 07:34:08.836289 138943686608704 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 07:34:08.836305 138943686608704 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 07:34:08.836321 138943686608704 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 07:34:08.836336 138943686608704 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 07:34:08.836352 138943686608704 pyconfig.py:471] Config param enable_model_warmup: False
I0425 07:34:08.836367 138943686608704 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 07:34:08.836383 138943686608704 pyconfig.py:471] Config param enable_nnx: False
I0425 07:34:08.836399 138943686608704 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 07:34:08.836414 138943686608704 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 07:34:08.836430 138943686608704 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 07:34:08.836445 138943686608704 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 07:34:08.836460 138943686608704 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 07:34:08.836477 138943686608704 pyconfig.py:471] Config param enable_single_controller: False
I0425 07:34:08.836492 138943686608704 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 07:34:08.836508 138943686608704 pyconfig.py:471] Config param enable_tensorboard: True
I0425 07:34:08.836523 138943686608704 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 07:34:08.836538 138943686608704 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 07:34:08.836553 138943686608704 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 07:34:08.836569 138943686608704 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 07:34:08.836583 138943686608704 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 07:34:08.836600 138943686608704 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 07:34:08.836614 138943686608704 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 07:34:08.836630 138943686608704 pyconfig.py:471] Config param engram_layers: []
I0425 07:34:08.836645 138943686608704 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 07:34:08.836661 138943686608704 pyconfig.py:471] Config param engram_num_heads: 8
I0425 07:34:08.836676 138943686608704 pyconfig.py:471] Config param engram_seed: 0
I0425 07:34:08.836690 138943686608704 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 07:34:08.836706 138943686608704 pyconfig.py:471] Config param epsilon_high: None
I0425 07:34:08.836720 138943686608704 pyconfig.py:471] Config param eval_corr_lst: False
I0425 07:34:08.836736 138943686608704 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 07:34:08.836751 138943686608704 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 07:34:08.836767 138943686608704 pyconfig.py:471] Config param eval_image_column: image
I0425 07:34:08.836781 138943686608704 pyconfig.py:471] Config param eval_interval: -1
I0425 07:34:08.836797 138943686608704 pyconfig.py:471] Config param eval_make_lst: False
I0425 07:34:08.836812 138943686608704 pyconfig.py:471] Config param eval_mode: pass
I0425 07:34:08.836827 138943686608704 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 07:34:08.836843 138943686608704 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 07:34:08.836859 138943686608704 pyconfig.py:471] Config param eval_split: validation
I0425 07:34:08.836874 138943686608704 pyconfig.py:471] Config param eval_steps: -1
I0425 07:34:08.836889 138943686608704 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 07:34:08.836906 138943686608704 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 07:34:08.836921 138943686608704 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 07:34:08.836937 138943686608704 pyconfig.py:471] Config param float32_gate_logits: False
I0425 07:34:08.836953 138943686608704 pyconfig.py:471] Config param float32_logits: False
I0425 07:34:08.836967 138943686608704 pyconfig.py:471] Config param float32_qk_product: False
I0425 07:34:08.836983 138943686608704 pyconfig.py:471] Config param float32_weight_sum: True
I0425 07:34:08.836997 138943686608704 pyconfig.py:471] Config param force_q_layout: False
I0425 07:34:08.837013 138943686608704 pyconfig.py:471] Config param force_unroll: False
I0425 07:34:08.837027 138943686608704 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0425 07:34:08.837044 138943686608704 pyconfig.py:471] Config param formatting_func_path: 
I0425 07:34:08.837058 138943686608704 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 07:34:08.837072 138943686608704 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 07:34:08.837088 138943686608704 pyconfig.py:471] Config param fused_mlp: False
I0425 07:34:08.837111 138943686608704 pyconfig.py:471] Config param fused_qkv: True
I0425 07:34:08.837127 138943686608704 pyconfig.py:471] Config param gcs_metrics: False
I0425 07:34:08.837143 138943686608704 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 07:34:08.837157 138943686608704 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 07:34:08.837173 138943686608704 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 07:34:08.837187 138943686608704 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 07:34:08.837203 138943686608704 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 07:34:08.837218 138943686608704 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 07:34:08.837233 138943686608704 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 07:34:08.837249 138943686608704 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 07:34:08.837265 138943686608704 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 07:34:08.837284 138943686608704 pyconfig.py:471] Config param generation_configs: {}
I0425 07:34:08.837299 138943686608704 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 07:34:08.837315 138943686608704 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 07:34:08.837332 138943686608704 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 07:34:08.837346 138943686608704 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 07:34:08.837363 138943686608704 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 07:34:08.837380 138943686608704 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 07:34:08.837396 138943686608704 pyconfig.py:471] Config param global_head_dim: 0
I0425 07:34:08.837412 138943686608704 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 07:34:08.837427 138943686608704 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 07:34:08.837443 138943686608704 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 07:34:08.837458 138943686608704 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 07:34:08.837473 138943686608704 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 07:34:08.837490 138943686608704 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 07:34:08.837505 138943686608704 pyconfig.py:471] Config param grad_dtype: float32
I0425 07:34:08.837540 138943686608704 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 07:34:08.837557 138943686608704 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 07:34:08.837574 138943686608704 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 07:34:08.837590 138943686608704 pyconfig.py:471] Config param grain_eval_files: 
I0425 07:34:08.837604 138943686608704 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 07:34:08.837620 138943686608704 pyconfig.py:471] Config param grain_num_threads: 16
I0425 07:34:08.837635 138943686608704 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 07:34:08.837651 138943686608704 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 07:34:08.837666 138943686608704 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 07:34:08.837682 138943686608704 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 07:34:08.837696 138943686608704 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 07:34:08.837712 138943686608704 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 07:34:08.837728 138943686608704 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 07:34:08.837744 138943686608704 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 07:34:08.837760 138943686608704 pyconfig.py:471] Config param grain_train_files: 
I0425 07:34:08.837775 138943686608704 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0425 07:34:08.837791 138943686608704 pyconfig.py:471] Config param grain_worker_count: 1
I0425 07:34:08.837806 138943686608704 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 07:34:08.837822 138943686608704 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 07:34:08.837837 138943686608704 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 07:34:08.837853 138943686608704 pyconfig.py:471] Config param hardware: tpu
I0425 07:34:08.837869 138943686608704 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 07:34:08.837885 138943686608704 pyconfig.py:471] Config param head_dim: 8
I0425 07:34:08.837899 138943686608704 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 07:34:08.837915 138943686608704 pyconfig.py:471] Config param hf_data_dir: None
I0425 07:34:08.837931 138943686608704 pyconfig.py:471] Config param hf_eval_files: None
I0425 07:34:08.837947 138943686608704 pyconfig.py:471] Config param hf_eval_split: None
I0425 07:34:08.837961 138943686608704 pyconfig.py:471] Config param hf_name: None
I0425 07:34:08.837976 138943686608704 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 07:34:08.837992 138943686608704 pyconfig.py:471] Config param hf_train_files: None
I0425 07:34:08.838008 138943686608704 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 07:34:08.838022 138943686608704 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 07:34:08.838038 138943686608704 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 07:34:08.838053 138943686608704 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 07:34:08.838068 138943686608704 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 07:34:08.838082 138943686608704 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 07:34:08.838104 138943686608704 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 07:34:08.838121 138943686608704 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 07:34:08.838137 138943686608704 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 07:34:08.838151 138943686608704 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 07:34:08.838166 138943686608704 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 07:34:08.838183 138943686608704 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 07:34:08.838198 138943686608704 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 07:34:08.838214 138943686608704 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 07:34:08.838230 138943686608704 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 07:34:08.838246 138943686608704 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 07:34:08.838260 138943686608704 pyconfig.py:471] Config param image_path: 
I0425 07:34:08.838279 138943686608704 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 07:34:08.838295 138943686608704 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 07:34:08.838311 138943686608704 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 07:34:08.838328 138943686608704 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 07:34:08.838344 138943686608704 pyconfig.py:471] Config param indexer_n_heads: 64
I0425 07:34:08.838361 138943686608704 pyconfig.py:471] Config param indexer_sparse_training: False
I0425 07:34:08.838377 138943686608704 pyconfig.py:471] Config param indexer_topk: 2048
I0425 07:34:08.838391 138943686608704 pyconfig.py:471] Config param inference_benchmark_test: False
I0425 07:34:08.838406 138943686608704 pyconfig.py:471] Config param inference_metadata_file: 
I0425 07:34:08.838422 138943686608704 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0425 07:34:08.838436 138943686608704 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0425 07:34:08.838453 138943686608704 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0425 07:34:08.838467 138943686608704 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0425 07:34:08.838483 138943686608704 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0425 07:34:08.838497 138943686608704 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0425 07:34:08.838513 138943686608704 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0425 07:34:08.838527 138943686608704 pyconfig.py:471] Config param init_weights_seed: 0
I0425 07:34:08.838541 138943686608704 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0425 07:34:08.838557 138943686608704 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0425 07:34:08.838572 138943686608704 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0425 07:34:08.838586 138943686608704 pyconfig.py:471] Config param internal_compile: False
I0425 07:34:08.838602 138943686608704 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0425 07:34:08.838616 138943686608704 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0425 07:34:08.838632 138943686608704 pyconfig.py:471] Config param jax_debug_log_modules: 
I0425 07:34:08.838648 138943686608704 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0425 07:34:08.838663 138943686608704 pyconfig.py:471] Config param jax_profiler_port: 9999
I0425 07:34:08.838679 138943686608704 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0425 07:34:08.838696 138943686608704 pyconfig.py:471] Config param kv_cache_buffer: 256
I0425 07:34:08.838710 138943686608704 pyconfig.py:471] Config param kv_lora_rank: 512
I0425 07:34:08.838726 138943686608704 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0425 07:34:08.838743 138943686608704 pyconfig.py:471] Config param kv_quant_dtype: int8
I0425 07:34:08.838758 138943686608704 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0425 07:34:08.838774 138943686608704 pyconfig.py:471] Config param learning_rate: 0.0002
I0425 07:34:08.838789 138943686608704 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0425 07:34:08.838805 138943686608704 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0425 07:34:08.838819 138943686608704 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0425 07:34:08.838835 138943686608704 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0425 07:34:08.838850 138943686608704 pyconfig.py:471] Config param load_from_prefill_dir: False
I0425 07:34:08.838866 138943686608704 pyconfig.py:471] Config param load_full_state_path: 
I0425 07:34:08.838880 138943686608704 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 07:34:08.838896 138943686608704 pyconfig.py:471] Config param local_checkpoint_directory: 
I0425 07:34:08.838910 138943686608704 pyconfig.py:471] Config param local_checkpoint_period: 0
I0425 07:34:08.838926 138943686608704 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0425 07:34:08.838943 138943686608704 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0425 07:34:08.838957 138943686608704 pyconfig.py:471] Config param log_config: True
I0425 07:34:08.838972 138943686608704 pyconfig.py:471] Config param log_period: 10
I0425 07:34:08.838988 138943686608704 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('q_lora', ('fsdp', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('kv_lora', ('fsdp', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'context')), ('embed_moe', ('fsdp', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'context', 'expert')), ('embed', ('fsdp', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('context',)), ('prefill_activation_norm_length', ('tensor_sequence', 'context')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ()), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0425 07:34:08.839057 138943686608704 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0425 07:34:08.839078 138943686608704 pyconfig.py:471] Config param logits_via_embedding: True
I0425 07:34:08.839104 138943686608704 pyconfig.py:471] Config param lora_input_adapters_path: 
I0425 07:34:08.839121 138943686608704 pyconfig.py:471] Config param loss_algo: grpo
I0425 07:34:08.839137 138943686608704 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0425 07:34:08.839155 138943686608704 pyconfig.py:471] Config param managed_mldiagnostics: False
I0425 07:34:08.839169 138943686608704 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/managed-mldiagnostics
I0425 07:34:08.839185 138943686608704 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0425 07:34:08.839200 138943686608704 pyconfig.py:471] Config param math_verify_num_procs: None
I0425 07:34:08.839216 138943686608704 pyconfig.py:471] Config param math_verify_timeout: 300
I0425 07:34:08.839231 138943686608704 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0425 07:34:08.839248 138943686608704 pyconfig.py:471] Config param max_checkify: False
I0425 07:34:08.839264 138943686608704 pyconfig.py:471] Config param max_concurrency: 256
I0425 07:34:08.839284 138943686608704 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0425 07:34:08.839300 138943686608704 pyconfig.py:471] Config param max_num_batched_tokens: None
I0425 07:34:08.839315 138943686608704 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0425 07:34:08.839331 138943686608704 pyconfig.py:471] Config param max_num_images_per_example: -1
I0425 07:34:08.839347 138943686608704 pyconfig.py:471] Config param max_num_seqs: None
I0425 07:34:08.839362 138943686608704 pyconfig.py:471] Config param max_position_embeddings: 163840
I0425 07:34:08.839378 138943686608704 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0425 07:34:08.839394 138943686608704 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0425 07:34:08.839410 138943686608704 pyconfig.py:471] Config param max_segments_per_seq: -1
I0425 07:34:08.839425 138943686608704 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0425 07:34:08.839441 138943686608704 pyconfig.py:471] Config param max_target_length: 2048
I0425 07:34:08.839457 138943686608704 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0425 07:34:08.839473 138943686608704 pyconfig.py:471] Config param megablox: True
I0425 07:34:08.839489 138943686608704 pyconfig.py:471] Config param merge_gating_gmm: False
I0425 07:34:08.839505 138943686608704 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0425 07:34:08.839522 138943686608704 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/metrics/
I0425 07:34:08.839540 138943686608704 pyconfig.py:471] Config param metrics_file: 
I0425 07:34:08.839555 138943686608704 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0425 07:34:08.839571 138943686608704 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0425 07:34:08.839587 138943686608704 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0425 07:34:08.839603 138943686608704 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0425 07:34:08.839618 138943686608704 pyconfig.py:471] Config param mla_naive_kvcache: True
I0425 07:34:08.839634 138943686608704 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0425 07:34:08.839650 138943686608704 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0425 07:34:08.839666 138943686608704 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0425 07:34:08.839682 138943686608704 pyconfig.py:471] Config param mlp_bias: False
I0425 07:34:08.839698 138943686608704 pyconfig.py:471] Config param mlp_dim: 64
I0425 07:34:08.839714 138943686608704 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0425 07:34:08.839728 138943686608704 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0425 07:34:08.839744 138943686608704 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0425 07:34:08.839760 138943686608704 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0425 07:34:08.839776 138943686608704 pyconfig.py:471] Config param moba: False
I0425 07:34:08.839791 138943686608704 pyconfig.py:471] Config param moba_chunk_size: 1024
I0425 07:34:08.839805 138943686608704 pyconfig.py:471] Config param moba_topk: 8
I0425 07:34:08.839821 138943686608704 pyconfig.py:471] Config param model_call_mode: 
I0425 07:34:08.839837 138943686608704 pyconfig.py:471] Config param model_name: gpt3-52k
I0425 07:34:08.839853 138943686608704 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0425 07:34:08.839869 138943686608704 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0425 07:34:08.839884 138943686608704 pyconfig.py:471] Config param moe_mlp_dim: -1
I0425 07:34:08.839898 138943686608704 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0425 07:34:08.839914 138943686608704 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0425 07:34:08.839929 138943686608704 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0425 07:34:08.839948 138943686608704 pyconfig.py:471] Config param monitor_goodput: False
I0425 07:34:08.839962 138943686608704 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0425 07:34:08.839977 138943686608704 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0425 07:34:08.839995 138943686608704 pyconfig.py:471] Config param mscale: 1.0
I0425 07:34:08.840009 138943686608704 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0425 07:34:08.840025 138943686608704 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0425 07:34:08.840039 138943686608704 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0425 07:34:08.840056 138943686608704 pyconfig.py:471] Config param mtp_num_layers: 0
I0425 07:34:08.840071 138943686608704 pyconfig.py:471] Config param mu_dtype: float32
I0425 07:34:08.840106 138943686608704 pyconfig.py:471] Config param multi_sampling: False
I0425 07:34:08.840123 138943686608704 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0425 07:34:08.840138 138943686608704 pyconfig.py:471] Config param muon_beta: 0.95
I0425 07:34:08.840155 138943686608704 pyconfig.py:471] Config param muon_consistent_rms: None
I0425 07:34:08.840169 138943686608704 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0425 07:34:08.840185 138943686608704 pyconfig.py:471] Config param n_routing_groups: -1
I0425 07:34:08.840201 138943686608704 pyconfig.py:471] Config param n_window_for_audio: 50
I0425 07:34:08.840215 138943686608704 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0425 07:34:08.840231 138943686608704 pyconfig.py:471] Config param nope_layer_interval: -1
I0425 07:34:08.840252 138943686608704 pyconfig.py:471] Config param norm_topk_prob: False
I0425 07:34:08.840274 138943686608704 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0425 07:34:08.840290 138943686608704 pyconfig.py:471] Config param normalize_embedding_logits: False
I0425 07:34:08.840306 138943686608704 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0425 07:34:08.840322 138943686608704 pyconfig.py:471] Config param num_batches: 4
I0425 07:34:08.840336 138943686608704 pyconfig.py:471] Config param num_channels_for_vit: 3
I0425 07:34:08.840352 138943686608704 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0425 07:34:08.840366 138943686608704 pyconfig.py:471] Config param num_decoder_layers: 1
I0425 07:34:08.840382 138943686608704 pyconfig.py:471] Config param num_diloco_replicas: 1
I0425 07:34:08.840396 138943686608704 pyconfig.py:471] Config param num_epoch: 1
I0425 07:34:08.840412 138943686608704 pyconfig.py:471] Config param num_eval_passes: 1
I0425 07:34:08.840428 138943686608704 pyconfig.py:471] Config param num_experts: 1
I0425 07:34:08.840442 138943686608704 pyconfig.py:471] Config param num_experts_per_tok: 1
I0425 07:34:08.840458 138943686608704 pyconfig.py:471] Config param num_generations: 2
I0425 07:34:08.840472 138943686608704 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0425 07:34:08.840488 138943686608704 pyconfig.py:471] Config param num_iterations: 1
I0425 07:34:08.840503 138943686608704 pyconfig.py:471] Config param num_kv_heads: 2
I0425 07:34:08.840519 138943686608704 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0425 07:34:08.840533 138943686608704 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0425 07:34:08.840548 138943686608704 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0425 07:34:08.840563 138943686608704 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0425 07:34:08.840578 138943686608704 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0425 07:34:08.840593 138943686608704 pyconfig.py:471] Config param num_query_heads: 2
I0425 07:34:08.840609 138943686608704 pyconfig.py:471] Config param num_samplers_slices: -1
I0425 07:34:08.840623 138943686608704 pyconfig.py:471] Config param num_slices: 1
I0425 07:34:08.840639 138943686608704 pyconfig.py:471] Config param num_target_devices: 32
I0425 07:34:08.840653 138943686608704 pyconfig.py:471] Config param num_test_batches: 5
I0425 07:34:08.840669 138943686608704 pyconfig.py:471] Config param num_trainer_slices: -1
I0425 07:34:08.840683 138943686608704 pyconfig.py:471] Config param num_vocab_tiling: 1
I0425 07:34:08.840699 138943686608704 pyconfig.py:471] Config param off_policy_steps: 0
I0425 07:34:08.840713 138943686608704 pyconfig.py:471] Config param offline_data_dir: None
I0425 07:34:08.840730 138943686608704 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0425 07:34:08.840747 138943686608704 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0425 07:34:08.840762 138943686608704 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0425 07:34:08.840779 138943686608704 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0425 07:34:08.840794 138943686608704 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0425 07:34:08.840808 138943686608704 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0425 07:34:08.840825 138943686608704 pyconfig.py:471] Config param output_dim_for_audio: 512
I0425 07:34:08.840839 138943686608704 pyconfig.py:471] Config param override_logical_axis_rules: False
I0425 07:34:08.840855 138943686608704 pyconfig.py:471] Config param override_model_config: True
I0425 07:34:08.840870 138943686608704 pyconfig.py:471] Config param packing: True
I0425 07:34:08.840884 138943686608704 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0425 07:34:08.840900 138943686608704 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0425 07:34:08.840915 138943686608704 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0425 07:34:08.840929 138943686608704 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0425 07:34:08.840945 138943686608704 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0425 07:34:08.840961 138943686608704 pyconfig.py:471] Config param param_scan_axis: 1
I0425 07:34:08.840975 138943686608704 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0425 07:34:08.840991 138943686608704 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0425 07:34:08.841006 138943686608704 pyconfig.py:471] Config param patch_size_for_vit: 14
I0425 07:34:08.841020 138943686608704 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0425 07:34:08.841035 138943686608704 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0425 07:34:08.841050 138943686608704 pyconfig.py:471] Config param per_device_batch_size: 2
I0425 07:34:08.841066 138943686608704 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0425 07:34:08.841083 138943686608704 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0425 07:34:08.841108 138943686608704 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0425 07:34:08.841124 138943686608704 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0425 07:34:08.841140 138943686608704 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0425 07:34:08.841155 138943686608704 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0425 07:34:08.841171 138943686608704 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0425 07:34:08.841186 138943686608704 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0425 07:34:08.841201 138943686608704 pyconfig.py:471] Config param position_id_per_seconds: 25
I0425 07:34:08.841217 138943686608704 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0425 07:34:08.841231 138943686608704 pyconfig.py:471] Config param prefill_cache_dir: 
I0425 07:34:08.841246 138943686608704 pyconfig.py:471] Config param prefill_chunk_size: 256
I0425 07:34:08.841261 138943686608704 pyconfig.py:471] Config param prefill_slice: v5e-16
I0425 07:34:08.841281 138943686608704 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0425 07:34:08.841296 138943686608704 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0425 07:34:08.841312 138943686608704 pyconfig.py:471] Config param prefuse_moe_weights: False
I0425 07:34:08.841327 138943686608704 pyconfig.py:471] Config param profile_cleanly: True
I0425 07:34:08.841343 138943686608704 pyconfig.py:471] Config param profile_periodically_period: -1
I0425 07:34:08.841357 138943686608704 pyconfig.py:471] Config param profile_power_events: False
I0425 07:34:08.841373 138943686608704 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0425 07:34:08.841390 138943686608704 pyconfig.py:471] Config param profiler_steps: 5
I0425 07:34:08.841406 138943686608704 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0425 07:34:08.841421 138943686608704 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0425 07:34:08.841437 138943686608704 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0425 07:34:08.841453 138943686608704 pyconfig.py:471] Config param prometheus_port: 0
I0425 07:34:08.841467 138943686608704 pyconfig.py:471] Config param prompt: I love to
I0425 07:34:08.841483 138943686608704 pyconfig.py:471] Config param pure_nnx: False
I0425 07:34:08.841499 138943686608704 pyconfig.py:471] Config param pure_nnx_decoder: False
I0425 07:34:08.841513 138943686608704 pyconfig.py:471] Config param q_lora_rank: 0
I0425 07:34:08.841529 138943686608704 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0425 07:34:08.841545 138943686608704 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0425 07:34:08.841560 138943686608704 pyconfig.py:471] Config param qk_norm_with_scale: True
I0425 07:34:08.841576 138943686608704 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0425 07:34:08.841591 138943686608704 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0425 07:34:08.841607 138943686608704 pyconfig.py:471] Config param quant_cfg_path: 
I0425 07:34:08.841621 138943686608704 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0425 07:34:08.841639 138943686608704 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0425 07:34:08.841655 138943686608704 pyconfig.py:471] Config param quantize_kvcache: False
I0425 07:34:08.841670 138943686608704 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0425 07:34:08.841685 138943686608704 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0425 07:34:08.841701 138943686608704 pyconfig.py:471] Config param ragged_block_size: 256
I0425 07:34:08.841716 138943686608704 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0425 07:34:08.841731 138943686608704 pyconfig.py:471] Config param rampup_end_step: 0
I0425 07:34:08.841746 138943686608704 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0425 07:34:08.841761 138943686608704 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0425 07:34:08.841776 138943686608704 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0425 07:34:08.841792 138943686608704 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0425 07:34:08.841806 138943686608704 pyconfig.py:471] Config param remat_policy: full
I0425 07:34:08.841821 138943686608704 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0425 07:34:08.841836 138943686608704 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0425 07:34:08.841852 138943686608704 pyconfig.py:471] Config param replicate_quant_scale: False
I0425 07:34:08.841866 138943686608704 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0425 07:34:08.841882 138943686608704 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0425 07:34:08.841897 138943686608704 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0425 07:34:08.841912 138943686608704 pyconfig.py:471] Config param reshape_q: False
I0425 07:34:08.841927 138943686608704 pyconfig.py:471] Config param return_log_prob: False
I0425 07:34:08.841942 138943686608704 pyconfig.py:471] Config param reuse_example_batch: 0
I0425 07:34:08.841956 138943686608704 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0425 07:34:08.841973 138943686608704 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0425 07:34:08.841987 138943686608704 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0425 07:34:08.842004 138943686608704 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0425 07:34:08.842020 138943686608704 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0425 07:34:08.842035 138943686608704 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0425 07:34:08.842051 138943686608704 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0425 07:34:08.842072 138943686608704 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0425 07:34:08.842089 138943686608704 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0425 07:34:08.842112 138943686608704 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0425 07:34:08.842128 138943686608704 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0425 07:34:08.842143 138943686608704 pyconfig.py:471] Config param rope_attention_scaling: False
I0425 07:34:08.842158 138943686608704 pyconfig.py:471] Config param rope_factor: 40
I0425 07:34:08.842173 138943686608704 pyconfig.py:471] Config param rope_interleave: True
I0425 07:34:08.842189 138943686608704 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0425 07:34:08.842203 138943686608704 pyconfig.py:471] Config param rope_max_timescale: 10000
I0425 07:34:08.842219 138943686608704 pyconfig.py:471] Config param rope_min_timescale: 1
I0425 07:34:08.842234 138943686608704 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0425 07:34:08.842249 138943686608704 pyconfig.py:471] Config param rope_truncate: True
I0425 07:34:08.842264 138943686608704 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0425 07:34:08.842286 138943686608704 pyconfig.py:471] Config param rope_use_scale: True
I0425 07:34:08.842301 138943686608704 pyconfig.py:471] Config param routed_bias: False
I0425 07:34:08.842316 138943686608704 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0425 07:34:08.842331 138943686608704 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0425 07:34:08.842347 138943686608704 pyconfig.py:471] Config param routed_score_func: 
I0425 07:34:08.842361 138943686608704 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-07-34
I0425 07:34:08.842377 138943686608704 pyconfig.py:471] Config param sa_block_kv: 512
I0425 07:34:08.842391 138943686608704 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0425 07:34:08.842408 138943686608704 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0425 07:34:08.842424 138943686608704 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0425 07:34:08.842438 138943686608704 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0425 07:34:08.842454 138943686608704 pyconfig.py:471] Config param sa_block_q: 512
I0425 07:34:08.842470 138943686608704 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0425 07:34:08.842486 138943686608704 pyconfig.py:471] Config param sa_block_q_dq: 512
I0425 07:34:08.842502 138943686608704 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0425 07:34:08.842516 138943686608704 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0425 07:34:08.842532 138943686608704 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0425 07:34:08.842549 138943686608704 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0425 07:34:08.842564 138943686608704 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0425 07:34:08.842581 138943686608704 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0425 07:34:08.842595 138943686608704 pyconfig.py:471] Config param save_config_to_gcs: False
I0425 07:34:08.842610 138943686608704 pyconfig.py:471] Config param save_quantized_params_path: 
I0425 07:34:08.842626 138943686608704 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0425 07:34:08.842642 138943686608704 pyconfig.py:471] Config param scan_layers: True
I0425 07:34:08.842656 138943686608704 pyconfig.py:471] Config param scan_layers_per_stage: False
I0425 07:34:08.842672 138943686608704 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0425 07:34:08.842688 138943686608704 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0425 07:34:08.842703 138943686608704 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0425 07:34:08.842719 138943686608704 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0425 07:34:08.842733 138943686608704 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0425 07:34:08.842749 138943686608704 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0425 07:34:08.842764 138943686608704 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0425 07:34:08.842781 138943686608704 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0425 07:34:08.842797 138943686608704 pyconfig.py:471] Config param sharding_strategy: None
I0425 07:34:08.842812 138943686608704 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0425 07:34:08.842828 138943686608704 pyconfig.py:471] Config param shardy: True
I0425 07:34:08.842842 138943686608704 pyconfig.py:471] Config param share_kv_projections: False
I0425 07:34:08.842858 138943686608704 pyconfig.py:471] Config param shared_experts: 0
I0425 07:34:08.842874 138943686608704 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0425 07:34:08.842890 138943686608704 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0425 07:34:08.842904 138943686608704 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0425 07:34:08.842920 138943686608704 pyconfig.py:471] Config param skip_step_interval: 128
I0425 07:34:08.842934 138943686608704 pyconfig.py:471] Config param skip_step_on_spikes: False
I0425 07:34:08.842950 138943686608704 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0425 07:34:08.842965 138943686608704 pyconfig.py:471] Config param sliding_window_size: 0
I0425 07:34:08.842979 138943686608704 pyconfig.py:471] Config param solution_end_token: </answer>
I0425 07:34:08.842995 138943686608704 pyconfig.py:471] Config param solution_start_token: <answer>
I0425 07:34:08.843009 138943686608704 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0425 07:34:08.843025 138943686608704 pyconfig.py:471] Config param sparse_matmul: True
I0425 07:34:08.843041 138943686608704 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0425 07:34:08.843057 138943686608704 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0425 07:34:08.843072 138943686608704 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0425 07:34:08.843088 138943686608704 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0425 07:34:08.843115 138943686608704 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0425 07:34:08.843131 138943686608704 pyconfig.py:471] Config param steps: 200000
I0425 07:34:08.843147 138943686608704 pyconfig.py:471] Config param stop_strings: None
I0425 07:34:08.843163 138943686608704 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0425 07:34:08.843178 138943686608704 pyconfig.py:471] Config param student_params_to_update: None
I0425 07:34:08.843194 138943686608704 pyconfig.py:471] Config param subslice_shape: 
I0425 07:34:08.843210 138943686608704 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0425 07:34:08.843224 138943686608704 pyconfig.py:471] Config param system_prompt: 
I0425 07:34:08.843240 138943686608704 pyconfig.py:471] Config param target_eval_loss: 0.0
I0425 07:34:08.843255 138943686608704 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0425 07:34:08.843276 138943686608704 pyconfig.py:471] Config param temperature_tuning: False
I0425 07:34:08.843290 138943686608704 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0425 07:34:08.843306 138943686608704 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-07-34/tensorboard/
I0425 07:34:08.843321 138943686608704 pyconfig.py:471] Config param tensors_on_device: None
I0425 07:34:08.843335 138943686608704 pyconfig.py:471] Config param tensors_to_offload: None
I0425 07:34:08.843351 138943686608704 pyconfig.py:471] Config param test_batch_start_index: 0
I0425 07:34:08.843365 138943686608704 pyconfig.py:471] Config param tile_size_for_vit: 336
I0425 07:34:08.843381 138943686608704 pyconfig.py:471] Config param tokenize_eval_data: True
I0425 07:34:08.843395 138943686608704 pyconfig.py:471] Config param tokenize_train_data: True
I0425 07:34:08.843411 138943686608704 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0425 07:34:08.843426 138943686608704 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0425 07:34:08.843444 138943686608704 pyconfig.py:471] Config param topk_routing_group: -1
I0425 07:34:08.843458 138943686608704 pyconfig.py:471] Config param train_data_columns: ['text']
I0425 07:34:08.843474 138943686608704 pyconfig.py:471] Config param train_fraction: 1.0
I0425 07:34:08.843488 138943686608704 pyconfig.py:471] Config param train_image_column: image
I0425 07:34:08.843504 138943686608704 pyconfig.py:471] Config param train_micro_batch_size: -1
I0425 07:34:08.843519 138943686608704 pyconfig.py:471] Config param train_split: train
I0425 07:34:08.843534 138943686608704 pyconfig.py:471] Config param trainable_parameters_mask: []
I0425 07:34:08.843549 138943686608704 pyconfig.py:471] Config param trainable_position_size: 2048
I0425 07:34:08.843564 138943686608704 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0425 07:34:08.843580 138943686608704 pyconfig.py:471] Config param upload_all_profiler_results: False
I0425 07:34:08.843595 138943686608704 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0425 07:34:08.843609 138943686608704 pyconfig.py:471] Config param use_agentic_rollout: False
I0425 07:34:08.843625 138943686608704 pyconfig.py:471] Config param use_audio: False
I0425 07:34:08.843639 138943686608704 pyconfig.py:471] Config param use_audio_in_video: False
I0425 07:34:08.843655 138943686608704 pyconfig.py:471] Config param use_batch_split_schedule: False
I0425 07:34:08.843670 138943686608704 pyconfig.py:471] Config param use_chat_template: False
I0425 07:34:08.843686 138943686608704 pyconfig.py:471] Config param use_chunked_prefill: False
I0425 07:34:08.843700 138943686608704 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0425 07:34:08.843716 138943686608704 pyconfig.py:471] Config param use_dpo: False
I0425 07:34:08.843731 138943686608704 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0425 07:34:08.843747 138943686608704 pyconfig.py:471] Config param use_grpo: True
I0425 07:34:08.843761 138943686608704 pyconfig.py:471] Config param use_indexer: False
I0425 07:34:08.843777 138943686608704 pyconfig.py:471] Config param use_iota_embed: True
I0425 07:34:08.843791 138943686608704 pyconfig.py:471] Config param use_jax_splash: False
I0425 07:34:08.843807 138943686608704 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0425 07:34:08.843822 138943686608704 pyconfig.py:471] Config param use_mrope: False
I0425 07:34:08.843837 138943686608704 pyconfig.py:471] Config param use_multimodal: False
I0425 07:34:08.843852 138943686608704 pyconfig.py:471] Config param use_pathways: True
I0425 07:34:08.843868 138943686608704 pyconfig.py:471] Config param use_post_attn_norm: False
I0425 07:34:08.843882 138943686608704 pyconfig.py:471] Config param use_post_ffw_norm: False
I0425 07:34:08.843898 138943686608704 pyconfig.py:471] Config param use_qk_clip: False
I0425 07:34:08.843912 138943686608704 pyconfig.py:471] Config param use_qk_norm: False
I0425 07:34:08.843928 138943686608704 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0425 07:34:08.843942 138943686608704 pyconfig.py:471] Config param use_qwix_quantization: False
I0425 07:34:08.843957 138943686608704 pyconfig.py:471] Config param use_ragged_attention: False
I0425 07:34:08.843971 138943686608704 pyconfig.py:471] Config param use_random_routing: False
I0425 07:34:08.843987 138943686608704 pyconfig.py:471] Config param use_replicator_service: False
I0425 07:34:08.844001 138943686608704 pyconfig.py:471] Config param use_ring_of_experts: False
I0425 07:34:08.844016 138943686608704 pyconfig.py:471] Config param use_sft: False
I0425 07:34:08.844033 138943686608704 pyconfig.py:471] Config param use_splash_scheduler: False
I0425 07:34:08.844048 138943686608704 pyconfig.py:471] Config param use_tokamax_gmm: False
I0425 07:34:08.844064 138943686608704 pyconfig.py:471] Config param use_tokamax_splash: False
I0425 07:34:08.844079 138943686608704 pyconfig.py:471] Config param use_truncation: True
I0425 07:34:08.844102 138943686608704 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0425 07:34:08.844118 138943686608704 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0425 07:34:08.844133 138943686608704 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0425 07:34:08.844172 138943686608704 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0425 07:34:08.844187 138943686608704 pyconfig.py:471] Config param v_head_dim: 128
I0425 07:34:08.844203 138943686608704 pyconfig.py:471] Config param v_norm_with_scale: True
I0425 07:34:08.844219 138943686608704 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0425 07:34:08.844234 138943686608704 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0425 07:34:08.844250 138943686608704 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0425 07:34:08.844265 138943686608704 pyconfig.py:471] Config param video_path: 
I0425 07:34:08.844285 138943686608704 pyconfig.py:471] Config param video_placeholder: <|video|>
I0425 07:34:08.844301 138943686608704 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0425 07:34:08.844318 138943686608704 pyconfig.py:471] Config param vision_output_length: -1
I0425 07:34:08.844334 138943686608704 pyconfig.py:471] Config param vllm_additional_config: {}
I0425 07:34:08.844350 138943686608704 pyconfig.py:471] Config param vllm_hf_config_path: 
I0425 07:34:08.844366 138943686608704 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0425 07:34:08.844381 138943686608704 pyconfig.py:471] Config param vocab_size: 32000
I0425 07:34:08.844397 138943686608704 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0425 07:34:08.844413 138943686608704 pyconfig.py:471] Config param weight_dtype: float32
I0425 07:34:08.844437 138943686608704 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0425 07:34:08.844452 138943686608704 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0425 07:34:08.844468 138943686608704 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0425 07:34:08.844483 138943686608704 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0425 07:34:08.844498 138943686608704 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0425 07:34:08.844515 138943686608704 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0425 07:34:08.844529 138943686608704 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0425 07:34:08.844545 138943686608704 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0425 07:34:08.844560 138943686608704 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0425 07:34:08.844575 138943686608704 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0425 07:34:08.844591 138943686608704 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0425 07:34:08.844607 138943686608704 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0425 07:34:08.844621 138943686608704 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0425 07:34:08.844637 138943686608704 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0425 07:34:08.844653 138943686608704 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0425 07:34:08.844667 138943686608704 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0425 07:34:08.844682 138943686608704 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0425 07:34:08.844698 138943686608704 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0425 07:34:08.844713 138943686608704 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0425 07:34:08.844729 138943686608704 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0425 07:34:08.844745 138943686608704 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0425 07:34:08.844761 138943686608704 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0425 07:34:08.844777 138943686608704 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0425 07:34:08.844792 138943686608704 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0425 07:34:08.844806 138943686608704 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0425 07:34:08.844824 138943686608704 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0425 07:34:08.845140 138943686608704 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0425 07:34:08.845175 138943686608704 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0425 07:34:09.022780 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:34:09.129697 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:34:09.237979 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:34:09.345750 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:34:09.456047 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 07:34:09.563130 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0425 07:34:09.673561 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0425 07:34:09.786691 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0425 07:34:10.427052 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 07:34:10.541959 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 07:34:10.844858 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0425 07:34:10.957470 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 07:34:11.065258 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 07:34:11.172938 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0425 07:34:11.264698 138943686608704 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0425 07:34:11.271665 138943686608704 maxtext_utils.py:1604] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1)
I0425 07:34:11.271812 138943686608704 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0425 07:34:11.271888 138943686608704 train_distill.py:586] Loading Student from ...
I0425 07:34:11.271918 138943686608704 train_distill.py:170] --- Student Configuration ---
I0425 07:34:11.271940 138943686608704 train_distill.py:171]   Model Name:      gpt3-52k
I0425 07:34:11.271961 138943686608704 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 07:34:11.271981 138943686608704 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 07:34:11.272000 138943686608704 train_distill.py:176]   Vocab Size:      32000
I0425 07:34:11.272017 138943686608704 train_distill.py:177]   Checkpoint:      
I0425 07:34:11.272037 138943686608704 train_distill.py:451] Initializing model: gpt3-52k...
I0425 07:34:12.548565 138943686608704 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0425 07:34:12.548678 138943686608704 train_distill.py:170] --- Teacher Configuration ---
I0425 07:34:12.548708 138943686608704 train_distill.py:171]   Model Name:      gpt3-52k
I0425 07:34:12.548738 138943686608704 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 07:34:12.548759 138943686608704 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 07:34:12.548779 138943686608704 train_distill.py:176]   Vocab Size:      32000
I0425 07:34:12.548796 138943686608704 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 07:34:12.548815 138943686608704 train_distill.py:451] Initializing model: gpt3-52k...
I0425 07:34:13.985520 138943686608704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:34:13.985680 138943686608704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e5da0f0e540>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:34:13.985736 138943686608704 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0425 07:34:14.491720 138943686608704 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0425 07:34:15.034782    1932 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0425 07:34:16.173784 138943686608704 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0425 07:34:18.315288 138943686608704 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0425 07:34:18.315637 138943686608704 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0425 07:34:20.663424 138943686608704 checkpointer.py:318] Finished restoring checkpoint in 4.87 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0425 07:34:21.409992 138943686608704 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0425 07:34:21.475017 138943686608704 config.py:112] TensorFlow version 2.20.0 available.
I0425 07:34:21.475538 138943686608704 config.py:125] JAX version 0.9.2 available.
I0425 07:34:21.926739 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0425 07:34:21.934490 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 07:34:21.942048 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 07:34:22.047809 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 07:34:22.345930 138943686608704 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 07:34:22.462929 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0425 07:34:22.569684 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0425 07:34:22.737663 138943686608704 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0425 07:34:22.845307 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0425 07:34:23.003647 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0425 07:34:23.141748 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0425 07:34:23.303148 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 07:34:23.410048 138943686608704 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 07:34:23.524925 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 07:34:23.630282 138943686608704 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0425 07:34:23.722237 138943686608704 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0425 07:34:23.722445 138943686608704 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0425 07:34:23.725466 138943686608704 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0425 07:34:23.725526 138943686608704 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0425 07:34:23.725591 138943686608704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:34:23.725665 138943686608704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e5da0f0e540>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:34:23.725707 138943686608704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:34:23.725738 138943686608704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e5da0f0e540>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:34:23.725780 138943686608704 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5814772ff0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab8f0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781ab800>}, handler_registry=None
I0425 07:34:23.725970 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5814772ff0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:34:23.726011 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab8f0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:34:23.726038 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781ab800>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:34:23.726063 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e4578209280>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:34:23.726090 138943686608704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5814772ff0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5814772ff0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab8f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab8f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781ab800>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781ab800>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e4578209280>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e4578209280>}).
I0425 07:34:23.726498 138943686608704 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e47441a04a0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 07:34:25.310899 138943686608704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_nnx_xpk_main_20260425_071506_07_distill_smoke/checkpoints
I0425 07:34:25.333823 138943686608704 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_nnx_xpk_main_20260425_071506_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e45781ab7d0>
I0425 07:34:25.333959 138943686608704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:34:25.334042 138943686608704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e5da0f0e540>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:34:25.334090 138943686608704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 07:34:25.334154 138943686608704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e5da0f0e540>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 07:34:25.334208 138943686608704 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0425 07:34:25.334281 138943686608704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138943686608704 count=1 at 0x7e541825a540>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e45781ab650>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e45781ab620>, _write_futures=[])
I0425 07:34:25.334757 138943686608704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138943686608704 count=1 at 0x7e541825a540>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e45781ab650>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e45781ab620>, _write_futures=[])
I0425 07:34:25.334805 138943686608704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138943686608704 count=1 at 0x7e541825a540>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e45781ab650>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e45781ab620>, _write_futures=[])
I0425 07:34:25.334856 138943686608704 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab7a0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781aad50>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a99d0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e45781a9d60>}, handler_registry=None
I0425 07:34:25.334979 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab7a0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:34:25.335025 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781aad50>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 07:34:25.335066 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a99d0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:34:25.335124 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e45781a9d60>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0425 07:34:25.335160 138943686608704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a9cd0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 07:34:25.335197 138943686608704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab7a0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781ab7a0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781aad50>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e45781aad50>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a99d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a99d0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e45781a9d60>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e45781a9d60>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a9cd0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e45781a9cd0>}).
I0425 07:34:25.335292 138943686608704 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e47441a0720> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 07:34:25.709077 138943686608704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_nnx_xpk_main_20260425_071506_07_distill_smoke/checkpoints
I0425 07:34:25.725298 138943686608704 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260425_071506/pt_distill_nnx_xpk_main_20260425_071506_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e457011aff0>
I0425 07:34:25.725741 138943686608704 train_distill.py:677] Starting Distillation Training...
I0425 07:34:25.725849 138943686608704 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0425 07:34:26.194831 138943686608704 peft_trainer.py:594] Compiled train_step cache size: 0
I0425 07:34:26.196406 138791851382528 grain_pool.py:367] Grain pool will use 1 processes.
I0425 07:34:26.249216 138791851382528 grain_pool.py:440] Grain pool will start child processes.
I0425 07:34:26.255029 138791851382528 grain_pool.py:448] Grain pool started all child processes.
2026-04-25 07:34:32.776296: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0)}
I0425 07:34:36.882372 138791851382528 grain_pool.py:542] Grain pool is exiting.
I0425 07:34:36.882472 138791851382528 grain_pool.py:547] Shutting down multiprocessing system.
I0425 07:34:38.584295 138791851382528 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 07:34:49 UTC 2026
EXIT_CODE=1