Log Summary

XPK Start: Sat Apr 25 09:29:17 UTC 2026
2026-04-25 09:29:35.240027: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0425 09:29:41.629810 140339395942208 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 09:29:50.669717 140339395942208 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 09:29:50.671979 140339395942208 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-uaih9-slice-job-0-0.mt-07-distill-smoke-uaih9:8482
I0425 09:29:51.950554 140339395942208 max_utils.py:284] Jax distributed system initialized!
I0425 09:29:58.253046 140339395942208 max_utils.py:244] Jax distributed system is already initialized.
W0425 09:29:58.385421 140339395942208 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 09:29:58.446465 140339395942208 max_utils.py:244] Jax distributed system is already initialized.
I0425 09:29:58.447682 140339395942208 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 09:29:58.447731 140339395942208 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 09:29:58.447756 140339395942208 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 09:29:58.447777 140339395942208 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 09:29:58.447796 140339395942208 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 09:29:58.447815 140339395942208 pyconfig.py:471] Config param activations_in_float32: False
I0425 09:29:58.447832 140339395942208 pyconfig.py:471] Config param adam_b1: 0.9
I0425 09:29:58.447851 140339395942208 pyconfig.py:471] Config param adam_b2: 0.95
I0425 09:29:58.447867 140339395942208 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 09:29:58.447890 140339395942208 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 09:29:58.447906 140339395942208 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 09:29:58.447923 140339395942208 pyconfig.py:471] Config param adamw_mask: []
I0425 09:29:58.447939 140339395942208 pyconfig.py:471] Config param add_bos: True
I0425 09:29:58.447956 140339395942208 pyconfig.py:471] Config param add_eos: True
I0425 09:29:58.447971 140339395942208 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 09:29:58.447988 140339395942208 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 09:29:58.448005 140339395942208 pyconfig.py:471] Config param async_checkpointing: True
I0425 09:29:58.448021 140339395942208 pyconfig.py:471] Config param async_scheduling: False
I0425 09:29:58.448036 140339395942208 pyconfig.py:471] Config param attention: dot_product
I0425 09:29:58.448053 140339395942208 pyconfig.py:471] Config param attention_bias: False
I0425 09:29:58.448069 140339395942208 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 09:29:58.448086 140339395942208 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 09:29:58.448117 140339395942208 pyconfig.py:471] Config param attention_output_dim: -1
I0425 09:29:58.448132 140339395942208 pyconfig.py:471] Config param attention_sink: False
I0425 09:29:58.448148 140339395942208 pyconfig.py:471] Config param attention_type: global
I0425 09:29:58.448172 140339395942208 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 09:29:58.448196 140339395942208 pyconfig.py:471] Config param audio_path: 
I0425 09:29:58.448222 140339395942208 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 09:29:58.448247 140339395942208 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0425 09:29:58.448273 140339395942208 pyconfig.py:471] Config param base_config: base.yml
I0425 09:29:58.448299 140339395942208 pyconfig.py:471] Config param base_emb_dim: 16
I0425 09:29:58.448330 140339395942208 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 09:29:58.448359 140339395942208 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 09:29:58.448386 140339395942208 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 09:29:58.448413 140339395942208 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 09:29:58.448435 140339395942208 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 09:29:58.448451 140339395942208 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 09:29:58.448467 140339395942208 pyconfig.py:471] Config param batch_size: 1
I0425 09:29:58.448491 140339395942208 pyconfig.py:471] Config param batch_split_factor: 1
I0425 09:29:58.448515 140339395942208 pyconfig.py:471] Config param beta_fast: 32
I0425 09:29:58.448540 140339395942208 pyconfig.py:471] Config param beta_slow: 1
I0425 09:29:58.448566 140339395942208 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 09:29:58.448593 140339395942208 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 09:29:58.448620 140339395942208 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 09:29:58.448646 140339395942208 pyconfig.py:471] Config param chat_template: 
I0425 09:29:58.448669 140339395942208 pyconfig.py:471] Config param chat_template_path: 
I0425 09:29:58.448690 140339395942208 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 09:29:58.448717 140339395942208 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/checkpoints/
I0425 09:29:58.448740 140339395942208 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 09:29:58.448761 140339395942208 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 09:29:58.448787 140339395942208 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 09:29:58.448813 140339395942208 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 09:29:58.448839 140339395942208 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 09:29:58.448863 140339395942208 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 09:29:58.448886 140339395942208 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 09:29:58.448908 140339395942208 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 09:29:58.448934 140339395942208 pyconfig.py:471] Config param chips_per_vm: 4
I0425 09:29:58.448957 140339395942208 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 09:29:58.448982 140339395942208 pyconfig.py:471] Config param collect_stack_trace: False
I0425 09:29:58.449005 140339395942208 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 09:29:58.449030 140339395942208 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 09:29:58.449054 140339395942208 pyconfig.py:471] Config param compile_topology: 
I0425 09:29:58.449075 140339395942208 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 09:29:58.449118 140339395942208 pyconfig.py:471] Config param compile_xla_flags: 
I0425 09:29:58.449140 140339395942208 pyconfig.py:471] Config param compiled_trainstep_file: 
I0425 09:29:58.449162 140339395942208 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 09:29:58.449182 140339395942208 pyconfig.py:471] Config param constant_bound_config: []
I0425 09:29:58.449203 140339395942208 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 09:29:58.449227 140339395942208 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 09:29:58.449252 140339395942208 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 09:29:58.449285 140339395942208 pyconfig.py:471] Config param context_parallel_size: 1
I0425 09:29:58.449306 140339395942208 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 09:29:58.449326 140339395942208 pyconfig.py:471] Config param context_sharding: context
I0425 09:29:58.449345 140339395942208 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 09:29:58.449369 140339395942208 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 09:29:58.449390 140339395942208 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 09:29:58.449413 140339395942208 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 09:29:58.449430 140339395942208 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 09:29:58.449446 140339395942208 pyconfig.py:471] Config param custom_mesh: 
I0425 09:29:58.449460 140339395942208 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0425 09:29:58.449475 140339395942208 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 09:29:58.449490 140339395942208 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 09:29:58.449511 140339395942208 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 09:29:58.449526 140339395942208 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 09:29:58.449541 140339395942208 pyconfig.py:471] Config param dataset_path: 
I0425 09:29:58.449555 140339395942208 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 09:29:58.449573 140339395942208 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 09:29:58.449588 140339395942208 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 09:29:58.449603 140339395942208 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 09:29:58.449617 140339395942208 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 09:29:58.449633 140339395942208 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 09:29:58.449647 140339395942208 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 09:29:58.449663 140339395942208 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 09:29:58.449677 140339395942208 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 09:29:58.449693 140339395942208 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 09:29:58.449708 140339395942208 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 09:29:58.449724 140339395942208 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 09:29:58.449738 140339395942208 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 09:29:58.449753 140339395942208 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 09:29:58.449768 140339395942208 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 09:29:58.449784 140339395942208 pyconfig.py:471] Config param debug: {'rl': False}
I0425 09:29:58.449798 140339395942208 pyconfig.py:471] Config param debug_sharding: False
I0425 09:29:58.449814 140339395942208 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 09:29:58.449828 140339395942208 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 09:29:58.449846 140339395942208 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 09:29:58.449860 140339395942208 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 09:29:58.449876 140339395942208 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 09:29:58.449891 140339395942208 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 09:29:58.449908 140339395942208 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 09:29:58.449922 140339395942208 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 09:29:58.449939 140339395942208 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 09:29:58.449953 140339395942208 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 09:29:58.449969 140339395942208 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 09:29:58.449984 140339395942208 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 09:29:58.450000 140339395942208 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 09:29:58.450019 140339395942208 pyconfig.py:471] Config param distill_alpha_end: None
I0425 09:29:58.450035 140339395942208 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 09:29:58.450049 140339395942208 pyconfig.py:471] Config param distill_beta: 0.0
I0425 09:29:58.450066 140339395942208 pyconfig.py:471] Config param distill_beta_end: None
I0425 09:29:58.450082 140339395942208 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 09:29:58.450125 140339395942208 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 09:29:58.450142 140339395942208 pyconfig.py:471] Config param distill_layer_indices: None
I0425 09:29:58.450156 140339395942208 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 09:29:58.450172 140339395942208 pyconfig.py:471] Config param distill_temperature_end: None
I0425 09:29:58.450186 140339395942208 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 09:29:58.450202 140339395942208 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 09:29:58.450216 140339395942208 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 09:29:58.450232 140339395942208 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 09:29:58.450246 140339395942208 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 09:29:58.450262 140339395942208 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 09:29:58.450277 140339395942208 pyconfig.py:471] Config param dtype: bfloat16
I0425 09:29:58.450318 140339395942208 pyconfig.py:471] Config param dtype_mm: float32
I0425 09:29:58.450343 140339395942208 pyconfig.py:471] Config param dump_hlo: False
I0425 09:29:58.450365 140339395942208 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 09:29:58.450380 140339395942208 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/xla_dump
I0425 09:29:58.450396 140339395942208 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 09:29:58.450411 140339395942208 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 09:29:58.450426 140339395942208 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 09:29:58.450440 140339395942208 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 09:29:58.450456 140339395942208 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0425 09:29:58.450470 140339395942208 pyconfig.py:471] Config param dump_jaxpr: False
I0425 09:29:58.450486 140339395942208 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 09:29:58.450500 140339395942208 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/jaxpr_dump
I0425 09:29:58.450516 140339395942208 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 09:29:58.450530 140339395942208 pyconfig.py:471] Config param dump_step: -1
I0425 09:29:58.450546 140339395942208 pyconfig.py:471] Config param elastic_enabled: False
I0425 09:29:58.450560 140339395942208 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 09:29:58.450577 140339395942208 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 09:29:58.450592 140339395942208 pyconfig.py:471] Config param emb_dim: 16
I0425 09:29:58.450608 140339395942208 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 09:29:58.450622 140339395942208 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 09:29:58.450638 140339395942208 pyconfig.py:471] Config param enable_checkpointing: True
I0425 09:29:58.450652 140339395942208 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 09:29:58.450668 140339395942208 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 09:29:58.450682 140339395942208 pyconfig.py:471] Config param enable_diloco: False
I0425 09:29:58.450698 140339395942208 pyconfig.py:471] Config param enable_dp_attention: False
I0425 09:29:58.450712 140339395942208 pyconfig.py:471] Config param enable_dropout: False
I0425 09:29:58.450728 140339395942208 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 09:29:58.450743 140339395942208 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 09:29:58.450759 140339395942208 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 09:29:58.450773 140339395942208 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 09:29:58.450789 140339395942208 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 09:29:58.450803 140339395942208 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 09:29:58.450819 140339395942208 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 09:29:58.450833 140339395942208 pyconfig.py:471] Config param enable_model_warmup: False
I0425 09:29:58.450848 140339395942208 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 09:29:58.450863 140339395942208 pyconfig.py:471] Config param enable_nnx: False
I0425 09:29:58.450878 140339395942208 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 09:29:58.450893 140339395942208 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 09:29:58.450908 140339395942208 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 09:29:58.450922 140339395942208 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 09:29:58.450938 140339395942208 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 09:29:58.450952 140339395942208 pyconfig.py:471] Config param enable_single_controller: False
I0425 09:29:58.450968 140339395942208 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 09:29:58.450983 140339395942208 pyconfig.py:471] Config param enable_tensorboard: True
I0425 09:29:58.450998 140339395942208 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 09:29:58.451014 140339395942208 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 09:29:58.451030 140339395942208 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 09:29:58.451045 140339395942208 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 09:29:58.451061 140339395942208 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 09:29:58.451075 140339395942208 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 09:29:58.451091 140339395942208 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 09:29:58.451116 140339395942208 pyconfig.py:471] Config param engram_layers: []
I0425 09:29:58.451133 140339395942208 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 09:29:58.451147 140339395942208 pyconfig.py:471] Config param engram_num_heads: 8
I0425 09:29:58.451162 140339395942208 pyconfig.py:471] Config param engram_seed: 0
I0425 09:29:58.451176 140339395942208 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 09:29:58.451192 140339395942208 pyconfig.py:471] Config param epsilon_high: None
I0425 09:29:58.451207 140339395942208 pyconfig.py:471] Config param eval_corr_lst: False
I0425 09:29:58.451222 140339395942208 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 09:29:58.451237 140339395942208 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 09:29:58.451252 140339395942208 pyconfig.py:471] Config param eval_image_column: image
I0425 09:29:58.451267 140339395942208 pyconfig.py:471] Config param eval_interval: -1
I0425 09:29:58.451282 140339395942208 pyconfig.py:471] Config param eval_make_lst: False
I0425 09:29:58.451297 140339395942208 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 09:29:58.451317 140339395942208 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 09:29:58.451332 140339395942208 pyconfig.py:471] Config param eval_split: validation
I0425 09:29:58.451349 140339395942208 pyconfig.py:471] Config param eval_steps: -1
I0425 09:29:58.451365 140339395942208 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 09:29:58.451381 140339395942208 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 09:29:58.451395 140339395942208 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 09:29:58.451412 140339395942208 pyconfig.py:471] Config param float32_gate_logits: False
I0425 09:29:58.451428 140339395942208 pyconfig.py:471] Config param float32_logits: False
I0425 09:29:58.451444 140339395942208 pyconfig.py:471] Config param float32_qk_product: False
I0425 09:29:58.451458 140339395942208 pyconfig.py:471] Config param float32_weight_sum: True
I0425 09:29:58.451474 140339395942208 pyconfig.py:471] Config param force_q_layout: False
I0425 09:29:58.451490 140339395942208 pyconfig.py:471] Config param force_unroll: False
I0425 09:29:58.451503 140339395942208 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0425 09:29:58.451519 140339395942208 pyconfig.py:471] Config param formatting_func_path: 
I0425 09:29:58.451534 140339395942208 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 09:29:58.451549 140339395942208 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 09:29:58.451566 140339395942208 pyconfig.py:471] Config param fused_mlp: False
I0425 09:29:58.451580 140339395942208 pyconfig.py:471] Config param fused_qkv: True
I0425 09:29:58.451596 140339395942208 pyconfig.py:471] Config param gcs_metrics: False
I0425 09:29:58.451610 140339395942208 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 09:29:58.451625 140339395942208 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 09:29:58.451639 140339395942208 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 09:29:58.451655 140339395942208 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 09:29:58.451671 140339395942208 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 09:29:58.451686 140339395942208 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 09:29:58.451701 140339395942208 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 09:29:58.451717 140339395942208 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 09:29:58.451731 140339395942208 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 09:29:58.451747 140339395942208 pyconfig.py:471] Config param generation_configs: {}
I0425 09:29:58.451762 140339395942208 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 09:29:58.451777 140339395942208 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 09:29:58.451792 140339395942208 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 09:29:58.451808 140339395942208 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 09:29:58.451823 140339395942208 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 09:29:58.451838 140339395942208 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 09:29:58.451853 140339395942208 pyconfig.py:471] Config param global_head_dim: 0
I0425 09:29:58.451868 140339395942208 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 09:29:58.451884 140339395942208 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 09:29:58.451898 140339395942208 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 09:29:58.451914 140339395942208 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 09:29:58.451929 140339395942208 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 09:29:58.451946 140339395942208 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 09:29:58.451966 140339395942208 pyconfig.py:471] Config param grad_dtype: float32
I0425 09:29:58.452002 140339395942208 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 09:29:58.452019 140339395942208 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 09:29:58.452035 140339395942208 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 09:29:58.452051 140339395942208 pyconfig.py:471] Config param grain_eval_files: 
I0425 09:29:58.452067 140339395942208 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 09:29:58.452083 140339395942208 pyconfig.py:471] Config param grain_num_threads: 16
I0425 09:29:58.452106 140339395942208 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 09:29:58.452122 140339395942208 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 09:29:58.452138 140339395942208 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 09:29:58.452154 140339395942208 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 09:29:58.452170 140339395942208 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 09:29:58.452184 140339395942208 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 09:29:58.452200 140339395942208 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 09:29:58.452215 140339395942208 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 09:29:58.452231 140339395942208 pyconfig.py:471] Config param grain_train_files: 
I0425 09:29:58.452247 140339395942208 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0425 09:29:58.452262 140339395942208 pyconfig.py:471] Config param grain_worker_count: 1
I0425 09:29:58.452278 140339395942208 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 09:29:58.452294 140339395942208 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 09:29:58.452308 140339395942208 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 09:29:58.452328 140339395942208 pyconfig.py:471] Config param hardware: tpu
I0425 09:29:58.452343 140339395942208 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 09:29:58.452359 140339395942208 pyconfig.py:471] Config param head_dim: 8
I0425 09:29:58.452375 140339395942208 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 09:29:58.452390 140339395942208 pyconfig.py:471] Config param hf_data_dir: None
I0425 09:29:58.452405 140339395942208 pyconfig.py:471] Config param hf_eval_files: None
I0425 09:29:58.452420 140339395942208 pyconfig.py:471] Config param hf_eval_split: None
I0425 09:29:58.452435 140339395942208 pyconfig.py:471] Config param hf_name: None
I0425 09:29:58.452449 140339395942208 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 09:29:58.452464 140339395942208 pyconfig.py:471] Config param hf_train_files: None
I0425 09:29:58.452479 140339395942208 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 09:29:58.452494 140339395942208 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 09:29:58.452509 140339395942208 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 09:29:58.452524 140339395942208 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 09:29:58.452538 140339395942208 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 09:29:58.452554 140339395942208 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 09:29:58.452568 140339395942208 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 09:29:58.452584 140339395942208 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 09:29:58.452598 140339395942208 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 09:29:58.452614 140339395942208 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 09:29:58.452629 140339395942208 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 09:29:58.452645 140339395942208 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 09:29:58.452660 140339395942208 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 09:29:58.452675 140339395942208 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 09:29:58.452689 140339395942208 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 09:29:58.452705 140339395942208 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 09:29:58.452719 140339395942208 pyconfig.py:471] Config param image_path: 
I0425 09:29:58.452735 140339395942208 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 09:29:58.452749 140339395942208 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 09:29:58.452765 140339395942208 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 09:29:58.452779 140339395942208 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 09:29:58.452795 140339395942208 pyconfig.py:471] Config param indexer_n_heads: 64
I0425 09:29:58.452809 140339395942208 pyconfig.py:471] Config param indexer_sparse_training: False
I0425 09:29:58.452825 140339395942208 pyconfig.py:471] Config param indexer_topk: 2048
I0425 09:29:58.452839 140339395942208 pyconfig.py:471] Config param inference_benchmark_test: False
I0425 09:29:58.452855 140339395942208 pyconfig.py:471] Config param inference_metadata_file: 
I0425 09:29:58.452869 140339395942208 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0425 09:29:58.452884 140339395942208 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0425 09:29:58.452899 140339395942208 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0425 09:29:58.452915 140339395942208 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0425 09:29:58.452929 140339395942208 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0425 09:29:58.452945 140339395942208 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0425 09:29:58.452960 140339395942208 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0425 09:29:58.452975 140339395942208 pyconfig.py:471] Config param init_weights_seed: 0
I0425 09:29:58.452990 140339395942208 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0425 09:29:58.453006 140339395942208 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0425 09:29:58.453021 140339395942208 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0425 09:29:58.453037 140339395942208 pyconfig.py:471] Config param internal_compile: False
I0425 09:29:58.453051 140339395942208 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0425 09:29:58.453066 140339395942208 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0425 09:29:58.453080 140339395942208 pyconfig.py:471] Config param jax_debug_log_modules: 
I0425 09:29:58.453224 140339395942208 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0425 09:29:58.453246 140339395942208 pyconfig.py:471] Config param jax_profiler_port: 9999
I0425 09:29:58.453261 140339395942208 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0425 09:29:58.453278 140339395942208 pyconfig.py:471] Config param kv_cache_buffer: 256
I0425 09:29:58.453292 140339395942208 pyconfig.py:471] Config param kv_lora_rank: 512
I0425 09:29:58.453307 140339395942208 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0425 09:29:58.453330 140339395942208 pyconfig.py:471] Config param kv_quant_dtype: int8
I0425 09:29:58.453344 140339395942208 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0425 09:29:58.453361 140339395942208 pyconfig.py:471] Config param learning_rate: 0.0002
I0425 09:29:58.453376 140339395942208 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0425 09:29:58.453392 140339395942208 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0425 09:29:58.453407 140339395942208 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0425 09:29:58.453423 140339395942208 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0425 09:29:58.453437 140339395942208 pyconfig.py:471] Config param load_from_prefill_dir: False
I0425 09:29:58.453453 140339395942208 pyconfig.py:471] Config param load_full_state_path: 
I0425 09:29:58.453467 140339395942208 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 09:29:58.453483 140339395942208 pyconfig.py:471] Config param local_checkpoint_directory: 
I0425 09:29:58.453500 140339395942208 pyconfig.py:471] Config param local_checkpoint_period: 0
I0425 09:29:58.453515 140339395942208 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0425 09:29:58.453529 140339395942208 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0425 09:29:58.453545 140339395942208 pyconfig.py:471] Config param log_config: True
I0425 09:29:58.453559 140339395942208 pyconfig.py:471] Config param log_period: 10
I0425 09:29:58.453575 140339395942208 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), 
('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), 
('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0425 09:29:58.453650 140339395942208 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0425 09:29:58.453667 140339395942208 pyconfig.py:471] Config param logits_via_embedding: True
I0425 09:29:58.453683 140339395942208 pyconfig.py:471] Config param lora_input_adapters_path: 
I0425 09:29:58.453698 140339395942208 pyconfig.py:471] Config param loss_algo: grpo
I0425 09:29:58.453712 140339395942208 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0425 09:29:58.453729 140339395942208 pyconfig.py:471] Config param managed_mldiagnostics: False
I0425 09:29:58.453744 140339395942208 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/managed-mldiagnostics
I0425 09:29:58.453759 140339395942208 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0425 09:29:58.453774 140339395942208 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0425 09:29:58.453792 140339395942208 pyconfig.py:471] Config param max_checkify: False
I0425 09:29:58.453806 140339395942208 pyconfig.py:471] Config param max_concurrency: 256
I0425 09:29:58.453821 140339395942208 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0425 09:29:58.453836 140339395942208 pyconfig.py:471] Config param max_num_batched_tokens: None
I0425 09:29:58.453852 140339395942208 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0425 09:29:58.453866 140339395942208 pyconfig.py:471] Config param max_num_images_per_example: -1
I0425 09:29:58.453882 140339395942208 pyconfig.py:471] Config param max_num_seqs: None
I0425 09:29:58.453896 140339395942208 pyconfig.py:471] Config param max_position_embeddings: 163840
I0425 09:29:58.453911 140339395942208 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0425 09:29:58.453926 140339395942208 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0425 09:29:58.453942 140339395942208 pyconfig.py:471] Config param max_segments_per_seq: -1
I0425 09:29:58.453956 140339395942208 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0425 09:29:58.453971 140339395942208 pyconfig.py:471] Config param max_target_length: 2048
I0425 09:29:58.453985 140339395942208 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0425 09:29:58.454001 140339395942208 pyconfig.py:471] Config param megablox: True
I0425 09:29:58.454016 140339395942208 pyconfig.py:471] Config param merge_gating_gmm: False
I0425 09:29:58.454032 140339395942208 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0425 09:29:58.454051 140339395942208 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/metrics/
I0425 09:29:58.454068 140339395942208 pyconfig.py:471] Config param metrics_file: 
I0425 09:29:58.454084 140339395942208 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0425 09:29:58.454183 140339395942208 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0425 09:29:58.454200 140339395942208 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0425 09:29:58.454217 140339395942208 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0425 09:29:58.454234 140339395942208 pyconfig.py:471] Config param mla_naive_kvcache: True
I0425 09:29:58.454249 140339395942208 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0425 09:29:58.454264 140339395942208 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0425 09:29:58.454280 140339395942208 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0425 09:29:58.454295 140339395942208 pyconfig.py:471] Config param mlp_bias: False
I0425 09:29:58.454311 140339395942208 pyconfig.py:471] Config param mlp_dim: 64
I0425 09:29:58.454331 140339395942208 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0425 09:29:58.454345 140339395942208 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0425 09:29:58.454362 140339395942208 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0425 09:29:58.454376 140339395942208 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0425 09:29:58.454392 140339395942208 pyconfig.py:471] Config param moba: False
I0425 09:29:58.454409 140339395942208 pyconfig.py:471] Config param moba_chunk_size: 1024
I0425 09:29:58.454424 140339395942208 pyconfig.py:471] Config param moba_topk: 8
I0425 09:29:58.454438 140339395942208 pyconfig.py:471] Config param model_call_mode: 
I0425 09:29:58.454454 140339395942208 pyconfig.py:471] Config param model_name: gpt3-52k
I0425 09:29:58.454468 140339395942208 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0425 09:29:58.454484 140339395942208 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0425 09:29:58.454499 140339395942208 pyconfig.py:471] Config param moe_mlp_dim: -1
I0425 09:29:58.454514 140339395942208 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0425 09:29:58.454529 140339395942208 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0425 09:29:58.454545 140339395942208 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0425 09:29:58.454563 140339395942208 pyconfig.py:471] Config param monitor_goodput: False
I0425 09:29:58.454588 140339395942208 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0425 09:29:58.454615 140339395942208 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0425 09:29:58.454643 140339395942208 pyconfig.py:471] Config param mscale: 1.0
I0425 09:29:58.454669 140339395942208 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0425 09:29:58.454692 140339395942208 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0425 09:29:58.454707 140339395942208 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0425 09:29:58.454724 140339395942208 pyconfig.py:471] Config param mtp_num_layers: 0
I0425 09:29:58.454738 140339395942208 pyconfig.py:471] Config param mu_dtype: float32
I0425 09:29:58.454765 140339395942208 pyconfig.py:471] Config param multi_sampling: False
I0425 09:29:58.454781 140339395942208 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0425 09:29:58.454797 140339395942208 pyconfig.py:471] Config param muon_beta: 0.95
I0425 09:29:58.454812 140339395942208 pyconfig.py:471] Config param muon_consistent_rms: None
I0425 09:29:58.454828 140339395942208 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0425 09:29:58.454842 140339395942208 pyconfig.py:471] Config param n_routing_groups: -1
I0425 09:29:58.454858 140339395942208 pyconfig.py:471] Config param n_window_for_audio: 50
I0425 09:29:58.454872 140339395942208 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0425 09:29:58.454888 140339395942208 pyconfig.py:471] Config param nope_layer_interval: -1
I0425 09:29:58.454903 140339395942208 pyconfig.py:471] Config param norm_topk_prob: False
I0425 09:29:58.454917 140339395942208 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0425 09:29:58.454935 140339395942208 pyconfig.py:471] Config param normalize_embedding_logits: False
I0425 09:29:58.454952 140339395942208 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0425 09:29:58.454967 140339395942208 pyconfig.py:471] Config param num_batches: 4
I0425 09:29:58.454982 140339395942208 pyconfig.py:471] Config param num_channels_for_vit: 3
I0425 09:29:58.454997 140339395942208 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0425 09:29:58.455012 140339395942208 pyconfig.py:471] Config param num_decoder_layers: 1
I0425 09:29:58.455026 140339395942208 pyconfig.py:471] Config param num_diloco_replicas: 1
I0425 09:29:58.455042 140339395942208 pyconfig.py:471] Config param num_epoch: 1
I0425 09:29:58.455056 140339395942208 pyconfig.py:471] Config param num_eval_passes: 1
I0425 09:29:58.455072 140339395942208 pyconfig.py:471] Config param num_experts: 1
I0425 09:29:58.455086 140339395942208 pyconfig.py:471] Config param num_experts_per_tok: 1
I0425 09:29:58.455113 140339395942208 pyconfig.py:471] Config param num_generations: 2
I0425 09:29:58.455127 140339395942208 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0425 09:29:58.455142 140339395942208 pyconfig.py:471] Config param num_iterations: 1
I0425 09:29:58.455157 140339395942208 pyconfig.py:471] Config param num_kv_heads: 2
I0425 09:29:58.455179 140339395942208 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0425 09:29:58.455205 140339395942208 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0425 09:29:58.455230 140339395942208 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0425 09:29:58.455252 140339395942208 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0425 09:29:58.455269 140339395942208 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0425 09:29:58.455283 140339395942208 pyconfig.py:471] Config param num_query_heads: 2
I0425 09:29:58.455298 140339395942208 pyconfig.py:471] Config param num_samplers_slices: -1
I0425 09:29:58.455318 140339395942208 pyconfig.py:471] Config param num_slices: 1
I0425 09:29:58.455340 140339395942208 pyconfig.py:471] Config param num_target_devices: 32
I0425 09:29:58.455365 140339395942208 pyconfig.py:471] Config param num_test_batches: 5
I0425 09:29:58.455390 140339395942208 pyconfig.py:471] Config param num_trainer_slices: -1
I0425 09:29:58.455416 140339395942208 pyconfig.py:471] Config param num_vocab_tiling: 1
I0425 09:29:58.455443 140339395942208 pyconfig.py:471] Config param off_policy_steps: 0
I0425 09:29:58.455467 140339395942208 pyconfig.py:471] Config param offline_data_dir: None
I0425 09:29:58.455492 140339395942208 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0425 09:29:58.455521 140339395942208 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0425 09:29:58.455546 140339395942208 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0425 09:29:58.455572 140339395942208 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0425 09:29:58.455598 140339395942208 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0425 09:29:58.455622 140339395942208 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0425 09:29:58.455649 140339395942208 pyconfig.py:471] Config param output_dim_for_audio: 512
I0425 09:29:58.455673 140339395942208 pyconfig.py:471] Config param override_logical_axis_rules: False
I0425 09:29:58.455698 140339395942208 pyconfig.py:471] Config param override_model_config: True
I0425 09:29:58.455724 140339395942208 pyconfig.py:471] Config param packing: True
I0425 09:29:58.455748 140339395942208 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0425 09:29:58.455773 140339395942208 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0425 09:29:58.455798 140339395942208 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0425 09:29:58.455823 140339395942208 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0425 09:29:58.455848 140339395942208 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0425 09:29:58.455873 140339395942208 pyconfig.py:471] Config param param_scan_axis: 1
I0425 09:29:58.455897 140339395942208 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0425 09:29:58.455922 140339395942208 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0425 09:29:58.455948 140339395942208 pyconfig.py:471] Config param patch_size_for_vit: 14
I0425 09:29:58.455973 140339395942208 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0425 09:29:58.455998 140339395942208 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0425 09:29:58.456024 140339395942208 pyconfig.py:471] Config param per_device_batch_size: 2
I0425 09:29:58.456048 140339395942208 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0425 09:29:58.456073 140339395942208 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0425 09:29:58.456113 140339395942208 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0425 09:29:58.456139 140339395942208 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0425 09:29:58.456163 140339395942208 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0425 09:29:58.456188 140339395942208 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0425 09:29:58.456214 140339395942208 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0425 09:29:58.456240 140339395942208 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0425 09:29:58.456265 140339395942208 pyconfig.py:471] Config param position_id_per_seconds: 25
I0425 09:29:58.456290 140339395942208 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0425 09:29:58.456319 140339395942208 pyconfig.py:471] Config param prefill_cache_dir: 
I0425 09:29:58.456344 140339395942208 pyconfig.py:471] Config param prefill_chunk_size: 256
I0425 09:29:58.456368 140339395942208 pyconfig.py:471] Config param prefill_slice: v5e-16
I0425 09:29:58.456393 140339395942208 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0425 09:29:58.456418 140339395942208 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0425 09:29:58.456444 140339395942208 pyconfig.py:471] Config param prefuse_moe_weights: False
I0425 09:29:58.456469 140339395942208 pyconfig.py:471] Config param profile_cleanly: True
I0425 09:29:58.456497 140339395942208 pyconfig.py:471] Config param profile_periodically_period: -1
I0425 09:29:58.456524 140339395942208 pyconfig.py:471] Config param profile_power_events: False
I0425 09:29:58.456548 140339395942208 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0425 09:29:58.456577 140339395942208 pyconfig.py:471] Config param profiler_steps: 5
I0425 09:29:58.456602 140339395942208 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0425 09:29:58.456627 140339395942208 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0425 09:29:58.456652 140339395942208 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0425 09:29:58.456677 140339395942208 pyconfig.py:471] Config param prometheus_port: 0
I0425 09:29:58.456702 140339395942208 pyconfig.py:471] Config param prompt: I love to
I0425 09:29:58.456727 140339395942208 pyconfig.py:471] Config param pure_nnx: False
I0425 09:29:58.456752 140339395942208 pyconfig.py:471] Config param pure_nnx_decoder: False
I0425 09:29:58.456777 140339395942208 pyconfig.py:471] Config param q_lora_rank: 0
I0425 09:29:58.456801 140339395942208 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0425 09:29:58.456826 140339395942208 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0425 09:29:58.456851 140339395942208 pyconfig.py:471] Config param qk_norm_with_scale: True
I0425 09:29:58.456876 140339395942208 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0425 09:29:58.456901 140339395942208 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0425 09:29:58.456927 140339395942208 pyconfig.py:471] Config param quant_cfg_path: 
I0425 09:29:58.456952 140339395942208 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0425 09:29:58.456981 140339395942208 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0425 09:29:58.457007 140339395942208 pyconfig.py:471] Config param quantize_kvcache: False
I0425 09:29:58.457031 140339395942208 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0425 09:29:58.457057 140339395942208 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0425 09:29:58.457084 140339395942208 pyconfig.py:471] Config param ragged_block_size: 256
I0425 09:29:58.457126 140339395942208 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0425 09:29:58.457151 140339395942208 pyconfig.py:471] Config param rampup_end_step: 0
I0425 09:29:58.457168 140339395942208 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0425 09:29:58.457184 140339395942208 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0425 09:29:58.457200 140339395942208 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0425 09:29:58.457216 140339395942208 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0425 09:29:58.457230 140339395942208 pyconfig.py:471] Config param remat_policy: full
I0425 09:29:58.457253 140339395942208 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0425 09:29:58.457278 140339395942208 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0425 09:29:58.457303 140339395942208 pyconfig.py:471] Config param replicate_quant_scale: False
I0425 09:29:58.457333 140339395942208 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0425 09:29:58.457358 140339395942208 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0425 09:29:58.457383 140339395942208 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0425 09:29:58.457407 140339395942208 pyconfig.py:471] Config param reshape_q: False
I0425 09:29:58.457432 140339395942208 pyconfig.py:471] Config param return_log_prob: False
I0425 09:29:58.457458 140339395942208 pyconfig.py:471] Config param reuse_example_batch: 0
I0425 09:29:58.457483 140339395942208 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0425 09:29:58.457506 140339395942208 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0425 09:29:58.457529 140339395942208 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0425 09:29:58.457556 140339395942208 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0425 09:29:58.457581 140339395942208 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0425 09:29:58.457606 140339395942208 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0425 09:29:58.457633 140339395942208 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0425 09:29:58.457664 140339395942208 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0425 09:29:58.457690 140339395942208 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0425 09:29:58.457716 140339395942208 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0425 09:29:58.457741 140339395942208 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0425 09:29:58.457766 140339395942208 pyconfig.py:471] Config param rope_attention_scaling: False
I0425 09:29:58.457792 140339395942208 pyconfig.py:471] Config param rope_factor: 40
I0425 09:29:58.457817 140339395942208 pyconfig.py:471] Config param rope_interleave: True
I0425 09:29:58.457842 140339395942208 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0425 09:29:58.457868 140339395942208 pyconfig.py:471] Config param rope_max_timescale: 10000
I0425 09:29:58.457892 140339395942208 pyconfig.py:471] Config param rope_min_timescale: 1
I0425 09:29:58.457917 140339395942208 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0425 09:29:58.457942 140339395942208 pyconfig.py:471] Config param rope_truncate: True
I0425 09:29:58.457967 140339395942208 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0425 09:29:58.457994 140339395942208 pyconfig.py:471] Config param rope_use_scale: True
I0425 09:29:58.458019 140339395942208 pyconfig.py:471] Config param routed_bias: False
I0425 09:29:58.458044 140339395942208 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0425 09:29:58.458070 140339395942208 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0425 09:29:58.458106 140339395942208 pyconfig.py:471] Config param routed_score_func: 
I0425 09:29:58.458132 140339395942208 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-09-29
I0425 09:29:58.458157 140339395942208 pyconfig.py:471] Config param sa_block_kv: 512
I0425 09:29:58.458181 140339395942208 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0425 09:29:58.458206 140339395942208 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0425 09:29:58.458231 140339395942208 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0425 09:29:58.458255 140339395942208 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0425 09:29:58.458279 140339395942208 pyconfig.py:471] Config param sa_block_q: 512
I0425 09:29:58.458303 140339395942208 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0425 09:29:58.458331 140339395942208 pyconfig.py:471] Config param sa_block_q_dq: 512
I0425 09:29:58.458357 140339395942208 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0425 09:29:58.458381 140339395942208 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0425 09:29:58.458406 140339395942208 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0425 09:29:58.458431 140339395942208 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0425 09:29:58.458457 140339395942208 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0425 09:29:58.458484 140339395942208 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0425 09:29:58.458508 140339395942208 pyconfig.py:471] Config param save_config_to_gcs: False
I0425 09:29:58.458533 140339395942208 pyconfig.py:471] Config param save_quantized_params_path: 
I0425 09:29:58.458556 140339395942208 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0425 09:29:58.458580 140339395942208 pyconfig.py:471] Config param scan_layers: True
I0425 09:29:58.458606 140339395942208 pyconfig.py:471] Config param scan_layers_per_stage: False
I0425 09:29:58.458632 140339395942208 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0425 09:29:58.458657 140339395942208 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0425 09:29:58.458681 140339395942208 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0425 09:29:58.458705 140339395942208 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0425 09:29:58.458730 140339395942208 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0425 09:29:58.458755 140339395942208 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0425 09:29:58.458779 140339395942208 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0425 09:29:58.458806 140339395942208 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0425 09:29:58.458831 140339395942208 pyconfig.py:471] Config param sharding_strategy: None
I0425 09:29:58.458856 140339395942208 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0425 09:29:58.458882 140339395942208 pyconfig.py:471] Config param shardy: True
I0425 09:29:58.458907 140339395942208 pyconfig.py:471] Config param share_kv_projections: False
I0425 09:29:58.458932 140339395942208 pyconfig.py:471] Config param shared_experts: 0
I0425 09:29:58.458956 140339395942208 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0425 09:29:58.458981 140339395942208 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0425 09:29:58.459006 140339395942208 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0425 09:29:58.459030 140339395942208 pyconfig.py:471] Config param skip_step_interval: 128
I0425 09:29:58.459054 140339395942208 pyconfig.py:471] Config param skip_step_on_spikes: False
I0425 09:29:58.459079 140339395942208 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0425 09:29:58.459115 140339395942208 pyconfig.py:471] Config param sliding_window_size: 0
I0425 09:29:58.459141 140339395942208 pyconfig.py:471] Config param solution_end_token: </answer>
I0425 09:29:58.459166 140339395942208 pyconfig.py:471] Config param solution_start_token: <answer>
I0425 09:29:58.459191 140339395942208 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0425 09:29:58.459215 140339395942208 pyconfig.py:471] Config param sparse_matmul: True
I0425 09:29:58.459240 140339395942208 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0425 09:29:58.459264 140339395942208 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0425 09:29:58.459288 140339395942208 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0425 09:29:58.459317 140339395942208 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0425 09:29:58.459343 140339395942208 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0425 09:29:58.459367 140339395942208 pyconfig.py:471] Config param steps: 200000
I0425 09:29:58.459392 140339395942208 pyconfig.py:471] Config param stop_strings: None
I0425 09:29:58.459416 140339395942208 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0425 09:29:58.459439 140339395942208 pyconfig.py:471] Config param student_params_to_update: None
I0425 09:29:58.459461 140339395942208 pyconfig.py:471] Config param subslice_shape: 
I0425 09:29:58.459479 140339395942208 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0425 09:29:58.459495 140339395942208 pyconfig.py:471] Config param system_prompt: 
I0425 09:29:58.459509 140339395942208 pyconfig.py:471] Config param target_eval_loss: 0.0
I0425 09:29:58.459525 140339395942208 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0425 09:29:58.459540 140339395942208 pyconfig.py:471] Config param temperature_tuning: False
I0425 09:29:58.459556 140339395942208 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0425 09:29:58.459571 140339395942208 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-29/tensorboard/
I0425 09:29:58.459587 140339395942208 pyconfig.py:471] Config param tensors_on_device: None
I0425 09:29:58.459602 140339395942208 pyconfig.py:471] Config param tensors_to_offload: None
I0425 09:29:58.459619 140339395942208 pyconfig.py:471] Config param test_batch_start_index: 0
I0425 09:29:58.459635 140339395942208 pyconfig.py:471] Config param tile_size_for_vit: 336
I0425 09:29:58.459651 140339395942208 pyconfig.py:471] Config param tokenize_eval_data: True
I0425 09:29:58.459667 140339395942208 pyconfig.py:471] Config param tokenize_train_data: True
I0425 09:29:58.459682 140339395942208 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0425 09:29:58.459698 140339395942208 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0425 09:29:58.459716 140339395942208 pyconfig.py:471] Config param topk_routing_group: -1
I0425 09:29:58.459731 140339395942208 pyconfig.py:471] Config param train_data_columns: ['text']
I0425 09:29:58.459747 140339395942208 pyconfig.py:471] Config param train_fraction: 1.0
I0425 09:29:58.459764 140339395942208 pyconfig.py:471] Config param train_image_column: image
I0425 09:29:58.459780 140339395942208 pyconfig.py:471] Config param train_micro_batch_size: -1
I0425 09:29:58.459803 140339395942208 pyconfig.py:471] Config param train_split: train
I0425 09:29:58.459829 140339395942208 pyconfig.py:471] Config param trainable_parameters_mask: []
I0425 09:29:58.459854 140339395942208 pyconfig.py:471] Config param trainable_position_size: 2048
I0425 09:29:58.459879 140339395942208 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0425 09:29:58.459906 140339395942208 pyconfig.py:471] Config param upload_all_profiler_results: False
I0425 09:29:58.459931 140339395942208 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0425 09:29:58.459956 140339395942208 pyconfig.py:471] Config param use_agentic_rollout: False
I0425 09:29:58.459981 140339395942208 pyconfig.py:471] Config param use_audio: False
I0425 09:29:58.460005 140339395942208 pyconfig.py:471] Config param use_audio_in_video: False
I0425 09:29:58.460029 140339395942208 pyconfig.py:471] Config param use_batch_split_schedule: False
I0425 09:29:58.460054 140339395942208 pyconfig.py:471] Config param use_chat_template: False
I0425 09:29:58.460078 140339395942208 pyconfig.py:471] Config param use_chunked_prefill: False
I0425 09:29:58.460112 140339395942208 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0425 09:29:58.460138 140339395942208 pyconfig.py:471] Config param use_dpo: False
I0425 09:29:58.460162 140339395942208 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0425 09:29:58.460187 140339395942208 pyconfig.py:471] Config param use_grpo: True
I0425 09:29:58.460211 140339395942208 pyconfig.py:471] Config param use_indexer: False
I0425 09:29:58.460236 140339395942208 pyconfig.py:471] Config param use_iota_embed: True
I0425 09:29:58.460260 140339395942208 pyconfig.py:471] Config param use_jax_splash: False
I0425 09:29:58.460284 140339395942208 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0425 09:29:58.460309 140339395942208 pyconfig.py:471] Config param use_mrope: False
I0425 09:29:58.460338 140339395942208 pyconfig.py:471] Config param use_multimodal: False
I0425 09:29:58.460362 140339395942208 pyconfig.py:471] Config param use_pathways: True
I0425 09:29:58.460386 140339395942208 pyconfig.py:471] Config param use_post_attn_norm: False
I0425 09:29:58.460411 140339395942208 pyconfig.py:471] Config param use_post_ffw_norm: False
I0425 09:29:58.460435 140339395942208 pyconfig.py:471] Config param use_qk_clip: False
I0425 09:29:58.460459 140339395942208 pyconfig.py:471] Config param use_qk_norm: False
I0425 09:29:58.460483 140339395942208 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0425 09:29:58.460507 140339395942208 pyconfig.py:471] Config param use_qwix_quantization: False
I0425 09:29:58.460531 140339395942208 pyconfig.py:471] Config param use_ragged_attention: False
I0425 09:29:58.460556 140339395942208 pyconfig.py:471] Config param use_random_routing: False
I0425 09:29:58.460580 140339395942208 pyconfig.py:471] Config param use_replicator_service: False
I0425 09:29:58.460604 140339395942208 pyconfig.py:471] Config param use_ring_of_experts: False
I0425 09:29:58.460628 140339395942208 pyconfig.py:471] Config param use_sft: False
I0425 09:29:58.460653 140339395942208 pyconfig.py:471] Config param use_splash_scheduler: False
I0425 09:29:58.460678 140339395942208 pyconfig.py:471] Config param use_tokamax_gmm: False
I0425 09:29:58.460702 140339395942208 pyconfig.py:471] Config param use_tokamax_splash: False
I0425 09:29:58.460728 140339395942208 pyconfig.py:471] Config param use_truncation: True
I0425 09:29:58.460753 140339395942208 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0425 09:29:58.460777 140339395942208 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0425 09:29:58.460802 140339395942208 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0425 09:29:58.460826 140339395942208 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0425 09:29:58.460851 140339395942208 pyconfig.py:471] Config param v_head_dim: 128
I0425 09:29:58.460876 140339395942208 pyconfig.py:471] Config param v_norm_with_scale: True
I0425 09:29:58.460901 140339395942208 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0425 09:29:58.460926 140339395942208 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0425 09:29:58.460951 140339395942208 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0425 09:29:58.460975 140339395942208 pyconfig.py:471] Config param video_path: 
I0425 09:29:58.460999 140339395942208 pyconfig.py:471] Config param video_placeholder: <|video|>
I0425 09:29:58.461023 140339395942208 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0425 09:29:58.461047 140339395942208 pyconfig.py:471] Config param vision_output_length: -1
I0425 09:29:58.461071 140339395942208 pyconfig.py:471] Config param vllm_additional_config: {}
I0425 09:29:58.461090 140339395942208 pyconfig.py:471] Config param vllm_hf_config_path: 
I0425 09:29:58.461119 140339395942208 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0425 09:29:58.461145 140339395942208 pyconfig.py:471] Config param vocab_size: 32000
I0425 09:29:58.461169 140339395942208 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0425 09:29:58.461195 140339395942208 pyconfig.py:471] Config param weight_dtype: float32
I0425 09:29:58.461231 140339395942208 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0425 09:29:58.461249 140339395942208 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0425 09:29:58.461267 140339395942208 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0425 09:29:58.461293 140339395942208 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0425 09:29:58.461323 140339395942208 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0425 09:29:58.461348 140339395942208 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0425 09:29:58.461371 140339395942208 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0425 09:29:58.461395 140339395942208 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0425 09:29:58.461418 140339395942208 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0425 09:29:58.461439 140339395942208 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0425 09:29:58.461457 140339395942208 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0425 09:29:58.461472 140339395942208 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0425 09:29:58.461488 140339395942208 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0425 09:29:58.461502 140339395942208 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0425 09:29:58.461520 140339395942208 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0425 09:29:58.461544 140339395942208 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0425 09:29:58.461568 140339395942208 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0425 09:29:58.461591 140339395942208 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0425 09:29:58.461616 140339395942208 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0425 09:29:58.461641 140339395942208 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0425 09:29:58.461666 140339395942208 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0425 09:29:58.461694 140339395942208 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0425 09:29:58.461716 140339395942208 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0425 09:29:58.461740 140339395942208 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0425 09:29:58.461764 140339395942208 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0425 09:29:58.461789 140339395942208 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0425 09:29:58.462140 140339395942208 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0425 09:29:58.462177 140339395942208 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0425 09:29:58.650456 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 09:29:58.760344 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 09:29:58.883017 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 09:29:58.989347 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 09:29:59.097256 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 09:29:59.201164 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0425 09:29:59.313810 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0425 09:29:59.423926 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0425 09:30:00.057919 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 09:30:00.170363 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 09:30:00.589727 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0425 09:30:00.699877 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 09:30:00.810192 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 09:30:00.919211 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0425 09:30:01.023324 140339395942208 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0425 09:30:01.030192 140339395942208 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0425 09:30:01.030342 140339395942208 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0425 09:30:01.030416 140339395942208 train_distill.py:586] Loading Student from ...
I0425 09:30:01.030446 140339395942208 train_distill.py:170] --- Student Configuration ---
I0425 09:30:01.030468 140339395942208 train_distill.py:171]   Model Name:      gpt3-52k
I0425 09:30:01.030491 140339395942208 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 09:30:01.030510 140339395942208 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 09:30:01.030529 140339395942208 train_distill.py:176]   Vocab Size:      32000
I0425 09:30:01.030547 140339395942208 train_distill.py:177]   Checkpoint:      
I0425 09:30:01.030566 140339395942208 train_distill.py:451] Initializing model: gpt3-52k...
I0425 09:30:02.780235 140339395942208 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0425 09:30:02.780345 140339395942208 train_distill.py:170] --- Teacher Configuration ---
I0425 09:30:02.780374 140339395942208 train_distill.py:171]   Model Name:      gpt3-52k
I0425 09:30:02.780404 140339395942208 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0425 09:30:02.780435 140339395942208 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0425 09:30:02.780462 140339395942208 train_distill.py:176]   Vocab Size:      32000
I0425 09:30:02.780492 140339395942208 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 09:30:02.780520 140339395942208 train_distill.py:451] Initializing model: gpt3-52k...
I0425 09:30:03.777762 140339395942208 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 09:30:03.777917 140339395942208 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fa2979f3a40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 09:30:03.777975 140339395942208 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0425 09:30:04.271780 140339395942208 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0425 09:30:04.807118    1961 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0425 09:30:05.941786 140339395942208 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0425 09:30:08.112149 140339395942208 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0425 09:30:08.112513 140339395942208 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0425 09:30:10.954910 140339395942208 checkpointer.py:318] Finished restoring checkpoint in 5.39 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0425 09:30:11.682890 140339395942208 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0425 09:30:11.745497 140339395942208 config.py:112] TensorFlow version 2.20.0 available.
I0425 09:30:11.745982 140339395942208 config.py:125] JAX version 0.9.2 available.
I0425 09:30:12.160791 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0425 09:30:12.169998 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 09:30:12.179620 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 09:30:12.286998 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 09:30:12.598653 140339395942208 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 09:30:12.712517 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0425 09:30:12.829390 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0425 09:30:13.022998 140339395942208 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0425 09:30:13.131421 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0425 09:30:13.246297 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0425 09:30:13.385432 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0425 09:30:13.549421 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 09:30:13.664545 140339395942208 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 09:30:13.778771 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 09:30:13.888999 140339395942208 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0425 09:30:13.982915 140339395942208 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0425 09:30:13.983140 140339395942208 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0425 09:30:13.986122 140339395942208 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0425 09:30:13.986182 140339395942208 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0425 09:30:13.986246 140339395942208 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 09:30:13.986329 140339395942208 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fa2979f3a40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 09:30:13.986369 140339395942208 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 09:30:13.986400 140339395942208 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fa2979f3a40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 09:30:13.986442 140339395942208 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f9ce0dae960>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f8b08eb7380>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f8b08eb72c0>}, handler_registry=None
I0425 09:30:13.986636 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f9ce0dae960>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 09:30:13.986676 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f8b08eb7380>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 09:30:13.986702 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f8b08eb72c0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 09:30:13.986726 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f9cc18d82c0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 09:30:13.986752 140339395942208 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f9ce0dae960>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f9ce0dae960>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f8b08eb7380>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f8b08eb7380>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f8b08eb72c0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f8b08eb72c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f9cc18d82c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f9cc18d82c0>}).
I0425 09:30:13.987126 140339395942208 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f89b43d7740> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 09:30:15.579421 140339395942208 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints
I0425 09:30:16.014481 140339395942208 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f8b08eb7290>
I0425 09:30:16.014650 140339395942208 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 09:30:16.014719 140339395942208 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fa2979f3a40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 09:30:16.014758 140339395942208 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 09:30:16.014790 140339395942208 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7fa2979f3a40>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 09:30:16.014826 140339395942208 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0425 09:30:16.014883 140339395942208 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140339395942208 count=1 at 0x7f89b4761380>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f8b08eb70b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f8b08eb7080>, _write_futures=[])
I0425 09:30:16.015355 140339395942208 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140339395942208 count=1 at 0x7f89b4761380>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f8b08eb70b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f8b08eb7080>, _write_futures=[])
I0425 09:30:16.015386 140339395942208 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140339395942208 count=1 at 0x7f89b4761380>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f8b08eb70b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f8b08eb7080>, _write_futures=[])
I0425 09:30:16.015419 140339395942208 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f89dc0d9370>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f87e6fdf140>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fdf1a0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f87e6fdf440>}, handler_registry=None
I0425 09:30:16.015656 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f89dc0d9370>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 09:30:16.015708 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f87e6fdf140>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 09:30:16.015735 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fdf1a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 09:30:16.015765 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f87e6fdf440>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0425 09:30:16.015789 140339395942208 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fde930>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 09:30:16.015814 140339395942208 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f89dc0d9370>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f89dc0d9370>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f87e6fdf140>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f87e6fdf140>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fdf1a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fdf1a0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f87e6fdf440>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f87e6fdf440>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fde930>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f87e6fde930>}).
I0425 09:30:16.015890 140339395942208 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f89b43d7880> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 09:30:16.401747 140339395942208 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints
I0425 09:30:16.833690 140339395942208 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f9ce97400b0>
I0425 09:30:16.834284 140339395942208 train_distill.py:677] Starting Distillation Training...
I0425 09:30:16.834399 140339395942208 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0425 09:30:17.294514 140339395942208 peft_trainer.py:594] Compiled train_step cache size: 0
I0425 09:30:17.296142 140185534699264 grain_pool.py:367] Grain pool will use 1 processes.
I0425 09:30:17.352389 140185534699264 grain_pool.py:440] Grain pool will start child processes.
I0425 09:30:17.358146 140185534699264 grain_pool.py:448] Grain pool started all child processes.
2026-04-25 09:30:23.833483: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0)}
I0425 09:30:27.992850 140185534699264 grain_pool.py:542] Grain pool is exiting.
I0425 09:30:27.992951 140185534699264 grain_pool.py:547] Shutting down multiprocessing system.
I0425 09:30:29.705719 140185534699264 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 09:30:38 UTC 2026
EXIT_CODE=1