
Log Summary

XPK Start: Fri Apr 24 09:31:26 UTC 2026
2026-04-24 09:31:44.129456: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
I0424 09:31:50.473319 132915196512064 max_utils.py:273] Attempting to initialize the jax distributed system...
I0424 09:31:59.513968 132915196512064 distributed.py:149] Starting JAX distributed service on [::]:8482
I0424 09:31:59.516143 132915196512064 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-9oq9k-slice-job-0-0.mt-07-distill-smoke-9oq9k:8482
I0424 09:32:00.618337 132915196512064 max_utils.py:284] Jax distributed system initialized!
I0424 09:32:07.016480 132915196512064 max_utils.py:244] Jax distributed system is already initialized.
W0424 09:32:07.146882 132915196512064 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 09:32:07.207059 132915196512064 max_utils.py:244] Jax distributed system is already initialized.
I0424 09:32:07.208300 132915196512064 pyconfig.py:471] Config param abort_on_inf_loss: True
I0424 09:32:07.208349 132915196512064 pyconfig.py:471] Config param abort_on_nan_loss: True
I0424 09:32:07.208375 132915196512064 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0424 09:32:07.208396 132915196512064 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0424 09:32:07.208416 132915196512064 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0424 09:32:07.208433 132915196512064 pyconfig.py:471] Config param activations_in_float32: False
I0424 09:32:07.208451 132915196512064 pyconfig.py:471] Config param adam_b1: 0.9
I0424 09:32:07.208470 132915196512064 pyconfig.py:471] Config param adam_b2: 0.95
I0424 09:32:07.208487 132915196512064 pyconfig.py:471] Config param adam_eps: 1e-08
I0424 09:32:07.208510 132915196512064 pyconfig.py:471] Config param adam_eps_root: 0.0
I0424 09:32:07.208526 132915196512064 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0424 09:32:07.208546 132915196512064 pyconfig.py:471] Config param adamw_mask: []
I0424 09:32:07.208561 132915196512064 pyconfig.py:471] Config param add_bos: True
I0424 09:32:07.208578 132915196512064 pyconfig.py:471] Config param add_eos: True
I0424 09:32:07.208594 132915196512064 pyconfig.py:471] Config param allow_split_physical_axes: False
I0424 09:32:07.208610 132915196512064 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0424 09:32:07.208626 132915196512064 pyconfig.py:471] Config param async_checkpointing: True
I0424 09:32:07.208642 132915196512064 pyconfig.py:471] Config param async_scheduling: False
I0424 09:32:07.208657 132915196512064 pyconfig.py:471] Config param attention: dot_product
I0424 09:32:07.208674 132915196512064 pyconfig.py:471] Config param attention_bias: False
I0424 09:32:07.208691 132915196512064 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0424 09:32:07.208706 132915196512064 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0424 09:32:07.208727 132915196512064 pyconfig.py:471] Config param attention_output_dim: -1
I0424 09:32:07.208742 132915196512064 pyconfig.py:471] Config param attention_sink: False
I0424 09:32:07.208758 132915196512064 pyconfig.py:471] Config param attention_type: global
I0424 09:32:07.208772 132915196512064 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0424 09:32:07.208789 132915196512064 pyconfig.py:471] Config param audio_path: 
I0424 09:32:07.208804 132915196512064 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0424 09:32:07.208820 132915196512064 pyconfig.py:471] Config param autoregressive_decode_assert: 
I0424 09:32:07.208836 132915196512064 pyconfig.py:471] Config param base_config: base.yml
I0424 09:32:07.208850 132915196512064 pyconfig.py:471] Config param base_emb_dim: 16
I0424 09:32:07.208866 132915196512064 pyconfig.py:471] Config param base_mlp_dim: 64
I0424 09:32:07.208880 132915196512064 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0424 09:32:07.208896 132915196512064 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0424 09:32:07.208912 132915196512064 pyconfig.py:471] Config param base_num_kv_heads: 2
I0424 09:32:07.208927 132915196512064 pyconfig.py:471] Config param base_num_query_heads: 2
I0424 09:32:07.208942 132915196512064 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0424 09:32:07.208957 132915196512064 pyconfig.py:471] Config param batch_size: 1
I0424 09:32:07.208972 132915196512064 pyconfig.py:471] Config param batch_split_factor: 1
I0424 09:32:07.208988 132915196512064 pyconfig.py:471] Config param beta_fast: 32
I0424 09:32:07.209002 132915196512064 pyconfig.py:471] Config param beta_slow: 1
I0424 09:32:07.209018 132915196512064 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0424 09:32:07.209035 132915196512064 pyconfig.py:471] Config param capacity_factor: -1.0
I0424 09:32:07.209051 132915196512064 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0424 09:32:07.209066 132915196512064 pyconfig.py:471] Config param chat_template: 
I0424 09:32:07.209082 132915196512064 pyconfig.py:471] Config param chat_template_path: 
I0424 09:32:07.209115 132915196512064 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0424 09:32:07.209131 132915196512064 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/checkpoints/
I0424 09:32:07.209148 132915196512064 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0424 09:32:07.209165 132915196512064 pyconfig.py:471] Config param checkpoint_period: 2000
I0424 09:32:07.209180 132915196512064 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0424 09:32:07.209197 132915196512064 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0424 09:32:07.209213 132915196512064 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0424 09:32:07.209227 132915196512064 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0424 09:32:07.209243 132915196512064 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0424 09:32:07.209257 132915196512064 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0424 09:32:07.209273 132915196512064 pyconfig.py:471] Config param chips_per_vm: 4
I0424 09:32:07.209287 132915196512064 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0424 09:32:07.209306 132915196512064 pyconfig.py:471] Config param collect_stack_trace: False
I0424 09:32:07.209448 132915196512064 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0424 09:32:07.209552 132915196512064 pyconfig.py:471] Config param colocated_python_data_input: False
I0424 09:32:07.209578 132915196512064 pyconfig.py:471] Config param compile_topology: 
I0424 09:32:07.209598 132915196512064 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0424 09:32:07.209617 132915196512064 pyconfig.py:471] Config param compile_xla_flags: 
I0424 09:32:07.209633 132915196512064 pyconfig.py:471] Config param compiled_trainstep_file: 
I0424 09:32:07.209650 132915196512064 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0424 09:32:07.209667 132915196512064 pyconfig.py:471] Config param constant_bound_config: []
I0424 09:32:07.209682 132915196512064 pyconfig.py:471] Config param context: RematLocation.REMAT
I0424 09:32:07.209704 132915196512064 pyconfig.py:471] Config param context_parallel_load_balance: True
I0424 09:32:07.209721 132915196512064 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0424 09:32:07.209743 132915196512064 pyconfig.py:471] Config param context_parallel_size: 1
I0424 09:32:07.209760 132915196512064 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0424 09:32:07.209781 132915196512064 pyconfig.py:471] Config param context_sharding: context
I0424 09:32:07.209796 132915196512064 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0424 09:32:07.209810 132915196512064 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0424 09:32:07.209824 132915196512064 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0424 09:32:07.209841 132915196512064 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0424 09:32:07.209855 132915196512064 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0424 09:32:07.209870 132915196512064 pyconfig.py:471] Config param custom_mesh: 
I0424 09:32:07.209884 132915196512064 pyconfig.py:471] Config param custom_mesh_and_rule: 
I0424 09:32:07.209900 132915196512064 pyconfig.py:471] Config param d_model_for_audio: 256
I0424 09:32:07.209914 132915196512064 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0424 09:32:07.209935 132915196512064 pyconfig.py:471] Config param data_shuffle_seed: 0
I0424 09:32:07.209950 132915196512064 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0424 09:32:07.209965 132915196512064 pyconfig.py:471] Config param dataset_path: 
I0424 09:32:07.209980 132915196512064 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0424 09:32:07.209998 132915196512064 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0424 09:32:07.210014 132915196512064 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0424 09:32:07.210029 132915196512064 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0424 09:32:07.210043 132915196512064 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0424 09:32:07.210058 132915196512064 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0424 09:32:07.210072 132915196512064 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0424 09:32:07.210088 132915196512064 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0424 09:32:07.210117 132915196512064 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0424 09:32:07.210133 132915196512064 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 09:32:07.210150 132915196512064 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0424 09:32:07.210164 132915196512064 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0424 09:32:07.210179 132915196512064 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0424 09:32:07.210193 132915196512064 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0424 09:32:07.210208 132915196512064 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0424 09:32:07.210222 132915196512064 pyconfig.py:471] Config param debug: {'rl': False}
I0424 09:32:07.210238 132915196512064 pyconfig.py:471] Config param debug_sharding: False
I0424 09:32:07.210252 132915196512064 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0424 09:32:07.210266 132915196512064 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0424 09:32:07.210283 132915196512064 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0424 09:32:07.210299 132915196512064 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0424 09:32:07.210320 132915196512064 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0424 09:32:07.210336 132915196512064 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0424 09:32:07.210353 132915196512064 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0424 09:32:07.210367 132915196512064 pyconfig.py:471] Config param degenerate_group_masking: True
I0424 09:32:07.210384 132915196512064 pyconfig.py:471] Config param dense_init_scale: 1.0
I0424 09:32:07.210399 132915196512064 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0424 09:32:07.210415 132915196512064 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0424 09:32:07.210431 132915196512064 pyconfig.py:471] Config param diloco_sync_period: 36
I0424 09:32:07.210447 132915196512064 pyconfig.py:471] Config param distill_alpha: 0.5
I0424 09:32:07.210464 132915196512064 pyconfig.py:471] Config param distill_alpha_end: None
I0424 09:32:07.210479 132915196512064 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0424 09:32:07.210494 132915196512064 pyconfig.py:471] Config param distill_beta: 0.0
I0424 09:32:07.210510 132915196512064 pyconfig.py:471] Config param distill_beta_end: None
I0424 09:32:07.210524 132915196512064 pyconfig.py:471] Config param distill_beta_schedule: constant
I0424 09:32:07.210539 132915196512064 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0424 09:32:07.210553 132915196512064 pyconfig.py:471] Config param distill_layer_indices: None
I0424 09:32:07.210569 132915196512064 pyconfig.py:471] Config param distill_temperature: 1.0
I0424 09:32:07.210583 132915196512064 pyconfig.py:471] Config param distill_temperature_end: None
I0424 09:32:07.210598 132915196512064 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0424 09:32:07.210612 132915196512064 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0424 09:32:07.210627 132915196512064 pyconfig.py:471] Config param dpo_beta: 0.1
I0424 09:32:07.210642 132915196512064 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0424 09:32:07.210657 132915196512064 pyconfig.py:471] Config param dq_reduction_steps: 0
I0424 09:32:07.210671 132915196512064 pyconfig.py:471] Config param dropout_rate: 0.0
I0424 09:32:07.210687 132915196512064 pyconfig.py:471] Config param dtype: bfloat16
I0424 09:32:07.210725 132915196512064 pyconfig.py:471] Config param dtype_mm: float32
I0424 09:32:07.210741 132915196512064 pyconfig.py:471] Config param dump_hlo: False
I0424 09:32:07.210757 132915196512064 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0424 09:32:07.210772 132915196512064 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/xla_dump
I0424 09:32:07.210787 132915196512064 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0424 09:32:07.210802 132915196512064 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0424 09:32:07.210817 132915196512064 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0424 09:32:07.210831 132915196512064 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0424 09:32:07.210846 132915196512064 pyconfig.py:471] Config param dump_hlo_xla_flags: 
I0424 09:32:07.210860 132915196512064 pyconfig.py:471] Config param dump_jaxpr: False
I0424 09:32:07.210875 132915196512064 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0424 09:32:07.210889 132915196512064 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/jaxpr_dump
I0424 09:32:07.210904 132915196512064 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0424 09:32:07.210918 132915196512064 pyconfig.py:471] Config param dump_step: -1
I0424 09:32:07.210934 132915196512064 pyconfig.py:471] Config param elastic_enabled: False
I0424 09:32:07.210948 132915196512064 pyconfig.py:471] Config param elastic_max_retries: 10
I0424 09:32:07.210964 132915196512064 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0424 09:32:07.210979 132915196512064 pyconfig.py:471] Config param emb_dim: 16
I0424 09:32:07.210994 132915196512064 pyconfig.py:471] Config param enable_autocheckpoint: False
I0424 09:32:07.211009 132915196512064 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0424 09:32:07.211025 132915196512064 pyconfig.py:471] Config param enable_checkpointing: True
I0424 09:32:07.211044 132915196512064 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0424 09:32:07.211058 132915196512064 pyconfig.py:471] Config param enable_data_shuffling: True
I0424 09:32:07.211073 132915196512064 pyconfig.py:471] Config param enable_diloco: False
I0424 09:32:07.211087 132915196512064 pyconfig.py:471] Config param enable_dp_attention: False
I0424 09:32:07.211112 132915196512064 pyconfig.py:471] Config param enable_dropout: False
I0424 09:32:07.211126 132915196512064 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0424 09:32:07.211141 132915196512064 pyconfig.py:471] Config param enable_expert_parallel: False
I0424 09:32:07.211155 132915196512064 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0424 09:32:07.211171 132915196512064 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0424 09:32:07.211185 132915196512064 pyconfig.py:471] Config param enable_goodput_recording: False
I0424 09:32:07.211200 132915196512064 pyconfig.py:471] Config param enable_jax_profiler: False
I0424 09:32:07.211215 132915196512064 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0424 09:32:07.211229 132915196512064 pyconfig.py:471] Config param enable_model_warmup: False
I0424 09:32:07.211244 132915196512064 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0424 09:32:07.211258 132915196512064 pyconfig.py:471] Config param enable_nnx: False
I0424 09:32:07.211274 132915196512064 pyconfig.py:471] Config param enable_orbax_v1: False
I0424 09:32:07.211289 132915196512064 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0424 09:32:07.211303 132915196512064 pyconfig.py:471] Config param enable_pathways_goodput: False
I0424 09:32:07.211323 132915196512064 pyconfig.py:471] Config param enable_prefix_caching: False
I0424 09:32:07.211337 132915196512064 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0424 09:32:07.211352 132915196512064 pyconfig.py:471] Config param enable_single_controller: False
I0424 09:32:07.211366 132915196512064 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0424 09:32:07.211381 132915196512064 pyconfig.py:471] Config param enable_tensorboard: True
I0424 09:32:07.211396 132915196512064 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0424 09:32:07.211410 132915196512064 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0424 09:32:07.211426 132915196512064 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0424 09:32:07.211439 132915196512064 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0424 09:32:07.211455 132915196512064 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0424 09:32:07.211470 132915196512064 pyconfig.py:471] Config param engram_head_dim: 1280
I0424 09:32:07.211485 132915196512064 pyconfig.py:471] Config param engram_kernel_size: 4
I0424 09:32:07.211499 132915196512064 pyconfig.py:471] Config param engram_layers: []
I0424 09:32:07.211514 132915196512064 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0424 09:32:07.211528 132915196512064 pyconfig.py:471] Config param engram_num_heads: 8
I0424 09:32:07.211543 132915196512064 pyconfig.py:471] Config param engram_seed: 0
I0424 09:32:07.211557 132915196512064 pyconfig.py:471] Config param engram_vocab_bases: []
I0424 09:32:07.211572 132915196512064 pyconfig.py:471] Config param epsilon_high: None
I0424 09:32:07.211587 132915196512064 pyconfig.py:471] Config param eval_corr_lst: False
I0424 09:32:07.211603 132915196512064 pyconfig.py:471] Config param eval_data_columns: ['text']
I0424 09:32:07.211617 132915196512064 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0424 09:32:07.211633 132915196512064 pyconfig.py:471] Config param eval_image_column: image
I0424 09:32:07.211647 132915196512064 pyconfig.py:471] Config param eval_interval: -1
I0424 09:32:07.211661 132915196512064 pyconfig.py:471] Config param eval_make_lst: False
I0424 09:32:07.211676 132915196512064 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0424 09:32:07.211692 132915196512064 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0424 09:32:07.211706 132915196512064 pyconfig.py:471] Config param eval_split: validation
I0424 09:32:07.211721 132915196512064 pyconfig.py:471] Config param eval_steps: -1
I0424 09:32:07.211735 132915196512064 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0424 09:32:07.211751 132915196512064 pyconfig.py:471] Config param final_logits_soft_cap: None
I0424 09:32:07.211767 132915196512064 pyconfig.py:471] Config param first_num_dense_layers: 0
I0424 09:32:07.211781 132915196512064 pyconfig.py:471] Config param float32_gate_logits: False
I0424 09:32:07.211796 132915196512064 pyconfig.py:471] Config param float32_logits: False
I0424 09:32:07.211811 132915196512064 pyconfig.py:471] Config param float32_qk_product: False
I0424 09:32:07.211826 132915196512064 pyconfig.py:471] Config param float32_weight_sum: True
I0424 09:32:07.211841 132915196512064 pyconfig.py:471] Config param force_q_layout: False
I0424 09:32:07.211855 132915196512064 pyconfig.py:471] Config param force_unroll: False
I0424 09:32:07.211870 132915196512064 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0424 09:32:07.211886 132915196512064 pyconfig.py:471] Config param formatting_func_path: 
I0424 09:32:07.211900 132915196512064 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0424 09:32:07.211915 132915196512064 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0424 09:32:07.211929 132915196512064 pyconfig.py:471] Config param fused_mlp: False
I0424 09:32:07.211944 132915196512064 pyconfig.py:471] Config param fused_qkv: True
I0424 09:32:07.211958 132915196512064 pyconfig.py:471] Config param gcs_metrics: False
I0424 09:32:07.211974 132915196512064 pyconfig.py:471] Config param gdn_chunk_size: 64
I0424 09:32:07.211988 132915196512064 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0424 09:32:07.212003 132915196512064 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0424 09:32:07.212017 132915196512064 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0424 09:32:07.212032 132915196512064 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0424 09:32:07.212046 132915196512064 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0424 09:32:07.212061 132915196512064 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0424 09:32:07.212075 132915196512064 pyconfig.py:471] Config param generate_padding_batch_train: False
I0424 09:32:07.212090 132915196512064 pyconfig.py:471] Config param generate_slice: v5e-16
I0424 09:32:07.212113 132915196512064 pyconfig.py:471] Config param generation_configs: {}
I0424 09:32:07.212128 132915196512064 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0424 09:32:07.212142 132915196512064 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0424 09:32:07.212157 132915196512064 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0424 09:32:07.212172 132915196512064 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0424 09:32:07.212187 132915196512064 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0424 09:32:07.212202 132915196512064 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0424 09:32:07.212215 132915196512064 pyconfig.py:471] Config param global_head_dim: 0
I0424 09:32:07.212231 132915196512064 pyconfig.py:471] Config param global_num_kv_heads: 0
I0424 09:32:07.212245 132915196512064 pyconfig.py:471] Config param global_parameter_scale: 1
I0424 09:32:07.212259 132915196512064 pyconfig.py:471] Config param global_rampup_samples: 500
I0424 09:32:07.212275 132915196512064 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0424 09:32:07.212289 132915196512064 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0424 09:32:07.212305 132915196512064 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0424 09:32:07.212324 132915196512064 pyconfig.py:471] Config param grad_dtype: float32
I0424 09:32:07.212365 132915196512064 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0424 09:32:07.212382 132915196512064 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0424 09:32:07.212398 132915196512064 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0424 09:32:07.212414 132915196512064 pyconfig.py:471] Config param grain_eval_files: 
I0424 09:32:07.212430 132915196512064 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0424 09:32:07.212444 132915196512064 pyconfig.py:471] Config param grain_num_threads: 16
I0424 09:32:07.212460 132915196512064 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0424 09:32:07.212474 132915196512064 pyconfig.py:471] Config param grain_packing_type: first_fit
I0424 09:32:07.212489 132915196512064 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0424 09:32:07.212505 132915196512064 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0424 09:32:07.212520 132915196512064 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0424 09:32:07.212536 132915196512064 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0424 09:32:07.212550 132915196512064 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0424 09:32:07.212566 132915196512064 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0424 09:32:07.212580 132915196512064 pyconfig.py:471] Config param grain_train_files: 
I0424 09:32:07.212595 132915196512064 pyconfig.py:471] Config param grain_train_mixture_config_path: 
I0424 09:32:07.212610 132915196512064 pyconfig.py:471] Config param grain_worker_count: 1
I0424 09:32:07.212625 132915196512064 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0424 09:32:07.212640 132915196512064 pyconfig.py:471] Config param grpo_beta: 0.08
I0424 09:32:07.212655 132915196512064 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0424 09:32:07.212669 132915196512064 pyconfig.py:471] Config param hardware: tpu
I0424 09:32:07.212685 132915196512064 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0424 09:32:07.212700 132915196512064 pyconfig.py:471] Config param head_dim: 8
I0424 09:32:07.212714 132915196512064 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0424 09:32:07.212729 132915196512064 pyconfig.py:471] Config param hf_data_dir: None
I0424 09:32:07.212743 132915196512064 pyconfig.py:471] Config param hf_eval_files: None
I0424 09:32:07.212759 132915196512064 pyconfig.py:471] Config param hf_eval_split: None
I0424 09:32:07.212773 132915196512064 pyconfig.py:471] Config param hf_name: None
I0424 09:32:07.212788 132915196512064 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0424 09:32:07.212802 132915196512064 pyconfig.py:471] Config param hf_train_files: None
I0424 09:32:07.212817 132915196512064 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0424 09:32:07.212831 132915196512064 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0424 09:32:07.212845 132915196512064 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0424 09:32:07.212861 132915196512064 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0424 09:32:07.212875 132915196512064 pyconfig.py:471] Config param ici_context_parallelism: 1
I0424 09:32:07.212890 132915196512064 pyconfig.py:471] Config param ici_data_parallelism: 1
I0424 09:32:07.212905 132915196512064 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0424 09:32:07.212921 132915196512064 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0424 09:32:07.212935 132915196512064 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0424 09:32:07.212950 132915196512064 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0424 09:32:07.212964 132915196512064 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 09:32:07.212980 132915196512064 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0424 09:32:07.212995 132915196512064 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0424 09:32:07.213010 132915196512064 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0424 09:32:07.213026 132915196512064 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0424 09:32:07.213040 132915196512064 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0424 09:32:07.213056 132915196512064 pyconfig.py:471] Config param image_path: 
I0424 09:32:07.213070 132915196512064 pyconfig.py:471] Config param image_placeholder: <|image|>
I0424 09:32:07.213085 132915196512064 pyconfig.py:471] Config param image_size_for_vit: 896
I0424 09:32:07.213110 132915196512064 pyconfig.py:471] Config param indexer_head_dim: 128
I0424 09:32:07.213125 132915196512064 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0424 09:32:07.213141 132915196512064 pyconfig.py:471] Config param indexer_n_heads: 64
I0424 09:32:07.213155 132915196512064 pyconfig.py:471] Config param indexer_sparse_training: False
I0424 09:32:07.213170 132915196512064 pyconfig.py:471] Config param indexer_topk: 2048
I0424 09:32:07.213185 132915196512064 pyconfig.py:471] Config param inference_benchmark_test: False
I0424 09:32:07.213200 132915196512064 pyconfig.py:471] Config param inference_metadata_file: 
I0424 09:32:07.213215 132915196512064 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: 
I0424 09:32:07.213229 132915196512064 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0424 09:32:07.213245 132915196512064 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0424 09:32:07.213260 132915196512064 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0424 09:32:07.213276 132915196512064 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0424 09:32:07.213291 132915196512064 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0424 09:32:07.213306 132915196512064 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0424 09:32:07.213324 132915196512064 pyconfig.py:471] Config param init_weights_seed: 0
I0424 09:32:07.213338 132915196512064 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0424 09:32:07.213355 132915196512064 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0424 09:32:07.213370 132915196512064 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0424 09:32:07.213385 132915196512064 pyconfig.py:471] Config param internal_compile: False
I0424 09:32:07.213399 132915196512064 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0424 09:32:07.213414 132915196512064 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0424 09:32:07.213429 132915196512064 pyconfig.py:471] Config param jax_debug_log_modules: 
I0424 09:32:07.213443 132915196512064 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0424 09:32:07.213458 132915196512064 pyconfig.py:471] Config param jax_profiler_port: 9999
I0424 09:32:07.213473 132915196512064 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0424 09:32:07.213489 132915196512064 pyconfig.py:471] Config param kv_cache_buffer: 256
I0424 09:32:07.213504 132915196512064 pyconfig.py:471] Config param kv_lora_rank: 512
I0424 09:32:07.213518 132915196512064 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0424 09:32:07.213536 132915196512064 pyconfig.py:471] Config param kv_quant_dtype: int8
I0424 09:32:07.213551 132915196512064 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0424 09:32:07.213566 132915196512064 pyconfig.py:471] Config param learning_rate: 0.0002
I0424 09:32:07.213582 132915196512064 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0424 09:32:07.213597 132915196512064 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0424 09:32:07.213612 132915196512064 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0424 09:32:07.213628 132915196512064 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0424 09:32:07.213644 132915196512064 pyconfig.py:471] Config param load_from_prefill_dir: False
I0424 09:32:07.213660 132915196512064 pyconfig.py:471] Config param load_full_state_path: 
I0424 09:32:07.213675 132915196512064 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 09:32:07.213690 132915196512064 pyconfig.py:471] Config param local_checkpoint_directory: 
I0424 09:32:07.213705 132915196512064 pyconfig.py:471] Config param local_checkpoint_period: 0
I0424 09:32:07.213719 132915196512064 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0424 09:32:07.213734 132915196512064 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0424 09:32:07.213750 132915196512064 pyconfig.py:471] Config param log_config: True
I0424 09:32:07.213765 132915196512064 pyconfig.py:471] Config param log_period: 10
I0424 09:32:07.213779 132915196512064 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), 
('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), 
('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0424 09:32:07.213884 132915196512064 pyconfig.py:471] Config param logits_dot_in_fp32: False
I0424 09:32:07.213914 132915196512064 pyconfig.py:471] Config param logits_via_embedding: True
I0424 09:32:07.213930 132915196512064 pyconfig.py:471] Config param lora_input_adapters_path: 
I0424 09:32:07.213946 132915196512064 pyconfig.py:471] Config param loss_algo: grpo
I0424 09:32:07.213962 132915196512064 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0424 09:32:07.213980 132915196512064 pyconfig.py:471] Config param managed_mldiagnostics: False
I0424 09:32:07.213994 132915196512064 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/managed-mldiagnostics
I0424 09:32:07.214010 132915196512064 pyconfig.py:471] Config param managed_mldiagnostics_run_group: 
I0424 09:32:07.214024 132915196512064 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT
I0424 09:32:07.214042 132915196512064 pyconfig.py:471] Config param max_checkify: False
I0424 09:32:07.214056 132915196512064 pyconfig.py:471] Config param max_concurrency: 256
I0424 09:32:07.214072 132915196512064 pyconfig.py:471] Config param max_corpus_chars: 10000000
I0424 09:32:07.214086 132915196512064 pyconfig.py:471] Config param max_num_batched_tokens: None
I0424 09:32:07.214111 132915196512064 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None
I0424 09:32:07.214125 132915196512064 pyconfig.py:471] Config param max_num_images_per_example: -1
I0424 09:32:07.214139 132915196512064 pyconfig.py:471] Config param max_num_seqs: None
I0424 09:32:07.214155 132915196512064 pyconfig.py:471] Config param max_position_embeddings: 163840
I0424 09:32:07.214169 132915196512064 pyconfig.py:471] Config param max_prefill_predict_length: 64
I0424 09:32:07.214183 132915196512064 pyconfig.py:471] Config param max_sample_len_for_audio: 10000
I0424 09:32:07.214198 132915196512064 pyconfig.py:471] Config param max_segments_per_seq: -1
I0424 09:32:07.214213 132915196512064 pyconfig.py:471] Config param max_source_positions_for_audio: 1500
I0424 09:32:07.214229 132915196512064 pyconfig.py:471] Config param max_target_length: 2048
I0424 09:32:07.214244 132915196512064 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0
I0424 09:32:07.214259 132915196512064 pyconfig.py:471] Config param megablox: True
I0424 09:32:07.214276 132915196512064 pyconfig.py:471] Config param merge_gating_gmm: False
I0424 09:32:07.214291 132915196512064 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0424 09:32:07.214310 132915196512064 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/metrics/
I0424 09:32:07.214329 132915196512064 pyconfig.py:471] Config param metrics_file: 
I0424 09:32:07.214344 132915196512064 pyconfig.py:471] Config param mhc_expansion_rate: 1
I0424 09:32:07.214359 132915196512064 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64
I0424 09:32:07.214373 132915196512064 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64
I0424 09:32:07.214388 132915196512064 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT
I0424 09:32:07.214403 132915196512064 pyconfig.py:471] Config param mla_naive_kvcache: True
I0424 09:32:07.214418 132915196512064 pyconfig.py:471] Config param mla_q: RematLocation.REMAT
I0424 09:32:07.214432 132915196512064 pyconfig.py:471] Config param mlp_activations: ['gelu']
I0424 09:32:07.214448 132915196512064 pyconfig.py:471] Config param mlp_activations_limit: -1.0
I0424 09:32:07.214462 132915196512064 pyconfig.py:471] Config param mlp_bias: False
I0424 09:32:07.214478 132915196512064 pyconfig.py:471] Config param mlp_dim: 64
I0424 09:32:07.214493 132915196512064 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT
I0424 09:32:07.214508 132915196512064 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT
I0424 09:32:07.214524 132915196512064 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT
I0424 09:32:07.214538 132915196512064 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT
I0424 09:32:07.214553 132915196512064 pyconfig.py:471] Config param moba: False
I0424 09:32:07.214569 132915196512064 pyconfig.py:471] Config param moba_chunk_size: 1024
I0424 09:32:07.214583 132915196512064 pyconfig.py:471] Config param moba_topk: 8
I0424 09:32:07.214598 132915196512064 pyconfig.py:471] Config param model_call_mode: 
I0424 09:32:07.214612 132915196512064 pyconfig.py:471] Config param model_name: gpt3-52k
I0424 09:32:07.214627 132915196512064 pyconfig.py:471] Config param moe_expert_input_dim: -1
I0424 09:32:07.214641 132915196512064 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False
I0424 09:32:07.214656 132915196512064 pyconfig.py:471] Config param moe_mlp_dim: -1
I0424 09:32:07.214670 132915196512064 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT
I0424 09:32:07.214686 132915196512064 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT
I0424 09:32:07.214701 132915196512064 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT
I0424 09:32:07.214717 132915196512064 pyconfig.py:471] Config param monitor_goodput: False
I0424 09:32:07.214731 132915196512064 pyconfig.py:471] Config param monitor_step_time_deviation: True
I0424 09:32:07.214746 132915196512064 pyconfig.py:471] Config param mrope_section: [24, 20, 20]
I0424 09:32:07.214762 132915196512064 pyconfig.py:471] Config param mscale: 1.0
I0424 09:32:07.214778 132915196512064 pyconfig.py:471] Config param mtc_data_parallelism: 0
I0424 09:32:07.214793 132915196512064 pyconfig.py:471] Config param mtp_eval_target_module: 0
I0424 09:32:07.214808 132915196512064 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1
I0424 09:32:07.214824 132915196512064 pyconfig.py:471] Config param mtp_num_layers: 0
I0424 09:32:07.214838 132915196512064 pyconfig.py:471] Config param mu_dtype: float32
I0424 09:32:07.214861 132915196512064 pyconfig.py:471] Config param multi_sampling: False
I0424 09:32:07.214876 132915196512064 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0424 09:32:07.214892 132915196512064 pyconfig.py:471] Config param muon_beta: 0.95
I0424 09:32:07.214907 132915196512064 pyconfig.py:471] Config param muon_consistent_rms: None
I0424 09:32:07.214922 132915196512064 pyconfig.py:471] Config param muon_weight_decay: 0.0
I0424 09:32:07.214938 132915196512064 pyconfig.py:471] Config param n_routing_groups: -1
I0424 09:32:07.214953 132915196512064 pyconfig.py:471] Config param n_window_for_audio: 50
I0424 09:32:07.214968 132915196512064 pyconfig.py:471] Config param n_window_infer_for_audio: 800
I0424 09:32:07.214982 132915196512064 pyconfig.py:471] Config param nope_layer_interval: -1
I0424 09:32:07.214998 132915196512064 pyconfig.py:471] Config param norm_topk_prob: False
I0424 09:32:07.215013 132915196512064 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05
I0424 09:32:07.215035 132915196512064 pyconfig.py:471] Config param normalize_embedding_logits: False
I0424 09:32:07.215049 132915196512064 pyconfig.py:471] Config param num_attention_heads_for_vit: 16
I0424 09:32:07.215065 132915196512064 pyconfig.py:471] Config param num_batches: 4
I0424 09:32:07.215080 132915196512064 pyconfig.py:471] Config param num_channels_for_vit: 3
I0424 09:32:07.215105 132915196512064 pyconfig.py:471] Config param num_conv_layers_for_audio: 3
I0424 09:32:07.215121 132915196512064 pyconfig.py:471] Config param num_decoder_layers: 1
I0424 09:32:07.215137 132915196512064 pyconfig.py:471] Config param num_diloco_replicas: 1
I0424 09:32:07.215151 132915196512064 pyconfig.py:471] Config param num_epoch: 1
I0424 09:32:07.215167 132915196512064 pyconfig.py:471] Config param num_eval_passes: 1
I0424 09:32:07.215183 132915196512064 pyconfig.py:471] Config param num_experts: 1
I0424 09:32:07.215200 132915196512064 pyconfig.py:471] Config param num_experts_per_tok: 1
I0424 09:32:07.215216 132915196512064 pyconfig.py:471] Config param num_generations: 2
I0424 09:32:07.215229 132915196512064 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34
I0424 09:32:07.215245 132915196512064 pyconfig.py:471] Config param num_iterations: 1
I0424 09:32:07.215259 132915196512064 pyconfig.py:471] Config param num_kv_heads: 2
I0424 09:32:07.215274 132915196512064 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1
I0424 09:32:07.215289 132915196512064 pyconfig.py:471] Config param num_mel_bins_for_audio: 128
I0424 09:32:07.215304 132915196512064 pyconfig.py:471] Config param num_pipeline_microbatches: -1
I0424 09:32:07.215323 132915196512064 pyconfig.py:471] Config param num_pipeline_repeats: -1
I0424 09:32:07.215337 132915196512064 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024
I0424 09:32:07.215352 132915196512064 pyconfig.py:471] Config param num_query_heads: 2
I0424 09:32:07.215367 132915196512064 pyconfig.py:471] Config param num_samplers_slices: -1
I0424 09:32:07.215383 132915196512064 pyconfig.py:471] Config param num_slices: 1
I0424 09:32:07.215397 132915196512064 pyconfig.py:471] Config param num_target_devices: 32
I0424 09:32:07.215411 132915196512064 pyconfig.py:471] Config param num_test_batches: 5
I0424 09:32:07.215426 132915196512064 pyconfig.py:471] Config param num_trainer_slices: -1
I0424 09:32:07.215440 132915196512064 pyconfig.py:471] Config param num_vocab_tiling: 1
I0424 09:32:07.215455 132915196512064 pyconfig.py:471] Config param off_policy_steps: 0
I0424 09:32:07.215469 132915196512064 pyconfig.py:471] Config param offline_data_dir: None
I0424 09:32:07.215484 132915196512064 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX
I0424 09:32:07.215502 132915196512064 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False
I0424 09:32:07.215517 132915196512064 pyconfig.py:471] Config param optimizer_memory_host_offload: False
I0424 09:32:07.215533 132915196512064 pyconfig.py:471] Config param original_max_position_embeddings: 4096
I0424 09:32:07.215548 132915196512064 pyconfig.py:471] Config param out_hidden_size_for_vit: 512
I0424 09:32:07.215564 132915196512064 pyconfig.py:471] Config param out_proj: RematLocation.REMAT
I0424 09:32:07.215578 132915196512064 pyconfig.py:471] Config param output_dim_for_audio: 512
I0424 09:32:07.215594 132915196512064 pyconfig.py:471] Config param override_logical_axis_rules: False
I0424 09:32:07.215608 132915196512064 pyconfig.py:471] Config param override_model_config: True
I0424 09:32:07.215623 132915196512064 pyconfig.py:471] Config param packing: True
I0424 09:32:07.215638 132915196512064 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128
I0424 09:32:07.215653 132915196512064 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1
I0424 09:32:07.215667 132915196512064 pyconfig.py:471] Config param pagedattn_num_pages: 64
I0424 09:32:07.215682 132915196512064 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4
I0424 09:32:07.215698 132915196512064 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32
I0424 09:32:07.215712 132915196512064 pyconfig.py:471] Config param param_scan_axis: 1
I0424 09:32:07.215727 132915196512064 pyconfig.py:471] Config param parameter_memory_host_offload: False
I0424 09:32:07.215743 132915196512064 pyconfig.py:471] Config param partial_rotary_factor: 1.0
I0424 09:32:07.215759 132915196512064 pyconfig.py:471] Config param patch_size_for_vit: 14
I0424 09:32:07.215773 132915196512064 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0
I0424 09:32:07.215789 132915196512064 pyconfig.py:471] Config param penalty_incorrect_format: -0.5
I0424 09:32:07.215805 132915196512064 pyconfig.py:471] Config param per_device_batch_size: 2
I0424 09:32:07.215820 132915196512064 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0
I0424 09:32:07.215836 132915196512064 pyconfig.py:471] Config param per_device_batch_size_start: 4.0
I0424 09:32:07.215852 132915196512064 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False
I0424 09:32:07.215866 132915196512064 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False
I0424 09:32:07.215881 132915196512064 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False
I0424 09:32:07.215896 132915196512064 pyconfig.py:471] Config param pipeline_parallel_layers: 1
I0424 09:32:07.215911 132915196512064 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5
I0424 09:32:07.215925 132915196512064 pyconfig.py:471] Config param posemb_type_for_vit: learn
I0424 09:32:07.215941 132915196512064 pyconfig.py:471] Config param position_id_per_seconds: 25
I0424 09:32:07.215956 132915196512064 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3
I0424 09:32:07.215971 132915196512064 pyconfig.py:471] Config param prefill_cache_dir: 
I0424 09:32:07.215987 132915196512064 pyconfig.py:471] Config param prefill_chunk_size: 256
I0424 09:32:07.216001 132915196512064 pyconfig.py:471] Config param prefill_slice: v5e-16
I0424 09:32:07.216017 132915196512064 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000
I0424 09:32:07.216031 132915196512064 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000
I0424 09:32:07.216046 132915196512064 pyconfig.py:471] Config param prefuse_moe_weights: False
I0424 09:32:07.216061 132915196512064 pyconfig.py:471] Config param profile_cleanly: True
I0424 09:32:07.216076 132915196512064 pyconfig.py:471] Config param profile_periodically_period: -1
I0424 09:32:07.216091 132915196512064 pyconfig.py:471] Config param profile_power_events: False
I0424 09:32:07.216119 132915196512064 pyconfig.py:471] Config param profiler: ProfilerType.NONE
I0424 09:32:07.216136 132915196512064 pyconfig.py:471] Config param profiler_steps: 5
I0424 09:32:07.216151 132915196512064 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0
I0424 09:32:07.216165 132915196512064 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096
I0424 09:32:07.216181 132915196512064 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096
I0424 09:32:07.216196 132915196512064 pyconfig.py:471] Config param prometheus_port: 0
I0424 09:32:07.216210 132915196512064 pyconfig.py:471] Config param prompt: I love to
I0424 09:32:07.216225 132915196512064 pyconfig.py:471] Config param pure_nnx: False
I0424 09:32:07.216240 132915196512064 pyconfig.py:471] Config param pure_nnx_decoder: False
I0424 09:32:07.216255 132915196512064 pyconfig.py:471] Config param q_lora_rank: 0
I0424 09:32:07.216270 132915196512064 pyconfig.py:471] Config param qk_clip_threshold: 100.0
I0424 09:32:07.216285 132915196512064 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0424 09:32:07.216301 132915196512064 pyconfig.py:471] Config param qk_norm_with_scale: True
I0424 09:32:07.216321 132915196512064 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0424 09:32:07.216337 132915196512064 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0424 09:32:07.216352 132915196512064 pyconfig.py:471] Config param quant_cfg_path: 
I0424 09:32:07.216368 132915196512064 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0424 09:32:07.216386 132915196512064 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0424 09:32:07.216400 132915196512064 pyconfig.py:471] Config param quantize_kvcache: False
I0424 09:32:07.216417 132915196512064 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0424 09:32:07.216431 132915196512064 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0424 09:32:07.216448 132915196512064 pyconfig.py:471] Config param ragged_block_size: 256
I0424 09:32:07.216462 132915196512064 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0424 09:32:07.216477 132915196512064 pyconfig.py:471] Config param rampup_end_step: 0
I0424 09:32:07.216492 132915196512064 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0424 09:32:07.216507 132915196512064 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0424 09:32:07.216522 132915196512064 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0424 09:32:07.216538 132915196512064 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0424 09:32:07.216552 132915196512064 pyconfig.py:471] Config param remat_policy: full
I0424 09:32:07.216567 132915196512064 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0424 09:32:07.216582 132915196512064 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0424 09:32:07.216597 132915196512064 pyconfig.py:471] Config param replicate_quant_scale: False
I0424 09:32:07.216611 132915196512064 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0424 09:32:07.216626 132915196512064 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0424 09:32:07.216641 132915196512064 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0424 09:32:07.216656 132915196512064 pyconfig.py:471] Config param reshape_q: False
I0424 09:32:07.216670 132915196512064 pyconfig.py:471] Config param return_log_prob: False
I0424 09:32:07.216685 132915196512064 pyconfig.py:471] Config param reuse_example_batch: 0
I0424 09:32:07.216700 132915196512064 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0424 09:32:07.216715 132915196512064 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0424 09:32:07.216731 132915196512064 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0424 09:32:07.216747 132915196512064 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0424 09:32:07.216762 132915196512064 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0424 09:32:07.216778 132915196512064 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0424 09:32:07.216792 132915196512064 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0424 09:32:07.216813 132915196512064 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0424 09:32:07.216829 132915196512064 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0424 09:32:07.216844 132915196512064 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0424 09:32:07.216858 132915196512064 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0424 09:32:07.216873 132915196512064 pyconfig.py:471] Config param rope_attention_scaling: False
I0424 09:32:07.216887 132915196512064 pyconfig.py:471] Config param rope_factor: 40
I0424 09:32:07.216902 132915196512064 pyconfig.py:471] Config param rope_interleave: True
I0424 09:32:07.216916 132915196512064 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0424 09:32:07.216931 132915196512064 pyconfig.py:471] Config param rope_max_timescale: 10000
I0424 09:32:07.216946 132915196512064 pyconfig.py:471] Config param rope_min_timescale: 1
I0424 09:32:07.216960 132915196512064 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0424 09:32:07.216975 132915196512064 pyconfig.py:471] Config param rope_truncate: True
I0424 09:32:07.216990 132915196512064 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0424 09:32:07.217007 132915196512064 pyconfig.py:471] Config param rope_use_scale: True
I0424 09:32:07.217023 132915196512064 pyconfig.py:471] Config param routed_bias: False
I0424 09:32:07.217037 132915196512064 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0424 09:32:07.217053 132915196512064 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0424 09:32:07.217067 132915196512064 pyconfig.py:471] Config param routed_score_func: 
I0424 09:32:07.217082 132915196512064 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-09-32
I0424 09:32:07.217108 132915196512064 pyconfig.py:471] Config param sa_block_kv: 512
I0424 09:32:07.217124 132915196512064 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0424 09:32:07.217138 132915196512064 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0424 09:32:07.217153 132915196512064 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0424 09:32:07.217167 132915196512064 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0424 09:32:07.217183 132915196512064 pyconfig.py:471] Config param sa_block_q: 512
I0424 09:32:07.217198 132915196512064 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0424 09:32:07.217213 132915196512064 pyconfig.py:471] Config param sa_block_q_dq: 512
I0424 09:32:07.217229 132915196512064 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0424 09:32:07.217243 132915196512064 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0424 09:32:07.217258 132915196512064 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0424 09:32:07.217272 132915196512064 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0424 09:32:07.217288 132915196512064 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0424 09:32:07.217303 132915196512064 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0424 09:32:07.217321 132915196512064 pyconfig.py:471] Config param save_config_to_gcs: False
I0424 09:32:07.217337 132915196512064 pyconfig.py:471] Config param save_quantized_params_path: 
I0424 09:32:07.217351 132915196512064 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0424 09:32:07.217366 132915196512064 pyconfig.py:471] Config param scan_layers: True
I0424 09:32:07.217380 132915196512064 pyconfig.py:471] Config param scan_layers_per_stage: False
I0424 09:32:07.217396 132915196512064 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0424 09:32:07.217411 132915196512064 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0424 09:32:07.217427 132915196512064 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0424 09:32:07.217441 132915196512064 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0424 09:32:07.217456 132915196512064 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0424 09:32:07.217472 132915196512064 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0424 09:32:07.217486 132915196512064 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0424 09:32:07.217504 132915196512064 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0424 09:32:07.217521 132915196512064 pyconfig.py:471] Config param sharding_strategy: None
I0424 09:32:07.217536 132915196512064 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0424 09:32:07.217552 132915196512064 pyconfig.py:471] Config param shardy: True
I0424 09:32:07.217566 132915196512064 pyconfig.py:471] Config param share_kv_projections: False
I0424 09:32:07.217582 132915196512064 pyconfig.py:471] Config param shared_experts: 0
I0424 09:32:07.217596 132915196512064 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0424 09:32:07.217610 132915196512064 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0424 09:32:07.217625 132915196512064 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0424 09:32:07.217640 132915196512064 pyconfig.py:471] Config param skip_step_interval: 128
I0424 09:32:07.217655 132915196512064 pyconfig.py:471] Config param skip_step_on_spikes: False
I0424 09:32:07.217670 132915196512064 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0424 09:32:07.217685 132915196512064 pyconfig.py:471] Config param sliding_window_size: 0
I0424 09:32:07.217700 132915196512064 pyconfig.py:471] Config param solution_end_token: </answer>
I0424 09:32:07.217714 132915196512064 pyconfig.py:471] Config param solution_start_token: <answer>
I0424 09:32:07.217729 132915196512064 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0424 09:32:07.217744 132915196512064 pyconfig.py:471] Config param sparse_matmul: True
I0424 09:32:07.217760 132915196512064 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0424 09:32:07.217775 132915196512064 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0424 09:32:07.217788 132915196512064 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0424 09:32:07.217804 132915196512064 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0424 09:32:07.217818 132915196512064 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0424 09:32:07.217834 132915196512064 pyconfig.py:471] Config param steps: 200000
I0424 09:32:07.217849 132915196512064 pyconfig.py:471] Config param stop_strings: None
I0424 09:32:07.217865 132915196512064 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 09:32:07.217883 132915196512064 pyconfig.py:471] Config param student_params_to_update: None
I0424 09:32:07.217898 132915196512064 pyconfig.py:471] Config param subslice_shape: 
I0424 09:32:07.217913 132915196512064 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 09:32:07.217928 132915196512064 pyconfig.py:471] Config param system_prompt: 
I0424 09:32:07.217943 132915196512064 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 09:32:07.217959 132915196512064 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 09:32:07.217974 132915196512064 pyconfig.py:471] Config param temperature_tuning: False
I0424 09:32:07.217989 132915196512064 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 09:32:07.218003 132915196512064 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-09-32/tensorboard/
I0424 09:32:07.218019 132915196512064 pyconfig.py:471] Config param tensors_on_device: None
I0424 09:32:07.218035 132915196512064 pyconfig.py:471] Config param tensors_to_offload: None
I0424 09:32:07.218049 132915196512064 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 09:32:07.218065 132915196512064 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 09:32:07.218079 132915196512064 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 09:32:07.218104 132915196512064 pyconfig.py:471] Config param tokenize_train_data: True
I0424 09:32:07.218120 132915196512064 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 09:32:07.218134 132915196512064 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 09:32:07.218152 132915196512064 pyconfig.py:471] Config param topk_routing_group: -1
I0424 09:32:07.218167 132915196512064 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 09:32:07.218183 132915196512064 pyconfig.py:471] Config param train_fraction: 1.0
I0424 09:32:07.218199 132915196512064 pyconfig.py:471] Config param train_image_column: image
I0424 09:32:07.218214 132915196512064 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 09:32:07.218230 132915196512064 pyconfig.py:471] Config param train_split: train
I0424 09:32:07.218245 132915196512064 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 09:32:07.218260 132915196512064 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 09:32:07.218276 132915196512064 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 09:32:07.218291 132915196512064 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 09:32:07.218305 132915196512064 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 09:32:07.218324 132915196512064 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 09:32:07.218341 132915196512064 pyconfig.py:471] Config param use_audio: False
I0424 09:32:07.218355 132915196512064 pyconfig.py:471] Config param use_audio_in_video: False
I0424 09:32:07.218371 132915196512064 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 09:32:07.218386 132915196512064 pyconfig.py:471] Config param use_chat_template: False
I0424 09:32:07.218400 132915196512064 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 09:32:07.218416 132915196512064 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 09:32:07.218431 132915196512064 pyconfig.py:471] Config param use_dpo: False
I0424 09:32:07.218447 132915196512064 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 09:32:07.218461 132915196512064 pyconfig.py:471] Config param use_grpo: True
I0424 09:32:07.218477 132915196512064 pyconfig.py:471] Config param use_indexer: False
I0424 09:32:07.218492 132915196512064 pyconfig.py:471] Config param use_iota_embed: True
I0424 09:32:07.218508 132915196512064 pyconfig.py:471] Config param use_jax_splash: False
I0424 09:32:07.218523 132915196512064 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 09:32:07.218537 132915196512064 pyconfig.py:471] Config param use_mrope: False
I0424 09:32:07.218552 132915196512064 pyconfig.py:471] Config param use_multimodal: False
I0424 09:32:07.218566 132915196512064 pyconfig.py:471] Config param use_pathways: True
I0424 09:32:07.218582 132915196512064 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 09:32:07.218598 132915196512064 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 09:32:07.218612 132915196512064 pyconfig.py:471] Config param use_qk_clip: False
I0424 09:32:07.218627 132915196512064 pyconfig.py:471] Config param use_qk_norm: False
I0424 09:32:07.218641 132915196512064 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 09:32:07.218657 132915196512064 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 09:32:07.218672 132915196512064 pyconfig.py:471] Config param use_ragged_attention: False
I0424 09:32:07.218686 132915196512064 pyconfig.py:471] Config param use_random_routing: False
I0424 09:32:07.218701 132915196512064 pyconfig.py:471] Config param use_replicator_service: False
I0424 09:32:07.218716 132915196512064 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 09:32:07.218731 132915196512064 pyconfig.py:471] Config param use_sft: False
I0424 09:32:07.218746 132915196512064 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 09:32:07.218761 132915196512064 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 09:32:07.218775 132915196512064 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 09:32:07.218790 132915196512064 pyconfig.py:471] Config param use_truncation: True
I0424 09:32:07.218806 132915196512064 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 09:32:07.218821 132915196512064 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 09:32:07.218835 132915196512064 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 09:32:07.218850 132915196512064 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 09:32:07.218865 132915196512064 pyconfig.py:471] Config param v_head_dim: 128
I0424 09:32:07.218880 132915196512064 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 09:32:07.218894 132915196512064 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 09:32:07.218910 132915196512064 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0424 09:32:07.218924 132915196512064 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0424 09:32:07.218939 132915196512064 pyconfig.py:471] Config param video_path: 
I0424 09:32:07.218953 132915196512064 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 09:32:07.218969 132915196512064 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 09:32:07.218983 132915196512064 pyconfig.py:471] Config param vision_output_length: -1
I0424 09:32:07.218998 132915196512064 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 09:32:07.219012 132915196512064 pyconfig.py:471] Config param vllm_hf_config_path: 
I0424 09:32:07.219028 132915196512064 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 09:32:07.219044 132915196512064 pyconfig.py:471] Config param vocab_size: 32000
I0424 09:32:07.219059 132915196512064 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 09:32:07.219074 132915196512064 pyconfig.py:471] Config param weight_dtype: float32
I0424 09:32:07.219108 132915196512064 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 09:32:07.219125 132915196512064 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 09:32:07.219139 132915196512064 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 09:32:07.219155 132915196512064 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 09:32:07.219172 132915196512064 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 09:32:07.219187 132915196512064 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 09:32:07.219202 132915196512064 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 09:32:07.219218 132915196512064 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 09:32:07.219233 132915196512064 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 09:32:07.219247 132915196512064 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 09:32:07.219262 132915196512064 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 09:32:07.219277 132915196512064 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 09:32:07.219292 132915196512064 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 09:32:07.219306 132915196512064 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 09:32:07.219326 132915196512064 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 09:32:07.219341 132915196512064 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 09:32:07.219356 132915196512064 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 09:32:07.219370 132915196512064 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 09:32:07.219383 132915196512064 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 09:32:07.219399 132915196512064 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 09:32:07.219414 132915196512064 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 09:32:07.219432 132915196512064 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 09:32:07.219446 132915196512064 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0424 09:32:07.219461 132915196512064 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 09:32:07.219475 132915196512064 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 09:32:07.219493 132915196512064 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 09:32:07.219997 132915196512064 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 09:32:07.220038 132915196512064 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 09:32:07.405611 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 09:32:07.509235 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 09:32:07.620831 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 09:32:07.728480 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 09:32:07.844611 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 09:32:07.955355 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0424 09:32:08.069730 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0424 09:32:08.179372 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0424 09:32:08.878283 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 09:32:09.017405 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 09:32:09.364300 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0424 09:32:09.482342 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 09:32:09.594021 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 09:32:09.705307 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0424 09:32:09.798974 132915196512064 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 09:32:09.805997 132915196512064 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 09:32:09.806175 132915196512064 train_distill.py:582] Applying logical axis rules for model initialization and training...
I0424 09:32:09.806262 132915196512064 train_distill.py:586] Loading Student from ...
I0424 09:32:09.806296 132915196512064 train_distill.py:170] --- Student Configuration ---
I0424 09:32:09.806321 132915196512064 train_distill.py:171]   Model Name:      gpt3-52k
I0424 09:32:09.806344 132915196512064 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 09:32:09.806364 132915196512064 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 09:32:09.806382 132915196512064 train_distill.py:176]   Vocab Size:      32000
I0424 09:32:09.806400 132915196512064 train_distill.py:177]   Checkpoint:      
I0424 09:32:09.806418 132915196512064 train_distill.py:451] Initializing model: gpt3-52k...
I0424 09:32:11.465708 132915196512064 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0424 09:32:11.465815 132915196512064 train_distill.py:170] --- Teacher Configuration ---
I0424 09:32:11.465843 132915196512064 train_distill.py:171]   Model Name:      gpt3-52k
I0424 09:32:11.465868 132915196512064 train_distill.py:172]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0424 09:32:11.465889 132915196512064 train_distill.py:175]   Attention Heads: 2 Query, 2 KV
I0424 09:32:11.465908 132915196512064 train_distill.py:176]   Vocab Size:      32000
I0424 09:32:11.465925 132915196512064 train_distill.py:177]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 09:32:11.465944 132915196512064 train_distill.py:451] Initializing model: gpt3-52k...
I0424 09:32:12.541429 132915196512064 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 09:32:12.541582 132915196512064 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e202db34d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 09:32:12.541638 132915196512064 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 09:32:13.042257 132915196512064 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 09:32:13.610450    1964 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 09:32:14.791116 132915196512064 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 09:32:17.034174 132915196512064 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 09:32:17.034550 132915196512064 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 09:32:19.319946 132915196512064 checkpointer.py:318] Finished restoring checkpoint in 4.94 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 09:32:20.056515 132915196512064 train_distill.py:626] Initializing Data Iterators via MaxText pipeline...
I0424 09:32:20.120247 132915196512064 config.py:112] TensorFlow version 2.20.0 available.
I0424 09:32:20.120728 132915196512064 config.py:125] JAX version 0.9.2 available.
I0424 09:32:20.555791 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0424 09:32:20.563493 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 09:32:20.570869 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 09:32:20.767642 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 09:32:21.074615 132915196512064 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 09:32:21.204305 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0424 09:32:21.309369 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0424 09:32:21.514175 132915196512064 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0424 09:32:21.622600 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0424 09:32:21.731795 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0424 09:32:21.898518 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0424 09:32:22.064538 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 09:32:22.178734 132915196512064 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 09:32:22.289239 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 09:32:22.394656 132915196512064 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0424 09:32:22.487479 132915196512064 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0424 09:32:22.487686 132915196512064 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0424 09:32:22.490703 132915196512064 train_distill.py:396] Input Pipeline Checkpointing: DISABLED
I0424 09:32:22.490762 132915196512064 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0424 09:32:22.490824 132915196512064 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 09:32:22.490898 132915196512064 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e202db34d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 09:32:22.490938 132915196512064 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 09:32:22.490968 132915196512064 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e202db34d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 09:32:22.491010 132915196512064 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc7f517680>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc657ee7b0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84efef0>}, handler_registry=None
I0424 09:32:22.491211 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc7f517680>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 09:32:22.491253 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc657ee7b0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 09:32:22.491279 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84efef0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 09:32:22.491302 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78dc543c47a0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 09:32:22.491333 132915196512064 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc7f517680>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc7f517680>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc657ee7b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc657ee7b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84efef0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84efef0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78dc543c47a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78dc543c47a0>}).
I0424 09:32:22.491708 132915196512064 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x78c6d4138220> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 09:32:24.216307 132915196512064 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321_07_distill_smoke/checkpoints
I0424 09:32:24.489468 132915196512064 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78c3c84efec0>
I0424 09:32:24.489658 132915196512064 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 09:32:24.489728 132915196512064 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e202db34d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 09:32:24.489767 132915196512064 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 09:32:24.489797 132915196512064 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e202db34d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 09:32:24.489833 132915196512064 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 09:32:24.489885 132915196512064 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915196512064 count=1 at 0x78c6d4131240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c3c84efce0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c3c84efcb0>, _write_futures=[])
I0424 09:32:24.490268 132915196512064 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915196512064 count=1 at 0x78c6d4131240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c3c84efce0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c3c84efcb0>, _write_futures=[])
I0424 09:32:24.490297 132915196512064 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915196512064 count=1 at 0x78c6d4131240>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c3c84efce0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c3c84efcb0>, _write_futures=[])
I0424 09:32:24.490329 132915196512064 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc54385be0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c3c84efe90>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ef8f0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c3c84ef170>}, handler_registry=None
I0424 09:32:24.490433 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc54385be0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 09:32:24.490469 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c3c84efe90>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 09:32:24.490492 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ef8f0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 09:32:24.490520 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c3c84ef170>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 09:32:24.490543 132915196512064 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ee480>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 09:32:24.490573 132915196512064 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc54385be0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78dc54385be0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c3c84efe90>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c3c84efe90>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ef8f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ef8f0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c3c84ef170>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c3c84ef170>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ee480>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): 
<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c3c84ee480>}).
I0424 09:32:24.490644 132915196512064 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x78c6d4138360> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 09:32:24.889238 132915196512064 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321_07_distill_smoke/checkpoints
I0424 09:32:24.937827 132915196512064 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260424_091321_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78dc6e1fa210>
I0424 09:32:24.938341 132915196512064 train_distill.py:677] Starting Distillation Training...
I0424 09:32:24.938450 132915196512064 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 09:32:25.423385 132915196512064 peft_trainer.py:594] Compiled train_step cache size: 0
I0424 09:32:25.425027 132748043876096 grain_pool.py:367] Grain pool will use 1 processes.
I0424 09:32:25.483028 132748043876096 grain_pool.py:440] Grain pool will start child processes.
I0424 09:32:25.488813 132748043876096 grain_pool.py:448] Grain pool started all child processes.
2026-04-24 09:32:32.003151: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
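The three `rope_parameters` messages above report that integer values (40, 32, 1) were supplied where the config validation requires floats (`factor`, `beta_fast`, `beta_slow`). As a hedged illustration only, and not the actual MaxText or transformers fix, a small hypothetical helper could coerce those fields to floats before the config is constructed:

```python
def coerce_rope_floats(rope_parameters: dict) -> dict:
    """Hypothetical helper: cast integer RoPE fields to floats so they
    satisfy the float-type validation seen in the log above."""
    float_fields = ("factor", "beta_fast", "beta_slow")
    return {
        k: float(v) if k in float_fields and isinstance(v, int) else v
        for k, v in rope_parameters.items()
    }

# The exact values the log complains about:
print(coerce_rope_floats({"factor": 40, "beta_fast": 32, "beta_slow": 1}))
```

This only addresses the type warnings; the separate `rope_scaling` kwarg message suggests the config key naming also differs from what `DeepseekV32Config` expects, which this sketch does not attempt to resolve.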
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), 
core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0)}
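The traceback above fails inside `tunix.sft.sharding_utils.shard_input`, which maps `jax.make_array_from_process_local_data` over the input batch; JAX then rejects a `device_put` whose value is already sharded across devices from multiple processes. A plausible reading (an assumption, not confirmed by the log) is that a leaf handed to `shard_input` was already a multi-device `jax.Array` rather than host-local data. For reference, here is a minimal single-process sketch of the API on the path it expects, with host-local NumPy data:

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a trivial 1-device mesh (the failing run used a 32-chip TPU mesh).
mesh = Mesh(np.array(jax.devices()[:1]), ("data",))
sharding = NamedSharding(mesh, P("data"))

# Host-local data for this process; in multi-process runs each process
# supplies only its own shard and JAX assembles the global array.
local_batch = np.arange(8, dtype=np.int32)
global_arr = jax.make_array_from_process_local_data(
    sharding, local_batch, global_shape=(8,)
)
print(global_arr.shape)
```

If a leaf is instead already a committed `jax.Array` spanning non-addressable devices, the internal `device_put(per_device_values, devices)` call hits exactly the `ValueError` shown above, since that form of `device_put` requires a fully addressable array or a single-device sharding.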
I0424 09:32:36.175152 132748043876096 grain_pool.py:542] Grain pool is exiting.
I0424 09:32:36.175255 132748043876096 grain_pool.py:547] Shutting down multiprocessing system.
I0424 09:32:37.870104 132748043876096 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 09:32:47 UTC 2026
EXIT_CODE=1