XPK Start: Mon Apr 20 06:25:12 UTC 2026
2026-04-20 06:25:29.763373: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 06:25:33.347530 138677877729088 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-20 06:25:42,387:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0420 06:25:42.387797 138677877729088 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-20 06:25:42,390:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-hv3u8-slice-job-0-0.mt-07-distill-smoke-hv3u8:8482
I0420 06:25:42.390263 138677877729088 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-hv3u8-slice-job-0-0.mt-07-distill-smoke-hv3u8:8482
I0420 06:25:43.230932 138677877729088 max_utils.py:284] Jax distributed system initialized!
I0420 06:25:49.445967 138677877729088 max_utils.py:244] Jax distributed system is already initialized.
I0420 06:25:49.919833 138677877729088 max_utils.py:244] Jax distributed system is already initialized.
I0420 06:25:49.921005 138677877729088 pyconfig.py:432] Config param abort_on_inf_loss: True I0420 06:25:49.921055 138677877729088 pyconfig.py:432] Config param abort_on_nan_loss: True I0420 06:25:49.921083 138677877729088 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0420 06:25:49.921104 138677877729088 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0420 06:25:49.921123 138677877729088 pyconfig.py:432] Config param activation_function_for_audio: gelu I0420 06:25:49.921141 138677877729088 pyconfig.py:432] Config param activations_in_float32: False I0420 06:25:49.921161 138677877729088 pyconfig.py:432] Config param adam_b1: 0.9 I0420 06:25:49.921180 138677877729088 pyconfig.py:432] Config param adam_b2: 0.95 I0420 06:25:49.921197 138677877729088 pyconfig.py:432] Config param adam_eps: 1e-08 I0420 06:25:49.921220 138677877729088 pyconfig.py:432] Config param adam_eps_root: 0.0 I0420 06:25:49.921236 138677877729088 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0420 06:25:49.921253 138677877729088 pyconfig.py:432] Config param adamw_mask: [] I0420 06:25:49.921270 138677877729088 pyconfig.py:432] Config param add_bos: True I0420 06:25:49.921284 138677877729088 pyconfig.py:432] Config param add_eos: True I0420 06:25:49.921300 138677877729088 pyconfig.py:432] Config param allow_split_physical_axes: False I0420 06:25:49.921315 138677877729088 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0420 06:25:49.921331 138677877729088 pyconfig.py:432] Config param async_checkpointing: True I0420 06:25:49.921347 138677877729088 pyconfig.py:432] Config param async_scheduling: False I0420 06:25:49.921363 138677877729088 pyconfig.py:432] Config param attention: dot_product I0420 06:25:49.921378 138677877729088 pyconfig.py:432] Config param attention_bias: False I0420 06:25:49.921394 138677877729088 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0420 06:25:49.921411 138677877729088 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0420 06:25:49.921432 138677877729088 pyconfig.py:432] Config param attention_output_dim: -1 I0420 06:25:49.921448 138677877729088 pyconfig.py:432] Config param attention_sink: False I0420 06:25:49.921465 138677877729088 pyconfig.py:432] Config param attention_type: global I0420 06:25:49.921479 138677877729088 pyconfig.py:432] Config param attn_logits_soft_cap: None I0420 06:25:49.921496 138677877729088 pyconfig.py:432] Config param audio_path: I0420 06:25:49.921511 138677877729088 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0420 06:25:49.921527 138677877729088 pyconfig.py:432] Config param autoregressive_decode_assert: I0420 06:25:49.921542 138677877729088 pyconfig.py:432] Config param base_config: base.yml I0420 06:25:49.921557 138677877729088 pyconfig.py:432] Config param base_emb_dim: 16 I0420 06:25:49.921573 138677877729088 pyconfig.py:432] Config param base_mlp_dim: 64 I0420 06:25:49.921589 138677877729088 pyconfig.py:432] Config param base_moe_mlp_dim: -1 I0420 06:25:49.921605 138677877729088 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0420 06:25:49.921621 138677877729088 pyconfig.py:432] Config param base_num_kv_heads: 2 I0420 06:25:49.921637 138677877729088 pyconfig.py:432] Config param base_num_query_heads: 2 I0420 06:25:49.921651 138677877729088 pyconfig.py:432] Config param base_output_directory: I0420 06:25:49.921667 138677877729088 pyconfig.py:432] Config param batch_size: 1 I0420 06:25:49.921682 138677877729088 pyconfig.py:432] Config param batch_split_factor: 1 I0420 06:25:49.921698 138677877729088 pyconfig.py:432] Config param beta_fast: 32 I0420 06:25:49.921723 138677877729088 pyconfig.py:432] Config param beta_slow: 1 I0420 06:25:49.921740 138677877729088 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0420 06:25:49.921756 138677877729088 pyconfig.py:432] Config param capacity_factor: -1.0 I0420 06:25:49.921771 138677877729088 pyconfig.py:432] Config 
param cast_logits_to_fp32: True I0420 06:25:49.921787 138677877729088 pyconfig.py:432] Config param chat_template: I0420 06:25:49.921802 138677877729088 pyconfig.py:432] Config param chat_template_path: I0420 06:25:49.921819 138677877729088 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0420 06:25:49.921837 138677877729088 pyconfig.py:432] Config param checkpoint_dir: None I0420 06:25:49.921855 138677877729088 pyconfig.py:432] Config param checkpoint_is_quantized: False I0420 06:25:49.921872 138677877729088 pyconfig.py:432] Config param checkpoint_period: 2000 I0420 06:25:49.921887 138677877729088 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0420 06:25:49.921903 138677877729088 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0420 06:25:49.921920 138677877729088 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0420 06:25:49.921935 138677877729088 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0420 06:25:49.921951 138677877729088 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0420 06:25:49.921967 138677877729088 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0420 06:25:49.921981 138677877729088 pyconfig.py:432] Config param chips_per_vm: 4 I0420 06:25:49.922000 138677877729088 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0420 06:25:49.922015 138677877729088 pyconfig.py:432] Config param collect_stack_trace: False I0420 06:25:49.922030 138677877729088 pyconfig.py:432] Config param colocated_python_checkpointing: False I0420 06:25:49.922045 138677877729088 pyconfig.py:432] Config param colocated_python_data_input: False I0420 06:25:49.922059 138677877729088 pyconfig.py:432] Config param compile_topology: I0420 06:25:49.922073 138677877729088 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0420 06:25:49.922088 138677877729088 pyconfig.py:432] Config param compile_xla_flags: I0420 
06:25:49.922104 138677877729088 pyconfig.py:432] Config param compiled_trainstep_file: I0420 06:25:49.922119 138677877729088 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0420 06:25:49.922135 138677877729088 pyconfig.py:432] Config param constant_bound_config: [] I0420 06:25:49.922151 138677877729088 pyconfig.py:432] Config param context: RematLocation.REMAT I0420 06:25:49.922168 138677877729088 pyconfig.py:432] Config param context_parallel_load_balance: True I0420 06:25:49.922184 138677877729088 pyconfig.py:432] Config param context_parallel_size: 1 I0420 06:25:49.922199 138677877729088 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0420 06:25:49.922214 138677877729088 pyconfig.py:432] Config param context_sharding: context I0420 06:25:49.922230 138677877729088 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0420 06:25:49.922244 138677877729088 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0420 06:25:49.922260 138677877729088 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0420 06:25:49.922276 138677877729088 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0420 06:25:49.922291 138677877729088 pyconfig.py:432] Config param custom_mesh: I0420 06:25:49.922307 138677877729088 pyconfig.py:432] Config param custom_mesh_and_rule: I0420 06:25:49.922322 138677877729088 pyconfig.py:432] Config param d_model_for_audio: 256 I0420 06:25:49.922336 138677877729088 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0420 06:25:49.922355 138677877729088 pyconfig.py:432] Config param data_shuffle_seed: 0 I0420 06:25:49.922369 138677877729088 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0420 06:25:49.922385 138677877729088 pyconfig.py:432] Config param dataset_path: I0420 06:25:49.922399 138677877729088 pyconfig.py:432] Config 
param dataset_type: DatasetType.HF I0420 06:25:49.922416 138677877729088 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1 I0420 06:25:49.922431 138677877729088 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0420 06:25:49.922446 138677877729088 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0420 06:25:49.922469 138677877729088 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0420 06:25:49.922485 138677877729088 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0420 06:25:49.922500 138677877729088 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0420 06:25:49.922515 138677877729088 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0420 06:25:49.922530 138677877729088 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0420 06:25:49.922544 138677877729088 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 06:25:49.922561 138677877729088 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0420 06:25:49.922575 138677877729088 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0420 06:25:49.922590 138677877729088 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0420 06:25:49.922604 138677877729088 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0420 06:25:49.922620 138677877729088 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0420 06:25:49.922634 138677877729088 pyconfig.py:432] Config param debug: {'rl': False} I0420 06:25:49.922651 138677877729088 pyconfig.py:432] Config param debug_sharding: False I0420 06:25:49.922666 138677877729088 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0420 06:25:49.922681 138677877729088 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0420 06:25:49.922698 138677877729088 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0420 06:25:49.922738 138677877729088 pyconfig.py:432] Config param 
decode_sampling_top_k: 0 I0420 06:25:49.922753 138677877729088 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3 I0420 06:25:49.922904 138677877729088 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0420 06:25:49.923008 138677877729088 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0420 06:25:49.923034 138677877729088 pyconfig.py:432] Config param degenerate_group_masking: True I0420 06:25:49.923056 138677877729088 pyconfig.py:432] Config param dense_init_scale: 1.0 I0420 06:25:49.923079 138677877729088 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0420 06:25:49.923097 138677877729088 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0420 06:25:49.923115 138677877729088 pyconfig.py:432] Config param diloco_sync_period: 36 I0420 06:25:49.923130 138677877729088 pyconfig.py:432] Config param distill_alpha: 0.5 I0420 06:25:49.923147 138677877729088 pyconfig.py:432] Config param distill_alpha_end: None I0420 06:25:49.923165 138677877729088 pyconfig.py:432] Config param distill_alpha_schedule: constant I0420 06:25:49.923183 138677877729088 pyconfig.py:432] Config param distill_beta: 0.0 I0420 06:25:49.923199 138677877729088 pyconfig.py:432] Config param distill_beta_end: None I0420 06:25:49.923213 138677877729088 pyconfig.py:432] Config param distill_beta_schedule: constant I0420 06:25:49.923230 138677877729088 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0420 06:25:49.923244 138677877729088 pyconfig.py:432] Config param distill_layer_indices: None I0420 06:25:49.923261 138677877729088 pyconfig.py:432] Config param distill_temperature: 1.0 I0420 06:25:49.923276 138677877729088 pyconfig.py:432] Config param distill_temperature_end: None I0420 06:25:49.923293 138677877729088 pyconfig.py:432] Config param distill_temperature_schedule: constant I0420 06:25:49.923309 138677877729088 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0420 06:25:49.923324 
138677877729088 pyconfig.py:432] Config param dpo_beta: 0.1 I0420 06:25:49.923340 138677877729088 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0420 06:25:49.923355 138677877729088 pyconfig.py:432] Config param dq_reduction_steps: 0 I0420 06:25:49.923370 138677877729088 pyconfig.py:432] Config param dropout_rate: 0.0 I0420 06:25:49.923387 138677877729088 pyconfig.py:432] Config param dtype: bfloat16 I0420 06:25:49.923427 138677877729088 pyconfig.py:432] Config param dtype_mm: float32 I0420 06:25:49.923444 138677877729088 pyconfig.py:432] Config param dump_hlo: False I0420 06:25:49.923462 138677877729088 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0420 06:25:49.923480 138677877729088 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-20-06-25/xla_dump I0420 06:25:49.923498 138677877729088 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0420 06:25:49.923517 138677877729088 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0420 06:25:49.923535 138677877729088 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0420 06:25:49.923553 138677877729088 pyconfig.py:432] Config param dump_hlo_upload_all: False I0420 06:25:49.923572 138677877729088 pyconfig.py:432] Config param dump_hlo_xla_flags: I0420 06:25:49.923586 138677877729088 pyconfig.py:432] Config param dump_jaxpr: False I0420 06:25:49.923603 138677877729088 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0420 06:25:49.923619 138677877729088 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-20-06-25/jaxpr_dump I0420 06:25:49.923636 138677877729088 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0420 06:25:49.923652 138677877729088 pyconfig.py:432] Config param dump_step: -1 I0420 06:25:49.923666 138677877729088 pyconfig.py:432] Config param elastic_enabled: False I0420 06:25:49.923681 138677877729088 pyconfig.py:432] Config param elastic_max_retries: 10 
I0420 06:25:49.923696 138677877729088 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0420 06:25:49.923725 138677877729088 pyconfig.py:432] Config param emb_dim: 16 I0420 06:25:49.923739 138677877729088 pyconfig.py:432] Config param enable_autocheckpoint: False I0420 06:25:49.923755 138677877729088 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0420 06:25:49.923775 138677877729088 pyconfig.py:432] Config param enable_checkpointing: True I0420 06:25:49.923791 138677877729088 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0420 06:25:49.923806 138677877729088 pyconfig.py:432] Config param enable_data_shuffling: True I0420 06:25:49.923822 138677877729088 pyconfig.py:432] Config param enable_diloco: False I0420 06:25:49.923837 138677877729088 pyconfig.py:432] Config param enable_dp_attention: False I0420 06:25:49.923851 138677877729088 pyconfig.py:432] Config param enable_dropout: False I0420 06:25:49.923866 138677877729088 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0420 06:25:49.923880 138677877729088 pyconfig.py:432] Config param enable_expert_parallel: False I0420 06:25:49.923894 138677877729088 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0420 06:25:49.923909 138677877729088 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0420 06:25:49.923923 138677877729088 pyconfig.py:432] Config param enable_goodput_recording: False I0420 06:25:49.923938 138677877729088 pyconfig.py:432] Config param enable_jax_profiler: False I0420 06:25:49.923952 138677877729088 pyconfig.py:432] Config param enable_llm_inference_pool: False I0420 06:25:49.923967 138677877729088 pyconfig.py:432] Config param enable_model_warmup: False I0420 06:25:49.923981 138677877729088 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0420 06:25:49.923996 138677877729088 pyconfig.py:432] Config param enable_nnx: False I0420 06:25:49.924011 138677877729088 
pyconfig.py:432] Config param enable_orbax_v1: False I0420 06:25:49.924026 138677877729088 pyconfig.py:432] Config param enable_padding_causal_mask: True I0420 06:25:49.924040 138677877729088 pyconfig.py:432] Config param enable_pathways_goodput: False I0420 06:25:49.924055 138677877729088 pyconfig.py:432] Config param enable_prefix_caching: False I0420 06:25:49.924070 138677877729088 pyconfig.py:432] Config param enable_rampup_batch_size: False I0420 06:25:49.924085 138677877729088 pyconfig.py:432] Config param enable_single_controller: False I0420 06:25:49.924099 138677877729088 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0420 06:25:49.924114 138677877729088 pyconfig.py:432] Config param enable_tensorboard: True I0420 06:25:49.924129 138677877729088 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0420 06:25:49.924145 138677877729088 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0420 06:25:49.924159 138677877729088 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0420 06:25:49.924174 138677877729088 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0420 06:25:49.924188 138677877729088 pyconfig.py:432] Config param engram: RematLocation.REMAT I0420 06:25:49.924205 138677877729088 pyconfig.py:432] Config param engram_head_dim: 1280 I0420 06:25:49.924221 138677877729088 pyconfig.py:432] Config param engram_kernel_size: 4 I0420 06:25:49.924235 138677877729088 pyconfig.py:432] Config param engram_layers: [] I0420 06:25:49.924250 138677877729088 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0420 06:25:49.924264 138677877729088 pyconfig.py:432] Config param engram_num_heads: 8 I0420 06:25:49.924279 138677877729088 pyconfig.py:432] Config param engram_seed: 0 I0420 06:25:49.924293 138677877729088 pyconfig.py:432] Config param engram_vocab_bases: [] I0420 06:25:49.924308 138677877729088 pyconfig.py:432] Config param epsilon_high: None I0420 06:25:49.924322 138677877729088 
pyconfig.py:432] Config param eval_corr_lst: False I0420 06:25:49.924337 138677877729088 pyconfig.py:432] Config param eval_data_columns: ['text'] I0420 06:25:49.924355 138677877729088 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0420 06:25:49.924369 138677877729088 pyconfig.py:432] Config param eval_image_column: image I0420 06:25:49.924384 138677877729088 pyconfig.py:432] Config param eval_interval: -1 I0420 06:25:49.924399 138677877729088 pyconfig.py:432] Config param eval_make_lst: False I0420 06:25:49.924414 138677877729088 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0420 06:25:49.924428 138677877729088 pyconfig.py:432] Config param eval_sampling_strategy: greedy I0420 06:25:49.924443 138677877729088 pyconfig.py:432] Config param eval_split: validation I0420 06:25:49.924458 138677877729088 pyconfig.py:432] Config param eval_steps: -1 I0420 06:25:49.924472 138677877729088 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0420 06:25:49.924488 138677877729088 pyconfig.py:432] Config param final_logits_soft_cap: None I0420 06:25:49.924502 138677877729088 pyconfig.py:432] Config param first_num_dense_layers: 0 I0420 06:25:49.924517 138677877729088 pyconfig.py:432] Config param float32_gate_logits: False I0420 06:25:49.924532 138677877729088 pyconfig.py:432] Config param float32_logits: False I0420 06:25:49.924546 138677877729088 pyconfig.py:432] Config param float32_qk_product: False I0420 06:25:49.924561 138677877729088 pyconfig.py:432] Config param float32_weight_sum: True I0420 06:25:49.924576 138677877729088 pyconfig.py:432] Config param force_q_layout: False I0420 06:25:49.924591 138677877729088 pyconfig.py:432] Config param force_unroll: False I0420 06:25:49.924605 138677877729088 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0420 06:25:49.924621 138677877729088 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0420 06:25:49.924634 138677877729088 pyconfig.py:432] Config param 
fused_mlp: False I0420 06:25:49.924649 138677877729088 pyconfig.py:432] Config param fused_qkv: True I0420 06:25:49.924663 138677877729088 pyconfig.py:432] Config param gcs_metrics: False I0420 06:25:49.924686 138677877729088 pyconfig.py:432] Config param gdn_chunk_size: 64 I0420 06:25:49.924702 138677877729088 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0420 06:25:49.924728 138677877729088 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0420 06:25:49.924742 138677877729088 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0420 06:25:49.924757 138677877729088 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0420 06:25:49.924775 138677877729088 pyconfig.py:432] Config param gdn_value_head_dim: 128 I0420 06:25:49.924790 138677877729088 pyconfig.py:432] Config param generate_padding_batch_eval: False I0420 06:25:49.924806 138677877729088 pyconfig.py:432] Config param generate_padding_batch_train: False I0420 06:25:49.924820 138677877729088 pyconfig.py:432] Config param generate_slice: v5e-16 I0420 06:25:49.924835 138677877729088 pyconfig.py:432] Config param generation_configs: {} I0420 06:25:49.924850 138677877729088 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0420 06:25:49.924865 138677877729088 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0420 06:25:49.924880 138677877729088 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0420 06:25:49.924896 138677877729088 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0420 06:25:49.924910 138677877729088 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0420 06:25:49.924927 138677877729088 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0420 06:25:49.924941 138677877729088 pyconfig.py:432] Config param global_head_dim: 0 I0420 06:25:49.924957 138677877729088 pyconfig.py:432] Config param global_num_kv_heads: 0 I0420 06:25:49.924972 138677877729088 pyconfig.py:432] Config param 
global_parameter_scale: 1 I0420 06:25:49.924986 138677877729088 pyconfig.py:432] Config param global_rampup_samples: 500 I0420 06:25:49.925001 138677877729088 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0420 06:25:49.925015 138677877729088 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0420 06:25:49.925031 138677877729088 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0420 06:25:49.925047 138677877729088 pyconfig.py:432] Config param grad_dtype: float32 I0420 06:25:49.925085 138677877729088 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0420 06:25:49.925103 138677877729088 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0 I0420 06:25:49.925118 138677877729088 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0420 06:25:49.925134 138677877729088 pyconfig.py:432] Config param grain_eval_files: I0420 06:25:49.925148 138677877729088 pyconfig.py:432] Config param grain_file_type: arrayrecord I0420 06:25:49.925163 138677877729088 pyconfig.py:432] Config param grain_num_threads: 16 I0420 06:25:49.925178 138677877729088 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0420 06:25:49.925194 138677877729088 pyconfig.py:432] Config param grain_packing_type: first_fit I0420 06:25:49.925209 138677877729088 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0420 06:25:49.925225 138677877729088 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0420 06:25:49.925239 138677877729088 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0420 06:25:49.925253 138677877729088 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0420 06:25:49.925268 138677877729088 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0420 06:25:49.925282 138677877729088 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0420 06:25:49.925298 138677877729088 pyconfig.py:432] Config param grain_train_files: I0420 06:25:49.925313 
138677877729088 pyconfig.py:432] Config param grain_train_mixture_config_path: I0420 06:25:49.925327 138677877729088 pyconfig.py:432] Config param grain_worker_count: 1 I0420 06:25:49.925342 138677877729088 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0420 06:25:49.925358 138677877729088 pyconfig.py:432] Config param grpo_beta: 0.08 I0420 06:25:49.925372 138677877729088 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0420 06:25:49.925388 138677877729088 pyconfig.py:432] Config param hardware: tpu I0420 06:25:49.925403 138677877729088 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0420 06:25:49.925419 138677877729088 pyconfig.py:432] Config param head_dim: 8 I0420 06:25:49.925434 138677877729088 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0420 06:25:49.925449 138677877729088 pyconfig.py:432] Config param hf_data_dir: None I0420 06:25:49.925464 138677877729088 pyconfig.py:432] Config param hf_eval_files: None I0420 06:25:49.925479 138677877729088 pyconfig.py:432] Config param hf_eval_split: None I0420 06:25:49.925493 138677877729088 pyconfig.py:432] Config param hf_name: None I0420 06:25:49.925509 138677877729088 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0420 06:25:49.925523 138677877729088 pyconfig.py:432] Config param hf_train_files: None I0420 06:25:49.925540 138677877729088 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0420 06:25:49.925554 138677877729088 pyconfig.py:432] Config param hide_profiler_step_metric: False I0420 06:25:49.925570 138677877729088 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0420 06:25:49.925585 138677877729088 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0420 06:25:49.925600 138677877729088 pyconfig.py:432] Config param ici_context_parallelism: 1 I0420 06:25:49.925615 138677877729088 pyconfig.py:432] Config param ici_data_parallelism: 1 I0420 06:25:49.925629 138677877729088 pyconfig.py:432] Config param 
ici_diloco_parallelism: 1 I0420 06:25:49.925644 138677877729088 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0420 06:25:49.925658 138677877729088 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0420 06:25:49.925673 138677877729088 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0420 06:25:49.925689 138677877729088 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 06:25:49.925719 138677877729088 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0420 06:25:49.925735 138677877729088 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0420 06:25:49.925753 138677877729088 pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0420 06:25:49.925766 138677877729088 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0420 06:25:49.925786 138677877729088 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1 I0420 06:25:49.925801 138677877729088 pyconfig.py:432] Config param image_path: I0420 06:25:49.925817 138677877729088 pyconfig.py:432] Config param image_placeholder: <|image|> I0420 06:25:49.925832 138677877729088 pyconfig.py:432] Config param image_size_for_vit: 896 I0420 06:25:49.925846 138677877729088 pyconfig.py:432] Config param indexer_head_dim: 128 I0420 06:25:49.925862 138677877729088 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0420 06:25:49.925877 138677877729088 pyconfig.py:432] Config param indexer_n_heads: 64 I0420 06:25:49.925892 138677877729088 pyconfig.py:432] Config param indexer_sparse_training: False I0420 06:25:49.925906 138677877729088 pyconfig.py:432] Config param indexer_topk: 2048 I0420 06:25:49.925921 138677877729088 pyconfig.py:432] Config param inference_benchmark_test: False I0420 06:25:49.925936 138677877729088 pyconfig.py:432] Config param inference_metadata_file: I0420 06:25:49.925950 138677877729088 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0420 06:25:49.925965 
138677877729088 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0420 06:25:49.925980 138677877729088 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0420 06:25:49.925995 138677877729088 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0420 06:25:49.926010 138677877729088 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0420 06:25:49.926024 138677877729088 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0420 06:25:49.926039 138677877729088 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0420 06:25:49.926053 138677877729088 pyconfig.py:432] Config param init_weights_seed: 0 I0420 06:25:49.926068 138677877729088 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0420 06:25:49.926085 138677877729088 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0420 06:25:49.926100 138677877729088 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0420 06:25:49.926115 138677877729088 pyconfig.py:432] Config param internal_compile: False I0420 06:25:49.926129 138677877729088 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0420 06:25:49.926144 138677877729088 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0420 06:25:49.926159 138677877729088 pyconfig.py:432] Config param jax_debug_log_modules: I0420 06:25:49.926174 138677877729088 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0420 06:25:49.926190 138677877729088 pyconfig.py:432] Config param jax_profiler_port: 9999 I0420 06:25:49.926205 138677877729088 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0420 06:25:49.926223 138677877729088 pyconfig.py:432] Config param kv_cache_buffer: 256 I0420 06:25:49.926237 138677877729088 pyconfig.py:432] Config param kv_lora_rank: 512 I0420 
06:25:49.926252 138677877729088 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0420 06:25:49.926273 138677877729088 pyconfig.py:432] Config param kv_quant_dtype: int8 I0420 06:25:49.926288 138677877729088 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0420 06:25:49.926302 138677877729088 pyconfig.py:432] Config param learning_rate: 0.0002 I0420 06:25:49.926318 138677877729088 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0420 06:25:49.926333 138677877729088 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0420 06:25:49.926347 138677877729088 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 I0420 06:25:49.926361 138677877729088 pyconfig.py:432] Config param load_checkpoint_only_once: False I0420 06:25:49.926377 138677877729088 pyconfig.py:432] Config param load_from_prefill_dir: False I0420 06:25:49.926392 138677877729088 pyconfig.py:432] Config param load_full_state_path: I0420 06:25:49.926406 138677877729088 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 06:25:49.926422 138677877729088 pyconfig.py:432] Config param local_checkpoint_directory: I0420 06:25:49.926436 138677877729088 pyconfig.py:432] Config param local_checkpoint_period: 0 I0420 06:25:49.926450 138677877729088 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0420 06:25:49.926465 138677877729088 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0420 06:25:49.926480 138677877729088 pyconfig.py:432] Config param log_config: True I0420 06:25:49.926495 138677877729088 pyconfig.py:432] Config param log_period: 10 I0420 06:25:49.926510 138677877729088 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 
'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), 
('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), 
('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0420 06:25:49.926638 138677877729088 pyconfig.py:432] Config param logits_dot_in_fp32: False I0420 06:25:49.926656 138677877729088 pyconfig.py:432] Config param logits_via_embedding: True I0420 06:25:49.926673 138677877729088 pyconfig.py:432] Config param lora_input_adapters_path: I0420 06:25:49.926687 138677877729088 pyconfig.py:432] Config param loss_algo: grpo I0420 06:25:49.926703 138677877729088 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0420 06:25:49.926733 138677877729088 pyconfig.py:432] Config param managed_mldiagnostics: False I0420 06:25:49.926749 138677877729088 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0420 06:25:49.926765 138677877729088 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0420 06:25:49.926785 138677877729088 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0420 06:25:49.926803 138677877729088 pyconfig.py:432] Config param max_checkify: False I0420 06:25:49.926819 138677877729088 pyconfig.py:432] Config param max_concurrency: 256 I0420 06:25:49.926833 138677877729088 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0420 06:25:49.926848 138677877729088 pyconfig.py:432] Config param max_num_batched_tokens: None I0420 06:25:49.926862 138677877729088 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0420 06:25:49.926877 138677877729088 pyconfig.py:432] Config param max_num_images_per_example: -1 I0420 
06:25:49.926892 138677877729088 pyconfig.py:432] Config param max_num_seqs: None I0420 06:25:49.926906 138677877729088 pyconfig.py:432] Config param max_position_embeddings: 163840 I0420 06:25:49.926921 138677877729088 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0420 06:25:49.926937 138677877729088 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0420 06:25:49.926950 138677877729088 pyconfig.py:432] Config param max_segments_per_seq: -1 I0420 06:25:49.926965 138677877729088 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0420 06:25:49.926980 138677877729088 pyconfig.py:432] Config param max_target_length: 2048 I0420 06:25:49.926995 138677877729088 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0 I0420 06:25:49.927010 138677877729088 pyconfig.py:432] Config param megablox: True I0420 06:25:49.927025 138677877729088 pyconfig.py:432] Config param merge_gating_gmm: False I0420 06:25:49.927039 138677877729088 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0420 06:25:49.927058 138677877729088 pyconfig.py:432] Config param metrics_dir: None I0420 06:25:49.927072 138677877729088 pyconfig.py:432] Config param metrics_file: I0420 06:25:49.927087 138677877729088 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0420 06:25:49.927101 138677877729088 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0420 06:25:49.927117 138677877729088 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0420 06:25:49.927130 138677877729088 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0420 06:25:49.927146 138677877729088 pyconfig.py:432] Config param mla_naive_kvcache: True I0420 06:25:49.927161 138677877729088 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0420 06:25:49.927176 138677877729088 
pyconfig.py:432] Config param mlp_activations: ['gelu'] I0420 06:25:49.927192 138677877729088 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0420 06:25:49.927207 138677877729088 pyconfig.py:432] Config param mlp_bias: False I0420 06:25:49.927221 138677877729088 pyconfig.py:432] Config param mlp_dim: 64 I0420 06:25:49.927236 138677877729088 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0420 06:25:49.927252 138677877729088 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0420 06:25:49.927266 138677877729088 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0420 06:25:49.927281 138677877729088 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0420 06:25:49.927298 138677877729088 pyconfig.py:432] Config param moba: False I0420 06:25:49.927312 138677877729088 pyconfig.py:432] Config param moba_chunk_size: 1024 I0420 06:25:49.927327 138677877729088 pyconfig.py:432] Config param moba_topk: 8 I0420 06:25:49.927341 138677877729088 pyconfig.py:432] Config param model_call_mode: I0420 06:25:49.927356 138677877729088 pyconfig.py:432] Config param model_name: gpt3-52k I0420 06:25:49.927371 138677877729088 pyconfig.py:432] Config param moe_expert_input_dim: -1 I0420 06:25:49.927385 138677877729088 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0420 06:25:49.927401 138677877729088 pyconfig.py:432] Config param moe_mlp_dim: -1 I0420 06:25:49.927415 138677877729088 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0420 06:25:49.927430 138677877729088 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0420 06:25:49.927444 138677877729088 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0420 06:25:49.927460 138677877729088 pyconfig.py:432] Config param monitor_goodput: False I0420 06:25:49.927474 138677877729088 pyconfig.py:432] Config param monitor_step_time_deviation: True I0420 06:25:49.927489 138677877729088 pyconfig.py:432] Config param mrope_section: [24, 20, 
20] I0420 06:25:49.927504 138677877729088 pyconfig.py:432] Config param mscale: 1.0 I0420 06:25:49.927519 138677877729088 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0420 06:25:49.927535 138677877729088 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0420 06:25:49.927548 138677877729088 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0420 06:25:49.927564 138677877729088 pyconfig.py:432] Config param mtp_num_layers: 0 I0420 06:25:49.927578 138677877729088 pyconfig.py:432] Config param mu_dtype: float32 I0420 06:25:49.927602 138677877729088 pyconfig.py:432] Config param multi_sampling: False I0420 06:25:49.927617 138677877729088 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0420 06:25:49.927632 138677877729088 pyconfig.py:432] Config param muon_beta: 0.95 I0420 06:25:49.927648 138677877729088 pyconfig.py:432] Config param muon_consistent_rms: None I0420 06:25:49.927662 138677877729088 pyconfig.py:432] Config param muon_weight_decay: 0.0 I0420 06:25:49.927677 138677877729088 pyconfig.py:432] Config param n_routing_groups: -1 I0420 06:25:49.927692 138677877729088 pyconfig.py:432] Config param n_window_for_audio: 50 I0420 06:25:49.927716 138677877729088 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0420 06:25:49.927732 138677877729088 pyconfig.py:432] Config param nope_layer_interval: -1 I0420 06:25:49.927747 138677877729088 pyconfig.py:432] Config param norm_topk_prob: False I0420 06:25:49.927762 138677877729088 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0420 06:25:49.927786 138677877729088 pyconfig.py:432] Config param normalize_embedding_logits: False I0420 06:25:49.927801 138677877729088 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0420 06:25:49.927816 138677877729088 pyconfig.py:432] Config param num_batches: 4 I0420 06:25:49.927830 138677877729088 pyconfig.py:432] Config param num_channels_for_vit: 3 I0420 06:25:49.927844 138677877729088 
pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0420 06:25:49.927859 138677877729088 pyconfig.py:432] Config param num_decoder_layers: 1 I0420 06:25:49.927873 138677877729088 pyconfig.py:432] Config param num_diloco_replicas: 1 I0420 06:25:49.927889 138677877729088 pyconfig.py:432] Config param num_epoch: 1 I0420 06:25:49.927903 138677877729088 pyconfig.py:432] Config param num_eval_passes: 1 I0420 06:25:49.927919 138677877729088 pyconfig.py:432] Config param num_experts: 1 I0420 06:25:49.927934 138677877729088 pyconfig.py:432] Config param num_experts_per_tok: 1 I0420 06:25:49.927949 138677877729088 pyconfig.py:432] Config param num_generations: 2 I0420 06:25:49.927965 138677877729088 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0420 06:25:49.927979 138677877729088 pyconfig.py:432] Config param num_iterations: 1 I0420 06:25:49.927994 138677877729088 pyconfig.py:432] Config param num_kv_heads: 2 I0420 06:25:49.928009 138677877729088 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0420 06:25:49.928024 138677877729088 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0420 06:25:49.928038 138677877729088 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0420 06:25:49.928053 138677877729088 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0420 06:25:49.928068 138677877729088 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0420 06:25:49.928083 138677877729088 pyconfig.py:432] Config param num_query_heads: 2 I0420 06:25:49.928097 138677877729088 pyconfig.py:432] Config param num_samplers_slices: -1 I0420 06:25:49.928111 138677877729088 pyconfig.py:432] Config param num_slices: 1 I0420 06:25:49.928126 138677877729088 pyconfig.py:432] Config param num_target_devices: 32 I0420 06:25:49.928141 138677877729088 pyconfig.py:432] Config param num_test_batches: 5 I0420 06:25:49.928156 138677877729088 pyconfig.py:432] Config param num_trainer_slices: -1 I0420 06:25:49.928171 138677877729088 
pyconfig.py:432] Config param num_vocab_tiling: 1 I0420 06:25:49.928185 138677877729088 pyconfig.py:432] Config param off_policy_steps: 0 I0420 06:25:49.928200 138677877729088 pyconfig.py:432] Config param offline_data_dir: None I0420 06:25:49.928214 138677877729088 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0420 06:25:49.928232 138677877729088 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0420 06:25:49.928246 138677877729088 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0420 06:25:49.928261 138677877729088 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0420 06:25:49.928276 138677877729088 pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0420 06:25:49.928291 138677877729088 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0420 06:25:49.928305 138677877729088 pyconfig.py:432] Config param output_dim_for_audio: 512 I0420 06:25:49.928321 138677877729088 pyconfig.py:432] Config param override_logical_axis_rules: False I0420 06:25:49.928335 138677877729088 pyconfig.py:432] Config param override_model_config: True I0420 06:25:49.928350 138677877729088 pyconfig.py:432] Config param packing: True I0420 06:25:49.928364 138677877729088 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0420 06:25:49.928379 138677877729088 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0420 06:25:49.928393 138677877729088 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0420 06:25:49.928409 138677877729088 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0420 06:25:49.928423 138677877729088 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0420 06:25:49.928437 138677877729088 pyconfig.py:432] Config param param_scan_axis: 1 I0420 06:25:49.928453 138677877729088 pyconfig.py:432] Config param parameter_memory_host_offload: False I0420 06:25:49.928467 138677877729088 pyconfig.py:432] Config param partial_rotary_factor: 1.0 
I0420 06:25:49.928482 138677877729088 pyconfig.py:432] Config param patch_size_for_vit: 14 I0420 06:25:49.928497 138677877729088 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0420 06:25:49.928513 138677877729088 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0420 06:25:49.928528 138677877729088 pyconfig.py:432] Config param per_device_batch_size: 2 I0420 06:25:49.928543 138677877729088 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0420 06:25:49.928558 138677877729088 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0420 06:25:49.928572 138677877729088 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False I0420 06:25:49.928587 138677877729088 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0420 06:25:49.928601 138677877729088 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0420 06:25:49.928616 138677877729088 pyconfig.py:432] Config param pipeline_parallel_layers: 1 I0420 06:25:49.928630 138677877729088 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0420 06:25:49.928646 138677877729088 pyconfig.py:432] Config param posemb_type_for_vit: learn I0420 06:25:49.928660 138677877729088 pyconfig.py:432] Config param position_id_per_seconds: 25 I0420 06:25:49.928675 138677877729088 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0420 06:25:49.928690 138677877729088 pyconfig.py:432] Config param prefill_cache_dir: I0420 06:25:49.928718 138677877729088 pyconfig.py:432] Config param prefill_chunk_size: 256 I0420 06:25:49.928734 138677877729088 pyconfig.py:432] Config param prefill_slice: v5e-16 I0420 06:25:49.928748 138677877729088 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0420 06:25:49.928764 138677877729088 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0420 06:25:49.928783 138677877729088 pyconfig.py:432] Config param profile_cleanly: True I0420 06:25:49.928797 138677877729088 
pyconfig.py:432] Config param profile_periodically_period: -1 I0420 06:25:49.928812 138677877729088 pyconfig.py:432] Config param profile_power_events: False I0420 06:25:49.928827 138677877729088 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0420 06:25:49.928844 138677877729088 pyconfig.py:432] Config param profiler_steps: 5 I0420 06:25:49.928859 138677877729088 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0420 06:25:49.928874 138677877729088 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0420 06:25:49.928890 138677877729088 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0420 06:25:49.928905 138677877729088 pyconfig.py:432] Config param prometheus_port: 0 I0420 06:25:49.928920 138677877729088 pyconfig.py:432] Config param prompt: I love to I0420 06:25:49.928933 138677877729088 pyconfig.py:432] Config param pure_nnx: False I0420 06:25:49.928949 138677877729088 pyconfig.py:432] Config param pure_nnx_decoder: False I0420 06:25:49.928964 138677877729088 pyconfig.py:432] Config param q_lora_rank: 0 I0420 06:25:49.928978 138677877729088 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0420 06:25:49.928994 138677877729088 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0420 06:25:49.929008 138677877729088 pyconfig.py:432] Config param qk_norm_with_scale: True I0420 06:25:49.929023 138677877729088 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0420 06:25:49.929037 138677877729088 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0420 06:25:49.929053 138677877729088 pyconfig.py:432] Config param quant_cfg_path: I0420 06:25:49.929067 138677877729088 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0420 06:25:49.929085 138677877729088 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0420 06:25:49.929100 138677877729088 pyconfig.py:432] Config param quantize_kvcache: False I0420 06:25:49.929117 138677877729088 pyconfig.py:432] Config param 
query_proj: RematLocation.REMAT I0420 06:25:49.929133 138677877729088 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0420 06:25:49.929149 138677877729088 pyconfig.py:432] Config param ragged_block_size: 256 I0420 06:25:49.929163 138677877729088 pyconfig.py:432] Config param ragged_buffer_factor: -1.0 I0420 06:25:49.929180 138677877729088 pyconfig.py:432] Config param rampup_end_step: 0 I0420 06:25:49.929196 138677877729088 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0420 06:25:49.929212 138677877729088 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0420 06:25:49.929228 138677877729088 pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0420 06:25:49.929243 138677877729088 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0420 06:25:49.929259 138677877729088 pyconfig.py:432] Config param remat_policy: full I0420 06:25:49.929275 138677877729088 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0420 06:25:49.929291 138677877729088 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True I0420 06:25:49.929306 138677877729088 pyconfig.py:432] Config param replicate_quant_scale: False I0420 06:25:49.929321 138677877729088 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0420 06:25:49.929336 138677877729088 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0420 06:25:49.929352 138677877729088 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0420 06:25:49.929366 138677877729088 pyconfig.py:432] Config param reshape_q: False I0420 06:25:49.929381 138677877729088 pyconfig.py:432] Config param return_log_prob: False I0420 06:25:49.929397 138677877729088 pyconfig.py:432] Config param reuse_example_batch: 0 I0420 06:25:49.929411 138677877729088 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0420 06:25:49.929426 138677877729088 pyconfig.py:432] Config param 
reward_exact_format_match: 3.0 I0420 06:25:49.929440 138677877729088 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0420 06:25:49.929456 138677877729088 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0420 06:25:49.929471 138677877729088 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0420 06:25:49.929485 138677877729088 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0420 06:25:49.929501 138677877729088 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0420 06:25:49.929522 138677877729088 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0420 06:25:49.929537 138677877729088 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0420 06:25:49.929552 138677877729088 pyconfig.py:432] Config param rollout_micro_batch_size: -1 I0420 06:25:49.929569 138677877729088 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0420 06:25:49.929584 138677877729088 pyconfig.py:432] Config param rope_attention_scaling: False I0420 06:25:49.929598 138677877729088 pyconfig.py:432] Config param rope_factor: 40 I0420 06:25:49.929613 138677877729088 pyconfig.py:432] Config param rope_interleave: True I0420 06:25:49.929628 138677877729088 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0420 06:25:49.929642 138677877729088 pyconfig.py:432] Config param rope_max_timescale: 10000 I0420 06:25:49.929656 138677877729088 pyconfig.py:432] Config param rope_min_timescale: 1 I0420 06:25:49.929672 138677877729088 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0420 06:25:49.929685 138677877729088 pyconfig.py:432] Config param rope_truncate: True I0420 06:25:49.929700 138677877729088 pyconfig.py:432] Config param rope_type: 
RopeType.DEFAULT I0420 06:25:49.929727 138677877729088 pyconfig.py:432] Config param rope_use_scale: True I0420 06:25:49.929742 138677877729088 pyconfig.py:432] Config param routed_bias: False I0420 06:25:49.929756 138677877729088 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0420 06:25:49.929774 138677877729088 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0420 06:25:49.929788 138677877729088 pyconfig.py:432] Config param routed_score_func: I0420 06:25:49.929804 138677877729088 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-20-06-25 I0420 06:25:49.929819 138677877729088 pyconfig.py:432] Config param sa_block_kv: 512 I0420 06:25:49.929833 138677877729088 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0420 06:25:49.929848 138677877729088 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0420 06:25:49.929864 138677877729088 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0420 06:25:49.929880 138677877729088 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0420 06:25:49.929894 138677877729088 pyconfig.py:432] Config param sa_block_q: 512 I0420 06:25:49.929908 138677877729088 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0420 06:25:49.929923 138677877729088 pyconfig.py:432] Config param sa_block_q_dq: 512 I0420 06:25:49.929939 138677877729088 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0420 06:25:49.929953 138677877729088 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0420 06:25:49.929968 138677877729088 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0420 06:25:49.929982 138677877729088 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0420 06:25:49.929997 138677877729088 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0420 06:25:49.930013 138677877729088 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0420 06:25:49.930027 138677877729088 pyconfig.py:432] Config param save_config_to_gcs: False I0420 06:25:49.930043 
138677877729088 pyconfig.py:432] Config param save_quantized_params_path: I0420 06:25:49.930058 138677877729088 pyconfig.py:432] Config param scale_embedding_for_audio: True I0420 06:25:49.930072 138677877729088 pyconfig.py:432] Config param scan_layers: True I0420 06:25:49.930087 138677877729088 pyconfig.py:432] Config param scan_layers_per_stage: False I0420 06:25:49.930101 138677877729088 pyconfig.py:432] Config param scan_pipeline_iterations: True I0420 06:25:49.930116 138677877729088 pyconfig.py:432] Config param scan_pipeline_repeats: False I0420 06:25:49.930131 138677877729088 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0420 06:25:49.930145 138677877729088 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0420 06:25:49.930160 138677877729088 pyconfig.py:432] Config param sft_train_on_completion_only: False I0420 06:25:49.930174 138677877729088 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0420 06:25:49.930189 138677877729088 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0420 06:25:49.930207 138677877729088 pyconfig.py:432] Config param shard_optimizer_over_data: False I0420 06:25:49.930221 138677877729088 pyconfig.py:432] Config param sharding_strategy: None I0420 06:25:49.930235 138677877729088 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0420 06:25:49.930251 138677877729088 pyconfig.py:432] Config param shardy: True I0420 06:25:49.930267 138677877729088 pyconfig.py:432] Config param share_kv_projections: False I0420 06:25:49.930283 138677877729088 pyconfig.py:432] Config param shared_experts: 0 I0420 06:25:49.930298 138677877729088 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0420 06:25:49.930314 138677877729088 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0420 06:25:49.930328 138677877729088 pyconfig.py:432] Config param skip_jax_distributed_system: False I0420 06:25:49.930343 138677877729088 pyconfig.py:432] Config param 
skip_step_interval: 128 I0420 06:25:49.930358 138677877729088 pyconfig.py:432] Config param skip_step_on_spikes: False I0420 06:25:49.930372 138677877729088 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0420 06:25:49.930387 138677877729088 pyconfig.py:432] Config param sliding_window_size: 0 I0420 06:25:49.930401 138677877729088 pyconfig.py:432] Config param solution_end_token: </answer> I0420 06:25:49.930416 138677877729088 pyconfig.py:432] Config param solution_start_token: <answer> I0420 06:25:49.930432 138677877729088 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0420 06:25:49.930446 138677877729088 pyconfig.py:432] Config param sparse_matmul: True I0420 06:25:49.930461 138677877729088 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0420 06:25:49.930475 138677877729088 pyconfig.py:432] Config param stack_prefill_result_cache: False I0420 06:25:49.930491 138677877729088 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0420 06:25:49.930505 138677877729088 pyconfig.py:432] Config param stack_trace_to_cloud: False I0420 06:25:49.930519 138677877729088 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0420 06:25:49.930535 138677877729088 pyconfig.py:432] Config param steps: 200000 I0420 06:25:49.930549 138677877729088 pyconfig.py:432] Config param stop_strings: None I0420 06:25:49.930564 138677877729088 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0420 06:25:49.930580 138677877729088 pyconfig.py:432] Config param student_params_to_update: None I0420 06:25:49.930594 138677877729088 pyconfig.py:432] Config param subslice_shape: I0420 06:25:49.930609 138677877729088 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0420 06:25:49.930624 138677877729088 pyconfig.py:432] Config param system_prompt: I0420 06:25:49.930639 138677877729088 pyconfig.py:432] Config param target_eval_loss: 0.0 I0420 06:25:49.930653 138677877729088 pyconfig.py:432] Config param 
teacher_overrides: {'model_name': 'llama3.1-8b'} I0420 06:25:49.930669 138677877729088 pyconfig.py:432] Config param temperature_tuning: False I0420 06:25:49.930683 138677877729088 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0420 06:25:49.930698 138677877729088 pyconfig.py:432] Config param tensorboard_dir: None I0420 06:25:49.930720 138677877729088 pyconfig.py:432] Config param tensors_on_device: None I0420 06:25:49.930734 138677877729088 pyconfig.py:432] Config param tensors_to_offload: None I0420 06:25:49.930750 138677877729088 pyconfig.py:432] Config param test_batch_start_index: 0 I0420 06:25:49.930764 138677877729088 pyconfig.py:432] Config param tile_size_for_vit: 336 I0420 06:25:49.930782 138677877729088 pyconfig.py:432] Config param tokenize_eval_data: True I0420 06:25:49.930796 138677877729088 pyconfig.py:432] Config param tokenize_train_data: True I0420 06:25:49.930812 138677877729088 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0420 06:25:49.930827 138677877729088 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0420 06:25:49.930844 138677877729088 pyconfig.py:432] Config param topk_routing_group: -1 I0420 06:25:49.930858 138677877729088 pyconfig.py:432] Config param train_data_columns: ['text'] I0420 06:25:49.930874 138677877729088 pyconfig.py:432] Config param train_fraction: 1.0 I0420 06:25:49.930890 138677877729088 pyconfig.py:432] Config param train_image_column: image I0420 06:25:49.930906 138677877729088 pyconfig.py:432] Config param train_micro_batch_size: -1 I0420 06:25:49.930920 138677877729088 pyconfig.py:432] Config param train_split: train I0420 06:25:49.930935 138677877729088 pyconfig.py:432] Config param trainable_parameters_mask: [] I0420 06:25:49.930950 138677877729088 pyconfig.py:432] Config param trainable_position_size: 2048 I0420 06:25:49.930964 138677877729088 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0420 06:25:49.930980 138677877729088 
pyconfig.py:432] Config param upload_all_profiler_results: False I0420 06:25:49.930994 138677877729088 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0420 06:25:49.931009 138677877729088 pyconfig.py:432] Config param use_agentic_rollout: False I0420 06:25:49.931027 138677877729088 pyconfig.py:432] Config param use_audio: False I0420 06:25:49.931041 138677877729088 pyconfig.py:432] Config param use_audio_in_video: False I0420 06:25:49.931055 138677877729088 pyconfig.py:432] Config param use_batch_split_schedule: False I0420 06:25:49.931071 138677877729088 pyconfig.py:432] Config param use_chat_template: False I0420 06:25:49.931085 138677877729088 pyconfig.py:432] Config param use_chunked_prefill: False I0420 06:25:49.931100 138677877729088 pyconfig.py:432] Config param use_custom_sort_vjp: True I0420 06:25:49.931115 138677877729088 pyconfig.py:432] Config param use_dpo: False I0420 06:25:49.931128 138677877729088 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0420 06:25:49.931144 138677877729088 pyconfig.py:432] Config param use_grpo: True I0420 06:25:49.931159 138677877729088 pyconfig.py:432] Config param use_indexer: False I0420 06:25:49.931172 138677877729088 pyconfig.py:432] Config param use_iota_embed: True I0420 06:25:49.931187 138677877729088 pyconfig.py:432] Config param use_jax_splash: False I0420 06:25:49.931202 138677877729088 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0420 06:25:49.931217 138677877729088 pyconfig.py:432] Config param use_mrope: False I0420 06:25:49.931232 138677877729088 pyconfig.py:432] Config param use_multimodal: False I0420 06:25:49.931247 138677877729088 pyconfig.py:432] Config param use_pathways: True I0420 06:25:49.931261 138677877729088 pyconfig.py:432] Config param use_post_attn_norm: False I0420 06:25:49.931276 138677877729088 pyconfig.py:432] Config param use_post_ffw_norm: False I0420 06:25:49.931290 138677877729088 pyconfig.py:432] Config param use_qk_clip: False I0420 
06:25:49.931305 138677877729088 pyconfig.py:432] Config param use_qk_norm: False I0420 06:25:49.931319 138677877729088 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0420 06:25:49.931335 138677877729088 pyconfig.py:432] Config param use_qwix_quantization: False I0420 06:25:49.931349 138677877729088 pyconfig.py:432] Config param use_ragged_attention: False I0420 06:25:49.931363 138677877729088 pyconfig.py:432] Config param use_random_routing: False I0420 06:25:49.931378 138677877729088 pyconfig.py:432] Config param use_replicator_service: False I0420 06:25:49.931392 138677877729088 pyconfig.py:432] Config param use_ring_of_experts: False I0420 06:25:49.931407 138677877729088 pyconfig.py:432] Config param use_sft: False I0420 06:25:49.931421 138677877729088 pyconfig.py:432] Config param use_splash_scheduler: False I0420 06:25:49.931436 138677877729088 pyconfig.py:432] Config param use_tokamax_gmm: False I0420 06:25:49.931450 138677877729088 pyconfig.py:432] Config param use_tokamax_splash: False I0420 06:25:49.931465 138677877729088 pyconfig.py:432] Config param use_truncation: True I0420 06:25:49.931478 138677877729088 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0420 06:25:49.931494 138677877729088 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0420 06:25:49.931508 138677877729088 pyconfig.py:432] Config param use_vertex_tensorboard: False I0420 06:25:49.931522 138677877729088 pyconfig.py:432] Config param using_pipeline_parallelism: False I0420 06:25:49.931536 138677877729088 pyconfig.py:432] Config param v_head_dim: 128 I0420 06:25:49.931551 138677877729088 pyconfig.py:432] Config param v_norm_with_scale: True I0420 06:25:49.931565 138677877729088 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0420 06:25:49.931580 138677877729088 pyconfig.py:432] Config param vertex_tensorboard_project: I0420 06:25:49.931594 138677877729088 pyconfig.py:432] Config param vertex_tensorboard_region: I0420 
06:25:49.931609 138677877729088 pyconfig.py:432] Config param video_path: I0420 06:25:49.931623 138677877729088 pyconfig.py:432] Config param video_placeholder: <|video|> I0420 06:25:49.931638 138677877729088 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0420 06:25:49.931652 138677877729088 pyconfig.py:432] Config param vision_output_length: -1 I0420 06:25:49.931668 138677877729088 pyconfig.py:432] Config param vllm_additional_config: {} I0420 06:25:49.931682 138677877729088 pyconfig.py:432] Config param vllm_hf_config_path: I0420 06:25:49.931697 138677877729088 pyconfig.py:432] Config param vllm_hf_overrides: {} I0420 06:25:49.931724 138677877729088 pyconfig.py:432] Config param vocab_size: 32000 I0420 06:25:49.931738 138677877729088 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0420 06:25:49.931754 138677877729088 pyconfig.py:432] Config param weight_dtype: float32 I0420 06:25:49.931781 138677877729088 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0420 06:25:49.931795 138677877729088 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512 I0420 06:25:49.931810 138677877729088 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0420 06:25:49.931824 138677877729088 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0420 06:25:49.931839 138677877729088 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0420 06:25:49.931853 138677877729088 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0420 06:25:49.931868 138677877729088 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0420 06:25:49.931882 138677877729088 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0420 06:25:49.931897 138677877729088 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0420 06:25:49.931911 138677877729088 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0420 06:25:49.931926 138677877729088 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0420 
06:25:49.931940 138677877729088 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0420 06:25:49.931956 138677877729088 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0420 06:25:49.931969 138677877729088 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0420 06:25:49.931984 138677877729088 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0420 06:25:49.931998 138677877729088 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0420 06:25:49.932013 138677877729088 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0420 06:25:49.932027 138677877729088 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0420 06:25:49.932042 138677877729088 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0420 06:25:49.932056 138677877729088 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0420 06:25:49.932071 138677877729088 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0420 06:25:49.932087 138677877729088 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0420 06:25:49.932102 138677877729088 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0420 06:25:49.932116 138677877729088 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0420 06:25:49.932131 138677877729088 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0420 06:25:49.932150 138677877729088 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0420 06:25:49.932682 138677877729088 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0420 06:25:49.932735 138677877729088 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0420 06:25:53.923814 138677877729088 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. 
I0420 06:25:53.926832 138677877729088 maxtext_utils.py:1551] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0420 06:25:53.926956 138677877729088 train_distill.py:596] Applying logical axis rules for model initialization and training... I0420 06:25:53.927028 138677877729088 train_distill.py:600] Loading Student from ... I0420 06:25:53.927056 138677877729088 train_distill.py:169] --- Student Configuration --- I0420 06:25:53.927079 138677877729088 train_distill.py:170] Model Name: gpt3-52k I0420 06:25:53.927099 138677877729088 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 06:25:53.927117 138677877729088 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 06:25:53.927135 138677877729088 train_distill.py:175] Vocab Size: 32000 I0420 06:25:53.927154 138677877729088 train_distill.py:176] Checkpoint: I0420 06:25:53.927170 138677877729088 train_distill.py:465] Initializing model: gpt3-52k... I0420 06:25:55.196683 138677877729088 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... I0420 06:25:55.196806 138677877729088 train_distill.py:169] --- Teacher Configuration --- I0420 06:25:55.196835 138677877729088 train_distill.py:170] Model Name: gpt3-52k I0420 06:25:55.196858 138677877729088 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 06:25:55.196878 138677877729088 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 06:25:55.196897 138677877729088 train_distill.py:175] Vocab Size: 32000 I0420 06:25:55.196915 138677877729088 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 06:25:55.196934 138677877729088 train_distill.py:465] Initializing model: gpt3-52k... 
I0420 06:25:56.339755 138677877729088 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 06:25:56.340204 138677877729088 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e1fbe033950>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 06:25:56.340264 138677877729088 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0420 06:25:56.874283 138677877729088 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0420 06:25:57.837016 2080 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0420 06:25:58.942958 138677877729088 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. W0420 06:26:01.099759 138677877729088 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. 
I0420 06:26:01.100130 138677877729088 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0420 06:26:01.661573 138677877729088 checkpointer.py:318] Finished restoring checkpoint in 3.08 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0420 06:26:02.361082 138677877729088 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0420 06:26:02.424364 138677877729088 config.py:112] TensorFlow version 2.20.0 available. I0420 06:26:02.424903 138677877729088 config.py:125] JAX version 0.8.3 available. E0420 06:26:04.475353 138677877729088 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0420 06:26:04.475576 138677877729088 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0420 06:26:04.480053 138677877729088 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0420 06:26:04.480122 138677877729088 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0420 06:26:04.480185 138677877729088 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 06:26:04.480261 138677877729088 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e1fbe033950>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 06:26:04.480302 138677877729088 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 06:26:04.480334 138677877729088 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e1fbe033950>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 06:26:04.480378 138677877729088 checkpoint_manager.py:702] [process=1][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167890>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e095aaf92e0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670167770>}, handler_registry=None I0420 
06:26:04.480573 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167890>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 06:26:04.480614 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e095aaf92e0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 06:26:04.480641 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670167770>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 06:26:04.480665 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e06b050d550>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0420 06:26:04.480698 138677877729088 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167890>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167890>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e095aaf92e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e095aaf92e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670167770>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670167770>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e06b050d550>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e06b050d550>}). 
I0420 06:26:04.481126 138677877729088 async_checkpointer.py:177] [process=1][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e067007efc0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 06:26:07.173629 138677877729088 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260420_060038/pt_distill_nnx_xpk_main_20260420_060038_07_distill_smoke/checkpoints I0420 06:26:07.562336 138677877729088 checkpoint_manager.py:921] [process=1][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260420_060038/pt_distill_nnx_xpk_main_20260420_060038_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e0670167740> I0420 06:26:07.562506 138677877729088 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 06:26:07.562574 138677877729088 
base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e1fbe033950>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 06:26:07.562618 138677877729088 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 06:26:07.562665 138677877729088 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e1fbe033950>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 06:26:07.562732 138677877729088 checkpoint_manager.py:1983] [process=1][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0420 06:26:07.562790 138677877729088 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138677877729088 count=1 at 0x7e067006f780>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e0670167560>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e0670167530>, _write_futures=[]) I0420 06:26:07.563152 138677877729088 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138677877729088 count=1 at 0x7e067006f780>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e0670167560>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e0670167530>, _write_futures=[]) I0420 06:26:07.563179 138677877729088 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138677877729088 count=1 at 0x7e067006f780>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e0670167560>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e0670167530>, _write_futures=[]) I0420 06:26:07.563211 138677877729088 checkpoint_manager.py:702] [process=1][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167710>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670165520>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165e80>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e0670165760>}, handler_registry=None I0420 06:26:07.563315 138677877729088 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167710>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 06:26:07.563350 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670165520>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 06:26:07.563375 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165e80>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 06:26:07.563402 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e0670165760>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0420 06:26:07.563425 138677877729088 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165370>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 06:26:07.563451 138677877729088 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167710>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670167710>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670165520>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e0670165520>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165e80>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165e80>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e0670165760>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e0670165760>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165370>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e0670165370>}). I0420 06:26:07.563520 138677877729088 async_checkpointer.py:177] [process=1][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e067007f240> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 06:26:07.944906 138677877729088 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260420_060038/pt_distill_nnx_xpk_main_20260420_060038_07_distill_smoke/checkpoints I0420 06:26:07.956285 138677877729088 checkpoint_manager.py:921] [process=1][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, 
enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260420_060038/pt_distill_nnx_xpk_main_20260420_060038_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e095af76ab0> I0420 06:26:07.956739 138677877729088 train_distill.py:691] Starting Distillation Training... I0420 06:26:07.956837 138677877729088 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0420 06:26:08.670662 138677877729088 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0420 06:26:08.672580 138538582529792 grain_pool.py:367] Grain pool will use 1 processes. I0420 06:26:08.698618 138538582529792 grain_pool.py:440] Grain pool will start child processes. I0420 06:26:08.703777 138538582529792 grain_pool.py:448] Grain pool started all child processes. 
2026-04-20 06:26:14.742543: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 06:26:17.870551 138677877729088 utils.py:86] Train loop finished in: 9.1993 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0),
TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0)} I0420 06:26:18.217211 138538582529792 grain_pool.py:542] Grain pool is exiting. I0420 06:26:18.217309 138538582529792 grain_pool.py:547] Shutting down multiprocessing system. 
I0420 06:26:19.655553 138538582529792 grain_pool.py:547] Shutting down multiprocessing system. Training: 0%| | 0/5 [00:13<?, ?step/s] /usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' XPK End: Mon Apr 20 06:26:27 UTC 2026 EXIT_CODE=1