feat/nnx-post-train-fixes
XPK Start: Mon Apr 20 15:51:58 UTC 2026
2026-04-20 15:52:15.065939: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 15:52:18.614012 140084627253056 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-20 15:52:27,654:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0420 15:52:27.654032 140084627253056 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-20 15:52:27,656:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-2yyli-slice-job-0-0.mt-07-distill-smoke-2yyli:8482
I0420 15:52:27.656413 140084627253056 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-2yyli-slice-job-0-0.mt-07-distill-smoke-2yyli:8482
I0420 15:52:29.183849 140084627253056 max_utils.py:284] Jax distributed system initialized!
I0420 15:52:35.198754 140084627253056 max_utils.py:244] Jax distributed system is already initialized.
I0420 15:52:35.664205 140084627253056 max_utils.py:244] Jax distributed system is already initialized.
I0420 15:52:35.665389 140084627253056 pyconfig.py:432] Config param abort_on_inf_loss: True I0420 15:52:35.665438 140084627253056 pyconfig.py:432] Config param abort_on_nan_loss: True I0420 15:52:35.665463 140084627253056 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0420 15:52:35.665485 140084627253056 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0420 15:52:35.665503 140084627253056 pyconfig.py:432] Config param activation_function_for_audio: gelu I0420 15:52:35.665523 140084627253056 pyconfig.py:432] Config param activations_in_float32: False I0420 15:52:35.665542 140084627253056 pyconfig.py:432] Config param adam_b1: 0.9 I0420 15:52:35.665561 140084627253056 pyconfig.py:432] Config param adam_b2: 0.95 I0420 15:52:35.665583 140084627253056 pyconfig.py:432] Config param adam_eps: 1e-08 I0420 15:52:35.665606 140084627253056 pyconfig.py:432] Config param adam_eps_root: 0.0 I0420 15:52:35.665622 140084627253056 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0420 15:52:35.665639 140084627253056 pyconfig.py:432] Config param adamw_mask: [] I0420 15:52:35.665657 140084627253056 pyconfig.py:432] Config param add_bos: True I0420 15:52:35.665671 140084627253056 pyconfig.py:432] Config param add_eos: True I0420 15:52:35.665688 140084627253056 pyconfig.py:432] Config param allow_split_physical_axes: False I0420 15:52:35.665706 140084627253056 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0420 15:52:35.665723 140084627253056 pyconfig.py:432] Config param async_checkpointing: True I0420 15:52:35.665740 140084627253056 pyconfig.py:432] Config param async_scheduling: False I0420 15:52:35.665755 140084627253056 pyconfig.py:432] Config param attention: dot_product I0420 15:52:35.665771 140084627253056 pyconfig.py:432] Config param attention_bias: False I0420 15:52:35.665788 140084627253056 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0420 15:52:35.665808 140084627253056 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0420 15:52:35.665828 140084627253056 pyconfig.py:432] Config param attention_sink: False I0420 15:52:35.665844 140084627253056 pyconfig.py:432] Config param attention_type: global I0420 15:52:35.665861 140084627253056 pyconfig.py:432] Config param attn_logits_soft_cap: None I0420 15:52:35.665903 140084627253056 pyconfig.py:432] Config param audio_path: I0420 15:52:35.665929 140084627253056 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0420 15:52:35.665954 140084627253056 pyconfig.py:432] Config param autoregressive_decode_assert: I0420 15:52:35.665981 140084627253056 pyconfig.py:432] Config param base_config: base.yml I0420 15:52:35.666002 140084627253056 pyconfig.py:432] Config param base_emb_dim: 16 I0420 15:52:35.666019 140084627253056 pyconfig.py:432] Config param base_mlp_dim: 64 I0420 15:52:35.666034 140084627253056 pyconfig.py:432] Config param base_moe_mlp_dim: 7168 I0420 15:52:35.666051 140084627253056 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0420 15:52:35.666066 140084627253056 pyconfig.py:432] Config param base_num_kv_heads: 2 I0420 15:52:35.666094 140084627253056 pyconfig.py:432] Config param base_num_query_heads: 2 I0420 15:52:35.666111 140084627253056 pyconfig.py:432] Config param base_output_directory: I0420 15:52:35.666128 140084627253056 pyconfig.py:432] Config param batch_size: 1 I0420 15:52:35.666145 140084627253056 pyconfig.py:432] Config param batch_split_factor: 1 I0420 15:52:35.666160 140084627253056 pyconfig.py:432] Config param beta_fast: 32 I0420 15:52:35.666177 140084627253056 pyconfig.py:432] Config param beta_slow: 1 I0420 15:52:35.666193 140084627253056 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0420 15:52:35.666209 140084627253056 pyconfig.py:432] Config param capacity_factor: -1.0 I0420 15:52:35.666224 140084627253056 pyconfig.py:432] Config param cast_logits_to_fp32: True I0420 15:52:35.666241 140084627253056 pyconfig.py:432] 
Config param chat_template: I0420 15:52:35.666256 140084627253056 pyconfig.py:432] Config param chat_template_path: I0420 15:52:35.666273 140084627253056 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0420 15:52:35.666290 140084627253056 pyconfig.py:432] Config param checkpoint_dir: None I0420 15:52:35.666306 140084627253056 pyconfig.py:432] Config param checkpoint_is_quantized: False I0420 15:52:35.666325 140084627253056 pyconfig.py:432] Config param checkpoint_period: 2000 I0420 15:52:35.666342 140084627253056 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0420 15:52:35.666358 140084627253056 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0420 15:52:35.666375 140084627253056 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0420 15:52:35.666390 140084627253056 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0420 15:52:35.666405 140084627253056 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0420 15:52:35.666421 140084627253056 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0420 15:52:35.666436 140084627253056 pyconfig.py:432] Config param chips_per_vm: 4 I0420 15:52:35.666452 140084627253056 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0420 15:52:35.666467 140084627253056 pyconfig.py:432] Config param collect_stack_trace: False I0420 15:52:35.666484 140084627253056 pyconfig.py:432] Config param colocated_python_checkpointing: False I0420 15:52:35.666498 140084627253056 pyconfig.py:432] Config param colocated_python_data_input: False I0420 15:52:35.666514 140084627253056 pyconfig.py:432] Config param compile_topology: I0420 15:52:35.666531 140084627253056 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0420 15:52:35.666546 140084627253056 pyconfig.py:432] Config param compile_xla_flags: I0420 15:52:35.666562 140084627253056 pyconfig.py:432] Config param compiled_trainstep_file: I0420 
15:52:35.666583 140084627253056 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0420 15:52:35.666598 140084627253056 pyconfig.py:432] Config param constant_bound_config: [] I0420 15:52:35.666614 140084627253056 pyconfig.py:432] Config param context: RematLocation.REMAT I0420 15:52:35.666631 140084627253056 pyconfig.py:432] Config param context_parallel_load_balance: True I0420 15:52:35.666646 140084627253056 pyconfig.py:432] Config param context_parallel_size: 1 I0420 15:52:35.666662 140084627253056 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0420 15:52:35.666678 140084627253056 pyconfig.py:432] Config param context_sharding: context I0420 15:52:35.666694 140084627253056 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0420 15:52:35.666708 140084627253056 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0420 15:52:35.666725 140084627253056 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0420 15:52:35.666740 140084627253056 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0420 15:52:35.666756 140084627253056 pyconfig.py:432] Config param custom_mesh: I0420 15:52:35.666770 140084627253056 pyconfig.py:432] Config param custom_mesh_and_rule: I0420 15:52:35.666786 140084627253056 pyconfig.py:432] Config param d_model_for_audio: 256 I0420 15:52:35.666801 140084627253056 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0420 15:52:35.666821 140084627253056 pyconfig.py:432] Config param data_shuffle_seed: 0 I0420 15:52:35.666839 140084627253056 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0420 15:52:35.666856 140084627253056 pyconfig.py:432] Config param dataset_path: I0420 15:52:35.666870 140084627253056 pyconfig.py:432] Config param dataset_type: DatasetType.HF I0420 15:52:35.666888 140084627253056 pyconfig.py:432] 
Config param dcn_autoregressive_parallelism: 1 I0420 15:52:35.666903 140084627253056 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0420 15:52:35.666919 140084627253056 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0420 15:52:35.666934 140084627253056 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0420 15:52:35.666950 140084627253056 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0420 15:52:35.666965 140084627253056 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0420 15:52:35.666980 140084627253056 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0420 15:52:35.666996 140084627253056 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0420 15:52:35.667013 140084627253056 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 15:52:35.667028 140084627253056 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0420 15:52:35.667045 140084627253056 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0420 15:52:35.667059 140084627253056 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0420 15:52:35.667074 140084627253056 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0420 15:52:35.667100 140084627253056 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0420 15:52:35.667117 140084627253056 pyconfig.py:432] Config param debug: {'rl': False} I0420 15:52:35.667135 140084627253056 pyconfig.py:432] Config param debug_sharding: False I0420 15:52:35.667152 140084627253056 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0420 15:52:35.667166 140084627253056 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0420 15:52:35.667184 140084627253056 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0420 15:52:35.667199 140084627253056 pyconfig.py:432] Config param decode_sampling_top_k: 0 I0420 15:52:35.667215 140084627253056 pyconfig.py:432] Config 
param decoder_block: DecoderBlockType.GPT3 I0420 15:52:35.667233 140084627253056 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0420 15:52:35.667250 140084627253056 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0420 15:52:35.667265 140084627253056 pyconfig.py:432] Config param degenerate_group_masking: True I0420 15:52:35.667281 140084627253056 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0420 15:52:35.667297 140084627253056 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0420 15:52:35.667313 140084627253056 pyconfig.py:432] Config param diloco_sync_period: 36 I0420 15:52:35.667329 140084627253056 pyconfig.py:432] Config param distill_alpha: 0.5 I0420 15:52:35.667345 140084627253056 pyconfig.py:432] Config param distill_beta: 0.0 I0420 15:52:35.667361 140084627253056 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0420 15:52:35.667377 140084627253056 pyconfig.py:432] Config param distill_layer_indices: None I0420 15:52:35.667392 140084627253056 pyconfig.py:432] Config param distill_temperature: 1.0 I0420 15:52:35.667410 140084627253056 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0420 15:52:35.667425 140084627253056 pyconfig.py:432] Config param dpo_beta: 0.1 I0420 15:52:35.667443 140084627253056 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0420 15:52:35.667458 140084627253056 pyconfig.py:432] Config param dq_reduction_steps: 0 I0420 15:52:35.667474 140084627253056 pyconfig.py:432] Config param dropout_rate: 0.0 I0420 15:52:35.667490 140084627253056 pyconfig.py:432] Config param dtype: bfloat16 I0420 15:52:35.667521 140084627253056 pyconfig.py:432] Config param dtype_mm: float32 I0420 15:52:35.667537 140084627253056 pyconfig.py:432] Config param dump_hlo: False I0420 15:52:35.667551 140084627253056 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0420 15:52:35.667567 140084627253056 pyconfig.py:432] Config param dump_hlo_gcs_dir: 
gpt3-52k_2026-04-20-15-52/xla_dump I0420 15:52:35.667587 140084627253056 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0420 15:52:35.667604 140084627253056 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0420 15:52:35.667619 140084627253056 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0420 15:52:35.667635 140084627253056 pyconfig.py:432] Config param dump_hlo_upload_all: False I0420 15:52:35.667649 140084627253056 pyconfig.py:432] Config param dump_hlo_xla_flags: I0420 15:52:35.667665 140084627253056 pyconfig.py:432] Config param dump_jaxpr: False I0420 15:52:35.667681 140084627253056 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0420 15:52:35.667695 140084627253056 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-20-15-52/jaxpr_dump I0420 15:52:35.667711 140084627253056 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0420 15:52:35.667725 140084627253056 pyconfig.py:432] Config param dump_step: -1 I0420 15:52:35.667742 140084627253056 pyconfig.py:432] Config param elastic_enabled: False I0420 15:52:35.667757 140084627253056 pyconfig.py:432] Config param elastic_max_retries: 10 I0420 15:52:35.667772 140084627253056 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0420 15:52:35.667789 140084627253056 pyconfig.py:432] Config param emb_dim: 16 I0420 15:52:35.667804 140084627253056 pyconfig.py:432] Config param enable_autocheckpoint: False I0420 15:52:35.667819 140084627253056 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0420 15:52:35.667836 140084627253056 pyconfig.py:432] Config param enable_checkpointing: True I0420 15:52:35.667853 140084627253056 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0420 15:52:35.667869 140084627253056 pyconfig.py:432] Config param enable_data_shuffling: True I0420 15:52:35.667885 140084627253056 pyconfig.py:432] Config param enable_diloco: False I0420 
15:52:35.667899 140084627253056 pyconfig.py:432] Config param enable_dp_attention: False I0420 15:52:35.667915 140084627253056 pyconfig.py:432] Config param enable_dropout: False I0420 15:52:35.667930 140084627253056 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0420 15:52:35.667946 140084627253056 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0420 15:52:35.667960 140084627253056 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0420 15:52:35.667976 140084627253056 pyconfig.py:432] Config param enable_goodput_recording: False I0420 15:52:35.667991 140084627253056 pyconfig.py:432] Config param enable_jax_profiler: False I0420 15:52:35.668007 140084627253056 pyconfig.py:432] Config param enable_llm_inference_pool: False I0420 15:52:35.668021 140084627253056 pyconfig.py:432] Config param enable_model_warmup: False I0420 15:52:35.668036 140084627253056 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0420 15:52:35.668052 140084627253056 pyconfig.py:432] Config param enable_nnx: False I0420 15:52:35.668067 140084627253056 pyconfig.py:432] Config param enable_orbax_v1: False I0420 15:52:35.668092 140084627253056 pyconfig.py:432] Config param enable_padding_causal_mask: True I0420 15:52:35.668109 140084627253056 pyconfig.py:432] Config param enable_pathways_goodput: False I0420 15:52:35.668123 140084627253056 pyconfig.py:432] Config param enable_prefix_caching: False I0420 15:52:35.668139 140084627253056 pyconfig.py:432] Config param enable_rampup_batch_size: False I0420 15:52:35.668154 140084627253056 pyconfig.py:432] Config param enable_single_controller: False I0420 15:52:35.668168 140084627253056 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0420 15:52:35.668184 140084627253056 pyconfig.py:432] Config param enable_tensorboard: True I0420 15:52:35.668199 140084627253056 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0420 15:52:35.668215 
140084627253056 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0420 15:52:35.668229 140084627253056 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0420 15:52:35.668244 140084627253056 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0420 15:52:35.668258 140084627253056 pyconfig.py:432] Config param engram: RematLocation.REMAT I0420 15:52:35.668275 140084627253056 pyconfig.py:432] Config param engram_head_dim: 1280 I0420 15:52:35.668290 140084627253056 pyconfig.py:432] Config param engram_kernel_size: 4 I0420 15:52:35.668306 140084627253056 pyconfig.py:432] Config param engram_layers: [] I0420 15:52:35.668322 140084627253056 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0420 15:52:35.668338 140084627253056 pyconfig.py:432] Config param engram_num_heads: 8 I0420 15:52:35.668352 140084627253056 pyconfig.py:432] Config param engram_seed: 0 I0420 15:52:35.668368 140084627253056 pyconfig.py:432] Config param engram_vocab_bases: [] I0420 15:52:35.668384 140084627253056 pyconfig.py:432] Config param epsilon_high: None I0420 15:52:35.668399 140084627253056 pyconfig.py:432] Config param eval_corr_lst: False I0420 15:52:35.668414 140084627253056 pyconfig.py:432] Config param eval_data_columns: ['text'] I0420 15:52:35.668431 140084627253056 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0420 15:52:35.668445 140084627253056 pyconfig.py:432] Config param eval_image_column: image I0420 15:52:35.668461 140084627253056 pyconfig.py:432] Config param eval_interval: -1 I0420 15:52:35.668477 140084627253056 pyconfig.py:432] Config param eval_make_lst: False I0420 15:52:35.668492 140084627253056 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0420 15:52:35.668508 140084627253056 pyconfig.py:432] Config param eval_sampling_strategy: greedy I0420 15:52:35.668523 140084627253056 pyconfig.py:432] Config param eval_split: validation I0420 15:52:35.668537 140084627253056 pyconfig.py:432] Config param 
eval_steps: -1 I0420 15:52:35.668553 140084627253056 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0420 15:52:35.668568 140084627253056 pyconfig.py:432] Config param final_logits_soft_cap: None I0420 15:52:35.668589 140084627253056 pyconfig.py:432] Config param first_num_dense_layers: 0 I0420 15:52:35.668605 140084627253056 pyconfig.py:432] Config param float32_gate_logits: False I0420 15:52:35.668620 140084627253056 pyconfig.py:432] Config param float32_logits: False I0420 15:52:35.668636 140084627253056 pyconfig.py:432] Config param float32_qk_product: False I0420 15:52:35.668650 140084627253056 pyconfig.py:432] Config param float32_weight_sum: True I0420 15:52:35.668666 140084627253056 pyconfig.py:432] Config param force_q_layout: False I0420 15:52:35.668681 140084627253056 pyconfig.py:432] Config param force_unroll: False I0420 15:52:35.668697 140084627253056 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0420 15:52:35.668712 140084627253056 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0420 15:52:35.668725 140084627253056 pyconfig.py:432] Config param fused_mlp: False I0420 15:52:35.668741 140084627253056 pyconfig.py:432] Config param fused_qkv: True I0420 15:52:35.668756 140084627253056 pyconfig.py:432] Config param gcs_metrics: False I0420 15:52:35.668770 140084627253056 pyconfig.py:432] Config param gdn_chunk_size: 64 I0420 15:52:35.668787 140084627253056 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0420 15:52:35.668801 140084627253056 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0420 15:52:35.668817 140084627253056 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0420 15:52:35.668833 140084627253056 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0420 15:52:35.668849 140084627253056 pyconfig.py:432] Config param gdn_value_head_dim: 128 I0420 15:52:35.668865 140084627253056 pyconfig.py:432] Config param generate_padding_batch_eval: False I0420 15:52:35.668881 140084627253056 
pyconfig.py:432] Config param generate_padding_batch_train: False I0420 15:52:35.668898 140084627253056 pyconfig.py:432] Config param generate_slice: v5e-16 I0420 15:52:35.668915 140084627253056 pyconfig.py:432] Config param generation_configs: {} I0420 15:52:35.668930 140084627253056 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0420 15:52:35.668947 140084627253056 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0420 15:52:35.668964 140084627253056 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0420 15:52:35.668978 140084627253056 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0420 15:52:35.668995 140084627253056 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0420 15:52:35.669012 140084627253056 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0420 15:52:35.669027 140084627253056 pyconfig.py:432] Config param global_head_dim: 0 I0420 15:52:35.669044 140084627253056 pyconfig.py:432] Config param global_num_kv_heads: 0 I0420 15:52:35.669059 140084627253056 pyconfig.py:432] Config param global_parameter_scale: 1 I0420 15:52:35.669075 140084627253056 pyconfig.py:432] Config param global_rampup_samples: 500 I0420 15:52:35.669102 140084627253056 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0420 15:52:35.669117 140084627253056 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0420 15:52:35.669134 140084627253056 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0420 15:52:35.669148 140084627253056 pyconfig.py:432] Config param grad_dtype: float32 I0420 15:52:35.669183 140084627253056 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0420 15:52:35.669198 140084627253056 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0 I0420 15:52:35.669214 140084627253056 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0420 15:52:35.669230 140084627253056 pyconfig.py:432] Config param 
grain_eval_files: I0420 15:52:35.669245 140084627253056 pyconfig.py:432] Config param grain_file_type: arrayrecord I0420 15:52:35.669261 140084627253056 pyconfig.py:432] Config param grain_num_threads: 16 I0420 15:52:35.669276 140084627253056 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0420 15:52:35.669292 140084627253056 pyconfig.py:432] Config param grain_packing_type: first_fit I0420 15:52:35.669306 140084627253056 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0420 15:52:35.669322 140084627253056 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0420 15:52:35.669337 140084627253056 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0420 15:52:35.669353 140084627253056 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0420 15:52:35.669367 140084627253056 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0420 15:52:35.669383 140084627253056 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0420 15:52:35.669398 140084627253056 pyconfig.py:432] Config param grain_train_files: I0420 15:52:35.669414 140084627253056 pyconfig.py:432] Config param grain_train_mixture_config_path: I0420 15:52:35.669429 140084627253056 pyconfig.py:432] Config param grain_worker_count: 1 I0420 15:52:35.669446 140084627253056 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0420 15:52:35.669460 140084627253056 pyconfig.py:432] Config param grpo_beta: 0.08 I0420 15:52:35.669477 140084627253056 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0420 15:52:35.669493 140084627253056 pyconfig.py:432] Config param hardware: tpu I0420 15:52:35.669508 140084627253056 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0420 15:52:35.669524 140084627253056 pyconfig.py:432] Config param head_dim: 8 I0420 15:52:35.669540 140084627253056 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0420 15:52:35.669557 140084627253056 pyconfig.py:432] Config param hf_data_dir: None 
I0420 15:52:35.669574 140084627253056 pyconfig.py:432] Config param hf_eval_files: None I0420 15:52:35.669592 140084627253056 pyconfig.py:432] Config param hf_eval_split: None I0420 15:52:35.669610 140084627253056 pyconfig.py:432] Config param hf_name: None I0420 15:52:35.669628 140084627253056 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0420 15:52:35.669645 140084627253056 pyconfig.py:432] Config param hf_train_files: None I0420 15:52:35.669660 140084627253056 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0420 15:52:35.669677 140084627253056 pyconfig.py:432] Config param hide_profiler_step_metric: False I0420 15:52:35.669693 140084627253056 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0420 15:52:35.669709 140084627253056 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0420 15:52:35.669724 140084627253056 pyconfig.py:432] Config param ici_context_parallelism: 1 I0420 15:52:35.669741 140084627253056 pyconfig.py:432] Config param ici_data_parallelism: 1 I0420 15:52:35.669755 140084627253056 pyconfig.py:432] Config param ici_diloco_parallelism: 1 I0420 15:52:35.669770 140084627253056 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0420 15:52:35.669786 140084627253056 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0420 15:52:35.669800 140084627253056 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0420 15:52:35.669816 140084627253056 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 15:52:35.669833 140084627253056 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0420 15:52:35.669851 140084627253056 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0420 15:52:35.669865 140084627253056 pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0420 15:52:35.669881 140084627253056 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0420 15:52:35.669895 140084627253056 pyconfig.py:432] Config 
param ici_tensor_transpose_parallelism: 1 I0420 15:52:35.669911 140084627253056 pyconfig.py:432] Config param image_path: I0420 15:52:35.669926 140084627253056 pyconfig.py:432] Config param image_placeholder: <|image|> I0420 15:52:35.669942 140084627253056 pyconfig.py:432] Config param image_size_for_vit: 896 I0420 15:52:35.669957 140084627253056 pyconfig.py:432] Config param indexer_head_dim: 128 I0420 15:52:35.669972 140084627253056 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0420 15:52:35.669988 140084627253056 pyconfig.py:432] Config param indexer_n_heads: 64 I0420 15:52:35.670004 140084627253056 pyconfig.py:432] Config param indexer_sparse_training: False I0420 15:52:35.670020 140084627253056 pyconfig.py:432] Config param indexer_topk: 2048 I0420 15:52:35.670036 140084627253056 pyconfig.py:432] Config param inference_benchmark_test: False I0420 15:52:35.670050 140084627253056 pyconfig.py:432] Config param inference_metadata_file: I0420 15:52:35.670066 140084627253056 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0420 15:52:35.670089 140084627253056 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0420 15:52:35.670105 140084627253056 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0420 15:52:35.670121 140084627253056 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0420 15:52:35.670137 140084627253056 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0420 15:52:35.670154 140084627253056 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0420 15:52:35.670170 140084627253056 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0420 15:52:35.670185 140084627253056 pyconfig.py:432] Config param init_weights_seed: 0 I0420 15:52:35.670201 140084627253056 pyconfig.py:432] Config param input_data_sharding_logical_axes: 
['activation_embed_and_logits_batch', 'activation_norm_length'] I0420 15:52:35.670216 140084627253056 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0420 15:52:35.670232 140084627253056 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0420 15:52:35.670248 140084627253056 pyconfig.py:432] Config param internal_compile: False I0420 15:52:35.670263 140084627253056 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0420 15:52:35.670279 140084627253056 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0420 15:52:35.670294 140084627253056 pyconfig.py:432] Config param jax_debug_log_modules: I0420 15:52:35.670309 140084627253056 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0420 15:52:35.670325 140084627253056 pyconfig.py:432] Config param jax_profiler_port: 9999 I0420 15:52:35.670340 140084627253056 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0420 15:52:35.670356 140084627253056 pyconfig.py:432] Config param kv_cache_buffer: 256 I0420 15:52:35.670371 140084627253056 pyconfig.py:432] Config param kv_lora_rank: 512 I0420 15:52:35.670388 140084627253056 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0420 15:52:35.670406 140084627253056 pyconfig.py:432] Config param kv_quant_dtype: int8 I0420 15:52:35.670422 140084627253056 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0420 15:52:35.670437 140084627253056 pyconfig.py:432] Config param learning_rate: 0.0002 I0420 15:52:35.670454 140084627253056 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0420 15:52:35.670471 140084627253056 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0420 15:52:35.670485 140084627253056 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 I0420 15:52:35.670502 140084627253056 pyconfig.py:432] Config param load_checkpoint_only_once: False I0420 15:52:35.670518 140084627253056 pyconfig.py:432] Config param 
load_from_prefill_dir: False I0420 15:52:35.670533 140084627253056 pyconfig.py:432] Config param load_full_state_path: I0420 15:52:35.670549 140084627253056 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 15:52:35.670565 140084627253056 pyconfig.py:432] Config param local_checkpoint_directory: I0420 15:52:35.670583 140084627253056 pyconfig.py:432] Config param local_checkpoint_period: 0 I0420 15:52:35.670600 140084627253056 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0420 15:52:35.670617 140084627253056 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0420 15:52:35.670632 140084627253056 pyconfig.py:432] Config param log_config: True I0420 15:52:35.670648 140084627253056 pyconfig.py:432] Config param log_period: 10 I0420 15:52:35.670663 140084627253056 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 
'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 
'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0420 15:52:35.670736 140084627253056 pyconfig.py:432] Config param logits_dot_in_fp32: False I0420 15:52:35.670752 140084627253056 pyconfig.py:432] Config param logits_via_embedding: True I0420 15:52:35.670769 140084627253056 pyconfig.py:432] Config param lora_input_adapters_path: I0420 15:52:35.670784 140084627253056 pyconfig.py:432] Config param 
loss_algo: grpo I0420 15:52:35.670800 140084627253056 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0420 15:52:35.670818 140084627253056 pyconfig.py:432] Config param managed_mldiagnostics: False I0420 15:52:35.670835 140084627253056 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0420 15:52:35.670851 140084627253056 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0420 15:52:35.670867 140084627253056 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0420 15:52:35.670885 140084627253056 pyconfig.py:432] Config param max_checkify: False I0420 15:52:35.670900 140084627253056 pyconfig.py:432] Config param max_concurrency: 256 I0420 15:52:35.670916 140084627253056 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0420 15:52:35.670931 140084627253056 pyconfig.py:432] Config param max_num_batched_tokens: None I0420 15:52:35.670947 140084627253056 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0420 15:52:35.670962 140084627253056 pyconfig.py:432] Config param max_num_images_per_example: -1 I0420 15:52:35.670978 140084627253056 pyconfig.py:432] Config param max_num_seqs: None I0420 15:52:35.670994 140084627253056 pyconfig.py:432] Config param max_position_embeddings: 163840 I0420 15:52:35.671009 140084627253056 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0420 15:52:35.671025 140084627253056 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0420 15:52:35.671040 140084627253056 pyconfig.py:432] Config param max_segments_per_seq: -1 I0420 15:52:35.671056 140084627253056 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0420 15:52:35.671072 140084627253056 pyconfig.py:432] Config param max_target_length: 2048 I0420 15:52:35.671099 140084627253056 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0 I0420 15:52:35.671116 140084627253056 pyconfig.py:432] Config param megablox: True I0420 15:52:35.671133 
140084627253056 pyconfig.py:432] Config param merge_gating_gmm: False I0420 15:52:35.671150 140084627253056 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0420 15:52:35.671169 140084627253056 pyconfig.py:432] Config param metrics_dir: None I0420 15:52:35.671186 140084627253056 pyconfig.py:432] Config param metrics_file: I0420 15:52:35.671201 140084627253056 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0420 15:52:35.671217 140084627253056 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0420 15:52:35.671232 140084627253056 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0420 15:52:35.671248 140084627253056 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0420 15:52:35.671263 140084627253056 pyconfig.py:432] Config param mla_naive_kvcache: True I0420 15:52:35.671279 140084627253056 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0420 15:52:35.671295 140084627253056 pyconfig.py:432] Config param mlp_activations: ['gelu'] I0420 15:52:35.671314 140084627253056 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0420 15:52:35.671329 140084627253056 pyconfig.py:432] Config param mlp_bias: False I0420 15:52:35.671344 140084627253056 pyconfig.py:432] Config param mlp_dim: 64 I0420 15:52:35.671360 140084627253056 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0420 15:52:35.671377 140084627253056 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0420 15:52:35.671394 140084627253056 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0420 15:52:35.671410 140084627253056 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0420 15:52:35.671426 140084627253056 pyconfig.py:432] Config param moba: False I0420 15:52:35.671441 140084627253056 pyconfig.py:432] Config param moba_chunk_size: 1024 I0420 
15:52:35.671457 140084627253056 pyconfig.py:432] Config param moba_topk: 8 I0420 15:52:35.671472 140084627253056 pyconfig.py:432] Config param model_call_mode: I0420 15:52:35.671488 140084627253056 pyconfig.py:432] Config param model_name: gpt3-52k I0420 15:52:35.671502 140084627253056 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0420 15:52:35.671518 140084627253056 pyconfig.py:432] Config param moe_mlp_dim: 7168 I0420 15:52:35.671533 140084627253056 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0420 15:52:35.671549 140084627253056 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0420 15:52:35.671564 140084627253056 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0420 15:52:35.671584 140084627253056 pyconfig.py:432] Config param monitor_goodput: False I0420 15:52:35.671599 140084627253056 pyconfig.py:432] Config param monitor_step_time_deviation: True I0420 15:52:35.671614 140084627253056 pyconfig.py:432] Config param mrope_section: [24, 20, 20] I0420 15:52:35.671629 140084627253056 pyconfig.py:432] Config param mscale: 1.0 I0420 15:52:35.671646 140084627253056 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0420 15:52:35.671662 140084627253056 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0420 15:52:35.671677 140084627253056 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0420 15:52:35.671694 140084627253056 pyconfig.py:432] Config param mtp_num_layers: 0 I0420 15:52:35.671709 140084627253056 pyconfig.py:432] Config param mu_dtype: float32 I0420 15:52:35.671733 140084627253056 pyconfig.py:432] Config param multi_sampling: False I0420 15:52:35.671748 140084627253056 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0420 15:52:35.671766 140084627253056 pyconfig.py:432] Config param muon_beta: 0.95 I0420 15:52:35.671783 140084627253056 pyconfig.py:432] Config param muon_consistent_rms: None I0420 15:52:35.671799 140084627253056 
pyconfig.py:432] Config param muon_weight_decay: 0.0 I0420 15:52:35.671814 140084627253056 pyconfig.py:432] Config param n_routing_groups: -1 I0420 15:52:35.671832 140084627253056 pyconfig.py:432] Config param n_window_for_audio: 50 I0420 15:52:35.671847 140084627253056 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0420 15:52:35.671863 140084627253056 pyconfig.py:432] Config param nope_layer_interval: -1 I0420 15:52:35.671879 140084627253056 pyconfig.py:432] Config param norm_topk_prob: False I0420 15:52:35.671894 140084627253056 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0420 15:52:35.671912 140084627253056 pyconfig.py:432] Config param normalize_embedding_logits: False I0420 15:52:35.671928 140084627253056 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0420 15:52:35.671944 140084627253056 pyconfig.py:432] Config param num_batches: 4 I0420 15:52:35.671959 140084627253056 pyconfig.py:432] Config param num_channels_for_vit: 3 I0420 15:52:35.671975 140084627253056 pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0420 15:52:35.671989 140084627253056 pyconfig.py:432] Config param num_decoder_layers: 1 I0420 15:52:35.672005 140084627253056 pyconfig.py:432] Config param num_diloco_replicas: 1 I0420 15:52:35.672019 140084627253056 pyconfig.py:432] Config param num_epoch: 1 I0420 15:52:35.672036 140084627253056 pyconfig.py:432] Config param num_eval_passes: 1 I0420 15:52:35.672050 140084627253056 pyconfig.py:432] Config param num_experts: 1 I0420 15:52:35.672066 140084627253056 pyconfig.py:432] Config param num_experts_per_tok: 1 I0420 15:52:35.672088 140084627253056 pyconfig.py:432] Config param num_generations: 2 I0420 15:52:35.672104 140084627253056 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0420 15:52:35.672120 140084627253056 pyconfig.py:432] Config param num_iterations: 1 I0420 15:52:35.672135 140084627253056 pyconfig.py:432] Config param num_kv_heads: 2 I0420 15:52:35.672151 
140084627253056 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0420 15:52:35.672166 140084627253056 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0420 15:52:35.672182 140084627253056 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0420 15:52:35.672196 140084627253056 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0420 15:52:35.672210 140084627253056 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0420 15:52:35.672224 140084627253056 pyconfig.py:432] Config param num_query_heads: 2 I0420 15:52:35.672239 140084627253056 pyconfig.py:432] Config param num_samplers_slices: -1 I0420 15:52:35.672256 140084627253056 pyconfig.py:432] Config param num_slices: 1 I0420 15:52:35.672270 140084627253056 pyconfig.py:432] Config param num_target_devices: 32 I0420 15:52:35.672287 140084627253056 pyconfig.py:432] Config param num_test_batches: 5 I0420 15:52:35.672301 140084627253056 pyconfig.py:432] Config param num_trainer_slices: -1 I0420 15:52:35.672317 140084627253056 pyconfig.py:432] Config param num_vocab_tiling: 1 I0420 15:52:35.672331 140084627253056 pyconfig.py:432] Config param off_policy_steps: 0 I0420 15:52:35.672347 140084627253056 pyconfig.py:432] Config param offline_data_dir: None I0420 15:52:35.672362 140084627253056 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0420 15:52:35.672380 140084627253056 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0420 15:52:35.672394 140084627253056 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0420 15:52:35.672410 140084627253056 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0420 15:52:35.672425 140084627253056 pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0420 15:52:35.672441 140084627253056 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0420 15:52:35.672456 140084627253056 pyconfig.py:432] Config param output_dim_for_audio: 512 I0420 15:52:35.672470 
140084627253056 pyconfig.py:432] Config param override_logical_axis_rules: False I0420 15:52:35.672485 140084627253056 pyconfig.py:432] Config param override_model_config: True I0420 15:52:35.672499 140084627253056 pyconfig.py:432] Config param packing: True I0420 15:52:35.672513 140084627253056 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0420 15:52:35.672530 140084627253056 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0420 15:52:35.672546 140084627253056 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0420 15:52:35.672559 140084627253056 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0420 15:52:35.672575 140084627253056 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0420 15:52:35.672594 140084627253056 pyconfig.py:432] Config param param_scan_axis: 1 I0420 15:52:35.672610 140084627253056 pyconfig.py:432] Config param parameter_memory_host_offload: False I0420 15:52:35.672624 140084627253056 pyconfig.py:432] Config param partial_rotary_factor: 1.0 I0420 15:52:35.672640 140084627253056 pyconfig.py:432] Config param patch_size_for_vit: 14 I0420 15:52:35.672654 140084627253056 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0420 15:52:35.672671 140084627253056 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0420 15:52:35.672685 140084627253056 pyconfig.py:432] Config param per_device_batch_size: 2 I0420 15:52:35.672701 140084627253056 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0420 15:52:35.672716 140084627253056 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0420 15:52:35.672732 140084627253056 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False I0420 15:52:35.672747 140084627253056 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0420 15:52:35.672763 140084627253056 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0420 15:52:35.672777 140084627253056 pyconfig.py:432] 
Config param pipeline_parallel_layers: 1 I0420 15:52:35.672793 140084627253056 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0420 15:52:35.672808 140084627253056 pyconfig.py:432] Config param posemb_type_for_vit: learn I0420 15:52:35.672824 140084627253056 pyconfig.py:432] Config param position_id_per_seconds: 25 I0420 15:52:35.672840 140084627253056 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0420 15:52:35.672857 140084627253056 pyconfig.py:432] Config param prefill_cache_dir: I0420 15:52:35.672871 140084627253056 pyconfig.py:432] Config param prefill_chunk_size: 256 I0420 15:52:35.672887 140084627253056 pyconfig.py:432] Config param prefill_slice: v5e-16 I0420 15:52:35.672901 140084627253056 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0420 15:52:35.672917 140084627253056 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0420 15:52:35.672931 140084627253056 pyconfig.py:432] Config param profile_cleanly: True I0420 15:52:35.672947 140084627253056 pyconfig.py:432] Config param profile_periodically_period: -1 I0420 15:52:35.672961 140084627253056 pyconfig.py:432] Config param profile_power_events: False I0420 15:52:35.672977 140084627253056 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0420 15:52:35.672996 140084627253056 pyconfig.py:432] Config param profiler_steps: 5 I0420 15:52:35.673013 140084627253056 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0420 15:52:35.673028 140084627253056 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0420 15:52:35.673044 140084627253056 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0420 15:52:35.673058 140084627253056 pyconfig.py:432] Config param prometheus_port: 0 I0420 15:52:35.673074 140084627253056 pyconfig.py:432] Config param prompt: I love to I0420 15:52:35.673098 140084627253056 pyconfig.py:432] Config param pure_nnx: False I0420 15:52:35.673114 140084627253056 pyconfig.py:432] 
Config param pure_nnx_decoder: False I0420 15:52:35.673129 140084627253056 pyconfig.py:432] Config param q_lora_rank: 0 I0420 15:52:35.673144 140084627253056 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0420 15:52:35.673159 140084627253056 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0420 15:52:35.673173 140084627253056 pyconfig.py:432] Config param qk_norm_with_scale: True I0420 15:52:35.673190 140084627253056 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0420 15:52:35.673207 140084627253056 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0420 15:52:35.673223 140084627253056 pyconfig.py:432] Config param quant_cfg_path: I0420 15:52:35.673238 140084627253056 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0420 15:52:35.673256 140084627253056 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0420 15:52:35.673273 140084627253056 pyconfig.py:432] Config param quantize_kvcache: False I0420 15:52:35.673290 140084627253056 pyconfig.py:432] Config param query_proj: RematLocation.REMAT I0420 15:52:35.673307 140084627253056 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0420 15:52:35.673322 140084627253056 pyconfig.py:432] Config param ragged_block_size: 256 I0420 15:52:35.673340 140084627253056 pyconfig.py:432] Config param rampup_end_step: 0 I0420 15:52:35.673357 140084627253056 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0420 15:52:35.673373 140084627253056 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0420 15:52:35.673388 140084627253056 pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0420 15:52:35.673405 140084627253056 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0420 15:52:35.673419 140084627253056 pyconfig.py:432] Config param remat_policy: full I0420 15:52:35.673435 140084627253056 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0420 15:52:35.673450 140084627253056 pyconfig.py:432] 
Config param remove_size_one_mesh_axis_from_type: True I0420 15:52:35.673466 140084627253056 pyconfig.py:432] Config param replicate_quant_scale: False I0420 15:52:35.673481 140084627253056 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0420 15:52:35.673497 140084627253056 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0420 15:52:35.673511 140084627253056 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0420 15:52:35.673526 140084627253056 pyconfig.py:432] Config param reshape_q: False I0420 15:52:35.673540 140084627253056 pyconfig.py:432] Config param return_log_prob: False I0420 15:52:35.673557 140084627253056 pyconfig.py:432] Config param reuse_example_batch: 0 I0420 15:52:35.673572 140084627253056 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0420 15:52:35.673593 140084627253056 pyconfig.py:432] Config param reward_exact_format_match: 3.0 I0420 15:52:35.673608 140084627253056 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0420 15:52:35.673625 140084627253056 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0420 15:52:35.673640 140084627253056 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0420 15:52:35.673656 140084627253056 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0420 15:52:35.673671 140084627253056 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0420 15:52:35.673692 140084627253056 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0420 15:52:35.673707 140084627253056 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0420 15:52:35.673722 140084627253056 pyconfig.py:432] Config param rollout_micro_batch_size: -1 
I0420 15:52:35.673736 140084627253056 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0420 15:52:35.673753 140084627253056 pyconfig.py:432] Config param rope_attention_scaling: False I0420 15:52:35.673767 140084627253056 pyconfig.py:432] Config param rope_factor: 40 I0420 15:52:35.673783 140084627253056 pyconfig.py:432] Config param rope_interleave: True I0420 15:52:35.673797 140084627253056 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0420 15:52:35.673813 140084627253056 pyconfig.py:432] Config param rope_max_timescale: 10000 I0420 15:52:35.673828 140084627253056 pyconfig.py:432] Config param rope_min_timescale: 1 I0420 15:52:35.673845 140084627253056 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0420 15:52:35.673859 140084627253056 pyconfig.py:432] Config param rope_truncate: True I0420 15:52:35.673876 140084627253056 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT I0420 15:52:35.673892 140084627253056 pyconfig.py:432] Config param rope_use_scale: True I0420 15:52:35.673908 140084627253056 pyconfig.py:432] Config param routed_bias: False I0420 15:52:35.673923 140084627253056 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0420 15:52:35.673939 140084627253056 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0420 15:52:35.673953 140084627253056 pyconfig.py:432] Config param routed_score_func: I0420 15:52:35.673969 140084627253056 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-20-15-52 I0420 15:52:35.673984 140084627253056 pyconfig.py:432] Config param sa_block_kv: 512 I0420 15:52:35.674000 140084627253056 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0420 15:52:35.674014 140084627253056 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0420 15:52:35.674030 140084627253056 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0420 15:52:35.674045 140084627253056 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0420 15:52:35.674061 140084627253056 pyconfig.py:432] 
Config param sa_block_q: 512 I0420 15:52:35.674076 140084627253056 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0420 15:52:35.674102 140084627253056 pyconfig.py:432] Config param sa_block_q_dq: 512 I0420 15:52:35.674116 140084627253056 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0420 15:52:35.674133 140084627253056 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0420 15:52:35.674148 140084627253056 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0420 15:52:35.674164 140084627253056 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0420 15:52:35.674178 140084627253056 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0420 15:52:35.674194 140084627253056 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0420 15:52:35.674209 140084627253056 pyconfig.py:432] Config param save_config_to_gcs: False I0420 15:52:35.674226 140084627253056 pyconfig.py:432] Config param save_quantized_params_path: I0420 15:52:35.674240 140084627253056 pyconfig.py:432] Config param scale_embedding_for_audio: True I0420 15:52:35.674256 140084627253056 pyconfig.py:432] Config param scan_layers: True I0420 15:52:35.674270 140084627253056 pyconfig.py:432] Config param scan_layers_per_stage: False I0420 15:52:35.674286 140084627253056 pyconfig.py:432] Config param scan_pipeline_iterations: True I0420 15:52:35.674300 140084627253056 pyconfig.py:432] Config param scan_pipeline_repeats: False I0420 15:52:35.674316 140084627253056 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0420 15:52:35.674330 140084627253056 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0420 15:52:35.674346 140084627253056 pyconfig.py:432] Config param sft_train_on_completion_only: False I0420 15:52:35.674360 140084627253056 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0420 15:52:35.674376 140084627253056 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0420 
15:52:35.674395 140084627253056 pyconfig.py:432] Config param shard_optimizer_over_data: False I0420 15:52:35.674409 140084627253056 pyconfig.py:432] Config param sharding_strategy: None I0420 15:52:35.674425 140084627253056 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0420 15:52:35.674440 140084627253056 pyconfig.py:432] Config param shardy: True I0420 15:52:35.674456 140084627253056 pyconfig.py:432] Config param share_kv_projections: False I0420 15:52:35.674470 140084627253056 pyconfig.py:432] Config param shared_experts: 0 I0420 15:52:35.674486 140084627253056 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0420 15:52:35.674500 140084627253056 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0420 15:52:35.674516 140084627253056 pyconfig.py:432] Config param skip_jax_distributed_system: False I0420 15:52:35.674530 140084627253056 pyconfig.py:432] Config param skip_step_interval: 128 I0420 15:52:35.674546 140084627253056 pyconfig.py:432] Config param skip_step_on_spikes: False I0420 15:52:35.674561 140084627253056 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0420 15:52:35.674577 140084627253056 pyconfig.py:432] Config param sliding_window_size: 0 I0420 15:52:35.674595 140084627253056 pyconfig.py:432] Config param solution_end_token: </answer> I0420 15:52:35.674612 140084627253056 pyconfig.py:432] Config param solution_start_token: <answer> I0420 15:52:35.674630 140084627253056 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0420 15:52:35.674644 140084627253056 pyconfig.py:432] Config param sparse_matmul: True I0420 15:52:35.674661 140084627253056 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0420 15:52:35.674675 140084627253056 pyconfig.py:432] Config param stack_prefill_result_cache: False I0420 15:52:35.674691 140084627253056 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0420 15:52:35.674705 140084627253056 pyconfig.py:432] Config param stack_trace_to_cloud: False 
I0420 15:52:35.674721 140084627253056 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0420 15:52:35.674736 140084627253056 pyconfig.py:432] Config param steps: 200000 I0420 15:52:35.674752 140084627253056 pyconfig.py:432] Config param stop_strings: None I0420 15:52:35.674766 140084627253056 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0420 15:52:35.674783 140084627253056 pyconfig.py:432] Config param student_params_to_update: None I0420 15:52:35.674797 140084627253056 pyconfig.py:432] Config param subslice_shape: I0420 15:52:35.674813 140084627253056 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0420 15:52:35.674827 140084627253056 pyconfig.py:432] Config param system_prompt: I0420 15:52:35.674845 140084627253056 pyconfig.py:432] Config param target_eval_loss: 0.0 I0420 15:52:35.674859 140084627253056 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0420 15:52:35.674875 140084627253056 pyconfig.py:432] Config param temperature_tuning: False I0420 15:52:35.674890 140084627253056 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0420 15:52:35.674905 140084627253056 pyconfig.py:432] Config param tensorboard_dir: None I0420 15:52:35.674920 140084627253056 pyconfig.py:432] Config param tensors_on_device: None I0420 15:52:35.674936 140084627253056 pyconfig.py:432] Config param tensors_to_offload: None I0420 15:52:35.674950 140084627253056 pyconfig.py:432] Config param test_batch_start_index: 0 I0420 15:52:35.674966 140084627253056 pyconfig.py:432] Config param tile_size_for_vit: 336 I0420 15:52:35.674980 140084627253056 pyconfig.py:432] Config param tokenize_eval_data: True I0420 15:52:35.674997 140084627253056 pyconfig.py:432] Config param tokenize_train_data: True I0420 15:52:35.675011 140084627253056 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0420 15:52:35.675027 140084627253056 pyconfig.py:432] Config param tokenizer_type: 
TokenizerType.HUGGINGFACE I0420 15:52:35.675044 140084627253056 pyconfig.py:432] Config param topk_routing_group: -1 I0420 15:52:35.675062 140084627253056 pyconfig.py:432] Config param train_data_columns: ['text'] I0420 15:52:35.675087 140084627253056 pyconfig.py:432] Config param train_fraction: 1.0 I0420 15:52:35.675106 140084627253056 pyconfig.py:432] Config param train_image_column: image I0420 15:52:35.675120 140084627253056 pyconfig.py:432] Config param train_micro_batch_size: -1 I0420 15:52:35.675137 140084627253056 pyconfig.py:432] Config param train_split: train I0420 15:52:35.675151 140084627253056 pyconfig.py:432] Config param trainable_parameters_mask: [] I0420 15:52:35.675167 140084627253056 pyconfig.py:432] Config param trainable_position_size: 2048 I0420 15:52:35.675182 140084627253056 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0420 15:52:35.675199 140084627253056 pyconfig.py:432] Config param upload_all_profiler_results: False I0420 15:52:35.675214 140084627253056 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0420 15:52:35.675230 140084627253056 pyconfig.py:432] Config param use_agentic_rollout: False I0420 15:52:35.675244 140084627253056 pyconfig.py:432] Config param use_audio: False I0420 15:52:35.675260 140084627253056 pyconfig.py:432] Config param use_audio_in_video: False I0420 15:52:35.675274 140084627253056 pyconfig.py:432] Config param use_batch_split_schedule: False I0420 15:52:35.675290 140084627253056 pyconfig.py:432] Config param use_chat_template: False I0420 15:52:35.675304 140084627253056 pyconfig.py:432] Config param use_chunked_prefill: False I0420 15:52:35.675320 140084627253056 pyconfig.py:432] Config param use_custom_sort_vjp: True I0420 15:52:35.675335 140084627253056 pyconfig.py:432] Config param use_dpo: False I0420 15:52:35.675351 140084627253056 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0420 15:52:35.675365 140084627253056 pyconfig.py:432] Config param use_grpo: True I0420 
15:52:35.675381 140084627253056 pyconfig.py:432] Config param use_indexer: False I0420 15:52:35.675395 140084627253056 pyconfig.py:432] Config param use_iota_embed: True I0420 15:52:35.675411 140084627253056 pyconfig.py:432] Config param use_jax_splash: False I0420 15:52:35.675425 140084627253056 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0420 15:52:35.675441 140084627253056 pyconfig.py:432] Config param use_mrope: False I0420 15:52:35.675455 140084627253056 pyconfig.py:432] Config param use_multimodal: False I0420 15:52:35.675471 140084627253056 pyconfig.py:432] Config param use_pathways: True I0420 15:52:35.675486 140084627253056 pyconfig.py:432] Config param use_post_attn_norm: False I0420 15:52:35.675503 140084627253056 pyconfig.py:432] Config param use_post_ffw_norm: False I0420 15:52:35.675517 140084627253056 pyconfig.py:432] Config param use_qk_clip: False I0420 15:52:35.675533 140084627253056 pyconfig.py:432] Config param use_qk_norm: False I0420 15:52:35.675547 140084627253056 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0420 15:52:35.675563 140084627253056 pyconfig.py:432] Config param use_qwix_quantization: False I0420 15:52:35.675577 140084627253056 pyconfig.py:432] Config param use_ragged_attention: False I0420 15:52:35.675597 140084627253056 pyconfig.py:432] Config param use_random_routing: False I0420 15:52:35.675612 140084627253056 pyconfig.py:432] Config param use_replicator_service: False I0420 15:52:35.675628 140084627253056 pyconfig.py:432] Config param use_ring_of_experts: False I0420 15:52:35.675642 140084627253056 pyconfig.py:432] Config param use_sft: False I0420 15:52:35.675658 140084627253056 pyconfig.py:432] Config param use_splash_scheduler: False I0420 15:52:35.675672 140084627253056 pyconfig.py:432] Config param use_tokamax_gmm: False I0420 15:52:35.675688 140084627253056 pyconfig.py:432] Config param use_tokamax_splash: False I0420 15:52:35.675702 140084627253056 pyconfig.py:432] Config param use_truncation: 
True I0420 15:52:35.675718 140084627253056 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0420 15:52:35.675732 140084627253056 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0420 15:52:35.675748 140084627253056 pyconfig.py:432] Config param use_vertex_tensorboard: False I0420 15:52:35.675762 140084627253056 pyconfig.py:432] Config param using_pipeline_parallelism: False I0420 15:52:35.675779 140084627253056 pyconfig.py:432] Config param v_head_dim: 128 I0420 15:52:35.675793 140084627253056 pyconfig.py:432] Config param v_norm_with_scale: True I0420 15:52:35.675808 140084627253056 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0420 15:52:35.675824 140084627253056 pyconfig.py:432] Config param vertex_tensorboard_project: I0420 15:52:35.675841 140084627253056 pyconfig.py:432] Config param vertex_tensorboard_region: I0420 15:52:35.675855 140084627253056 pyconfig.py:432] Config param video_path: I0420 15:52:35.675872 140084627253056 pyconfig.py:432] Config param video_placeholder: <|video|> I0420 15:52:35.675886 140084627253056 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0420 15:52:35.675902 140084627253056 pyconfig.py:432] Config param vision_output_length: -1 I0420 15:52:35.675918 140084627253056 pyconfig.py:432] Config param vllm_additional_config: {} I0420 15:52:35.675933 140084627253056 pyconfig.py:432] Config param vllm_hf_config_path: I0420 15:52:35.675950 140084627253056 pyconfig.py:432] Config param vllm_hf_overrides: {} I0420 15:52:35.675964 140084627253056 pyconfig.py:432] Config param vocab_size: 32000 I0420 15:52:35.675980 140084627253056 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0420 15:52:35.675994 140084627253056 pyconfig.py:432] Config param weight_dtype: float32 I0420 15:52:35.676018 140084627253056 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0420 15:52:35.676032 140084627253056 pyconfig.py:432] Config param 
wi_tile_dlhs_batch_seq: 512 I0420 15:52:35.676048 140084627253056 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0420 15:52:35.676063 140084627253056 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0420 15:52:35.676086 140084627253056 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0420 15:52:35.676103 140084627253056 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0420 15:52:35.676118 140084627253056 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0420 15:52:35.676133 140084627253056 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0420 15:52:35.676149 140084627253056 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0420 15:52:35.676164 140084627253056 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0420 15:52:35.676182 140084627253056 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0420 15:52:35.676197 140084627253056 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0420 15:52:35.676213 140084627253056 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0420 15:52:35.676228 140084627253056 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0420 15:52:35.676244 140084627253056 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0420 15:52:35.676260 140084627253056 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0420 15:52:35.676275 140084627253056 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0420 15:52:35.676290 140084627253056 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0420 15:52:35.676305 140084627253056 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0420 15:52:35.676320 140084627253056 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0420 15:52:35.676335 140084627253056 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0420 15:52:35.676351 140084627253056 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0420 15:52:35.676367 
140084627253056 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0420 15:52:35.676382 140084627253056 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0420 15:52:35.676398 140084627253056 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0420 15:52:35.676414 140084627253056 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0420 15:52:35.676741 140084627253056 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0420 15:52:35.676780 140084627253056 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0420 15:52:39.573024 140084627253056 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0420 15:52:39.576035 140084627253056 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0420 15:52:39.576181 140084627253056 train_distill.py:596] Applying logical axis rules for model initialization and training... I0420 15:52:39.576253 140084627253056 train_distill.py:600] Loading Student from ... I0420 15:52:39.576281 140084627253056 train_distill.py:169] --- Student Configuration --- I0420 15:52:39.576305 140084627253056 train_distill.py:170] Model Name: gpt3-52k I0420 15:52:39.576328 140084627253056 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 15:52:39.576348 140084627253056 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 15:52:39.576366 140084627253056 train_distill.py:175] Vocab Size: 32000 I0420 15:52:39.576384 140084627253056 train_distill.py:176] Checkpoint: I0420 15:52:39.576401 140084627253056 train_distill.py:472] Initializing model: gpt3-52k... I0420 15:52:40.973181 140084627253056 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0420 15:52:40.973289 140084627253056 train_distill.py:169] --- Teacher Configuration --- I0420 15:52:40.973317 140084627253056 train_distill.py:170] Model Name: gpt3-52k I0420 15:52:40.973340 140084627253056 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 15:52:40.973363 140084627253056 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 15:52:40.973381 140084627253056 train_distill.py:175] Vocab Size: 32000 I0420 15:52:40.973401 140084627253056 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 15:52:40.973422 140084627253056 train_distill.py:472] Initializing model: gpt3-52k... I0420 15:52:42.033600 140084627253056 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 15:52:42.034037 140084627253056 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f6746d24380>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 15:52:42.034115 140084627253056 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0420 15:52:42.602211 140084627253056 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0420 15:52:43.152243 2091 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0420 15:52:44.305475 140084627253056 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0420 15:52:46.838149 140084627253056 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0420 15:52:46.838524 140084627253056 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0420 15:52:48.097912 140084627253056 checkpointer.py:318] Finished restoring checkpoint in 4.16 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0420 15:52:48.453005 140084627253056 metrics_logger.py:64] WandbBackend skipped: 'wandb' library not installed. I0420 15:52:48.718552 140084627253056 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0420 15:52:48.782145 140084627253056 config.py:112] TensorFlow version 2.20.0 available. I0420 15:52:48.782636 140084627253056 config.py:125] JAX version 0.8.3 available. E0420 15:52:50.864113 140084627253056 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0420 15:52:50.864333 140084627253056 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0420 15:52:50.867326 140084627253056 train_distill.py:417] Input Pipeline Checkpointing: DISABLED I0420 15:52:50.867388 140084627253056 train_distill.py:421] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0420 15:52:50.867449 140084627253056 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 15:52:50.867526 140084627253056 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f6746d24380>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 15:52:50.867567 140084627253056 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 15:52:50.867597 140084627253056 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f6746d24380>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 15:52:50.867640 140084627253056 checkpoint_manager.py:702] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b170>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b0e0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f7e5b050>}, handler_registry=None I0420 
15:52:50.867833 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b170>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 15:52:50.867873 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b0e0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 15:52:50.867900 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f7e5b050>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 15:52:50.867930 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f5d4d1a4dd0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0420 15:52:50.867957 140084627253056 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b170>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b170>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b0e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5b0e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f7e5b050>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f7e5b050>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f5d4d1a4dd0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f5d4d1a4dd0>}). 
I0420 15:52:50.868375 140084627253056 async_checkpointer.py:177] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f50f7a8d800> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 15:52:53.334852 140084627253056 checkpoint_manager.py:558] Created directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 15:52:53.782160 140084627253056 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 15:52:53.788257 140084627253056 checkpoint_manager.py:921] [process=0][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), 
root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f50f7e5b020> I0420 15:52:53.788371 140084627253056 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 15:52:53.788435 140084627253056 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f6746d24380>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 15:52:53.788471 140084627253056 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 15:52:53.788500 140084627253056 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7f6746d24380>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 15:52:53.788534 140084627253056 checkpoint_manager.py:1983] [process=0][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0420 15:52:53.788589 140084627253056 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140084627253056 count=1 at 0x7f4d583e4680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f50f7e5ae10>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f50f7e5ade0>, _write_futures=[]) I0420 15:52:53.788897 140084627253056 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140084627253056 count=1 at 0x7f4d583e4680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f50f7e5ae10>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f50f7e5ade0>, _write_futures=[]) I0420 15:52:53.788921 140084627253056 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=140084627253056 count=1 at 0x7f4d583e4680>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7f50f7e5ae10>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7f50f7e5ade0>, _write_futures=[]) I0420 15:52:53.788952 140084627253056 checkpoint_manager.py:702] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5aff0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5a0f0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6120>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f50f79d5550>}, handler_registry=None I0420 15:52:53.789048 140084627253056 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5aff0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 15:52:53.789098 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5a0f0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 15:52:53.789124 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6120>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 15:52:53.789152 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f50f79d5550>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0420 15:52:53.789175 140084627253056 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6300>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 15:52:53.789199 140084627253056 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5aff0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5aff0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5a0f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7f50f7e5a0f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6120>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6120>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f50f79d5550>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7f50f79d5550>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6300>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7f50f79d6300>}). I0420 15:52:53.789268 140084627253056 async_checkpointer.py:177] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7f50f7a8d940> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 15:52:54.172882 140084627253056 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 15:52:54.192347 140084627253056 checkpoint_manager.py:921] [process=0][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7f50f7e5a300> I0420 15:52:54.192528 140084627253056 train_distill.py:687] Starting Distillation Training... I0420 15:52:54.192627 140084627253056 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0420 15:52:54.546571 140084627253056 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0420 15:52:54.548312 139944311904000 grain_pool.py:367] Grain pool will use 1 processes. I0420 15:52:54.935253 139944311904000 grain_pool.py:440] Grain pool will start child processes. I0420 15:52:54.940637 139944311904000 grain_pool.py:448] Grain pool started all child processes. 
2026-04-20 15:53:00.945094: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 15:53:04.176497 140084627253056 utils.py:86] Train loop finished in: 9.6293 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 757, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 689, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0)}
I0420 15:53:04.519649 139944311904000 grain_pool.py:542] Grain pool is exiting.
I0420 15:53:04.519751 139944311904000 grain_pool.py:547] Shutting down multiprocessing system.
I0420 15:53:05.948552 139944311904000 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Exception ignored in: <function GCSRecordWriter.__del__ at 0x7f67436bd260>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 134, in __del__
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 158, in close
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 149, in flush
  File "/usr/local/lib/python3.12/copy.py", line 87, in copy
ImportError: sys.meta_path is None, Python is likely shutting down
XPK End: Mon Apr 20 15:53:13 UTC 2026
EXIT_CODE=1
XPK Start: Mon Apr 20 16:16:20 UTC 2026
2026-04-20 16:16:37.437297: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 16:16:40.995103 137005397919552 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-20 16:16:50,033:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0420 16:16:50.033734 137005397919552 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-20 16:16:50,039:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-yp34y-slice-job-0-0.mt-07-distill-smoke-yp34y:8482
I0420 16:16:50.039265 137005397919552 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-yp34y-slice-job-0-0.mt-07-distill-smoke-yp34y:8482
I0420 16:16:51.511896 137005397919552 max_utils.py:284] Jax distributed system initialized!
I0420 16:16:57.894912 137005397919552 max_utils.py:244] Jax distributed system is already initialized.
I0420 16:16:58.363710 137005397919552 max_utils.py:244] Jax distributed system is already initialized.
I0420 16:16:58.364956 137005397919552 pyconfig.py:432] Config param abort_on_inf_loss: True I0420 16:16:58.365005 137005397919552 pyconfig.py:432] Config param abort_on_nan_loss: True I0420 16:16:58.365031 137005397919552 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0420 16:16:58.365053 137005397919552 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0420 16:16:58.365075 137005397919552 pyconfig.py:432] Config param activation_function_for_audio: gelu I0420 16:16:58.365115 137005397919552 pyconfig.py:432] Config param activations_in_float32: False I0420 16:16:58.365136 137005397919552 pyconfig.py:432] Config param adam_b1: 0.9 I0420 16:16:58.365156 137005397919552 pyconfig.py:432] Config param adam_b2: 0.95 I0420 16:16:58.365172 137005397919552 pyconfig.py:432] Config param adam_eps: 1e-08 I0420 16:16:58.365195 137005397919552 pyconfig.py:432] Config param adam_eps_root: 0.0 I0420 16:16:58.365212 137005397919552 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0420 16:16:58.365227 137005397919552 pyconfig.py:432] Config param adamw_mask: [] I0420 16:16:58.365244 137005397919552 pyconfig.py:432] Config param add_bos: True I0420 16:16:58.365261 137005397919552 pyconfig.py:432] Config param add_eos: True I0420 16:16:58.365277 137005397919552 pyconfig.py:432] Config param allow_split_physical_axes: False I0420 16:16:58.365292 137005397919552 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0420 16:16:58.365307 137005397919552 pyconfig.py:432] Config param async_checkpointing: True I0420 16:16:58.365324 137005397919552 pyconfig.py:432] Config param async_scheduling: False I0420 16:16:58.365340 137005397919552 pyconfig.py:432] Config param attention: dot_product I0420 16:16:58.365355 137005397919552 pyconfig.py:432] Config param attention_bias: False I0420 16:16:58.365371 137005397919552 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0420 16:16:58.365389 137005397919552 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0420 16:16:58.365410 137005397919552 pyconfig.py:432] Config param attention_sink: False I0420 16:16:58.365427 137005397919552 pyconfig.py:432] Config param attention_type: global I0420 16:16:58.365444 137005397919552 pyconfig.py:432] Config param attn_logits_soft_cap: None I0420 16:16:58.365460 137005397919552 pyconfig.py:432] Config param audio_path: I0420 16:16:58.365483 137005397919552 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0420 16:16:58.365508 137005397919552 pyconfig.py:432] Config param autoregressive_decode_assert: I0420 16:16:58.365534 137005397919552 pyconfig.py:432] Config param base_config: base.yml I0420 16:16:58.365557 137005397919552 pyconfig.py:432] Config param base_emb_dim: 16 I0420 16:16:58.365579 137005397919552 pyconfig.py:432] Config param base_mlp_dim: 64 I0420 16:16:58.365601 137005397919552 pyconfig.py:432] Config param base_moe_mlp_dim: 7168 I0420 16:16:58.365622 137005397919552 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0420 16:16:58.365647 137005397919552 pyconfig.py:432] Config param base_num_kv_heads: 2 I0420 16:16:58.365673 137005397919552 pyconfig.py:432] Config param base_num_query_heads: 2 I0420 16:16:58.365697 137005397919552 pyconfig.py:432] Config param base_output_directory: I0420 16:16:58.365720 137005397919552 pyconfig.py:432] Config param batch_size: 1 I0420 16:16:58.365744 137005397919552 pyconfig.py:432] Config param batch_split_factor: 1 I0420 16:16:58.365767 137005397919552 pyconfig.py:432] Config param beta_fast: 32 I0420 16:16:58.365790 137005397919552 pyconfig.py:432] Config param beta_slow: 1 I0420 16:16:58.365812 137005397919552 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0420 16:16:58.365837 137005397919552 pyconfig.py:432] Config param capacity_factor: -1.0 I0420 16:16:58.365853 137005397919552 pyconfig.py:432] Config param cast_logits_to_fp32: True I0420 16:16:58.365869 137005397919552 pyconfig.py:432] 
Config param chat_template: I0420 16:16:58.365885 137005397919552 pyconfig.py:432] Config param chat_template_path: I0420 16:16:58.365900 137005397919552 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0420 16:16:58.365918 137005397919552 pyconfig.py:432] Config param checkpoint_dir: None I0420 16:16:58.365936 137005397919552 pyconfig.py:432] Config param checkpoint_is_quantized: False I0420 16:16:58.365954 137005397919552 pyconfig.py:432] Config param checkpoint_period: 2000 I0420 16:16:58.365971 137005397919552 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0420 16:16:58.365986 137005397919552 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0420 16:16:58.366003 137005397919552 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0420 16:16:58.366021 137005397919552 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0420 16:16:58.366036 137005397919552 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0420 16:16:58.366052 137005397919552 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0420 16:16:58.366068 137005397919552 pyconfig.py:432] Config param chips_per_vm: 4 I0420 16:16:58.366113 137005397919552 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0420 16:16:58.366139 137005397919552 pyconfig.py:432] Config param collect_stack_trace: False I0420 16:16:58.366163 137005397919552 pyconfig.py:432] Config param colocated_python_checkpointing: False I0420 16:16:58.366184 137005397919552 pyconfig.py:432] Config param colocated_python_data_input: False I0420 16:16:58.366205 137005397919552 pyconfig.py:432] Config param compile_topology: I0420 16:16:58.366220 137005397919552 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0420 16:16:58.366236 137005397919552 pyconfig.py:432] Config param compile_xla_flags: I0420 16:16:58.366251 137005397919552 pyconfig.py:432] Config param compiled_trainstep_file: I0420 
16:16:58.366267 137005397919552 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0420 16:16:58.366283 137005397919552 pyconfig.py:432] Config param constant_bound_config: [] I0420 16:16:58.366297 137005397919552 pyconfig.py:432] Config param context: RematLocation.REMAT I0420 16:16:58.366314 137005397919552 pyconfig.py:432] Config param context_parallel_load_balance: True I0420 16:16:58.366329 137005397919552 pyconfig.py:432] Config param context_parallel_size: 1 I0420 16:16:58.366345 137005397919552 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0420 16:16:58.366359 137005397919552 pyconfig.py:432] Config param context_sharding: context I0420 16:16:58.366375 137005397919552 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0420 16:16:58.366392 137005397919552 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0420 16:16:58.366408 137005397919552 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0420 16:16:58.366423 137005397919552 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0420 16:16:58.366438 137005397919552 pyconfig.py:432] Config param custom_mesh: I0420 16:16:58.366453 137005397919552 pyconfig.py:432] Config param custom_mesh_and_rule: I0420 16:16:58.366469 137005397919552 pyconfig.py:432] Config param d_model_for_audio: 256 I0420 16:16:58.366483 137005397919552 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0420 16:16:58.366503 137005397919552 pyconfig.py:432] Config param data_shuffle_seed: 0 I0420 16:16:58.366519 137005397919552 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0420 16:16:58.366533 137005397919552 pyconfig.py:432] Config param dataset_path: I0420 16:16:58.366549 137005397919552 pyconfig.py:432] Config param dataset_type: DatasetType.HF I0420 16:16:58.366566 137005397919552 pyconfig.py:432] 
Config param dcn_autoregressive_parallelism: 1 I0420 16:16:58.366583 137005397919552 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0420 16:16:58.366599 137005397919552 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0420 16:16:58.366615 137005397919552 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0420 16:16:58.366629 137005397919552 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0420 16:16:58.366645 137005397919552 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0420 16:16:58.366659 137005397919552 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0420 16:16:58.366674 137005397919552 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0420 16:16:58.366690 137005397919552 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 16:16:58.366707 137005397919552 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0420 16:16:58.366722 137005397919552 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0420 16:16:58.366738 137005397919552 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0420 16:16:58.366753 137005397919552 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0420 16:16:58.366767 137005397919552 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0420 16:16:58.366783 137005397919552 pyconfig.py:432] Config param debug: {'rl': False} I0420 16:16:58.366800 137005397919552 pyconfig.py:432] Config param debug_sharding: False I0420 16:16:58.366815 137005397919552 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0420 16:16:58.366835 137005397919552 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0420 16:16:58.366853 137005397919552 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0420 16:16:58.366868 137005397919552 pyconfig.py:432] Config param decode_sampling_top_k: 0 I0420 16:16:58.366884 137005397919552 pyconfig.py:432] Config 
param decoder_block: DecoderBlockType.GPT3 I0420 16:16:58.366900 137005397919552 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0420 16:16:58.366916 137005397919552 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0420 16:16:58.366933 137005397919552 pyconfig.py:432] Config param degenerate_group_masking: True I0420 16:16:58.366947 137005397919552 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0420 16:16:58.366963 137005397919552 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0420 16:16:58.366978 137005397919552 pyconfig.py:432] Config param diloco_sync_period: 36 I0420 16:16:58.366993 137005397919552 pyconfig.py:432] Config param distill_alpha: 0.5 I0420 16:16:58.367008 137005397919552 pyconfig.py:432] Config param distill_beta: 0.0 I0420 16:16:58.367024 137005397919552 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0420 16:16:58.367039 137005397919552 pyconfig.py:432] Config param distill_layer_indices: None I0420 16:16:58.367055 137005397919552 pyconfig.py:432] Config param distill_temperature: 1.0 I0420 16:16:58.367070 137005397919552 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0420 16:16:58.367103 137005397919552 pyconfig.py:432] Config param dpo_beta: 0.1 I0420 16:16:58.367124 137005397919552 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0420 16:16:58.367139 137005397919552 pyconfig.py:432] Config param dq_reduction_steps: 0 I0420 16:16:58.367154 137005397919552 pyconfig.py:432] Config param dropout_rate: 0.0 I0420 16:16:58.367170 137005397919552 pyconfig.py:432] Config param dtype: bfloat16 I0420 16:16:58.367200 137005397919552 pyconfig.py:432] Config param dtype_mm: float32 I0420 16:16:58.367215 137005397919552 pyconfig.py:432] Config param dump_hlo: False I0420 16:16:58.367230 137005397919552 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0420 16:16:58.367246 137005397919552 pyconfig.py:432] Config param dump_hlo_gcs_dir: 
gpt3-52k_2026-04-20-16-16/xla_dump I0420 16:16:58.367260 137005397919552 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0420 16:16:58.367276 137005397919552 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0420 16:16:58.367290 137005397919552 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0420 16:16:58.367305 137005397919552 pyconfig.py:432] Config param dump_hlo_upload_all: False I0420 16:16:58.367319 137005397919552 pyconfig.py:432] Config param dump_hlo_xla_flags: I0420 16:16:58.367335 137005397919552 pyconfig.py:432] Config param dump_jaxpr: False I0420 16:16:58.367351 137005397919552 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0420 16:16:58.367365 137005397919552 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-20-16-16/jaxpr_dump I0420 16:16:58.367381 137005397919552 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0420 16:16:58.367395 137005397919552 pyconfig.py:432] Config param dump_step: -1 I0420 16:16:58.367410 137005397919552 pyconfig.py:432] Config param elastic_enabled: False I0420 16:16:58.367424 137005397919552 pyconfig.py:432] Config param elastic_max_retries: 10 I0420 16:16:58.367440 137005397919552 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0420 16:16:58.367454 137005397919552 pyconfig.py:432] Config param emb_dim: 16 I0420 16:16:58.367469 137005397919552 pyconfig.py:432] Config param enable_autocheckpoint: False I0420 16:16:58.367483 137005397919552 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0420 16:16:58.367499 137005397919552 pyconfig.py:432] Config param enable_checkpointing: True I0420 16:16:58.367514 137005397919552 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0420 16:16:58.367530 137005397919552 pyconfig.py:432] Config param enable_data_shuffling: True I0420 16:16:58.367545 137005397919552 pyconfig.py:432] Config param enable_diloco: False I0420 
16:16:58.367560 137005397919552 pyconfig.py:432] Config param enable_dp_attention: False I0420 16:16:58.367575 137005397919552 pyconfig.py:432] Config param enable_dropout: False I0420 16:16:58.367589 137005397919552 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0420 16:16:58.367603 137005397919552 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0420 16:16:58.367618 137005397919552 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0420 16:16:58.367632 137005397919552 pyconfig.py:432] Config param enable_goodput_recording: False I0420 16:16:58.367648 137005397919552 pyconfig.py:432] Config param enable_jax_profiler: False I0420 16:16:58.367664 137005397919552 pyconfig.py:432] Config param enable_llm_inference_pool: False I0420 16:16:58.367678 137005397919552 pyconfig.py:432] Config param enable_model_warmup: False I0420 16:16:58.367694 137005397919552 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0420 16:16:58.367707 137005397919552 pyconfig.py:432] Config param enable_nnx: False I0420 16:16:58.367723 137005397919552 pyconfig.py:432] Config param enable_orbax_v1: False I0420 16:16:58.367737 137005397919552 pyconfig.py:432] Config param enable_padding_causal_mask: True I0420 16:16:58.367752 137005397919552 pyconfig.py:432] Config param enable_pathways_goodput: False I0420 16:16:58.367766 137005397919552 pyconfig.py:432] Config param enable_prefix_caching: False I0420 16:16:58.367782 137005397919552 pyconfig.py:432] Config param enable_rampup_batch_size: False I0420 16:16:58.367796 137005397919552 pyconfig.py:432] Config param enable_single_controller: False I0420 16:16:58.367811 137005397919552 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0420 16:16:58.367829 137005397919552 pyconfig.py:432] Config param enable_tensorboard: True I0420 16:16:58.367843 137005397919552 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0420 16:16:58.367859 
137005397919552 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0420 16:16:58.367876 137005397919552 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0420 16:16:58.367889 137005397919552 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0420 16:16:58.367905 137005397919552 pyconfig.py:432] Config param engram: RematLocation.REMAT I0420 16:16:58.367923 137005397919552 pyconfig.py:432] Config param engram_head_dim: 1280 I0420 16:16:58.367940 137005397919552 pyconfig.py:432] Config param engram_kernel_size: 4 I0420 16:16:58.367955 137005397919552 pyconfig.py:432] Config param engram_layers: [] I0420 16:16:58.367972 137005397919552 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0420 16:16:58.367988 137005397919552 pyconfig.py:432] Config param engram_num_heads: 8 I0420 16:16:58.368002 137005397919552 pyconfig.py:432] Config param engram_seed: 0 I0420 16:16:58.368018 137005397919552 pyconfig.py:432] Config param engram_vocab_bases: [] I0420 16:16:58.368033 137005397919552 pyconfig.py:432] Config param epsilon_high: None I0420 16:16:58.368047 137005397919552 pyconfig.py:432] Config param eval_corr_lst: False I0420 16:16:58.368063 137005397919552 pyconfig.py:432] Config param eval_data_columns: ['text'] I0420 16:16:58.368091 137005397919552 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0420 16:16:58.368117 137005397919552 pyconfig.py:432] Config param eval_image_column: image I0420 16:16:58.368139 137005397919552 pyconfig.py:432] Config param eval_interval: -1 I0420 16:16:58.368162 137005397919552 pyconfig.py:432] Config param eval_make_lst: False I0420 16:16:58.368184 137005397919552 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0420 16:16:58.368206 137005397919552 pyconfig.py:432] Config param eval_sampling_strategy: greedy I0420 16:16:58.368229 137005397919552 pyconfig.py:432] Config param eval_split: validation I0420 16:16:58.368251 137005397919552 pyconfig.py:432] Config param 
eval_steps: -1 I0420 16:16:58.368274 137005397919552 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0420 16:16:58.368294 137005397919552 pyconfig.py:432] Config param final_logits_soft_cap: None I0420 16:16:58.368309 137005397919552 pyconfig.py:432] Config param first_num_dense_layers: 0 I0420 16:16:58.368324 137005397919552 pyconfig.py:432] Config param float32_gate_logits: False I0420 16:16:58.368339 137005397919552 pyconfig.py:432] Config param float32_logits: False I0420 16:16:58.368354 137005397919552 pyconfig.py:432] Config param float32_qk_product: False I0420 16:16:58.368368 137005397919552 pyconfig.py:432] Config param float32_weight_sum: True I0420 16:16:58.368384 137005397919552 pyconfig.py:432] Config param force_q_layout: False I0420 16:16:58.368399 137005397919552 pyconfig.py:432] Config param force_unroll: False I0420 16:16:58.368414 137005397919552 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0420 16:16:58.368430 137005397919552 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0420 16:16:58.368445 137005397919552 pyconfig.py:432] Config param fused_mlp: False I0420 16:16:58.368461 137005397919552 pyconfig.py:432] Config param fused_qkv: True I0420 16:16:58.368475 137005397919552 pyconfig.py:432] Config param gcs_metrics: False I0420 16:16:58.368490 137005397919552 pyconfig.py:432] Config param gdn_chunk_size: 64 I0420 16:16:58.368504 137005397919552 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0420 16:16:58.368518 137005397919552 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0420 16:16:58.368534 137005397919552 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0420 16:16:58.368550 137005397919552 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0420 16:16:58.368564 137005397919552 pyconfig.py:432] Config param gdn_value_head_dim: 128 I0420 16:16:58.368579 137005397919552 pyconfig.py:432] Config param generate_padding_batch_eval: False I0420 16:16:58.368593 137005397919552 
pyconfig.py:432] Config param generate_padding_batch_train: False I0420 16:16:58.368608 137005397919552 pyconfig.py:432] Config param generate_slice: v5e-16 I0420 16:16:58.368623 137005397919552 pyconfig.py:432] Config param generation_configs: {} I0420 16:16:58.368639 137005397919552 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0420 16:16:58.368654 137005397919552 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0420 16:16:58.368668 137005397919552 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0420 16:16:58.368683 137005397919552 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0420 16:16:58.368697 137005397919552 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0420 16:16:58.368712 137005397919552 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0420 16:16:58.368728 137005397919552 pyconfig.py:432] Config param global_head_dim: 0 I0420 16:16:58.368743 137005397919552 pyconfig.py:432] Config param global_num_kv_heads: 0 I0420 16:16:58.368757 137005397919552 pyconfig.py:432] Config param global_parameter_scale: 1 I0420 16:16:58.368773 137005397919552 pyconfig.py:432] Config param global_rampup_samples: 500 I0420 16:16:58.368787 137005397919552 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0420 16:16:58.368803 137005397919552 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0420 16:16:58.368818 137005397919552 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0420 16:16:58.368836 137005397919552 pyconfig.py:432] Config param grad_dtype: float32 I0420 16:16:58.368870 137005397919552 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0420 16:16:58.368886 137005397919552 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0 I0420 16:16:58.368901 137005397919552 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0420 16:16:58.368917 137005397919552 pyconfig.py:432] Config param 
grain_eval_files: I0420 16:16:58.368932 137005397919552 pyconfig.py:432] Config param grain_file_type: arrayrecord I0420 16:16:58.368947 137005397919552 pyconfig.py:432] Config param grain_num_threads: 16 I0420 16:16:58.368962 137005397919552 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0420 16:16:58.368978 137005397919552 pyconfig.py:432] Config param grain_packing_type: first_fit I0420 16:16:58.368992 137005397919552 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0420 16:16:58.369008 137005397919552 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0420 16:16:58.369023 137005397919552 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0420 16:16:58.369038 137005397919552 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0420 16:16:58.369054 137005397919552 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0420 16:16:58.369068 137005397919552 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0420 16:16:58.369097 137005397919552 pyconfig.py:432] Config param grain_train_files: I0420 16:16:58.369119 137005397919552 pyconfig.py:432] Config param grain_train_mixture_config_path: I0420 16:16:58.369134 137005397919552 pyconfig.py:432] Config param grain_worker_count: 1 I0420 16:16:58.369153 137005397919552 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0420 16:16:58.369170 137005397919552 pyconfig.py:432] Config param grpo_beta: 0.08 I0420 16:16:58.369187 137005397919552 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0420 16:16:58.369203 137005397919552 pyconfig.py:432] Config param hardware: tpu I0420 16:16:58.369218 137005397919552 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0420 16:16:58.369235 137005397919552 pyconfig.py:432] Config param head_dim: 8 I0420 16:16:58.369249 137005397919552 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0420 16:16:58.369265 137005397919552 pyconfig.py:432] Config param hf_data_dir: None 
I0420 16:16:58.369280 137005397919552 pyconfig.py:432] Config param hf_eval_files: None I0420 16:16:58.369296 137005397919552 pyconfig.py:432] Config param hf_eval_split: None I0420 16:16:58.369310 137005397919552 pyconfig.py:432] Config param hf_name: None I0420 16:16:58.369325 137005397919552 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0420 16:16:58.369340 137005397919552 pyconfig.py:432] Config param hf_train_files: None I0420 16:16:58.369355 137005397919552 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0420 16:16:58.369370 137005397919552 pyconfig.py:432] Config param hide_profiler_step_metric: False I0420 16:16:58.369386 137005397919552 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0420 16:16:58.369400 137005397919552 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0420 16:16:58.369416 137005397919552 pyconfig.py:432] Config param ici_context_parallelism: 1 I0420 16:16:58.369430 137005397919552 pyconfig.py:432] Config param ici_data_parallelism: 1 I0420 16:16:58.369446 137005397919552 pyconfig.py:432] Config param ici_diloco_parallelism: 1 I0420 16:16:58.369460 137005397919552 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0420 16:16:58.369476 137005397919552 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0420 16:16:58.369490 137005397919552 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0420 16:16:58.369506 137005397919552 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0420 16:16:58.369521 137005397919552 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0420 16:16:58.369537 137005397919552 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0420 16:16:58.369551 137005397919552 pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0420 16:16:58.369566 137005397919552 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0420 16:16:58.369580 137005397919552 pyconfig.py:432] Config 
param ici_tensor_transpose_parallelism: 1 I0420 16:16:58.369596 137005397919552 pyconfig.py:432] Config param image_path: I0420 16:16:58.369610 137005397919552 pyconfig.py:432] Config param image_placeholder: <|image|> I0420 16:16:58.369625 137005397919552 pyconfig.py:432] Config param image_size_for_vit: 896 I0420 16:16:58.369639 137005397919552 pyconfig.py:432] Config param indexer_head_dim: 128 I0420 16:16:58.369655 137005397919552 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0420 16:16:58.369669 137005397919552 pyconfig.py:432] Config param indexer_n_heads: 64 I0420 16:16:58.369685 137005397919552 pyconfig.py:432] Config param indexer_sparse_training: False I0420 16:16:58.369699 137005397919552 pyconfig.py:432] Config param indexer_topk: 2048 I0420 16:16:58.369715 137005397919552 pyconfig.py:432] Config param inference_benchmark_test: False I0420 16:16:58.369729 137005397919552 pyconfig.py:432] Config param inference_metadata_file: I0420 16:16:58.369745 137005397919552 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0420 16:16:58.369759 137005397919552 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0420 16:16:58.369775 137005397919552 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0420 16:16:58.369789 137005397919552 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0420 16:16:58.369805 137005397919552 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0420 16:16:58.369822 137005397919552 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0420 16:16:58.369837 137005397919552 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0420 16:16:58.369852 137005397919552 pyconfig.py:432] Config param init_weights_seed: 0 I0420 16:16:58.369868 137005397919552 pyconfig.py:432] Config param input_data_sharding_logical_axes: 
['activation_embed_and_logits_batch', 'activation_norm_length'] I0420 16:16:58.369883 137005397919552 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0420 16:16:58.369899 137005397919552 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0420 16:16:58.369913 137005397919552 pyconfig.py:432] Config param internal_compile: False I0420 16:16:58.369928 137005397919552 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0420 16:16:58.369942 137005397919552 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0420 16:16:58.369958 137005397919552 pyconfig.py:432] Config param jax_debug_log_modules: I0420 16:16:58.369972 137005397919552 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0420 16:16:58.369991 137005397919552 pyconfig.py:432] Config param jax_profiler_port: 9999 I0420 16:16:58.370011 137005397919552 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0420 16:16:58.370027 137005397919552 pyconfig.py:432] Config param kv_cache_buffer: 256 I0420 16:16:58.370041 137005397919552 pyconfig.py:432] Config param kv_lora_rank: 512 I0420 16:16:58.370058 137005397919552 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0420 16:16:58.370074 137005397919552 pyconfig.py:432] Config param kv_quant_dtype: int8 I0420 16:16:58.370106 137005397919552 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0420 16:16:58.370130 137005397919552 pyconfig.py:432] Config param learning_rate: 0.0002 I0420 16:16:58.370153 137005397919552 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0420 16:16:58.370176 137005397919552 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0420 16:16:58.370198 137005397919552 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 I0420 16:16:58.370213 137005397919552 pyconfig.py:432] Config param load_checkpoint_only_once: False I0420 16:16:58.370228 137005397919552 pyconfig.py:432] Config param 
load_from_prefill_dir: False I0420 16:16:58.370244 137005397919552 pyconfig.py:432] Config param load_full_state_path: I0420 16:16:58.370259 137005397919552 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 16:16:58.370275 137005397919552 pyconfig.py:432] Config param local_checkpoint_directory: I0420 16:16:58.370289 137005397919552 pyconfig.py:432] Config param local_checkpoint_period: 0 I0420 16:16:58.370305 137005397919552 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0420 16:16:58.370319 137005397919552 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0420 16:16:58.370334 137005397919552 pyconfig.py:432] Config param log_config: True I0420 16:16:58.370350 137005397919552 pyconfig.py:432] Config param log_period: 10 I0420 16:16:58.370366 137005397919552 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 
'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 
'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0420 16:16:58.370439 137005397919552 pyconfig.py:432] Config param logits_dot_in_fp32: False I0420 16:16:58.370454 137005397919552 pyconfig.py:432] Config param logits_via_embedding: True I0420 16:16:58.370470 137005397919552 pyconfig.py:432] Config param lora_input_adapters_path: I0420 16:16:58.370483 137005397919552 pyconfig.py:432] Config param 
loss_algo: grpo I0420 16:16:58.370499 137005397919552 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0420 16:16:58.370517 137005397919552 pyconfig.py:432] Config param managed_mldiagnostics: False I0420 16:16:58.370532 137005397919552 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0420 16:16:58.370547 137005397919552 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0420 16:16:58.370562 137005397919552 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0420 16:16:58.370581 137005397919552 pyconfig.py:432] Config param max_checkify: False I0420 16:16:58.370595 137005397919552 pyconfig.py:432] Config param max_concurrency: 256 I0420 16:16:58.370615 137005397919552 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0420 16:16:58.370636 137005397919552 pyconfig.py:432] Config param max_num_batched_tokens: None I0420 16:16:58.370651 137005397919552 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0420 16:16:58.370673 137005397919552 pyconfig.py:432] Config param max_num_images_per_example: -1 I0420 16:16:58.370693 137005397919552 pyconfig.py:432] Config param max_num_seqs: None I0420 16:16:58.370710 137005397919552 pyconfig.py:432] Config param max_position_embeddings: 163840 I0420 16:16:58.370733 137005397919552 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0420 16:16:58.370756 137005397919552 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0420 16:16:58.370778 137005397919552 pyconfig.py:432] Config param max_segments_per_seq: -1 I0420 16:16:58.370798 137005397919552 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0420 16:16:58.370826 137005397919552 pyconfig.py:432] Config param max_target_length: 2048 I0420 16:16:58.370846 137005397919552 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0 I0420 16:16:58.370869 137005397919552 pyconfig.py:432] Config param megablox: True I0420 16:16:58.370890 
137005397919552 pyconfig.py:432] Config param merge_gating_gmm: False I0420 16:16:58.370912 137005397919552 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0420 16:16:58.370933 137005397919552 pyconfig.py:432] Config param metrics_dir: None I0420 16:16:58.370950 137005397919552 pyconfig.py:432] Config param metrics_file: I0420 16:16:58.370967 137005397919552 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0420 16:16:58.370982 137005397919552 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0420 16:16:58.370996 137005397919552 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0420 16:16:58.371012 137005397919552 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0420 16:16:58.371026 137005397919552 pyconfig.py:432] Config param mla_naive_kvcache: True I0420 16:16:58.371042 137005397919552 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0420 16:16:58.371056 137005397919552 pyconfig.py:432] Config param mlp_activations: ['gelu'] I0420 16:16:58.371072 137005397919552 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0420 16:16:58.371107 137005397919552 pyconfig.py:432] Config param mlp_bias: False I0420 16:16:58.371124 137005397919552 pyconfig.py:432] Config param mlp_dim: 64 I0420 16:16:58.371141 137005397919552 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0420 16:16:58.371158 137005397919552 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0420 16:16:58.371173 137005397919552 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0420 16:16:58.371189 137005397919552 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0420 16:16:58.371204 137005397919552 pyconfig.py:432] Config param moba: False I0420 16:16:58.371221 137005397919552 pyconfig.py:432] Config param moba_chunk_size: 1024 I0420 
16:16:58.371237 137005397919552 pyconfig.py:432] Config param moba_topk: 8 I0420 16:16:58.371252 137005397919552 pyconfig.py:432] Config param model_call_mode: I0420 16:16:58.371267 137005397919552 pyconfig.py:432] Config param model_name: gpt3-52k I0420 16:16:58.371282 137005397919552 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0420 16:16:58.371298 137005397919552 pyconfig.py:432] Config param moe_mlp_dim: 7168 I0420 16:16:58.371312 137005397919552 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0420 16:16:58.371328 137005397919552 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0420 16:16:58.371343 137005397919552 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0420 16:16:58.371358 137005397919552 pyconfig.py:432] Config param monitor_goodput: False I0420 16:16:58.371373 137005397919552 pyconfig.py:432] Config param monitor_step_time_deviation: True I0420 16:16:58.371388 137005397919552 pyconfig.py:432] Config param mrope_section: [24, 20, 20] I0420 16:16:58.371403 137005397919552 pyconfig.py:432] Config param mscale: 1.0 I0420 16:16:58.371419 137005397919552 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0420 16:16:58.371433 137005397919552 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0420 16:16:58.371449 137005397919552 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0420 16:16:58.371464 137005397919552 pyconfig.py:432] Config param mtp_num_layers: 0 I0420 16:16:58.371479 137005397919552 pyconfig.py:432] Config param mu_dtype: float32 I0420 16:16:58.371516 137005397919552 pyconfig.py:432] Config param multi_sampling: False I0420 16:16:58.371537 137005397919552 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0420 16:16:58.371561 137005397919552 pyconfig.py:432] Config param muon_beta: 0.95 I0420 16:16:58.371578 137005397919552 pyconfig.py:432] Config param muon_consistent_rms: None I0420 16:16:58.371593 137005397919552 
pyconfig.py:432] Config param muon_weight_decay: 0.0 I0420 16:16:58.371608 137005397919552 pyconfig.py:432] Config param n_routing_groups: -1 I0420 16:16:58.371622 137005397919552 pyconfig.py:432] Config param n_window_for_audio: 50 I0420 16:16:58.371637 137005397919552 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0420 16:16:58.371651 137005397919552 pyconfig.py:432] Config param nope_layer_interval: -1 I0420 16:16:58.371665 137005397919552 pyconfig.py:432] Config param norm_topk_prob: False I0420 16:16:58.371680 137005397919552 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0420 16:16:58.371698 137005397919552 pyconfig.py:432] Config param normalize_embedding_logits: False I0420 16:16:58.371712 137005397919552 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0420 16:16:58.371727 137005397919552 pyconfig.py:432] Config param num_batches: 4 I0420 16:16:58.371742 137005397919552 pyconfig.py:432] Config param num_channels_for_vit: 3 I0420 16:16:58.371757 137005397919552 pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0420 16:16:58.371770 137005397919552 pyconfig.py:432] Config param num_decoder_layers: 1 I0420 16:16:58.371786 137005397919552 pyconfig.py:432] Config param num_diloco_replicas: 1 I0420 16:16:58.371802 137005397919552 pyconfig.py:432] Config param num_epoch: 1 I0420 16:16:58.371815 137005397919552 pyconfig.py:432] Config param num_eval_passes: 1 I0420 16:16:58.371835 137005397919552 pyconfig.py:432] Config param num_experts: 1 I0420 16:16:58.371849 137005397919552 pyconfig.py:432] Config param num_experts_per_tok: 1 I0420 16:16:58.371865 137005397919552 pyconfig.py:432] Config param num_generations: 2 I0420 16:16:58.371879 137005397919552 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0420 16:16:58.371894 137005397919552 pyconfig.py:432] Config param num_iterations: 1 I0420 16:16:58.371909 137005397919552 pyconfig.py:432] Config param num_kv_heads: 2 I0420 16:16:58.371923 
137005397919552 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0420 16:16:58.371939 137005397919552 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0420 16:16:58.371953 137005397919552 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0420 16:16:58.371968 137005397919552 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0420 16:16:58.371983 137005397919552 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0420 16:16:58.371998 137005397919552 pyconfig.py:432] Config param num_query_heads: 2 I0420 16:16:58.372012 137005397919552 pyconfig.py:432] Config param num_samplers_slices: -1 I0420 16:16:58.372027 137005397919552 pyconfig.py:432] Config param num_slices: 1 I0420 16:16:58.372040 137005397919552 pyconfig.py:432] Config param num_target_devices: 32 I0420 16:16:58.372057 137005397919552 pyconfig.py:432] Config param num_test_batches: 5 I0420 16:16:58.372071 137005397919552 pyconfig.py:432] Config param num_trainer_slices: -1 I0420 16:16:58.372104 137005397919552 pyconfig.py:432] Config param num_vocab_tiling: 1 I0420 16:16:58.372128 137005397919552 pyconfig.py:432] Config param off_policy_steps: 0 I0420 16:16:58.372148 137005397919552 pyconfig.py:432] Config param offline_data_dir: None I0420 16:16:58.372170 137005397919552 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0420 16:16:58.372196 137005397919552 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0420 16:16:58.372219 137005397919552 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0420 16:16:58.372241 137005397919552 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0420 16:16:58.372263 137005397919552 pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0420 16:16:58.372286 137005397919552 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0420 16:16:58.372309 137005397919552 pyconfig.py:432] Config param output_dim_for_audio: 512 I0420 16:16:58.372334 
137005397919552 pyconfig.py:432] Config param override_logical_axis_rules: False I0420 16:16:58.372356 137005397919552 pyconfig.py:432] Config param override_model_config: True I0420 16:16:58.372371 137005397919552 pyconfig.py:432] Config param packing: True I0420 16:16:58.372390 137005397919552 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0420 16:16:58.372413 137005397919552 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0420 16:16:58.372432 137005397919552 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0420 16:16:58.372455 137005397919552 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0420 16:16:58.372475 137005397919552 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0420 16:16:58.372498 137005397919552 pyconfig.py:432] Config param param_scan_axis: 1 I0420 16:16:58.372521 137005397919552 pyconfig.py:432] Config param parameter_memory_host_offload: False I0420 16:16:58.372544 137005397919552 pyconfig.py:432] Config param partial_rotary_factor: 1.0 I0420 16:16:58.372566 137005397919552 pyconfig.py:432] Config param patch_size_for_vit: 14 I0420 16:16:58.372587 137005397919552 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0420 16:16:58.372610 137005397919552 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0420 16:16:58.372634 137005397919552 pyconfig.py:432] Config param per_device_batch_size: 2 I0420 16:16:58.372656 137005397919552 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0420 16:16:58.372678 137005397919552 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0420 16:16:58.372697 137005397919552 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False I0420 16:16:58.372713 137005397919552 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0420 16:16:58.372727 137005397919552 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0420 16:16:58.372744 137005397919552 pyconfig.py:432] 
Config param pipeline_parallel_layers: 1 I0420 16:16:58.372767 137005397919552 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0420 16:16:58.372789 137005397919552 pyconfig.py:432] Config param posemb_type_for_vit: learn I0420 16:16:58.372812 137005397919552 pyconfig.py:432] Config param position_id_per_seconds: 25 I0420 16:16:58.372838 137005397919552 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0420 16:16:58.372860 137005397919552 pyconfig.py:432] Config param prefill_cache_dir: I0420 16:16:58.372881 137005397919552 pyconfig.py:432] Config param prefill_chunk_size: 256 I0420 16:16:58.372903 137005397919552 pyconfig.py:432] Config param prefill_slice: v5e-16 I0420 16:16:58.372926 137005397919552 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0420 16:16:58.372948 137005397919552 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0420 16:16:58.372969 137005397919552 pyconfig.py:432] Config param profile_cleanly: True I0420 16:16:58.372990 137005397919552 pyconfig.py:432] Config param profile_periodically_period: -1 I0420 16:16:58.373011 137005397919552 pyconfig.py:432] Config param profile_power_events: False I0420 16:16:58.373033 137005397919552 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0420 16:16:58.373058 137005397919552 pyconfig.py:432] Config param profiler_steps: 5 I0420 16:16:58.373092 137005397919552 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0420 16:16:58.373115 137005397919552 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0420 16:16:58.373137 137005397919552 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0420 16:16:58.373160 137005397919552 pyconfig.py:432] Config param prometheus_port: 0 I0420 16:16:58.373183 137005397919552 pyconfig.py:432] Config param prompt: I love to I0420 16:16:58.373205 137005397919552 pyconfig.py:432] Config param pure_nnx: False I0420 16:16:58.373228 137005397919552 pyconfig.py:432] 
Config param pure_nnx_decoder: False I0420 16:16:58.373250 137005397919552 pyconfig.py:432] Config param q_lora_rank: 0 I0420 16:16:58.373271 137005397919552 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0420 16:16:58.373293 137005397919552 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0420 16:16:58.373316 137005397919552 pyconfig.py:432] Config param qk_norm_with_scale: True I0420 16:16:58.373336 137005397919552 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0420 16:16:58.373359 137005397919552 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0420 16:16:58.373379 137005397919552 pyconfig.py:432] Config param quant_cfg_path: I0420 16:16:58.373402 137005397919552 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0420 16:16:58.373426 137005397919552 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0420 16:16:58.373446 137005397919552 pyconfig.py:432] Config param quantize_kvcache: False I0420 16:16:58.373469 137005397919552 pyconfig.py:432] Config param query_proj: RematLocation.REMAT I0420 16:16:58.373489 137005397919552 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0420 16:16:58.373512 137005397919552 pyconfig.py:432] Config param ragged_block_size: 256 I0420 16:16:58.373534 137005397919552 pyconfig.py:432] Config param rampup_end_step: 0 I0420 16:16:58.373557 137005397919552 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0420 16:16:58.373579 137005397919552 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0420 16:16:58.373599 137005397919552 pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0420 16:16:58.373618 137005397919552 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0420 16:16:58.373634 137005397919552 pyconfig.py:432] Config param remat_policy: full I0420 16:16:58.373649 137005397919552 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0420 16:16:58.373664 137005397919552 pyconfig.py:432] 
Config param remove_size_one_mesh_axis_from_type: True I0420 16:16:58.373679 137005397919552 pyconfig.py:432] Config param replicate_quant_scale: False I0420 16:16:58.373693 137005397919552 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0420 16:16:58.373708 137005397919552 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0420 16:16:58.373723 137005397919552 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0420 16:16:58.373738 137005397919552 pyconfig.py:432] Config param reshape_q: False I0420 16:16:58.373753 137005397919552 pyconfig.py:432] Config param return_log_prob: False I0420 16:16:58.373769 137005397919552 pyconfig.py:432] Config param reuse_example_batch: 0 I0420 16:16:58.373785 137005397919552 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0420 16:16:58.373807 137005397919552 pyconfig.py:432] Config param reward_exact_format_match: 3.0 I0420 16:16:58.373838 137005397919552 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0420 16:16:58.373866 137005397919552 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0420 16:16:58.373892 137005397919552 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0420 16:16:58.373917 137005397919552 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0420 16:16:58.373943 137005397919552 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0420 16:16:58.373977 137005397919552 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0420 16:16:58.374002 137005397919552 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0420 16:16:58.374027 137005397919552 pyconfig.py:432] Config param rollout_micro_batch_size: -1 
I0420 16:16:58.374053 137005397919552 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0420 16:16:58.374092 137005397919552 pyconfig.py:432] Config param rope_attention_scaling: False I0420 16:16:58.374119 137005397919552 pyconfig.py:432] Config param rope_factor: 40 I0420 16:16:58.374143 137005397919552 pyconfig.py:432] Config param rope_interleave: True I0420 16:16:58.374165 137005397919552 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0420 16:16:58.374187 137005397919552 pyconfig.py:432] Config param rope_max_timescale: 10000 I0420 16:16:58.374210 137005397919552 pyconfig.py:432] Config param rope_min_timescale: 1 I0420 16:16:58.374234 137005397919552 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0420 16:16:58.374260 137005397919552 pyconfig.py:432] Config param rope_truncate: True I0420 16:16:58.374286 137005397919552 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT I0420 16:16:58.374317 137005397919552 pyconfig.py:432] Config param rope_use_scale: True I0420 16:16:58.374342 137005397919552 pyconfig.py:432] Config param routed_bias: False I0420 16:16:58.374368 137005397919552 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0420 16:16:58.374393 137005397919552 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0420 16:16:58.374416 137005397919552 pyconfig.py:432] Config param routed_score_func: I0420 16:16:58.374439 137005397919552 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-20-16-16 I0420 16:16:58.374463 137005397919552 pyconfig.py:432] Config param sa_block_kv: 512 I0420 16:16:58.374486 137005397919552 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0420 16:16:58.374501 137005397919552 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0420 16:16:58.374517 137005397919552 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0420 16:16:58.374531 137005397919552 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0420 16:16:58.374547 137005397919552 pyconfig.py:432] 
Config param sa_block_q: 512 I0420 16:16:58.374562 137005397919552 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0420 16:16:58.374578 137005397919552 pyconfig.py:432] Config param sa_block_q_dq: 512 I0420 16:16:58.374598 137005397919552 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0420 16:16:58.374624 137005397919552 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0420 16:16:58.374650 137005397919552 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0420 16:16:58.374676 137005397919552 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0420 16:16:58.374700 137005397919552 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0420 16:16:58.374724 137005397919552 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0420 16:16:58.374746 137005397919552 pyconfig.py:432] Config param save_config_to_gcs: False I0420 16:16:58.374772 137005397919552 pyconfig.py:432] Config param save_quantized_params_path: I0420 16:16:58.374796 137005397919552 pyconfig.py:432] Config param scale_embedding_for_audio: True I0420 16:16:58.374825 137005397919552 pyconfig.py:432] Config param scan_layers: True I0420 16:16:58.374847 137005397919552 pyconfig.py:432] Config param scan_layers_per_stage: False I0420 16:16:58.374871 137005397919552 pyconfig.py:432] Config param scan_pipeline_iterations: True I0420 16:16:58.374894 137005397919552 pyconfig.py:432] Config param scan_pipeline_repeats: False I0420 16:16:58.374911 137005397919552 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0420 16:16:58.374926 137005397919552 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0420 16:16:58.374940 137005397919552 pyconfig.py:432] Config param sft_train_on_completion_only: False I0420 16:16:58.374956 137005397919552 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0420 16:16:58.374970 137005397919552 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0420 
16:16:58.374987 137005397919552 pyconfig.py:432] Config param shard_optimizer_over_data: False I0420 16:16:58.375001 137005397919552 pyconfig.py:432] Config param sharding_strategy: None I0420 16:16:58.375017 137005397919552 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0420 16:16:58.375032 137005397919552 pyconfig.py:432] Config param shardy: True I0420 16:16:58.375047 137005397919552 pyconfig.py:432] Config param share_kv_projections: False I0420 16:16:58.375061 137005397919552 pyconfig.py:432] Config param shared_experts: 0 I0420 16:16:58.375089 137005397919552 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0420 16:16:58.375113 137005397919552 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0420 16:16:58.375128 137005397919552 pyconfig.py:432] Config param skip_jax_distributed_system: False I0420 16:16:58.375143 137005397919552 pyconfig.py:432] Config param skip_step_interval: 128 I0420 16:16:58.375157 137005397919552 pyconfig.py:432] Config param skip_step_on_spikes: False I0420 16:16:58.375173 137005397919552 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0420 16:16:58.375187 137005397919552 pyconfig.py:432] Config param sliding_window_size: 0 I0420 16:16:58.375202 137005397919552 pyconfig.py:432] Config param solution_end_token: </answer> I0420 16:16:58.375218 137005397919552 pyconfig.py:432] Config param solution_start_token: <answer> I0420 16:16:58.375235 137005397919552 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0420 16:16:58.375259 137005397919552 pyconfig.py:432] Config param sparse_matmul: True I0420 16:16:58.375280 137005397919552 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0420 16:16:58.375294 137005397919552 pyconfig.py:432] Config param stack_prefill_result_cache: False I0420 16:16:58.375310 137005397919552 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0420 16:16:58.375324 137005397919552 pyconfig.py:432] Config param stack_trace_to_cloud: False 
I0420 16:16:58.375340 137005397919552 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0420 16:16:58.375354 137005397919552 pyconfig.py:432] Config param steps: 200000 I0420 16:16:58.375370 137005397919552 pyconfig.py:432] Config param stop_strings: None I0420 16:16:58.375384 137005397919552 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0420 16:16:58.375400 137005397919552 pyconfig.py:432] Config param student_params_to_update: None I0420 16:16:58.375414 137005397919552 pyconfig.py:432] Config param subslice_shape: I0420 16:16:58.375429 137005397919552 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0420 16:16:58.375444 137005397919552 pyconfig.py:432] Config param system_prompt: I0420 16:16:58.375459 137005397919552 pyconfig.py:432] Config param target_eval_loss: 0.0 I0420 16:16:58.375473 137005397919552 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0420 16:16:58.375489 137005397919552 pyconfig.py:432] Config param temperature_tuning: False I0420 16:16:58.375503 137005397919552 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0420 16:16:58.375519 137005397919552 pyconfig.py:432] Config param tensorboard_dir: None I0420 16:16:58.375532 137005397919552 pyconfig.py:432] Config param tensors_on_device: None I0420 16:16:58.375548 137005397919552 pyconfig.py:432] Config param tensors_to_offload: None I0420 16:16:58.375561 137005397919552 pyconfig.py:432] Config param test_batch_start_index: 0 I0420 16:16:58.375578 137005397919552 pyconfig.py:432] Config param tile_size_for_vit: 336 I0420 16:16:58.375593 137005397919552 pyconfig.py:432] Config param tokenize_eval_data: True I0420 16:16:58.375609 137005397919552 pyconfig.py:432] Config param tokenize_train_data: True I0420 16:16:58.375622 137005397919552 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0420 16:16:58.375638 137005397919552 pyconfig.py:432] Config param tokenizer_type: 
TokenizerType.HUGGINGFACE I0420 16:16:58.375656 137005397919552 pyconfig.py:432] Config param topk_routing_group: -1 I0420 16:16:58.375670 137005397919552 pyconfig.py:432] Config param train_data_columns: ['text'] I0420 16:16:58.375686 137005397919552 pyconfig.py:432] Config param train_fraction: 1.0 I0420 16:16:58.375700 137005397919552 pyconfig.py:432] Config param train_image_column: image I0420 16:16:58.375716 137005397919552 pyconfig.py:432] Config param train_micro_batch_size: -1 I0420 16:16:58.375731 137005397919552 pyconfig.py:432] Config param train_split: train I0420 16:16:58.375745 137005397919552 pyconfig.py:432] Config param trainable_parameters_mask: [] I0420 16:16:58.375761 137005397919552 pyconfig.py:432] Config param trainable_position_size: 2048 I0420 16:16:58.375775 137005397919552 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0420 16:16:58.375791 137005397919552 pyconfig.py:432] Config param upload_all_profiler_results: False I0420 16:16:58.375805 137005397919552 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0420 16:16:58.375825 137005397919552 pyconfig.py:432] Config param use_agentic_rollout: False I0420 16:16:58.375839 137005397919552 pyconfig.py:432] Config param use_audio: False I0420 16:16:58.375855 137005397919552 pyconfig.py:432] Config param use_audio_in_video: False I0420 16:16:58.375869 137005397919552 pyconfig.py:432] Config param use_batch_split_schedule: False I0420 16:16:58.375884 137005397919552 pyconfig.py:432] Config param use_chat_template: False I0420 16:16:58.375897 137005397919552 pyconfig.py:432] Config param use_chunked_prefill: False I0420 16:16:58.375913 137005397919552 pyconfig.py:432] Config param use_custom_sort_vjp: True I0420 16:16:58.375927 137005397919552 pyconfig.py:432] Config param use_dpo: False I0420 16:16:58.375942 137005397919552 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0420 16:16:58.375957 137005397919552 pyconfig.py:432] Config param use_grpo: True I0420 
16:16:58.375972 137005397919552 pyconfig.py:432] Config param use_indexer: False I0420 16:16:58.375986 137005397919552 pyconfig.py:432] Config param use_iota_embed: True I0420 16:16:58.376001 137005397919552 pyconfig.py:432] Config param use_jax_splash: False I0420 16:16:58.376021 137005397919552 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0420 16:16:58.376042 137005397919552 pyconfig.py:432] Config param use_mrope: False I0420 16:16:58.376056 137005397919552 pyconfig.py:432] Config param use_multimodal: False I0420 16:16:58.376096 137005397919552 pyconfig.py:432] Config param use_pathways: True I0420 16:16:58.376121 137005397919552 pyconfig.py:432] Config param use_post_attn_norm: False I0420 16:16:58.376144 137005397919552 pyconfig.py:432] Config param use_post_ffw_norm: False I0420 16:16:58.376164 137005397919552 pyconfig.py:432] Config param use_qk_clip: False I0420 16:16:58.376185 137005397919552 pyconfig.py:432] Config param use_qk_norm: False I0420 16:16:58.376208 137005397919552 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0420 16:16:58.376230 137005397919552 pyconfig.py:432] Config param use_qwix_quantization: False I0420 16:16:58.376250 137005397919552 pyconfig.py:432] Config param use_ragged_attention: False I0420 16:16:58.376272 137005397919552 pyconfig.py:432] Config param use_random_routing: False I0420 16:16:58.376295 137005397919552 pyconfig.py:432] Config param use_replicator_service: False I0420 16:16:58.376318 137005397919552 pyconfig.py:432] Config param use_ring_of_experts: False I0420 16:16:58.376343 137005397919552 pyconfig.py:432] Config param use_sft: False I0420 16:16:58.376365 137005397919552 pyconfig.py:432] Config param use_splash_scheduler: False I0420 16:16:58.376380 137005397919552 pyconfig.py:432] Config param use_tokamax_gmm: False I0420 16:16:58.376404 137005397919552 pyconfig.py:432] Config param use_tokamax_splash: False I0420 16:16:58.376425 137005397919552 pyconfig.py:432] Config param use_truncation: 
True I0420 16:16:58.376449 137005397919552 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0420 16:16:58.376472 137005397919552 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0420 16:16:58.376495 137005397919552 pyconfig.py:432] Config param use_vertex_tensorboard: False I0420 16:16:58.376515 137005397919552 pyconfig.py:432] Config param using_pipeline_parallelism: False I0420 16:16:58.376538 137005397919552 pyconfig.py:432] Config param v_head_dim: 128 I0420 16:16:58.376560 137005397919552 pyconfig.py:432] Config param v_norm_with_scale: True I0420 16:16:58.376581 137005397919552 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0420 16:16:58.376604 137005397919552 pyconfig.py:432] Config param vertex_tensorboard_project: I0420 16:16:58.376624 137005397919552 pyconfig.py:432] Config param vertex_tensorboard_region: I0420 16:16:58.376642 137005397919552 pyconfig.py:432] Config param video_path: I0420 16:16:58.376657 137005397919552 pyconfig.py:432] Config param video_placeholder: <|video|> I0420 16:16:58.376672 137005397919552 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0420 16:16:58.376686 137005397919552 pyconfig.py:432] Config param vision_output_length: -1 I0420 16:16:58.376701 137005397919552 pyconfig.py:432] Config param vllm_additional_config: {} I0420 16:16:58.376716 137005397919552 pyconfig.py:432] Config param vllm_hf_config_path: I0420 16:16:58.376731 137005397919552 pyconfig.py:432] Config param vllm_hf_overrides: {} I0420 16:16:58.376746 137005397919552 pyconfig.py:432] Config param vocab_size: 32000 I0420 16:16:58.376764 137005397919552 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0420 16:16:58.376790 137005397919552 pyconfig.py:432] Config param weight_dtype: float32 I0420 16:16:58.376830 137005397919552 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0420 16:16:58.376853 137005397919552 pyconfig.py:432] Config param 
wi_tile_dlhs_batch_seq: 512 I0420 16:16:58.376874 137005397919552 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0420 16:16:58.376897 137005397919552 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0420 16:16:58.376918 137005397919552 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0420 16:16:58.376940 137005397919552 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0420 16:16:58.376961 137005397919552 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0420 16:16:58.376985 137005397919552 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0420 16:16:58.377008 137005397919552 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0420 16:16:58.377031 137005397919552 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0420 16:16:58.377053 137005397919552 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0420 16:16:58.377086 137005397919552 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0420 16:16:58.377112 137005397919552 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0420 16:16:58.377135 137005397919552 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0420 16:16:58.377156 137005397919552 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0420 16:16:58.377179 137005397919552 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0420 16:16:58.377202 137005397919552 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0420 16:16:58.377224 137005397919552 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0420 16:16:58.377244 137005397919552 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0420 16:16:58.377264 137005397919552 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0420 16:16:58.377279 137005397919552 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0420 16:16:58.377298 137005397919552 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0420 16:16:58.377312 
137005397919552 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0420 16:16:58.377328 137005397919552 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0420 16:16:58.377342 137005397919552 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0420 16:16:58.377359 137005397919552 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0420 16:16:58.377696 137005397919552 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0420 16:16:58.377731 137005397919552 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0420 16:17:02.314277 137005397919552 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0420 16:17:02.317224 137005397919552 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0420 16:17:02.317334 137005397919552 train_distill.py:596] Applying logical axis rules for model initialization and training... I0420 16:17:02.317403 137005397919552 train_distill.py:600] Loading Student from ... I0420 16:17:02.317430 137005397919552 train_distill.py:169] --- Student Configuration --- I0420 16:17:02.317453 137005397919552 train_distill.py:170] Model Name: gpt3-52k I0420 16:17:02.317474 137005397919552 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 16:17:02.317493 137005397919552 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 16:17:02.317512 137005397919552 train_distill.py:175] Vocab Size: 32000 I0420 16:17:02.317529 137005397919552 train_distill.py:176] Checkpoint: I0420 16:17:02.317546 137005397919552 train_distill.py:472] Initializing model: gpt3-52k... I0420 16:17:03.585749 137005397919552 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0420 16:17:03.585860 137005397919552 train_distill.py:169] --- Teacher Configuration --- I0420 16:17:03.585888 137005397919552 train_distill.py:170] Model Name: gpt3-52k I0420 16:17:03.585911 137005397919552 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0420 16:17:03.585932 137005397919552 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0420 16:17:03.585951 137005397919552 train_distill.py:175] Vocab Size: 32000 I0420 16:17:03.585970 137005397919552 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0420 16:17:03.585989 137005397919552 train_distill.py:472] Initializing model: gpt3-52k... I0420 16:17:04.734141 137005397919552 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 16:17:04.734591 137005397919552 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c9a567333b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 16:17:04.734664 137005397919552 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0420 16:17:05.701584 137005397919552 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0420 16:17:06.238714 2131 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0420 16:17:07.791511 137005397919552 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0420 16:17:09.838890 137005397919552 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0420 16:17:09.839306 137005397919552 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0420 16:17:11.511313 137005397919552 checkpointer.py:318] Finished restoring checkpoint in 4.10 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0420 16:17:11.875130 137005397919552 metrics_logger.py:64] WandbBackend skipped: 'wandb' library not installed. I0420 16:17:12.141978 137005397919552 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0420 16:17:12.205822 137005397919552 config.py:112] TensorFlow version 2.20.0 available. I0420 16:17:12.206375 137005397919552 config.py:125] JAX version 0.8.3 available. E0420 16:17:14.243875 137005397919552 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0420 16:17:14.244133 137005397919552 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0420 16:17:14.247126 137005397919552 train_distill.py:417] Input Pipeline Checkpointing: DISABLED I0420 16:17:14.247190 137005397919552 train_distill.py:421] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0420 16:17:14.247273 137005397919552 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 16:17:14.247365 137005397919552 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c9a567333b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 16:17:14.247416 137005397919552 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 16:17:14.247466 137005397919552 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c9a567333b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 16:17:14.247526 137005397919552 checkpoint_manager.py:702] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073b90>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073410>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84073380>}, handler_registry=None I0420 
16:17:14.247745 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073b90>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 16:17:14.247801 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073410>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 16:17:14.247845 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84073380>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 16:17:14.247976 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7ba41adee0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0420 16:17:14.248029 137005397919552 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073b90>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073b90>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073410>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073410>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84073380>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84073380>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7ba41adee0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7ba41adee0>}). 
I0420 16:17:14.248532 137005397919552 async_checkpointer.py:177] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c7b642222a0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 16:17:16.720169 137005397919552 checkpoint_manager.py:558] Created directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 16:17:17.138966 137005397919552 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 16:17:17.153125 137005397919552 checkpoint_manager.py:921] [process=0][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), 
root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c7ba41a03b0> I0420 16:17:17.153248 137005397919552 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 16:17:17.153314 137005397919552 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c9a567333b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 16:17:17.153350 137005397919552 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0420 16:17:17.153380 137005397919552 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c9a567333b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0420 16:17:17.153416 137005397919552 checkpoint_manager.py:1983] [process=0][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0420 16:17:17.153467 137005397919552 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137005397919552 count=1 at 0x7c7ba4163fc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c7b840731d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c7b840731a0>, _write_futures=[]) I0420 16:17:17.153791 137005397919552 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137005397919552 count=1 at 0x7c7ba4163fc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c7b840731d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c7b840731a0>, _write_futures=[]) I0420 16:17:17.153818 137005397919552 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137005397919552 count=1 at 0x7c7ba4163fc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c7b840731d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c7b840731a0>, _write_futures=[]) I0420 16:17:17.153849 137005397919552 checkpoint_manager.py:702] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073350>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84071310>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84071100>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c7b84072810>}, handler_registry=None I0420 16:17:17.153943 137005397919552 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073350>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 16:17:17.153976 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84071310>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0420 16:17:17.153999 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84071100>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 16:17:17.154026 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c7b84072810>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0420 16:17:17.154050 137005397919552 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84070da0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0420 16:17:17.154075 137005397919552 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073350>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84073350>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84071310>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c7b84071310>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84071100>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84071100>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c7b84072810>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c7b84072810>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84070da0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c7b84070da0>}). I0420 16:17:17.154163 137005397919552 async_checkpointer.py:177] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c7b642223e0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0420 16:17:17.979766 137005397919552 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints I0420 16:17:17.981882 137005397919552 checkpoint_manager.py:921] [process=0][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260420_153552/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260420_153552_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c81582c8440> I0420 16:17:17.982044 137005397919552 train_distill.py:687] Starting Distillation Training... I0420 16:17:17.982147 137005397919552 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0420 16:17:18.839467 137005397919552 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0420 16:17:18.841365 136861970491136 grain_pool.py:367] Grain pool will use 1 processes. I0420 16:17:18.867982 136861970491136 grain_pool.py:440] Grain pool will start child processes. I0420 16:17:18.873149 136861970491136 grain_pool.py:448] Grain pool started all child processes. 
2026-04-20 16:17:24.900369: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 16:17:28.164154 137005397919552 utils.py:86] Train loop finished in: 9.3240 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 757, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 689, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0)}
I0420 16:17:28.514305 136861970491136 grain_pool.py:542] Grain pool is exiting.
I0420 16:17:28.514406 136861970491136 grain_pool.py:547] Shutting down multiprocessing system.
I0420 16:17:29.943007 136861970491136 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Exception ignored in: <function GCSRecordWriter.__del__ at 0x7c9a530e5260>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 134, in __del__
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 158, in close
  File "/usr/local/lib/python3.12/site-packages/tensorboardX/record_writer.py", line 149, in flush
  File "/usr/local/lib/python3.12/copy.py", line 87, in copy
ImportError: sys.meta_path is None, Python is likely shutting down
XPK End: Mon Apr 20 16:17:40 UTC 2026
EXIT_CODE=1