MaxView

Log Summary

XPK Start: Mon Apr 20 08:23:46 UTC 2026
2026-04-20 08:24:03.456566: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 08:24:07.025277 134595236116288 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-20 08:24:16,065:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0420 08:24:16.065251 134595236116288 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-20 08:24:16,069:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-2dal1-slice-job-0-0.mt-07-distill-smoke-2dal1:8482
I0420 08:24:16.069852 134595236116288 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-2dal1-slice-job-0-0.mt-07-distill-smoke-2dal1:8482
I0420 08:24:17.543618 134595236116288 max_utils.py:284] Jax distributed system initialized!
I0420 08:24:23.722696 134595236116288 max_utils.py:244] Jax distributed system is already initialized.
I0420 08:24:24.192549 134595236116288 max_utils.py:244] Jax distributed system is already initialized.
I0420 08:24:24.193754 134595236116288 pyconfig.py:432] Config param abort_on_inf_loss: True
I0420 08:24:24.193802 134595236116288 pyconfig.py:432] Config param abort_on_nan_loss: True
I0420 08:24:24.193830 134595236116288 pyconfig.py:432] Config param act_quantization_calibration_method: absmax
I0420 08:24:24.193851 134595236116288 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0
I0420 08:24:24.193870 134595236116288 pyconfig.py:432] Config param activation_function_for_audio: gelu
I0420 08:24:24.193888 134595236116288 pyconfig.py:432] Config param activations_in_float32: False
I0420 08:24:24.193907 134595236116288 pyconfig.py:432] Config param adam_b1: 0.9
I0420 08:24:24.193926 134595236116288 pyconfig.py:432] Config param adam_b2: 0.95
I0420 08:24:24.193943 134595236116288 pyconfig.py:432] Config param adam_eps: 1e-08
I0420 08:24:24.193965 134595236116288 pyconfig.py:432] Config param adam_eps_root: 0.0
I0420 08:24:24.193981 134595236116288 pyconfig.py:432] Config param adam_weight_decay: 0.1
I0420 08:24:24.193998 134595236116288 pyconfig.py:432] Config param adamw_mask: []
I0420 08:24:24.194013 134595236116288 pyconfig.py:432] Config param add_bos: True
I0420 08:24:24.194030 134595236116288 pyconfig.py:432] Config param add_eos: True
I0420 08:24:24.194045 134595236116288 pyconfig.py:432] Config param allow_split_physical_axes: False
I0420 08:24:24.194061 134595236116288 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3
I0420 08:24:24.194077 134595236116288 pyconfig.py:432] Config param async_checkpointing: True
I0420 08:24:24.194092 134595236116288 pyconfig.py:432] Config param async_scheduling: False
I0420 08:24:24.194108 134595236116288 pyconfig.py:432] Config param attention: dot_product
I0420 08:24:24.194124 134595236116288 pyconfig.py:432] Config param attention_bias: False
I0420 08:24:24.194139 134595236116288 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0
I0420 08:24:24.194158 134595236116288 pyconfig.py:432] Config param attention_out: RematLocation.REMAT
I0420 08:24:24.194178 134595236116288 pyconfig.py:432] Config param attention_output_dim: -1
I0420 08:24:24.194193 134595236116288 pyconfig.py:432] Config param attention_sink: False
I0420 08:24:24.194210 134595236116288 pyconfig.py:432] Config param attention_type: global
I0420 08:24:24.194226 134595236116288 pyconfig.py:432] Config param attn_logits_soft_cap: None
I0420 08:24:24.194241 134595236116288 pyconfig.py:432] Config param audio_path: 
I0420 08:24:24.194257 134595236116288 pyconfig.py:432] Config param audio_placeholder: <|audio|>
I0420 08:24:24.194273 134595236116288 pyconfig.py:432] Config param autoregressive_decode_assert: 
I0420 08:24:24.194288 134595236116288 pyconfig.py:432] Config param base_config: base.yml
I0420 08:24:24.194303 134595236116288 pyconfig.py:432] Config param base_emb_dim: 16
I0420 08:24:24.194318 134595236116288 pyconfig.py:432] Config param base_mlp_dim: 64
I0420 08:24:24.194334 134595236116288 pyconfig.py:432] Config param base_moe_mlp_dim: -1
I0420 08:24:24.194349 134595236116288 pyconfig.py:432] Config param base_num_decoder_layers: 1
I0420 08:24:24.194365 134595236116288 pyconfig.py:432] Config param base_num_kv_heads: 2
I0420 08:24:24.194380 134595236116288 pyconfig.py:432] Config param base_num_query_heads: 2
I0420 08:24:24.194396 134595236116288 pyconfig.py:432] Config param base_output_directory: 
I0420 08:24:24.194410 134595236116288 pyconfig.py:432] Config param batch_size: 1
I0420 08:24:24.194426 134595236116288 pyconfig.py:432] Config param batch_split_factor: 1
I0420 08:24:24.194441 134595236116288 pyconfig.py:432] Config param beta_fast: 32
I0420 08:24:24.194457 134595236116288 pyconfig.py:432] Config param beta_slow: 1
I0420 08:24:24.194471 134595236116288 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax
I0420 08:24:24.194487 134595236116288 pyconfig.py:432] Config param capacity_factor: -1.0
I0420 08:24:24.194504 134595236116288 pyconfig.py:432] Config param cast_logits_to_fp32: True
I0420 08:24:24.194519 134595236116288 pyconfig.py:432] Config param chat_template: 
I0420 08:24:24.194533 134595236116288 pyconfig.py:432] Config param chat_template_path: 
I0420 08:24:24.194550 134595236116288 pyconfig.py:432] Config param checkpoint_conversion_fn: None
I0420 08:24:24.194565 134595236116288 pyconfig.py:432] Config param checkpoint_dir: None
I0420 08:24:24.194582 134595236116288 pyconfig.py:432] Config param checkpoint_is_quantized: False
I0420 08:24:24.194600 134595236116288 pyconfig.py:432] Config param checkpoint_period: 2000
I0420 08:24:24.194616 134595236116288 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96
I0420 08:24:24.194632 134595236116288 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0420 08:24:24.194648 134595236116288 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True
I0420 08:24:24.194663 134595236116288 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True
I0420 08:24:24.194678 134595236116288 pyconfig.py:432] Config param checkpoint_todelete_full_path: None
I0420 08:24:24.194694 134595236116288 pyconfig.py:432] Config param checkpoint_todelete_subdir: None
I0420 08:24:24.194723 134595236116288 pyconfig.py:432] Config param chips_per_vm: 4
I0420 08:24:24.194743 134595236116288 pyconfig.py:432] Config param chunk_attn_window_size: 0
I0420 08:24:24.194758 134595236116288 pyconfig.py:432] Config param collect_stack_trace: False
I0420 08:24:24.194773 134595236116288 pyconfig.py:432] Config param colocated_python_checkpointing: False
I0420 08:24:24.194787 134595236116288 pyconfig.py:432] Config param colocated_python_data_input: False
I0420 08:24:24.194803 134595236116288 pyconfig.py:432] Config param compile_topology: 
I0420 08:24:24.194817 134595236116288 pyconfig.py:432] Config param compile_topology_num_slices: -1
I0420 08:24:24.194832 134595236116288 pyconfig.py:432] Config param compile_xla_flags: 
I0420 08:24:24.194849 134595236116288 pyconfig.py:432] Config param compiled_trainstep_file: 
I0420 08:24:24.194862 134595236116288 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3
I0420 08:24:24.194878 134595236116288 pyconfig.py:432] Config param constant_bound_config: []
I0420 08:24:24.194894 134595236116288 pyconfig.py:432] Config param context: RematLocation.REMAT
I0420 08:24:24.194910 134595236116288 pyconfig.py:432] Config param context_parallel_load_balance: True
I0420 08:24:24.194925 134595236116288 pyconfig.py:432] Config param context_parallel_size: 1
I0420 08:24:24.194940 134595236116288 pyconfig.py:432] Config param context_parallel_strategy: all_gather
I0420 08:24:24.194955 134595236116288 pyconfig.py:432] Config param context_sharding: context
I0420 08:24:24.194969 134595236116288 pyconfig.py:432] Config param conv_chunksize_for_audio: 500
I0420 08:24:24.194984 134595236116288 pyconfig.py:432] Config param conv_stride_for_vit: 14
I0420 08:24:24.194999 134595236116288 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1
I0420 08:24:24.195013 134595236116288 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1
I0420 08:24:24.195028 134595236116288 pyconfig.py:432] Config param custom_mesh: 
I0420 08:24:24.195043 134595236116288 pyconfig.py:432] Config param custom_mesh_and_rule: 
I0420 08:24:24.195058 134595236116288 pyconfig.py:432] Config param d_model_for_audio: 256
I0420 08:24:24.195072 134595236116288 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0420 08:24:24.195092 134595236116288 pyconfig.py:432] Config param data_shuffle_seed: 0
I0420 08:24:24.195107 134595236116288 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1
I0420 08:24:24.195122 134595236116288 pyconfig.py:432] Config param dataset_path: 
I0420 08:24:24.195137 134595236116288 pyconfig.py:432] Config param dataset_type: DatasetType.HF
I0420 08:24:24.195153 134595236116288 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1
I0420 08:24:24.195167 134595236116288 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1
I0420 08:24:24.195183 134595236116288 pyconfig.py:432] Config param dcn_context_parallelism: 1
I0420 08:24:24.195198 134595236116288 pyconfig.py:432] Config param dcn_data_parallelism: -1
I0420 08:24:24.195213 134595236116288 pyconfig.py:432] Config param dcn_diloco_parallelism: 1
I0420 08:24:24.195228 134595236116288 pyconfig.py:432] Config param dcn_expert_parallelism: 1
I0420 08:24:24.195243 134595236116288 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1
I0420 08:24:24.195258 134595236116288 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1
I0420 08:24:24.195273 134595236116288 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0420 08:24:24.195289 134595236116288 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1
I0420 08:24:24.195304 134595236116288 pyconfig.py:432] Config param dcn_sequence_parallelism: 1
I0420 08:24:24.195340 134595236116288 pyconfig.py:432] Config param dcn_tensor_parallelism: 1
I0420 08:24:24.195354 134595236116288 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1
I0420 08:24:24.195370 134595236116288 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1
I0420 08:24:24.195385 134595236116288 pyconfig.py:432] Config param debug: {'rl': False}
I0420 08:24:24.195400 134595236116288 pyconfig.py:432] Config param debug_sharding: False
I0420 08:24:24.195416 134595236116288 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1
I0420 08:24:24.195430 134595236116288 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0420 08:24:24.195448 134595236116288 pyconfig.py:432] Config param decode_sampling_temperature: 1.0
I0420 08:24:24.195462 134595236116288 pyconfig.py:432] Config param decode_sampling_top_k: 0
I0420 08:24:24.195478 134595236116288 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3
I0420 08:24:24.195494 134595236116288 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE
I0420 08:24:24.195510 134595236116288 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: []
I0420 08:24:24.195524 134595236116288 pyconfig.py:432] Config param degenerate_group_masking: True
I0420 08:24:24.195540 134595236116288 pyconfig.py:432] Config param dense_init_scale: 1.0
I0420 08:24:24.195556 134595236116288 pyconfig.py:432] Config param diloco_outer_lr: 0.3
I0420 08:24:24.195571 134595236116288 pyconfig.py:432] Config param diloco_outer_momentum: 0.9
I0420 08:24:24.195587 134595236116288 pyconfig.py:432] Config param diloco_sync_period: 36
I0420 08:24:24.195601 134595236116288 pyconfig.py:432] Config param distill_alpha: 0.5
I0420 08:24:24.195617 134595236116288 pyconfig.py:432] Config param distill_beta: 0.0
I0420 08:24:24.195632 134595236116288 pyconfig.py:432] Config param distill_feature_loss_type: cosine
I0420 08:24:24.195648 134595236116288 pyconfig.py:432] Config param distill_layer_indices: None
I0420 08:24:24.195664 134595236116288 pyconfig.py:432] Config param distill_temperature: 1.0
I0420 08:24:24.195679 134595236116288 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256
I0420 08:24:24.195694 134595236116288 pyconfig.py:432] Config param dpo_beta: 0.1
I0420 08:24:24.195719 134595236116288 pyconfig.py:432] Config param dpo_label_smoothing: 0.0
I0420 08:24:24.195738 134595236116288 pyconfig.py:432] Config param dq_reduction_steps: 0
I0420 08:24:24.195752 134595236116288 pyconfig.py:432] Config param dropout_rate: 0.0
I0420 08:24:24.195768 134595236116288 pyconfig.py:432] Config param dtype: bfloat16
I0420 08:24:24.195799 134595236116288 pyconfig.py:432] Config param dtype_mm: float32
I0420 08:24:24.195814 134595236116288 pyconfig.py:432] Config param dump_hlo: False
I0420 08:24:24.195830 134595236116288 pyconfig.py:432] Config param dump_hlo_delete_local_after: True
I0420 08:24:24.195844 134595236116288 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-20-08-24/xla_dump
I0420 08:24:24.195860 134595236116288 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0420 08:24:24.195875 134595236116288 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step
I0420 08:24:24.195890 134595236116288 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step
I0420 08:24:24.195906 134595236116288 pyconfig.py:432] Config param dump_hlo_upload_all: False
I0420 08:24:24.195920 134595236116288 pyconfig.py:432] Config param dump_hlo_xla_flags: 
I0420 08:24:24.195936 134595236116288 pyconfig.py:432] Config param dump_jaxpr: False
I0420 08:24:24.195950 134595236116288 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True
I0420 08:24:24.195966 134595236116288 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-20-08-24/jaxpr_dump
I0420 08:24:24.195980 134595236116288 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0420 08:24:24.195996 134595236116288 pyconfig.py:432] Config param dump_step: -1
I0420 08:24:24.196012 134595236116288 pyconfig.py:432] Config param elastic_enabled: False
I0420 08:24:24.196026 134595236116288 pyconfig.py:432] Config param elastic_max_retries: 10
I0420 08:24:24.196042 134595236116288 pyconfig.py:432] Config param elastic_timeout_seconds: 300
I0420 08:24:24.196056 134595236116288 pyconfig.py:432] Config param emb_dim: 16
I0420 08:24:24.196071 134595236116288 pyconfig.py:432] Config param enable_autocheckpoint: False
I0420 08:24:24.196087 134595236116288 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False
I0420 08:24:24.196101 134595236116288 pyconfig.py:432] Config param enable_checkpointing: True
I0420 08:24:24.196116 134595236116288 pyconfig.py:432] Config param enable_continuous_checkpointing: False
I0420 08:24:24.196130 134595236116288 pyconfig.py:432] Config param enable_data_shuffling: True
I0420 08:24:24.196146 134595236116288 pyconfig.py:432] Config param enable_diloco: False
I0420 08:24:24.196161 134595236116288 pyconfig.py:432] Config param enable_dp_attention: False
I0420 08:24:24.196175 134595236116288 pyconfig.py:432] Config param enable_dropout: False
I0420 08:24:24.196191 134595236116288 pyconfig.py:432] Config param enable_emergency_checkpoint: False
I0420 08:24:24.196205 134595236116288 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True
I0420 08:24:24.196221 134595236116288 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True
I0420 08:24:24.196235 134595236116288 pyconfig.py:432] Config param enable_goodput_recording: False
I0420 08:24:24.196251 134595236116288 pyconfig.py:432] Config param enable_jax_profiler: False
I0420 08:24:24.196265 134595236116288 pyconfig.py:432] Config param enable_llm_inference_pool: False
I0420 08:24:24.196280 134595236116288 pyconfig.py:432] Config param enable_model_warmup: False
I0420 08:24:24.196295 134595236116288 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False
I0420 08:24:24.196310 134595236116288 pyconfig.py:432] Config param enable_nnx: False
I0420 08:24:24.196325 134595236116288 pyconfig.py:432] Config param enable_orbax_v1: False
I0420 08:24:24.196339 134595236116288 pyconfig.py:432] Config param enable_padding_causal_mask: True
I0420 08:24:24.196354 134595236116288 pyconfig.py:432] Config param enable_pathways_goodput: False
I0420 08:24:24.196369 134595236116288 pyconfig.py:432] Config param enable_prefix_caching: False
I0420 08:24:24.196385 134595236116288 pyconfig.py:432] Config param enable_rampup_batch_size: False
I0420 08:24:24.196400 134595236116288 pyconfig.py:432] Config param enable_single_controller: False
I0420 08:24:24.196414 134595236116288 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False
I0420 08:24:24.196429 134595236116288 pyconfig.py:432] Config param enable_tensorboard: True
I0420 08:24:24.196444 134595236116288 pyconfig.py:432] Config param enable_tunix_perf_metrics: False
I0420 08:24:24.196460 134595236116288 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4
I0420 08:24:24.196476 134595236116288 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512
I0420 08:24:24.196491 134595236116288 pyconfig.py:432] Config param encoder_layers_for_audio: 2
I0420 08:24:24.196506 134595236116288 pyconfig.py:432] Config param engram: RematLocation.REMAT
I0420 08:24:24.196522 134595236116288 pyconfig.py:432] Config param engram_head_dim: 1280
I0420 08:24:24.196536 134595236116288 pyconfig.py:432] Config param engram_kernel_size: 4
I0420 08:24:24.196551 134595236116288 pyconfig.py:432] Config param engram_layers: []
I0420 08:24:24.196567 134595236116288 pyconfig.py:432] Config param engram_max_ngram_size: 3
I0420 08:24:24.196581 134595236116288 pyconfig.py:432] Config param engram_num_heads: 8
I0420 08:24:24.196597 134595236116288 pyconfig.py:432] Config param engram_seed: 0
I0420 08:24:24.196612 134595236116288 pyconfig.py:432] Config param engram_vocab_bases: []
I0420 08:24:24.196626 134595236116288 pyconfig.py:432] Config param epsilon_high: None
I0420 08:24:24.196642 134595236116288 pyconfig.py:432] Config param eval_corr_lst: False
I0420 08:24:24.196656 134595236116288 pyconfig.py:432] Config param eval_data_columns: ['text']
I0420 08:24:24.196670 134595236116288 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1
I0420 08:24:24.196686 134595236116288 pyconfig.py:432] Config param eval_image_column: image
I0420 08:24:24.196702 134595236116288 pyconfig.py:432] Config param eval_interval: -1
I0420 08:24:24.196729 134595236116288 pyconfig.py:432] Config param eval_make_lst: False
I0420 08:24:24.196748 134595236116288 pyconfig.py:432] Config param eval_per_device_batch_size: 2
I0420 08:24:24.196763 134595236116288 pyconfig.py:432] Config param eval_sampling_strategy: greedy
I0420 08:24:24.196777 134595236116288 pyconfig.py:432] Config param eval_split: validation
I0420 08:24:24.196793 134595236116288 pyconfig.py:432] Config param eval_steps: -1
I0420 08:24:24.196808 134595236116288 pyconfig.py:432] Config param expansion_factor_real_data: -1.0
I0420 08:24:24.196825 134595236116288 pyconfig.py:432] Config param final_logits_soft_cap: None
I0420 08:24:24.196840 134595236116288 pyconfig.py:432] Config param first_num_dense_layers: 0
I0420 08:24:24.196855 134595236116288 pyconfig.py:432] Config param float32_gate_logits: False
I0420 08:24:24.196869 134595236116288 pyconfig.py:432] Config param float32_logits: False
I0420 08:24:24.196884 134595236116288 pyconfig.py:432] Config param float32_qk_product: False
I0420 08:24:24.196898 134595236116288 pyconfig.py:432] Config param float32_weight_sum: True
I0420 08:24:24.196913 134595236116288 pyconfig.py:432] Config param force_q_layout: False
I0420 08:24:24.196929 134595236116288 pyconfig.py:432] Config param force_unroll: False
I0420 08:24:24.196945 134595236116288 pyconfig.py:432] Config param freeze_audio_encoder_params: True
I0420 08:24:24.196960 134595236116288 pyconfig.py:432] Config param freeze_vision_encoder_params: True
I0420 08:24:24.196975 134595236116288 pyconfig.py:432] Config param fused_mlp: False
I0420 08:24:24.196989 134595236116288 pyconfig.py:432] Config param fused_qkv: True
I0420 08:24:24.197005 134595236116288 pyconfig.py:432] Config param gcs_metrics: False
I0420 08:24:24.197019 134595236116288 pyconfig.py:432] Config param gdn_chunk_size: 64
I0420 08:24:24.197035 134595236116288 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4
I0420 08:24:24.197050 134595236116288 pyconfig.py:432] Config param gdn_key_head_dim: 128
I0420 08:24:24.197065 134595236116288 pyconfig.py:432] Config param gdn_num_key_heads: 16
I0420 08:24:24.197079 134595236116288 pyconfig.py:432] Config param gdn_num_value_heads: 32
I0420 08:24:24.197094 134595236116288 pyconfig.py:432] Config param gdn_value_head_dim: 128
I0420 08:24:24.197109 134595236116288 pyconfig.py:432] Config param generate_padding_batch_eval: False
I0420 08:24:24.197124 134595236116288 pyconfig.py:432] Config param generate_padding_batch_train: False
I0420 08:24:24.197139 134595236116288 pyconfig.py:432] Config param generate_slice: v5e-16
I0420 08:24:24.197153 134595236116288 pyconfig.py:432] Config param generation_configs: {}
I0420 08:24:24.197169 134595236116288 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64
I0420 08:24:24.197185 134595236116288 pyconfig.py:432] Config param global_batch_size_to_load: 512
I0420 08:24:24.197200 134595236116288 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64
I0420 08:24:24.197216 134595236116288 pyconfig.py:432] Config param global_batch_size_to_load_increment: None
I0420 08:24:24.197230 134595236116288 pyconfig.py:432] Config param global_batch_size_to_load_start: None
I0420 08:24:24.197244 134595236116288 pyconfig.py:432] Config param global_batch_size_to_train_on: 512
I0420 08:24:24.197260 134595236116288 pyconfig.py:432] Config param global_head_dim: 0
I0420 08:24:24.197274 134595236116288 pyconfig.py:432] Config param global_num_kv_heads: 0
I0420 08:24:24.197289 134595236116288 pyconfig.py:432] Config param global_parameter_scale: 1
I0420 08:24:24.197304 134595236116288 pyconfig.py:432] Config param global_rampup_samples: 500
I0420 08:24:24.197319 134595236116288 pyconfig.py:432] Config param global_rope_max_timescale: -1
I0420 08:24:24.197334 134595236116288 pyconfig.py:432] Config param global_rope_proportion: 0.25
I0420 08:24:24.197350 134595236116288 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30
I0420 08:24:24.197363 134595236116288 pyconfig.py:432] Config param grad_dtype: float32
I0420 08:24:24.197398 134595236116288 pyconfig.py:432] Config param gradient_accumulation_steps: 8
I0420 08:24:24.197413 134595236116288 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0
I0420 08:24:24.197427 134595236116288 pyconfig.py:432] Config param grain_data_source_max_workers: 16
I0420 08:24:24.197443 134595236116288 pyconfig.py:432] Config param grain_eval_files: 
I0420 08:24:24.197458 134595236116288 pyconfig.py:432] Config param grain_file_type: arrayrecord
I0420 08:24:24.197473 134595236116288 pyconfig.py:432] Config param grain_num_threads: 16
I0420 08:24:24.197489 134595236116288 pyconfig.py:432] Config param grain_num_threads_eval: 16
I0420 08:24:24.197503 134595236116288 pyconfig.py:432] Config param grain_packing_type: first_fit
I0420 08:24:24.197519 134595236116288 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1
I0420 08:24:24.197533 134595236116288 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1
I0420 08:24:24.197548 134595236116288 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500
I0420 08:24:24.197564 134595236116288 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500
I0420 08:24:24.197578 134595236116288 pyconfig.py:432] Config param grain_ram_budget_mb: 1024
I0420 08:24:24.197593 134595236116288 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100
I0420 08:24:24.197609 134595236116288 pyconfig.py:432] Config param grain_train_files: 
I0420 08:24:24.197623 134595236116288 pyconfig.py:432] Config param grain_train_mixture_config_path: 
I0420 08:24:24.197639 134595236116288 pyconfig.py:432] Config param grain_worker_count: 1
I0420 08:24:24.197653 134595236116288 pyconfig.py:432] Config param grain_worker_count_eval: 1
I0420 08:24:24.197669 134595236116288 pyconfig.py:432] Config param grpo_beta: 0.08
I0420 08:24:24.197684 134595236116288 pyconfig.py:432] Config param grpo_epsilon: 0.2
I0420 08:24:24.197700 134595236116288 pyconfig.py:432] Config param hardware: tpu
I0420 08:24:24.197727 134595236116288 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72
I0420 08:24:24.197745 134595236116288 pyconfig.py:432] Config param head_dim: 8
I0420 08:24:24.197761 134595236116288 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5
I0420 08:24:24.197775 134595236116288 pyconfig.py:432] Config param hf_data_dir: None
I0420 08:24:24.197790 134595236116288 pyconfig.py:432] Config param hf_eval_files: None
I0420 08:24:24.197805 134595236116288 pyconfig.py:432] Config param hf_eval_split: None
I0420 08:24:24.197819 134595236116288 pyconfig.py:432] Config param hf_name: None
I0420 08:24:24.197834 134595236116288 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix
I0420 08:24:24.197849 134595236116288 pyconfig.py:432] Config param hf_train_files: None
I0420 08:24:24.197865 134595236116288 pyconfig.py:432] Config param hidden_size_for_vit: 1408
I0420 08:24:24.197881 134595236116288 pyconfig.py:432] Config param hide_profiler_step_metric: False
I0420 08:24:24.197896 134595236116288 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1
I0420 08:24:24.197911 134595236116288 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1
I0420 08:24:24.197925 134595236116288 pyconfig.py:432] Config param ici_context_parallelism: 1
I0420 08:24:24.197941 134595236116288 pyconfig.py:432] Config param ici_data_parallelism: 1
I0420 08:24:24.197956 134595236116288 pyconfig.py:432] Config param ici_diloco_parallelism: 1
I0420 08:24:24.197972 134595236116288 pyconfig.py:432] Config param ici_expert_parallelism: 1
I0420 08:24:24.197986 134595236116288 pyconfig.py:432] Config param ici_fsdp_parallelism: -1
I0420 08:24:24.198001 134595236116288 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1
I0420 08:24:24.198015 134595236116288 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0420 08:24:24.198032 134595236116288 pyconfig.py:432] Config param ici_pipeline_parallelism: 1
I0420 08:24:24.198046 134595236116288 pyconfig.py:432] Config param ici_sequence_parallelism: 1
I0420 08:24:24.198062 134595236116288 pyconfig.py:432] Config param ici_tensor_parallelism: 1
I0420 08:24:24.198077 134595236116288 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1
I0420 08:24:24.198092 134595236116288 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1
I0420 08:24:24.198106 134595236116288 pyconfig.py:432] Config param image_path: 
I0420 08:24:24.198122 134595236116288 pyconfig.py:432] Config param image_placeholder: <|image|>
I0420 08:24:24.198136 134595236116288 pyconfig.py:432] Config param image_size_for_vit: 896
I0420 08:24:24.198152 134595236116288 pyconfig.py:432] Config param indexer_head_dim: 128
I0420 08:24:24.198166 134595236116288 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0
I0420 08:24:24.198181 134595236116288 pyconfig.py:432] Config param indexer_n_heads: 64
I0420 08:24:24.198197 134595236116288 pyconfig.py:432] Config param indexer_sparse_training: False
I0420 08:24:24.198213 134595236116288 pyconfig.py:432] Config param indexer_topk: 2048
I0420 08:24:24.198228 134595236116288 pyconfig.py:432] Config param inference_benchmark_test: False
I0420 08:24:24.198242 134595236116288 pyconfig.py:432] Config param inference_metadata_file: 
I0420 08:24:24.198257 134595236116288 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: 
I0420 08:24:24.198271 134595236116288 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10
I0420 08:24:24.198287 134595236116288 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0420 08:24:24.198302 134595236116288 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0420 08:24:24.198317 134595236116288 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate
I0420 08:24:24.198333 134595236116288 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer
I0420 08:24:24.198347 134595236116288 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1
I0420 08:24:24.198362 134595236116288 pyconfig.py:432] Config param init_weights_seed: 0
I0420 08:24:24.198376 134595236116288 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0420 08:24:24.198392 134595236116288 pyconfig.py:432] Config param interleave_moe_layer_step: 1
I0420 08:24:24.198407 134595236116288 pyconfig.py:432] Config param intermediate_size_for_vit: 5632
I0420 08:24:24.198421 134595236116288 pyconfig.py:432] Config param internal_compile: False
I0420 08:24:24.198436 134595236116288 pyconfig.py:432] Config param internal_compile_num_devices: -1
I0420 08:24:24.198452 134595236116288 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache
I0420 08:24:24.198466 134595236116288 pyconfig.py:432] Config param jax_debug_log_modules: 
I0420 08:24:24.198481 134595236116288 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300
I0420 08:24:24.198495 134595236116288 pyconfig.py:432] Config param jax_profiler_port: 9999
I0420 08:24:24.198509 134595236116288 pyconfig.py:432] Config param key_proj: RematLocation.REMAT
I0420 08:24:24.198525 134595236116288 pyconfig.py:432] Config param kv_cache_buffer: 256
I0420 08:24:24.198541 134595236116288 pyconfig.py:432] Config param kv_lora_rank: 512
I0420 08:24:24.198555 134595236116288 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0420 08:24:24.198573 134595236116288 pyconfig.py:432] Config param kv_quant_dtype: int8
I0420 08:24:24.198587 134595236116288 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT
I0420 08:24:24.198603 134595236116288 pyconfig.py:432] Config param learning_rate: 0.0002
I0420 08:24:24.198619 134595236116288 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1
I0420 08:24:24.198633 134595236116288 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000
I0420 08:24:24.198650 134595236116288 pyconfig.py:432] Config param load_balance_loss_weight: 0.0
I0420 08:24:24.198665 134595236116288 pyconfig.py:432] Config param load_checkpoint_only_once: False
I0420 08:24:24.198680 134595236116288 pyconfig.py:432] Config param load_from_prefill_dir: False
I0420 08:24:24.198695 134595236116288 pyconfig.py:432] Config param load_full_state_path: 
I0420 08:24:24.198724 134595236116288 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0420 08:24:24.198744 134595236116288 pyconfig.py:432] Config param local_checkpoint_directory: 
I0420 08:24:24.198760 134595236116288 pyconfig.py:432] Config param local_checkpoint_period: 0
I0420 08:24:24.198775 134595236116288 pyconfig.py:432] Config param local_rope_max_timescale: -1
I0420 08:24:24.198790 134595236116288 pyconfig.py:432] Config param local_rope_proportion: 1.0
I0420 08:24:24.198805 134595236116288 pyconfig.py:432] Config param log_config: True
I0420 08:24:24.198819 134595236116288 pyconfig.py:432] Config param log_period: 10
I0420 08:24:24.198833 134595236116288 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0420 08:24:24.198906 134595236116288 pyconfig.py:432] Config param logits_dot_in_fp32: False
I0420 08:24:24.198923 134595236116288 pyconfig.py:432] Config param logits_via_embedding: True
I0420 08:24:24.198939 134595236116288 pyconfig.py:432] Config param lora_input_adapters_path: 
I0420 08:24:24.198954 134595236116288 pyconfig.py:432] Config param loss_algo: grpo
I0420 08:24:24.198970 134595236116288 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0420 08:24:24.198987 134595236116288 pyconfig.py:432] Config param managed_mldiagnostics: False
I0420 08:24:24.199003 134595236116288 pyconfig.py:432] Config param managed_mldiagnostics_dir: None
I0420 08:24:24.199017 134595236116288 pyconfig.py:432] Config param managed_mldiagnostics_run_group: 
I0420 08:24:24.199032 134595236116288 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT
I0420 08:24:24.199048 134595236116288 pyconfig.py:432] Config param max_checkify: False
I0420 08:24:24.199064 134595236116288 pyconfig.py:432] Config param max_concurrency: 256
I0420 08:24:24.199078 134595236116288 pyconfig.py:432] Config param max_corpus_chars: 10000000
I0420 08:24:24.199094 134595236116288 pyconfig.py:432] Config param max_num_batched_tokens: None
I0420 08:24:24.199108 134595236116288 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None
I0420 08:24:24.199123 134595236116288 pyconfig.py:432] Config param max_num_images_per_example: -1
I0420 08:24:24.199139 134595236116288 pyconfig.py:432] Config param max_num_seqs: None
I0420 08:24:24.199152 134595236116288 pyconfig.py:432] Config param max_position_embeddings: 163840
I0420 08:24:24.199167 134595236116288 pyconfig.py:432] Config param max_prefill_predict_length: 64
I0420 08:24:24.199182 134595236116288 pyconfig.py:432] Config param max_sample_len_for_audio: 10000
I0420 08:24:24.199196 134595236116288 pyconfig.py:432] Config param max_segments_per_seq: -1
I0420 08:24:24.199212 134595236116288 pyconfig.py:432] Config param max_source_positions_for_audio: 1500
I0420 08:24:24.199227 134595236116288 pyconfig.py:432] Config param max_target_length: 2048
I0420 08:24:24.199241 134595236116288 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0
I0420 08:24:24.199257 134595236116288 pyconfig.py:432] Config param megablox: True
I0420 08:24:24.199273 134595236116288 pyconfig.py:432] Config param merge_gating_gmm: False
I0420 08:24:24.199288 134595236116288 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0420 08:24:24.199306 134595236116288 pyconfig.py:432] Config param metrics_dir: None
I0420 08:24:24.199321 134595236116288 pyconfig.py:432] Config param metrics_file: 
I0420 08:24:24.199336 134595236116288 pyconfig.py:432] Config param mhc_expansion_rate: 1
I0420 08:24:24.199351 134595236116288 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64
I0420 08:24:24.199365 134595236116288 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64
I0420 08:24:24.199380 134595236116288 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT
I0420 08:24:24.199395 134595236116288 pyconfig.py:432] Config param mla_naive_kvcache: True
I0420 08:24:24.199410 134595236116288 pyconfig.py:432] Config param mla_q: RematLocation.REMAT
I0420 08:24:24.199426 134595236116288 pyconfig.py:432] Config param mlp_activations: ['gelu']
I0420 08:24:24.199441 134595236116288 pyconfig.py:432] Config param mlp_activations_limit: -1.0
I0420 08:24:24.199456 134595236116288 pyconfig.py:432] Config param mlp_bias: False
I0420 08:24:24.199470 134595236116288 pyconfig.py:432] Config param mlp_dim: 64
I0420 08:24:24.199486 134595236116288 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT
I0420 08:24:24.199500 134595236116288 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT
I0420 08:24:24.199516 134595236116288 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT
I0420 08:24:24.199531 134595236116288 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT
I0420 08:24:24.199546 134595236116288 pyconfig.py:432] Config param moba: False
I0420 08:24:24.199562 134595236116288 pyconfig.py:432] Config param moba_chunk_size: 1024
I0420 08:24:24.199576 134595236116288 pyconfig.py:432] Config param moba_topk: 8
I0420 08:24:24.199591 134595236116288 pyconfig.py:432] Config param model_call_mode: 
I0420 08:24:24.199606 134595236116288 pyconfig.py:432] Config param model_name: gpt3-52k
I0420 08:24:24.199622 134595236116288 pyconfig.py:432] Config param moe_expert_input_dim: -1
I0420 08:24:24.199636 134595236116288 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False
I0420 08:24:24.199652 134595236116288 pyconfig.py:432] Config param moe_mlp_dim: -1
I0420 08:24:24.199667 134595236116288 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT
I0420 08:24:24.199681 134595236116288 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT
I0420 08:24:24.199697 134595236116288 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT
I0420 08:24:24.199723 134595236116288 pyconfig.py:432] Config param monitor_goodput: False
I0420 08:24:24.199741 134595236116288 pyconfig.py:432] Config param monitor_step_time_deviation: True
I0420 08:24:24.199757 134595236116288 pyconfig.py:432] Config param mrope_section: [24, 20, 20]
I0420 08:24:24.199772 134595236116288 pyconfig.py:432] Config param mscale: 1.0
I0420 08:24:24.199787 134595236116288 pyconfig.py:432] Config param mtc_data_parallelism: 0
I0420 08:24:24.199803 134595236116288 pyconfig.py:432] Config param mtp_eval_target_module: 0
I0420 08:24:24.199816 134595236116288 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1
I0420 08:24:24.199832 134595236116288 pyconfig.py:432] Config param mtp_num_layers: 0
I0420 08:24:24.199847 134595236116288 pyconfig.py:432] Config param mu_dtype: float32
I0420 08:24:24.199870 134595236116288 pyconfig.py:432] Config param multi_sampling: False
I0420 08:24:24.199887 134595236116288 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0420 08:24:24.199901 134595236116288 pyconfig.py:432] Config param muon_beta: 0.95
I0420 08:24:24.199916 134595236116288 pyconfig.py:432] Config param muon_consistent_rms: None
I0420 08:24:24.199932 134595236116288 pyconfig.py:432] Config param muon_weight_decay: 0.0
I0420 08:24:24.199946 134595236116288 pyconfig.py:432] Config param n_routing_groups: -1
I0420 08:24:24.199962 134595236116288 pyconfig.py:432] Config param n_window_for_audio: 50
I0420 08:24:24.199977 134595236116288 pyconfig.py:432] Config param n_window_infer_for_audio: 800
I0420 08:24:24.199991 134595236116288 pyconfig.py:432] Config param nope_layer_interval: -1
I0420 08:24:24.200007 134595236116288 pyconfig.py:432] Config param norm_topk_prob: False
I0420 08:24:24.200021 134595236116288 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05
I0420 08:24:24.200038 134595236116288 pyconfig.py:432] Config param normalize_embedding_logits: False
I0420 08:24:24.200053 134595236116288 pyconfig.py:432] Config param num_attention_heads_for_vit: 16
I0420 08:24:24.200067 134595236116288 pyconfig.py:432] Config param num_batches: 4
I0420 08:24:24.200083 134595236116288 pyconfig.py:432] Config param num_channels_for_vit: 3
I0420 08:24:24.200098 134595236116288 pyconfig.py:432] Config param num_conv_layers_for_audio: 3
I0420 08:24:24.200112 134595236116288 pyconfig.py:432] Config param num_decoder_layers: 1
I0420 08:24:24.200127 134595236116288 pyconfig.py:432] Config param num_diloco_replicas: 1
I0420 08:24:24.200142 134595236116288 pyconfig.py:432] Config param num_epoch: 1
I0420 08:24:24.200157 134595236116288 pyconfig.py:432] Config param num_eval_passes: 1
I0420 08:24:24.200173 134595236116288 pyconfig.py:432] Config param num_experts: 1
I0420 08:24:24.200187 134595236116288 pyconfig.py:432] Config param num_experts_per_tok: 1
I0420 08:24:24.200202 134595236116288 pyconfig.py:432] Config param num_generations: 2
I0420 08:24:24.200216 134595236116288 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34
I0420 08:24:24.200232 134595236116288 pyconfig.py:432] Config param num_iterations: 1
I0420 08:24:24.200246 134595236116288 pyconfig.py:432] Config param num_kv_heads: 2
I0420 08:24:24.200261 134595236116288 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1
I0420 08:24:24.200275 134595236116288 pyconfig.py:432] Config param num_mel_bins_for_audio: 128
I0420 08:24:24.200290 134595236116288 pyconfig.py:432] Config param num_pipeline_microbatches: -1
I0420 08:24:24.200306 134595236116288 pyconfig.py:432] Config param num_pipeline_repeats: -1
I0420 08:24:24.200321 134595236116288 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024
I0420 08:24:24.200336 134595236116288 pyconfig.py:432] Config param num_query_heads: 2
I0420 08:24:24.200351 134595236116288 pyconfig.py:432] Config param num_samplers_slices: -1
I0420 08:24:24.200365 134595236116288 pyconfig.py:432] Config param num_slices: 1
I0420 08:24:24.200381 134595236116288 pyconfig.py:432] Config param num_target_devices: 32
I0420 08:24:24.200395 134595236116288 pyconfig.py:432] Config param num_test_batches: 5
I0420 08:24:24.200410 134595236116288 pyconfig.py:432] Config param num_trainer_slices: -1
I0420 08:24:24.200424 134595236116288 pyconfig.py:432] Config param num_vocab_tiling: 1
I0420 08:24:24.200440 134595236116288 pyconfig.py:432] Config param off_policy_steps: 0
I0420 08:24:24.200455 134595236116288 pyconfig.py:432] Config param offline_data_dir: None
I0420 08:24:24.200469 134595236116288 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX
I0420 08:24:24.200487 134595236116288 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False
I0420 08:24:24.200500 134595236116288 pyconfig.py:432] Config param optimizer_memory_host_offload: False
I0420 08:24:24.200516 134595236116288 pyconfig.py:432] Config param original_max_position_embeddings: 4096
I0420 08:24:24.200531 134595236116288 pyconfig.py:432] Config param out_hidden_size_for_vit: 512
I0420 08:24:24.200546 134595236116288 pyconfig.py:432] Config param out_proj: RematLocation.REMAT
I0420 08:24:24.200561 134595236116288 pyconfig.py:432] Config param output_dim_for_audio: 512
I0420 08:24:24.200576 134595236116288 pyconfig.py:432] Config param override_logical_axis_rules: False
I0420 08:24:24.200591 134595236116288 pyconfig.py:432] Config param override_model_config: True
I0420 08:24:24.200605 134595236116288 pyconfig.py:432] Config param packing: True
I0420 08:24:24.200621 134595236116288 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128
I0420 08:24:24.200635 134595236116288 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1
I0420 08:24:24.200650 134595236116288 pyconfig.py:432] Config param pagedattn_num_pages: 64
I0420 08:24:24.200664 134595236116288 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4
I0420 08:24:24.200680 134595236116288 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32
I0420 08:24:24.200694 134595236116288 pyconfig.py:432] Config param param_scan_axis: 1
I0420 08:24:24.200716 134595236116288 pyconfig.py:432] Config param parameter_memory_host_offload: False
I0420 08:24:24.200736 134595236116288 pyconfig.py:432] Config param partial_rotary_factor: 1.0
I0420 08:24:24.200752 134595236116288 pyconfig.py:432] Config param patch_size_for_vit: 14
I0420 08:24:24.200766 134595236116288 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0
I0420 08:24:24.200781 134595236116288 pyconfig.py:432] Config param penalty_incorrect_format: -0.5
I0420 08:24:24.200796 134595236116288 pyconfig.py:432] Config param per_device_batch_size: 2
I0420 08:24:24.200812 134595236116288 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0
I0420 08:24:24.200826 134595236116288 pyconfig.py:432] Config param per_device_batch_size_start: 4.0
I0420 08:24:24.200842 134595236116288 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False
I0420 08:24:24.200856 134595236116288 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False
I0420 08:24:24.200872 134595236116288 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False
I0420 08:24:24.200886 134595236116288 pyconfig.py:432] Config param pipeline_parallel_layers: 1
I0420 08:24:24.200901 134595236116288 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5
I0420 08:24:24.200916 134595236116288 pyconfig.py:432] Config param posemb_type_for_vit: learn
I0420 08:24:24.200931 134595236116288 pyconfig.py:432] Config param position_id_per_seconds: 25
I0420 08:24:24.200947 134595236116288 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3
I0420 08:24:24.200963 134595236116288 pyconfig.py:432] Config param prefill_cache_dir: 
I0420 08:24:24.200978 134595236116288 pyconfig.py:432] Config param prefill_chunk_size: 256
I0420 08:24:24.200993 134595236116288 pyconfig.py:432] Config param prefill_slice: v5e-16
I0420 08:24:24.201008 134595236116288 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000
I0420 08:24:24.201023 134595236116288 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000
I0420 08:24:24.201038 134595236116288 pyconfig.py:432] Config param profile_cleanly: True
I0420 08:24:24.201053 134595236116288 pyconfig.py:432] Config param profile_periodically_period: -1
I0420 08:24:24.201069 134595236116288 pyconfig.py:432] Config param profile_power_events: False
I0420 08:24:24.201084 134595236116288 pyconfig.py:432] Config param profiler: ProfilerType.NONE
I0420 08:24:24.201102 134595236116288 pyconfig.py:432] Config param profiler_steps: 5
I0420 08:24:24.201116 134595236116288 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0
I0420 08:24:24.201132 134595236116288 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096
I0420 08:24:24.201146 134595236116288 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096
I0420 08:24:24.201162 134595236116288 pyconfig.py:432] Config param prometheus_port: 0
I0420 08:24:24.201176 134595236116288 pyconfig.py:432] Config param prompt: I love to
I0420 08:24:24.201192 134595236116288 pyconfig.py:432] Config param pure_nnx: False
I0420 08:24:24.201206 134595236116288 pyconfig.py:432] Config param pure_nnx_decoder: False
I0420 08:24:24.201221 134595236116288 pyconfig.py:432] Config param q_lora_rank: 0
I0420 08:24:24.201237 134595236116288 pyconfig.py:432] Config param qk_clip_threshold: 100.0
I0420 08:24:24.201251 134595236116288 pyconfig.py:432] Config param qk_nope_head_dim: 128
I0420 08:24:24.201266 134595236116288 pyconfig.py:432] Config param qk_norm_with_scale: True
I0420 08:24:24.201282 134595236116288 pyconfig.py:432] Config param qk_rope_head_dim: 64
I0420 08:24:24.201295 134595236116288 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT
I0420 08:24:24.201311 134595236116288 pyconfig.py:432] Config param quant_cfg_path: 
I0420 08:24:24.201325 134595236116288 pyconfig.py:432] Config param quantization: QuantizationType.NONE
I0420 08:24:24.201343 134595236116288 pyconfig.py:432] Config param quantization_local_shard_count: 4
I0420 08:24:24.201357 134595236116288 pyconfig.py:432] Config param quantize_kvcache: False
I0420 08:24:24.201372 134595236116288 pyconfig.py:432] Config param query_proj: RematLocation.REMAT
I0420 08:24:24.201386 134595236116288 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT
I0420 08:24:24.201401 134595236116288 pyconfig.py:432] Config param ragged_block_size: 256
I0420 08:24:24.201416 134595236116288 pyconfig.py:432] Config param ragged_buffer_factor: -1.0
I0420 08:24:24.201431 134595236116288 pyconfig.py:432] Config param rampup_end_step: 0
I0420 08:24:24.201446 134595236116288 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None
I0420 08:24:24.201461 134595236116288 pyconfig.py:432] Config param reasoning_end_token: </reasoning>
I0420 08:24:24.201476 134595236116288 pyconfig.py:432] Config param reasoning_start_token: <reasoning>
I0420 08:24:24.201491 134595236116288 pyconfig.py:432] Config param record_internal_nn_metrics: 0
I0420 08:24:24.201506 134595236116288 pyconfig.py:432] Config param remat_policy: full
I0420 08:24:24.201520 134595236116288 pyconfig.py:432] Config param remat_policy_for_vit: minimal
I0420 08:24:24.201535 134595236116288 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True
I0420 08:24:24.201550 134595236116288 pyconfig.py:432] Config param replicate_quant_scale: False
I0420 08:24:24.201565 134595236116288 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0
I0420 08:24:24.201579 134595236116288 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0420 08:24:24.201596 134595236116288 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False
I0420 08:24:24.201611 134595236116288 pyconfig.py:432] Config param reshape_q: False
I0420 08:24:24.201626 134595236116288 pyconfig.py:432] Config param return_log_prob: False
I0420 08:24:24.201641 134595236116288 pyconfig.py:432] Config param reuse_example_batch: 0
I0420 08:24:24.201656 134595236116288 pyconfig.py:432] Config param reward_exact_answer: 5.0
I0420 08:24:24.201671 134595236116288 pyconfig.py:432] Config param reward_exact_format_match: 3.0
I0420 08:24:24.201686 134595236116288 pyconfig.py:432] Config param reward_partial_format_match: 0.5
I0420 08:24:24.201701 134595236116288 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5
I0420 08:24:24.201724 134595236116288 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25
I0420 08:24:24.201743 134595236116288 pyconfig.py:432] Config param reward_white_space_format_match: 1.5
I0420 08:24:24.201757 134595236116288 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0420 08:24:24.201778 134595236116288 pyconfig.py:432] Config param rollout_data_parallelism: -1
I0420 08:24:24.201792 134595236116288 pyconfig.py:432] Config param rollout_expert_parallelism: 1
I0420 08:24:24.201808 134595236116288 pyconfig.py:432] Config param rollout_micro_batch_size: -1
I0420 08:24:24.201822 134595236116288 pyconfig.py:432] Config param rollout_tensor_parallelism: -1
I0420 08:24:24.201838 134595236116288 pyconfig.py:432] Config param rope_attention_scaling: False
I0420 08:24:24.201851 134595236116288 pyconfig.py:432] Config param rope_factor: 40
I0420 08:24:24.201867 134595236116288 pyconfig.py:432] Config param rope_interleave: True
I0420 08:24:24.201882 134595236116288 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0
I0420 08:24:24.201897 134595236116288 pyconfig.py:432] Config param rope_max_timescale: 10000
I0420 08:24:24.201912 134595236116288 pyconfig.py:432] Config param rope_min_timescale: 1
I0420 08:24:24.201927 134595236116288 pyconfig.py:432] Config param rope_theta_for_vit: 10000
I0420 08:24:24.201943 134595236116288 pyconfig.py:432] Config param rope_truncate: True
I0420 08:24:24.201957 134595236116288 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT
I0420 08:24:24.201975 134595236116288 pyconfig.py:432] Config param rope_use_scale: True
I0420 08:24:24.201989 134595236116288 pyconfig.py:432] Config param routed_bias: False
I0420 08:24:24.202004 134595236116288 pyconfig.py:432] Config param routed_bias_update_rate: 0.0
I0420 08:24:24.202018 134595236116288 pyconfig.py:432] Config param routed_scaling_factor: 1.0
I0420 08:24:24.202034 134595236116288 pyconfig.py:432] Config param routed_score_func: 
I0420 08:24:24.202048 134595236116288 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-20-08-24
I0420 08:24:24.202063 134595236116288 pyconfig.py:432] Config param sa_block_kv: 512
I0420 08:24:24.202077 134595236116288 pyconfig.py:432] Config param sa_block_kv_compute: 512
I0420 08:24:24.202091 134595236116288 pyconfig.py:432] Config param sa_block_kv_dkv: 512
I0420 08:24:24.202107 134595236116288 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512
I0420 08:24:24.202122 134595236116288 pyconfig.py:432] Config param sa_block_kv_dq: 512
I0420 08:24:24.202136 134595236116288 pyconfig.py:432] Config param sa_block_q: 512
I0420 08:24:24.202152 134595236116288 pyconfig.py:432] Config param sa_block_q_dkv: 512
I0420 08:24:24.202166 134595236116288 pyconfig.py:432] Config param sa_block_q_dq: 512
I0420 08:24:24.202180 134595236116288 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR
I0420 08:24:24.202195 134595236116288 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR
I0420 08:24:24.202210 134595236116288 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False
I0420 08:24:24.202225 134595236116288 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR
I0420 08:24:24.202239 134595236116288 pyconfig.py:432] Config param sampler_devices_fraction: 0.5
I0420 08:24:24.202256 134595236116288 pyconfig.py:432] Config param save_checkpoint_on_completion: True
I0420 08:24:24.202271 134595236116288 pyconfig.py:432] Config param save_config_to_gcs: False
I0420 08:24:24.202285 134595236116288 pyconfig.py:432] Config param save_quantized_params_path: 
I0420 08:24:24.202301 134595236116288 pyconfig.py:432] Config param scale_embedding_for_audio: True
I0420 08:24:24.202315 134595236116288 pyconfig.py:432] Config param scan_layers: True
I0420 08:24:24.202331 134595236116288 pyconfig.py:432] Config param scan_layers_per_stage: False
I0420 08:24:24.202346 134595236116288 pyconfig.py:432] Config param scan_pipeline_iterations: True
I0420 08:24:24.202360 134595236116288 pyconfig.py:432] Config param scan_pipeline_repeats: False
I0420 08:24:24.202375 134595236116288 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False
I0420 08:24:24.202389 134595236116288 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True
I0420 08:24:24.202405 134595236116288 pyconfig.py:432] Config param sft_train_on_completion_only: False
I0420 08:24:24.202418 134595236116288 pyconfig.py:432] Config param shard_exp_on_fsdp: False
I0420 08:24:24.202434 134595236116288 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO
I0420 08:24:24.202450 134595236116288 pyconfig.py:432] Config param shard_optimizer_over_data: False
I0420 08:24:24.202466 134595236116288 pyconfig.py:432] Config param sharding_strategy: None
I0420 08:24:24.202480 134595236116288 pyconfig.py:432] Config param sharding_tolerance: 0.02
I0420 08:24:24.202495 134595236116288 pyconfig.py:432] Config param shardy: True
I0420 08:24:24.202510 134595236116288 pyconfig.py:432] Config param share_kv_projections: False
I0420 08:24:24.202525 134595236116288 pyconfig.py:432] Config param shared_experts: 0
I0420 08:24:24.202540 134595236116288 pyconfig.py:432] Config param sinkhorn_iterations: 20
I0420 08:24:24.202554 134595236116288 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1
I0420 08:24:24.202569 134595236116288 pyconfig.py:432] Config param skip_jax_distributed_system: False
I0420 08:24:24.202584 134595236116288 pyconfig.py:432] Config param skip_step_interval: 128
I0420 08:24:24.202598 134595236116288 pyconfig.py:432] Config param skip_step_on_spikes: False
I0420 08:24:24.202614 134595236116288 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0
I0420 08:24:24.202630 134595236116288 pyconfig.py:432] Config param sliding_window_size: 0
I0420 08:24:24.202644 134595236116288 pyconfig.py:432] Config param solution_end_token: </answer>
I0420 08:24:24.202659 134595236116288 pyconfig.py:432] Config param solution_start_token: <answer>
I0420 08:24:24.202674 134595236116288 pyconfig.py:432] Config param source_checkpoint_layout: orbax
I0420 08:24:24.202688 134595236116288 pyconfig.py:432] Config param sparse_matmul: True
I0420 08:24:24.202712 134595236116288 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2
I0420 08:24:24.202728 134595236116288 pyconfig.py:432] Config param stack_prefill_result_cache: False
I0420 08:24:24.202748 134595236116288 pyconfig.py:432] Config param stack_trace_interval_seconds: 600
I0420 08:24:24.202762 134595236116288 pyconfig.py:432] Config param stack_trace_to_cloud: False
I0420 08:24:24.202778 134595236116288 pyconfig.py:432] Config param step_deviation_interval_seconds: 30
I0420 08:24:24.202792 134595236116288 pyconfig.py:432] Config param steps: 200000
I0420 08:24:24.202807 134595236116288 pyconfig.py:432] Config param stop_strings: None
I0420 08:24:24.202821 134595236116288 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0420 08:24:24.202837 134595236116288 pyconfig.py:432] Config param student_params_to_update: None
I0420 08:24:24.202852 134595236116288 pyconfig.py:432] Config param subslice_shape: 
I0420 08:24:24.202867 134595236116288 pyconfig.py:432] Config param swap_space_vllm_gb: 2
I0420 08:24:24.202881 134595236116288 pyconfig.py:432] Config param system_prompt: 
I0420 08:24:24.202896 134595236116288 pyconfig.py:432] Config param target_eval_loss: 0.0
I0420 08:24:24.202910 134595236116288 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0420 08:24:24.202926 134595236116288 pyconfig.py:432] Config param temperature_tuning: False
I0420 08:24:24.202941 134595236116288 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2
I0420 08:24:24.202956 134595236116288 pyconfig.py:432] Config param tensorboard_dir: None
I0420 08:24:24.202970 134595236116288 pyconfig.py:432] Config param tensors_on_device: None
I0420 08:24:24.202986 134595236116288 pyconfig.py:432] Config param tensors_to_offload: None
I0420 08:24:24.203000 134595236116288 pyconfig.py:432] Config param test_batch_start_index: 0
I0420 08:24:24.203015 134595236116288 pyconfig.py:432] Config param tile_size_for_vit: 336
I0420 08:24:24.203029 134595236116288 pyconfig.py:432] Config param tokenize_eval_data: True
I0420 08:24:24.203043 134595236116288 pyconfig.py:432] Config param tokenize_train_data: True
I0420 08:24:24.203058 134595236116288 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0420 08:24:24.203072 134595236116288 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0420 08:24:24.203089 134595236116288 pyconfig.py:432] Config param topk_routing_group: -1
I0420 08:24:24.203104 134595236116288 pyconfig.py:432] Config param train_data_columns: ['text']
I0420 08:24:24.203118 134595236116288 pyconfig.py:432] Config param train_fraction: 1.0
I0420 08:24:24.203134 134595236116288 pyconfig.py:432] Config param train_image_column: image
I0420 08:24:24.203149 134595236116288 pyconfig.py:432] Config param train_micro_batch_size: -1
I0420 08:24:24.203167 134595236116288 pyconfig.py:432] Config param train_split: train
I0420 08:24:24.203181 134595236116288 pyconfig.py:432] Config param trainable_parameters_mask: []
I0420 08:24:24.203197 134595236116288 pyconfig.py:432] Config param trainable_position_size: 2048
I0420 08:24:24.203211 134595236116288 pyconfig.py:432] Config param trainer_devices_fraction: 0.5
I0420 08:24:24.203227 134595236116288 pyconfig.py:432] Config param upload_all_profiler_results: False
I0420 08:24:24.203241 134595236116288 pyconfig.py:432] Config param use_2d_fsdp_sharding: False
I0420 08:24:24.203255 134595236116288 pyconfig.py:432] Config param use_agentic_rollout: False
I0420 08:24:24.203271 134595236116288 pyconfig.py:432] Config param use_audio: False
I0420 08:24:24.203285 134595236116288 pyconfig.py:432] Config param use_audio_in_video: False
I0420 08:24:24.203301 134595236116288 pyconfig.py:432] Config param use_batch_split_schedule: False
I0420 08:24:24.203315 134595236116288 pyconfig.py:432] Config param use_chat_template: False
I0420 08:24:24.203330 134595236116288 pyconfig.py:432] Config param use_chunked_prefill: False
I0420 08:24:24.203344 134595236116288 pyconfig.py:432] Config param use_custom_sort_vjp: True
I0420 08:24:24.203359 134595236116288 pyconfig.py:432] Config param use_dpo: False
I0420 08:24:24.203373 134595236116288 pyconfig.py:432] Config param use_gather_mosaic_kernel: False
I0420 08:24:24.203388 134595236116288 pyconfig.py:432] Config param use_grpo: True
I0420 08:24:24.203404 134595236116288 pyconfig.py:432] Config param use_indexer: False
I0420 08:24:24.203418 134595236116288 pyconfig.py:432] Config param use_iota_embed: True
I0420 08:24:24.203433 134595236116288 pyconfig.py:432] Config param use_jax_splash: False
I0420 08:24:24.203447 134595236116288 pyconfig.py:432] Config param use_max_logit_estimate: -1
I0420 08:24:24.203462 134595236116288 pyconfig.py:432] Config param use_mrope: False
I0420 08:24:24.203478 134595236116288 pyconfig.py:432] Config param use_multimodal: False
I0420 08:24:24.203492 134595236116288 pyconfig.py:432] Config param use_pathways: True
I0420 08:24:24.203507 134595236116288 pyconfig.py:432] Config param use_post_attn_norm: False
I0420 08:24:24.203523 134595236116288 pyconfig.py:432] Config param use_post_ffw_norm: False
I0420 08:24:24.203536 134595236116288 pyconfig.py:432] Config param use_qk_clip: False
I0420 08:24:24.203552 134595236116288 pyconfig.py:432] Config param use_qk_norm: False
I0420 08:24:24.203566 134595236116288 pyconfig.py:432] Config param use_qk_norm_in_gdn: True
I0420 08:24:24.203581 134595236116288 pyconfig.py:432] Config param use_qwix_quantization: False
I0420 08:24:24.203596 134595236116288 pyconfig.py:432] Config param use_ragged_attention: False
I0420 08:24:24.203611 134595236116288 pyconfig.py:432] Config param use_random_routing: False
I0420 08:24:24.203625 134595236116288 pyconfig.py:432] Config param use_replicator_service: False
I0420 08:24:24.203639 134595236116288 pyconfig.py:432] Config param use_ring_of_experts: False
I0420 08:24:24.203655 134595236116288 pyconfig.py:432] Config param use_sft: False
I0420 08:24:24.203669 134595236116288 pyconfig.py:432] Config param use_splash_scheduler: False
I0420 08:24:24.203684 134595236116288 pyconfig.py:432] Config param use_tokamax_gmm: False
I0420 08:24:24.203699 134595236116288 pyconfig.py:432] Config param use_tokamax_splash: False
I0420 08:24:24.203724 134595236116288 pyconfig.py:432] Config param use_truncation: True
I0420 08:24:24.203741 134595236116288 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False
I0420 08:24:24.203757 134595236116288 pyconfig.py:432] Config param use_untrainable_positional_embedding: False
I0420 08:24:24.203771 134595236116288 pyconfig.py:432] Config param use_vertex_tensorboard: False
I0420 08:24:24.203786 134595236116288 pyconfig.py:432] Config param using_pipeline_parallelism: False
I0420 08:24:24.203802 134595236116288 pyconfig.py:432] Config param v_head_dim: 128
I0420 08:24:24.203815 134595236116288 pyconfig.py:432] Config param v_norm_with_scale: True
I0420 08:24:24.203831 134595236116288 pyconfig.py:432] Config param value_proj: RematLocation.REMAT
I0420 08:24:24.203845 134595236116288 pyconfig.py:432] Config param vertex_tensorboard_project: 
I0420 08:24:24.203861 134595236116288 pyconfig.py:432] Config param vertex_tensorboard_region: 
I0420 08:24:24.203875 134595236116288 pyconfig.py:432] Config param video_path: 
I0420 08:24:24.203891 134595236116288 pyconfig.py:432] Config param video_placeholder: <|video|>
I0420 08:24:24.203905 134595236116288 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096
I0420 08:24:24.203921 134595236116288 pyconfig.py:432] Config param vision_output_length: -1
I0420 08:24:24.203935 134595236116288 pyconfig.py:432] Config param vllm_additional_config: {}
I0420 08:24:24.203951 134595236116288 pyconfig.py:432] Config param vllm_hf_config_path: 
I0420 08:24:24.203965 134595236116288 pyconfig.py:432] Config param vllm_hf_overrides: {}
I0420 08:24:24.203981 134595236116288 pyconfig.py:432] Config param vocab_size: 32000
I0420 08:24:24.203997 134595236116288 pyconfig.py:432] Config param warmup_steps_fraction: 0.1
I0420 08:24:24.204012 134595236116288 pyconfig.py:432] Config param weight_dtype: float32
I0420 08:24:24.204034 134595236116288 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax
I0420 08:24:24.204049 134595236116288 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512
I0420 08:24:24.204065 134595236116288 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024
I0420 08:24:24.204081 134595236116288 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024
I0420 08:24:24.204096 134595236116288 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512
I0420 08:24:24.204112 134595236116288 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024
I0420 08:24:24.204126 134595236116288 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024
I0420 08:24:24.204141 134595236116288 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512
I0420 08:24:24.204156 134595236116288 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024
I0420 08:24:24.204172 134595236116288 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024
I0420 08:24:24.204187 134595236116288 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512
I0420 08:24:24.204201 134595236116288 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024
I0420 08:24:24.204217 134595236116288 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024
I0420 08:24:24.204231 134595236116288 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512
I0420 08:24:24.204246 134595236116288 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024
I0420 08:24:24.204262 134595236116288 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024
I0420 08:24:24.204277 134595236116288 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512
I0420 08:24:24.204292 134595236116288 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024
I0420 08:24:24.204306 134595236116288 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024
I0420 08:24:24.204322 134595236116288 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1
I0420 08:24:24.204336 134595236116288 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0420 08:24:24.204353 134595236116288 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False
I0420 08:24:24.204368 134595236116288 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False
I0420 08:24:24.204384 134595236116288 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False
I0420 08:24:24.204399 134595236116288 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0
I0420 08:24:24.204416 134595236116288 pyconfig.py:432] Config param z_loss_multiplier: 0.0
I0420 08:24:24.204771 134595236116288 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0420 08:24:24.204809 134595236116288 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0420 08:24:28.222442 134595236116288 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0420 08:24:28.225429 134595236116288 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0420 08:24:28.225550 134595236116288 train_distill.py:584] Applying logical axis rules for model initialization and training...
I0420 08:24:28.225620 134595236116288 train_distill.py:588] Loading Student from ...
I0420 08:24:28.225647 134595236116288 train_distill.py:169] --- Student Configuration ---
I0420 08:24:28.225668 134595236116288 train_distill.py:170]   Model Name:      gpt3-52k
I0420 08:24:28.225689 134595236116288 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0420 08:24:28.225719 134595236116288 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0420 08:24:28.225739 134595236116288 train_distill.py:175]   Vocab Size:      32000
I0420 08:24:28.225758 134595236116288 train_distill.py:176]   Checkpoint:      
I0420 08:24:28.225776 134595236116288 train_distill.py:460] Initializing model: gpt3-52k...
I0420 08:24:29.607862 134595236116288 train_distill.py:602] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0420 08:24:29.607968 134595236116288 train_distill.py:169] --- Teacher Configuration ---
I0420 08:24:29.607996 134595236116288 train_distill.py:170]   Model Name:      gpt3-52k
I0420 08:24:29.608020 134595236116288 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0420 08:24:29.608043 134595236116288 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0420 08:24:29.608063 134595236116288 train_distill.py:175]   Vocab Size:      32000
I0420 08:24:29.608083 134595236116288 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0420 08:24:29.608101 134595236116288 train_distill.py:460] Initializing model: gpt3-52k...
I0420 08:24:30.663587 134595236116288 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0420 08:24:30.664042 134595236116288 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a693096e5d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0420 08:24:30.664103 134595236116288 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0420 08:24:31.173753 134595236116288 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0420 08:24:32.144813    2086 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0420 08:24:33.001500 134595236116288 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0420 08:24:34.758466 134595236116288 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0420 08:24:34.758862 134595236116288 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0420 08:24:35.879925 134595236116288 checkpointer.py:318] Finished restoring checkpoint in 3.25 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0420 08:24:36.564079 134595236116288 train_distill.py:628] Initializing Data Iterators via MaxText pipeline...
I0420 08:24:36.626981 134595236116288 config.py:112] TensorFlow version 2.20.0 available.
I0420 08:24:36.627470 134595236116288 config.py:125] JAX version 0.8.3 available.
E0420 08:24:38.711879 134595236116288 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0420 08:24:38.712104 134595236116288 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0420 08:24:38.715070 134595236116288 train_distill.py:405] Input Pipeline Checkpointing: DISABLED
I0420 08:24:38.715130 134595236116288 train_distill.py:409] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0420 08:24:38.715192 134595236116288 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0420 08:24:38.715265 134595236116288 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a693096e5d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0420 08:24:38.715305 134595236116288 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0420 08:24:38.715336 134595236116288 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a693096e5d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0420 08:24:38.715379 134595236116288 checkpoint_manager.py:702] [process=1][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b500>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b470>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b3e0>}, handler_registry=None
I0420 08:24:38.715567 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b500>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0420 08:24:38.715607 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b470>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0420 08:24:38.715633 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b3e0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0420 08:24:38.715657 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b050>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0420 08:24:38.715684 134595236116288 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b500>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b500>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b470>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b470>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b3e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b3e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b050>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a529221b050>}).
I0420 08:24:38.716101 134595236116288 async_checkpointer.py:177] [process=1][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a5291dbfec0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0420 08:24:41.056787 134595236116288 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542_07_distill_smoke/checkpoints
I0420 08:24:41.478264 134595236116288 checkpoint_manager.py:921] [process=1][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a529221b3b0>
I0420 08:24:41.478438 134595236116288 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0420 08:24:41.478507 134595236116288 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a693096e5d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0420 08:24:41.478543 134595236116288 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0420 08:24:41.478576 134595236116288 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a693096e5d0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0420 08:24:41.478612 134595236116288 checkpoint_manager.py:1983] [process=1][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0420 08:24:41.478669 134595236116288 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134595236116288 count=1 at 0x7a4ec036ed80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a529221b1d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a529221b1a0>, _write_futures=[])
I0420 08:24:41.479020 134595236116288 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134595236116288 count=1 at 0x7a4ec036ed80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a529221b1d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a529221b1a0>, _write_futures=[])
I0420 08:24:41.479109 134595236116288 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134595236116288 count=1 at 0x7a4ec036ed80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a529221b1d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a529221b1a0>, _write_futures=[])
I0420 08:24:41.479145 134595236116288 checkpoint_manager.py:702] [process=1][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b380>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221a480>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5292219a30>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a5292219790>}, handler_registry=None
I0420 08:24:41.479254 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b380>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0420 08:24:41.479289 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221a480>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0420 08:24:41.479314 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5292219a30>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0420 08:24:41.479341 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a5292219790>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0420 08:24:41.479364 134595236116288 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5291d390d0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0420 08:24:41.479388 134595236116288 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b380>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221b380>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221a480>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a529221a480>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5292219a30>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5292219a30>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a5292219790>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a5292219790>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5291d390d0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a5291d390d0>}).
I0420 08:24:41.479460 134595236116288 async_checkpointer.py:177] [process=1][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a5291dbff60> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0420 08:24:41.896757 134595236116288 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542_07_distill_smoke/checkpoints
I0420 08:24:42.316472 134595236116288 checkpoint_manager.py:921] [process=1][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542/pt_distill_linen_xpk_feat_nnx_trainstate_and_training_loop_20260420_081542_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a4ec04a58e0>
I0420 08:24:42.316728 134595236116288 train_distill.py:675] Starting Distillation Training...
I0420 08:24:42.316830 134595236116288 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0420 08:24:43.027144 134595236116288 peft_trainer.py:600] Compiled train_step cache size: 0
Training:   0%|          | 0/5 [00:00<?, ?step/s]
I0420 08:24:43.028914 134455981217536 grain_pool.py:367] Grain pool will use 1 processes.
I0420 08:24:43.055376 134455981217536 grain_pool.py:440] Grain pool will start child processes.
I0420 08:24:43.060675 134455981217536 grain_pool.py:448] Grain pool started all child processes.
2026-04-20 08:24:49.111164: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0420 08:24:52.156973 134595236116288 utils.py:86] Train loop finished in: 9.1293 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 749, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 745, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 677, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0)}
I0420 08:24:52.505603 134455981217536 grain_pool.py:542] Grain pool is exiting.
I0420 08:24:52.505701 134455981217536 grain_pool.py:547] Shutting down multiprocessing system.
I0420 08:24:53.954559 134455981217536 grain_pool.py:547] Shutting down multiprocessing system.

Training:   0%|          | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Mon Apr 20 08:25:01 UTC 2026
EXIT_CODE=1
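Reading of the failure (an interpretation of the traceback, not a confirmed root cause): `shard_input` calls `jax.make_array_from_process_local_data` on every batch leaf, which expects host-local data (e.g. a NumPy array). The device set in the `ValueError` covers all 32 TPU devices across process indices 0 to 7, which suggests at least one leaf was already a global, non-fully-addressable `jax.Array` when it reached `shard_input`, so slicing it inside `make_array_from_callback` produced pieces that `device_put` refuses. The sketch below shows one way to guard against that, assuming a global leaf is already laid out acceptably. `shard_leaf` and `shard_batch` are hypothetical helpers, not part of tunix or MaxText; the cleaner fix may be to keep the data iterator yielding host-local NumPy arrays.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def shard_leaf(x, sharding):
  """Hypothetical helper: place one batch leaf on `sharding`."""
  if isinstance(x, jax.Array) and not x.is_fully_addressable:
    # Already a global array spanning other processes' devices. Feeding it to
    # make_array_from_process_local_data is what raises the ValueError above,
    # so return it unchanged (assumption: its current sharding is acceptable).
    return x
  # Host-local data (NumPy, or a fully addressable jax.Array) holding only
  # this process's slice of the global batch.
  return jax.make_array_from_process_local_data(sharding, np.asarray(x))


def shard_batch(batch, sharding):
  """Hypothetical helper: apply shard_leaf to every leaf of a batch pytree."""
  return jax.tree.map(lambda x: shard_leaf(x, sharding), batch)
```

On a single host every array is fully addressable, so only the NumPy path runs there; the pass-through branch matters in multi-process runs like this 8-process, 32-device job.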