XPK Start: Wed Apr 22 09:54:15 UTC 2026
2026-04-22 09:54:32.463977: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0422 09:54:36.058316 136547450599232 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-22 09:54:45,097:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0422 09:54:45.097824 136547450599232 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-22 09:54:45,100:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-2ta7g-slice-job-0-0.mt-07-distill-smoke-2ta7g:8482
I0422 09:54:45.100115 136547450599232 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-2ta7g-slice-job-0-0.mt-07-distill-smoke-2ta7g:8482
I0422 09:54:45.801211 136547450599232 max_utils.py:284] Jax distributed system initialized!
I0422 09:54:52.126633 136547450599232 max_utils.py:244] Jax distributed system is already initialized.
I0422 09:54:52.598979 136547450599232 max_utils.py:244] Jax distributed system is already initialized.
I0422 09:54:52.600293 136547450599232 pyconfig.py:432] Config param abort_on_inf_loss: True I0422 09:54:52.600427 136547450599232 pyconfig.py:432] Config param abort_on_nan_loss: True I0422 09:54:52.600456 136547450599232 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0422 09:54:52.600478 136547450599232 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0422 09:54:52.600497 136547450599232 pyconfig.py:432] Config param activation_function_for_audio: gelu I0422 09:54:52.600516 136547450599232 pyconfig.py:432] Config param activations_in_float32: False I0422 09:54:52.600533 136547450599232 pyconfig.py:432] Config param adam_b1: 0.9 I0422 09:54:52.600553 136547450599232 pyconfig.py:432] Config param adam_b2: 0.95 I0422 09:54:52.600569 136547450599232 pyconfig.py:432] Config param adam_eps: 1e-08 I0422 09:54:52.600592 136547450599232 pyconfig.py:432] Config param adam_eps_root: 0.0 I0422 09:54:52.600609 136547450599232 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0422 09:54:52.600624 136547450599232 pyconfig.py:432] Config param adamw_mask: [] I0422 09:54:52.600641 136547450599232 pyconfig.py:432] Config param add_bos: True I0422 09:54:52.600657 136547450599232 pyconfig.py:432] Config param add_eos: True I0422 09:54:52.600674 136547450599232 pyconfig.py:432] Config param allow_split_physical_axes: False I0422 09:54:52.600703 136547450599232 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0422 09:54:52.600721 136547450599232 pyconfig.py:432] Config param async_checkpointing: True I0422 09:54:52.600735 136547450599232 pyconfig.py:432] Config param async_scheduling: False I0422 09:54:52.600751 136547450599232 pyconfig.py:432] Config param attention: dot_product I0422 09:54:52.600768 136547450599232 pyconfig.py:432] Config param attention_bias: False I0422 09:54:52.600784 136547450599232 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0422 09:54:52.600803 136547450599232 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0422 09:54:52.600825 136547450599232 pyconfig.py:432] Config param attention_output_dim: -1 I0422 09:54:52.600840 136547450599232 pyconfig.py:432] Config param attention_sink: False I0422 09:54:52.600857 136547450599232 pyconfig.py:432] Config param attention_type: global I0422 09:54:52.600872 136547450599232 pyconfig.py:432] Config param attn_logits_soft_cap: None I0422 09:54:52.600889 136547450599232 pyconfig.py:432] Config param audio_path: I0422 09:54:52.600904 136547450599232 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0422 09:54:52.600920 136547450599232 pyconfig.py:432] Config param autoregressive_decode_assert: I0422 09:54:52.600948 136547450599232 pyconfig.py:432] Config param base_config: base.yml I0422 09:54:52.600964 136547450599232 pyconfig.py:432] Config param base_emb_dim: 16 I0422 09:54:52.600979 136547450599232 pyconfig.py:432] Config param base_mlp_dim: 64 I0422 09:54:52.600995 136547450599232 pyconfig.py:432] Config param base_moe_mlp_dim: -1 I0422 09:54:52.601011 136547450599232 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0422 09:54:52.601026 136547450599232 pyconfig.py:432] Config param base_num_kv_heads: 2 I0422 09:54:52.601042 136547450599232 pyconfig.py:432] Config param base_num_query_heads: 2 I0422 09:54:52.601058 136547450599232 pyconfig.py:432] Config param base_output_directory: I0422 09:54:52.601072 136547450599232 pyconfig.py:432] Config param batch_size: 1 I0422 09:54:52.601089 136547450599232 pyconfig.py:432] Config param batch_split_factor: 1 I0422 09:54:52.601105 136547450599232 pyconfig.py:432] Config param beta_fast: 32 I0422 09:54:52.601121 136547450599232 pyconfig.py:432] Config param beta_slow: 1 I0422 09:54:52.601137 136547450599232 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0422 09:54:52.601152 136547450599232 pyconfig.py:432] Config param capacity_factor: -1.0 I0422 09:54:52.601169 136547450599232 pyconfig.py:432] Config 
param cast_logits_to_fp32: True I0422 09:54:52.601185 136547450599232 pyconfig.py:432] Config param chat_template: I0422 09:54:52.601200 136547450599232 pyconfig.py:432] Config param chat_template_path: I0422 09:54:52.601218 136547450599232 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0422 09:54:52.601234 136547450599232 pyconfig.py:432] Config param checkpoint_dir: None I0422 09:54:52.601252 136547450599232 pyconfig.py:432] Config param checkpoint_is_quantized: False I0422 09:54:52.601269 136547450599232 pyconfig.py:432] Config param checkpoint_period: 2000 I0422 09:54:52.601284 136547450599232 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0422 09:54:52.601300 136547450599232 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0422 09:54:52.601318 136547450599232 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0422 09:54:52.601333 136547450599232 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0422 09:54:52.601351 136547450599232 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0422 09:54:52.601367 136547450599232 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0422 09:54:52.601384 136547450599232 pyconfig.py:432] Config param chips_per_vm: 4 I0422 09:54:52.601401 136547450599232 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0422 09:54:52.601417 136547450599232 pyconfig.py:432] Config param collect_stack_trace: False I0422 09:54:52.601432 136547450599232 pyconfig.py:432] Config param colocated_python_checkpointing: False I0422 09:54:52.601448 136547450599232 pyconfig.py:432] Config param colocated_python_data_input: False I0422 09:54:52.601464 136547450599232 pyconfig.py:432] Config param compile_topology: I0422 09:54:52.601479 136547450599232 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0422 09:54:52.601494 136547450599232 pyconfig.py:432] Config param compile_xla_flags: I0422 
09:54:52.601509 136547450599232 pyconfig.py:432] Config param compiled_trainstep_file: I0422 09:54:52.601524 136547450599232 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0422 09:54:52.601539 136547450599232 pyconfig.py:432] Config param constant_bound_config: [] I0422 09:54:52.601555 136547450599232 pyconfig.py:432] Config param context: RematLocation.REMAT I0422 09:54:52.601570 136547450599232 pyconfig.py:432] Config param context_parallel_load_balance: True I0422 09:54:52.601585 136547450599232 pyconfig.py:432] Config param context_parallel_size: 1 I0422 09:54:52.601600 136547450599232 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0422 09:54:52.601615 136547450599232 pyconfig.py:432] Config param context_sharding: context I0422 09:54:52.601630 136547450599232 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0422 09:54:52.601646 136547450599232 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0422 09:54:52.601661 136547450599232 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0422 09:54:52.601677 136547450599232 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0422 09:54:52.601721 136547450599232 pyconfig.py:432] Config param custom_mesh: I0422 09:54:52.601740 136547450599232 pyconfig.py:432] Config param custom_mesh_and_rule: I0422 09:54:52.601754 136547450599232 pyconfig.py:432] Config param d_model_for_audio: 256 I0422 09:54:52.601769 136547450599232 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0422 09:54:52.601791 136547450599232 pyconfig.py:432] Config param data_shuffle_seed: 0 I0422 09:54:52.601806 136547450599232 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0422 09:54:52.601821 136547450599232 pyconfig.py:432] Config param dataset_path: I0422 09:54:52.601836 136547450599232 pyconfig.py:432] Config 
param dataset_type: DatasetType.HF I0422 09:54:52.601855 136547450599232 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1 I0422 09:54:52.601869 136547450599232 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0422 09:54:52.601885 136547450599232 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0422 09:54:52.601899 136547450599232 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0422 09:54:52.601914 136547450599232 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0422 09:54:52.601938 136547450599232 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0422 09:54:52.601953 136547450599232 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0422 09:54:52.601969 136547450599232 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0422 09:54:52.601984 136547450599232 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0422 09:54:52.602001 136547450599232 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0422 09:54:52.602015 136547450599232 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0422 09:54:52.602030 136547450599232 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0422 09:54:52.602044 136547450599232 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0422 09:54:52.602060 136547450599232 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0422 09:54:52.602075 136547450599232 pyconfig.py:432] Config param debug: {'rl': False} I0422 09:54:52.602091 136547450599232 pyconfig.py:432] Config param debug_sharding: False I0422 09:54:52.602107 136547450599232 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0422 09:54:52.602123 136547450599232 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0422 09:54:52.602141 136547450599232 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0422 09:54:52.602155 136547450599232 pyconfig.py:432] Config param 
decode_sampling_top_k: 0 I0422 09:54:52.602171 136547450599232 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3 I0422 09:54:52.602192 136547450599232 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0422 09:54:52.602216 136547450599232 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0422 09:54:52.602241 136547450599232 pyconfig.py:432] Config param degenerate_group_masking: True I0422 09:54:52.602262 136547450599232 pyconfig.py:432] Config param dense_init_scale: 1.0 I0422 09:54:52.602280 136547450599232 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0422 09:54:52.602296 136547450599232 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0422 09:54:52.602310 136547450599232 pyconfig.py:432] Config param diloco_sync_period: 36 I0422 09:54:52.602326 136547450599232 pyconfig.py:432] Config param distill_alpha: 0.5 I0422 09:54:52.602341 136547450599232 pyconfig.py:432] Config param distill_alpha_end: None I0422 09:54:52.602356 136547450599232 pyconfig.py:432] Config param distill_alpha_schedule: constant I0422 09:54:52.602373 136547450599232 pyconfig.py:432] Config param distill_beta: 0.0 I0422 09:54:52.602387 136547450599232 pyconfig.py:432] Config param distill_beta_end: None I0422 09:54:52.602402 136547450599232 pyconfig.py:432] Config param distill_beta_schedule: constant I0422 09:54:52.602417 136547450599232 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0422 09:54:52.602432 136547450599232 pyconfig.py:432] Config param distill_layer_indices: None I0422 09:54:52.602446 136547450599232 pyconfig.py:432] Config param distill_temperature: 1.0 I0422 09:54:52.602461 136547450599232 pyconfig.py:432] Config param distill_temperature_end: None I0422 09:54:52.602478 136547450599232 pyconfig.py:432] Config param distill_temperature_schedule: constant I0422 09:54:52.602493 136547450599232 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0422 09:54:52.602507 
136547450599232 pyconfig.py:432] Config param dpo_beta: 0.1 I0422 09:54:52.602523 136547450599232 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0422 09:54:52.602538 136547450599232 pyconfig.py:432] Config param dq_reduction_steps: 0 I0422 09:54:52.602553 136547450599232 pyconfig.py:432] Config param dropout_rate: 0.0 I0422 09:54:52.602569 136547450599232 pyconfig.py:432] Config param dtype: bfloat16 I0422 09:54:52.602605 136547450599232 pyconfig.py:432] Config param dtype_mm: float32 I0422 09:54:52.602623 136547450599232 pyconfig.py:432] Config param dump_hlo: False I0422 09:54:52.602639 136547450599232 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0422 09:54:52.602654 136547450599232 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-22-09-54/xla_dump I0422 09:54:52.602670 136547450599232 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0422 09:54:52.602684 136547450599232 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0422 09:54:52.602705 136547450599232 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0422 09:54:52.602719 136547450599232 pyconfig.py:432] Config param dump_hlo_upload_all: False I0422 09:54:52.602735 136547450599232 pyconfig.py:432] Config param dump_hlo_xla_flags: I0422 09:54:52.602751 136547450599232 pyconfig.py:432] Config param dump_jaxpr: False I0422 09:54:52.602765 136547450599232 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0422 09:54:52.602780 136547450599232 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-22-09-54/jaxpr_dump I0422 09:54:52.602796 136547450599232 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0422 09:54:52.602811 136547450599232 pyconfig.py:432] Config param dump_step: -1 I0422 09:54:52.602827 136547450599232 pyconfig.py:432] Config param elastic_enabled: False I0422 09:54:52.602841 136547450599232 pyconfig.py:432] Config param elastic_max_retries: 10 
I0422 09:54:52.602857 136547450599232 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0422 09:54:52.602872 136547450599232 pyconfig.py:432] Config param emb_dim: 16 I0422 09:54:52.602887 136547450599232 pyconfig.py:432] Config param enable_autocheckpoint: False I0422 09:54:52.602902 136547450599232 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0422 09:54:52.602917 136547450599232 pyconfig.py:432] Config param enable_checkpointing: True I0422 09:54:52.602941 136547450599232 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0422 09:54:52.602957 136547450599232 pyconfig.py:432] Config param enable_data_shuffling: True I0422 09:54:52.602972 136547450599232 pyconfig.py:432] Config param enable_diloco: False I0422 09:54:52.602988 136547450599232 pyconfig.py:432] Config param enable_dp_attention: False I0422 09:54:52.603001 136547450599232 pyconfig.py:432] Config param enable_dropout: False I0422 09:54:52.603017 136547450599232 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0422 09:54:52.603032 136547450599232 pyconfig.py:432] Config param enable_expert_parallel: False I0422 09:54:52.603048 136547450599232 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0422 09:54:52.603063 136547450599232 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0422 09:54:52.603078 136547450599232 pyconfig.py:432] Config param enable_goodput_recording: False I0422 09:54:52.603092 136547450599232 pyconfig.py:432] Config param enable_jax_profiler: False I0422 09:54:52.603108 136547450599232 pyconfig.py:432] Config param enable_llm_inference_pool: False I0422 09:54:52.603122 136547450599232 pyconfig.py:432] Config param enable_model_warmup: False I0422 09:54:52.603137 136547450599232 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0422 09:54:52.603152 136547450599232 pyconfig.py:432] Config param enable_nnx: False I0422 09:54:52.603168 136547450599232 
pyconfig.py:432] Config param enable_orbax_v1: False I0422 09:54:52.603182 136547450599232 pyconfig.py:432] Config param enable_padding_causal_mask: True I0422 09:54:52.603198 136547450599232 pyconfig.py:432] Config param enable_pathways_goodput: False I0422 09:54:52.603213 136547450599232 pyconfig.py:432] Config param enable_prefix_caching: False I0422 09:54:52.603229 136547450599232 pyconfig.py:432] Config param enable_rampup_batch_size: False I0422 09:54:52.603244 136547450599232 pyconfig.py:432] Config param enable_single_controller: False I0422 09:54:52.603260 136547450599232 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0422 09:54:52.603274 136547450599232 pyconfig.py:432] Config param enable_tensorboard: True I0422 09:54:52.603289 136547450599232 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0422 09:54:52.603304 136547450599232 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0422 09:54:52.603319 136547450599232 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0422 09:54:52.603335 136547450599232 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0422 09:54:52.603351 136547450599232 pyconfig.py:432] Config param engram: RematLocation.REMAT I0422 09:54:52.603366 136547450599232 pyconfig.py:432] Config param engram_head_dim: 1280 I0422 09:54:52.603381 136547450599232 pyconfig.py:432] Config param engram_kernel_size: 4 I0422 09:54:52.603396 136547450599232 pyconfig.py:432] Config param engram_layers: [] I0422 09:54:52.603411 136547450599232 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0422 09:54:52.603426 136547450599232 pyconfig.py:432] Config param engram_num_heads: 8 I0422 09:54:52.603442 136547450599232 pyconfig.py:432] Config param engram_seed: 0 I0422 09:54:52.603457 136547450599232 pyconfig.py:432] Config param engram_vocab_bases: [] I0422 09:54:52.603473 136547450599232 pyconfig.py:432] Config param epsilon_high: None I0422 09:54:52.603488 136547450599232 
pyconfig.py:432] Config param eval_corr_lst: False I0422 09:54:52.603503 136547450599232 pyconfig.py:432] Config param eval_data_columns: ['text'] I0422 09:54:52.603521 136547450599232 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0422 09:54:52.603537 136547450599232 pyconfig.py:432] Config param eval_image_column: image I0422 09:54:52.603551 136547450599232 pyconfig.py:432] Config param eval_interval: -1 I0422 09:54:52.603567 136547450599232 pyconfig.py:432] Config param eval_make_lst: False I0422 09:54:52.603582 136547450599232 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0422 09:54:52.603597 136547450599232 pyconfig.py:432] Config param eval_sampling_strategy: greedy I0422 09:54:52.603613 136547450599232 pyconfig.py:432] Config param eval_split: validation I0422 09:54:52.603627 136547450599232 pyconfig.py:432] Config param eval_steps: -1 I0422 09:54:52.603643 136547450599232 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0422 09:54:52.603658 136547450599232 pyconfig.py:432] Config param final_logits_soft_cap: None I0422 09:54:52.603673 136547450599232 pyconfig.py:432] Config param first_num_dense_layers: 0 I0422 09:54:52.603688 136547450599232 pyconfig.py:432] Config param float32_gate_logits: False I0422 09:54:52.603708 136547450599232 pyconfig.py:432] Config param float32_logits: False I0422 09:54:52.603722 136547450599232 pyconfig.py:432] Config param float32_qk_product: False I0422 09:54:52.603739 136547450599232 pyconfig.py:432] Config param float32_weight_sum: True I0422 09:54:52.603754 136547450599232 pyconfig.py:432] Config param force_q_layout: False I0422 09:54:52.603769 136547450599232 pyconfig.py:432] Config param force_unroll: False I0422 09:54:52.603784 136547450599232 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0422 09:54:52.603799 136547450599232 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0422 09:54:52.603815 136547450599232 pyconfig.py:432] Config param 
fused_mlp: False I0422 09:54:52.603831 136547450599232 pyconfig.py:432] Config param fused_qkv: True I0422 09:54:52.603846 136547450599232 pyconfig.py:432] Config param gcs_metrics: False I0422 09:54:52.603861 136547450599232 pyconfig.py:432] Config param gdn_chunk_size: 64 I0422 09:54:52.603877 136547450599232 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0422 09:54:52.603891 136547450599232 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0422 09:54:52.603905 136547450599232 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0422 09:54:52.603921 136547450599232 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0422 09:54:52.603948 136547450599232 pyconfig.py:432] Config param gdn_value_head_dim: 128 I0422 09:54:52.603963 136547450599232 pyconfig.py:432] Config param generate_padding_batch_eval: False I0422 09:54:52.603978 136547450599232 pyconfig.py:432] Config param generate_padding_batch_train: False I0422 09:54:52.603992 136547450599232 pyconfig.py:432] Config param generate_slice: v5e-16 I0422 09:54:52.604008 136547450599232 pyconfig.py:432] Config param generation_configs: {} I0422 09:54:52.604024 136547450599232 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0422 09:54:52.604038 136547450599232 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0422 09:54:52.604054 136547450599232 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0422 09:54:52.604068 136547450599232 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0422 09:54:52.604085 136547450599232 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0422 09:54:52.604099 136547450599232 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0422 09:54:52.604115 136547450599232 pyconfig.py:432] Config param global_head_dim: 0 I0422 09:54:52.604129 136547450599232 pyconfig.py:432] Config param global_num_kv_heads: 0 I0422 09:54:52.604144 136547450599232 pyconfig.py:432] Config param 
global_parameter_scale: 1 I0422 09:54:52.604159 136547450599232 pyconfig.py:432] Config param global_rampup_samples: 500 I0422 09:54:52.604174 136547450599232 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0422 09:54:52.604189 136547450599232 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0422 09:54:52.604205 136547450599232 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0422 09:54:52.604221 136547450599232 pyconfig.py:432] Config param grad_dtype: float32 I0422 09:54:52.604260 136547450599232 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0422 09:54:52.604279 136547450599232 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0 I0422 09:54:52.604295 136547450599232 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0422 09:54:52.604311 136547450599232 pyconfig.py:432] Config param grain_eval_files: I0422 09:54:52.604329 136547450599232 pyconfig.py:432] Config param grain_file_type: arrayrecord I0422 09:54:52.604344 136547450599232 pyconfig.py:432] Config param grain_num_threads: 16 I0422 09:54:52.604360 136547450599232 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0422 09:54:52.604376 136547450599232 pyconfig.py:432] Config param grain_packing_type: first_fit I0422 09:54:52.604393 136547450599232 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0422 09:54:52.604408 136547450599232 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0422 09:54:52.604424 136547450599232 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0422 09:54:52.604440 136547450599232 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0422 09:54:52.604455 136547450599232 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0422 09:54:52.604471 136547450599232 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0422 09:54:52.604493 136547450599232 pyconfig.py:432] Config param grain_train_files: I0422 09:54:52.604516 
136547450599232 pyconfig.py:432] Config param grain_train_mixture_config_path: I0422 09:54:52.604537 136547450599232 pyconfig.py:432] Config param grain_worker_count: 1 I0422 09:54:52.604558 136547450599232 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0422 09:54:52.604580 136547450599232 pyconfig.py:432] Config param grpo_beta: 0.08 I0422 09:54:52.604602 136547450599232 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0422 09:54:52.604618 136547450599232 pyconfig.py:432] Config param hardware: tpu I0422 09:54:52.604634 136547450599232 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0422 09:54:52.604650 136547450599232 pyconfig.py:432] Config param head_dim: 8 I0422 09:54:52.604664 136547450599232 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0422 09:54:52.604679 136547450599232 pyconfig.py:432] Config param hf_data_dir: None I0422 09:54:52.604699 136547450599232 pyconfig.py:432] Config param hf_eval_files: None I0422 09:54:52.604714 136547450599232 pyconfig.py:432] Config param hf_eval_split: None I0422 09:54:52.604729 136547450599232 pyconfig.py:432] Config param hf_name: None I0422 09:54:52.604745 136547450599232 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0422 09:54:52.604759 136547450599232 pyconfig.py:432] Config param hf_train_files: None I0422 09:54:52.604775 136547450599232 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0422 09:54:52.604790 136547450599232 pyconfig.py:432] Config param hide_profiler_step_metric: False I0422 09:54:52.604806 136547450599232 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0422 09:54:52.604821 136547450599232 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0422 09:54:52.604837 136547450599232 pyconfig.py:432] Config param ici_context_parallelism: 1 I0422 09:54:52.604851 136547450599232 pyconfig.py:432] Config param ici_data_parallelism: 1 I0422 09:54:52.604866 136547450599232 pyconfig.py:432] Config param 
ici_diloco_parallelism: 1 I0422 09:54:52.604882 136547450599232 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0422 09:54:52.604896 136547450599232 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0422 09:54:52.604912 136547450599232 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0422 09:54:52.604945 136547450599232 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0422 09:54:52.604966 136547450599232 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0422 09:54:52.604982 136547450599232 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0422 09:54:52.604996 136547450599232 pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0422 09:54:52.605011 136547450599232 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0422 09:54:52.605026 136547450599232 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1 I0422 09:54:52.605041 136547450599232 pyconfig.py:432] Config param image_path: I0422 09:54:52.605056 136547450599232 pyconfig.py:432] Config param image_placeholder: <|image|> I0422 09:54:52.605071 136547450599232 pyconfig.py:432] Config param image_size_for_vit: 896 I0422 09:54:52.605087 136547450599232 pyconfig.py:432] Config param indexer_head_dim: 128 I0422 09:54:52.605101 136547450599232 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0422 09:54:52.605117 136547450599232 pyconfig.py:432] Config param indexer_n_heads: 64 I0422 09:54:52.605131 136547450599232 pyconfig.py:432] Config param indexer_sparse_training: False I0422 09:54:52.605147 136547450599232 pyconfig.py:432] Config param indexer_topk: 2048 I0422 09:54:52.605162 136547450599232 pyconfig.py:432] Config param inference_benchmark_test: False I0422 09:54:52.605176 136547450599232 pyconfig.py:432] Config param inference_metadata_file: I0422 09:54:52.605191 136547450599232 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0422 09:54:52.605207 
136547450599232 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0422 09:54:52.605221 136547450599232 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0422 09:54:52.605237 136547450599232 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0422 09:54:52.605252 136547450599232 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0422 09:54:52.605267 136547450599232 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0422 09:54:52.605282 136547450599232 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0422 09:54:52.605296 136547450599232 pyconfig.py:432] Config param init_weights_seed: 0 I0422 09:54:52.605311 136547450599232 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0422 09:54:52.605328 136547450599232 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0422 09:54:52.605344 136547450599232 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0422 09:54:52.605359 136547450599232 pyconfig.py:432] Config param internal_compile: False I0422 09:54:52.605374 136547450599232 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0422 09:54:52.605388 136547450599232 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0422 09:54:52.605403 136547450599232 pyconfig.py:432] Config param jax_debug_log_modules: I0422 09:54:52.605419 136547450599232 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0422 09:54:52.605434 136547450599232 pyconfig.py:432] Config param jax_profiler_port: 9999 I0422 09:54:52.605451 136547450599232 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0422 09:54:52.605468 136547450599232 pyconfig.py:432] Config param kv_cache_buffer: 256 I0422 09:54:52.605482 136547450599232 pyconfig.py:432] Config param kv_lora_rank: 512 I0422 
09:54:52.605497 136547450599232 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0422 09:54:52.605515 136547450599232 pyconfig.py:432] Config param kv_quant_dtype: int8 I0422 09:54:52.605529 136547450599232 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0422 09:54:52.605545 136547450599232 pyconfig.py:432] Config param learning_rate: 0.0002 I0422 09:54:52.605560 136547450599232 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0422 09:54:52.605574 136547450599232 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0422 09:54:52.605590 136547450599232 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 I0422 09:54:52.605604 136547450599232 pyconfig.py:432] Config param load_checkpoint_only_once: False I0422 09:54:52.605620 136547450599232 pyconfig.py:432] Config param load_from_prefill_dir: False I0422 09:54:52.605635 136547450599232 pyconfig.py:432] Config param load_full_state_path: I0422 09:54:52.605650 136547450599232 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0422 09:54:52.605666 136547450599232 pyconfig.py:432] Config param local_checkpoint_directory: I0422 09:54:52.605681 136547450599232 pyconfig.py:432] Config param local_checkpoint_period: 0 I0422 09:54:52.605702 136547450599232 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0422 09:54:52.605717 136547450599232 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0422 09:54:52.605732 136547450599232 pyconfig.py:432] Config param log_config: True I0422 09:54:52.605748 136547450599232 pyconfig.py:432] Config param log_period: 10 I0422 09:54:52.605762 136547450599232 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 
'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), 
('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), 
('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0422 09:54:52.605880 136547450599232 pyconfig.py:432] Config param logits_dot_in_fp32: False I0422 09:54:52.605898 136547450599232 pyconfig.py:432] Config param logits_via_embedding: True I0422 09:54:52.605914 136547450599232 pyconfig.py:432] Config param lora_input_adapters_path: I0422 09:54:52.605944 136547450599232 pyconfig.py:432] Config param loss_algo: grpo I0422 09:54:52.605962 136547450599232 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0422 09:54:52.605981 136547450599232 pyconfig.py:432] Config param managed_mldiagnostics: False I0422 09:54:52.605995 136547450599232 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0422 09:54:52.606010 136547450599232 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0422 09:54:52.606026 136547450599232 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0422 09:54:52.606044 136547450599232 pyconfig.py:432] Config param max_checkify: False I0422 09:54:52.606060 136547450599232 pyconfig.py:432] Config param max_concurrency: 256 I0422 09:54:52.606074 136547450599232 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0422 09:54:52.606089 136547450599232 pyconfig.py:432] Config param max_num_batched_tokens: None I0422 09:54:52.606103 136547450599232 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0422 09:54:52.606118 136547450599232 pyconfig.py:432] Config param max_num_images_per_example: -1 I0422 
09:54:52.606134 136547450599232 pyconfig.py:432] Config param max_num_seqs: None I0422 09:54:52.606150 136547450599232 pyconfig.py:432] Config param max_position_embeddings: 163840 I0422 09:54:52.606164 136547450599232 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0422 09:54:52.606178 136547450599232 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0422 09:54:52.606194 136547450599232 pyconfig.py:432] Config param max_segments_per_seq: -1 I0422 09:54:52.606208 136547450599232 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0422 09:54:52.606223 136547450599232 pyconfig.py:432] Config param max_target_length: 2048 I0422 09:54:52.606238 136547450599232 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0 I0422 09:54:52.606254 136547450599232 pyconfig.py:432] Config param megablox: True I0422 09:54:52.606270 136547450599232 pyconfig.py:432] Config param merge_gating_gmm: False I0422 09:54:52.606284 136547450599232 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0422 09:54:52.606304 136547450599232 pyconfig.py:432] Config param metrics_dir: None I0422 09:54:52.606319 136547450599232 pyconfig.py:432] Config param metrics_file: I0422 09:54:52.606333 136547450599232 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0422 09:54:52.606349 136547450599232 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0422 09:54:52.606365 136547450599232 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0422 09:54:52.606380 136547450599232 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0422 09:54:52.606396 136547450599232 pyconfig.py:432] Config param mla_naive_kvcache: True I0422 09:54:52.606411 136547450599232 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0422 09:54:52.606428 136547450599232 
pyconfig.py:432] Config param mlp_activations: ['gelu'] I0422 09:54:52.606446 136547450599232 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0422 09:54:52.606460 136547450599232 pyconfig.py:432] Config param mlp_bias: False I0422 09:54:52.606475 136547450599232 pyconfig.py:432] Config param mlp_dim: 64 I0422 09:54:52.606490 136547450599232 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0422 09:54:52.606505 136547450599232 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0422 09:54:52.606521 136547450599232 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0422 09:54:52.606535 136547450599232 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0422 09:54:52.606551 136547450599232 pyconfig.py:432] Config param moba: False I0422 09:54:52.606566 136547450599232 pyconfig.py:432] Config param moba_chunk_size: 1024 I0422 09:54:52.606581 136547450599232 pyconfig.py:432] Config param moba_topk: 8 I0422 09:54:52.606596 136547450599232 pyconfig.py:432] Config param model_call_mode: I0422 09:54:52.606612 136547450599232 pyconfig.py:432] Config param model_name: gpt3-52k I0422 09:54:52.606626 136547450599232 pyconfig.py:432] Config param moe_expert_input_dim: -1 I0422 09:54:52.606642 136547450599232 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0422 09:54:52.606656 136547450599232 pyconfig.py:432] Config param moe_mlp_dim: -1 I0422 09:54:52.606671 136547450599232 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0422 09:54:52.606686 136547450599232 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0422 09:54:52.606706 136547450599232 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0422 09:54:52.606720 136547450599232 pyconfig.py:432] Config param monitor_goodput: False I0422 09:54:52.606735 136547450599232 pyconfig.py:432] Config param monitor_step_time_deviation: True I0422 09:54:52.606750 136547450599232 pyconfig.py:432] Config param mrope_section: [24, 20, 
20] I0422 09:54:52.606765 136547450599232 pyconfig.py:432] Config param mscale: 1.0 I0422 09:54:52.606782 136547450599232 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0422 09:54:52.606796 136547450599232 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0422 09:54:52.606811 136547450599232 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0422 09:54:52.606826 136547450599232 pyconfig.py:432] Config param mtp_num_layers: 0 I0422 09:54:52.606841 136547450599232 pyconfig.py:432] Config param mu_dtype: float32 I0422 09:54:52.606868 136547450599232 pyconfig.py:432] Config param multi_sampling: False I0422 09:54:52.606884 136547450599232 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0422 09:54:52.606898 136547450599232 pyconfig.py:432] Config param muon_beta: 0.95 I0422 09:54:52.606914 136547450599232 pyconfig.py:432] Config param muon_consistent_rms: None I0422 09:54:52.606941 136547450599232 pyconfig.py:432] Config param muon_weight_decay: 0.0 I0422 09:54:52.606960 136547450599232 pyconfig.py:432] Config param n_routing_groups: -1 I0422 09:54:52.606975 136547450599232 pyconfig.py:432] Config param n_window_for_audio: 50 I0422 09:54:52.606990 136547450599232 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0422 09:54:52.607004 136547450599232 pyconfig.py:432] Config param nope_layer_interval: -1 I0422 09:54:52.607020 136547450599232 pyconfig.py:432] Config param norm_topk_prob: False I0422 09:54:52.607034 136547450599232 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0422 09:54:52.607053 136547450599232 pyconfig.py:432] Config param normalize_embedding_logits: False I0422 09:54:52.607068 136547450599232 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0422 09:54:52.607083 136547450599232 pyconfig.py:432] Config param num_batches: 4 I0422 09:54:52.607099 136547450599232 pyconfig.py:432] Config param num_channels_for_vit: 3 I0422 09:54:52.607114 136547450599232 
pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0422 09:54:52.607128 136547450599232 pyconfig.py:432] Config param num_decoder_layers: 1 I0422 09:54:52.607143 136547450599232 pyconfig.py:432] Config param num_diloco_replicas: 1 I0422 09:54:52.607158 136547450599232 pyconfig.py:432] Config param num_epoch: 1 I0422 09:54:52.607173 136547450599232 pyconfig.py:432] Config param num_eval_passes: 1 I0422 09:54:52.607189 136547450599232 pyconfig.py:432] Config param num_experts: 1 I0422 09:54:52.607203 136547450599232 pyconfig.py:432] Config param num_experts_per_tok: 1 I0422 09:54:52.607219 136547450599232 pyconfig.py:432] Config param num_generations: 2 I0422 09:54:52.607233 136547450599232 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0422 09:54:52.607249 136547450599232 pyconfig.py:432] Config param num_iterations: 1 I0422 09:54:52.607264 136547450599232 pyconfig.py:432] Config param num_kv_heads: 2 I0422 09:54:52.607280 136547450599232 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0422 09:54:52.607295 136547450599232 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0422 09:54:52.607309 136547450599232 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0422 09:54:52.607324 136547450599232 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0422 09:54:52.607340 136547450599232 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0422 09:54:52.607354 136547450599232 pyconfig.py:432] Config param num_query_heads: 2 I0422 09:54:52.607369 136547450599232 pyconfig.py:432] Config param num_samplers_slices: -1 I0422 09:54:52.607383 136547450599232 pyconfig.py:432] Config param num_slices: 1 I0422 09:54:52.607398 136547450599232 pyconfig.py:432] Config param num_target_devices: 32 I0422 09:54:52.607414 136547450599232 pyconfig.py:432] Config param num_test_batches: 5 I0422 09:54:52.607430 136547450599232 pyconfig.py:432] Config param num_trainer_slices: -1 I0422 09:54:52.607445 136547450599232 
pyconfig.py:432] Config param num_vocab_tiling: 1 I0422 09:54:52.607461 136547450599232 pyconfig.py:432] Config param off_policy_steps: 0 I0422 09:54:52.607475 136547450599232 pyconfig.py:432] Config param offline_data_dir: None I0422 09:54:52.607491 136547450599232 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0422 09:54:52.607507 136547450599232 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0422 09:54:52.607523 136547450599232 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0422 09:54:52.607538 136547450599232 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0422 09:54:52.607553 136547450599232 pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0422 09:54:52.607567 136547450599232 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0422 09:54:52.607583 136547450599232 pyconfig.py:432] Config param output_dim_for_audio: 512 I0422 09:54:52.607598 136547450599232 pyconfig.py:432] Config param override_logical_axis_rules: False I0422 09:54:52.607613 136547450599232 pyconfig.py:432] Config param override_model_config: True I0422 09:54:52.607628 136547450599232 pyconfig.py:432] Config param packing: True I0422 09:54:52.607643 136547450599232 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0422 09:54:52.607657 136547450599232 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0422 09:54:52.607679 136547450599232 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0422 09:54:52.607706 136547450599232 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0422 09:54:52.607723 136547450599232 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0422 09:54:52.607738 136547450599232 pyconfig.py:432] Config param param_scan_axis: 1 I0422 09:54:52.607753 136547450599232 pyconfig.py:432] Config param parameter_memory_host_offload: False I0422 09:54:52.607767 136547450599232 pyconfig.py:432] Config param partial_rotary_factor: 1.0 
I0422 09:54:52.607782 136547450599232 pyconfig.py:432] Config param patch_size_for_vit: 14 I0422 09:54:52.607798 136547450599232 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0422 09:54:52.607812 136547450599232 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0422 09:54:52.607829 136547450599232 pyconfig.py:432] Config param per_device_batch_size: 2 I0422 09:54:52.607843 136547450599232 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0422 09:54:52.607858 136547450599232 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0422 09:54:52.607873 136547450599232 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False I0422 09:54:52.607889 136547450599232 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0422 09:54:52.607904 136547450599232 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0422 09:54:52.607918 136547450599232 pyconfig.py:432] Config param pipeline_parallel_layers: 1 I0422 09:54:52.607949 136547450599232 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0422 09:54:52.607967 136547450599232 pyconfig.py:432] Config param posemb_type_for_vit: learn I0422 09:54:52.607982 136547450599232 pyconfig.py:432] Config param position_id_per_seconds: 25 I0422 09:54:52.607997 136547450599232 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0422 09:54:52.608011 136547450599232 pyconfig.py:432] Config param prefill_cache_dir: I0422 09:54:52.608027 136547450599232 pyconfig.py:432] Config param prefill_chunk_size: 256 I0422 09:54:52.608042 136547450599232 pyconfig.py:432] Config param prefill_slice: v5e-16 I0422 09:54:52.608057 136547450599232 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0422 09:54:52.608073 136547450599232 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0422 09:54:52.608088 136547450599232 pyconfig.py:432] Config param profile_cleanly: True I0422 09:54:52.608102 136547450599232 
pyconfig.py:432] Config param profile_periodically_period: -1 I0422 09:54:52.608117 136547450599232 pyconfig.py:432] Config param profile_power_events: False I0422 09:54:52.608133 136547450599232 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0422 09:54:52.608151 136547450599232 pyconfig.py:432] Config param profiler_steps: 5 I0422 09:54:52.608166 136547450599232 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0422 09:54:52.608182 136547450599232 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0422 09:54:52.608196 136547450599232 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0422 09:54:52.608212 136547450599232 pyconfig.py:432] Config param prometheus_port: 0 I0422 09:54:52.608227 136547450599232 pyconfig.py:432] Config param prompt: I love to I0422 09:54:52.608242 136547450599232 pyconfig.py:432] Config param pure_nnx: False I0422 09:54:52.608257 136547450599232 pyconfig.py:432] Config param pure_nnx_decoder: False I0422 09:54:52.608273 136547450599232 pyconfig.py:432] Config param q_lora_rank: 0 I0422 09:54:52.608288 136547450599232 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0422 09:54:52.608304 136547450599232 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0422 09:54:52.608320 136547450599232 pyconfig.py:432] Config param qk_norm_with_scale: True I0422 09:54:52.608335 136547450599232 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0422 09:54:52.608350 136547450599232 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0422 09:54:52.608366 136547450599232 pyconfig.py:432] Config param quant_cfg_path: I0422 09:54:52.608382 136547450599232 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0422 09:54:52.608400 136547450599232 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0422 09:54:52.608416 136547450599232 pyconfig.py:432] Config param quantize_kvcache: False I0422 09:54:52.608429 136547450599232 pyconfig.py:432] Config param 
query_proj: RematLocation.REMAT I0422 09:54:52.608446 136547450599232 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0422 09:54:52.608462 136547450599232 pyconfig.py:432] Config param ragged_block_size: 256 I0422 09:54:52.608476 136547450599232 pyconfig.py:432] Config param ragged_buffer_factor: -1.0 I0422 09:54:52.608492 136547450599232 pyconfig.py:432] Config param rampup_end_step: 0 I0422 09:54:52.608506 136547450599232 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0422 09:54:52.608522 136547450599232 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0422 09:54:52.608536 136547450599232 pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0422 09:54:52.608553 136547450599232 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0422 09:54:52.608567 136547450599232 pyconfig.py:432] Config param remat_policy: full I0422 09:54:52.608583 136547450599232 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0422 09:54:52.608597 136547450599232 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True I0422 09:54:52.608613 136547450599232 pyconfig.py:432] Config param replicate_quant_scale: False I0422 09:54:52.608628 136547450599232 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0422 09:54:52.608643 136547450599232 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0422 09:54:52.608657 136547450599232 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0422 09:54:52.608673 136547450599232 pyconfig.py:432] Config param reshape_q: False I0422 09:54:52.608691 136547450599232 pyconfig.py:432] Config param return_log_prob: False I0422 09:54:52.608707 136547450599232 pyconfig.py:432] Config param reuse_example_batch: 0 I0422 09:54:52.608722 136547450599232 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0422 09:54:52.608737 136547450599232 pyconfig.py:432] Config param 
reward_exact_format_match: 3.0 I0422 09:54:52.608752 136547450599232 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0422 09:54:52.608768 136547450599232 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0422 09:54:52.608783 136547450599232 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0422 09:54:52.608798 136547450599232 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0422 09:54:52.608815 136547450599232 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0422 09:54:52.608836 136547450599232 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0422 09:54:52.608853 136547450599232 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0422 09:54:52.608867 136547450599232 pyconfig.py:432] Config param rollout_micro_batch_size: -1 I0422 09:54:52.608882 136547450599232 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0422 09:54:52.608898 136547450599232 pyconfig.py:432] Config param rope_attention_scaling: False I0422 09:54:52.608912 136547450599232 pyconfig.py:432] Config param rope_factor: 40 I0422 09:54:52.608941 136547450599232 pyconfig.py:432] Config param rope_interleave: True I0422 09:54:52.608959 136547450599232 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0422 09:54:52.608973 136547450599232 pyconfig.py:432] Config param rope_max_timescale: 10000 I0422 09:54:52.608989 136547450599232 pyconfig.py:432] Config param rope_min_timescale: 1 I0422 09:54:52.609003 136547450599232 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0422 09:54:52.609019 136547450599232 pyconfig.py:432] Config param rope_truncate: True I0422 09:54:52.609034 136547450599232 pyconfig.py:432] Config param rope_type: 
RopeType.DEFAULT I0422 09:54:52.609053 136547450599232 pyconfig.py:432] Config param rope_use_scale: True I0422 09:54:52.609069 136547450599232 pyconfig.py:432] Config param routed_bias: False I0422 09:54:52.609084 136547450599232 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0422 09:54:52.609099 136547450599232 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0422 09:54:52.609115 136547450599232 pyconfig.py:432] Config param routed_score_func: I0422 09:54:52.609130 136547450599232 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-22-09-54 I0422 09:54:52.609144 136547450599232 pyconfig.py:432] Config param sa_block_kv: 512 I0422 09:54:52.609160 136547450599232 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0422 09:54:52.609174 136547450599232 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0422 09:54:52.609189 136547450599232 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0422 09:54:52.609204 136547450599232 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0422 09:54:52.609219 136547450599232 pyconfig.py:432] Config param sa_block_q: 512 I0422 09:54:52.609233 136547450599232 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0422 09:54:52.609249 136547450599232 pyconfig.py:432] Config param sa_block_q_dq: 512 I0422 09:54:52.609264 136547450599232 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0422 09:54:52.609278 136547450599232 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0422 09:54:52.609293 136547450599232 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0422 09:54:52.609307 136547450599232 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0422 09:54:52.609323 136547450599232 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0422 09:54:52.609337 136547450599232 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0422 09:54:52.609353 136547450599232 pyconfig.py:432] Config param save_config_to_gcs: False I0422 09:54:52.609367 
136547450599232 pyconfig.py:432] Config param save_quantized_params_path: I0422 09:54:52.609383 136547450599232 pyconfig.py:432] Config param scale_embedding_for_audio: True I0422 09:54:52.609397 136547450599232 pyconfig.py:432] Config param scan_layers: True I0422 09:54:52.609412 136547450599232 pyconfig.py:432] Config param scan_layers_per_stage: False I0422 09:54:52.609426 136547450599232 pyconfig.py:432] Config param scan_pipeline_iterations: True I0422 09:54:52.609444 136547450599232 pyconfig.py:432] Config param scan_pipeline_repeats: False I0422 09:54:52.609458 136547450599232 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0422 09:54:52.609473 136547450599232 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0422 09:54:52.609487 136547450599232 pyconfig.py:432] Config param sft_train_on_completion_only: False I0422 09:54:52.609502 136547450599232 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0422 09:54:52.609516 136547450599232 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0422 09:54:52.609534 136547450599232 pyconfig.py:432] Config param shard_optimizer_over_data: False I0422 09:54:52.609548 136547450599232 pyconfig.py:432] Config param sharding_strategy: None I0422 09:54:52.609563 136547450599232 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0422 09:54:52.609578 136547450599232 pyconfig.py:432] Config param shardy: True I0422 09:54:52.609593 136547450599232 pyconfig.py:432] Config param share_kv_projections: False I0422 09:54:52.609609 136547450599232 pyconfig.py:432] Config param shared_experts: 0 I0422 09:54:52.609623 136547450599232 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0422 09:54:52.609638 136547450599232 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0422 09:54:52.609654 136547450599232 pyconfig.py:432] Config param skip_jax_distributed_system: False I0422 09:54:52.609668 136547450599232 pyconfig.py:432] Config param 
skip_step_interval: 128 I0422 09:54:52.609683 136547450599232 pyconfig.py:432] Config param skip_step_on_spikes: False I0422 09:54:52.609702 136547450599232 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0422 09:54:52.609718 136547450599232 pyconfig.py:432] Config param sliding_window_size: 0 I0422 09:54:52.609733 136547450599232 pyconfig.py:432] Config param solution_end_token: </answer> I0422 09:54:52.609748 136547450599232 pyconfig.py:432] Config param solution_start_token: <answer> I0422 09:54:52.609763 136547450599232 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0422 09:54:52.609777 136547450599232 pyconfig.py:432] Config param sparse_matmul: True I0422 09:54:52.609792 136547450599232 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0422 09:54:52.609807 136547450599232 pyconfig.py:432] Config param stack_prefill_result_cache: False I0422 09:54:52.609823 136547450599232 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0422 09:54:52.609837 136547450599232 pyconfig.py:432] Config param stack_trace_to_cloud: False I0422 09:54:52.609851 136547450599232 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0422 09:54:52.609867 136547450599232 pyconfig.py:432] Config param steps: 200000 I0422 09:54:52.609882 136547450599232 pyconfig.py:432] Config param stop_strings: None I0422 09:54:52.609898 136547450599232 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0422 09:54:52.609914 136547450599232 pyconfig.py:432] Config param student_params_to_update: None I0422 09:54:52.609944 136547450599232 pyconfig.py:432] Config param subslice_shape: I0422 09:54:52.609961 136547450599232 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0422 09:54:52.609975 136547450599232 pyconfig.py:432] Config param system_prompt: I0422 09:54:52.609990 136547450599232 pyconfig.py:432] Config param target_eval_loss: 0.0 I0422 09:54:52.610005 136547450599232 pyconfig.py:432] Config param 
teacher_overrides: {'model_name': 'llama3.1-8b'} I0422 09:54:52.610021 136547450599232 pyconfig.py:432] Config param temperature_tuning: False I0422 09:54:52.610035 136547450599232 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0422 09:54:52.610051 136547450599232 pyconfig.py:432] Config param tensorboard_dir: None I0422 09:54:52.610065 136547450599232 pyconfig.py:432] Config param tensors_on_device: None I0422 09:54:52.610080 136547450599232 pyconfig.py:432] Config param tensors_to_offload: None I0422 09:54:52.610095 136547450599232 pyconfig.py:432] Config param test_batch_start_index: 0 I0422 09:54:52.610110 136547450599232 pyconfig.py:432] Config param tile_size_for_vit: 336 I0422 09:54:52.610125 136547450599232 pyconfig.py:432] Config param tokenize_eval_data: True I0422 09:54:52.610140 136547450599232 pyconfig.py:432] Config param tokenize_train_data: True I0422 09:54:52.610155 136547450599232 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0422 09:54:52.610170 136547450599232 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0422 09:54:52.610188 136547450599232 pyconfig.py:432] Config param topk_routing_group: -1 I0422 09:54:52.610202 136547450599232 pyconfig.py:432] Config param train_data_columns: ['text'] I0422 09:54:52.610219 136547450599232 pyconfig.py:432] Config param train_fraction: 1.0 I0422 09:54:52.610234 136547450599232 pyconfig.py:432] Config param train_image_column: image I0422 09:54:52.610249 136547450599232 pyconfig.py:432] Config param train_micro_batch_size: -1 I0422 09:54:52.610264 136547450599232 pyconfig.py:432] Config param train_split: train I0422 09:54:52.610280 136547450599232 pyconfig.py:432] Config param trainable_parameters_mask: [] I0422 09:54:52.610296 136547450599232 pyconfig.py:432] Config param trainable_position_size: 2048 I0422 09:54:52.610311 136547450599232 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0422 09:54:52.610327 136547450599232 
pyconfig.py:432] Config param upload_all_profiler_results: False I0422 09:54:52.610341 136547450599232 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0422 09:54:52.610357 136547450599232 pyconfig.py:432] Config param use_agentic_rollout: False I0422 09:54:52.610372 136547450599232 pyconfig.py:432] Config param use_audio: False I0422 09:54:52.610386 136547450599232 pyconfig.py:432] Config param use_audio_in_video: False I0422 09:54:52.610400 136547450599232 pyconfig.py:432] Config param use_batch_split_schedule: False I0422 09:54:52.610416 136547450599232 pyconfig.py:432] Config param use_chat_template: False I0422 09:54:52.610431 136547450599232 pyconfig.py:432] Config param use_chunked_prefill: False I0422 09:54:52.610448 136547450599232 pyconfig.py:432] Config param use_custom_sort_vjp: True I0422 09:54:52.610463 136547450599232 pyconfig.py:432] Config param use_dpo: False I0422 09:54:52.610479 136547450599232 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0422 09:54:52.610494 136547450599232 pyconfig.py:432] Config param use_grpo: True I0422 09:54:52.610508 136547450599232 pyconfig.py:432] Config param use_indexer: False I0422 09:54:52.610523 136547450599232 pyconfig.py:432] Config param use_iota_embed: True I0422 09:54:52.610538 136547450599232 pyconfig.py:432] Config param use_jax_splash: False I0422 09:54:52.610554 136547450599232 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0422 09:54:52.610569 136547450599232 pyconfig.py:432] Config param use_mrope: False I0422 09:54:52.610583 136547450599232 pyconfig.py:432] Config param use_multimodal: False I0422 09:54:52.610599 136547450599232 pyconfig.py:432] Config param use_pathways: True I0422 09:54:52.610615 136547450599232 pyconfig.py:432] Config param use_post_attn_norm: False I0422 09:54:52.610631 136547450599232 pyconfig.py:432] Config param use_post_ffw_norm: False I0422 09:54:52.610645 136547450599232 pyconfig.py:432] Config param use_qk_clip: False I0422 
09:54:52.610660 136547450599232 pyconfig.py:432] Config param use_qk_norm: False I0422 09:54:52.610675 136547450599232 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0422 09:54:52.610693 136547450599232 pyconfig.py:432] Config param use_qwix_quantization: False I0422 09:54:52.610708 136547450599232 pyconfig.py:432] Config param use_ragged_attention: False I0422 09:54:52.610724 136547450599232 pyconfig.py:432] Config param use_random_routing: False I0422 09:54:52.610738 136547450599232 pyconfig.py:432] Config param use_replicator_service: False I0422 09:54:52.610753 136547450599232 pyconfig.py:432] Config param use_ring_of_experts: False I0422 09:54:52.610768 136547450599232 pyconfig.py:432] Config param use_sft: False I0422 09:54:52.610783 136547450599232 pyconfig.py:432] Config param use_splash_scheduler: False I0422 09:54:52.610797 136547450599232 pyconfig.py:432] Config param use_tokamax_gmm: False I0422 09:54:52.610813 136547450599232 pyconfig.py:432] Config param use_tokamax_splash: False I0422 09:54:52.610827 136547450599232 pyconfig.py:432] Config param use_truncation: True I0422 09:54:52.610842 136547450599232 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0422 09:54:52.610857 136547450599232 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0422 09:54:52.610872 136547450599232 pyconfig.py:432] Config param use_vertex_tensorboard: False I0422 09:54:52.610886 136547450599232 pyconfig.py:432] Config param using_pipeline_parallelism: False I0422 09:54:52.610900 136547450599232 pyconfig.py:432] Config param v_head_dim: 128 I0422 09:54:52.610914 136547450599232 pyconfig.py:432] Config param v_norm_with_scale: True I0422 09:54:52.610942 136547450599232 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0422 09:54:52.610960 136547450599232 pyconfig.py:432] Config param vertex_tensorboard_project: I0422 09:54:52.610975 136547450599232 pyconfig.py:432] Config param vertex_tensorboard_region: I0422 
09:54:52.610990 136547450599232 pyconfig.py:432] Config param video_path: I0422 09:54:52.611006 136547450599232 pyconfig.py:432] Config param video_placeholder: <|video|> I0422 09:54:52.611021 136547450599232 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0422 09:54:52.611036 136547450599232 pyconfig.py:432] Config param vision_output_length: -1 I0422 09:54:52.611052 136547450599232 pyconfig.py:432] Config param vllm_additional_config: {} I0422 09:54:52.611068 136547450599232 pyconfig.py:432] Config param vllm_hf_config_path: I0422 09:54:52.611083 136547450599232 pyconfig.py:432] Config param vllm_hf_overrides: {} I0422 09:54:52.611099 136547450599232 pyconfig.py:432] Config param vocab_size: 32000 I0422 09:54:52.611115 136547450599232 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0422 09:54:52.611130 136547450599232 pyconfig.py:432] Config param weight_dtype: float32 I0422 09:54:52.611155 136547450599232 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0422 09:54:52.611171 136547450599232 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512 I0422 09:54:52.611186 136547450599232 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0422 09:54:52.611202 136547450599232 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0422 09:54:52.611218 136547450599232 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0422 09:54:52.611232 136547450599232 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0422 09:54:52.611247 136547450599232 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0422 09:54:52.611263 136547450599232 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0422 09:54:52.611278 136547450599232 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0422 09:54:52.611293 136547450599232 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0422 09:54:52.611309 136547450599232 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0422 
09:54:52.611323 136547450599232 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0422 09:54:52.611339 136547450599232 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0422 09:54:52.611353 136547450599232 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0422 09:54:52.611368 136547450599232 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0422 09:54:52.611382 136547450599232 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0422 09:54:52.611398 136547450599232 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0422 09:54:52.611413 136547450599232 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0422 09:54:52.611428 136547450599232 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0422 09:54:52.611443 136547450599232 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0422 09:54:52.611459 136547450599232 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0422 09:54:52.611477 136547450599232 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0422 09:54:52.611492 136547450599232 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0422 09:54:52.611508 136547450599232 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0422 09:54:52.611522 136547450599232 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0422 09:54:52.611540 136547450599232 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0422 09:54:52.612098 136547450599232 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0422 09:54:52.612144 136547450599232 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0422 09:54:56.915963 136547450599232 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. 
I0422 09:54:56.919055 136547450599232 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0422 09:54:56.919175 136547450599232 train_distill.py:596] Applying logical axis rules for model initialization and training... I0422 09:54:56.919245 136547450599232 train_distill.py:600] Loading Student from ... I0422 09:54:56.919274 136547450599232 train_distill.py:169] --- Student Configuration --- I0422 09:54:56.919297 136547450599232 train_distill.py:170] Model Name: gpt3-52k I0422 09:54:56.919319 136547450599232 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0422 09:54:56.919338 136547450599232 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0422 09:54:56.919356 136547450599232 train_distill.py:175] Vocab Size: 32000 I0422 09:54:56.919374 136547450599232 train_distill.py:176] Checkpoint: I0422 09:54:56.919392 136547450599232 train_distill.py:465] Initializing model: gpt3-52k... I0422 09:54:58.197578 136547450599232 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... I0422 09:54:58.197690 136547450599232 train_distill.py:169] --- Teacher Configuration --- I0422 09:54:58.197719 136547450599232 train_distill.py:170] Model Name: gpt3-52k I0422 09:54:58.197751 136547450599232 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0422 09:54:58.197774 136547450599232 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0422 09:54:58.197792 136547450599232 train_distill.py:175] Vocab Size: 32000 I0422 09:54:58.197814 136547450599232 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0422 09:54:58.197832 136547450599232 train_distill.py:465] Initializing model: gpt3-52k... 
I0422 09:54:59.365452 136547450599232 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0422 09:54:59.365889 136547450599232 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c2fb6a90200>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0422 09:54:59.365962 136547450599232 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0422 09:54:59.875765 136547450599232 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0422 09:55:00.418585 2138 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0422 09:55:01.418877 136547450599232 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. W0422 09:55:03.991002 136547450599232 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. 
I0422 09:55:03.991383 136547450599232 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0422 09:55:04.307564 136547450599232 checkpointer.py:318] Finished restoring checkpoint in 3.27 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0422 09:55:04.999694 136547450599232 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0422 09:55:05.063825 136547450599232 config.py:112] TensorFlow version 2.20.0 available. I0422 09:55:05.064362 136547450599232 config.py:125] JAX version 0.8.3 available. E0422 09:55:07.243822 136547450599232 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0422 09:55:07.244056 136547450599232 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0422 09:55:07.247095 136547450599232 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0422 09:55:07.247157 136547450599232 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0422 09:55:07.247221 136547450599232 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0422 09:55:07.247298 136547450599232 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c2fb6a90200>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0422 09:55:07.247339 136547450599232 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0422 09:55:07.247372 136547450599232 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c2fb6a90200>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0422 09:55:07.247415 136547450599232 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c258f1e1640>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b530>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f5b470>}, handler_registry=None I0422 
09:55:07.247613 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c258f1e1640>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0422 09:55:07.247655 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b530>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0422 09:55:07.247681 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f5b470>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0422 09:55:07.247704 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c190c72d2e0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0422 09:55:07.247731 136547450599232 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c258f1e1640>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c258f1e1640>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b530>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b530>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f5b470>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f5b470>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c190c72d2e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c190c72d2e0>}). 
I0422 09:55:07.248157 136547450599232 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c17738709a0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0422 09:55:10.345420 136547450599232 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114_07_distill_smoke/checkpoints I0422 09:55:10.354165 136547450599232 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c190c72c3b0> I0422 09:55:10.354280 136547450599232 
pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0422 09:55:10.354345 136547450599232 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c2fb6a90200>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0422 09:55:10.354380 136547450599232 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0422 09:55:10.354412 136547450599232 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7c2fb6a90200>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0422 09:55:10.354454 136547450599232 checkpoint_manager.py:1983] [process=3][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0422 09:55:10.354506 136547450599232 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136547450599232 count=1 at 0x7c1773817180>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c1773f5b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c1773f5b230>, _write_futures=[]) I0422 09:55:10.354851 136547450599232 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136547450599232 count=1 at 0x7c1773817180>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c1773f5b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c1773f5b230>, _write_futures=[]) I0422 09:55:10.354879 136547450599232 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136547450599232 count=1 at 0x7c1773817180>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c1773f5b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c1773f5b230>, _write_futures=[]) I0422 09:55:10.354910 136547450599232 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b440>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f59130>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59a00>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c1773f5bda0>}, handler_registry=None I0422 09:55:10.355023 136547450599232 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b440>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0422 09:55:10.355057 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f59130>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0422 09:55:10.355082 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59a00>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0422 09:55:10.355110 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c1773f5bda0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0422 09:55:10.355133 136547450599232 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59010>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0422 09:55:10.355159 136547450599232 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b440>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f5b440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f59130>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c1773f59130>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59a00>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59a00>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c1773f5bda0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c1773f5bda0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59010>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c1773f59010>}). I0422 09:55:10.355231 136547450599232 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c1773870ae0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0422 09:55:10.735982 136547450599232 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114_07_distill_smoke/checkpoints I0422 09:55:11.184143 136547450599232 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, 
preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260422_093114_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c190d748320> I0422 09:55:11.184743 136547450599232 train_distill.py:691] Starting Distillation Training... I0422 09:55:11.184860 136547450599232 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0422 09:55:11.305183 136547450599232 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0422 09:55:11.306908 136403690170112 grain_pool.py:367] Grain pool will use 1 processes. I0422 09:55:11.333538 136403690170112 grain_pool.py:440] Grain pool will start child processes. I0422 09:55:11.338585 136403690170112 grain_pool.py:448] Grain pool started all child processes. 
2026-04-22 09:55:17.383457: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0422 09:55:20.619355 136547450599232 utils.py:86] Train loop finished in: 9.3136 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0)}
I0422 09:55:20.962664 136403690170112 grain_pool.py:542] Grain pool is exiting.
I0422 09:55:20.962768 136403690170112 grain_pool.py:547] Shutting down multiprocessing system.
I0422 09:55:22.423396 136403690170112 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:14<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Wed Apr 22 09:55:30 UTC 2026
EXIT_CODE=1