XPK Start: Sun Apr 19 22:05:11 UTC 2026
2026-04-19 22:05:27.718584: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 22:05:31.245958 132256122660672 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-19 22:05:40,284:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0419 22:05:40.284542 132256122660672 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-19 22:05:40,286:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-o51re-slice-job-0-0.mt-07-distill-smoke-o51re:8482
I0419 22:05:40.286754 132256122660672 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-o51re-slice-job-0-0.mt-07-distill-smoke-o51re:8482
I0419 22:05:42.909840 132256122660672 max_utils.py:284] Jax distributed system initialized!
I0419 22:05:48.244493 132256122660672 max_utils.py:244] Jax distributed system is already initialized.
I0419 22:05:48.708741 132256122660672 max_utils.py:244] Jax distributed system is already initialized.
I0419 22:05:48.709900 132256122660672 pyconfig.py:432] Config param abort_on_inf_loss: True I0419 22:05:48.709951 132256122660672 pyconfig.py:432] Config param abort_on_nan_loss: True I0419 22:05:48.709977 132256122660672 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0419 22:05:48.709998 132256122660672 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0419 22:05:48.710019 132256122660672 pyconfig.py:432] Config param activation_function_for_audio: gelu I0419 22:05:48.710038 132256122660672 pyconfig.py:432] Config param activations_in_float32: False I0419 22:05:48.710057 132256122660672 pyconfig.py:432] Config param adam_b1: 0.9 I0419 22:05:48.710076 132256122660672 pyconfig.py:432] Config param adam_b2: 0.95 I0419 22:05:48.710095 132256122660672 pyconfig.py:432] Config param adam_eps: 1e-08 I0419 22:05:48.710117 132256122660672 pyconfig.py:432] Config param adam_eps_root: 0.0 I0419 22:05:48.710134 132256122660672 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0419 22:05:48.710151 132256122660672 pyconfig.py:432] Config param adamw_mask: [] I0419 22:05:48.710168 132256122660672 pyconfig.py:432] Config param add_bos: True I0419 22:05:48.710186 132256122660672 pyconfig.py:432] Config param add_eos: True I0419 22:05:48.710202 132256122660672 pyconfig.py:432] Config param allow_split_physical_axes: False I0419 22:05:48.710218 132256122660672 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0419 22:05:48.710235 132256122660672 pyconfig.py:432] Config param async_checkpointing: True I0419 22:05:48.710252 132256122660672 pyconfig.py:432] Config param async_scheduling: False I0419 22:05:48.710268 132256122660672 pyconfig.py:432] Config param attention: dot_product I0419 22:05:48.710285 132256122660672 pyconfig.py:432] Config param attention_bias: False I0419 22:05:48.710302 132256122660672 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0419 22:05:48.710320 132256122660672 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0419 22:05:48.710352 132256122660672 pyconfig.py:432] Config param attention_output_dim: -1 I0419 22:05:48.710368 132256122660672 pyconfig.py:432] Config param attention_sink: False I0419 22:05:48.710385 132256122660672 pyconfig.py:432] Config param attention_type: global I0419 22:05:48.710401 132256122660672 pyconfig.py:432] Config param attn_logits_soft_cap: None I0419 22:05:48.710418 132256122660672 pyconfig.py:432] Config param audio_path: I0419 22:05:48.710433 132256122660672 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0419 22:05:48.710449 132256122660672 pyconfig.py:432] Config param autoregressive_decode_assert: I0419 22:05:48.710466 132256122660672 pyconfig.py:432] Config param base_config: base.yml I0419 22:05:48.710482 132256122660672 pyconfig.py:432] Config param base_emb_dim: 16 I0419 22:05:48.710499 132256122660672 pyconfig.py:432] Config param base_mlp_dim: 64 I0419 22:05:48.710515 132256122660672 pyconfig.py:432] Config param base_moe_mlp_dim: -1 I0419 22:05:48.710530 132256122660672 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0419 22:05:48.710547 132256122660672 pyconfig.py:432] Config param base_num_kv_heads: 2 I0419 22:05:48.710563 132256122660672 pyconfig.py:432] Config param base_num_query_heads: 2 I0419 22:05:48.710579 132256122660672 pyconfig.py:432] Config param base_output_directory: I0419 22:05:48.710595 132256122660672 pyconfig.py:432] Config param batch_size: 1 I0419 22:05:48.710612 132256122660672 pyconfig.py:432] Config param batch_split_factor: 1 I0419 22:05:48.710631 132256122660672 pyconfig.py:432] Config param beta_fast: 32 I0419 22:05:48.710648 132256122660672 pyconfig.py:432] Config param beta_slow: 1 I0419 22:05:48.710664 132256122660672 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0419 22:05:48.710681 132256122660672 pyconfig.py:432] Config param capacity_factor: -1.0 I0419 22:05:48.710698 132256122660672 pyconfig.py:432] Config 
param cast_logits_to_fp32: True I0419 22:05:48.710714 132256122660672 pyconfig.py:432] Config param chat_template: I0419 22:05:48.710729 132256122660672 pyconfig.py:432] Config param chat_template_path: I0419 22:05:48.710746 132256122660672 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0419 22:05:48.710763 132256122660672 pyconfig.py:432] Config param checkpoint_dir: None I0419 22:05:48.710782 132256122660672 pyconfig.py:432] Config param checkpoint_is_quantized: False I0419 22:05:48.710799 132256122660672 pyconfig.py:432] Config param checkpoint_period: 2000 I0419 22:05:48.710815 132256122660672 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0419 22:05:48.710831 132256122660672 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0419 22:05:48.710849 132256122660672 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0419 22:05:48.710864 132256122660672 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0419 22:05:48.710880 132256122660672 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0419 22:05:48.710897 132256122660672 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0419 22:05:48.710912 132256122660672 pyconfig.py:432] Config param chips_per_vm: 4 I0419 22:05:48.710927 132256122660672 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0419 22:05:48.710943 132256122660672 pyconfig.py:432] Config param collect_stack_trace: False I0419 22:05:48.710959 132256122660672 pyconfig.py:432] Config param colocated_python_checkpointing: False I0419 22:05:48.710975 132256122660672 pyconfig.py:432] Config param colocated_python_data_input: False I0419 22:05:48.710991 132256122660672 pyconfig.py:432] Config param compile_topology: I0419 22:05:48.711006 132256122660672 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0419 22:05:48.711020 132256122660672 pyconfig.py:432] Config param compile_xla_flags: I0419 
22:05:48.711036 132256122660672 pyconfig.py:432] Config param compiled_trainstep_file: I0419 22:05:48.711052 132256122660672 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0419 22:05:48.711068 132256122660672 pyconfig.py:432] Config param constant_bound_config: [] I0419 22:05:48.711084 132256122660672 pyconfig.py:432] Config param context: RematLocation.REMAT I0419 22:05:48.711101 132256122660672 pyconfig.py:432] Config param context_parallel_load_balance: True I0419 22:05:48.711116 132256122660672 pyconfig.py:432] Config param context_parallel_size: 1 I0419 22:05:48.711132 132256122660672 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0419 22:05:48.711148 132256122660672 pyconfig.py:432] Config param context_sharding: context I0419 22:05:48.711164 132256122660672 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0419 22:05:48.711180 132256122660672 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0419 22:05:48.711194 132256122660672 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0419 22:05:48.711210 132256122660672 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0419 22:05:48.711226 132256122660672 pyconfig.py:432] Config param custom_mesh: I0419 22:05:48.711241 132256122660672 pyconfig.py:432] Config param custom_mesh_and_rule: I0419 22:05:48.711255 132256122660672 pyconfig.py:432] Config param d_model_for_audio: 256 I0419 22:05:48.711270 132256122660672 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0419 22:05:48.711288 132256122660672 pyconfig.py:432] Config param data_shuffle_seed: 0 I0419 22:05:48.711304 132256122660672 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0419 22:05:48.711320 132256122660672 pyconfig.py:432] Config param dataset_path: I0419 22:05:48.711344 132256122660672 pyconfig.py:432] Config 
param dataset_type: DatasetType.HF I0419 22:05:48.711362 132256122660672 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1 I0419 22:05:48.711376 132256122660672 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0419 22:05:48.711392 132256122660672 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0419 22:05:48.711408 132256122660672 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0419 22:05:48.711423 132256122660672 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0419 22:05:48.711438 132256122660672 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0419 22:05:48.711454 132256122660672 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0419 22:05:48.711469 132256122660672 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0419 22:05:48.711484 132256122660672 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0419 22:05:48.711501 132256122660672 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0419 22:05:48.711517 132256122660672 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0419 22:05:48.711532 132256122660672 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0419 22:05:48.711547 132256122660672 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0419 22:05:48.711562 132256122660672 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0419 22:05:48.711578 132256122660672 pyconfig.py:432] Config param debug: {'rl': False} I0419 22:05:48.711596 132256122660672 pyconfig.py:432] Config param debug_sharding: False I0419 22:05:48.711611 132256122660672 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0419 22:05:48.711630 132256122660672 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0419 22:05:48.711648 132256122660672 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0419 22:05:48.711663 132256122660672 pyconfig.py:432] Config param 
decode_sampling_top_k: 0 I0419 22:05:48.711679 132256122660672 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3 I0419 22:05:48.711696 132256122660672 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0419 22:05:48.711713 132256122660672 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0419 22:05:48.711729 132256122660672 pyconfig.py:432] Config param degenerate_group_masking: True I0419 22:05:48.711745 132256122660672 pyconfig.py:432] Config param dense_init_scale: 1.0 I0419 22:05:48.711761 132256122660672 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0419 22:05:48.711776 132256122660672 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0419 22:05:48.711792 132256122660672 pyconfig.py:432] Config param diloco_sync_period: 36 I0419 22:05:48.711808 132256122660672 pyconfig.py:432] Config param distill_alpha: 0.5 I0419 22:05:48.711824 132256122660672 pyconfig.py:432] Config param distill_beta: 0.0 I0419 22:05:48.711840 132256122660672 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0419 22:05:48.711857 132256122660672 pyconfig.py:432] Config param distill_layer_indices: None I0419 22:05:48.711873 132256122660672 pyconfig.py:432] Config param distill_temperature: 1.0 I0419 22:05:48.711889 132256122660672 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0419 22:05:48.711905 132256122660672 pyconfig.py:432] Config param dpo_beta: 0.1 I0419 22:05:48.711922 132256122660672 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0419 22:05:48.711941 132256122660672 pyconfig.py:432] Config param dq_reduction_steps: 0 I0419 22:05:48.711966 132256122660672 pyconfig.py:432] Config param dropout_rate: 0.0 I0419 22:05:48.711991 132256122660672 pyconfig.py:432] Config param dtype: bfloat16 I0419 22:05:48.712037 132256122660672 pyconfig.py:432] Config param dtype_mm: float32 I0419 22:05:48.712063 132256122660672 pyconfig.py:432] Config param dump_hlo: False I0419 
22:05:48.712080 132256122660672 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0419 22:05:48.712097 132256122660672 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-19-22-05/xla_dump I0419 22:05:48.712113 132256122660672 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0419 22:05:48.712129 132256122660672 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0419 22:05:48.712145 132256122660672 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0419 22:05:48.712159 132256122660672 pyconfig.py:432] Config param dump_hlo_upload_all: False I0419 22:05:48.712175 132256122660672 pyconfig.py:432] Config param dump_hlo_xla_flags: I0419 22:05:48.712190 132256122660672 pyconfig.py:432] Config param dump_jaxpr: False I0419 22:05:48.712206 132256122660672 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0419 22:05:48.712223 132256122660672 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-19-22-05/jaxpr_dump I0419 22:05:48.712237 132256122660672 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0419 22:05:48.712253 132256122660672 pyconfig.py:432] Config param dump_step: -1 I0419 22:05:48.712268 132256122660672 pyconfig.py:432] Config param elastic_enabled: False I0419 22:05:48.712283 132256122660672 pyconfig.py:432] Config param elastic_max_retries: 10 I0419 22:05:48.712299 132256122660672 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0419 22:05:48.712315 132256122660672 pyconfig.py:432] Config param emb_dim: 16 I0419 22:05:48.712330 132256122660672 pyconfig.py:432] Config param enable_autocheckpoint: False I0419 22:05:48.712374 132256122660672 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0419 22:05:48.712391 132256122660672 pyconfig.py:432] Config param enable_checkpointing: True I0419 22:05:48.712407 132256122660672 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0419 
22:05:48.712423 132256122660672 pyconfig.py:432] Config param enable_data_shuffling: True I0419 22:05:48.712437 132256122660672 pyconfig.py:432] Config param enable_diloco: False I0419 22:05:48.712453 132256122660672 pyconfig.py:432] Config param enable_dp_attention: False I0419 22:05:48.712467 132256122660672 pyconfig.py:432] Config param enable_dropout: False I0419 22:05:48.712483 132256122660672 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0419 22:05:48.712499 132256122660672 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0419 22:05:48.712516 132256122660672 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0419 22:05:48.712530 132256122660672 pyconfig.py:432] Config param enable_goodput_recording: False I0419 22:05:48.712546 132256122660672 pyconfig.py:432] Config param enable_jax_profiler: False I0419 22:05:48.712560 132256122660672 pyconfig.py:432] Config param enable_llm_inference_pool: False I0419 22:05:48.712576 132256122660672 pyconfig.py:432] Config param enable_model_warmup: False I0419 22:05:48.712590 132256122660672 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0419 22:05:48.712607 132256122660672 pyconfig.py:432] Config param enable_nnx: False I0419 22:05:48.712626 132256122660672 pyconfig.py:432] Config param enable_orbax_v1: False I0419 22:05:48.712643 132256122660672 pyconfig.py:432] Config param enable_padding_causal_mask: True I0419 22:05:48.712659 132256122660672 pyconfig.py:432] Config param enable_pathways_goodput: False I0419 22:05:48.712674 132256122660672 pyconfig.py:432] Config param enable_prefix_caching: False I0419 22:05:48.712689 132256122660672 pyconfig.py:432] Config param enable_rampup_batch_size: False I0419 22:05:48.712704 132256122660672 pyconfig.py:432] Config param enable_single_controller: False I0419 22:05:48.712719 132256122660672 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0419 22:05:48.712734 132256122660672 
pyconfig.py:432] Config param enable_tensorboard: True I0419 22:05:48.712750 132256122660672 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0419 22:05:48.712766 132256122660672 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0419 22:05:48.712782 132256122660672 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0419 22:05:48.712798 132256122660672 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0419 22:05:48.712812 132256122660672 pyconfig.py:432] Config param engram: RematLocation.REMAT I0419 22:05:48.712830 132256122660672 pyconfig.py:432] Config param engram_head_dim: 1280 I0419 22:05:48.712845 132256122660672 pyconfig.py:432] Config param engram_kernel_size: 4 I0419 22:05:48.712860 132256122660672 pyconfig.py:432] Config param engram_layers: [] I0419 22:05:48.712878 132256122660672 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0419 22:05:48.712893 132256122660672 pyconfig.py:432] Config param engram_num_heads: 8 I0419 22:05:48.712908 132256122660672 pyconfig.py:432] Config param engram_seed: 0 I0419 22:05:48.712924 132256122660672 pyconfig.py:432] Config param engram_vocab_bases: [] I0419 22:05:48.712939 132256122660672 pyconfig.py:432] Config param epsilon_high: None I0419 22:05:48.712955 132256122660672 pyconfig.py:432] Config param eval_corr_lst: False I0419 22:05:48.712970 132256122660672 pyconfig.py:432] Config param eval_data_columns: ['text'] I0419 22:05:48.712986 132256122660672 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0419 22:05:48.713001 132256122660672 pyconfig.py:432] Config param eval_image_column: image I0419 22:05:48.713018 132256122660672 pyconfig.py:432] Config param eval_interval: -1 I0419 22:05:48.713033 132256122660672 pyconfig.py:432] Config param eval_make_lst: False I0419 22:05:48.713049 132256122660672 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0419 22:05:48.713064 132256122660672 pyconfig.py:432] Config param 
eval_sampling_strategy: greedy I0419 22:05:48.713078 132256122660672 pyconfig.py:432] Config param eval_split: validation I0419 22:05:48.713094 132256122660672 pyconfig.py:432] Config param eval_steps: -1 I0419 22:05:48.713108 132256122660672 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0419 22:05:48.713125 132256122660672 pyconfig.py:432] Config param final_logits_soft_cap: None I0419 22:05:48.713139 132256122660672 pyconfig.py:432] Config param first_num_dense_layers: 0 I0419 22:05:48.713155 132256122660672 pyconfig.py:432] Config param float32_gate_logits: False I0419 22:05:48.713169 132256122660672 pyconfig.py:432] Config param float32_logits: False I0419 22:05:48.713185 132256122660672 pyconfig.py:432] Config param float32_qk_product: False I0419 22:05:48.713200 132256122660672 pyconfig.py:432] Config param float32_weight_sum: True I0419 22:05:48.713216 132256122660672 pyconfig.py:432] Config param force_q_layout: False I0419 22:05:48.713230 132256122660672 pyconfig.py:432] Config param force_unroll: False I0419 22:05:48.713246 132256122660672 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0419 22:05:48.713261 132256122660672 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0419 22:05:48.713275 132256122660672 pyconfig.py:432] Config param fused_mlp: False I0419 22:05:48.713291 132256122660672 pyconfig.py:432] Config param fused_qkv: True I0419 22:05:48.713306 132256122660672 pyconfig.py:432] Config param gcs_metrics: False I0419 22:05:48.713321 132256122660672 pyconfig.py:432] Config param gdn_chunk_size: 64 I0419 22:05:48.713371 132256122660672 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0419 22:05:48.713390 132256122660672 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0419 22:05:48.713405 132256122660672 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0419 22:05:48.713421 132256122660672 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0419 22:05:48.713435 132256122660672 
pyconfig.py:432] Config param gdn_value_head_dim: 128 I0419 22:05:48.713451 132256122660672 pyconfig.py:432] Config param generate_padding_batch_eval: False I0419 22:05:48.713467 132256122660672 pyconfig.py:432] Config param generate_padding_batch_train: False I0419 22:05:48.713482 132256122660672 pyconfig.py:432] Config param generate_slice: v5e-16 I0419 22:05:48.713497 132256122660672 pyconfig.py:432] Config param generation_configs: {} I0419 22:05:48.713512 132256122660672 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0419 22:05:48.713528 132256122660672 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0419 22:05:48.713543 132256122660672 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0419 22:05:48.713558 132256122660672 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0419 22:05:48.713574 132256122660672 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0419 22:05:48.713590 132256122660672 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0419 22:05:48.713605 132256122660672 pyconfig.py:432] Config param global_head_dim: 0 I0419 22:05:48.713624 132256122660672 pyconfig.py:432] Config param global_num_kv_heads: 0 I0419 22:05:48.713640 132256122660672 pyconfig.py:432] Config param global_parameter_scale: 1 I0419 22:05:48.713656 132256122660672 pyconfig.py:432] Config param global_rampup_samples: 500 I0419 22:05:48.713673 132256122660672 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0419 22:05:48.713688 132256122660672 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0419 22:05:48.713705 132256122660672 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0419 22:05:48.713720 132256122660672 pyconfig.py:432] Config param grad_dtype: float32 I0419 22:05:48.713754 132256122660672 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0419 22:05:48.713771 132256122660672 pyconfig.py:432] Config param 
gradient_clipping_threshold: 1.0 I0419 22:05:48.713787 132256122660672 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0419 22:05:48.713802 132256122660672 pyconfig.py:432] Config param grain_eval_files: I0419 22:05:48.713818 132256122660672 pyconfig.py:432] Config param grain_file_type: arrayrecord I0419 22:05:48.713835 132256122660672 pyconfig.py:432] Config param grain_num_threads: 16 I0419 22:05:48.713851 132256122660672 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0419 22:05:48.713867 132256122660672 pyconfig.py:432] Config param grain_packing_type: first_fit I0419 22:05:48.713884 132256122660672 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0419 22:05:48.713901 132256122660672 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0419 22:05:48.713916 132256122660672 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0419 22:05:48.713931 132256122660672 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0419 22:05:48.713946 132256122660672 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0419 22:05:48.713963 132256122660672 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0419 22:05:48.713978 132256122660672 pyconfig.py:432] Config param grain_train_files: I0419 22:05:48.713994 132256122660672 pyconfig.py:432] Config param grain_train_mixture_config_path: I0419 22:05:48.714010 132256122660672 pyconfig.py:432] Config param grain_worker_count: 1 I0419 22:05:48.714025 132256122660672 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0419 22:05:48.714041 132256122660672 pyconfig.py:432] Config param grpo_beta: 0.08 I0419 22:05:48.714058 132256122660672 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0419 22:05:48.714077 132256122660672 pyconfig.py:432] Config param hardware: tpu I0419 22:05:48.714095 132256122660672 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0419 22:05:48.714111 132256122660672 pyconfig.py:432] Config param 
head_dim: 8 I0419 22:05:48.714127 132256122660672 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0419 22:05:48.714144 132256122660672 pyconfig.py:432] Config param hf_data_dir: None I0419 22:05:48.714164 132256122660672 pyconfig.py:432] Config param hf_eval_files: None I0419 22:05:48.714181 132256122660672 pyconfig.py:432] Config param hf_eval_split: None I0419 22:05:48.714196 132256122660672 pyconfig.py:432] Config param hf_name: None I0419 22:05:48.714214 132256122660672 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0419 22:05:48.714231 132256122660672 pyconfig.py:432] Config param hf_train_files: None I0419 22:05:48.714245 132256122660672 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0419 22:05:48.714261 132256122660672 pyconfig.py:432] Config param hide_profiler_step_metric: False I0419 22:05:48.714276 132256122660672 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0419 22:05:48.714292 132256122660672 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0419 22:05:48.714308 132256122660672 pyconfig.py:432] Config param ici_context_parallelism: 1 I0419 22:05:48.714324 132256122660672 pyconfig.py:432] Config param ici_data_parallelism: 1 I0419 22:05:48.714348 132256122660672 pyconfig.py:432] Config param ici_diloco_parallelism: 1 I0419 22:05:48.714366 132256122660672 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0419 22:05:48.714382 132256122660672 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0419 22:05:48.714397 132256122660672 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0419 22:05:48.714413 132256122660672 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0419 22:05:48.714431 132256122660672 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0419 22:05:48.714446 132256122660672 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0419 22:05:48.714462 132256122660672 
pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0419 22:05:48.714476 132256122660672 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0419 22:05:48.714492 132256122660672 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1 I0419 22:05:48.714507 132256122660672 pyconfig.py:432] Config param image_path: I0419 22:05:48.714523 132256122660672 pyconfig.py:432] Config param image_placeholder: <|image|> I0419 22:05:48.714548 132256122660672 pyconfig.py:432] Config param image_size_for_vit: 896 I0419 22:05:48.714563 132256122660672 pyconfig.py:432] Config param indexer_head_dim: 128 I0419 22:05:48.714579 132256122660672 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0419 22:05:48.714595 132256122660672 pyconfig.py:432] Config param indexer_n_heads: 64 I0419 22:05:48.714611 132256122660672 pyconfig.py:432] Config param indexer_sparse_training: False I0419 22:05:48.714629 132256122660672 pyconfig.py:432] Config param indexer_topk: 2048 I0419 22:05:48.714645 132256122660672 pyconfig.py:432] Config param inference_benchmark_test: False I0419 22:05:48.714662 132256122660672 pyconfig.py:432] Config param inference_metadata_file: I0419 22:05:48.714678 132256122660672 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0419 22:05:48.714693 132256122660672 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0419 22:05:48.714708 132256122660672 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0419 22:05:48.714724 132256122660672 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0419 22:05:48.714740 132256122660672 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0419 22:05:48.714755 132256122660672 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0419 22:05:48.714770 132256122660672 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0419 
22:05:48.714785 132256122660672 pyconfig.py:432] Config param init_weights_seed: 0 I0419 22:05:48.714801 132256122660672 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0419 22:05:48.714817 132256122660672 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0419 22:05:48.714832 132256122660672 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0419 22:05:48.714847 132256122660672 pyconfig.py:432] Config param internal_compile: False I0419 22:05:48.714863 132256122660672 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0419 22:05:48.714879 132256122660672 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0419 22:05:48.714895 132256122660672 pyconfig.py:432] Config param jax_debug_log_modules: I0419 22:05:48.714909 132256122660672 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0419 22:05:48.714925 132256122660672 pyconfig.py:432] Config param jax_profiler_port: 9999 I0419 22:05:48.714939 132256122660672 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0419 22:05:48.714956 132256122660672 pyconfig.py:432] Config param kv_cache_buffer: 256 I0419 22:05:48.714972 132256122660672 pyconfig.py:432] Config param kv_lora_rank: 512 I0419 22:05:48.714987 132256122660672 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0419 22:05:48.715004 132256122660672 pyconfig.py:432] Config param kv_quant_dtype: int8 I0419 22:05:48.715019 132256122660672 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0419 22:05:48.715036 132256122660672 pyconfig.py:432] Config param learning_rate: 0.0002 I0419 22:05:48.715052 132256122660672 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0419 22:05:48.715068 132256122660672 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0419 22:05:48.715084 132256122660672 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 
I0419 22:05:48.715099 132256122660672 pyconfig.py:432] Config param load_checkpoint_only_once: False I0419 22:05:48.715114 132256122660672 pyconfig.py:432] Config param load_from_prefill_dir: False I0419 22:05:48.715130 132256122660672 pyconfig.py:432] Config param load_full_state_path: I0419 22:05:48.715146 132256122660672 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0419 22:05:48.715162 132256122660672 pyconfig.py:432] Config param local_checkpoint_directory: I0419 22:05:48.715178 132256122660672 pyconfig.py:432] Config param local_checkpoint_period: 0 I0419 22:05:48.715192 132256122660672 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0419 22:05:48.715208 132256122660672 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0419 22:05:48.715222 132256122660672 pyconfig.py:432] Config param log_config: True I0419 22:05:48.715238 132256122660672 pyconfig.py:432] Config param log_period: 10 I0419 22:05:48.715253 132256122660672 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), 
('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), 
('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0419 22:05:48.715331 132256122660672 pyconfig.py:432] Config param logits_dot_in_fp32: False I0419 22:05:48.715358 132256122660672 pyconfig.py:432] Config param logits_via_embedding: True I0419 
22:05:48.715375 132256122660672 pyconfig.py:432] Config param lora_input_adapters_path: I0419 22:05:48.715391 132256122660672 pyconfig.py:432] Config param loss_algo: grpo I0419 22:05:48.715407 132256122660672 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0419 22:05:48.715425 132256122660672 pyconfig.py:432] Config param managed_mldiagnostics: False I0419 22:05:48.715441 132256122660672 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0419 22:05:48.715455 132256122660672 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0419 22:05:48.715471 132256122660672 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0419 22:05:48.715488 132256122660672 pyconfig.py:432] Config param max_checkify: False I0419 22:05:48.715504 132256122660672 pyconfig.py:432] Config param max_concurrency: 256 I0419 22:05:48.715521 132256122660672 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0419 22:05:48.715536 132256122660672 pyconfig.py:432] Config param max_num_batched_tokens: None I0419 22:05:48.715553 132256122660672 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0419 22:05:48.715569 132256122660672 pyconfig.py:432] Config param max_num_images_per_example: -1 I0419 22:05:48.715584 132256122660672 pyconfig.py:432] Config param max_num_seqs: None I0419 22:05:48.715599 132256122660672 pyconfig.py:432] Config param max_position_embeddings: 163840 I0419 22:05:48.715615 132256122660672 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0419 22:05:48.715635 132256122660672 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0419 22:05:48.715651 132256122660672 pyconfig.py:432] Config param max_segments_per_seq: -1 I0419 22:05:48.715667 132256122660672 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0419 22:05:48.715683 132256122660672 pyconfig.py:432] Config param max_target_length: 2048 I0419 22:05:48.715698 132256122660672 pyconfig.py:432] 
Config param max_timescale_for_audio: 10000.0 I0419 22:05:48.715714 132256122660672 pyconfig.py:432] Config param megablox: True I0419 22:05:48.715729 132256122660672 pyconfig.py:432] Config param merge_gating_gmm: False I0419 22:05:48.715743 132256122660672 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0419 22:05:48.715761 132256122660672 pyconfig.py:432] Config param metrics_dir: None I0419 22:05:48.715777 132256122660672 pyconfig.py:432] Config param metrics_file: I0419 22:05:48.715793 132256122660672 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0419 22:05:48.715809 132256122660672 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0419 22:05:48.715824 132256122660672 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0419 22:05:48.715839 132256122660672 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0419 22:05:48.715855 132256122660672 pyconfig.py:432] Config param mla_naive_kvcache: True I0419 22:05:48.715869 132256122660672 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0419 22:05:48.715885 132256122660672 pyconfig.py:432] Config param mlp_activations: ['gelu'] I0419 22:05:48.715902 132256122660672 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0419 22:05:48.715918 132256122660672 pyconfig.py:432] Config param mlp_bias: False I0419 22:05:48.715934 132256122660672 pyconfig.py:432] Config param mlp_dim: 64 I0419 22:05:48.715948 132256122660672 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0419 22:05:48.715964 132256122660672 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0419 22:05:48.715982 132256122660672 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0419 22:05:48.715999 132256122660672 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0419 22:05:48.716016 132256122660672 
pyconfig.py:432] Config param moba: False I0419 22:05:48.716032 132256122660672 pyconfig.py:432] Config param moba_chunk_size: 1024 I0419 22:05:48.716047 132256122660672 pyconfig.py:432] Config param moba_topk: 8 I0419 22:05:48.716064 132256122660672 pyconfig.py:432] Config param model_call_mode: I0419 22:05:48.716078 132256122660672 pyconfig.py:432] Config param model_name: gpt3-52k I0419 22:05:48.716094 132256122660672 pyconfig.py:432] Config param moe_expert_input_dim: -1 I0419 22:05:48.716109 132256122660672 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0419 22:05:48.716125 132256122660672 pyconfig.py:432] Config param moe_mlp_dim: -1 I0419 22:05:48.716139 132256122660672 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0419 22:05:48.716155 132256122660672 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0419 22:05:48.716170 132256122660672 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0419 22:05:48.716186 132256122660672 pyconfig.py:432] Config param monitor_goodput: False I0419 22:05:48.716200 132256122660672 pyconfig.py:432] Config param monitor_step_time_deviation: True I0419 22:05:48.716216 132256122660672 pyconfig.py:432] Config param mrope_section: [24, 20, 20] I0419 22:05:48.716233 132256122660672 pyconfig.py:432] Config param mscale: 1.0 I0419 22:05:48.716248 132256122660672 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0419 22:05:48.716264 132256122660672 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0419 22:05:48.716280 132256122660672 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0419 22:05:48.716296 132256122660672 pyconfig.py:432] Config param mtp_num_layers: 0 I0419 22:05:48.716311 132256122660672 pyconfig.py:432] Config param mu_dtype: float32 I0419 22:05:48.716350 132256122660672 pyconfig.py:432] Config param multi_sampling: False I0419 22:05:48.716367 132256122660672 pyconfig.py:432] Config param 
multi_tier_checkpointing_backup_interval_minutes: 0 I0419 22:05:48.716382 132256122660672 pyconfig.py:432] Config param muon_beta: 0.95 I0419 22:05:48.716398 132256122660672 pyconfig.py:432] Config param muon_consistent_rms: None I0419 22:05:48.716414 132256122660672 pyconfig.py:432] Config param muon_weight_decay: 0.0 I0419 22:05:48.716429 132256122660672 pyconfig.py:432] Config param n_routing_groups: -1 I0419 22:05:48.716445 132256122660672 pyconfig.py:432] Config param n_window_for_audio: 50 I0419 22:05:48.716459 132256122660672 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0419 22:05:48.716475 132256122660672 pyconfig.py:432] Config param nope_layer_interval: -1 I0419 22:05:48.716490 132256122660672 pyconfig.py:432] Config param norm_topk_prob: False I0419 22:05:48.716505 132256122660672 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0419 22:05:48.716522 132256122660672 pyconfig.py:432] Config param normalize_embedding_logits: False I0419 22:05:48.716538 132256122660672 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0419 22:05:48.716553 132256122660672 pyconfig.py:432] Config param num_batches: 4 I0419 22:05:48.716568 132256122660672 pyconfig.py:432] Config param num_channels_for_vit: 3 I0419 22:05:48.716583 132256122660672 pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0419 22:05:48.716598 132256122660672 pyconfig.py:432] Config param num_decoder_layers: 1 I0419 22:05:48.716613 132256122660672 pyconfig.py:432] Config param num_diloco_replicas: 1 I0419 22:05:48.716632 132256122660672 pyconfig.py:432] Config param num_epoch: 1 I0419 22:05:48.716646 132256122660672 pyconfig.py:432] Config param num_eval_passes: 1 I0419 22:05:48.716661 132256122660672 pyconfig.py:432] Config param num_experts: 1 I0419 22:05:48.716677 132256122660672 pyconfig.py:432] Config param num_experts_per_tok: 1 I0419 22:05:48.716693 132256122660672 pyconfig.py:432] Config param num_generations: 2 I0419 22:05:48.716708 
132256122660672 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0419 22:05:48.716724 132256122660672 pyconfig.py:432] Config param num_iterations: 1 I0419 22:05:48.716738 132256122660672 pyconfig.py:432] Config param num_kv_heads: 2 I0419 22:05:48.716754 132256122660672 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0419 22:05:48.716769 132256122660672 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0419 22:05:48.716784 132256122660672 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0419 22:05:48.716799 132256122660672 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0419 22:05:48.716815 132256122660672 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0419 22:05:48.716830 132256122660672 pyconfig.py:432] Config param num_query_heads: 2 I0419 22:05:48.716845 132256122660672 pyconfig.py:432] Config param num_samplers_slices: -1 I0419 22:05:48.716861 132256122660672 pyconfig.py:432] Config param num_slices: 1 I0419 22:05:48.716876 132256122660672 pyconfig.py:432] Config param num_target_devices: 32 I0419 22:05:48.716892 132256122660672 pyconfig.py:432] Config param num_test_batches: 5 I0419 22:05:48.716907 132256122660672 pyconfig.py:432] Config param num_trainer_slices: -1 I0419 22:05:48.716923 132256122660672 pyconfig.py:432] Config param num_vocab_tiling: 1 I0419 22:05:48.716938 132256122660672 pyconfig.py:432] Config param off_policy_steps: 0 I0419 22:05:48.716953 132256122660672 pyconfig.py:432] Config param offline_data_dir: None I0419 22:05:48.716969 132256122660672 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0419 22:05:48.716986 132256122660672 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0419 22:05:48.717001 132256122660672 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0419 22:05:48.717016 132256122660672 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0419 22:05:48.717032 132256122660672 
pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0419 22:05:48.717048 132256122660672 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0419 22:05:48.717062 132256122660672 pyconfig.py:432] Config param output_dim_for_audio: 512 I0419 22:05:48.717078 132256122660672 pyconfig.py:432] Config param override_logical_axis_rules: False I0419 22:05:48.717092 132256122660672 pyconfig.py:432] Config param override_model_config: True I0419 22:05:48.717108 132256122660672 pyconfig.py:432] Config param packing: True I0419 22:05:48.717123 132256122660672 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0419 22:05:48.717139 132256122660672 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0419 22:05:48.717154 132256122660672 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0419 22:05:48.717170 132256122660672 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0419 22:05:48.717185 132256122660672 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0419 22:05:48.717201 132256122660672 pyconfig.py:432] Config param param_scan_axis: 1 I0419 22:05:48.717216 132256122660672 pyconfig.py:432] Config param parameter_memory_host_offload: False I0419 22:05:48.717232 132256122660672 pyconfig.py:432] Config param partial_rotary_factor: 1.0 I0419 22:05:48.717247 132256122660672 pyconfig.py:432] Config param patch_size_for_vit: 14 I0419 22:05:48.717263 132256122660672 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0419 22:05:48.717279 132256122660672 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0419 22:05:48.717296 132256122660672 pyconfig.py:432] Config param per_device_batch_size: 2 I0419 22:05:48.717312 132256122660672 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0419 22:05:48.717326 132256122660672 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0419 22:05:48.717350 132256122660672 pyconfig.py:432] Config param 
pipeline_delay_activation_forwarding: False I0419 22:05:48.717366 132256122660672 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0419 22:05:48.717380 132256122660672 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0419 22:05:48.717396 132256122660672 pyconfig.py:432] Config param pipeline_parallel_layers: 1 I0419 22:05:48.717411 132256122660672 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0419 22:05:48.717427 132256122660672 pyconfig.py:432] Config param posemb_type_for_vit: learn I0419 22:05:48.717442 132256122660672 pyconfig.py:432] Config param position_id_per_seconds: 25 I0419 22:05:48.717458 132256122660672 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0419 22:05:48.717472 132256122660672 pyconfig.py:432] Config param prefill_cache_dir: I0419 22:05:48.717488 132256122660672 pyconfig.py:432] Config param prefill_chunk_size: 256 I0419 22:05:48.717504 132256122660672 pyconfig.py:432] Config param prefill_slice: v5e-16 I0419 22:05:48.717520 132256122660672 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0419 22:05:48.717534 132256122660672 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0419 22:05:48.717550 132256122660672 pyconfig.py:432] Config param profile_cleanly: True I0419 22:05:48.717566 132256122660672 pyconfig.py:432] Config param profile_periodically_period: -1 I0419 22:05:48.717580 132256122660672 pyconfig.py:432] Config param profile_power_events: False I0419 22:05:48.717596 132256122660672 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0419 22:05:48.717613 132256122660672 pyconfig.py:432] Config param profiler_steps: 5 I0419 22:05:48.717632 132256122660672 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0419 22:05:48.717649 132256122660672 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0419 22:05:48.717664 132256122660672 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0419 
22:05:48.717680 132256122660672 pyconfig.py:432] Config param prometheus_port: 0 I0419 22:05:48.717694 132256122660672 pyconfig.py:432] Config param prompt: I love to I0419 22:05:48.717710 132256122660672 pyconfig.py:432] Config param pure_nnx: False I0419 22:05:48.717724 132256122660672 pyconfig.py:432] Config param pure_nnx_decoder: False I0419 22:05:48.717740 132256122660672 pyconfig.py:432] Config param q_lora_rank: 0 I0419 22:05:48.717754 132256122660672 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0419 22:05:48.717770 132256122660672 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0419 22:05:48.717785 132256122660672 pyconfig.py:432] Config param qk_norm_with_scale: True I0419 22:05:48.717802 132256122660672 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0419 22:05:48.717816 132256122660672 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0419 22:05:48.717833 132256122660672 pyconfig.py:432] Config param quant_cfg_path: I0419 22:05:48.717850 132256122660672 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0419 22:05:48.717867 132256122660672 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0419 22:05:48.717885 132256122660672 pyconfig.py:432] Config param quantize_kvcache: False I0419 22:05:48.717902 132256122660672 pyconfig.py:432] Config param query_proj: RematLocation.REMAT I0419 22:05:48.717919 132256122660672 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0419 22:05:48.717936 132256122660672 pyconfig.py:432] Config param ragged_block_size: 256 I0419 22:05:48.717951 132256122660672 pyconfig.py:432] Config param ragged_buffer_factor: -1.0 I0419 22:05:48.717967 132256122660672 pyconfig.py:432] Config param rampup_end_step: 0 I0419 22:05:48.717981 132256122660672 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0419 22:05:48.717997 132256122660672 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0419 22:05:48.718013 132256122660672 
pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0419 22:05:48.718028 132256122660672 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0419 22:05:48.718044 132256122660672 pyconfig.py:432] Config param remat_policy: full I0419 22:05:48.718059 132256122660672 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0419 22:05:48.718074 132256122660672 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True I0419 22:05:48.718088 132256122660672 pyconfig.py:432] Config param replicate_quant_scale: False I0419 22:05:48.718104 132256122660672 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0419 22:05:48.718120 132256122660672 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0419 22:05:48.718137 132256122660672 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0419 22:05:48.718152 132256122660672 pyconfig.py:432] Config param reshape_q: False I0419 22:05:48.718168 132256122660672 pyconfig.py:432] Config param return_log_prob: False I0419 22:05:48.718182 132256122660672 pyconfig.py:432] Config param reuse_example_batch: 0 I0419 22:05:48.718198 132256122660672 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0419 22:05:48.718214 132256122660672 pyconfig.py:432] Config param reward_exact_format_match: 3.0 I0419 22:05:48.718230 132256122660672 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0419 22:05:48.718247 132256122660672 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0419 22:05:48.718261 132256122660672 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0419 22:05:48.718277 132256122660672 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0419 22:05:48.718294 132256122660672 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 
'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0419 22:05:48.718315 132256122660672 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0419 22:05:48.718330 132256122660672 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0419 22:05:48.718355 132256122660672 pyconfig.py:432] Config param rollout_micro_batch_size: -1 I0419 22:05:48.718370 132256122660672 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0419 22:05:48.718386 132256122660672 pyconfig.py:432] Config param rope_attention_scaling: False I0419 22:05:48.718402 132256122660672 pyconfig.py:432] Config param rope_factor: 40 I0419 22:05:48.718417 132256122660672 pyconfig.py:432] Config param rope_interleave: True I0419 22:05:48.718432 132256122660672 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0419 22:05:48.718467 132256122660672 pyconfig.py:432] Config param rope_max_timescale: 10000 I0419 22:05:48.718482 132256122660672 pyconfig.py:432] Config param rope_min_timescale: 1 I0419 22:05:48.718497 132256122660672 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0419 22:05:48.718513 132256122660672 pyconfig.py:432] Config param rope_truncate: True I0419 22:05:48.718527 132256122660672 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT I0419 22:05:48.718545 132256122660672 pyconfig.py:432] Config param rope_use_scale: True I0419 22:05:48.718561 132256122660672 pyconfig.py:432] Config param routed_bias: False I0419 22:05:48.718577 132256122660672 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0419 22:05:48.718591 132256122660672 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0419 22:05:48.718606 132256122660672 pyconfig.py:432] Config param routed_score_func: I0419 22:05:48.718625 132256122660672 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-19-22-05 I0419 22:05:48.718640 132256122660672 pyconfig.py:432] Config param sa_block_kv: 512 I0419 
22:05:48.718656 132256122660672 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0419 22:05:48.718671 132256122660672 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0419 22:05:48.718686 132256122660672 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0419 22:05:48.718700 132256122660672 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0419 22:05:48.718716 132256122660672 pyconfig.py:432] Config param sa_block_q: 512 I0419 22:05:48.718732 132256122660672 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0419 22:05:48.718747 132256122660672 pyconfig.py:432] Config param sa_block_q_dq: 512 I0419 22:05:48.718763 132256122660672 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0419 22:05:48.718778 132256122660672 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0419 22:05:48.718795 132256122660672 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0419 22:05:48.718811 132256122660672 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0419 22:05:48.718826 132256122660672 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0419 22:05:48.718842 132256122660672 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0419 22:05:48.718858 132256122660672 pyconfig.py:432] Config param save_config_to_gcs: False I0419 22:05:48.718874 132256122660672 pyconfig.py:432] Config param save_quantized_params_path: I0419 22:05:48.718888 132256122660672 pyconfig.py:432] Config param scale_embedding_for_audio: True I0419 22:05:48.718904 132256122660672 pyconfig.py:432] Config param scan_layers: True I0419 22:05:48.718919 132256122660672 pyconfig.py:432] Config param scan_layers_per_stage: False I0419 22:05:48.718935 132256122660672 pyconfig.py:432] Config param scan_pipeline_iterations: True I0419 22:05:48.718951 132256122660672 pyconfig.py:432] Config param scan_pipeline_repeats: False I0419 22:05:48.718965 132256122660672 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0419 
22:05:48.718981 132256122660672 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0419 22:05:48.718995 132256122660672 pyconfig.py:432] Config param sft_train_on_completion_only: False I0419 22:05:48.719011 132256122660672 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0419 22:05:48.719025 132256122660672 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0419 22:05:48.719042 132256122660672 pyconfig.py:432] Config param shard_optimizer_over_data: False I0419 22:05:48.719057 132256122660672 pyconfig.py:432] Config param sharding_strategy: None I0419 22:05:48.719073 132256122660672 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0419 22:05:48.719087 132256122660672 pyconfig.py:432] Config param shardy: True I0419 22:05:48.719103 132256122660672 pyconfig.py:432] Config param share_kv_projections: False I0419 22:05:48.719119 132256122660672 pyconfig.py:432] Config param shared_experts: 0 I0419 22:05:48.719134 132256122660672 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0419 22:05:48.719150 132256122660672 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0419 22:05:48.719166 132256122660672 pyconfig.py:432] Config param skip_jax_distributed_system: False I0419 22:05:48.719183 132256122660672 pyconfig.py:432] Config param skip_step_interval: 128 I0419 22:05:48.719198 132256122660672 pyconfig.py:432] Config param skip_step_on_spikes: False I0419 22:05:48.719213 132256122660672 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0419 22:05:48.719229 132256122660672 pyconfig.py:432] Config param sliding_window_size: 0 I0419 22:05:48.719244 132256122660672 pyconfig.py:432] Config param solution_end_token: </answer> I0419 22:05:48.719259 132256122660672 pyconfig.py:432] Config param solution_start_token: <answer> I0419 22:05:48.719274 132256122660672 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0419 22:05:48.719290 132256122660672 pyconfig.py:432] Config param 
sparse_matmul: True I0419 22:05:48.719305 132256122660672 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0419 22:05:48.719320 132256122660672 pyconfig.py:432] Config param stack_prefill_result_cache: False I0419 22:05:48.719343 132256122660672 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0419 22:05:48.719359 132256122660672 pyconfig.py:432] Config param stack_trace_to_cloud: False I0419 22:05:48.719375 132256122660672 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0419 22:05:48.719390 132256122660672 pyconfig.py:432] Config param steps: 200000 I0419 22:05:48.719406 132256122660672 pyconfig.py:432] Config param stop_strings: None I0419 22:05:48.719422 132256122660672 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0419 22:05:48.719438 132256122660672 pyconfig.py:432] Config param student_params_to_update: None I0419 22:05:48.719453 132256122660672 pyconfig.py:432] Config param subslice_shape: I0419 22:05:48.719469 132256122660672 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0419 22:05:48.719484 132256122660672 pyconfig.py:432] Config param system_prompt: I0419 22:05:48.719499 132256122660672 pyconfig.py:432] Config param target_eval_loss: 0.0 I0419 22:05:48.719513 132256122660672 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0419 22:05:48.719529 132256122660672 pyconfig.py:432] Config param temperature_tuning: False I0419 22:05:48.719543 132256122660672 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0419 22:05:48.719559 132256122660672 pyconfig.py:432] Config param tensorboard_dir: None I0419 22:05:48.719574 132256122660672 pyconfig.py:432] Config param tensors_on_device: None I0419 22:05:48.719590 132256122660672 pyconfig.py:432] Config param tensors_to_offload: None I0419 22:05:48.719606 132256122660672 pyconfig.py:432] Config param test_batch_start_index: 0 I0419 22:05:48.719622 132256122660672 pyconfig.py:432] Config 
param tile_size_for_vit: 336 I0419 22:05:48.719638 132256122660672 pyconfig.py:432] Config param tokenize_eval_data: True I0419 22:05:48.719653 132256122660672 pyconfig.py:432] Config param tokenize_train_data: True I0419 22:05:48.719669 132256122660672 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0419 22:05:48.719683 132256122660672 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0419 22:05:48.719699 132256122660672 pyconfig.py:432] Config param topk_routing_group: -1 I0419 22:05:48.719715 132256122660672 pyconfig.py:432] Config param train_data_columns: ['text'] I0419 22:05:48.719731 132256122660672 pyconfig.py:432] Config param train_fraction: 1.0 I0419 22:05:48.719747 132256122660672 pyconfig.py:432] Config param train_image_column: image I0419 22:05:48.719762 132256122660672 pyconfig.py:432] Config param train_micro_batch_size: -1 I0419 22:05:48.719777 132256122660672 pyconfig.py:432] Config param train_split: train I0419 22:05:48.719791 132256122660672 pyconfig.py:432] Config param trainable_parameters_mask: [] I0419 22:05:48.719806 132256122660672 pyconfig.py:432] Config param trainable_position_size: 2048 I0419 22:05:48.719821 132256122660672 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0419 22:05:48.719837 132256122660672 pyconfig.py:432] Config param upload_all_profiler_results: False I0419 22:05:48.719851 132256122660672 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0419 22:05:48.719868 132256122660672 pyconfig.py:432] Config param use_agentic_rollout: False I0419 22:05:48.719882 132256122660672 pyconfig.py:432] Config param use_audio: False I0419 22:05:48.719897 132256122660672 pyconfig.py:432] Config param use_audio_in_video: False I0419 22:05:48.719912 132256122660672 pyconfig.py:432] Config param use_batch_split_schedule: False I0419 22:05:48.719928 132256122660672 pyconfig.py:432] Config param use_chat_template: False I0419 22:05:48.719943 132256122660672 
pyconfig.py:432] Config param use_chunked_prefill: False I0419 22:05:48.719958 132256122660672 pyconfig.py:432] Config param use_custom_sort_vjp: True I0419 22:05:48.719973 132256122660672 pyconfig.py:432] Config param use_dpo: False I0419 22:05:48.719988 132256122660672 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0419 22:05:48.720003 132256122660672 pyconfig.py:432] Config param use_grpo: True I0419 22:05:48.720017 132256122660672 pyconfig.py:432] Config param use_indexer: False I0419 22:05:48.720033 132256122660672 pyconfig.py:432] Config param use_iota_embed: True I0419 22:05:48.720049 132256122660672 pyconfig.py:432] Config param use_jax_splash: False I0419 22:05:48.720063 132256122660672 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0419 22:05:48.720079 132256122660672 pyconfig.py:432] Config param use_mrope: False I0419 22:05:48.720093 132256122660672 pyconfig.py:432] Config param use_multimodal: False I0419 22:05:48.720109 132256122660672 pyconfig.py:432] Config param use_pathways: True I0419 22:05:48.720123 132256122660672 pyconfig.py:432] Config param use_post_attn_norm: False I0419 22:05:48.720139 132256122660672 pyconfig.py:432] Config param use_post_ffw_norm: False I0419 22:05:48.720153 132256122660672 pyconfig.py:432] Config param use_qk_clip: False I0419 22:05:48.720169 132256122660672 pyconfig.py:432] Config param use_qk_norm: False I0419 22:05:48.720183 132256122660672 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0419 22:05:48.720199 132256122660672 pyconfig.py:432] Config param use_qwix_quantization: False I0419 22:05:48.720213 132256122660672 pyconfig.py:432] Config param use_ragged_attention: False I0419 22:05:48.720229 132256122660672 pyconfig.py:432] Config param use_random_routing: False I0419 22:05:48.720243 132256122660672 pyconfig.py:432] Config param use_replicator_service: False I0419 22:05:48.720258 132256122660672 pyconfig.py:432] Config param use_ring_of_experts: False I0419 22:05:48.720273 
132256122660672 pyconfig.py:432] Config param use_sft: False I0419 22:05:48.720288 132256122660672 pyconfig.py:432] Config param use_splash_scheduler: False I0419 22:05:48.720303 132256122660672 pyconfig.py:432] Config param use_tokamax_gmm: False I0419 22:05:48.720319 132256122660672 pyconfig.py:432] Config param use_tokamax_splash: False I0419 22:05:48.720342 132256122660672 pyconfig.py:432] Config param use_truncation: True I0419 22:05:48.720358 132256122660672 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0419 22:05:48.720374 132256122660672 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0419 22:05:48.720390 132256122660672 pyconfig.py:432] Config param use_vertex_tensorboard: False I0419 22:05:48.720406 132256122660672 pyconfig.py:432] Config param using_pipeline_parallelism: False I0419 22:05:48.720422 132256122660672 pyconfig.py:432] Config param v_head_dim: 128 I0419 22:05:48.720437 132256122660672 pyconfig.py:432] Config param v_norm_with_scale: True I0419 22:05:48.720453 132256122660672 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0419 22:05:48.720469 132256122660672 pyconfig.py:432] Config param vertex_tensorboard_project: I0419 22:05:48.720485 132256122660672 pyconfig.py:432] Config param vertex_tensorboard_region: I0419 22:05:48.720500 132256122660672 pyconfig.py:432] Config param video_path: I0419 22:05:48.720516 132256122660672 pyconfig.py:432] Config param video_placeholder: <|video|> I0419 22:05:48.720532 132256122660672 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0419 22:05:48.720548 132256122660672 pyconfig.py:432] Config param vision_output_length: -1 I0419 22:05:48.720563 132256122660672 pyconfig.py:432] Config param vllm_additional_config: {} I0419 22:05:48.720579 132256122660672 pyconfig.py:432] Config param vllm_hf_config_path: I0419 22:05:48.720595 132256122660672 pyconfig.py:432] Config param vllm_hf_overrides: {} I0419 22:05:48.720610 132256122660672 
pyconfig.py:432] Config param vocab_size: 32000 I0419 22:05:48.720629 132256122660672 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0419 22:05:48.720645 132256122660672 pyconfig.py:432] Config param weight_dtype: float32 I0419 22:05:48.720668 132256122660672 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0419 22:05:48.720685 132256122660672 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512 I0419 22:05:48.720700 132256122660672 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0419 22:05:48.720716 132256122660672 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0419 22:05:48.720732 132256122660672 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0419 22:05:48.720747 132256122660672 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0419 22:05:48.720763 132256122660672 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0419 22:05:48.720778 132256122660672 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0419 22:05:48.720793 132256122660672 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0419 22:05:48.720808 132256122660672 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0419 22:05:48.720823 132256122660672 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0419 22:05:48.720839 132256122660672 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0419 22:05:48.720854 132256122660672 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0419 22:05:48.720868 132256122660672 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0419 22:05:48.720884 132256122660672 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0419 22:05:48.720900 132256122660672 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0419 22:05:48.720915 132256122660672 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0419 22:05:48.720931 132256122660672 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0419 22:05:48.720947 
132256122660672 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0419 22:05:48.720961 132256122660672 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0419 22:05:48.720977 132256122660672 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0419 22:05:48.720995 132256122660672 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0419 22:05:48.721010 132256122660672 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0419 22:05:48.721026 132256122660672 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0419 22:05:48.721042 132256122660672 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0419 22:05:48.721058 132256122660672 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0419 22:05:48.721399 132256122660672 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0419 22:05:48.721435 132256122660672 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0419 22:05:52.655984 132256122660672 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 749, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 745, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 580, in train_distill
    devices_array = maxtext_utils.create_device_mesh(student_config, devices)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/maxtext_utils.py", line 1677, in create_device_mesh
    ici_parallelism = max_utils.fill_unspecified_mesh_axes(config.ici_parallelism.copy(), num_devices_per_slice, "ICI")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/max_utils.py", line 450, in fill_unspecified_mesh_axes
    assert np.prod(parallelism_vals) == target_product, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Number of devices per slice 32 does not match the product of the ICI parallelism 8
XPK End: Sun Apr 19 22:06:01 UTC 2026
EXIT_CODE=1