XPK Start: Sun Apr 19 22:21:39 UTC 2026
2026-04-19 22:21:56.311079: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 22:21:59.858254 136021250135872 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-19 22:22:08,895:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0419 22:22:08.895660 136021250135872 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-19 22:22:08,897:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-37yg3-slice-job-0-0.mt-07-distill-smoke-37yg3:8482
I0419 22:22:08.897944 136021250135872 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-37yg3-slice-job-0-0.mt-07-distill-smoke-37yg3:8482
I0419 22:22:10.729270 136021250135872 max_utils.py:284] Jax distributed system initialized!
I0419 22:22:16.885894 136021250135872 max_utils.py:244] Jax distributed system is already initialized.
I0419 22:22:17.355692 136021250135872 max_utils.py:244] Jax distributed system is already initialized.
I0419 22:22:17.356843 136021250135872 pyconfig.py:432] Config param abort_on_inf_loss: True I0419 22:22:17.356891 136021250135872 pyconfig.py:432] Config param abort_on_nan_loss: True I0419 22:22:17.356916 136021250135872 pyconfig.py:432] Config param act_quantization_calibration_method: absmax I0419 22:22:17.356937 136021250135872 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0 I0419 22:22:17.356957 136021250135872 pyconfig.py:432] Config param activation_function_for_audio: gelu I0419 22:22:17.356976 136021250135872 pyconfig.py:432] Config param activations_in_float32: False I0419 22:22:17.356995 136021250135872 pyconfig.py:432] Config param adam_b1: 0.9 I0419 22:22:17.357014 136021250135872 pyconfig.py:432] Config param adam_b2: 0.95 I0419 22:22:17.357031 136021250135872 pyconfig.py:432] Config param adam_eps: 1e-08 I0419 22:22:17.357054 136021250135872 pyconfig.py:432] Config param adam_eps_root: 0.0 I0419 22:22:17.357071 136021250135872 pyconfig.py:432] Config param adam_weight_decay: 0.1 I0419 22:22:17.357087 136021250135872 pyconfig.py:432] Config param adamw_mask: [] I0419 22:22:17.357103 136021250135872 pyconfig.py:432] Config param add_bos: True I0419 22:22:17.357119 136021250135872 pyconfig.py:432] Config param add_eos: True I0419 22:22:17.357134 136021250135872 pyconfig.py:432] Config param allow_split_physical_axes: False I0419 22:22:17.357151 136021250135872 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3 I0419 22:22:17.357168 136021250135872 pyconfig.py:432] Config param async_checkpointing: True I0419 22:22:17.357182 136021250135872 pyconfig.py:432] Config param async_scheduling: False I0419 22:22:17.357199 136021250135872 pyconfig.py:432] Config param attention: dot_product I0419 22:22:17.357215 136021250135872 pyconfig.py:432] Config param attention_bias: False I0419 22:22:17.357232 136021250135872 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0 I0419 22:22:17.357249 136021250135872 pyconfig.py:432] Config 
param attention_out: RematLocation.REMAT I0419 22:22:17.357270 136021250135872 pyconfig.py:432] Config param attention_output_dim: -1 I0419 22:22:17.357288 136021250135872 pyconfig.py:432] Config param attention_sink: False I0419 22:22:17.357304 136021250135872 pyconfig.py:432] Config param attention_type: global I0419 22:22:17.357319 136021250135872 pyconfig.py:432] Config param attn_logits_soft_cap: None I0419 22:22:17.357346 136021250135872 pyconfig.py:432] Config param audio_path: I0419 22:22:17.357363 136021250135872 pyconfig.py:432] Config param audio_placeholder: <|audio|> I0419 22:22:17.357378 136021250135872 pyconfig.py:432] Config param autoregressive_decode_assert: I0419 22:22:17.357394 136021250135872 pyconfig.py:432] Config param base_config: base.yml I0419 22:22:17.357409 136021250135872 pyconfig.py:432] Config param base_emb_dim: 16 I0419 22:22:17.357424 136021250135872 pyconfig.py:432] Config param base_mlp_dim: 64 I0419 22:22:17.357440 136021250135872 pyconfig.py:432] Config param base_moe_mlp_dim: -1 I0419 22:22:17.357456 136021250135872 pyconfig.py:432] Config param base_num_decoder_layers: 1 I0419 22:22:17.357470 136021250135872 pyconfig.py:432] Config param base_num_kv_heads: 2 I0419 22:22:17.357486 136021250135872 pyconfig.py:432] Config param base_num_query_heads: 2 I0419 22:22:17.357502 136021250135872 pyconfig.py:432] Config param base_output_directory: I0419 22:22:17.357517 136021250135872 pyconfig.py:432] Config param batch_size: 1 I0419 22:22:17.357533 136021250135872 pyconfig.py:432] Config param batch_split_factor: 1 I0419 22:22:17.357548 136021250135872 pyconfig.py:432] Config param beta_fast: 32 I0419 22:22:17.357564 136021250135872 pyconfig.py:432] Config param beta_slow: 1 I0419 22:22:17.357579 136021250135872 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax I0419 22:22:17.357594 136021250135872 pyconfig.py:432] Config param capacity_factor: -1.0 I0419 22:22:17.357610 136021250135872 pyconfig.py:432] Config 
param cast_logits_to_fp32: True I0419 22:22:17.357626 136021250135872 pyconfig.py:432] Config param chat_template: I0419 22:22:17.357646 136021250135872 pyconfig.py:432] Config param chat_template_path: I0419 22:22:17.357662 136021250135872 pyconfig.py:432] Config param checkpoint_conversion_fn: None I0419 22:22:17.357678 136021250135872 pyconfig.py:432] Config param checkpoint_dir: None I0419 22:22:17.357696 136021250135872 pyconfig.py:432] Config param checkpoint_is_quantized: False I0419 22:22:17.357713 136021250135872 pyconfig.py:432] Config param checkpoint_period: 2000 I0419 22:22:17.357728 136021250135872 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96 I0419 22:22:17.357745 136021250135872 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0419 22:22:17.357762 136021250135872 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True I0419 22:22:17.357778 136021250135872 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True I0419 22:22:17.357793 136021250135872 pyconfig.py:432] Config param checkpoint_todelete_full_path: None I0419 22:22:17.357808 136021250135872 pyconfig.py:432] Config param checkpoint_todelete_subdir: None I0419 22:22:17.357823 136021250135872 pyconfig.py:432] Config param chips_per_vm: 4 I0419 22:22:17.357839 136021250135872 pyconfig.py:432] Config param chunk_attn_window_size: 0 I0419 22:22:17.357853 136021250135872 pyconfig.py:432] Config param collect_stack_trace: False I0419 22:22:17.357869 136021250135872 pyconfig.py:432] Config param colocated_python_checkpointing: False I0419 22:22:17.357884 136021250135872 pyconfig.py:432] Config param colocated_python_data_input: False I0419 22:22:17.357898 136021250135872 pyconfig.py:432] Config param compile_topology: I0419 22:22:17.357913 136021250135872 pyconfig.py:432] Config param compile_topology_num_slices: -1 I0419 22:22:17.357928 136021250135872 pyconfig.py:432] Config param compile_xla_flags: I0419 
22:22:17.357943 136021250135872 pyconfig.py:432] Config param compiled_trainstep_file: I0419 22:22:17.357958 136021250135872 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3 I0419 22:22:17.357972 136021250135872 pyconfig.py:432] Config param constant_bound_config: [] I0419 22:22:17.357988 136021250135872 pyconfig.py:432] Config param context: RematLocation.REMAT I0419 22:22:17.358003 136021250135872 pyconfig.py:432] Config param context_parallel_load_balance: True I0419 22:22:17.358018 136021250135872 pyconfig.py:432] Config param context_parallel_size: 1 I0419 22:22:17.358032 136021250135872 pyconfig.py:432] Config param context_parallel_strategy: all_gather I0419 22:22:17.358048 136021250135872 pyconfig.py:432] Config param context_sharding: context I0419 22:22:17.358063 136021250135872 pyconfig.py:432] Config param conv_chunksize_for_audio: 500 I0419 22:22:17.358079 136021250135872 pyconfig.py:432] Config param conv_stride_for_vit: 14 I0419 22:22:17.358094 136021250135872 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1 I0419 22:22:17.358109 136021250135872 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1 I0419 22:22:17.358123 136021250135872 pyconfig.py:432] Config param custom_mesh: I0419 22:22:17.358138 136021250135872 pyconfig.py:432] Config param custom_mesh_and_rule: I0419 22:22:17.358154 136021250135872 pyconfig.py:432] Config param d_model_for_audio: 256 I0419 22:22:17.358168 136021250135872 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0419 22:22:17.358187 136021250135872 pyconfig.py:432] Config param data_shuffle_seed: 0 I0419 22:22:17.358202 136021250135872 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1 I0419 22:22:17.358217 136021250135872 pyconfig.py:432] Config param dataset_path: I0419 22:22:17.358231 136021250135872 pyconfig.py:432] Config 
param dataset_type: DatasetType.HF I0419 22:22:17.358248 136021250135872 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1 I0419 22:22:17.358263 136021250135872 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1 I0419 22:22:17.358278 136021250135872 pyconfig.py:432] Config param dcn_context_parallelism: 1 I0419 22:22:17.358292 136021250135872 pyconfig.py:432] Config param dcn_data_parallelism: -1 I0419 22:22:17.358327 136021250135872 pyconfig.py:432] Config param dcn_diloco_parallelism: 1 I0419 22:22:17.358352 136021250135872 pyconfig.py:432] Config param dcn_expert_parallelism: 1 I0419 22:22:17.358366 136021250135872 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1 I0419 22:22:17.358382 136021250135872 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1 I0419 22:22:17.358397 136021250135872 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0419 22:22:17.358414 136021250135872 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1 I0419 22:22:17.358429 136021250135872 pyconfig.py:432] Config param dcn_sequence_parallelism: 1 I0419 22:22:17.358444 136021250135872 pyconfig.py:432] Config param dcn_tensor_parallelism: 1 I0419 22:22:17.358459 136021250135872 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1 I0419 22:22:17.358473 136021250135872 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1 I0419 22:22:17.358489 136021250135872 pyconfig.py:432] Config param debug: {'rl': False} I0419 22:22:17.358504 136021250135872 pyconfig.py:432] Config param debug_sharding: False I0419 22:22:17.358520 136021250135872 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1 I0419 22:22:17.358536 136021250135872 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0419 22:22:17.358553 136021250135872 pyconfig.py:432] Config param decode_sampling_temperature: 1.0 I0419 22:22:17.358568 136021250135872 pyconfig.py:432] Config param 
decode_sampling_top_k: 0 I0419 22:22:17.358583 136021250135872 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3 I0419 22:22:17.358600 136021250135872 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE I0419 22:22:17.358616 136021250135872 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: [] I0419 22:22:17.358635 136021250135872 pyconfig.py:432] Config param degenerate_group_masking: True I0419 22:22:17.358651 136021250135872 pyconfig.py:432] Config param dense_init_scale: 1.0 I0419 22:22:17.358665 136021250135872 pyconfig.py:432] Config param diloco_outer_lr: 0.3 I0419 22:22:17.358681 136021250135872 pyconfig.py:432] Config param diloco_outer_momentum: 0.9 I0419 22:22:17.358695 136021250135872 pyconfig.py:432] Config param diloco_sync_period: 36 I0419 22:22:17.358711 136021250135872 pyconfig.py:432] Config param distill_alpha: 0.5 I0419 22:22:17.358727 136021250135872 pyconfig.py:432] Config param distill_beta: 0.0 I0419 22:22:17.358741 136021250135872 pyconfig.py:432] Config param distill_feature_loss_type: cosine I0419 22:22:17.358757 136021250135872 pyconfig.py:432] Config param distill_layer_indices: None I0419 22:22:17.358772 136021250135872 pyconfig.py:432] Config param distill_temperature: 1.0 I0419 22:22:17.358787 136021250135872 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256 I0419 22:22:17.358803 136021250135872 pyconfig.py:432] Config param dpo_beta: 0.1 I0419 22:22:17.358818 136021250135872 pyconfig.py:432] Config param dpo_label_smoothing: 0.0 I0419 22:22:17.358834 136021250135872 pyconfig.py:432] Config param dq_reduction_steps: 0 I0419 22:22:17.358850 136021250135872 pyconfig.py:432] Config param dropout_rate: 0.0 I0419 22:22:17.358865 136021250135872 pyconfig.py:432] Config param dtype: bfloat16 I0419 22:22:17.358897 136021250135872 pyconfig.py:432] Config param dtype_mm: float32 I0419 22:22:17.358912 136021250135872 pyconfig.py:432] Config param dump_hlo: False I0419 
22:22:17.358927 136021250135872 pyconfig.py:432] Config param dump_hlo_delete_local_after: True I0419 22:22:17.358943 136021250135872 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-19-22-22/xla_dump I0419 22:22:17.358957 136021250135872 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0419 22:22:17.358973 136021250135872 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step I0419 22:22:17.358987 136021250135872 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step I0419 22:22:17.359003 136021250135872 pyconfig.py:432] Config param dump_hlo_upload_all: False I0419 22:22:17.359017 136021250135872 pyconfig.py:432] Config param dump_hlo_xla_flags: I0419 22:22:17.359032 136021250135872 pyconfig.py:432] Config param dump_jaxpr: False I0419 22:22:17.359046 136021250135872 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True I0419 22:22:17.359061 136021250135872 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-19-22-22/jaxpr_dump I0419 22:22:17.359076 136021250135872 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0419 22:22:17.359091 136021250135872 pyconfig.py:432] Config param dump_step: -1 I0419 22:22:17.359105 136021250135872 pyconfig.py:432] Config param elastic_enabled: False I0419 22:22:17.359120 136021250135872 pyconfig.py:432] Config param elastic_max_retries: 10 I0419 22:22:17.359135 136021250135872 pyconfig.py:432] Config param elastic_timeout_seconds: 300 I0419 22:22:17.359151 136021250135872 pyconfig.py:432] Config param emb_dim: 16 I0419 22:22:17.359165 136021250135872 pyconfig.py:432] Config param enable_autocheckpoint: False I0419 22:22:17.359180 136021250135872 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False I0419 22:22:17.359194 136021250135872 pyconfig.py:432] Config param enable_checkpointing: True I0419 22:22:17.359210 136021250135872 pyconfig.py:432] Config param enable_continuous_checkpointing: False I0419 
22:22:17.359224 136021250135872 pyconfig.py:432] Config param enable_data_shuffling: True I0419 22:22:17.359239 136021250135872 pyconfig.py:432] Config param enable_diloco: False I0419 22:22:17.359253 136021250135872 pyconfig.py:432] Config param enable_dp_attention: False I0419 22:22:17.359269 136021250135872 pyconfig.py:432] Config param enable_dropout: False I0419 22:22:17.359283 136021250135872 pyconfig.py:432] Config param enable_emergency_checkpoint: False I0419 22:22:17.359299 136021250135872 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True I0419 22:22:17.359315 136021250135872 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True I0419 22:22:17.359329 136021250135872 pyconfig.py:432] Config param enable_goodput_recording: False I0419 22:22:17.359352 136021250135872 pyconfig.py:432] Config param enable_jax_profiler: False I0419 22:22:17.359367 136021250135872 pyconfig.py:432] Config param enable_llm_inference_pool: False I0419 22:22:17.359383 136021250135872 pyconfig.py:432] Config param enable_model_warmup: False I0419 22:22:17.359396 136021250135872 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False I0419 22:22:17.359412 136021250135872 pyconfig.py:432] Config param enable_nnx: False I0419 22:22:17.359426 136021250135872 pyconfig.py:432] Config param enable_orbax_v1: False I0419 22:22:17.359441 136021250135872 pyconfig.py:432] Config param enable_padding_causal_mask: True I0419 22:22:17.359455 136021250135872 pyconfig.py:432] Config param enable_pathways_goodput: False I0419 22:22:17.359471 136021250135872 pyconfig.py:432] Config param enable_prefix_caching: False I0419 22:22:17.359485 136021250135872 pyconfig.py:432] Config param enable_rampup_batch_size: False I0419 22:22:17.359500 136021250135872 pyconfig.py:432] Config param enable_single_controller: False I0419 22:22:17.359515 136021250135872 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False I0419 22:22:17.359529 136021250135872 
pyconfig.py:432] Config param enable_tensorboard: True I0419 22:22:17.359544 136021250135872 pyconfig.py:432] Config param enable_tunix_perf_metrics: False I0419 22:22:17.359558 136021250135872 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4 I0419 22:22:17.359574 136021250135872 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512 I0419 22:22:17.359588 136021250135872 pyconfig.py:432] Config param encoder_layers_for_audio: 2 I0419 22:22:17.359603 136021250135872 pyconfig.py:432] Config param engram: RematLocation.REMAT I0419 22:22:17.359619 136021250135872 pyconfig.py:432] Config param engram_head_dim: 1280 I0419 22:22:17.359637 136021250135872 pyconfig.py:432] Config param engram_kernel_size: 4 I0419 22:22:17.359651 136021250135872 pyconfig.py:432] Config param engram_layers: [] I0419 22:22:17.359667 136021250135872 pyconfig.py:432] Config param engram_max_ngram_size: 3 I0419 22:22:17.359681 136021250135872 pyconfig.py:432] Config param engram_num_heads: 8 I0419 22:22:17.359696 136021250135872 pyconfig.py:432] Config param engram_seed: 0 I0419 22:22:17.359711 136021250135872 pyconfig.py:432] Config param engram_vocab_bases: [] I0419 22:22:17.359725 136021250135872 pyconfig.py:432] Config param epsilon_high: None I0419 22:22:17.359740 136021250135872 pyconfig.py:432] Config param eval_corr_lst: False I0419 22:22:17.359754 136021250135872 pyconfig.py:432] Config param eval_data_columns: ['text'] I0419 22:22:17.359770 136021250135872 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1 I0419 22:22:17.359784 136021250135872 pyconfig.py:432] Config param eval_image_column: image I0419 22:22:17.359799 136021250135872 pyconfig.py:432] Config param eval_interval: -1 I0419 22:22:17.359813 136021250135872 pyconfig.py:432] Config param eval_make_lst: False I0419 22:22:17.359828 136021250135872 pyconfig.py:432] Config param eval_per_device_batch_size: 2 I0419 22:22:17.359846 136021250135872 pyconfig.py:432] Config param 
eval_sampling_strategy: greedy I0419 22:22:17.359859 136021250135872 pyconfig.py:432] Config param eval_split: validation I0419 22:22:17.359875 136021250135872 pyconfig.py:432] Config param eval_steps: -1 I0419 22:22:17.359889 136021250135872 pyconfig.py:432] Config param expansion_factor_real_data: -1.0 I0419 22:22:17.359904 136021250135872 pyconfig.py:432] Config param final_logits_soft_cap: None I0419 22:22:17.359918 136021250135872 pyconfig.py:432] Config param first_num_dense_layers: 0 I0419 22:22:17.359933 136021250135872 pyconfig.py:432] Config param float32_gate_logits: False I0419 22:22:17.359947 136021250135872 pyconfig.py:432] Config param float32_logits: False I0419 22:22:17.359962 136021250135872 pyconfig.py:432] Config param float32_qk_product: False I0419 22:22:17.359976 136021250135872 pyconfig.py:432] Config param float32_weight_sum: True I0419 22:22:17.359991 136021250135872 pyconfig.py:432] Config param force_q_layout: False I0419 22:22:17.360005 136021250135872 pyconfig.py:432] Config param force_unroll: False I0419 22:22:17.360020 136021250135872 pyconfig.py:432] Config param freeze_audio_encoder_params: True I0419 22:22:17.360034 136021250135872 pyconfig.py:432] Config param freeze_vision_encoder_params: True I0419 22:22:17.360049 136021250135872 pyconfig.py:432] Config param fused_mlp: False I0419 22:22:17.360063 136021250135872 pyconfig.py:432] Config param fused_qkv: True I0419 22:22:17.360078 136021250135872 pyconfig.py:432] Config param gcs_metrics: False I0419 22:22:17.360094 136021250135872 pyconfig.py:432] Config param gdn_chunk_size: 64 I0419 22:22:17.360108 136021250135872 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4 I0419 22:22:17.360123 136021250135872 pyconfig.py:432] Config param gdn_key_head_dim: 128 I0419 22:22:17.360136 136021250135872 pyconfig.py:432] Config param gdn_num_key_heads: 16 I0419 22:22:17.360152 136021250135872 pyconfig.py:432] Config param gdn_num_value_heads: 32 I0419 22:22:17.360166 136021250135872 
pyconfig.py:432] Config param gdn_value_head_dim: 128 I0419 22:22:17.360180 136021250135872 pyconfig.py:432] Config param generate_padding_batch_eval: False I0419 22:22:17.360195 136021250135872 pyconfig.py:432] Config param generate_padding_batch_train: False I0419 22:22:17.360209 136021250135872 pyconfig.py:432] Config param generate_slice: v5e-16 I0419 22:22:17.360224 136021250135872 pyconfig.py:432] Config param generation_configs: {} I0419 22:22:17.360238 136021250135872 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64 I0419 22:22:17.360253 136021250135872 pyconfig.py:432] Config param global_batch_size_to_load: 512 I0419 22:22:17.360267 136021250135872 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64 I0419 22:22:17.360281 136021250135872 pyconfig.py:432] Config param global_batch_size_to_load_increment: None I0419 22:22:17.360296 136021250135872 pyconfig.py:432] Config param global_batch_size_to_load_start: None I0419 22:22:17.360311 136021250135872 pyconfig.py:432] Config param global_batch_size_to_train_on: 512 I0419 22:22:17.360327 136021250135872 pyconfig.py:432] Config param global_head_dim: 0 I0419 22:22:17.360350 136021250135872 pyconfig.py:432] Config param global_num_kv_heads: 0 I0419 22:22:17.360366 136021250135872 pyconfig.py:432] Config param global_parameter_scale: 1 I0419 22:22:17.360379 136021250135872 pyconfig.py:432] Config param global_rampup_samples: 500 I0419 22:22:17.360395 136021250135872 pyconfig.py:432] Config param global_rope_max_timescale: -1 I0419 22:22:17.360408 136021250135872 pyconfig.py:432] Config param global_rope_proportion: 0.25 I0419 22:22:17.360424 136021250135872 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30 I0419 22:22:17.360438 136021250135872 pyconfig.py:432] Config param grad_dtype: float32 I0419 22:22:17.360471 136021250135872 pyconfig.py:432] Config param gradient_accumulation_steps: 8 I0419 22:22:17.360486 136021250135872 pyconfig.py:432] Config param 
gradient_clipping_threshold: 1.0 I0419 22:22:17.360502 136021250135872 pyconfig.py:432] Config param grain_data_source_max_workers: 16 I0419 22:22:17.360516 136021250135872 pyconfig.py:432] Config param grain_eval_files: I0419 22:22:17.360531 136021250135872 pyconfig.py:432] Config param grain_file_type: arrayrecord I0419 22:22:17.360545 136021250135872 pyconfig.py:432] Config param grain_num_threads: 16 I0419 22:22:17.360560 136021250135872 pyconfig.py:432] Config param grain_num_threads_eval: 16 I0419 22:22:17.360574 136021250135872 pyconfig.py:432] Config param grain_packing_type: first_fit I0419 22:22:17.360589 136021250135872 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1 I0419 22:22:17.360602 136021250135872 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1 I0419 22:22:17.360618 136021250135872 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500 I0419 22:22:17.360635 136021250135872 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500 I0419 22:22:17.360650 136021250135872 pyconfig.py:432] Config param grain_ram_budget_mb: 1024 I0419 22:22:17.360665 136021250135872 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100 I0419 22:22:17.360679 136021250135872 pyconfig.py:432] Config param grain_train_files: I0419 22:22:17.360695 136021250135872 pyconfig.py:432] Config param grain_train_mixture_config_path: I0419 22:22:17.360708 136021250135872 pyconfig.py:432] Config param grain_worker_count: 1 I0419 22:22:17.360723 136021250135872 pyconfig.py:432] Config param grain_worker_count_eval: 1 I0419 22:22:17.360738 136021250135872 pyconfig.py:432] Config param grpo_beta: 0.08 I0419 22:22:17.360753 136021250135872 pyconfig.py:432] Config param grpo_epsilon: 0.2 I0419 22:22:17.360768 136021250135872 pyconfig.py:432] Config param hardware: tpu I0419 22:22:17.360782 136021250135872 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72 I0419 22:22:17.360796 136021250135872 pyconfig.py:432] Config param 
head_dim: 8 I0419 22:22:17.360811 136021250135872 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5 I0419 22:22:17.360827 136021250135872 pyconfig.py:432] Config param hf_data_dir: None I0419 22:22:17.360841 136021250135872 pyconfig.py:432] Config param hf_eval_files: None I0419 22:22:17.360856 136021250135872 pyconfig.py:432] Config param hf_eval_split: None I0419 22:22:17.360871 136021250135872 pyconfig.py:432] Config param hf_name: None I0419 22:22:17.360886 136021250135872 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix I0419 22:22:17.360900 136021250135872 pyconfig.py:432] Config param hf_train_files: None I0419 22:22:17.360915 136021250135872 pyconfig.py:432] Config param hidden_size_for_vit: 1408 I0419 22:22:17.360929 136021250135872 pyconfig.py:432] Config param hide_profiler_step_metric: False I0419 22:22:17.360944 136021250135872 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1 I0419 22:22:17.360958 136021250135872 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1 I0419 22:22:17.360973 136021250135872 pyconfig.py:432] Config param ici_context_parallelism: 1 I0419 22:22:17.360987 136021250135872 pyconfig.py:432] Config param ici_data_parallelism: 1 I0419 22:22:17.361001 136021250135872 pyconfig.py:432] Config param ici_diloco_parallelism: 1 I0419 22:22:17.361015 136021250135872 pyconfig.py:432] Config param ici_expert_parallelism: 1 I0419 22:22:17.361031 136021250135872 pyconfig.py:432] Config param ici_fsdp_parallelism: -1 I0419 22:22:17.361044 136021250135872 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1 I0419 22:22:17.361059 136021250135872 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0419 22:22:17.361075 136021250135872 pyconfig.py:432] Config param ici_pipeline_parallelism: 1 I0419 22:22:17.361089 136021250135872 pyconfig.py:432] Config param ici_sequence_parallelism: 1 I0419 22:22:17.361104 136021250135872 
pyconfig.py:432] Config param ici_tensor_parallelism: 1 I0419 22:22:17.361118 136021250135872 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1 I0419 22:22:17.361133 136021250135872 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1 I0419 22:22:17.361146 136021250135872 pyconfig.py:432] Config param image_path: I0419 22:22:17.361161 136021250135872 pyconfig.py:432] Config param image_placeholder: <|image|> I0419 22:22:17.361177 136021250135872 pyconfig.py:432] Config param image_size_for_vit: 896 I0419 22:22:17.361192 136021250135872 pyconfig.py:432] Config param indexer_head_dim: 128 I0419 22:22:17.361206 136021250135872 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0 I0419 22:22:17.361221 136021250135872 pyconfig.py:432] Config param indexer_n_heads: 64 I0419 22:22:17.361237 136021250135872 pyconfig.py:432] Config param indexer_sparse_training: False I0419 22:22:17.361251 136021250135872 pyconfig.py:432] Config param indexer_topk: 2048 I0419 22:22:17.361266 136021250135872 pyconfig.py:432] Config param inference_benchmark_test: False I0419 22:22:17.361282 136021250135872 pyconfig.py:432] Config param inference_metadata_file: I0419 22:22:17.361296 136021250135872 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: I0419 22:22:17.361311 136021250135872 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10 I0419 22:22:17.361325 136021250135872 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0419 22:22:17.361351 136021250135872 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0419 22:22:17.361365 136021250135872 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate I0419 22:22:17.361381 136021250135872 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer I0419 22:22:17.361394 136021250135872 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1 I0419 
22:22:17.361410 136021250135872 pyconfig.py:432] Config param init_weights_seed: 0 I0419 22:22:17.361423 136021250135872 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0419 22:22:17.361439 136021250135872 pyconfig.py:432] Config param interleave_moe_layer_step: 1 I0419 22:22:17.361455 136021250135872 pyconfig.py:432] Config param intermediate_size_for_vit: 5632 I0419 22:22:17.361470 136021250135872 pyconfig.py:432] Config param internal_compile: False I0419 22:22:17.361485 136021250135872 pyconfig.py:432] Config param internal_compile_num_devices: -1 I0419 22:22:17.361499 136021250135872 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache I0419 22:22:17.361514 136021250135872 pyconfig.py:432] Config param jax_debug_log_modules: I0419 22:22:17.361528 136021250135872 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300 I0419 22:22:17.361543 136021250135872 pyconfig.py:432] Config param jax_profiler_port: 9999 I0419 22:22:17.361558 136021250135872 pyconfig.py:432] Config param key_proj: RematLocation.REMAT I0419 22:22:17.361575 136021250135872 pyconfig.py:432] Config param kv_cache_buffer: 256 I0419 22:22:17.361590 136021250135872 pyconfig.py:432] Config param kv_lora_rank: 512 I0419 22:22:17.361604 136021250135872 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0419 22:22:17.361623 136021250135872 pyconfig.py:432] Config param kv_quant_dtype: int8 I0419 22:22:17.361641 136021250135872 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT I0419 22:22:17.361656 136021250135872 pyconfig.py:432] Config param learning_rate: 0.0002 I0419 22:22:17.361672 136021250135872 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1 I0419 22:22:17.361687 136021250135872 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000 I0419 22:22:17.361703 136021250135872 pyconfig.py:432] Config param load_balance_loss_weight: 0.0 
I0419 22:22:17.361716 136021250135872 pyconfig.py:432] Config param load_checkpoint_only_once: False I0419 22:22:17.361732 136021250135872 pyconfig.py:432] Config param load_from_prefill_dir: False I0419 22:22:17.361746 136021250135872 pyconfig.py:432] Config param load_full_state_path: I0419 22:22:17.361761 136021250135872 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0419 22:22:17.361777 136021250135872 pyconfig.py:432] Config param local_checkpoint_directory: I0419 22:22:17.361791 136021250135872 pyconfig.py:432] Config param local_checkpoint_period: 0 I0419 22:22:17.361806 136021250135872 pyconfig.py:432] Config param local_rope_max_timescale: -1 I0419 22:22:17.361820 136021250135872 pyconfig.py:432] Config param local_rope_proportion: 1.0 I0419 22:22:17.361836 136021250135872 pyconfig.py:432] Config param log_config: True I0419 22:22:17.361850 136021250135872 pyconfig.py:432] Config param log_period: 10 I0419 22:22:17.361865 136021250135872 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), 
('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), 
('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0419 22:22:17.361943 136021250135872 pyconfig.py:432] Config param logits_dot_in_fp32: False I0419 22:22:17.361960 136021250135872 pyconfig.py:432] Config param logits_via_embedding: True I0419 
22:22:17.361975 136021250135872 pyconfig.py:432] Config param lora_input_adapters_path: I0419 22:22:17.361989 136021250135872 pyconfig.py:432] Config param loss_algo: grpo I0419 22:22:17.362004 136021250135872 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0419 22:22:17.362021 136021250135872 pyconfig.py:432] Config param managed_mldiagnostics: False I0419 22:22:17.362035 136021250135872 pyconfig.py:432] Config param managed_mldiagnostics_dir: None I0419 22:22:17.362050 136021250135872 pyconfig.py:432] Config param managed_mldiagnostics_run_group: I0419 22:22:17.362064 136021250135872 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT I0419 22:22:17.362082 136021250135872 pyconfig.py:432] Config param max_checkify: False I0419 22:22:17.362095 136021250135872 pyconfig.py:432] Config param max_concurrency: 256 I0419 22:22:17.362111 136021250135872 pyconfig.py:432] Config param max_corpus_chars: 10000000 I0419 22:22:17.362124 136021250135872 pyconfig.py:432] Config param max_num_batched_tokens: None I0419 22:22:17.362139 136021250135872 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None I0419 22:22:17.362154 136021250135872 pyconfig.py:432] Config param max_num_images_per_example: -1 I0419 22:22:17.362169 136021250135872 pyconfig.py:432] Config param max_num_seqs: None I0419 22:22:17.362183 136021250135872 pyconfig.py:432] Config param max_position_embeddings: 163840 I0419 22:22:17.362198 136021250135872 pyconfig.py:432] Config param max_prefill_predict_length: 64 I0419 22:22:17.362212 136021250135872 pyconfig.py:432] Config param max_sample_len_for_audio: 10000 I0419 22:22:17.362227 136021250135872 pyconfig.py:432] Config param max_segments_per_seq: -1 I0419 22:22:17.362241 136021250135872 pyconfig.py:432] Config param max_source_positions_for_audio: 1500 I0419 22:22:17.362257 136021250135872 pyconfig.py:432] Config param max_target_length: 2048 I0419 22:22:17.362271 136021250135872 pyconfig.py:432] 
Config param max_timescale_for_audio: 10000.0 I0419 22:22:17.362286 136021250135872 pyconfig.py:432] Config param megablox: True I0419 22:22:17.362301 136021250135872 pyconfig.py:432] Config param merge_gating_gmm: False I0419 22:22:17.362316 136021250135872 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0419 22:22:17.362342 136021250135872 pyconfig.py:432] Config param metrics_dir: None I0419 22:22:17.362358 136021250135872 pyconfig.py:432] Config param metrics_file: I0419 22:22:17.362372 136021250135872 pyconfig.py:432] Config param mhc_expansion_rate: 1 I0419 22:22:17.362387 136021250135872 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64 I0419 22:22:17.362401 136021250135872 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64 I0419 22:22:17.362416 136021250135872 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT I0419 22:22:17.362431 136021250135872 pyconfig.py:432] Config param mla_naive_kvcache: True I0419 22:22:17.362446 136021250135872 pyconfig.py:432] Config param mla_q: RematLocation.REMAT I0419 22:22:17.362460 136021250135872 pyconfig.py:432] Config param mlp_activations: ['gelu'] I0419 22:22:17.362476 136021250135872 pyconfig.py:432] Config param mlp_activations_limit: -1.0 I0419 22:22:17.362493 136021250135872 pyconfig.py:432] Config param mlp_bias: False I0419 22:22:17.362507 136021250135872 pyconfig.py:432] Config param mlp_dim: 64 I0419 22:22:17.362522 136021250135872 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT I0419 22:22:17.362536 136021250135872 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT I0419 22:22:17.362551 136021250135872 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT I0419 22:22:17.362565 136021250135872 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT I0419 22:22:17.362581 136021250135872 
pyconfig.py:432] Config param moba: False I0419 22:22:17.362595 136021250135872 pyconfig.py:432] Config param moba_chunk_size: 1024 I0419 22:22:17.362610 136021250135872 pyconfig.py:432] Config param moba_topk: 8 I0419 22:22:17.362624 136021250135872 pyconfig.py:432] Config param model_call_mode: I0419 22:22:17.362643 136021250135872 pyconfig.py:432] Config param model_name: gpt3-52k I0419 22:22:17.362657 136021250135872 pyconfig.py:432] Config param moe_expert_input_dim: -1 I0419 22:22:17.362673 136021250135872 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False I0419 22:22:17.362686 136021250135872 pyconfig.py:432] Config param moe_mlp_dim: -1 I0419 22:22:17.362702 136021250135872 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT I0419 22:22:17.362716 136021250135872 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT I0419 22:22:17.362732 136021250135872 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT I0419 22:22:17.362748 136021250135872 pyconfig.py:432] Config param monitor_goodput: False I0419 22:22:17.362762 136021250135872 pyconfig.py:432] Config param monitor_step_time_deviation: True I0419 22:22:17.362776 136021250135872 pyconfig.py:432] Config param mrope_section: [24, 20, 20] I0419 22:22:17.362792 136021250135872 pyconfig.py:432] Config param mscale: 1.0 I0419 22:22:17.362807 136021250135872 pyconfig.py:432] Config param mtc_data_parallelism: 0 I0419 22:22:17.362822 136021250135872 pyconfig.py:432] Config param mtp_eval_target_module: 0 I0419 22:22:17.362837 136021250135872 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1 I0419 22:22:17.362852 136021250135872 pyconfig.py:432] Config param mtp_num_layers: 0 I0419 22:22:17.362866 136021250135872 pyconfig.py:432] Config param mu_dtype: float32 I0419 22:22:17.362889 136021250135872 pyconfig.py:432] Config param multi_sampling: False I0419 22:22:17.362903 136021250135872 pyconfig.py:432] Config param 
multi_tier_checkpointing_backup_interval_minutes: 0 I0419 22:22:17.362918 136021250135872 pyconfig.py:432] Config param muon_beta: 0.95 I0419 22:22:17.362933 136021250135872 pyconfig.py:432] Config param muon_consistent_rms: None I0419 22:22:17.362949 136021250135872 pyconfig.py:432] Config param muon_weight_decay: 0.0 I0419 22:22:17.362965 136021250135872 pyconfig.py:432] Config param n_routing_groups: -1 I0419 22:22:17.362980 136021250135872 pyconfig.py:432] Config param n_window_for_audio: 50 I0419 22:22:17.362996 136021250135872 pyconfig.py:432] Config param n_window_infer_for_audio: 800 I0419 22:22:17.363010 136021250135872 pyconfig.py:432] Config param nope_layer_interval: -1 I0419 22:22:17.363026 136021250135872 pyconfig.py:432] Config param norm_topk_prob: False I0419 22:22:17.363040 136021250135872 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05 I0419 22:22:17.363059 136021250135872 pyconfig.py:432] Config param normalize_embedding_logits: False I0419 22:22:17.363073 136021250135872 pyconfig.py:432] Config param num_attention_heads_for_vit: 16 I0419 22:22:17.363088 136021250135872 pyconfig.py:432] Config param num_batches: 4 I0419 22:22:17.363103 136021250135872 pyconfig.py:432] Config param num_channels_for_vit: 3 I0419 22:22:17.363119 136021250135872 pyconfig.py:432] Config param num_conv_layers_for_audio: 3 I0419 22:22:17.363132 136021250135872 pyconfig.py:432] Config param num_decoder_layers: 1 I0419 22:22:17.363148 136021250135872 pyconfig.py:432] Config param num_diloco_replicas: 1 I0419 22:22:17.363163 136021250135872 pyconfig.py:432] Config param num_epoch: 1 I0419 22:22:17.363177 136021250135872 pyconfig.py:432] Config param num_eval_passes: 1 I0419 22:22:17.363192 136021250135872 pyconfig.py:432] Config param num_experts: 1 I0419 22:22:17.363206 136021250135872 pyconfig.py:432] Config param num_experts_per_tok: 1 I0419 22:22:17.363222 136021250135872 pyconfig.py:432] Config param num_generations: 2 I0419 22:22:17.363238 
136021250135872 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34 I0419 22:22:17.363252 136021250135872 pyconfig.py:432] Config param num_iterations: 1 I0419 22:22:17.363267 136021250135872 pyconfig.py:432] Config param num_kv_heads: 2 I0419 22:22:17.363282 136021250135872 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1 I0419 22:22:17.363298 136021250135872 pyconfig.py:432] Config param num_mel_bins_for_audio: 128 I0419 22:22:17.363312 136021250135872 pyconfig.py:432] Config param num_pipeline_microbatches: -1 I0419 22:22:17.363327 136021250135872 pyconfig.py:432] Config param num_pipeline_repeats: -1 I0419 22:22:17.363352 136021250135872 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024 I0419 22:22:17.363368 136021250135872 pyconfig.py:432] Config param num_query_heads: 2 I0419 22:22:17.363382 136021250135872 pyconfig.py:432] Config param num_samplers_slices: -1 I0419 22:22:17.363397 136021250135872 pyconfig.py:432] Config param num_slices: 1 I0419 22:22:17.363413 136021250135872 pyconfig.py:432] Config param num_target_devices: 32 I0419 22:22:17.363428 136021250135872 pyconfig.py:432] Config param num_test_batches: 5 I0419 22:22:17.363442 136021250135872 pyconfig.py:432] Config param num_trainer_slices: -1 I0419 22:22:17.363457 136021250135872 pyconfig.py:432] Config param num_vocab_tiling: 1 I0419 22:22:17.363471 136021250135872 pyconfig.py:432] Config param off_policy_steps: 0 I0419 22:22:17.363487 136021250135872 pyconfig.py:432] Config param offline_data_dir: None I0419 22:22:17.363502 136021250135872 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX I0419 22:22:17.363520 136021250135872 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False I0419 22:22:17.363535 136021250135872 pyconfig.py:432] Config param optimizer_memory_host_offload: False I0419 22:22:17.363550 136021250135872 pyconfig.py:432] Config param original_max_position_embeddings: 4096 I0419 22:22:17.363565 136021250135872 
pyconfig.py:432] Config param out_hidden_size_for_vit: 512 I0419 22:22:17.363581 136021250135872 pyconfig.py:432] Config param out_proj: RematLocation.REMAT I0419 22:22:17.363595 136021250135872 pyconfig.py:432] Config param output_dim_for_audio: 512 I0419 22:22:17.363611 136021250135872 pyconfig.py:432] Config param override_logical_axis_rules: False I0419 22:22:17.363624 136021250135872 pyconfig.py:432] Config param override_model_config: True I0419 22:22:17.363642 136021250135872 pyconfig.py:432] Config param packing: True I0419 22:22:17.363656 136021250135872 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128 I0419 22:22:17.363671 136021250135872 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1 I0419 22:22:17.363685 136021250135872 pyconfig.py:432] Config param pagedattn_num_pages: 64 I0419 22:22:17.363701 136021250135872 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4 I0419 22:22:17.363716 136021250135872 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32 I0419 22:22:17.363730 136021250135872 pyconfig.py:432] Config param param_scan_axis: 1 I0419 22:22:17.363745 136021250135872 pyconfig.py:432] Config param parameter_memory_host_offload: False I0419 22:22:17.363758 136021250135872 pyconfig.py:432] Config param partial_rotary_factor: 1.0 I0419 22:22:17.363774 136021250135872 pyconfig.py:432] Config param patch_size_for_vit: 14 I0419 22:22:17.363788 136021250135872 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0 I0419 22:22:17.363802 136021250135872 pyconfig.py:432] Config param penalty_incorrect_format: -0.5 I0419 22:22:17.363818 136021250135872 pyconfig.py:432] Config param per_device_batch_size: 2 I0419 22:22:17.363832 136021250135872 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0 I0419 22:22:17.363847 136021250135872 pyconfig.py:432] Config param per_device_batch_size_start: 4.0 I0419 22:22:17.363861 136021250135872 pyconfig.py:432] Config param 
pipeline_delay_activation_forwarding: False I0419 22:22:17.363876 136021250135872 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False I0419 22:22:17.363890 136021250135872 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False I0419 22:22:17.363904 136021250135872 pyconfig.py:432] Config param pipeline_parallel_layers: 1 I0419 22:22:17.363919 136021250135872 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5 I0419 22:22:17.363935 136021250135872 pyconfig.py:432] Config param posemb_type_for_vit: learn I0419 22:22:17.363950 136021250135872 pyconfig.py:432] Config param position_id_per_seconds: 25 I0419 22:22:17.363964 136021250135872 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3 I0419 22:22:17.363979 136021250135872 pyconfig.py:432] Config param prefill_cache_dir: I0419 22:22:17.363993 136021250135872 pyconfig.py:432] Config param prefill_chunk_size: 256 I0419 22:22:17.364008 136021250135872 pyconfig.py:432] Config param prefill_slice: v5e-16 I0419 22:22:17.364022 136021250135872 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000 I0419 22:22:17.364037 136021250135872 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000 I0419 22:22:17.364051 136021250135872 pyconfig.py:432] Config param profile_cleanly: True I0419 22:22:17.364066 136021250135872 pyconfig.py:432] Config param profile_periodically_period: -1 I0419 22:22:17.364080 136021250135872 pyconfig.py:432] Config param profile_power_events: False I0419 22:22:17.364095 136021250135872 pyconfig.py:432] Config param profiler: ProfilerType.NONE I0419 22:22:17.364111 136021250135872 pyconfig.py:432] Config param profiler_steps: 5 I0419 22:22:17.364126 136021250135872 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0 I0419 22:22:17.364141 136021250135872 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096 I0419 22:22:17.364156 136021250135872 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096 I0419 
22:22:17.364170 136021250135872 pyconfig.py:432] Config param prometheus_port: 0 I0419 22:22:17.364186 136021250135872 pyconfig.py:432] Config param prompt: I love to I0419 22:22:17.364200 136021250135872 pyconfig.py:432] Config param pure_nnx: False I0419 22:22:17.364215 136021250135872 pyconfig.py:432] Config param pure_nnx_decoder: False I0419 22:22:17.364229 136021250135872 pyconfig.py:432] Config param q_lora_rank: 0 I0419 22:22:17.364244 136021250135872 pyconfig.py:432] Config param qk_clip_threshold: 100.0 I0419 22:22:17.364258 136021250135872 pyconfig.py:432] Config param qk_nope_head_dim: 128 I0419 22:22:17.364273 136021250135872 pyconfig.py:432] Config param qk_norm_with_scale: True I0419 22:22:17.364287 136021250135872 pyconfig.py:432] Config param qk_rope_head_dim: 64 I0419 22:22:17.364302 136021250135872 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT I0419 22:22:17.364316 136021250135872 pyconfig.py:432] Config param quant_cfg_path: I0419 22:22:17.364342 136021250135872 pyconfig.py:432] Config param quantization: QuantizationType.NONE I0419 22:22:17.364359 136021250135872 pyconfig.py:432] Config param quantization_local_shard_count: 4 I0419 22:22:17.364374 136021250135872 pyconfig.py:432] Config param quantize_kvcache: False I0419 22:22:17.364388 136021250135872 pyconfig.py:432] Config param query_proj: RematLocation.REMAT I0419 22:22:17.364404 136021250135872 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT I0419 22:22:17.364418 136021250135872 pyconfig.py:432] Config param ragged_block_size: 256 I0419 22:22:17.364433 136021250135872 pyconfig.py:432] Config param ragged_buffer_factor: -1.0 I0419 22:22:17.364447 136021250135872 pyconfig.py:432] Config param rampup_end_step: 0 I0419 22:22:17.364462 136021250135872 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None I0419 22:22:17.364476 136021250135872 pyconfig.py:432] Config param reasoning_end_token: </reasoning> I0419 22:22:17.364492 136021250135872 
pyconfig.py:432] Config param reasoning_start_token: <reasoning> I0419 22:22:17.364505 136021250135872 pyconfig.py:432] Config param record_internal_nn_metrics: 0 I0419 22:22:17.364521 136021250135872 pyconfig.py:432] Config param remat_policy: full I0419 22:22:17.364535 136021250135872 pyconfig.py:432] Config param remat_policy_for_vit: minimal I0419 22:22:17.364550 136021250135872 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True I0419 22:22:17.364564 136021250135872 pyconfig.py:432] Config param replicate_quant_scale: False I0419 22:22:17.364579 136021250135872 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0 I0419 22:22:17.364593 136021250135872 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False I0419 22:22:17.364610 136021250135872 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False I0419 22:22:17.364624 136021250135872 pyconfig.py:432] Config param reshape_q: False I0419 22:22:17.364644 136021250135872 pyconfig.py:432] Config param return_log_prob: False I0419 22:22:17.364658 136021250135872 pyconfig.py:432] Config param reuse_example_batch: 0 I0419 22:22:17.364674 136021250135872 pyconfig.py:432] Config param reward_exact_answer: 5.0 I0419 22:22:17.364688 136021250135872 pyconfig.py:432] Config param reward_exact_format_match: 3.0 I0419 22:22:17.364703 136021250135872 pyconfig.py:432] Config param reward_partial_format_match: 0.5 I0419 22:22:17.364718 136021250135872 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5 I0419 22:22:17.364734 136021250135872 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25 I0419 22:22:17.364748 136021250135872 pyconfig.py:432] Config param reward_white_space_format_match: 1.5 I0419 22:22:17.364764 136021250135872 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 
'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0419 22:22:17.364784 136021250135872 pyconfig.py:432] Config param rollout_data_parallelism: -1 I0419 22:22:17.364799 136021250135872 pyconfig.py:432] Config param rollout_expert_parallelism: 1 I0419 22:22:17.364813 136021250135872 pyconfig.py:432] Config param rollout_micro_batch_size: -1 I0419 22:22:17.364828 136021250135872 pyconfig.py:432] Config param rollout_tensor_parallelism: -1 I0419 22:22:17.364842 136021250135872 pyconfig.py:432] Config param rope_attention_scaling: False I0419 22:22:17.364857 136021250135872 pyconfig.py:432] Config param rope_factor: 40 I0419 22:22:17.364871 136021250135872 pyconfig.py:432] Config param rope_interleave: True I0419 22:22:17.364887 136021250135872 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0 I0419 22:22:17.364901 136021250135872 pyconfig.py:432] Config param rope_max_timescale: 10000 I0419 22:22:17.364917 136021250135872 pyconfig.py:432] Config param rope_min_timescale: 1 I0419 22:22:17.364931 136021250135872 pyconfig.py:432] Config param rope_theta_for_vit: 10000 I0419 22:22:17.364946 136021250135872 pyconfig.py:432] Config param rope_truncate: True I0419 22:22:17.364963 136021250135872 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT I0419 22:22:17.364980 136021250135872 pyconfig.py:432] Config param rope_use_scale: True I0419 22:22:17.364994 136021250135872 pyconfig.py:432] Config param routed_bias: False I0419 22:22:17.365009 136021250135872 pyconfig.py:432] Config param routed_bias_update_rate: 0.0 I0419 22:22:17.365023 136021250135872 pyconfig.py:432] Config param routed_scaling_factor: 1.0 I0419 22:22:17.365039 136021250135872 pyconfig.py:432] Config param routed_score_func: I0419 22:22:17.365053 136021250135872 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-19-22-22 I0419 22:22:17.365069 136021250135872 pyconfig.py:432] Config param sa_block_kv: 512 I0419 
22:22:17.365083 136021250135872 pyconfig.py:432] Config param sa_block_kv_compute: 512 I0419 22:22:17.365098 136021250135872 pyconfig.py:432] Config param sa_block_kv_dkv: 512 I0419 22:22:17.365114 136021250135872 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512 I0419 22:22:17.365129 136021250135872 pyconfig.py:432] Config param sa_block_kv_dq: 512 I0419 22:22:17.365144 136021250135872 pyconfig.py:432] Config param sa_block_q: 512 I0419 22:22:17.365159 136021250135872 pyconfig.py:432] Config param sa_block_q_dkv: 512 I0419 22:22:17.365174 136021250135872 pyconfig.py:432] Config param sa_block_q_dq: 512 I0419 22:22:17.365188 136021250135872 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR I0419 22:22:17.365203 136021250135872 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR I0419 22:22:17.365217 136021250135872 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False I0419 22:22:17.365232 136021250135872 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR I0419 22:22:17.365246 136021250135872 pyconfig.py:432] Config param sampler_devices_fraction: 0.5 I0419 22:22:17.365262 136021250135872 pyconfig.py:432] Config param save_checkpoint_on_completion: True I0419 22:22:17.365276 136021250135872 pyconfig.py:432] Config param save_config_to_gcs: False I0419 22:22:17.365291 136021250135872 pyconfig.py:432] Config param save_quantized_params_path: I0419 22:22:17.365305 136021250135872 pyconfig.py:432] Config param scale_embedding_for_audio: True I0419 22:22:17.365320 136021250135872 pyconfig.py:432] Config param scan_layers: True I0419 22:22:17.365343 136021250135872 pyconfig.py:432] Config param scan_layers_per_stage: False I0419 22:22:17.365359 136021250135872 pyconfig.py:432] Config param scan_pipeline_iterations: True I0419 22:22:17.365373 136021250135872 pyconfig.py:432] Config param scan_pipeline_repeats: False I0419 22:22:17.365388 136021250135872 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False I0419 
22:22:17.365402 136021250135872 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True I0419 22:22:17.365417 136021250135872 pyconfig.py:432] Config param sft_train_on_completion_only: False I0419 22:22:17.365431 136021250135872 pyconfig.py:432] Config param shard_exp_on_fsdp: False I0419 22:22:17.365447 136021250135872 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO I0419 22:22:17.365463 136021250135872 pyconfig.py:432] Config param shard_optimizer_over_data: False I0419 22:22:17.365479 136021250135872 pyconfig.py:432] Config param sharding_strategy: None I0419 22:22:17.365494 136021250135872 pyconfig.py:432] Config param sharding_tolerance: 0.02 I0419 22:22:17.365508 136021250135872 pyconfig.py:432] Config param shardy: True I0419 22:22:17.365524 136021250135872 pyconfig.py:432] Config param share_kv_projections: False I0419 22:22:17.365541 136021250135872 pyconfig.py:432] Config param shared_experts: 0 I0419 22:22:17.365555 136021250135872 pyconfig.py:432] Config param sinkhorn_iterations: 20 I0419 22:22:17.365570 136021250135872 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1 I0419 22:22:17.365584 136021250135872 pyconfig.py:432] Config param skip_jax_distributed_system: False I0419 22:22:17.365598 136021250135872 pyconfig.py:432] Config param skip_step_interval: 128 I0419 22:22:17.365612 136021250135872 pyconfig.py:432] Config param skip_step_on_spikes: False I0419 22:22:17.365631 136021250135872 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0 I0419 22:22:17.365645 136021250135872 pyconfig.py:432] Config param sliding_window_size: 0 I0419 22:22:17.365661 136021250135872 pyconfig.py:432] Config param solution_end_token: </answer> I0419 22:22:17.365675 136021250135872 pyconfig.py:432] Config param solution_start_token: <answer> I0419 22:22:17.365689 136021250135872 pyconfig.py:432] Config param source_checkpoint_layout: orbax I0419 22:22:17.365704 136021250135872 pyconfig.py:432] Config param 
sparse_matmul: True I0419 22:22:17.365718 136021250135872 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2 I0419 22:22:17.365734 136021250135872 pyconfig.py:432] Config param stack_prefill_result_cache: False I0419 22:22:17.365748 136021250135872 pyconfig.py:432] Config param stack_trace_interval_seconds: 600 I0419 22:22:17.365763 136021250135872 pyconfig.py:432] Config param stack_trace_to_cloud: False I0419 22:22:17.365777 136021250135872 pyconfig.py:432] Config param step_deviation_interval_seconds: 30 I0419 22:22:17.365792 136021250135872 pyconfig.py:432] Config param steps: 200000 I0419 22:22:17.365806 136021250135872 pyconfig.py:432] Config param stop_strings: None I0419 22:22:17.365821 136021250135872 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0419 22:22:17.365836 136021250135872 pyconfig.py:432] Config param student_params_to_update: None I0419 22:22:17.365852 136021250135872 pyconfig.py:432] Config param subslice_shape: I0419 22:22:17.365866 136021250135872 pyconfig.py:432] Config param swap_space_vllm_gb: 2 I0419 22:22:17.365881 136021250135872 pyconfig.py:432] Config param system_prompt: I0419 22:22:17.365895 136021250135872 pyconfig.py:432] Config param target_eval_loss: 0.0 I0419 22:22:17.365911 136021250135872 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0419 22:22:17.365925 136021250135872 pyconfig.py:432] Config param temperature_tuning: False I0419 22:22:17.365941 136021250135872 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2 I0419 22:22:17.365954 136021250135872 pyconfig.py:432] Config param tensorboard_dir: None I0419 22:22:17.365970 136021250135872 pyconfig.py:432] Config param tensors_on_device: None I0419 22:22:17.365983 136021250135872 pyconfig.py:432] Config param tensors_to_offload: None I0419 22:22:17.365999 136021250135872 pyconfig.py:432] Config param test_batch_start_index: 0 I0419 22:22:17.366012 136021250135872 pyconfig.py:432] Config 
param tile_size_for_vit: 336 I0419 22:22:17.366027 136021250135872 pyconfig.py:432] Config param tokenize_eval_data: True I0419 22:22:17.366043 136021250135872 pyconfig.py:432] Config param tokenize_train_data: True I0419 22:22:17.366056 136021250135872 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0419 22:22:17.366072 136021250135872 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0419 22:22:17.366088 136021250135872 pyconfig.py:432] Config param topk_routing_group: -1 I0419 22:22:17.366104 136021250135872 pyconfig.py:432] Config param train_data_columns: ['text'] I0419 22:22:17.366118 136021250135872 pyconfig.py:432] Config param train_fraction: 1.0 I0419 22:22:17.366132 136021250135872 pyconfig.py:432] Config param train_image_column: image I0419 22:22:17.366147 136021250135872 pyconfig.py:432] Config param train_micro_batch_size: -1 I0419 22:22:17.366161 136021250135872 pyconfig.py:432] Config param train_split: train I0419 22:22:17.366174 136021250135872 pyconfig.py:432] Config param trainable_parameters_mask: [] I0419 22:22:17.366190 136021250135872 pyconfig.py:432] Config param trainable_position_size: 2048 I0419 22:22:17.366204 136021250135872 pyconfig.py:432] Config param trainer_devices_fraction: 0.5 I0419 22:22:17.366220 136021250135872 pyconfig.py:432] Config param upload_all_profiler_results: False I0419 22:22:17.366234 136021250135872 pyconfig.py:432] Config param use_2d_fsdp_sharding: False I0419 22:22:17.366249 136021250135872 pyconfig.py:432] Config param use_agentic_rollout: False I0419 22:22:17.366263 136021250135872 pyconfig.py:432] Config param use_audio: False I0419 22:22:17.366278 136021250135872 pyconfig.py:432] Config param use_audio_in_video: False I0419 22:22:17.366291 136021250135872 pyconfig.py:432] Config param use_batch_split_schedule: False I0419 22:22:17.366307 136021250135872 pyconfig.py:432] Config param use_chat_template: False I0419 22:22:17.366321 136021250135872 
pyconfig.py:432] Config param use_chunked_prefill: False I0419 22:22:17.366343 136021250135872 pyconfig.py:432] Config param use_custom_sort_vjp: True I0419 22:22:17.366358 136021250135872 pyconfig.py:432] Config param use_dpo: False I0419 22:22:17.366372 136021250135872 pyconfig.py:432] Config param use_gather_mosaic_kernel: False I0419 22:22:17.366387 136021250135872 pyconfig.py:432] Config param use_grpo: True I0419 22:22:17.366404 136021250135872 pyconfig.py:432] Config param use_indexer: False I0419 22:22:17.366419 136021250135872 pyconfig.py:432] Config param use_iota_embed: True I0419 22:22:17.366434 136021250135872 pyconfig.py:432] Config param use_jax_splash: False I0419 22:22:17.366448 136021250135872 pyconfig.py:432] Config param use_max_logit_estimate: -1 I0419 22:22:17.366463 136021250135872 pyconfig.py:432] Config param use_mrope: False I0419 22:22:17.366477 136021250135872 pyconfig.py:432] Config param use_multimodal: False I0419 22:22:17.366492 136021250135872 pyconfig.py:432] Config param use_pathways: True I0419 22:22:17.366506 136021250135872 pyconfig.py:432] Config param use_post_attn_norm: False I0419 22:22:17.366521 136021250135872 pyconfig.py:432] Config param use_post_ffw_norm: False I0419 22:22:17.366535 136021250135872 pyconfig.py:432] Config param use_qk_clip: False I0419 22:22:17.366549 136021250135872 pyconfig.py:432] Config param use_qk_norm: False I0419 22:22:17.366564 136021250135872 pyconfig.py:432] Config param use_qk_norm_in_gdn: True I0419 22:22:17.366580 136021250135872 pyconfig.py:432] Config param use_qwix_quantization: False I0419 22:22:17.366596 136021250135872 pyconfig.py:432] Config param use_ragged_attention: False I0419 22:22:17.366611 136021250135872 pyconfig.py:432] Config param use_random_routing: False I0419 22:22:17.366629 136021250135872 pyconfig.py:432] Config param use_replicator_service: False I0419 22:22:17.366643 136021250135872 pyconfig.py:432] Config param use_ring_of_experts: False I0419 22:22:17.366658 
136021250135872 pyconfig.py:432] Config param use_sft: False I0419 22:22:17.366671 136021250135872 pyconfig.py:432] Config param use_splash_scheduler: False I0419 22:22:17.366687 136021250135872 pyconfig.py:432] Config param use_tokamax_gmm: False I0419 22:22:17.366700 136021250135872 pyconfig.py:432] Config param use_tokamax_splash: False I0419 22:22:17.366716 136021250135872 pyconfig.py:432] Config param use_truncation: True I0419 22:22:17.366729 136021250135872 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False I0419 22:22:17.366744 136021250135872 pyconfig.py:432] Config param use_untrainable_positional_embedding: False I0419 22:22:17.366759 136021250135872 pyconfig.py:432] Config param use_vertex_tensorboard: False I0419 22:22:17.366774 136021250135872 pyconfig.py:432] Config param using_pipeline_parallelism: False I0419 22:22:17.366788 136021250135872 pyconfig.py:432] Config param v_head_dim: 128 I0419 22:22:17.366803 136021250135872 pyconfig.py:432] Config param v_norm_with_scale: True I0419 22:22:17.366816 136021250135872 pyconfig.py:432] Config param value_proj: RematLocation.REMAT I0419 22:22:17.366832 136021250135872 pyconfig.py:432] Config param vertex_tensorboard_project: I0419 22:22:17.366846 136021250135872 pyconfig.py:432] Config param vertex_tensorboard_region: I0419 22:22:17.366861 136021250135872 pyconfig.py:432] Config param video_path: I0419 22:22:17.366875 136021250135872 pyconfig.py:432] Config param video_placeholder: <|video|> I0419 22:22:17.366890 136021250135872 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096 I0419 22:22:17.366904 136021250135872 pyconfig.py:432] Config param vision_output_length: -1 I0419 22:22:17.366919 136021250135872 pyconfig.py:432] Config param vllm_additional_config: {} I0419 22:22:17.366933 136021250135872 pyconfig.py:432] Config param vllm_hf_config_path: I0419 22:22:17.366948 136021250135872 pyconfig.py:432] Config param vllm_hf_overrides: {} I0419 22:22:17.366964 136021250135872 
pyconfig.py:432] Config param vocab_size: 32000 I0419 22:22:17.366979 136021250135872 pyconfig.py:432] Config param warmup_steps_fraction: 0.1 I0419 22:22:17.366994 136021250135872 pyconfig.py:432] Config param weight_dtype: float32 I0419 22:22:17.367017 136021250135872 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax I0419 22:22:17.367032 136021250135872 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512 I0419 22:22:17.367048 136021250135872 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024 I0419 22:22:17.367062 136021250135872 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024 I0419 22:22:17.367077 136021250135872 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512 I0419 22:22:17.367091 136021250135872 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024 I0419 22:22:17.367107 136021250135872 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024 I0419 22:22:17.367121 136021250135872 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512 I0419 22:22:17.367136 136021250135872 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024 I0419 22:22:17.367151 136021250135872 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024 I0419 22:22:17.367167 136021250135872 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512 I0419 22:22:17.367181 136021250135872 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024 I0419 22:22:17.367195 136021250135872 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024 I0419 22:22:17.367211 136021250135872 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512 I0419 22:22:17.367225 136021250135872 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024 I0419 22:22:17.367239 136021250135872 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024 I0419 22:22:17.367254 136021250135872 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512 I0419 22:22:17.367269 136021250135872 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024 I0419 22:22:17.367284 
136021250135872 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024 I0419 22:22:17.367298 136021250135872 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1 I0419 22:22:17.367314 136021250135872 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0419 22:22:17.367340 136021250135872 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False I0419 22:22:17.367356 136021250135872 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False I0419 22:22:17.367371 136021250135872 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False I0419 22:22:17.367386 136021250135872 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0 I0419 22:22:17.367403 136021250135872 pyconfig.py:432] Config param z_loss_multiplier: 0.0 I0419 22:22:17.367727 136021250135872 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0419 22:22:17.367762 136021250135872 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0419 22:22:21.253602 136021250135872 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 749, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 745, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 580, in train_distill
    devices_array = maxtext_utils.create_device_mesh(student_config, devices)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/maxtext_utils.py", line 1677, in create_device_mesh
    ici_parallelism = max_utils.fill_unspecified_mesh_axes(config.ici_parallelism.copy(), num_devices_per_slice, "ICI")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/max_utils.py", line 450, in fill_unspecified_mesh_axes
    assert np.prod(parallelism_vals) == target_product, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Number of devices per slice 32 does not match the product of the ICI parallelism 8
XPK End: Sun Apr 19 22:22:29 UTC 2026
EXIT_CODE=1