MaxView

Log Summary

XPK Start: Sun Apr 19 18:20:42 UTC 2026
2026-04-19 18:20:59.695056: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 18:21:03.315843 139955906574144 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-19 18:21:12,354:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0419 18:21:12.354380 139955906574144 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-19 18:21:12,356:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-m42ln-slice-job-0-0.mt-07-distill-smoke-m42ln:8482
I0419 18:21:12.356690 139955906574144 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-m42ln-slice-job-0-0.mt-07-distill-smoke-m42ln:8482
I0419 18:21:14.334947 139955906574144 max_utils.py:284] Jax distributed system initialized!
I0419 18:21:20.623946 139955906574144 max_utils.py:244] Jax distributed system is already initialized.
I0419 18:21:21.097647 139955906574144 max_utils.py:244] Jax distributed system is already initialized.
I0419 18:21:21.098850 139955906574144 pyconfig.py:432] Config param abort_on_inf_loss: True
I0419 18:21:21.098897 139955906574144 pyconfig.py:432] Config param abort_on_nan_loss: True
I0419 18:21:21.098922 139955906574144 pyconfig.py:432] Config param act_quantization_calibration_method: absmax
I0419 18:21:21.098943 139955906574144 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0
I0419 18:21:21.098963 139955906574144 pyconfig.py:432] Config param activation_function_for_audio: gelu
I0419 18:21:21.098980 139955906574144 pyconfig.py:432] Config param activations_in_float32: False
I0419 18:21:21.098999 139955906574144 pyconfig.py:432] Config param adam_b1: 0.9
I0419 18:21:21.099018 139955906574144 pyconfig.py:432] Config param adam_b2: 0.95
I0419 18:21:21.099035 139955906574144 pyconfig.py:432] Config param adam_eps: 1e-08
I0419 18:21:21.099057 139955906574144 pyconfig.py:432] Config param adam_eps_root: 0.0
I0419 18:21:21.099074 139955906574144 pyconfig.py:432] Config param adam_weight_decay: 0.1
I0419 18:21:21.099092 139955906574144 pyconfig.py:432] Config param adamw_mask: []
I0419 18:21:21.099108 139955906574144 pyconfig.py:432] Config param add_bos: True
I0419 18:21:21.099124 139955906574144 pyconfig.py:432] Config param add_eos: True
I0419 18:21:21.099141 139955906574144 pyconfig.py:432] Config param allow_split_physical_axes: False
I0419 18:21:21.099157 139955906574144 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3
I0419 18:21:21.099174 139955906574144 pyconfig.py:432] Config param async_checkpointing: True
I0419 18:21:21.099191 139955906574144 pyconfig.py:432] Config param async_scheduling: False
I0419 18:21:21.099206 139955906574144 pyconfig.py:432] Config param attention: dot_product
I0419 18:21:21.099223 139955906574144 pyconfig.py:432] Config param attention_bias: False
I0419 18:21:21.099244 139955906574144 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0
I0419 18:21:21.099261 139955906574144 pyconfig.py:432] Config param attention_out: RematLocation.REMAT
I0419 18:21:21.099281 139955906574144 pyconfig.py:432] Config param attention_output_dim: -1
I0419 18:21:21.099298 139955906574144 pyconfig.py:432] Config param attention_sink: False
I0419 18:21:21.099315 139955906574144 pyconfig.py:432] Config param attention_type: global
I0419 18:21:21.099330 139955906574144 pyconfig.py:432] Config param attn_logits_soft_cap: None
I0419 18:21:21.099346 139955906574144 pyconfig.py:432] Config param audio_path: 
I0419 18:21:21.099362 139955906574144 pyconfig.py:432] Config param audio_placeholder: <|audio|>
I0419 18:21:21.099378 139955906574144 pyconfig.py:432] Config param autoregressive_decode_assert: 
I0419 18:21:21.099427 139955906574144 pyconfig.py:432] Config param base_config: base.yml
I0419 18:21:21.099443 139955906574144 pyconfig.py:432] Config param base_emb_dim: 16
I0419 18:21:21.099460 139955906574144 pyconfig.py:432] Config param base_mlp_dim: 64
I0419 18:21:21.099476 139955906574144 pyconfig.py:432] Config param base_moe_mlp_dim: -1
I0419 18:21:21.099493 139955906574144 pyconfig.py:432] Config param base_num_decoder_layers: 1
I0419 18:21:21.099509 139955906574144 pyconfig.py:432] Config param base_num_kv_heads: 2
I0419 18:21:21.099525 139955906574144 pyconfig.py:432] Config param base_num_query_heads: 2
I0419 18:21:21.099541 139955906574144 pyconfig.py:432] Config param base_output_directory: 
I0419 18:21:21.099558 139955906574144 pyconfig.py:432] Config param batch_size: 1
I0419 18:21:21.099573 139955906574144 pyconfig.py:432] Config param batch_split_factor: 1
I0419 18:21:21.099589 139955906574144 pyconfig.py:432] Config param beta_fast: 32
I0419 18:21:21.099605 139955906574144 pyconfig.py:432] Config param beta_slow: 1
I0419 18:21:21.099621 139955906574144 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax
I0419 18:21:21.099638 139955906574144 pyconfig.py:432] Config param capacity_factor: -1.0
I0419 18:21:21.099653 139955906574144 pyconfig.py:432] Config param cast_logits_to_fp32: True
I0419 18:21:21.099669 139955906574144 pyconfig.py:432] Config param chat_template: 
I0419 18:21:21.099685 139955906574144 pyconfig.py:432] Config param chat_template_path: 
I0419 18:21:21.099702 139955906574144 pyconfig.py:432] Config param checkpoint_conversion_fn: None
I0419 18:21:21.099720 139955906574144 pyconfig.py:432] Config param checkpoint_dir: None
I0419 18:21:21.099737 139955906574144 pyconfig.py:432] Config param checkpoint_is_quantized: False
I0419 18:21:21.099755 139955906574144 pyconfig.py:432] Config param checkpoint_period: 2000
I0419 18:21:21.099771 139955906574144 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96
I0419 18:21:21.099786 139955906574144 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0419 18:21:21.099803 139955906574144 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True
I0419 18:21:21.099821 139955906574144 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True
I0419 18:21:21.099839 139955906574144 pyconfig.py:432] Config param checkpoint_todelete_full_path: None
I0419 18:21:21.099855 139955906574144 pyconfig.py:432] Config param checkpoint_todelete_subdir: None
I0419 18:21:21.099870 139955906574144 pyconfig.py:432] Config param chips_per_vm: 4
I0419 18:21:21.099886 139955906574144 pyconfig.py:432] Config param chunk_attn_window_size: 0
I0419 18:21:21.099901 139955906574144 pyconfig.py:432] Config param collect_stack_trace: False
I0419 18:21:21.099917 139955906574144 pyconfig.py:432] Config param colocated_python_checkpointing: False
I0419 18:21:21.099931 139955906574144 pyconfig.py:432] Config param colocated_python_data_input: False
I0419 18:21:21.099947 139955906574144 pyconfig.py:432] Config param compile_topology: 
I0419 18:21:21.099961 139955906574144 pyconfig.py:432] Config param compile_topology_num_slices: -1
I0419 18:21:21.099977 139955906574144 pyconfig.py:432] Config param compile_xla_flags: 
I0419 18:21:21.099991 139955906574144 pyconfig.py:432] Config param compiled_trainstep_file: 
I0419 18:21:21.100006 139955906574144 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3
I0419 18:21:21.100022 139955906574144 pyconfig.py:432] Config param constant_bound_config: []
I0419 18:21:21.100036 139955906574144 pyconfig.py:432] Config param context: RematLocation.REMAT
I0419 18:21:21.100053 139955906574144 pyconfig.py:432] Config param context_parallel_load_balance: True
I0419 18:21:21.100067 139955906574144 pyconfig.py:432] Config param context_parallel_size: 1
I0419 18:21:21.100082 139955906574144 pyconfig.py:432] Config param context_parallel_strategy: all_gather
I0419 18:21:21.100098 139955906574144 pyconfig.py:432] Config param context_sharding: context
I0419 18:21:21.100112 139955906574144 pyconfig.py:432] Config param conv_chunksize_for_audio: 500
I0419 18:21:21.100128 139955906574144 pyconfig.py:432] Config param conv_stride_for_vit: 14
I0419 18:21:21.100143 139955906574144 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1
I0419 18:21:21.100158 139955906574144 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1
I0419 18:21:21.100173 139955906574144 pyconfig.py:432] Config param custom_mesh: 
I0419 18:21:21.100189 139955906574144 pyconfig.py:432] Config param custom_mesh_and_rule: 
I0419 18:21:21.100203 139955906574144 pyconfig.py:432] Config param d_model_for_audio: 256
I0419 18:21:21.100219 139955906574144 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0419 18:21:21.100243 139955906574144 pyconfig.py:432] Config param data_shuffle_seed: 0
I0419 18:21:21.100260 139955906574144 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1
I0419 18:21:21.100277 139955906574144 pyconfig.py:432] Config param dataset_path: 
I0419 18:21:21.100295 139955906574144 pyconfig.py:432] Config param dataset_type: DatasetType.HF
I0419 18:21:21.100313 139955906574144 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1
I0419 18:21:21.100330 139955906574144 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1
I0419 18:21:21.100347 139955906574144 pyconfig.py:432] Config param dcn_context_parallelism: 1
I0419 18:21:21.100363 139955906574144 pyconfig.py:432] Config param dcn_data_parallelism: -1
I0419 18:21:21.100377 139955906574144 pyconfig.py:432] Config param dcn_diloco_parallelism: 1
I0419 18:21:21.100393 139955906574144 pyconfig.py:432] Config param dcn_expert_parallelism: 1
I0419 18:21:21.100417 139955906574144 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1
I0419 18:21:21.100433 139955906574144 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1
I0419 18:21:21.100447 139955906574144 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0419 18:21:21.100464 139955906574144 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1
I0419 18:21:21.100480 139955906574144 pyconfig.py:432] Config param dcn_sequence_parallelism: 1
I0419 18:21:21.100494 139955906574144 pyconfig.py:432] Config param dcn_tensor_parallelism: 1
I0419 18:21:21.100510 139955906574144 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1
I0419 18:21:21.100526 139955906574144 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1
I0419 18:21:21.100543 139955906574144 pyconfig.py:432] Config param debug: {'rl': False}
I0419 18:21:21.100561 139955906574144 pyconfig.py:432] Config param debug_sharding: False
I0419 18:21:21.100577 139955906574144 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1
I0419 18:21:21.100593 139955906574144 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0419 18:21:21.100611 139955906574144 pyconfig.py:432] Config param decode_sampling_temperature: 1.0
I0419 18:21:21.100626 139955906574144 pyconfig.py:432] Config param decode_sampling_top_k: 0
I0419 18:21:21.100642 139955906574144 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3
I0419 18:21:21.100658 139955906574144 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE
I0419 18:21:21.100675 139955906574144 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: []
I0419 18:21:21.100691 139955906574144 pyconfig.py:432] Config param degenerate_group_masking: True
I0419 18:21:21.100706 139955906574144 pyconfig.py:432] Config param dense_init_scale: 1.0
I0419 18:21:21.100721 139955906574144 pyconfig.py:432] Config param diloco_outer_lr: 0.3
I0419 18:21:21.100737 139955906574144 pyconfig.py:432] Config param diloco_outer_momentum: 0.9
I0419 18:21:21.100753 139955906574144 pyconfig.py:432] Config param diloco_sync_period: 36
I0419 18:21:21.100769 139955906574144 pyconfig.py:432] Config param distill_alpha: 0.5
I0419 18:21:21.100784 139955906574144 pyconfig.py:432] Config param distill_alpha_end: None
I0419 18:21:21.100799 139955906574144 pyconfig.py:432] Config param distill_alpha_schedule: constant
I0419 18:21:21.100814 139955906574144 pyconfig.py:432] Config param distill_beta: 0.0
I0419 18:21:21.100830 139955906574144 pyconfig.py:432] Config param distill_beta_end: None
I0419 18:21:21.100846 139955906574144 pyconfig.py:432] Config param distill_beta_schedule: constant
I0419 18:21:21.100862 139955906574144 pyconfig.py:432] Config param distill_feature_loss_type: cosine
I0419 18:21:21.100876 139955906574144 pyconfig.py:432] Config param distill_layer_indices: None
I0419 18:21:21.100892 139955906574144 pyconfig.py:432] Config param distill_temperature: 1.0
I0419 18:21:21.100908 139955906574144 pyconfig.py:432] Config param distill_temperature_end: None
I0419 18:21:21.100922 139955906574144 pyconfig.py:432] Config param distill_temperature_schedule: constant
I0419 18:21:21.100938 139955906574144 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256
I0419 18:21:21.100952 139955906574144 pyconfig.py:432] Config param dpo_beta: 0.1
I0419 18:21:21.100968 139955906574144 pyconfig.py:432] Config param dpo_label_smoothing: 0.0
I0419 18:21:21.100983 139955906574144 pyconfig.py:432] Config param dq_reduction_steps: 0
I0419 18:21:21.100998 139955906574144 pyconfig.py:432] Config param dropout_rate: 0.0
I0419 18:21:21.101014 139955906574144 pyconfig.py:432] Config param dtype: bfloat16
I0419 18:21:21.101045 139955906574144 pyconfig.py:432] Config param dtype_mm: float32
I0419 18:21:21.101061 139955906574144 pyconfig.py:432] Config param dump_hlo: False
I0419 18:21:21.101075 139955906574144 pyconfig.py:432] Config param dump_hlo_delete_local_after: True
I0419 18:21:21.101091 139955906574144 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-19-18-21/xla_dump
I0419 18:21:21.101106 139955906574144 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0419 18:21:21.101123 139955906574144 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step
I0419 18:21:21.101138 139955906574144 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step
I0419 18:21:21.101153 139955906574144 pyconfig.py:432] Config param dump_hlo_upload_all: False
I0419 18:21:21.101168 139955906574144 pyconfig.py:432] Config param dump_hlo_xla_flags: 
I0419 18:21:21.101183 139955906574144 pyconfig.py:432] Config param dump_jaxpr: False
I0419 18:21:21.101199 139955906574144 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True
I0419 18:21:21.101215 139955906574144 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-19-18-21/jaxpr_dump
I0419 18:21:21.101234 139955906574144 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0419 18:21:21.101249 139955906574144 pyconfig.py:432] Config param dump_step: -1
I0419 18:21:21.101264 139955906574144 pyconfig.py:432] Config param elastic_enabled: False
I0419 18:21:21.101279 139955906574144 pyconfig.py:432] Config param elastic_max_retries: 10
I0419 18:21:21.101294 139955906574144 pyconfig.py:432] Config param elastic_timeout_seconds: 300
I0419 18:21:21.101309 139955906574144 pyconfig.py:432] Config param emb_dim: 16
I0419 18:21:21.101325 139955906574144 pyconfig.py:432] Config param enable_autocheckpoint: False
I0419 18:21:21.101340 139955906574144 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False
I0419 18:21:21.101355 139955906574144 pyconfig.py:432] Config param enable_checkpointing: True
I0419 18:21:21.101370 139955906574144 pyconfig.py:432] Config param enable_continuous_checkpointing: False
I0419 18:21:21.101385 139955906574144 pyconfig.py:432] Config param enable_data_shuffling: True
I0419 18:21:21.101401 139955906574144 pyconfig.py:432] Config param enable_diloco: False
I0419 18:21:21.101425 139955906574144 pyconfig.py:432] Config param enable_dp_attention: False
I0419 18:21:21.101441 139955906574144 pyconfig.py:432] Config param enable_dropout: False
I0419 18:21:21.101457 139955906574144 pyconfig.py:432] Config param enable_emergency_checkpoint: False
I0419 18:21:21.101471 139955906574144 pyconfig.py:432] Config param enable_expert_parallel: False
I0419 18:21:21.101487 139955906574144 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True
I0419 18:21:21.101501 139955906574144 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True
I0419 18:21:21.101517 139955906574144 pyconfig.py:432] Config param enable_goodput_recording: False
I0419 18:21:21.101531 139955906574144 pyconfig.py:432] Config param enable_jax_profiler: False
I0419 18:21:21.101547 139955906574144 pyconfig.py:432] Config param enable_llm_inference_pool: False
I0419 18:21:21.101561 139955906574144 pyconfig.py:432] Config param enable_model_warmup: False
I0419 18:21:21.101577 139955906574144 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False
I0419 18:21:21.101591 139955906574144 pyconfig.py:432] Config param enable_nnx: False
I0419 18:21:21.101607 139955906574144 pyconfig.py:432] Config param enable_orbax_v1: False
I0419 18:21:21.101622 139955906574144 pyconfig.py:432] Config param enable_padding_causal_mask: True
I0419 18:21:21.101637 139955906574144 pyconfig.py:432] Config param enable_pathways_goodput: False
I0419 18:21:21.101652 139955906574144 pyconfig.py:432] Config param enable_prefix_caching: False
I0419 18:21:21.101667 139955906574144 pyconfig.py:432] Config param enable_rampup_batch_size: False
I0419 18:21:21.101682 139955906574144 pyconfig.py:432] Config param enable_single_controller: False
I0419 18:21:21.101697 139955906574144 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False
I0419 18:21:21.101711 139955906574144 pyconfig.py:432] Config param enable_tensorboard: True
I0419 18:21:21.101727 139955906574144 pyconfig.py:432] Config param enable_tunix_perf_metrics: False
I0419 18:21:21.101742 139955906574144 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4
I0419 18:21:21.101757 139955906574144 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512
I0419 18:21:21.101773 139955906574144 pyconfig.py:432] Config param encoder_layers_for_audio: 2
I0419 18:21:21.101788 139955906574144 pyconfig.py:432] Config param engram: RematLocation.REMAT
I0419 18:21:21.101805 139955906574144 pyconfig.py:432] Config param engram_head_dim: 1280
I0419 18:21:21.101820 139955906574144 pyconfig.py:432] Config param engram_kernel_size: 4
I0419 18:21:21.101836 139955906574144 pyconfig.py:432] Config param engram_layers: []
I0419 18:21:21.101852 139955906574144 pyconfig.py:432] Config param engram_max_ngram_size: 3
I0419 18:21:21.101866 139955906574144 pyconfig.py:432] Config param engram_num_heads: 8
I0419 18:21:21.101882 139955906574144 pyconfig.py:432] Config param engram_seed: 0
I0419 18:21:21.101897 139955906574144 pyconfig.py:432] Config param engram_vocab_bases: []
I0419 18:21:21.101912 139955906574144 pyconfig.py:432] Config param epsilon_high: None
I0419 18:21:21.101928 139955906574144 pyconfig.py:432] Config param eval_corr_lst: False
I0419 18:21:21.101944 139955906574144 pyconfig.py:432] Config param eval_data_columns: ['text']
I0419 18:21:21.101959 139955906574144 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1
I0419 18:21:21.101975 139955906574144 pyconfig.py:432] Config param eval_image_column: image
I0419 18:21:21.101991 139955906574144 pyconfig.py:432] Config param eval_interval: -1
I0419 18:21:21.102005 139955906574144 pyconfig.py:432] Config param eval_make_lst: False
I0419 18:21:21.102021 139955906574144 pyconfig.py:432] Config param eval_per_device_batch_size: 2
I0419 18:21:21.102036 139955906574144 pyconfig.py:432] Config param eval_sampling_strategy: greedy
I0419 18:21:21.102051 139955906574144 pyconfig.py:432] Config param eval_split: validation
I0419 18:21:21.102066 139955906574144 pyconfig.py:432] Config param eval_steps: -1
I0419 18:21:21.102082 139955906574144 pyconfig.py:432] Config param expansion_factor_real_data: -1.0
I0419 18:21:21.102098 139955906574144 pyconfig.py:432] Config param final_logits_soft_cap: None
I0419 18:21:21.102113 139955906574144 pyconfig.py:432] Config param first_num_dense_layers: 0
I0419 18:21:21.102129 139955906574144 pyconfig.py:432] Config param float32_gate_logits: False
I0419 18:21:21.102143 139955906574144 pyconfig.py:432] Config param float32_logits: False
I0419 18:21:21.102159 139955906574144 pyconfig.py:432] Config param float32_qk_product: False
I0419 18:21:21.102173 139955906574144 pyconfig.py:432] Config param float32_weight_sum: True
I0419 18:21:21.102189 139955906574144 pyconfig.py:432] Config param force_q_layout: False
I0419 18:21:21.102204 139955906574144 pyconfig.py:432] Config param force_unroll: False
I0419 18:21:21.102219 139955906574144 pyconfig.py:432] Config param freeze_audio_encoder_params: True
I0419 18:21:21.102237 139955906574144 pyconfig.py:432] Config param freeze_vision_encoder_params: True
I0419 18:21:21.102252 139955906574144 pyconfig.py:432] Config param fused_mlp: False
I0419 18:21:21.102267 139955906574144 pyconfig.py:432] Config param fused_qkv: True
I0419 18:21:21.102283 139955906574144 pyconfig.py:432] Config param gcs_metrics: False
I0419 18:21:21.102298 139955906574144 pyconfig.py:432] Config param gdn_chunk_size: 64
I0419 18:21:21.102314 139955906574144 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4
I0419 18:21:21.102328 139955906574144 pyconfig.py:432] Config param gdn_key_head_dim: 128
I0419 18:21:21.102344 139955906574144 pyconfig.py:432] Config param gdn_num_key_heads: 16
I0419 18:21:21.102359 139955906574144 pyconfig.py:432] Config param gdn_num_value_heads: 32
I0419 18:21:21.102374 139955906574144 pyconfig.py:432] Config param gdn_value_head_dim: 128
I0419 18:21:21.102390 139955906574144 pyconfig.py:432] Config param generate_padding_batch_eval: False
I0419 18:21:21.102405 139955906574144 pyconfig.py:432] Config param generate_padding_batch_train: False
I0419 18:21:21.102429 139955906574144 pyconfig.py:432] Config param generate_slice: v5e-16
I0419 18:21:21.102445 139955906574144 pyconfig.py:432] Config param generation_configs: {}
I0419 18:21:21.102461 139955906574144 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64
I0419 18:21:21.102476 139955906574144 pyconfig.py:432] Config param global_batch_size_to_load: 512
I0419 18:21:21.102491 139955906574144 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64
I0419 18:21:21.102507 139955906574144 pyconfig.py:432] Config param global_batch_size_to_load_increment: None
I0419 18:21:21.102521 139955906574144 pyconfig.py:432] Config param global_batch_size_to_load_start: None
I0419 18:21:21.102537 139955906574144 pyconfig.py:432] Config param global_batch_size_to_train_on: 512
I0419 18:21:21.102552 139955906574144 pyconfig.py:432] Config param global_head_dim: 0
I0419 18:21:21.102567 139955906574144 pyconfig.py:432] Config param global_num_kv_heads: 0
I0419 18:21:21.102582 139955906574144 pyconfig.py:432] Config param global_parameter_scale: 1
I0419 18:21:21.102597 139955906574144 pyconfig.py:432] Config param global_rampup_samples: 500
I0419 18:21:21.102612 139955906574144 pyconfig.py:432] Config param global_rope_max_timescale: -1
I0419 18:21:21.102627 139955906574144 pyconfig.py:432] Config param global_rope_proportion: 0.25
I0419 18:21:21.102643 139955906574144 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30
I0419 18:21:21.102659 139955906574144 pyconfig.py:432] Config param grad_dtype: float32
I0419 18:21:21.102694 139955906574144 pyconfig.py:432] Config param gradient_accumulation_steps: 8
I0419 18:21:21.102710 139955906574144 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0
I0419 18:21:21.102726 139955906574144 pyconfig.py:432] Config param grain_data_source_max_workers: 16
I0419 18:21:21.102741 139955906574144 pyconfig.py:432] Config param grain_eval_files: 
I0419 18:21:21.102757 139955906574144 pyconfig.py:432] Config param grain_file_type: arrayrecord
I0419 18:21:21.102772 139955906574144 pyconfig.py:432] Config param grain_num_threads: 16
I0419 18:21:21.102788 139955906574144 pyconfig.py:432] Config param grain_num_threads_eval: 16
I0419 18:21:21.102805 139955906574144 pyconfig.py:432] Config param grain_packing_type: first_fit
I0419 18:21:21.102822 139955906574144 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1
I0419 18:21:21.102838 139955906574144 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1
I0419 18:21:21.102853 139955906574144 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500
I0419 18:21:21.102869 139955906574144 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500
I0419 18:21:21.102885 139955906574144 pyconfig.py:432] Config param grain_ram_budget_mb: 1024
I0419 18:21:21.102900 139955906574144 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100
I0419 18:21:21.102915 139955906574144 pyconfig.py:432] Config param grain_train_files: 
I0419 18:21:21.102931 139955906574144 pyconfig.py:432] Config param grain_train_mixture_config_path: 
I0419 18:21:21.102946 139955906574144 pyconfig.py:432] Config param grain_worker_count: 1
I0419 18:21:21.102962 139955906574144 pyconfig.py:432] Config param grain_worker_count_eval: 1
I0419 18:21:21.102977 139955906574144 pyconfig.py:432] Config param grpo_beta: 0.08
I0419 18:21:21.102993 139955906574144 pyconfig.py:432] Config param grpo_epsilon: 0.2
I0419 18:21:21.103009 139955906574144 pyconfig.py:432] Config param hardware: tpu
I0419 18:21:21.103024 139955906574144 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72
I0419 18:21:21.103040 139955906574144 pyconfig.py:432] Config param head_dim: 8
I0419 18:21:21.103056 139955906574144 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5
I0419 18:21:21.103071 139955906574144 pyconfig.py:432] Config param hf_data_dir: None
I0419 18:21:21.103087 139955906574144 pyconfig.py:432] Config param hf_eval_files: None
I0419 18:21:21.103102 139955906574144 pyconfig.py:432] Config param hf_eval_split: None
I0419 18:21:21.103117 139955906574144 pyconfig.py:432] Config param hf_name: None
I0419 18:21:21.103133 139955906574144 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix
I0419 18:21:21.103148 139955906574144 pyconfig.py:432] Config param hf_train_files: None
I0419 18:21:21.103163 139955906574144 pyconfig.py:432] Config param hidden_size_for_vit: 1408
I0419 18:21:21.103178 139955906574144 pyconfig.py:432] Config param hide_profiler_step_metric: False
I0419 18:21:21.103193 139955906574144 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1
I0419 18:21:21.103208 139955906574144 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1
I0419 18:21:21.103223 139955906574144 pyconfig.py:432] Config param ici_context_parallelism: 1
I0419 18:21:21.103242 139955906574144 pyconfig.py:432] Config param ici_data_parallelism: 1
I0419 18:21:21.103257 139955906574144 pyconfig.py:432] Config param ici_diloco_parallelism: 1
I0419 18:21:21.103271 139955906574144 pyconfig.py:432] Config param ici_expert_parallelism: 1
I0419 18:21:21.103287 139955906574144 pyconfig.py:432] Config param ici_fsdp_parallelism: -1
I0419 18:21:21.103302 139955906574144 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1
I0419 18:21:21.103317 139955906574144 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0419 18:21:21.103334 139955906574144 pyconfig.py:432] Config param ici_pipeline_parallelism: 1
I0419 18:21:21.103349 139955906574144 pyconfig.py:432] Config param ici_sequence_parallelism: 1
I0419 18:21:21.103365 139955906574144 pyconfig.py:432] Config param ici_tensor_parallelism: 1
I0419 18:21:21.103379 139955906574144 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1
I0419 18:21:21.103395 139955906574144 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1
I0419 18:21:21.103421 139955906574144 pyconfig.py:432] Config param image_path: 
I0419 18:21:21.103437 139955906574144 pyconfig.py:432] Config param image_placeholder: <|image|>
I0419 18:21:21.103451 139955906574144 pyconfig.py:432] Config param image_size_for_vit: 896
I0419 18:21:21.103467 139955906574144 pyconfig.py:432] Config param indexer_head_dim: 128
I0419 18:21:21.103483 139955906574144 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0
I0419 18:21:21.103497 139955906574144 pyconfig.py:432] Config param indexer_n_heads: 64
I0419 18:21:21.103513 139955906574144 pyconfig.py:432] Config param indexer_sparse_training: False
I0419 18:21:21.103528 139955906574144 pyconfig.py:432] Config param indexer_topk: 2048
I0419 18:21:21.103543 139955906574144 pyconfig.py:432] Config param inference_benchmark_test: False
I0419 18:21:21.103559 139955906574144 pyconfig.py:432] Config param inference_metadata_file: 
I0419 18:21:21.103574 139955906574144 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: 
I0419 18:21:21.103589 139955906574144 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10
I0419 18:21:21.103604 139955906574144 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0419 18:21:21.103620 139955906574144 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0419 18:21:21.103635 139955906574144 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate
I0419 18:21:21.103651 139955906574144 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer
I0419 18:21:21.103665 139955906574144 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1
I0419 18:21:21.103681 139955906574144 pyconfig.py:432] Config param init_weights_seed: 0
I0419 18:21:21.103696 139955906574144 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0419 18:21:21.103712 139955906574144 pyconfig.py:432] Config param interleave_moe_layer_step: 1
I0419 18:21:21.103728 139955906574144 pyconfig.py:432] Config param intermediate_size_for_vit: 5632
I0419 18:21:21.103744 139955906574144 pyconfig.py:432] Config param internal_compile: False
I0419 18:21:21.103760 139955906574144 pyconfig.py:432] Config param internal_compile_num_devices: -1
I0419 18:21:21.103774 139955906574144 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache
I0419 18:21:21.103790 139955906574144 pyconfig.py:432] Config param jax_debug_log_modules: 
I0419 18:21:21.103804 139955906574144 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300
I0419 18:21:21.103820 139955906574144 pyconfig.py:432] Config param jax_profiler_port: 9999
I0419 18:21:21.103835 139955906574144 pyconfig.py:432] Config param key_proj: RematLocation.REMAT
I0419 18:21:21.103851 139955906574144 pyconfig.py:432] Config param kv_cache_buffer: 256
I0419 18:21:21.103867 139955906574144 pyconfig.py:432] Config param kv_lora_rank: 512
I0419 18:21:21.103882 139955906574144 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0419 18:21:21.103899 139955906574144 pyconfig.py:432] Config param kv_quant_dtype: int8
I0419 18:21:21.103914 139955906574144 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT
I0419 18:21:21.103929 139955906574144 pyconfig.py:432] Config param learning_rate: 0.0002
I0419 18:21:21.103946 139955906574144 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1
I0419 18:21:21.103962 139955906574144 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000
I0419 18:21:21.103978 139955906574144 pyconfig.py:432] Config param load_balance_loss_weight: 0.0
I0419 18:21:21.103993 139955906574144 pyconfig.py:432] Config param load_checkpoint_only_once: False
I0419 18:21:21.104008 139955906574144 pyconfig.py:432] Config param load_from_prefill_dir: False
I0419 18:21:21.104023 139955906574144 pyconfig.py:432] Config param load_full_state_path: 
I0419 18:21:21.104039 139955906574144 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0419 18:21:21.104055 139955906574144 pyconfig.py:432] Config param local_checkpoint_directory: 
I0419 18:21:21.104070 139955906574144 pyconfig.py:432] Config param local_checkpoint_period: 0
I0419 18:21:21.104086 139955906574144 pyconfig.py:432] Config param local_rope_max_timescale: -1
I0419 18:21:21.104102 139955906574144 pyconfig.py:432] Config param local_rope_proportion: 1.0
I0419 18:21:21.104118 139955906574144 pyconfig.py:432] Config param log_config: True
I0419 18:21:21.104134 139955906574144 pyconfig.py:432] Config param log_period: 10
I0419 18:21:21.104149 139955906574144 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0419 18:21:21.104222 139955906574144 pyconfig.py:432] Config param logits_dot_in_fp32: False
I0419 18:21:21.104242 139955906574144 pyconfig.py:432] Config param logits_via_embedding: True
I0419 18:21:21.104258 139955906574144 pyconfig.py:432] Config param lora_input_adapters_path: 
I0419 18:21:21.104274 139955906574144 pyconfig.py:432] Config param loss_algo: grpo
I0419 18:21:21.104290 139955906574144 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0419 18:21:21.104308 139955906574144 pyconfig.py:432] Config param managed_mldiagnostics: False
I0419 18:21:21.104323 139955906574144 pyconfig.py:432] Config param managed_mldiagnostics_dir: None
I0419 18:21:21.104339 139955906574144 pyconfig.py:432] Config param managed_mldiagnostics_run_group: 
I0419 18:21:21.104354 139955906574144 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT
I0419 18:21:21.104372 139955906574144 pyconfig.py:432] Config param max_checkify: False
I0419 18:21:21.104387 139955906574144 pyconfig.py:432] Config param max_concurrency: 256
I0419 18:21:21.104403 139955906574144 pyconfig.py:432] Config param max_corpus_chars: 10000000
I0419 18:21:21.104431 139955906574144 pyconfig.py:432] Config param max_num_batched_tokens: None
I0419 18:21:21.104446 139955906574144 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None
I0419 18:21:21.104462 139955906574144 pyconfig.py:432] Config param max_num_images_per_example: -1
I0419 18:21:21.104476 139955906574144 pyconfig.py:432] Config param max_num_seqs: None
I0419 18:21:21.104492 139955906574144 pyconfig.py:432] Config param max_position_embeddings: 163840
I0419 18:21:21.104506 139955906574144 pyconfig.py:432] Config param max_prefill_predict_length: 64
I0419 18:21:21.104522 139955906574144 pyconfig.py:432] Config param max_sample_len_for_audio: 10000
I0419 18:21:21.104538 139955906574144 pyconfig.py:432] Config param max_segments_per_seq: -1
I0419 18:21:21.104554 139955906574144 pyconfig.py:432] Config param max_source_positions_for_audio: 1500
I0419 18:21:21.104569 139955906574144 pyconfig.py:432] Config param max_target_length: 2048
I0419 18:21:21.104585 139955906574144 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0
I0419 18:21:21.104599 139955906574144 pyconfig.py:432] Config param megablox: True
I0419 18:21:21.104615 139955906574144 pyconfig.py:432] Config param merge_gating_gmm: False
I0419 18:21:21.104630 139955906574144 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0419 18:21:21.104647 139955906574144 pyconfig.py:432] Config param metrics_dir: None
I0419 18:21:21.104662 139955906574144 pyconfig.py:432] Config param metrics_file: 
I0419 18:21:21.104678 139955906574144 pyconfig.py:432] Config param mhc_expansion_rate: 1
I0419 18:21:21.104692 139955906574144 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64
I0419 18:21:21.104708 139955906574144 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64
I0419 18:21:21.104723 139955906574144 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT
I0419 18:21:21.104740 139955906574144 pyconfig.py:432] Config param mla_naive_kvcache: True
I0419 18:21:21.104755 139955906574144 pyconfig.py:432] Config param mla_q: RematLocation.REMAT
I0419 18:21:21.104771 139955906574144 pyconfig.py:432] Config param mlp_activations: ['gelu']
I0419 18:21:21.104788 139955906574144 pyconfig.py:432] Config param mlp_activations_limit: -1.0
I0419 18:21:21.104804 139955906574144 pyconfig.py:432] Config param mlp_bias: False
I0419 18:21:21.104819 139955906574144 pyconfig.py:432] Config param mlp_dim: 64
I0419 18:21:21.104834 139955906574144 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT
I0419 18:21:21.104850 139955906574144 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT
I0419 18:21:21.104865 139955906574144 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT
I0419 18:21:21.104880 139955906574144 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT
I0419 18:21:21.104897 139955906574144 pyconfig.py:432] Config param moba: False
I0419 18:21:21.104912 139955906574144 pyconfig.py:432] Config param moba_chunk_size: 1024
I0419 18:21:21.104926 139955906574144 pyconfig.py:432] Config param moba_topk: 8
I0419 18:21:21.104942 139955906574144 pyconfig.py:432] Config param model_call_mode: 
I0419 18:21:21.104957 139955906574144 pyconfig.py:432] Config param model_name: gpt3-52k
I0419 18:21:21.104973 139955906574144 pyconfig.py:432] Config param moe_expert_input_dim: -1
I0419 18:21:21.104987 139955906574144 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False
I0419 18:21:21.105003 139955906574144 pyconfig.py:432] Config param moe_mlp_dim: -1
I0419 18:21:21.105019 139955906574144 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT
I0419 18:21:21.105034 139955906574144 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT
I0419 18:21:21.105050 139955906574144 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT
I0419 18:21:21.105066 139955906574144 pyconfig.py:432] Config param monitor_goodput: False
I0419 18:21:21.105081 139955906574144 pyconfig.py:432] Config param monitor_step_time_deviation: True
I0419 18:21:21.105097 139955906574144 pyconfig.py:432] Config param mrope_section: [24, 20, 20]
I0419 18:21:21.105112 139955906574144 pyconfig.py:432] Config param mscale: 1.0
I0419 18:21:21.105128 139955906574144 pyconfig.py:432] Config param mtc_data_parallelism: 0
I0419 18:21:21.105142 139955906574144 pyconfig.py:432] Config param mtp_eval_target_module: 0
I0419 18:21:21.105158 139955906574144 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1
I0419 18:21:21.105174 139955906574144 pyconfig.py:432] Config param mtp_num_layers: 0
I0419 18:21:21.105190 139955906574144 pyconfig.py:432] Config param mu_dtype: float32
I0419 18:21:21.105213 139955906574144 pyconfig.py:432] Config param multi_sampling: False
I0419 18:21:21.105230 139955906574144 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0419 18:21:21.105251 139955906574144 pyconfig.py:432] Config param muon_beta: 0.95
I0419 18:21:21.105267 139955906574144 pyconfig.py:432] Config param muon_consistent_rms: None
I0419 18:21:21.105283 139955906574144 pyconfig.py:432] Config param muon_weight_decay: 0.0
I0419 18:21:21.105297 139955906574144 pyconfig.py:432] Config param n_routing_groups: -1
I0419 18:21:21.105313 139955906574144 pyconfig.py:432] Config param n_window_for_audio: 50
I0419 18:21:21.105327 139955906574144 pyconfig.py:432] Config param n_window_infer_for_audio: 800
I0419 18:21:21.105343 139955906574144 pyconfig.py:432] Config param nope_layer_interval: -1
I0419 18:21:21.105358 139955906574144 pyconfig.py:432] Config param norm_topk_prob: False
I0419 18:21:21.105373 139955906574144 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05
I0419 18:21:21.105391 139955906574144 pyconfig.py:432] Config param normalize_embedding_logits: False
I0419 18:21:21.105416 139955906574144 pyconfig.py:432] Config param num_attention_heads_for_vit: 16
I0419 18:21:21.105432 139955906574144 pyconfig.py:432] Config param num_batches: 4
I0419 18:21:21.105448 139955906574144 pyconfig.py:432] Config param num_channels_for_vit: 3
I0419 18:21:21.105463 139955906574144 pyconfig.py:432] Config param num_conv_layers_for_audio: 3
I0419 18:21:21.105478 139955906574144 pyconfig.py:432] Config param num_decoder_layers: 1
I0419 18:21:21.105494 139955906574144 pyconfig.py:432] Config param num_diloco_replicas: 1
I0419 18:21:21.105510 139955906574144 pyconfig.py:432] Config param num_epoch: 1
I0419 18:21:21.105526 139955906574144 pyconfig.py:432] Config param num_eval_passes: 1
I0419 18:21:21.105540 139955906574144 pyconfig.py:432] Config param num_experts: 1
I0419 18:21:21.105556 139955906574144 pyconfig.py:432] Config param num_experts_per_tok: 1
I0419 18:21:21.105570 139955906574144 pyconfig.py:432] Config param num_generations: 2
I0419 18:21:21.105586 139955906574144 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34
I0419 18:21:21.105602 139955906574144 pyconfig.py:432] Config param num_iterations: 1
I0419 18:21:21.105616 139955906574144 pyconfig.py:432] Config param num_kv_heads: 2
I0419 18:21:21.105632 139955906574144 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1
I0419 18:21:21.105648 139955906574144 pyconfig.py:432] Config param num_mel_bins_for_audio: 128
I0419 18:21:21.105662 139955906574144 pyconfig.py:432] Config param num_pipeline_microbatches: -1
I0419 18:21:21.105678 139955906574144 pyconfig.py:432] Config param num_pipeline_repeats: -1
I0419 18:21:21.105694 139955906574144 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024
I0419 18:21:21.105709 139955906574144 pyconfig.py:432] Config param num_query_heads: 2
I0419 18:21:21.105724 139955906574144 pyconfig.py:432] Config param num_samplers_slices: -1
I0419 18:21:21.105740 139955906574144 pyconfig.py:432] Config param num_slices: 1
I0419 18:21:21.105756 139955906574144 pyconfig.py:432] Config param num_target_devices: 32
I0419 18:21:21.105771 139955906574144 pyconfig.py:432] Config param num_test_batches: 5
I0419 18:21:21.105786 139955906574144 pyconfig.py:432] Config param num_trainer_slices: -1
I0419 18:21:21.105803 139955906574144 pyconfig.py:432] Config param num_vocab_tiling: 1
I0419 18:21:21.105819 139955906574144 pyconfig.py:432] Config param off_policy_steps: 0
I0419 18:21:21.105834 139955906574144 pyconfig.py:432] Config param offline_data_dir: None
I0419 18:21:21.105850 139955906574144 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX
I0419 18:21:21.105868 139955906574144 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False
I0419 18:21:21.105884 139955906574144 pyconfig.py:432] Config param optimizer_memory_host_offload: False
I0419 18:21:21.105900 139955906574144 pyconfig.py:432] Config param original_max_position_embeddings: 4096
I0419 18:21:21.105914 139955906574144 pyconfig.py:432] Config param out_hidden_size_for_vit: 512
I0419 18:21:21.105930 139955906574144 pyconfig.py:432] Config param out_proj: RematLocation.REMAT
I0419 18:21:21.105945 139955906574144 pyconfig.py:432] Config param output_dim_for_audio: 512
I0419 18:21:21.105961 139955906574144 pyconfig.py:432] Config param override_logical_axis_rules: False
I0419 18:21:21.105976 139955906574144 pyconfig.py:432] Config param override_model_config: True
I0419 18:21:21.105992 139955906574144 pyconfig.py:432] Config param packing: True
I0419 18:21:21.106006 139955906574144 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128
I0419 18:21:21.106022 139955906574144 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1
I0419 18:21:21.106038 139955906574144 pyconfig.py:432] Config param pagedattn_num_pages: 64
I0419 18:21:21.106052 139955906574144 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4
I0419 18:21:21.106068 139955906574144 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32
I0419 18:21:21.106083 139955906574144 pyconfig.py:432] Config param param_scan_axis: 1
I0419 18:21:21.106099 139955906574144 pyconfig.py:432] Config param parameter_memory_host_offload: False
I0419 18:21:21.106115 139955906574144 pyconfig.py:432] Config param partial_rotary_factor: 1.0
I0419 18:21:21.106130 139955906574144 pyconfig.py:432] Config param patch_size_for_vit: 14
I0419 18:21:21.106144 139955906574144 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0
I0419 18:21:21.106160 139955906574144 pyconfig.py:432] Config param penalty_incorrect_format: -0.5
I0419 18:21:21.106176 139955906574144 pyconfig.py:432] Config param per_device_batch_size: 2
I0419 18:21:21.106191 139955906574144 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0
I0419 18:21:21.106206 139955906574144 pyconfig.py:432] Config param per_device_batch_size_start: 4.0
I0419 18:21:21.106222 139955906574144 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False
I0419 18:21:21.106241 139955906574144 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False
I0419 18:21:21.106256 139955906574144 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False
I0419 18:21:21.106271 139955906574144 pyconfig.py:432] Config param pipeline_parallel_layers: 1
I0419 18:21:21.106286 139955906574144 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5
I0419 18:21:21.106302 139955906574144 pyconfig.py:432] Config param posemb_type_for_vit: learn
I0419 18:21:21.106317 139955906574144 pyconfig.py:432] Config param position_id_per_seconds: 25
I0419 18:21:21.106333 139955906574144 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3
I0419 18:21:21.106349 139955906574144 pyconfig.py:432] Config param prefill_cache_dir: 
I0419 18:21:21.106364 139955906574144 pyconfig.py:432] Config param prefill_chunk_size: 256
I0419 18:21:21.106379 139955906574144 pyconfig.py:432] Config param prefill_slice: v5e-16
I0419 18:21:21.106395 139955906574144 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000
I0419 18:21:21.106420 139955906574144 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000
I0419 18:21:21.106435 139955906574144 pyconfig.py:432] Config param profile_cleanly: True
I0419 18:21:21.106451 139955906574144 pyconfig.py:432] Config param profile_periodically_period: -1
I0419 18:21:21.106467 139955906574144 pyconfig.py:432] Config param profile_power_events: False
I0419 18:21:21.106482 139955906574144 pyconfig.py:432] Config param profiler: ProfilerType.NONE
I0419 18:21:21.106499 139955906574144 pyconfig.py:432] Config param profiler_steps: 5
I0419 18:21:21.106514 139955906574144 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0
I0419 18:21:21.106529 139955906574144 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096
I0419 18:21:21.106544 139955906574144 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096
I0419 18:21:21.106560 139955906574144 pyconfig.py:432] Config param prometheus_port: 0
I0419 18:21:21.106575 139955906574144 pyconfig.py:432] Config param prompt: I love to
I0419 18:21:21.106590 139955906574144 pyconfig.py:432] Config param pure_nnx: False
I0419 18:21:21.106605 139955906574144 pyconfig.py:432] Config param pure_nnx_decoder: False
I0419 18:21:21.106620 139955906574144 pyconfig.py:432] Config param q_lora_rank: 0
I0419 18:21:21.106636 139955906574144 pyconfig.py:432] Config param qk_clip_threshold: 100.0
I0419 18:21:21.106650 139955906574144 pyconfig.py:432] Config param qk_nope_head_dim: 128
I0419 18:21:21.106665 139955906574144 pyconfig.py:432] Config param qk_norm_with_scale: True
I0419 18:21:21.106681 139955906574144 pyconfig.py:432] Config param qk_rope_head_dim: 64
I0419 18:21:21.106695 139955906574144 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT
I0419 18:21:21.106711 139955906574144 pyconfig.py:432] Config param quant_cfg_path: 
I0419 18:21:21.106727 139955906574144 pyconfig.py:432] Config param quantization: QuantizationType.NONE
I0419 18:21:21.106745 139955906574144 pyconfig.py:432] Config param quantization_local_shard_count: 4
I0419 18:21:21.106759 139955906574144 pyconfig.py:432] Config param quantize_kvcache: False
I0419 18:21:21.106775 139955906574144 pyconfig.py:432] Config param query_proj: RematLocation.REMAT
I0419 18:21:21.106791 139955906574144 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT
I0419 18:21:21.106806 139955906574144 pyconfig.py:432] Config param ragged_block_size: 256
I0419 18:21:21.106821 139955906574144 pyconfig.py:432] Config param ragged_buffer_factor: -1.0
I0419 18:21:21.106838 139955906574144 pyconfig.py:432] Config param rampup_end_step: 0
I0419 18:21:21.106853 139955906574144 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None
I0419 18:21:21.106869 139955906574144 pyconfig.py:432] Config param reasoning_end_token: </reasoning>
I0419 18:21:21.106884 139955906574144 pyconfig.py:432] Config param reasoning_start_token: <reasoning>
I0419 18:21:21.106901 139955906574144 pyconfig.py:432] Config param record_internal_nn_metrics: 0
I0419 18:21:21.106918 139955906574144 pyconfig.py:432] Config param remat_policy: full
I0419 18:21:21.106934 139955906574144 pyconfig.py:432] Config param remat_policy_for_vit: minimal
I0419 18:21:21.106951 139955906574144 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True
I0419 18:21:21.106967 139955906574144 pyconfig.py:432] Config param replicate_quant_scale: False
I0419 18:21:21.106983 139955906574144 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0
I0419 18:21:21.106999 139955906574144 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0419 18:21:21.107013 139955906574144 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False
I0419 18:21:21.107029 139955906574144 pyconfig.py:432] Config param reshape_q: False
I0419 18:21:21.107043 139955906574144 pyconfig.py:432] Config param return_log_prob: False
I0419 18:21:21.107059 139955906574144 pyconfig.py:432] Config param reuse_example_batch: 0
I0419 18:21:21.107075 139955906574144 pyconfig.py:432] Config param reward_exact_answer: 5.0
I0419 18:21:21.107090 139955906574144 pyconfig.py:432] Config param reward_exact_format_match: 3.0
I0419 18:21:21.107106 139955906574144 pyconfig.py:432] Config param reward_partial_format_match: 0.5
I0419 18:21:21.107121 139955906574144 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5
I0419 18:21:21.107138 139955906574144 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25
I0419 18:21:21.107153 139955906574144 pyconfig.py:432] Config param reward_white_space_format_match: 1.5
I0419 18:21:21.107169 139955906574144 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0419 18:21:21.107190 139955906574144 pyconfig.py:432] Config param rollout_data_parallelism: -1
I0419 18:21:21.107206 139955906574144 pyconfig.py:432] Config param rollout_expert_parallelism: 1
I0419 18:21:21.107222 139955906574144 pyconfig.py:432] Config param rollout_micro_batch_size: -1
I0419 18:21:21.107241 139955906574144 pyconfig.py:432] Config param rollout_tensor_parallelism: -1
I0419 18:21:21.107256 139955906574144 pyconfig.py:432] Config param rope_attention_scaling: False
I0419 18:21:21.107272 139955906574144 pyconfig.py:432] Config param rope_factor: 40
I0419 18:21:21.107286 139955906574144 pyconfig.py:432] Config param rope_interleave: True
I0419 18:21:21.107302 139955906574144 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0
I0419 18:21:21.107318 139955906574144 pyconfig.py:432] Config param rope_max_timescale: 10000
I0419 18:21:21.107334 139955906574144 pyconfig.py:432] Config param rope_min_timescale: 1
I0419 18:21:21.107350 139955906574144 pyconfig.py:432] Config param rope_theta_for_vit: 10000
I0419 18:21:21.107365 139955906574144 pyconfig.py:432] Config param rope_truncate: True
I0419 18:21:21.107380 139955906574144 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT
I0419 18:21:21.107398 139955906574144 pyconfig.py:432] Config param rope_use_scale: True
I0419 18:21:21.107423 139955906574144 pyconfig.py:432] Config param routed_bias: False
I0419 18:21:21.107438 139955906574144 pyconfig.py:432] Config param routed_bias_update_rate: 0.0
I0419 18:21:21.107453 139955906574144 pyconfig.py:432] Config param routed_scaling_factor: 1.0
I0419 18:21:21.107469 139955906574144 pyconfig.py:432] Config param routed_score_func: 
I0419 18:21:21.107485 139955906574144 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-19-18-21
I0419 18:21:21.107501 139955906574144 pyconfig.py:432] Config param sa_block_kv: 512
I0419 18:21:21.107515 139955906574144 pyconfig.py:432] Config param sa_block_kv_compute: 512
I0419 18:21:21.107531 139955906574144 pyconfig.py:432] Config param sa_block_kv_dkv: 512
I0419 18:21:21.107547 139955906574144 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512
I0419 18:21:21.107563 139955906574144 pyconfig.py:432] Config param sa_block_kv_dq: 512
I0419 18:21:21.107578 139955906574144 pyconfig.py:432] Config param sa_block_q: 512
I0419 18:21:21.107593 139955906574144 pyconfig.py:432] Config param sa_block_q_dkv: 512
I0419 18:21:21.107609 139955906574144 pyconfig.py:432] Config param sa_block_q_dq: 512
I0419 18:21:21.107624 139955906574144 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR
I0419 18:21:21.107639 139955906574144 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR
I0419 18:21:21.107654 139955906574144 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False
I0419 18:21:21.107670 139955906574144 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR
I0419 18:21:21.107686 139955906574144 pyconfig.py:432] Config param sampler_devices_fraction: 0.5
I0419 18:21:21.107702 139955906574144 pyconfig.py:432] Config param save_checkpoint_on_completion: True
I0419 18:21:21.107717 139955906574144 pyconfig.py:432] Config param save_config_to_gcs: False
I0419 18:21:21.107734 139955906574144 pyconfig.py:432] Config param save_quantized_params_path: 
I0419 18:21:21.107749 139955906574144 pyconfig.py:432] Config param scale_embedding_for_audio: True
I0419 18:21:21.107764 139955906574144 pyconfig.py:432] Config param scan_layers: True
I0419 18:21:21.107781 139955906574144 pyconfig.py:432] Config param scan_layers_per_stage: False
I0419 18:21:21.107796 139955906574144 pyconfig.py:432] Config param scan_pipeline_iterations: True
I0419 18:21:21.107811 139955906574144 pyconfig.py:432] Config param scan_pipeline_repeats: False
I0419 18:21:21.107826 139955906574144 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False
I0419 18:21:21.107841 139955906574144 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True
I0419 18:21:21.107856 139955906574144 pyconfig.py:432] Config param sft_train_on_completion_only: False
I0419 18:21:21.107872 139955906574144 pyconfig.py:432] Config param shard_exp_on_fsdp: False
I0419 18:21:21.107888 139955906574144 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO
I0419 18:21:21.107906 139955906574144 pyconfig.py:432] Config param shard_optimizer_over_data: False
I0419 18:21:21.107920 139955906574144 pyconfig.py:432] Config param sharding_strategy: None
I0419 18:21:21.107936 139955906574144 pyconfig.py:432] Config param sharding_tolerance: 0.02
I0419 18:21:21.107952 139955906574144 pyconfig.py:432] Config param shardy: True
I0419 18:21:21.107967 139955906574144 pyconfig.py:432] Config param share_kv_projections: False
I0419 18:21:21.107982 139955906574144 pyconfig.py:432] Config param shared_experts: 0
I0419 18:21:21.107997 139955906574144 pyconfig.py:432] Config param sinkhorn_iterations: 20
I0419 18:21:21.108013 139955906574144 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1
I0419 18:21:21.108027 139955906574144 pyconfig.py:432] Config param skip_jax_distributed_system: False
I0419 18:21:21.108043 139955906574144 pyconfig.py:432] Config param skip_step_interval: 128
I0419 18:21:21.108058 139955906574144 pyconfig.py:432] Config param skip_step_on_spikes: False
I0419 18:21:21.108073 139955906574144 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0
I0419 18:21:21.108088 139955906574144 pyconfig.py:432] Config param sliding_window_size: 0
I0419 18:21:21.108104 139955906574144 pyconfig.py:432] Config param solution_end_token: </answer>
I0419 18:21:21.108118 139955906574144 pyconfig.py:432] Config param solution_start_token: <answer>
I0419 18:21:21.108134 139955906574144 pyconfig.py:432] Config param source_checkpoint_layout: orbax
I0419 18:21:21.108149 139955906574144 pyconfig.py:432] Config param sparse_matmul: True
I0419 18:21:21.108164 139955906574144 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2
I0419 18:21:21.108180 139955906574144 pyconfig.py:432] Config param stack_prefill_result_cache: False
I0419 18:21:21.108195 139955906574144 pyconfig.py:432] Config param stack_trace_interval_seconds: 600
I0419 18:21:21.108211 139955906574144 pyconfig.py:432] Config param stack_trace_to_cloud: False
I0419 18:21:21.108226 139955906574144 pyconfig.py:432] Config param step_deviation_interval_seconds: 30
I0419 18:21:21.108245 139955906574144 pyconfig.py:432] Config param steps: 200000
I0419 18:21:21.108261 139955906574144 pyconfig.py:432] Config param stop_strings: None
I0419 18:21:21.108278 139955906574144 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0419 18:21:21.108294 139955906574144 pyconfig.py:432] Config param student_params_to_update: None
I0419 18:21:21.108309 139955906574144 pyconfig.py:432] Config param subslice_shape: 
I0419 18:21:21.108324 139955906574144 pyconfig.py:432] Config param swap_space_vllm_gb: 2
I0419 18:21:21.108340 139955906574144 pyconfig.py:432] Config param system_prompt: 
I0419 18:21:21.108355 139955906574144 pyconfig.py:432] Config param target_eval_loss: 0.0
I0419 18:21:21.108371 139955906574144 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0419 18:21:21.108386 139955906574144 pyconfig.py:432] Config param temperature_tuning: False
I0419 18:21:21.108402 139955906574144 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2
I0419 18:21:21.108426 139955906574144 pyconfig.py:432] Config param tensorboard_dir: None
I0419 18:21:21.108442 139955906574144 pyconfig.py:432] Config param tensors_on_device: None
I0419 18:21:21.108458 139955906574144 pyconfig.py:432] Config param tensors_to_offload: None
I0419 18:21:21.108472 139955906574144 pyconfig.py:432] Config param test_batch_start_index: 0
I0419 18:21:21.108488 139955906574144 pyconfig.py:432] Config param tile_size_for_vit: 336
I0419 18:21:21.108503 139955906574144 pyconfig.py:432] Config param tokenize_eval_data: True
I0419 18:21:21.108518 139955906574144 pyconfig.py:432] Config param tokenize_train_data: True
I0419 18:21:21.108534 139955906574144 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0419 18:21:21.108549 139955906574144 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0419 18:21:21.108566 139955906574144 pyconfig.py:432] Config param topk_routing_group: -1
I0419 18:21:21.108582 139955906574144 pyconfig.py:432] Config param train_data_columns: ['text']
I0419 18:21:21.108598 139955906574144 pyconfig.py:432] Config param train_fraction: 1.0
I0419 18:21:21.108613 139955906574144 pyconfig.py:432] Config param train_image_column: image
I0419 18:21:21.108629 139955906574144 pyconfig.py:432] Config param train_micro_batch_size: -1
I0419 18:21:21.108645 139955906574144 pyconfig.py:432] Config param train_split: train
I0419 18:21:21.108659 139955906574144 pyconfig.py:432] Config param trainable_parameters_mask: []
I0419 18:21:21.108676 139955906574144 pyconfig.py:432] Config param trainable_position_size: 2048
I0419 18:21:21.108692 139955906574144 pyconfig.py:432] Config param trainer_devices_fraction: 0.5
I0419 18:21:21.108708 139955906574144 pyconfig.py:432] Config param upload_all_profiler_results: False
I0419 18:21:21.108724 139955906574144 pyconfig.py:432] Config param use_2d_fsdp_sharding: False
I0419 18:21:21.108739 139955906574144 pyconfig.py:432] Config param use_agentic_rollout: False
I0419 18:21:21.108754 139955906574144 pyconfig.py:432] Config param use_audio: False
I0419 18:21:21.108771 139955906574144 pyconfig.py:432] Config param use_audio_in_video: False
I0419 18:21:21.108787 139955906574144 pyconfig.py:432] Config param use_batch_split_schedule: False
I0419 18:21:21.108803 139955906574144 pyconfig.py:432] Config param use_chat_template: False
I0419 18:21:21.108818 139955906574144 pyconfig.py:432] Config param use_chunked_prefill: False
I0419 18:21:21.108834 139955906574144 pyconfig.py:432] Config param use_custom_sort_vjp: True
I0419 18:21:21.108848 139955906574144 pyconfig.py:432] Config param use_dpo: False
I0419 18:21:21.108864 139955906574144 pyconfig.py:432] Config param use_gather_mosaic_kernel: False
I0419 18:21:21.108879 139955906574144 pyconfig.py:432] Config param use_grpo: True
I0419 18:21:21.108895 139955906574144 pyconfig.py:432] Config param use_indexer: False
I0419 18:21:21.108909 139955906574144 pyconfig.py:432] Config param use_iota_embed: True
I0419 18:21:21.108925 139955906574144 pyconfig.py:432] Config param use_jax_splash: False
I0419 18:21:21.108941 139955906574144 pyconfig.py:432] Config param use_max_logit_estimate: -1
I0419 18:21:21.108955 139955906574144 pyconfig.py:432] Config param use_mrope: False
I0419 18:21:21.108971 139955906574144 pyconfig.py:432] Config param use_multimodal: False
I0419 18:21:21.108986 139955906574144 pyconfig.py:432] Config param use_pathways: True
I0419 18:21:21.109001 139955906574144 pyconfig.py:432] Config param use_post_attn_norm: False
I0419 18:21:21.109017 139955906574144 pyconfig.py:432] Config param use_post_ffw_norm: False
I0419 18:21:21.109031 139955906574144 pyconfig.py:432] Config param use_qk_clip: False
I0419 18:21:21.109047 139955906574144 pyconfig.py:432] Config param use_qk_norm: False
I0419 18:21:21.109061 139955906574144 pyconfig.py:432] Config param use_qk_norm_in_gdn: True
I0419 18:21:21.109077 139955906574144 pyconfig.py:432] Config param use_qwix_quantization: False
I0419 18:21:21.109091 139955906574144 pyconfig.py:432] Config param use_ragged_attention: False
I0419 18:21:21.109107 139955906574144 pyconfig.py:432] Config param use_random_routing: False
I0419 18:21:21.109122 139955906574144 pyconfig.py:432] Config param use_replicator_service: False
I0419 18:21:21.109138 139955906574144 pyconfig.py:432] Config param use_ring_of_experts: False
I0419 18:21:21.109153 139955906574144 pyconfig.py:432] Config param use_sft: False
I0419 18:21:21.109167 139955906574144 pyconfig.py:432] Config param use_splash_scheduler: False
I0419 18:21:21.109183 139955906574144 pyconfig.py:432] Config param use_tokamax_gmm: False
I0419 18:21:21.109199 139955906574144 pyconfig.py:432] Config param use_tokamax_splash: False
I0419 18:21:21.109214 139955906574144 pyconfig.py:432] Config param use_truncation: True
I0419 18:21:21.109229 139955906574144 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False
I0419 18:21:21.109250 139955906574144 pyconfig.py:432] Config param use_untrainable_positional_embedding: False
I0419 18:21:21.109264 139955906574144 pyconfig.py:432] Config param use_vertex_tensorboard: False
I0419 18:21:21.109280 139955906574144 pyconfig.py:432] Config param using_pipeline_parallelism: False
I0419 18:21:21.109295 139955906574144 pyconfig.py:432] Config param v_head_dim: 128
I0419 18:21:21.109309 139955906574144 pyconfig.py:432] Config param v_norm_with_scale: True
I0419 18:21:21.109326 139955906574144 pyconfig.py:432] Config param value_proj: RematLocation.REMAT
I0419 18:21:21.109341 139955906574144 pyconfig.py:432] Config param vertex_tensorboard_project: 
I0419 18:21:21.109357 139955906574144 pyconfig.py:432] Config param vertex_tensorboard_region: 
I0419 18:21:21.109372 139955906574144 pyconfig.py:432] Config param video_path: 
I0419 18:21:21.109387 139955906574144 pyconfig.py:432] Config param video_placeholder: <|video|>
I0419 18:21:21.109403 139955906574144 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096
I0419 18:21:21.109430 139955906574144 pyconfig.py:432] Config param vision_output_length: -1
I0419 18:21:21.109444 139955906574144 pyconfig.py:432] Config param vllm_additional_config: {}
I0419 18:21:21.109461 139955906574144 pyconfig.py:432] Config param vllm_hf_config_path: 
I0419 18:21:21.109475 139955906574144 pyconfig.py:432] Config param vllm_hf_overrides: {}
I0419 18:21:21.109490 139955906574144 pyconfig.py:432] Config param vocab_size: 32000
I0419 18:21:21.109506 139955906574144 pyconfig.py:432] Config param warmup_steps_fraction: 0.1
I0419 18:21:21.109521 139955906574144 pyconfig.py:432] Config param weight_dtype: float32
I0419 18:21:21.109544 139955906574144 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax
I0419 18:21:21.109560 139955906574144 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512
I0419 18:21:21.109575 139955906574144 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024
I0419 18:21:21.109590 139955906574144 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024
I0419 18:21:21.109606 139955906574144 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512
I0419 18:21:21.109621 139955906574144 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024
I0419 18:21:21.109636 139955906574144 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024
I0419 18:21:21.109651 139955906574144 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512
I0419 18:21:21.109666 139955906574144 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024
I0419 18:21:21.109681 139955906574144 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024
I0419 18:21:21.109697 139955906574144 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512
I0419 18:21:21.109712 139955906574144 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024
I0419 18:21:21.109728 139955906574144 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024
I0419 18:21:21.109743 139955906574144 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512
I0419 18:21:21.109758 139955906574144 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024
I0419 18:21:21.109774 139955906574144 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024
I0419 18:21:21.109789 139955906574144 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512
I0419 18:21:21.109804 139955906574144 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024
I0419 18:21:21.109820 139955906574144 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024
I0419 18:21:21.109834 139955906574144 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1
I0419 18:21:21.109850 139955906574144 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0419 18:21:21.109868 139955906574144 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False
I0419 18:21:21.109882 139955906574144 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False
I0419 18:21:21.109898 139955906574144 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False
I0419 18:21:21.109912 139955906574144 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0
I0419 18:21:21.109930 139955906574144 pyconfig.py:432] Config param z_loss_multiplier: 0.0
I0419 18:21:21.110270 139955906574144 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0419 18:21:21.110305 139955906574144 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0419 18:21:25.104990 139955906574144 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 592, in train_distill
    devices_array = maxtext_utils.create_device_mesh(student_config, devices)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/maxtext_utils.py", line 1510, in create_device_mesh
    ici_parallelism = max_utils.fill_unspecified_mesh_axes(config.ici_parallelism.copy(), num_devices_per_slice, "ICI")
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/utils/max_utils.py", line 450, in fill_unspecified_mesh_axes
    assert np.prod(parallelism_vals) == target_product, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Number of devices per slice 32 does not match the product of the ICI parallelism 8
XPK End: Sun Apr 19 18:21:32 UTC 2026
EXIT_CODE=1