XPK Start: Tue Apr 21 06:28:57 UTC 2026
2026-04-21 06:29:14.837188: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0421 06:29:18.977558 134206828996416 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-21 06:29:28,018:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0421 06:29:28.018907 134206828996416 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-21 06:29:28,024:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-amvxf-slice-job-0-0.mt-07-distill-smoke-amvxf:8482
I0421 06:29:28.024448 134206828996416 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-amvxf-slice-job-0-0.mt-07-distill-smoke-amvxf:8482
I0421 06:29:29.331102 134206828996416 max_utils.py:284] Jax distributed system initialized!
I0421 06:29:35.635210 134206828996416 max_utils.py:244] Jax distributed system is already initialized.
W0421 06:29:36.075250 134206828996416 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0421 06:29:36.136137 134206828996416 max_utils.py:244] Jax distributed system is already initialized.
I0421 06:29:36.137334 134206828996416 pyconfig.py:471] Config param abort_on_inf_loss: True I0421 06:29:36.137381 134206828996416 pyconfig.py:471] Config param abort_on_nan_loss: True I0421 06:29:36.137407 134206828996416 pyconfig.py:471] Config param act_quantization_calibration_method: absmax I0421 06:29:36.137426 134206828996416 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0 I0421 06:29:36.137446 134206828996416 pyconfig.py:471] Config param activation_function_for_audio: gelu I0421 06:29:36.137464 134206828996416 pyconfig.py:471] Config param activations_in_float32: False I0421 06:29:36.137482 134206828996416 pyconfig.py:471] Config param adam_b1: 0.9 I0421 06:29:36.137501 134206828996416 pyconfig.py:471] Config param adam_b2: 0.95 I0421 06:29:36.137519 134206828996416 pyconfig.py:471] Config param adam_eps: 1e-08 I0421 06:29:36.137542 134206828996416 pyconfig.py:471] Config param adam_eps_root: 0.0 I0421 06:29:36.137558 134206828996416 pyconfig.py:471] Config param adam_weight_decay: 0.1 I0421 06:29:36.137574 134206828996416 pyconfig.py:471] Config param adamw_mask: [] I0421 06:29:36.137591 134206828996416 pyconfig.py:471] Config param add_bos: True I0421 06:29:36.137610 134206828996416 pyconfig.py:471] Config param add_eos: True I0421 06:29:36.137626 134206828996416 pyconfig.py:471] Config param allow_split_physical_axes: False I0421 06:29:36.137654 134206828996416 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3 I0421 06:29:36.137671 134206828996416 pyconfig.py:471] Config param async_checkpointing: True I0421 06:29:36.137686 134206828996416 pyconfig.py:471] Config param async_scheduling: False I0421 06:29:36.137703 134206828996416 pyconfig.py:471] Config param attention: dot_product I0421 06:29:36.137722 134206828996416 pyconfig.py:471] Config param attention_bias: False I0421 06:29:36.137741 134206828996416 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0 I0421 06:29:36.137758 134206828996416 pyconfig.py:471] Config 
param attention_out: RematLocation.REMAT I0421 06:29:36.137779 134206828996416 pyconfig.py:471] Config param attention_output_dim: -1 I0421 06:29:36.137796 134206828996416 pyconfig.py:471] Config param attention_sink: False I0421 06:29:36.137813 134206828996416 pyconfig.py:471] Config param attention_type: global I0421 06:29:36.137831 134206828996416 pyconfig.py:471] Config param attn_logits_soft_cap: None I0421 06:29:36.137848 134206828996416 pyconfig.py:471] Config param audio_path: I0421 06:29:36.137864 134206828996416 pyconfig.py:471] Config param audio_placeholder: <|audio|> I0421 06:29:36.137882 134206828996416 pyconfig.py:471] Config param autoregressive_decode_assert: I0421 06:29:36.137897 134206828996416 pyconfig.py:471] Config param base_config: base.yml I0421 06:29:36.137914 134206828996416 pyconfig.py:471] Config param base_emb_dim: 16 I0421 06:29:36.137929 134206828996416 pyconfig.py:471] Config param base_mlp_dim: 64 I0421 06:29:36.137946 134206828996416 pyconfig.py:471] Config param base_moe_mlp_dim: -1 I0421 06:29:36.137963 134206828996416 pyconfig.py:471] Config param base_num_decoder_layers: 1 I0421 06:29:36.137978 134206828996416 pyconfig.py:471] Config param base_num_kv_heads: 2 I0421 06:29:36.137994 134206828996416 pyconfig.py:471] Config param base_num_query_heads: 2 I0421 06:29:36.138008 134206828996416 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output I0421 06:29:36.138025 134206828996416 pyconfig.py:471] Config param batch_size: 1 I0421 06:29:36.138040 134206828996416 pyconfig.py:471] Config param batch_split_factor: 1 I0421 06:29:36.138056 134206828996416 pyconfig.py:471] Config param beta_fast: 32 I0421 06:29:36.138071 134206828996416 pyconfig.py:471] Config param beta_slow: 1 I0421 06:29:36.138087 134206828996416 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax I0421 06:29:36.138103 134206828996416 pyconfig.py:471] Config param capacity_factor: -1.0 I0421 06:29:36.138118 134206828996416 
pyconfig.py:471] Config param cast_logits_to_fp32: True I0421 06:29:36.138134 134206828996416 pyconfig.py:471] Config param chat_template: I0421 06:29:36.138149 134206828996416 pyconfig.py:471] Config param chat_template_path: I0421 06:29:36.138166 134206828996416 pyconfig.py:471] Config param checkpoint_conversion_fn: None I0421 06:29:36.138184 134206828996416 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/checkpoints/ I0421 06:29:36.138199 134206828996416 pyconfig.py:471] Config param checkpoint_is_quantized: False I0421 06:29:36.138215 134206828996416 pyconfig.py:471] Config param checkpoint_period: 2000 I0421 06:29:36.138231 134206828996416 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96 I0421 06:29:36.138252 134206828996416 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0421 06:29:36.138270 134206828996416 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True I0421 06:29:36.138290 134206828996416 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True I0421 06:29:36.138305 134206828996416 pyconfig.py:471] Config param checkpoint_todelete_full_path: None I0421 06:29:36.138321 134206828996416 pyconfig.py:471] Config param checkpoint_todelete_subdir: None I0421 06:29:36.138335 134206828996416 pyconfig.py:471] Config param chips_per_vm: 4 I0421 06:29:36.138352 134206828996416 pyconfig.py:471] Config param chunk_attn_window_size: 0 I0421 06:29:36.138366 134206828996416 pyconfig.py:471] Config param collect_stack_trace: False I0421 06:29:36.138382 134206828996416 pyconfig.py:471] Config param colocated_python_checkpointing: False I0421 06:29:36.138396 134206828996416 pyconfig.py:471] Config param colocated_python_data_input: False I0421 06:29:36.138412 134206828996416 pyconfig.py:471] Config param compile_topology: I0421 06:29:36.138427 134206828996416 pyconfig.py:471] Config param compile_topology_num_slices: -1 I0421 06:29:36.138442 
134206828996416 pyconfig.py:471] Config param compile_xla_flags: I0421 06:29:36.138459 134206828996416 pyconfig.py:471] Config param compiled_trainstep_file: I0421 06:29:36.138473 134206828996416 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3 I0421 06:29:36.138489 134206828996416 pyconfig.py:471] Config param constant_bound_config: [] I0421 06:29:36.138504 134206828996416 pyconfig.py:471] Config param context: RematLocation.REMAT I0421 06:29:36.138520 134206828996416 pyconfig.py:471] Config param context_parallel_load_balance: True I0421 06:29:36.138534 134206828996416 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO I0421 06:29:36.138552 134206828996416 pyconfig.py:471] Config param context_parallel_size: 1 I0421 06:29:36.138566 134206828996416 pyconfig.py:471] Config param context_parallel_strategy: all_gather I0421 06:29:36.138582 134206828996416 pyconfig.py:471] Config param context_sharding: context I0421 06:29:36.138596 134206828996416 pyconfig.py:471] Config param conv_chunksize_for_audio: 500 I0421 06:29:36.138611 134206828996416 pyconfig.py:471] Config param conv_stride_for_vit: 14 I0421 06:29:36.138625 134206828996416 pyconfig.py:471] Config param convert_checkpoint_if_possible: False I0421 06:29:36.138668 134206828996416 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1 I0421 06:29:36.138686 134206828996416 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1 I0421 06:29:36.138700 134206828996416 pyconfig.py:471] Config param custom_mesh: I0421 06:29:36.138716 134206828996416 pyconfig.py:471] Config param custom_mesh_and_rule: I0421 06:29:36.138732 134206828996416 pyconfig.py:471] Config param d_model_for_audio: 256 I0421 06:29:36.138747 134206828996416 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0421 06:29:36.138767 
134206828996416 pyconfig.py:471] Config param data_shuffle_seed: 0 I0421 06:29:36.138785 134206828996416 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1 I0421 06:29:36.138802 134206828996416 pyconfig.py:471] Config param dataset_path: I0421 06:29:36.138818 134206828996416 pyconfig.py:471] Config param dataset_type: DatasetType.HF I0421 06:29:36.138836 134206828996416 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1 I0421 06:29:36.138853 134206828996416 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1 I0421 06:29:36.138869 134206828996416 pyconfig.py:471] Config param dcn_context_parallelism: 1 I0421 06:29:36.138884 134206828996416 pyconfig.py:471] Config param dcn_data_parallelism: -1 I0421 06:29:36.138900 134206828996416 pyconfig.py:471] Config param dcn_diloco_parallelism: 1 I0421 06:29:36.138915 134206828996416 pyconfig.py:471] Config param dcn_expert_parallelism: 1 I0421 06:29:36.138931 134206828996416 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1 I0421 06:29:36.138947 134206828996416 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1 I0421 06:29:36.138962 134206828996416 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0421 06:29:36.138979 134206828996416 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1 I0421 06:29:36.138995 134206828996416 pyconfig.py:471] Config param dcn_sequence_parallelism: 1 I0421 06:29:36.139009 134206828996416 pyconfig.py:471] Config param dcn_tensor_parallelism: 1 I0421 06:29:36.139025 134206828996416 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1 I0421 06:29:36.139039 134206828996416 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1 I0421 06:29:36.139055 134206828996416 pyconfig.py:471] Config param debug: {'rl': False} I0421 06:29:36.139072 134206828996416 pyconfig.py:471] Config param debug_sharding: False I0421 06:29:36.139087 134206828996416 pyconfig.py:471] Config param 
decode_sampling_nucleus_p: -1 I0421 06:29:36.139102 134206828996416 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0421 06:29:36.139120 134206828996416 pyconfig.py:471] Config param decode_sampling_temperature: 1.0 I0421 06:29:36.139137 134206828996416 pyconfig.py:471] Config param decode_sampling_top_k: 0 I0421 06:29:36.139152 134206828996416 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3 I0421 06:29:36.139169 134206828996416 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE I0421 06:29:36.139186 134206828996416 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: [] I0421 06:29:36.139201 134206828996416 pyconfig.py:471] Config param degenerate_group_masking: True I0421 06:29:36.139216 134206828996416 pyconfig.py:471] Config param dense_init_scale: 1.0 I0421 06:29:36.139231 134206828996416 pyconfig.py:471] Config param diloco_outer_lr: 0.3 I0421 06:29:36.139247 134206828996416 pyconfig.py:471] Config param diloco_outer_momentum: 0.9 I0421 06:29:36.139264 134206828996416 pyconfig.py:471] Config param diloco_sync_period: 36 I0421 06:29:36.139283 134206828996416 pyconfig.py:471] Config param distill_alpha: 0.5 I0421 06:29:36.139299 134206828996416 pyconfig.py:471] Config param distill_alpha_end: None I0421 06:29:36.139315 134206828996416 pyconfig.py:471] Config param distill_alpha_schedule: constant I0421 06:29:36.139331 134206828996416 pyconfig.py:471] Config param distill_beta: 0.0 I0421 06:29:36.139346 134206828996416 pyconfig.py:471] Config param distill_beta_end: None I0421 06:29:36.139361 134206828996416 pyconfig.py:471] Config param distill_beta_schedule: constant I0421 06:29:36.139376 134206828996416 pyconfig.py:471] Config param distill_feature_loss_type: cosine I0421 06:29:36.139391 134206828996416 pyconfig.py:471] Config param distill_layer_indices: None I0421 06:29:36.139405 134206828996416 pyconfig.py:471] Config param distill_temperature: 1.0 I0421 06:29:36.139421 
134206828996416 pyconfig.py:471] Config param distill_temperature_end: None I0421 06:29:36.139436 134206828996416 pyconfig.py:471] Config param distill_temperature_schedule: constant I0421 06:29:36.139451 134206828996416 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256 I0421 06:29:36.139467 134206828996416 pyconfig.py:471] Config param dpo_beta: 0.1 I0421 06:29:36.139483 134206828996416 pyconfig.py:471] Config param dpo_label_smoothing: 0.0 I0421 06:29:36.139500 134206828996416 pyconfig.py:471] Config param dq_reduction_steps: 0 I0421 06:29:36.139514 134206828996416 pyconfig.py:471] Config param dropout_rate: 0.0 I0421 06:29:36.139530 134206828996416 pyconfig.py:471] Config param dtype: bfloat16 I0421 06:29:36.139559 134206828996416 pyconfig.py:471] Config param dtype_mm: float32 I0421 06:29:36.139574 134206828996416 pyconfig.py:471] Config param dump_hlo: False I0421 06:29:36.139589 134206828996416 pyconfig.py:471] Config param dump_hlo_delete_local_after: True I0421 06:29:36.139604 134206828996416 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/xla_dump I0421 06:29:36.139620 134206828996416 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0421 06:29:36.139634 134206828996416 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step I0421 06:29:36.139660 134206828996416 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step I0421 06:29:36.139675 134206828996416 pyconfig.py:471] Config param dump_hlo_upload_all: False I0421 06:29:36.139691 134206828996416 pyconfig.py:471] Config param dump_hlo_xla_flags: I0421 06:29:36.139705 134206828996416 pyconfig.py:471] Config param dump_jaxpr: False I0421 06:29:36.139721 134206828996416 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True I0421 06:29:36.139735 134206828996416 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/jaxpr_dump I0421 06:29:36.139751 
134206828996416 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0421 06:29:36.139765 134206828996416 pyconfig.py:471] Config param dump_step: -1 I0421 06:29:36.139780 134206828996416 pyconfig.py:471] Config param elastic_enabled: False I0421 06:29:36.139795 134206828996416 pyconfig.py:471] Config param elastic_max_retries: 10 I0421 06:29:36.139811 134206828996416 pyconfig.py:471] Config param elastic_timeout_seconds: 300 I0421 06:29:36.139827 134206828996416 pyconfig.py:471] Config param emb_dim: 16 I0421 06:29:36.139843 134206828996416 pyconfig.py:471] Config param enable_autocheckpoint: False I0421 06:29:36.139859 134206828996416 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False I0421 06:29:36.139874 134206828996416 pyconfig.py:471] Config param enable_checkpointing: True I0421 06:29:36.139889 134206828996416 pyconfig.py:471] Config param enable_continuous_checkpointing: False I0421 06:29:36.139904 134206828996416 pyconfig.py:471] Config param enable_data_shuffling: True I0421 06:29:36.139918 134206828996416 pyconfig.py:471] Config param enable_diloco: False I0421 06:29:36.139934 134206828996416 pyconfig.py:471] Config param enable_dp_attention: False I0421 06:29:36.139948 134206828996416 pyconfig.py:471] Config param enable_dropout: False I0421 06:29:36.139963 134206828996416 pyconfig.py:471] Config param enable_emergency_checkpoint: False I0421 06:29:36.139979 134206828996416 pyconfig.py:471] Config param enable_expert_parallel: False I0421 06:29:36.139993 134206828996416 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True I0421 06:29:36.140008 134206828996416 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True I0421 06:29:36.140023 134206828996416 pyconfig.py:471] Config param enable_goodput_recording: False I0421 06:29:36.140037 134206828996416 pyconfig.py:471] Config param enable_jax_profiler: False I0421 06:29:36.140053 134206828996416 pyconfig.py:471] Config param 
enable_llm_inference_pool: False I0421 06:29:36.140068 134206828996416 pyconfig.py:471] Config param enable_model_warmup: False I0421 06:29:36.140082 134206828996416 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False I0421 06:29:36.140098 134206828996416 pyconfig.py:471] Config param enable_nnx: False I0421 06:29:36.140115 134206828996416 pyconfig.py:471] Config param enable_orbax_v1: False I0421 06:29:36.140129 134206828996416 pyconfig.py:471] Config param enable_padding_causal_mask: True I0421 06:29:36.140145 134206828996416 pyconfig.py:471] Config param enable_pathways_goodput: False I0421 06:29:36.140162 134206828996416 pyconfig.py:471] Config param enable_prefix_caching: False I0421 06:29:36.140177 134206828996416 pyconfig.py:471] Config param enable_rampup_batch_size: False I0421 06:29:36.140192 134206828996416 pyconfig.py:471] Config param enable_single_controller: False I0421 06:29:36.140206 134206828996416 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False I0421 06:29:36.140222 134206828996416 pyconfig.py:471] Config param enable_tensorboard: True I0421 06:29:36.140237 134206828996416 pyconfig.py:471] Config param enable_tunix_perf_metrics: False I0421 06:29:36.140253 134206828996416 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4 I0421 06:29:36.140267 134206828996416 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512 I0421 06:29:36.140287 134206828996416 pyconfig.py:471] Config param encoder_layers_for_audio: 2 I0421 06:29:36.140301 134206828996416 pyconfig.py:471] Config param engram: RematLocation.REMAT I0421 06:29:36.140317 134206828996416 pyconfig.py:471] Config param engram_head_dim: 1280 I0421 06:29:36.140331 134206828996416 pyconfig.py:471] Config param engram_kernel_size: 4 I0421 06:29:36.140347 134206828996416 pyconfig.py:471] Config param engram_layers: [] I0421 06:29:36.140362 134206828996416 pyconfig.py:471] Config param engram_max_ngram_size: 3 I0421 06:29:36.140377 
134206828996416 pyconfig.py:471] Config param engram_num_heads: 8 I0421 06:29:36.140391 134206828996416 pyconfig.py:471] Config param engram_seed: 0 I0421 06:29:36.140407 134206828996416 pyconfig.py:471] Config param engram_vocab_bases: [] I0421 06:29:36.140421 134206828996416 pyconfig.py:471] Config param epsilon_high: None I0421 06:29:36.140436 134206828996416 pyconfig.py:471] Config param eval_corr_lst: False I0421 06:29:36.140450 134206828996416 pyconfig.py:471] Config param eval_data_columns: ['text'] I0421 06:29:36.140466 134206828996416 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1 I0421 06:29:36.140480 134206828996416 pyconfig.py:471] Config param eval_image_column: image I0421 06:29:36.140496 134206828996416 pyconfig.py:471] Config param eval_interval: -1 I0421 06:29:36.140512 134206828996416 pyconfig.py:471] Config param eval_make_lst: False I0421 06:29:36.140526 134206828996416 pyconfig.py:471] Config param eval_per_device_batch_size: 2 I0421 06:29:36.140542 134206828996416 pyconfig.py:471] Config param eval_sampling_strategy: greedy I0421 06:29:36.140558 134206828996416 pyconfig.py:471] Config param eval_split: validation I0421 06:29:36.140572 134206828996416 pyconfig.py:471] Config param eval_steps: -1 I0421 06:29:36.140588 134206828996416 pyconfig.py:471] Config param expansion_factor_real_data: -1.0 I0421 06:29:36.140602 134206828996416 pyconfig.py:471] Config param final_logits_soft_cap: None I0421 06:29:36.140618 134206828996416 pyconfig.py:471] Config param first_num_dense_layers: 0 I0421 06:29:36.140634 134206828996416 pyconfig.py:471] Config param float32_gate_logits: False I0421 06:29:36.140659 134206828996416 pyconfig.py:471] Config param float32_logits: False I0421 06:29:36.140675 134206828996416 pyconfig.py:471] Config param float32_qk_product: False I0421 06:29:36.140690 134206828996416 pyconfig.py:471] Config param float32_weight_sum: True I0421 06:29:36.140705 134206828996416 pyconfig.py:471] Config param force_q_layout: 
False I0421 06:29:36.140719 134206828996416 pyconfig.py:471] Config param force_unroll: False I0421 06:29:36.140735 134206828996416 pyconfig.py:471] Config param freeze_audio_encoder_params: True I0421 06:29:36.140749 134206828996416 pyconfig.py:471] Config param freeze_vision_encoder_params: True I0421 06:29:36.140765 134206828996416 pyconfig.py:471] Config param fused_mlp: False I0421 06:29:36.140779 134206828996416 pyconfig.py:471] Config param fused_qkv: True I0421 06:29:36.140794 134206828996416 pyconfig.py:471] Config param gcs_metrics: False I0421 06:29:36.140810 134206828996416 pyconfig.py:471] Config param gdn_chunk_size: 64 I0421 06:29:36.140824 134206828996416 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4 I0421 06:29:36.140839 134206828996416 pyconfig.py:471] Config param gdn_key_head_dim: 128 I0421 06:29:36.140854 134206828996416 pyconfig.py:471] Config param gdn_num_key_heads: 16 I0421 06:29:36.140868 134206828996416 pyconfig.py:471] Config param gdn_num_value_heads: 32 I0421 06:29:36.140884 134206828996416 pyconfig.py:471] Config param gdn_value_head_dim: 128 I0421 06:29:36.140900 134206828996416 pyconfig.py:471] Config param generate_padding_batch_eval: False I0421 06:29:36.140914 134206828996416 pyconfig.py:471] Config param generate_padding_batch_train: False I0421 06:29:36.140929 134206828996416 pyconfig.py:471] Config param generate_slice: v5e-16 I0421 06:29:36.140944 134206828996416 pyconfig.py:471] Config param generation_configs: {} I0421 06:29:36.140959 134206828996416 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64 I0421 06:29:36.140973 134206828996416 pyconfig.py:471] Config param global_batch_size_to_load: 512 I0421 06:29:36.140988 134206828996416 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64 I0421 06:29:36.141002 134206828996416 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0421 06:29:36.141018 134206828996416 pyconfig.py:471] Config param global_batch_size_to_load_start: 
None I0421 06:29:36.141032 134206828996416 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0421 06:29:36.141047 134206828996416 pyconfig.py:471] Config param global_head_dim: 0 I0421 06:29:36.141063 134206828996416 pyconfig.py:471] Config param global_num_kv_heads: 0 I0421 06:29:36.141079 134206828996416 pyconfig.py:471] Config param global_parameter_scale: 1 I0421 06:29:36.141093 134206828996416 pyconfig.py:471] Config param global_rampup_samples: 500 I0421 06:29:36.141108 134206828996416 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0421 06:29:36.141125 134206828996416 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0421 06:29:36.141140 134206828996416 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0421 06:29:36.141155 134206828996416 pyconfig.py:471] Config param grad_dtype: float32 I0421 06:29:36.141190 134206828996416 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0421 06:29:36.141205 134206828996416 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0421 06:29:36.141221 134206828996416 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0421 06:29:36.141236 134206828996416 pyconfig.py:471] Config param grain_eval_files: I0421 06:29:36.141251 134206828996416 pyconfig.py:471] Config param grain_file_type: arrayrecord I0421 06:29:36.141268 134206828996416 pyconfig.py:471] Config param grain_num_threads: 16 I0421 06:29:36.141286 134206828996416 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0421 06:29:36.141301 134206828996416 pyconfig.py:471] Config param grain_packing_type: first_fit I0421 06:29:36.141317 134206828996416 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0421 06:29:36.141332 134206828996416 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1 I0421 06:29:36.141348 134206828996416 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0421 06:29:36.141362 134206828996416 pyconfig.py:471] Config 
param grain_prefetch_buffer_size_eval: 500 I0421 06:29:36.141379 134206828996416 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0421 06:29:36.141393 134206828996416 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0421 06:29:36.141408 134206828996416 pyconfig.py:471] Config param grain_train_files: I0421 06:29:36.141424 134206828996416 pyconfig.py:471] Config param grain_train_mixture_config_path: I0421 06:29:36.141439 134206828996416 pyconfig.py:471] Config param grain_worker_count: 1 I0421 06:29:36.141454 134206828996416 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0421 06:29:36.141469 134206828996416 pyconfig.py:471] Config param grpo_beta: 0.08 I0421 06:29:36.141484 134206828996416 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0421 06:29:36.141498 134206828996416 pyconfig.py:471] Config param hardware: tpu I0421 06:29:36.141513 134206828996416 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0421 06:29:36.141528 134206828996416 pyconfig.py:471] Config param head_dim: 8 I0421 06:29:36.141543 134206828996416 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0421 06:29:36.141560 134206828996416 pyconfig.py:471] Config param hf_data_dir: None I0421 06:29:36.141576 134206828996416 pyconfig.py:471] Config param hf_eval_files: None I0421 06:29:36.141591 134206828996416 pyconfig.py:471] Config param hf_eval_split: None I0421 06:29:36.141607 134206828996416 pyconfig.py:471] Config param hf_name: None I0421 06:29:36.141622 134206828996416 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0421 06:29:36.141647 134206828996416 pyconfig.py:471] Config param hf_train_files: None I0421 06:29:36.141664 134206828996416 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0421 06:29:36.141678 134206828996416 pyconfig.py:471] Config param hide_profiler_step_metric: False I0421 06:29:36.141694 134206828996416 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0421 06:29:36.141708 
134206828996416 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0421 06:29:36.141723 134206828996416 pyconfig.py:471] Config param ici_context_parallelism: 1 I0421 06:29:36.141738 134206828996416 pyconfig.py:471] Config param ici_data_parallelism: 1 I0421 06:29:36.141753 134206828996416 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0421 06:29:36.141769 134206828996416 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0421 06:29:36.141783 134206828996416 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0421 06:29:36.141799 134206828996416 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0421 06:29:36.141813 134206828996416 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0421 06:29:36.141830 134206828996416 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0421 06:29:36.141844 134206828996416 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0421 06:29:36.141860 134206828996416 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0421 06:29:36.141876 134206828996416 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0421 06:29:36.141890 134206828996416 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0421 06:29:36.141906 134206828996416 pyconfig.py:471] Config param image_path: I0421 06:29:36.141920 134206828996416 pyconfig.py:471] Config param image_placeholder: <|image|> I0421 06:29:36.141936 134206828996416 pyconfig.py:471] Config param image_size_for_vit: 896 I0421 06:29:36.141951 134206828996416 pyconfig.py:471] Config param indexer_head_dim: 128 I0421 06:29:36.141966 134206828996416 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0421 06:29:36.141980 134206828996416 pyconfig.py:471] Config param indexer_n_heads: 64 I0421 06:29:36.141996 134206828996416 pyconfig.py:471] Config param indexer_sparse_training: False I0421 06:29:36.142012 134206828996416 pyconfig.py:471] Config param 
indexer_topk: 2048 I0421 06:29:36.142026 134206828996416 pyconfig.py:471] Config param inference_benchmark_test: False I0421 06:29:36.142041 134206828996416 pyconfig.py:471] Config param inference_metadata_file: I0421 06:29:36.142055 134206828996416 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0421 06:29:36.142072 134206828996416 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0421 06:29:36.142088 134206828996416 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0421 06:29:36.142103 134206828996416 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0421 06:29:36.142119 134206828996416 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0421 06:29:36.142133 134206828996416 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0421 06:29:36.142149 134206828996416 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0421 06:29:36.142165 134206828996416 pyconfig.py:471] Config param init_weights_seed: 0 I0421 06:29:36.142181 134206828996416 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0421 06:29:36.142196 134206828996416 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0421 06:29:36.142212 134206828996416 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0421 06:29:36.142226 134206828996416 pyconfig.py:471] Config param internal_compile: False I0421 06:29:36.142242 134206828996416 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0421 06:29:36.142256 134206828996416 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0421 06:29:36.142271 134206828996416 pyconfig.py:471] Config param jax_debug_log_modules: I0421 06:29:36.142289 134206828996416 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0421 06:29:36.142303 134206828996416 
pyconfig.py:471] Config param jax_profiler_port: 9999 I0421 06:29:36.142318 134206828996416 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0421 06:29:36.142334 134206828996416 pyconfig.py:471] Config param kv_cache_buffer: 256 I0421 06:29:36.142349 134206828996416 pyconfig.py:471] Config param kv_lora_rank: 512 I0421 06:29:36.142364 134206828996416 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0421 06:29:36.142382 134206828996416 pyconfig.py:471] Config param kv_quant_dtype: int8 I0421 06:29:36.142397 134206828996416 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0421 06:29:36.142412 134206828996416 pyconfig.py:471] Config param learning_rate: 0.0002 I0421 06:29:36.142428 134206828996416 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0421 06:29:36.142443 134206828996416 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0421 06:29:36.142459 134206828996416 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0421 06:29:36.142473 134206828996416 pyconfig.py:471] Config param load_checkpoint_only_once: False I0421 06:29:36.142489 134206828996416 pyconfig.py:471] Config param load_from_prefill_dir: False I0421 06:29:36.142503 134206828996416 pyconfig.py:471] Config param load_full_state_path: I0421 06:29:36.142518 134206828996416 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0421 06:29:36.142533 134206828996416 pyconfig.py:471] Config param local_checkpoint_directory: I0421 06:29:36.142549 134206828996416 pyconfig.py:471] Config param local_checkpoint_period: 0 I0421 06:29:36.142563 134206828996416 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0421 06:29:36.142578 134206828996416 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0421 06:29:36.142593 134206828996416 pyconfig.py:471] Config param log_config: True I0421 06:29:36.142609 
134206828996416 pyconfig.py:471] Config param log_period: 10 I0421 06:29:36.142625 134206828996416 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', 
('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), 
('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0421 06:29:36.142708 134206828996416 pyconfig.py:471] Config param logits_dot_in_fp32: False I0421 06:29:36.142726 134206828996416 pyconfig.py:471] Config param logits_via_embedding: True I0421 06:29:36.142743 134206828996416 pyconfig.py:471] Config param lora_input_adapters_path: I0421 06:29:36.142758 134206828996416 pyconfig.py:471] Config param loss_algo: grpo I0421 06:29:36.142775 134206828996416 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0421 06:29:36.142792 134206828996416 pyconfig.py:471] Config param managed_mldiagnostics: False I0421 06:29:36.142808 134206828996416 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/managed-mldiagnostics I0421 06:29:36.142822 134206828996416 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0421 06:29:36.142838 134206828996416 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0421 06:29:36.142855 134206828996416 pyconfig.py:471] Config param max_checkify: False I0421 06:29:36.142870 134206828996416 pyconfig.py:471] Config param max_concurrency: 256 I0421 06:29:36.142886 
134206828996416 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0421 06:29:36.142900 134206828996416 pyconfig.py:471] Config param max_num_batched_tokens: None I0421 06:29:36.142916 134206828996416 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0421 06:29:36.142930 134206828996416 pyconfig.py:471] Config param max_num_images_per_example: -1 I0421 06:29:36.142945 134206828996416 pyconfig.py:471] Config param max_num_seqs: None I0421 06:29:36.142961 134206828996416 pyconfig.py:471] Config param max_position_embeddings: 163840 I0421 06:29:36.142976 134206828996416 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0421 06:29:36.142991 134206828996416 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0421 06:29:36.143005 134206828996416 pyconfig.py:471] Config param max_segments_per_seq: -1 I0421 06:29:36.143021 134206828996416 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0421 06:29:36.143035 134206828996416 pyconfig.py:471] Config param max_target_length: 2048 I0421 06:29:36.143051 134206828996416 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0421 06:29:36.143068 134206828996416 pyconfig.py:471] Config param megablox: True I0421 06:29:36.143084 134206828996416 pyconfig.py:471] Config param merge_gating_gmm: False I0421 06:29:36.143098 134206828996416 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0421 06:29:36.143116 134206828996416 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/metrics/ I0421 06:29:36.143132 134206828996416 pyconfig.py:471] Config param metrics_file: I0421 06:29:36.143146 134206828996416 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0421 06:29:36.143162 134206828996416 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0421 
06:29:36.143178 134206828996416 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0421 06:29:36.143192 134206828996416 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0421 06:29:36.143207 134206828996416 pyconfig.py:471] Config param mla_naive_kvcache: True I0421 06:29:36.143221 134206828996416 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0421 06:29:36.143238 134206828996416 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0421 06:29:36.143253 134206828996416 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0421 06:29:36.143268 134206828996416 pyconfig.py:471] Config param mlp_bias: False I0421 06:29:36.143285 134206828996416 pyconfig.py:471] Config param mlp_dim: 64 I0421 06:29:36.143301 134206828996416 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0421 06:29:36.143316 134206828996416 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0421 06:29:36.143332 134206828996416 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0421 06:29:36.143347 134206828996416 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0421 06:29:36.143362 134206828996416 pyconfig.py:471] Config param moba: False I0421 06:29:36.143377 134206828996416 pyconfig.py:471] Config param moba_chunk_size: 1024 I0421 06:29:36.143392 134206828996416 pyconfig.py:471] Config param moba_topk: 8 I0421 06:29:36.143407 134206828996416 pyconfig.py:471] Config param model_call_mode: I0421 06:29:36.143423 134206828996416 pyconfig.py:471] Config param model_name: gpt3-52k I0421 06:29:36.143438 134206828996416 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0421 06:29:36.143452 134206828996416 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0421 06:29:36.143468 134206828996416 pyconfig.py:471] Config param moe_mlp_dim: -1 I0421 06:29:36.143482 134206828996416 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0421 06:29:36.143498 134206828996416 pyconfig.py:471] Config param 
moe_mlpwi_1: RematLocation.REMAT I0421 06:29:36.143514 134206828996416 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0421 06:29:36.143529 134206828996416 pyconfig.py:471] Config param monitor_goodput: False I0421 06:29:36.143545 134206828996416 pyconfig.py:471] Config param monitor_step_time_deviation: True I0421 06:29:36.143559 134206828996416 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0421 06:29:36.143575 134206828996416 pyconfig.py:471] Config param mscale: 1.0 I0421 06:29:36.143589 134206828996416 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0421 06:29:36.143605 134206828996416 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0421 06:29:36.143619 134206828996416 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0421 06:29:36.143635 134206828996416 pyconfig.py:471] Config param mtp_num_layers: 0 I0421 06:29:36.143658 134206828996416 pyconfig.py:471] Config param mu_dtype: float32 I0421 06:29:36.143682 134206828996416 pyconfig.py:471] Config param multi_sampling: False I0421 06:29:36.143698 134206828996416 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0421 06:29:36.143712 134206828996416 pyconfig.py:471] Config param muon_beta: 0.95 I0421 06:29:36.143728 134206828996416 pyconfig.py:471] Config param muon_consistent_rms: None I0421 06:29:36.143744 134206828996416 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0421 06:29:36.143759 134206828996416 pyconfig.py:471] Config param n_routing_groups: -1 I0421 06:29:36.143774 134206828996416 pyconfig.py:471] Config param n_window_for_audio: 50 I0421 06:29:36.143789 134206828996416 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0421 06:29:36.143803 134206828996416 pyconfig.py:471] Config param nope_layer_interval: -1 I0421 06:29:36.143819 134206828996416 pyconfig.py:471] Config param norm_topk_prob: False I0421 06:29:36.143835 134206828996416 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 
I0421 06:29:36.143851 134206828996416 pyconfig.py:471] Config param normalize_embedding_logits: False I0421 06:29:36.143865 134206828996416 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0421 06:29:36.143881 134206828996416 pyconfig.py:471] Config param num_batches: 4 I0421 06:29:36.143895 134206828996416 pyconfig.py:471] Config param num_channels_for_vit: 3 I0421 06:29:36.143911 134206828996416 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0421 06:29:36.143925 134206828996416 pyconfig.py:471] Config param num_decoder_layers: 1 I0421 06:29:36.143941 134206828996416 pyconfig.py:471] Config param num_diloco_replicas: 1 I0421 06:29:36.143955 134206828996416 pyconfig.py:471] Config param num_epoch: 1 I0421 06:29:36.143971 134206828996416 pyconfig.py:471] Config param num_eval_passes: 1 I0421 06:29:36.143985 134206828996416 pyconfig.py:471] Config param num_experts: 1 I0421 06:29:36.144001 134206828996416 pyconfig.py:471] Config param num_experts_per_tok: 1 I0421 06:29:36.144017 134206828996416 pyconfig.py:471] Config param num_generations: 2 I0421 06:29:36.144032 134206828996416 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0421 06:29:36.144047 134206828996416 pyconfig.py:471] Config param num_iterations: 1 I0421 06:29:36.144062 134206828996416 pyconfig.py:471] Config param num_kv_heads: 2 I0421 06:29:36.144079 134206828996416 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0421 06:29:36.144094 134206828996416 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0421 06:29:36.144111 134206828996416 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0421 06:29:36.144126 134206828996416 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0421 06:29:36.144141 134206828996416 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0421 06:29:36.144157 134206828996416 pyconfig.py:471] Config param num_query_heads: 2 I0421 06:29:36.144171 134206828996416 pyconfig.py:471] Config 
param num_samplers_slices: -1 I0421 06:29:36.144186 134206828996416 pyconfig.py:471] Config param num_slices: 1 I0421 06:29:36.144200 134206828996416 pyconfig.py:471] Config param num_target_devices: 32 I0421 06:29:36.144216 134206828996416 pyconfig.py:471] Config param num_test_batches: 5 I0421 06:29:36.144232 134206828996416 pyconfig.py:471] Config param num_trainer_slices: -1 I0421 06:29:36.144246 134206828996416 pyconfig.py:471] Config param num_vocab_tiling: 1 I0421 06:29:36.144262 134206828996416 pyconfig.py:471] Config param off_policy_steps: 0 I0421 06:29:36.144279 134206828996416 pyconfig.py:471] Config param offline_data_dir: None I0421 06:29:36.144295 134206828996416 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0421 06:29:36.144313 134206828996416 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0421 06:29:36.144328 134206828996416 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0421 06:29:36.144342 134206828996416 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0421 06:29:36.144358 134206828996416 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0421 06:29:36.144372 134206828996416 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0421 06:29:36.144388 134206828996416 pyconfig.py:471] Config param output_dim_for_audio: 512 I0421 06:29:36.144403 134206828996416 pyconfig.py:471] Config param override_logical_axis_rules: False I0421 06:29:36.144418 134206828996416 pyconfig.py:471] Config param override_model_config: True I0421 06:29:36.144433 134206828996416 pyconfig.py:471] Config param packing: True I0421 06:29:36.144448 134206828996416 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0421 06:29:36.144463 134206828996416 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0421 06:29:36.144478 134206828996416 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0421 06:29:36.144494 134206828996416 pyconfig.py:471] Config param 
pagedattn_pages_per_compute_block: 4 I0421 06:29:36.144508 134206828996416 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0421 06:29:36.144524 134206828996416 pyconfig.py:471] Config param param_scan_axis: 1 I0421 06:29:36.144540 134206828996416 pyconfig.py:471] Config param parameter_memory_host_offload: False I0421 06:29:36.144554 134206828996416 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0421 06:29:36.144570 134206828996416 pyconfig.py:471] Config param patch_size_for_vit: 14 I0421 06:29:36.144585 134206828996416 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0421 06:29:36.144601 134206828996416 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0421 06:29:36.144616 134206828996416 pyconfig.py:471] Config param per_device_batch_size: 2 I0421 06:29:36.144632 134206828996416 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0421 06:29:36.144654 134206828996416 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0421 06:29:36.144670 134206828996416 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0421 06:29:36.144685 134206828996416 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0421 06:29:36.144702 134206828996416 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0421 06:29:36.144716 134206828996416 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0421 06:29:36.144732 134206828996416 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0421 06:29:36.144748 134206828996416 pyconfig.py:471] Config param posemb_type_for_vit: learn I0421 06:29:36.144762 134206828996416 pyconfig.py:471] Config param position_id_per_seconds: 25 I0421 06:29:36.144778 134206828996416 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0421 06:29:36.144792 134206828996416 pyconfig.py:471] Config param prefill_cache_dir: I0421 06:29:36.144808 134206828996416 pyconfig.py:471] Config param prefill_chunk_size: 256 I0421 
06:29:36.144823 134206828996416 pyconfig.py:471] Config param prefill_slice: v5e-16 I0421 06:29:36.144839 134206828996416 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0421 06:29:36.144854 134206828996416 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0421 06:29:36.144869 134206828996416 pyconfig.py:471] Config param profile_cleanly: True I0421 06:29:36.144884 134206828996416 pyconfig.py:471] Config param profile_periodically_period: -1 I0421 06:29:36.144899 134206828996416 pyconfig.py:471] Config param profile_power_events: False I0421 06:29:36.144913 134206828996416 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0421 06:29:36.144931 134206828996416 pyconfig.py:471] Config param profiler_steps: 5 I0421 06:29:36.144945 134206828996416 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0421 06:29:36.144961 134206828996416 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0421 06:29:36.144975 134206828996416 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0421 06:29:36.144991 134206828996416 pyconfig.py:471] Config param prometheus_port: 0 I0421 06:29:36.145005 134206828996416 pyconfig.py:471] Config param prompt: I love to I0421 06:29:36.145021 134206828996416 pyconfig.py:471] Config param pure_nnx: False I0421 06:29:36.145035 134206828996416 pyconfig.py:471] Config param pure_nnx_decoder: False I0421 06:29:36.145051 134206828996416 pyconfig.py:471] Config param q_lora_rank: 0 I0421 06:29:36.145065 134206828996416 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0421 06:29:36.145081 134206828996416 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0421 06:29:36.145095 134206828996416 pyconfig.py:471] Config param qk_norm_with_scale: True I0421 06:29:36.145111 134206828996416 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0421 06:29:36.145126 134206828996416 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0421 06:29:36.145142 
134206828996416 pyconfig.py:471] Config param quant_cfg_path: I0421 06:29:36.145156 134206828996416 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0421 06:29:36.145174 134206828996416 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0421 06:29:36.145190 134206828996416 pyconfig.py:471] Config param quantize_kvcache: False I0421 06:29:36.145205 134206828996416 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0421 06:29:36.145220 134206828996416 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0421 06:29:36.145236 134206828996416 pyconfig.py:471] Config param ragged_block_size: 256 I0421 06:29:36.145250 134206828996416 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0421 06:29:36.145266 134206828996416 pyconfig.py:471] Config param rampup_end_step: 0 I0421 06:29:36.145283 134206828996416 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0421 06:29:36.145299 134206828996416 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0421 06:29:36.145313 134206828996416 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0421 06:29:36.145329 134206828996416 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0421 06:29:36.145343 134206828996416 pyconfig.py:471] Config param remat_policy: full I0421 06:29:36.145359 134206828996416 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0421 06:29:36.145375 134206828996416 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0421 06:29:36.145391 134206828996416 pyconfig.py:471] Config param replicate_quant_scale: False I0421 06:29:36.145406 134206828996416 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0421 06:29:36.145421 134206828996416 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0421 06:29:36.145435 134206828996416 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0421 
06:29:36.145451 134206828996416 pyconfig.py:471] Config param reshape_q: False I0421 06:29:36.145465 134206828996416 pyconfig.py:471] Config param return_log_prob: False I0421 06:29:36.145481 134206828996416 pyconfig.py:471] Config param reuse_example_batch: 0 I0421 06:29:36.145496 134206828996416 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0421 06:29:36.145511 134206828996416 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0421 06:29:36.145526 134206828996416 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0421 06:29:36.145541 134206828996416 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0421 06:29:36.145556 134206828996416 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0421 06:29:36.145572 134206828996416 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0421 06:29:36.145588 134206828996416 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0421 06:29:36.145609 134206828996416 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0421 06:29:36.145624 134206828996416 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0421 06:29:36.145648 134206828996416 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0421 06:29:36.145666 134206828996416 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0421 06:29:36.145682 134206828996416 pyconfig.py:471] Config param rope_attention_scaling: False I0421 06:29:36.145699 134206828996416 pyconfig.py:471] Config param rope_factor: 40 I0421 06:29:36.145713 134206828996416 pyconfig.py:471] Config param rope_interleave: True I0421 06:29:36.145729 134206828996416 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0421 06:29:36.145745 
134206828996416 pyconfig.py:471] Config param rope_max_timescale: 10000 I0421 06:29:36.145761 134206828996416 pyconfig.py:471] Config param rope_min_timescale: 1 I0421 06:29:36.145775 134206828996416 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0421 06:29:36.145792 134206828996416 pyconfig.py:471] Config param rope_truncate: True I0421 06:29:36.145806 134206828996416 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0421 06:29:36.145824 134206828996416 pyconfig.py:471] Config param rope_use_scale: True I0421 06:29:36.145838 134206828996416 pyconfig.py:471] Config param routed_bias: False I0421 06:29:36.145854 134206828996416 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0421 06:29:36.145869 134206828996416 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0421 06:29:36.145884 134206828996416 pyconfig.py:471] Config param routed_score_func: I0421 06:29:36.145899 134206828996416 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-21-06-29 I0421 06:29:36.145914 134206828996416 pyconfig.py:471] Config param sa_block_kv: 512 I0421 06:29:36.145929 134206828996416 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0421 06:29:36.145944 134206828996416 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0421 06:29:36.145960 134206828996416 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0421 06:29:36.145976 134206828996416 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0421 06:29:36.145990 134206828996416 pyconfig.py:471] Config param sa_block_q: 512 I0421 06:29:36.146005 134206828996416 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0421 06:29:36.146019 134206828996416 pyconfig.py:471] Config param sa_block_q_dq: 512 I0421 06:29:36.146035 134206828996416 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0421 06:29:36.146051 134206828996416 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0421 06:29:36.146065 134206828996416 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: 
False I0421 06:29:36.146080 134206828996416 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0421 06:29:36.146095 134206828996416 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0421 06:29:36.146111 134206828996416 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0421 06:29:36.146127 134206828996416 pyconfig.py:471] Config param save_config_to_gcs: False I0421 06:29:36.146144 134206828996416 pyconfig.py:471] Config param save_quantized_params_path: I0421 06:29:36.146162 134206828996416 pyconfig.py:471] Config param scale_embedding_for_audio: True I0421 06:29:36.146178 134206828996416 pyconfig.py:471] Config param scan_layers: True I0421 06:29:36.146192 134206828996416 pyconfig.py:471] Config param scan_layers_per_stage: False I0421 06:29:36.146208 134206828996416 pyconfig.py:471] Config param scan_pipeline_iterations: True I0421 06:29:36.146235 134206828996416 pyconfig.py:471] Config param scan_pipeline_repeats: False I0421 06:29:36.146251 134206828996416 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0421 06:29:36.146266 134206828996416 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0421 06:29:36.146286 134206828996416 pyconfig.py:471] Config param sft_train_on_completion_only: False I0421 06:29:36.146303 134206828996416 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0421 06:29:36.146317 134206828996416 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0421 06:29:36.146335 134206828996416 pyconfig.py:471] Config param shard_optimizer_over_data: False I0421 06:29:36.146351 134206828996416 pyconfig.py:471] Config param sharding_strategy: None I0421 06:29:36.146366 134206828996416 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0421 06:29:36.146382 134206828996416 pyconfig.py:471] Config param shardy: True I0421 06:29:36.146397 134206828996416 pyconfig.py:471] Config param share_kv_projections: False I0421 06:29:36.146413 134206828996416 
pyconfig.py:471] Config param shared_experts: 0 I0421 06:29:36.146428 134206828996416 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0421 06:29:36.146442 134206828996416 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0421 06:29:36.146458 134206828996416 pyconfig.py:471] Config param skip_jax_distributed_system: False I0421 06:29:36.146473 134206828996416 pyconfig.py:471] Config param skip_step_interval: 128 I0421 06:29:36.146488 134206828996416 pyconfig.py:471] Config param skip_step_on_spikes: False I0421 06:29:36.146503 134206828996416 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0421 06:29:36.146518 134206828996416 pyconfig.py:471] Config param sliding_window_size: 0 I0421 06:29:36.146534 134206828996416 pyconfig.py:471] Config param solution_end_token: </answer> I0421 06:29:36.146549 134206828996416 pyconfig.py:471] Config param solution_start_token: <answer> I0421 06:29:36.146565 134206828996416 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0421 06:29:36.146582 134206828996416 pyconfig.py:471] Config param sparse_matmul: True I0421 06:29:36.146598 134206828996416 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0421 06:29:36.146614 134206828996416 pyconfig.py:471] Config param stack_prefill_result_cache: False I0421 06:29:36.146629 134206828996416 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0421 06:29:36.146653 134206828996416 pyconfig.py:471] Config param stack_trace_to_cloud: False I0421 06:29:36.146670 134206828996416 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0421 06:29:36.146684 134206828996416 pyconfig.py:471] Config param steps: 200000 I0421 06:29:36.146701 134206828996416 pyconfig.py:471] Config param stop_strings: None I0421 06:29:36.146718 134206828996416 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0421 06:29:36.146735 134206828996416 pyconfig.py:471] Config param student_params_to_update: None I0421 
06:29:36.146749 134206828996416 pyconfig.py:471] Config param subslice_shape: I0421 06:29:36.146765 134206828996416 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0421 06:29:36.146780 134206828996416 pyconfig.py:471] Config param system_prompt: I0421 06:29:36.146795 134206828996416 pyconfig.py:471] Config param target_eval_loss: 0.0 I0421 06:29:36.146809 134206828996416 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0421 06:29:36.146826 134206828996416 pyconfig.py:471] Config param temperature_tuning: False I0421 06:29:36.146842 134206828996416 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0421 06:29:36.146856 134206828996416 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-21-06-29/tensorboard/ I0421 06:29:36.146872 134206828996416 pyconfig.py:471] Config param tensors_on_device: None I0421 06:29:36.146887 134206828996416 pyconfig.py:471] Config param tensors_to_offload: None I0421 06:29:36.146902 134206828996416 pyconfig.py:471] Config param test_batch_start_index: 0 I0421 06:29:36.146917 134206828996416 pyconfig.py:471] Config param tile_size_for_vit: 336 I0421 06:29:36.146932 134206828996416 pyconfig.py:471] Config param tokenize_eval_data: True I0421 06:29:36.146947 134206828996416 pyconfig.py:471] Config param tokenize_train_data: True I0421 06:29:36.146961 134206828996416 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0421 06:29:36.146977 134206828996416 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0421 06:29:36.146994 134206828996416 pyconfig.py:471] Config param topk_routing_group: -1 I0421 06:29:36.147010 134206828996416 pyconfig.py:471] Config param train_data_columns: ['text'] I0421 06:29:36.147027 134206828996416 pyconfig.py:471] Config param train_fraction: 1.0 I0421 06:29:36.147041 134206828996416 pyconfig.py:471] Config param train_image_column: image I0421 06:29:36.147056 134206828996416 pyconfig.py:471] 
Config param train_micro_batch_size: -1 I0421 06:29:36.147071 134206828996416 pyconfig.py:471] Config param train_split: train I0421 06:29:36.147086 134206828996416 pyconfig.py:471] Config param trainable_parameters_mask: [] I0421 06:29:36.147102 134206828996416 pyconfig.py:471] Config param trainable_position_size: 2048 I0421 06:29:36.147116 134206828996416 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0421 06:29:36.147132 134206828996416 pyconfig.py:471] Config param upload_all_profiler_results: False I0421 06:29:36.147146 134206828996416 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0421 06:29:36.147161 134206828996416 pyconfig.py:471] Config param use_agentic_rollout: False I0421 06:29:36.147176 134206828996416 pyconfig.py:471] Config param use_audio: False I0421 06:29:36.147191 134206828996416 pyconfig.py:471] Config param use_audio_in_video: False I0421 06:29:36.147206 134206828996416 pyconfig.py:471] Config param use_batch_split_schedule: False I0421 06:29:36.147222 134206828996416 pyconfig.py:471] Config param use_chat_template: False I0421 06:29:36.147237 134206828996416 pyconfig.py:471] Config param use_chunked_prefill: False I0421 06:29:36.147252 134206828996416 pyconfig.py:471] Config param use_custom_sort_vjp: True I0421 06:29:36.147267 134206828996416 pyconfig.py:471] Config param use_dpo: False I0421 06:29:36.147288 134206828996416 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0421 06:29:36.147303 134206828996416 pyconfig.py:471] Config param use_grpo: True I0421 06:29:36.147319 134206828996416 pyconfig.py:471] Config param use_indexer: False I0421 06:29:36.147333 134206828996416 pyconfig.py:471] Config param use_iota_embed: True I0421 06:29:36.147348 134206828996416 pyconfig.py:471] Config param use_jax_splash: False I0421 06:29:36.147363 134206828996416 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0421 06:29:36.147378 134206828996416 pyconfig.py:471] Config param use_mrope: False I0421 
06:29:36.147392 134206828996416 pyconfig.py:471] Config param use_multimodal: False I0421 06:29:36.147408 134206828996416 pyconfig.py:471] Config param use_pathways: True I0421 06:29:36.147423 134206828996416 pyconfig.py:471] Config param use_post_attn_norm: False I0421 06:29:36.147438 134206828996416 pyconfig.py:471] Config param use_post_ffw_norm: False I0421 06:29:36.147454 134206828996416 pyconfig.py:471] Config param use_qk_clip: False I0421 06:29:36.147468 134206828996416 pyconfig.py:471] Config param use_qk_norm: False I0421 06:29:36.147484 134206828996416 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0421 06:29:36.147500 134206828996416 pyconfig.py:471] Config param use_qwix_quantization: False I0421 06:29:36.147516 134206828996416 pyconfig.py:471] Config param use_ragged_attention: False I0421 06:29:36.147531 134206828996416 pyconfig.py:471] Config param use_random_routing: False I0421 06:29:36.147547 134206828996416 pyconfig.py:471] Config param use_replicator_service: False I0421 06:29:36.147561 134206828996416 pyconfig.py:471] Config param use_ring_of_experts: False I0421 06:29:36.147577 134206828996416 pyconfig.py:471] Config param use_sft: False I0421 06:29:36.147593 134206828996416 pyconfig.py:471] Config param use_splash_scheduler: False I0421 06:29:36.147607 134206828996416 pyconfig.py:471] Config param use_tokamax_gmm: False I0421 06:29:36.147623 134206828996416 pyconfig.py:471] Config param use_tokamax_splash: False I0421 06:29:36.147645 134206828996416 pyconfig.py:471] Config param use_truncation: True I0421 06:29:36.147662 134206828996416 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0421 06:29:36.147677 134206828996416 pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0421 06:29:36.147693 134206828996416 pyconfig.py:471] Config param use_vertex_tensorboard: False I0421 06:29:36.147709 134206828996416 pyconfig.py:471] Config param using_pipeline_parallelism: False I0421 06:29:36.147725 
134206828996416 pyconfig.py:471] Config param v_head_dim: 128 I0421 06:29:36.147740 134206828996416 pyconfig.py:471] Config param v_norm_with_scale: True I0421 06:29:36.147756 134206828996416 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0421 06:29:36.147770 134206828996416 pyconfig.py:471] Config param vertex_tensorboard_project: I0421 06:29:36.147786 134206828996416 pyconfig.py:471] Config param vertex_tensorboard_region: I0421 06:29:36.147800 134206828996416 pyconfig.py:471] Config param video_path: I0421 06:29:36.147817 134206828996416 pyconfig.py:471] Config param video_placeholder: <|video|> I0421 06:29:36.147832 134206828996416 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0421 06:29:36.147848 134206828996416 pyconfig.py:471] Config param vision_output_length: -1 I0421 06:29:36.147864 134206828996416 pyconfig.py:471] Config param vllm_additional_config: {} I0421 06:29:36.147878 134206828996416 pyconfig.py:471] Config param vllm_hf_config_path: I0421 06:29:36.147894 134206828996416 pyconfig.py:471] Config param vllm_hf_overrides: {} I0421 06:29:36.147911 134206828996416 pyconfig.py:471] Config param vocab_size: 32000 I0421 06:29:36.147925 134206828996416 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0421 06:29:36.147942 134206828996416 pyconfig.py:471] Config param weight_dtype: float32 I0421 06:29:36.147968 134206828996416 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0421 06:29:36.147983 134206828996416 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0421 06:29:36.147998 134206828996416 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0421 06:29:36.148013 134206828996416 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0421 06:29:36.148029 134206828996416 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0421 06:29:36.148043 134206828996416 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0421 06:29:36.148059 134206828996416 
pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0421 06:29:36.148073 134206828996416 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0421 06:29:36.148089 134206828996416 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0421 06:29:36.148103 134206828996416 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0421 06:29:36.148118 134206828996416 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0421 06:29:36.148132 134206828996416 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0421 06:29:36.148148 134206828996416 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0421 06:29:36.148163 134206828996416 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0421 06:29:36.148177 134206828996416 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0421 06:29:36.148192 134206828996416 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0421 06:29:36.148207 134206828996416 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0421 06:29:36.148223 134206828996416 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0421 06:29:36.148237 134206828996416 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0421 06:29:36.148253 134206828996416 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0421 06:29:36.148268 134206828996416 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0421 06:29:36.148290 134206828996416 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0421 06:29:36.148304 134206828996416 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0421 06:29:36.148320 134206828996416 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False I0421 06:29:36.148334 134206828996416 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0421 06:29:36.148352 134206828996416 pyconfig.py:471] Config param z_loss_multiplier: 0.0 I0421 06:29:36.148677 134206828996416 tokenizer.py:245] Tokenizer 
path: meta-llama/Llama-2-7b-chat-hf I0421 06:29:36.148714 134206828996416 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0421 06:29:39.874742 134206828996416 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0421 06:29:39.877760 134206828996416 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0421 06:29:39.877882 134206828996416 train_distill.py:596] Applying logical axis rules for model initialization and training... I0421 06:29:39.877952 134206828996416 train_distill.py:600] Loading Student from ... I0421 06:29:39.877980 134206828996416 train_distill.py:169] --- Student Configuration --- I0421 06:29:39.878001 134206828996416 train_distill.py:170] Model Name: gpt3-52k I0421 06:29:39.878023 134206828996416 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0421 06:29:39.878042 134206828996416 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0421 06:29:39.878061 134206828996416 train_distill.py:175] Vocab Size: 32000 I0421 06:29:39.878079 134206828996416 train_distill.py:176] Checkpoint: I0421 06:29:39.878098 134206828996416 train_distill.py:465] Initializing model: gpt3-52k... I0421 06:29:41.277094 134206828996416 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0421 06:29:41.277207 134206828996416 train_distill.py:169] --- Teacher Configuration --- I0421 06:29:41.277237 134206828996416 train_distill.py:170] Model Name: gpt3-52k I0421 06:29:41.277261 134206828996416 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0421 06:29:41.277282 134206828996416 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0421 06:29:41.277301 134206828996416 train_distill.py:175] Vocab Size: 32000 I0421 06:29:41.277324 134206828996416 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0421 06:29:41.277344 134206828996416 train_distill.py:465] Initializing model: gpt3-52k... I0421 06:29:42.344802 134206828996416 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0421 06:29:42.345315 134206828996416 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a0ebee09eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0421 06:29:42.345382 134206828996416 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0421 06:29:42.855374 134206828996416 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0421 06:29:43.403298 2112 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0421 06:29:44.981282 134206828996416 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0421 06:29:47.524186 134206828996416 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0421 06:29:47.524569 134206828996416 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0421 06:29:48.496541 134206828996416 checkpointer.py:318] Finished restoring checkpoint in 4.33 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0421 06:29:49.192152 134206828996416 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0421 06:29:49.257752 134206828996416 config.py:112] TensorFlow version 2.20.0 available. I0421 06:29:49.258237 134206828996416 config.py:125] JAX version 0.8.3 available. E0421 06:29:51.409517 134206828996416 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0421 06:29:51.409752 134206828996416 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0421 06:29:51.414084 134206828996416 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0421 06:29:51.414150 134206828996416 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0421 06:29:51.414214 134206828996416 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0421 06:29:51.414290 134206828996416 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a0ebee09eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0421 06:29:51.414329 134206828996416 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0421 06:29:51.414360 134206828996416 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a0ebee09eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0421 06:29:51.414402 134206828996416 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d0395e50>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f650>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f5c0>}, handler_registry=None I0421 
06:29:51.414603 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d0395e50>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0421 06:29:51.414657 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f650>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0421 06:29:51.414685 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f5c0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0421 06:29:51.414710 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f230>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0421 06:29:51.414737 134206828996416 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d0395e50>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d0395e50>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f650>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f650>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f5c0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f5c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f230>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d068f230>}). 
I0421 06:29:51.415130 134206828996416 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x79f4d0758e00> timeout: 600 secs and primary_host=0 for async checkpoint writes I0421 06:29:54.299156 134206828996416 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_linen_xpk_main_20260421_061409_07_distill_smoke/checkpoints I0421 06:29:54.741017 134206828996416 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_linen_xpk_main_20260421_061409_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x79f4d068f590> I0421 06:29:54.741188 134206828996416 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0421 06:29:54.741261 134206828996416 
base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a0ebee09eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0421 06:29:54.741296 134206828996416 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0421 06:29:54.741328 134206828996416 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a0ebee09eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0421 06:29:54.741362 134206828996416 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0421 06:29:54.741416 134206828996416 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134206828996416 count=1 at 0x79f69bd3acc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x79f4d068f3b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x79f4d068f380>, _write_futures=[]) I0421 06:29:54.741807 134206828996416 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134206828996416 count=1 at 0x79f69bd3acc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x79f4d068f3b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x79f4d068f380>, _write_futures=[]) I0421 06:29:54.741835 134206828996416 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134206828996416 count=1 at 0x79f69bd3acc0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x79f4d068f3b0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x79f4d068f380>, _write_futures=[]) I0421 06:29:54.741867 134206828996416 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f560>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068c1d0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06eb260>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x79f4d06eacf0>}, handler_registry=None I0421 06:29:54.741972 134206828996416 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f560>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0421 06:29:54.742007 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068c1d0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0421 06:29:54.742031 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06eb260>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0421 06:29:54.742059 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x79f4d06eacf0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0421 06:29:54.742081 134206828996416 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06e9d30>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0421 06:29:54.742106 134206828996416 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f560>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068f560>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068c1d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79f4d068c1d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06eb260>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06eb260>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x79f4d06eacf0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x79f4d06eacf0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06e9d30>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79f4d06e9d30>}). I0421 06:29:54.742177 134206828996416 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x79f4d0758f40> timeout: 600 secs and primary_host=0 for async checkpoint writes I0421 06:29:55.563832 134206828996416 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_linen_xpk_main_20260421_061409_07_distill_smoke/checkpoints I0421 06:29:55.988009 134206828996416 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, 
enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_main_20260421_061409/pt_distill_linen_xpk_main_20260421_061409_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x79f4d06eb7a0> I0421 06:29:55.988593 134206828996416 train_distill.py:691] Starting Distillation Training... I0421 06:29:55.988716 134206828996416 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0421 06:29:56.713052 134206828996416 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0421 06:29:56.714944 134064233240320 grain_pool.py:367] Grain pool will use 1 processes. I0421 06:29:56.742247 134064233240320 grain_pool.py:440] Grain pool will start child processes. I0421 06:29:56.747099 134064233240320 grain_pool.py:448] Grain pool started all child processes. 
2026-04-21 06:30:02.750525: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} I0421 06:30:06.360327 134206828996416 utils.py:86] Train loop finished in: 9.6466 seconds Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module> app.run(main) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run _run_main(main, args) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main sys.exit(main(argv)) ^^^^^^^^^^ File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir) File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill trainer.train(train_iter, eval_iter) File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train train_example = sharding_utils.shard_input( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input return jax.tree.map( ^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr> return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda> lambda x: jax.make_array_from_process_local_data( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data out = [_array_from_process_local_data(data, s, shape) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data return make_array_from_callback(global_shape, sharding, cb) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback per_device_values = api.device_put(per_device_values, devices) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put out_flat = dispatch._batched_device_put_impl( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl return _device_put_sharding_impl(x, aval, device, copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl raise ValueError( 
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0)}
I0421 06:30:06.823846 134064233240320 grain_pool.py:542] Grain pool is exiting.
I0421 06:30:06.823947 134064233240320 grain_pool.py:547] Shutting down multiprocessing system.
I0421 06:30:08.278252 134064233240320 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Tue Apr 21 06:30:18 UTC 2026
EXIT_CODE=1