test/pipeline-scan-nnx
XPK Start: Sat Apr 25 20:20:19 UTC 2026
2026-04-25 20:20:36.482522: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0425 20:20:40.462960 137093242799936 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-25 20:20:49,502:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0425 20:20:49.502413 137093242799936 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-25 20:20:49,504:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-afxad-slice-job-0-0.mt-07-distill-smoke-afxad:8482
I0425 20:20:49.504888 137093242799936 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-afxad-slice-job-0-0.mt-07-distill-smoke-afxad:8482
I0425 20:20:50.871732 137093242799936 max_utils.py:284] Jax distributed system initialized!
I0425 20:20:57.363141 137093242799936 max_utils.py:244] Jax distributed system is already initialized.
W0425 20:20:57.493292 137093242799936 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 20:20:57.553258 137093242799936 max_utils.py:244] Jax distributed system is already initialized.
I0425 20:20:57.554524 137093242799936 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 20:20:57.554575 137093242799936 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 20:20:57.554600 137093242799936 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 20:20:57.554629 137093242799936 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 20:20:57.554659 137093242799936 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 20:20:57.554688 137093242799936 pyconfig.py:471] Config param activations_in_float32: False
I0425 20:20:57.554716 137093242799936 pyconfig.py:471] Config param adam_b1: 0.9
I0425 20:20:57.554748 137093242799936 pyconfig.py:471] Config param adam_b2: 0.95
I0425 20:20:57.554772 137093242799936 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 20:20:57.554798 137093242799936 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 20:20:57.554823 137093242799936 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 20:20:57.554849 137093242799936 pyconfig.py:471] Config param adamw_mask: []
I0425 20:20:57.554890 137093242799936 pyconfig.py:471] Config param add_bos: True
I0425 20:20:57.554916 137093242799936 pyconfig.py:471] Config param add_eos: True
I0425 20:20:57.554939 137093242799936 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 20:20:57.554965 137093242799936 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 20:20:57.554991 137093242799936 pyconfig.py:471] Config param async_checkpointing: True
I0425 20:20:57.555016 137093242799936 pyconfig.py:471] Config param async_scheduling: False
I0425 20:20:57.555042 137093242799936 pyconfig.py:471] Config param attention: dot_product
I0425 20:20:57.555067 137093242799936 pyconfig.py:471] Config param attention_bias: False
I0425 20:20:57.555092 137093242799936 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 20:20:57.555124 137093242799936 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 20:20:57.555153 137093242799936 pyconfig.py:471] Config param attention_output_dim: -1
I0425 20:20:57.555171 137093242799936 pyconfig.py:471] Config param attention_sink: False
I0425 20:20:57.555197 137093242799936 pyconfig.py:471] Config param attention_type: global
I0425 20:20:57.555221 137093242799936 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 20:20:57.555244 137093242799936 pyconfig.py:471] Config param audio_path:
I0425 20:20:57.555265 137093242799936 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 20:20:57.555280 137093242799936 pyconfig.py:471] Config param autoregressive_decode_assert:
I0425 20:20:57.555296 137093242799936 pyconfig.py:471] Config param base_config: base.yml
I0425 20:20:57.555312 137093242799936 pyconfig.py:471] Config param base_emb_dim: 16
I0425 20:20:57.555331 137093242799936 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 20:20:57.555347 137093242799936 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 20:20:57.555362 137093242799936 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 20:20:57.555379 137093242799936 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 20:20:57.555393 137093242799936 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 20:20:57.555409 137093242799936 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 20:20:57.555424 137093242799936 pyconfig.py:471] Config param batch_size: 1
I0425 20:20:57.555440 137093242799936 pyconfig.py:471] Config param batch_split_factor: 1
I0425 20:20:57.555455 137093242799936 pyconfig.py:471] Config param beta_fast: 32
I0425 20:20:57.555471 137093242799936 pyconfig.py:471] Config param beta_slow: 1
I0425 20:20:57.555485 137093242799936 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 20:20:57.555503 137093242799936 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 20:20:57.555518 137093242799936 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 20:20:57.555534 137093242799936 pyconfig.py:471] Config param chat_template:
I0425 20:20:57.555551 137093242799936 pyconfig.py:471] Config param chat_template_path:
I0425 20:20:57.555567 137093242799936 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 20:20:57.555583 137093242799936 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/checkpoints/
I0425 20:20:57.555600 137093242799936 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 20:20:57.555615 137093242799936 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 20:20:57.555630 137093242799936 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 20:20:57.555646 137093242799936 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 20:20:57.555662 137093242799936 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 20:20:57.555676 137093242799936 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 20:20:57.555692 137093242799936 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 20:20:57.555706 137093242799936 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 20:20:57.555721 137093242799936 pyconfig.py:471] Config param chips_per_vm: 4
I0425 20:20:57.555736 137093242799936 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 20:20:57.555751 137093242799936 pyconfig.py:471] Config param collect_stack_trace: False
I0425 20:20:57.555768 137093242799936 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 20:20:57.555784 137093242799936 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 20:20:57.555799 137093242799936 pyconfig.py:471] Config param compile_topology:
I0425 20:20:57.555815 137093242799936 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 20:20:57.555829 137093242799936 pyconfig.py:471] Config param compile_xla_flags:
I0425 20:20:57.555845 137093242799936 pyconfig.py:471] Config param compiled_trainstep_file:
I0425 20:20:57.555859 137093242799936 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 20:20:57.555901 137093242799936 pyconfig.py:471] Config param constant_bound_config: []
I0425 20:20:57.555918 137093242799936 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 20:20:57.555936 137093242799936 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 20:20:57.555951 137093242799936 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 20:20:57.555968 137093242799936 pyconfig.py:471] Config param context_parallel_size: 1
I0425 20:20:57.555984 137093242799936 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 20:20:57.555999 137093242799936 pyconfig.py:471] Config param context_sharding: context
I0425 20:20:57.556014 137093242799936 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 20:20:57.556028 137093242799936 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 20:20:57.556044 137093242799936 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 20:20:57.556058 137093242799936 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 20:20:57.556073 137093242799936 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 20:20:57.556088 137093242799936 pyconfig.py:471] Config param custom_mesh:
I0425 20:20:57.556104 137093242799936 pyconfig.py:471] Config param custom_mesh_and_rule:
I0425 20:20:57.556118 137093242799936 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 20:20:57.556132 137093242799936 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 20:20:57.556151 137093242799936 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 20:20:57.556166 137093242799936 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 20:20:57.556181 137093242799936 pyconfig.py:471] Config param dataset_path:
I0425 20:20:57.556195 137093242799936 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 20:20:57.556213 137093242799936 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 20:20:57.556228 137093242799936 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 20:20:57.556243 137093242799936 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 20:20:57.556258 137093242799936 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 20:20:57.556274 137093242799936 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 20:20:57.556288 137093242799936 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 20:20:57.556304 137093242799936 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 20:20:57.556318 137093242799936 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 20:20:57.556339 137093242799936 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 20:20:57.556355 137093242799936 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 20:20:57.556371 137093242799936 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 20:20:57.556385 137093242799936 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 20:20:57.556400 137093242799936 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 20:20:57.556416 137093242799936 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 20:20:57.556430 137093242799936 pyconfig.py:471] Config param debug: {'rl': False}
I0425 20:20:57.556446 137093242799936 pyconfig.py:471] Config param debug_sharding: False
I0425 20:20:57.556461 137093242799936 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 20:20:57.556476 137093242799936 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 20:20:57.556493 137093242799936 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 20:20:57.556509 137093242799936 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 20:20:57.556523 137093242799936 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 20:20:57.556540 137093242799936 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 20:20:57.556555 137093242799936 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 20:20:57.556570 137093242799936 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 20:20:57.556586 137093242799936 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 20:20:57.556600 137093242799936 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 20:20:57.556616 137093242799936 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 20:20:57.556632 137093242799936 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 20:20:57.556647 137093242799936 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 20:20:57.556663 137093242799936 pyconfig.py:471] Config param distill_alpha_end: None
I0425 20:20:57.556678 137093242799936 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 20:20:57.556692 137093242799936 pyconfig.py:471] Config param distill_beta: 0.0
I0425 20:20:57.556708 137093242799936 pyconfig.py:471] Config param distill_beta_end: None
I0425 20:20:57.556724 137093242799936 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 20:20:57.556741 137093242799936 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 20:20:57.556756 137093242799936 pyconfig.py:471] Config param distill_layer_indices: None
I0425 20:20:57.556770 137093242799936 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 20:20:57.556786 137093242799936 pyconfig.py:471] Config param distill_temperature_end: None
I0425 20:20:57.556800 137093242799936 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 20:20:57.556815 137093242799936 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 20:20:57.556830 137093242799936 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 20:20:57.556845 137093242799936 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 20:20:57.556861 137093242799936 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 20:20:57.556887 137093242799936 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 20:20:57.556901 137093242799936 pyconfig.py:471] Config param dtype: bfloat16
I0425 20:20:57.556931 137093242799936 pyconfig.py:471] Config param dtype_mm: float32
I0425 20:20:57.556946 137093242799936 pyconfig.py:471] Config param dump_hlo: False
I0425 20:20:57.556961 137093242799936 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 20:20:57.556977 137093242799936 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/xla_dump
I0425 20:20:57.556992 137093242799936 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 20:20:57.557007 137093242799936 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 20:20:57.557021 137093242799936 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 20:20:57.557037 137093242799936 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 20:20:57.557059 137093242799936 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0425 20:20:57.557085 137093242799936 pyconfig.py:471] Config param dump_jaxpr: False
I0425 20:20:57.557110 137093242799936 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 20:20:57.557135 137093242799936 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/jaxpr_dump
I0425 20:20:57.557161 137093242799936 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 20:20:57.557186 137093242799936 pyconfig.py:471] Config param dump_step: -1
I0425 20:20:57.557211 137093242799936 pyconfig.py:471] Config param elastic_enabled: False
I0425 20:20:57.557230 137093242799936 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 20:20:57.557246 137093242799936 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 20:20:57.557261 137093242799936 pyconfig.py:471] Config param emb_dim: 16
I0425 20:20:57.557275 137093242799936 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 20:20:57.557291 137093242799936 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 20:20:57.557305 137093242799936 pyconfig.py:471] Config param enable_checkpointing: True
I0425 20:20:57.557323 137093242799936 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 20:20:57.557345 137093242799936 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 20:20:57.557369 137093242799936 pyconfig.py:471] Config param enable_diloco: False
I0425 20:20:57.557393 137093242799936 pyconfig.py:471] Config param enable_dp_attention: False
I0425 20:20:57.557410 137093242799936 pyconfig.py:471] Config param enable_dropout: False
I0425 20:20:57.557424 137093242799936 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 20:20:57.557447 137093242799936 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 20:20:57.557472 137093242799936 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 20:20:57.557496 137093242799936 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 20:20:57.557520 137093242799936 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 20:20:57.557544 137093242799936 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 20:20:57.557569 137093242799936 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 20:20:57.557592 137093242799936 pyconfig.py:471] Config param enable_model_warmup: False
I0425 20:20:57.557615 137093242799936 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 20:20:57.557640 137093242799936 pyconfig.py:471] Config param enable_nnx: False
I0425 20:20:57.557664 137093242799936 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 20:20:57.557688 137093242799936 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 20:20:57.557712 137093242799936 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 20:20:57.557734 137093242799936 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 20:20:57.557758 137093242799936 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 20:20:57.557783 137093242799936 pyconfig.py:471] Config param enable_single_controller: False
I0425 20:20:57.557808 137093242799936 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 20:20:57.557833 137093242799936 pyconfig.py:471] Config param enable_tensorboard: True
I0425 20:20:57.557857 137093242799936 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 20:20:57.557895 137093242799936 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 20:20:57.557920 137093242799936 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 20:20:57.557943 137093242799936 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 20:20:57.557967 137093242799936 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 20:20:57.557994 137093242799936 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 20:20:57.558020 137093242799936 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 20:20:57.558043 137093242799936 pyconfig.py:471] Config param engram_layers: []
I0425 20:20:57.558064 137093242799936 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 20:20:57.558079 137093242799936 pyconfig.py:471] Config param engram_num_heads: 8
I0425 20:20:57.558102 137093242799936 pyconfig.py:471] Config param engram_seed: 0
I0425 20:20:57.558125 137093242799936 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 20:20:57.558149 137093242799936 pyconfig.py:471] Config param epsilon_high: None
I0425 20:20:57.558172 137093242799936 pyconfig.py:471] Config param eval_corr_lst: False
I0425 20:20:57.558196 137093242799936 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 20:20:57.558221 137093242799936 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 20:20:57.558245 137093242799936 pyconfig.py:471] Config param eval_image_column: image
I0425 20:20:57.558269 137093242799936 pyconfig.py:471] Config param eval_interval: -1
I0425 20:20:57.558294 137093242799936 pyconfig.py:471] Config param eval_make_lst: False
I0425 20:20:57.558315 137093242799936 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 20:20:57.558338 137093242799936 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 20:20:57.558354 137093242799936 pyconfig.py:471] Config param eval_split: validation
I0425 20:20:57.558368 137093242799936 pyconfig.py:471] Config param eval_steps: -1
I0425 20:20:57.558383 137093242799936 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 20:20:57.558399 137093242799936 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 20:20:57.558415 137093242799936 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 20:20:57.558429 137093242799936 pyconfig.py:471] Config param float32_gate_logits: False
I0425 20:20:57.558448 137093242799936 pyconfig.py:471] Config param float32_logits: False
I0425 20:20:57.558472 137093242799936 pyconfig.py:471] Config param float32_qk_product: False
I0425 20:20:57.558496 137093242799936 pyconfig.py:471] Config param float32_weight_sum: True
I0425 20:20:57.558521 137093242799936 pyconfig.py:471] Config param force_q_layout: False
I0425 20:20:57.558545 137093242799936 pyconfig.py:471] Config param force_unroll: False
I0425 20:20:57.558569 137093242799936 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 20:20:57.558593 137093242799936 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 20:20:57.558617 137093242799936 pyconfig.py:471] Config param fused_mlp: False
I0425 20:20:57.558641 137093242799936 pyconfig.py:471] Config param fused_qkv: True
I0425 20:20:57.558665 137093242799936 pyconfig.py:471] Config param gcs_metrics: False
I0425 20:20:57.558689 137093242799936 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 20:20:57.558713 137093242799936 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 20:20:57.558737 137093242799936 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 20:20:57.558761 137093242799936 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 20:20:57.558784 137093242799936 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 20:20:57.558809 137093242799936 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 20:20:57.558833 137093242799936 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 20:20:57.558858 137093242799936 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 20:20:57.558895 137093242799936 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 20:20:57.558920 137093242799936 pyconfig.py:471] Config param generation_configs: {}
I0425 20:20:57.558946 137093242799936 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 20:20:57.558970 137093242799936 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 20:20:57.558994 137093242799936 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 20:20:57.559016 137093242799936 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 20:20:57.559037 137093242799936 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 20:20:57.559055 137093242799936 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 20:20:57.559071 137093242799936 pyconfig.py:471] Config param global_head_dim: 0
I0425 20:20:57.559087 137093242799936 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 20:20:57.559101 137093242799936 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 20:20:57.559116 137093242799936 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 20:20:57.559131 137093242799936 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 20:20:57.559149 137093242799936 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 20:20:57.559175 137093242799936 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 20:20:57.559197 137093242799936 pyconfig.py:471] Config param grad_dtype: float32
I0425 20:20:57.559245 137093242799936 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 20:20:57.559269 137093242799936 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 20:20:57.559295 137093242799936 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 20:20:57.559324 137093242799936 pyconfig.py:471] Config param grain_eval_files:
I0425 20:20:57.559349 137093242799936 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 20:20:57.559374 137093242799936 pyconfig.py:471] Config param grain_num_threads: 16
I0425 20:20:57.559398 137093242799936 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 20:20:57.559422 137093242799936 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 20:20:57.559446 137093242799936 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 20:20:57.559470 137093242799936 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 20:20:57.559494 137093242799936 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 20:20:57.559519 137093242799936 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 20:20:57.559543 137093242799936 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 20:20:57.559564 137093242799936 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 20:20:57.559584 137093242799936 pyconfig.py:471] Config param grain_train_files:
I0425 20:20:57.559603 137093242799936 pyconfig.py:471] Config param grain_train_mixture_config_path:
I0425 20:20:57.559617 137093242799936 pyconfig.py:471] Config param grain_worker_count: 1
I0425 20:20:57.559634 137093242799936 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 20:20:57.559658 137093242799936 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 20:20:57.559683 137093242799936 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 20:20:57.559707 137093242799936 pyconfig.py:471] Config param hardware: tpu
I0425 20:20:57.559729 137093242799936 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 20:20:57.559754 137093242799936 pyconfig.py:471] Config param head_dim: 8
I0425 20:20:57.559777 137093242799936 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 20:20:57.559801 137093242799936 pyconfig.py:471] Config param hf_data_dir: None
I0425 20:20:57.559825 137093242799936 pyconfig.py:471] Config param hf_eval_files: None
I0425 20:20:57.559848 137093242799936 pyconfig.py:471] Config param hf_eval_split: None
I0425 20:20:57.559881 137093242799936 pyconfig.py:471] Config param hf_name: None
I0425 20:20:57.559905 137093242799936 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 20:20:57.559928 137093242799936 pyconfig.py:471] Config param hf_train_files: None
I0425 20:20:57.559953 137093242799936 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 20:20:57.559976 137093242799936 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 20:20:57.559999 137093242799936 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 20:20:57.560023 137093242799936 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 20:20:57.560047 137093242799936 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 20:20:57.560071 137093242799936 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 20:20:57.560096 137093242799936 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 20:20:57.560120 137093242799936 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 20:20:57.560144 137093242799936 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 20:20:57.560168 137093242799936 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 20:20:57.560192 137093242799936 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 20:20:57.560217 137093242799936 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 20:20:57.560241 137093242799936 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 20:20:57.560265 137093242799936 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 20:20:57.560288 137093242799936 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 20:20:57.560311 137093242799936 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 20:20:57.560339 137093242799936 pyconfig.py:471] Config param image_path:
I0425 20:20:57.560363 137093242799936 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 20:20:57.560388 137093242799936 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 20:20:57.560414 137093242799936 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 20:20:57.560439 137093242799936 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 20:20:57.560463 137093242799936 pyconfig.py:471] Config param indexer_n_heads: 64
I0425 20:20:57.560487 137093242799936 pyconfig.py:471] Config param indexer_sparse_training: False
I0425 20:20:57.560511 137093242799936 pyconfig.py:471] Config param indexer_topk: 2048
I0425 20:20:57.560533 137093242799936 pyconfig.py:471] Config param inference_benchmark_test: False
I0425 20:20:57.560554 137093242799936 pyconfig.py:471] Config param inference_metadata_file:
I0425 20:20:57.560574 137093242799936 pyconfig.py:471] Config param inference_microbenchmark_log_file_path:
I0425 20:20:57.560596 137093242799936 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0425 20:20:57.560618 137093242799936 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0425 20:20:57.560641 137093242799936 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0425 20:20:57.560662 137093242799936 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0425 20:20:57.560683 137093242799936 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0425 20:20:57.560706 137093242799936 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0425 20:20:57.560728 137093242799936 pyconfig.py:471] Config param init_weights_seed: 0
I0425 20:20:57.560749 137093242799936 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0425 20:20:57.560773 137093242799936 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0425 20:20:57.560796 137093242799936 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0425 20:20:57.560818 137093242799936 pyconfig.py:471] Config param internal_compile: False
I0425 20:20:57.560840 137093242799936 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0425 20:20:57.560863 137093242799936 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0425 20:20:57.560900 137093242799936 pyconfig.py:471] Config param jax_debug_log_modules:
I0425 20:20:57.560924 137093242799936 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0425 20:20:57.560946 137093242799936 pyconfig.py:471] Config param jax_profiler_port: 9999
I0425 20:20:57.560969 137093242799936 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0425 20:20:57.560994 137093242799936 pyconfig.py:471] Config param kv_cache_buffer: 256
I0425 20:20:57.561016 137093242799936 pyconfig.py:471] Config param kv_lora_rank: 512
I0425 20:20:57.561038 137093242799936 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0425 20:20:57.561064 137093242799936 pyconfig.py:471] Config param kv_quant_dtype: int8
I0425 20:20:57.561087 137093242799936 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0425 20:20:57.561110 137093242799936 pyconfig.py:471] Config param learning_rate: 0.0002
I0425 20:20:57.561132 137093242799936 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0425 20:20:57.561156 137093242799936 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0425 20:20:57.561179 137093242799936 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0425 20:20:57.561202 137093242799936 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0425 20:20:57.561225 137093242799936 pyconfig.py:471] Config param load_from_prefill_dir: False
I0425 20:20:57.561248 137093242799936 pyconfig.py:471] Config param load_full_state_path:
I0425 20:20:57.561270 137093242799936 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 20:20:57.561293 137093242799936 pyconfig.py:471] Config param local_checkpoint_directory:
I0425 20:20:57.561316 137093242799936 pyconfig.py:471] Config param local_checkpoint_period: 0
I0425 20:20:57.561343 137093242799936 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0425 20:20:57.561366 137093242799936 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0425 20:20:57.561389 137093242799936 pyconfig.py:471] Config param log_config: True
I0425 20:20:57.561411 137093242799936 pyconfig.py:471] Config param log_period: 10
I0425 20:20:57.561434 137093242799936 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora',
('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), 
('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0425 20:20:57.561546 137093242799936 pyconfig.py:471] Config param logits_dot_in_fp32: False I0425 20:20:57.561579 137093242799936 pyconfig.py:471] Config param logits_via_embedding: True I0425 20:20:57.561603 137093242799936 pyconfig.py:471] Config param lora_input_adapters_path: I0425 20:20:57.561625 137093242799936 pyconfig.py:471] Config param loss_algo: grpo I0425 20:20:57.561648 137093242799936 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0425 20:20:57.561672 137093242799936 pyconfig.py:471] Config param managed_mldiagnostics: False I0425 20:20:57.561696 137093242799936 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/managed-mldiagnostics I0425 20:20:57.561718 137093242799936 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0425 20:20:57.561741 137093242799936 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0425 20:20:57.561766 137093242799936 pyconfig.py:471] Config param max_checkify: False I0425 20:20:57.561792 137093242799936 pyconfig.py:471] Config param max_concurrency: 256 I0425 20:20:57.561818 
137093242799936 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0425 20:20:57.561846 137093242799936 pyconfig.py:471] Config param max_num_batched_tokens: None I0425 20:20:57.561882 137093242799936 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0425 20:20:57.561909 137093242799936 pyconfig.py:471] Config param max_num_images_per_example: -1 I0425 20:20:57.561933 137093242799936 pyconfig.py:471] Config param max_num_seqs: None I0425 20:20:57.561954 137093242799936 pyconfig.py:471] Config param max_position_embeddings: 163840 I0425 20:20:57.561975 137093242799936 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0425 20:20:57.561991 137093242799936 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0425 20:20:57.562005 137093242799936 pyconfig.py:471] Config param max_segments_per_seq: -1 I0425 20:20:57.562021 137093242799936 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0425 20:20:57.562037 137093242799936 pyconfig.py:471] Config param max_target_length: 2048 I0425 20:20:57.562052 137093242799936 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0425 20:20:57.562069 137093242799936 pyconfig.py:471] Config param megablox: True I0425 20:20:57.562086 137093242799936 pyconfig.py:471] Config param merge_gating_gmm: False I0425 20:20:57.562099 137093242799936 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0425 20:20:57.562118 137093242799936 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/metrics/ I0425 20:20:57.562132 137093242799936 pyconfig.py:471] Config param metrics_file: I0425 20:20:57.562150 137093242799936 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0425 20:20:57.562167 137093242799936 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0425 
20:20:57.562189 137093242799936 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0425 20:20:57.562209 137093242799936 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0425 20:20:57.562232 137093242799936 pyconfig.py:471] Config param mla_naive_kvcache: True I0425 20:20:57.562258 137093242799936 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0425 20:20:57.562281 137093242799936 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0425 20:20:57.562298 137093242799936 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0425 20:20:57.562315 137093242799936 pyconfig.py:471] Config param mlp_bias: False I0425 20:20:57.562345 137093242799936 pyconfig.py:471] Config param mlp_dim: 64 I0425 20:20:57.562367 137093242799936 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0425 20:20:57.562392 137093242799936 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0425 20:20:57.562414 137093242799936 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0425 20:20:57.562438 137093242799936 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0425 20:20:57.562462 137093242799936 pyconfig.py:471] Config param moba: False I0425 20:20:57.562486 137093242799936 pyconfig.py:471] Config param moba_chunk_size: 1024 I0425 20:20:57.562509 137093242799936 pyconfig.py:471] Config param moba_topk: 8 I0425 20:20:57.562533 137093242799936 pyconfig.py:471] Config param model_call_mode: I0425 20:20:57.562558 137093242799936 pyconfig.py:471] Config param model_name: gpt3-52k I0425 20:20:57.562581 137093242799936 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0425 20:20:57.562601 137093242799936 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0425 20:20:57.562619 137093242799936 pyconfig.py:471] Config param moe_mlp_dim: -1 I0425 20:20:57.562633 137093242799936 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0425 20:20:57.562649 137093242799936 pyconfig.py:471] Config param 
moe_mlpwi_1: RematLocation.REMAT I0425 20:20:57.562664 137093242799936 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0425 20:20:57.562679 137093242799936 pyconfig.py:471] Config param monitor_goodput: False I0425 20:20:57.562693 137093242799936 pyconfig.py:471] Config param monitor_step_time_deviation: True I0425 20:20:57.562710 137093242799936 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0425 20:20:57.562734 137093242799936 pyconfig.py:471] Config param mscale: 1.0 I0425 20:20:57.562758 137093242799936 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0425 20:20:57.562783 137093242799936 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0425 20:20:57.562807 137093242799936 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0425 20:20:57.562831 137093242799936 pyconfig.py:471] Config param mtp_num_layers: 0 I0425 20:20:57.562855 137093242799936 pyconfig.py:471] Config param mu_dtype: float32 I0425 20:20:57.562906 137093242799936 pyconfig.py:471] Config param multi_sampling: False I0425 20:20:57.562931 137093242799936 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0425 20:20:57.562955 137093242799936 pyconfig.py:471] Config param muon_beta: 0.95 I0425 20:20:57.562977 137093242799936 pyconfig.py:471] Config param muon_consistent_rms: None I0425 20:20:57.562999 137093242799936 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0425 20:20:57.563023 137093242799936 pyconfig.py:471] Config param n_routing_groups: -1 I0425 20:20:57.563048 137093242799936 pyconfig.py:471] Config param n_window_for_audio: 50 I0425 20:20:57.563072 137093242799936 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0425 20:20:57.563096 137093242799936 pyconfig.py:471] Config param nope_layer_interval: -1 I0425 20:20:57.563122 137093242799936 pyconfig.py:471] Config param norm_topk_prob: False I0425 20:20:57.563145 137093242799936 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 
I0425 20:20:57.563169 137093242799936 pyconfig.py:471] Config param normalize_embedding_logits: False I0425 20:20:57.563192 137093242799936 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0425 20:20:57.563215 137093242799936 pyconfig.py:471] Config param num_batches: 4 I0425 20:20:57.563246 137093242799936 pyconfig.py:471] Config param num_channels_for_vit: 3 I0425 20:20:57.563266 137093242799936 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0425 20:20:57.563287 137093242799936 pyconfig.py:471] Config param num_decoder_layers: 1 I0425 20:20:57.563303 137093242799936 pyconfig.py:471] Config param num_diloco_replicas: 1 I0425 20:20:57.563317 137093242799936 pyconfig.py:471] Config param num_epoch: 1 I0425 20:20:57.563335 137093242799936 pyconfig.py:471] Config param num_eval_passes: 1 I0425 20:20:57.563360 137093242799936 pyconfig.py:471] Config param num_experts: 1 I0425 20:20:57.563384 137093242799936 pyconfig.py:471] Config param num_experts_per_tok: 1 I0425 20:20:57.563408 137093242799936 pyconfig.py:471] Config param num_generations: 2 I0425 20:20:57.563431 137093242799936 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0425 20:20:57.563457 137093242799936 pyconfig.py:471] Config param num_iterations: 1 I0425 20:20:57.563481 137093242799936 pyconfig.py:471] Config param num_kv_heads: 2 I0425 20:20:57.563505 137093242799936 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0425 20:20:57.563529 137093242799936 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0425 20:20:57.563549 137093242799936 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0425 20:20:57.563574 137093242799936 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0425 20:20:57.563597 137093242799936 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0425 20:20:57.563619 137093242799936 pyconfig.py:471] Config param num_query_heads: 2 I0425 20:20:57.563643 137093242799936 pyconfig.py:471] Config 
param num_samplers_slices: -1 I0425 20:20:57.563667 137093242799936 pyconfig.py:471] Config param num_slices: 1 I0425 20:20:57.563691 137093242799936 pyconfig.py:471] Config param num_target_devices: 32 I0425 20:20:57.563716 137093242799936 pyconfig.py:471] Config param num_test_batches: 5 I0425 20:20:57.563743 137093242799936 pyconfig.py:471] Config param num_trainer_slices: -1 I0425 20:20:57.563767 137093242799936 pyconfig.py:471] Config param num_vocab_tiling: 1 I0425 20:20:57.563789 137093242799936 pyconfig.py:471] Config param off_policy_steps: 0 I0425 20:20:57.563810 137093242799936 pyconfig.py:471] Config param offline_data_dir: None I0425 20:20:57.563826 137093242799936 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0425 20:20:57.563844 137093242799936 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0425 20:20:57.563860 137093242799936 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0425 20:20:57.563884 137093242799936 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0425 20:20:57.563899 137093242799936 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0425 20:20:57.563915 137093242799936 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0425 20:20:57.563930 137093242799936 pyconfig.py:471] Config param output_dim_for_audio: 512 I0425 20:20:57.563946 137093242799936 pyconfig.py:471] Config param override_logical_axis_rules: False I0425 20:20:57.563961 137093242799936 pyconfig.py:471] Config param override_model_config: True I0425 20:20:57.563975 137093242799936 pyconfig.py:471] Config param packing: True I0425 20:20:57.563990 137093242799936 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0425 20:20:57.564005 137093242799936 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0425 20:20:57.564020 137093242799936 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0425 20:20:57.564035 137093242799936 pyconfig.py:471] Config param 
pagedattn_pages_per_compute_block: 4 I0425 20:20:57.564049 137093242799936 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0425 20:20:57.564064 137093242799936 pyconfig.py:471] Config param param_scan_axis: 1 I0425 20:20:57.564079 137093242799936 pyconfig.py:471] Config param parameter_memory_host_offload: False I0425 20:20:57.564093 137093242799936 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0425 20:20:57.564108 137093242799936 pyconfig.py:471] Config param patch_size_for_vit: 14 I0425 20:20:57.564124 137093242799936 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0425 20:20:57.564138 137093242799936 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0425 20:20:57.564154 137093242799936 pyconfig.py:471] Config param per_device_batch_size: 2 I0425 20:20:57.564169 137093242799936 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0425 20:20:57.564183 137093242799936 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0425 20:20:57.564199 137093242799936 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0425 20:20:57.564213 137093242799936 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0425 20:20:57.564229 137093242799936 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0425 20:20:57.564243 137093242799936 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0425 20:20:57.564258 137093242799936 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0425 20:20:57.564273 137093242799936 pyconfig.py:471] Config param posemb_type_for_vit: learn I0425 20:20:57.564287 137093242799936 pyconfig.py:471] Config param position_id_per_seconds: 25 I0425 20:20:57.564303 137093242799936 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0425 20:20:57.564318 137093242799936 pyconfig.py:471] Config param prefill_cache_dir: I0425 20:20:57.564336 137093242799936 pyconfig.py:471] Config param prefill_chunk_size: 256 I0425 
20:20:57.564352 137093242799936 pyconfig.py:471] Config param prefill_slice: v5e-16 I0425 20:20:57.564365 137093242799936 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0425 20:20:57.564381 137093242799936 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0425 20:20:57.564395 137093242799936 pyconfig.py:471] Config param profile_cleanly: True I0425 20:20:57.564410 137093242799936 pyconfig.py:471] Config param profile_periodically_period: -1 I0425 20:20:57.564426 137093242799936 pyconfig.py:471] Config param profile_power_events: False I0425 20:20:57.564441 137093242799936 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0425 20:20:57.564457 137093242799936 pyconfig.py:471] Config param profiler_steps: 5 I0425 20:20:57.564473 137093242799936 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0425 20:20:57.564488 137093242799936 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0425 20:20:57.564502 137093242799936 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0425 20:20:57.564517 137093242799936 pyconfig.py:471] Config param prometheus_port: 0 I0425 20:20:57.564531 137093242799936 pyconfig.py:471] Config param prompt: I love to I0425 20:20:57.564547 137093242799936 pyconfig.py:471] Config param pure_nnx: False I0425 20:20:57.564561 137093242799936 pyconfig.py:471] Config param pure_nnx_decoder: False I0425 20:20:57.564578 137093242799936 pyconfig.py:471] Config param q_lora_rank: 0 I0425 20:20:57.564594 137093242799936 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0425 20:20:57.564609 137093242799936 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0425 20:20:57.564623 137093242799936 pyconfig.py:471] Config param qk_norm_with_scale: True I0425 20:20:57.564638 137093242799936 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0425 20:20:57.564653 137093242799936 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0425 20:20:57.564669 
137093242799936 pyconfig.py:471] Config param quant_cfg_path: I0425 20:20:57.564683 137093242799936 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0425 20:20:57.564701 137093242799936 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0425 20:20:57.564716 137093242799936 pyconfig.py:471] Config param quantize_kvcache: False I0425 20:20:57.564731 137093242799936 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0425 20:20:57.564747 137093242799936 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0425 20:20:57.564761 137093242799936 pyconfig.py:471] Config param ragged_block_size: 256 I0425 20:20:57.564776 137093242799936 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0425 20:20:57.564791 137093242799936 pyconfig.py:471] Config param rampup_end_step: 0 I0425 20:20:57.564807 137093242799936 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0425 20:20:57.564821 137093242799936 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0425 20:20:57.564836 137093242799936 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0425 20:20:57.564851 137093242799936 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0425 20:20:57.564866 137093242799936 pyconfig.py:471] Config param remat_policy: full I0425 20:20:57.564891 137093242799936 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0425 20:20:57.564906 137093242799936 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0425 20:20:57.564921 137093242799936 pyconfig.py:471] Config param replicate_quant_scale: False I0425 20:20:57.564937 137093242799936 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0425 20:20:57.564953 137093242799936 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0425 20:20:57.564968 137093242799936 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0425 
20:20:57.564983 137093242799936 pyconfig.py:471] Config param reshape_q: False I0425 20:20:57.564997 137093242799936 pyconfig.py:471] Config param return_log_prob: False I0425 20:20:57.565012 137093242799936 pyconfig.py:471] Config param reuse_example_batch: 0 I0425 20:20:57.565026 137093242799936 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0425 20:20:57.565042 137093242799936 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0425 20:20:57.565056 137093242799936 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0425 20:20:57.565072 137093242799936 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0425 20:20:57.565087 137093242799936 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0425 20:20:57.565101 137093242799936 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0425 20:20:57.565117 137093242799936 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0425 20:20:57.565138 137093242799936 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0425 20:20:57.565152 137093242799936 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0425 20:20:57.565168 137093242799936 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0425 20:20:57.565181 137093242799936 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0425 20:20:57.565197 137093242799936 pyconfig.py:471] Config param rope_attention_scaling: False I0425 20:20:57.565212 137093242799936 pyconfig.py:471] Config param rope_factor: 40 I0425 20:20:57.565227 137093242799936 pyconfig.py:471] Config param rope_interleave: True I0425 20:20:57.565241 137093242799936 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0425 20:20:57.565256 
137093242799936 pyconfig.py:471] Config param rope_max_timescale: 10000 I0425 20:20:57.565271 137093242799936 pyconfig.py:471] Config param rope_min_timescale: 1 I0425 20:20:57.565287 137093242799936 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0425 20:20:57.565301 137093242799936 pyconfig.py:471] Config param rope_truncate: True I0425 20:20:57.565316 137093242799936 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0425 20:20:57.565335 137093242799936 pyconfig.py:471] Config param rope_use_scale: True I0425 20:20:57.565351 137093242799936 pyconfig.py:471] Config param routed_bias: False I0425 20:20:57.565364 137093242799936 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0425 20:20:57.565380 137093242799936 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0425 20:20:57.565393 137093242799936 pyconfig.py:471] Config param routed_score_func: I0425 20:20:57.565408 137093242799936 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-20-20 I0425 20:20:57.565423 137093242799936 pyconfig.py:471] Config param sa_block_kv: 512 I0425 20:20:57.565438 137093242799936 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0425 20:20:57.565452 137093242799936 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0425 20:20:57.565467 137093242799936 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0425 20:20:57.565481 137093242799936 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0425 20:20:57.565496 137093242799936 pyconfig.py:471] Config param sa_block_q: 512 I0425 20:20:57.565510 137093242799936 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0425 20:20:57.565526 137093242799936 pyconfig.py:471] Config param sa_block_q_dq: 512 I0425 20:20:57.565539 137093242799936 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0425 20:20:57.565555 137093242799936 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0425 20:20:57.565568 137093242799936 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: 
False I0425 20:20:57.565583 137093242799936 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0425 20:20:57.565598 137093242799936 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0425 20:20:57.565613 137093242799936 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0425 20:20:57.565627 137093242799936 pyconfig.py:471] Config param save_config_to_gcs: False I0425 20:20:57.565643 137093242799936 pyconfig.py:471] Config param save_quantized_params_path: I0425 20:20:57.565657 137093242799936 pyconfig.py:471] Config param scale_embedding_for_audio: True I0425 20:20:57.565673 137093242799936 pyconfig.py:471] Config param scan_layers: True I0425 20:20:57.565688 137093242799936 pyconfig.py:471] Config param scan_layers_per_stage: False I0425 20:20:57.565703 137093242799936 pyconfig.py:471] Config param scan_pipeline_iterations: True I0425 20:20:57.565718 137093242799936 pyconfig.py:471] Config param scan_pipeline_repeats: False I0425 20:20:57.565734 137093242799936 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0425 20:20:57.565748 137093242799936 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0425 20:20:57.565763 137093242799936 pyconfig.py:471] Config param sft_train_on_completion_only: False I0425 20:20:57.565777 137093242799936 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0425 20:20:57.565792 137093242799936 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0425 20:20:57.565807 137093242799936 pyconfig.py:471] Config param shard_optimizer_over_data: False I0425 20:20:57.565822 137093242799936 pyconfig.py:471] Config param sharding_strategy: None I0425 20:20:57.565836 137093242799936 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0425 20:20:57.565852 137093242799936 pyconfig.py:471] Config param shardy: True I0425 20:20:57.565866 137093242799936 pyconfig.py:471] Config param share_kv_projections: False I0425 20:20:57.565894 137093242799936 
pyconfig.py:471] Config param shared_experts: 0 I0425 20:20:57.565908 137093242799936 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0425 20:20:57.565924 137093242799936 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0425 20:20:57.565937 137093242799936 pyconfig.py:471] Config param skip_jax_distributed_system: False I0425 20:20:57.565953 137093242799936 pyconfig.py:471] Config param skip_step_interval: 128 I0425 20:20:57.565966 137093242799936 pyconfig.py:471] Config param skip_step_on_spikes: False I0425 20:20:57.565982 137093242799936 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0425 20:20:57.565995 137093242799936 pyconfig.py:471] Config param sliding_window_size: 0 I0425 20:20:57.566011 137093242799936 pyconfig.py:471] Config param solution_end_token: </answer> I0425 20:20:57.566025 137093242799936 pyconfig.py:471] Config param solution_start_token: <answer> I0425 20:20:57.566040 137093242799936 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0425 20:20:57.566054 137093242799936 pyconfig.py:471] Config param sparse_matmul: True I0425 20:20:57.566070 137093242799936 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0425 20:20:57.566083 137093242799936 pyconfig.py:471] Config param stack_prefill_result_cache: False I0425 20:20:57.566099 137093242799936 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0425 20:20:57.566113 137093242799936 pyconfig.py:471] Config param stack_trace_to_cloud: False I0425 20:20:57.566128 137093242799936 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0425 20:20:57.566142 137093242799936 pyconfig.py:471] Config param steps: 200000 I0425 20:20:57.566158 137093242799936 pyconfig.py:471] Config param stop_strings: None I0425 20:20:57.566171 137093242799936 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0425 20:20:57.566188 137093242799936 pyconfig.py:471] Config param student_params_to_update: None I0425 
20:20:57.566201 137093242799936 pyconfig.py:471] Config param subslice_shape: I0425 20:20:57.566217 137093242799936 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0425 20:20:57.566231 137093242799936 pyconfig.py:471] Config param system_prompt: I0425 20:20:57.566246 137093242799936 pyconfig.py:471] Config param target_eval_loss: 0.0 I0425 20:20:57.566260 137093242799936 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0425 20:20:57.566276 137093242799936 pyconfig.py:471] Config param temperature_tuning: False I0425 20:20:57.566290 137093242799936 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0425 20:20:57.566305 137093242799936 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-20/tensorboard/ I0425 20:20:57.566319 137093242799936 pyconfig.py:471] Config param tensors_on_device: None I0425 20:20:57.566337 137093242799936 pyconfig.py:471] Config param tensors_to_offload: None I0425 20:20:57.566351 137093242799936 pyconfig.py:471] Config param test_batch_start_index: 0 I0425 20:20:57.566366 137093242799936 pyconfig.py:471] Config param tile_size_for_vit: 336 I0425 20:20:57.566380 137093242799936 pyconfig.py:471] Config param tokenize_eval_data: True I0425 20:20:57.566395 137093242799936 pyconfig.py:471] Config param tokenize_train_data: True I0425 20:20:57.566409 137093242799936 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0425 20:20:57.566424 137093242799936 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0425 20:20:57.566440 137093242799936 pyconfig.py:471] Config param topk_routing_group: -1 I0425 20:20:57.566455 137093242799936 pyconfig.py:471] Config param train_data_columns: ['text'] I0425 20:20:57.566469 137093242799936 pyconfig.py:471] Config param train_fraction: 1.0 I0425 20:20:57.566483 137093242799936 pyconfig.py:471] Config param train_image_column: image I0425 20:20:57.566498 137093242799936 pyconfig.py:471] 
Config param train_micro_batch_size: -1 I0425 20:20:57.566512 137093242799936 pyconfig.py:471] Config param train_split: train I0425 20:20:57.566526 137093242799936 pyconfig.py:471] Config param trainable_parameters_mask: [] I0425 20:20:57.566541 137093242799936 pyconfig.py:471] Config param trainable_position_size: 2048 I0425 20:20:57.566555 137093242799936 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0425 20:20:57.566571 137093242799936 pyconfig.py:471] Config param upload_all_profiler_results: False I0425 20:20:57.566585 137093242799936 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0425 20:20:57.566600 137093242799936 pyconfig.py:471] Config param use_agentic_rollout: False I0425 20:20:57.566617 137093242799936 pyconfig.py:471] Config param use_audio: False I0425 20:20:57.566641 137093242799936 pyconfig.py:471] Config param use_audio_in_video: False I0425 20:20:57.566667 137093242799936 pyconfig.py:471] Config param use_batch_split_schedule: False I0425 20:20:57.566687 137093242799936 pyconfig.py:471] Config param use_chat_template: False I0425 20:20:57.566701 137093242799936 pyconfig.py:471] Config param use_chunked_prefill: False I0425 20:20:57.566718 137093242799936 pyconfig.py:471] Config param use_custom_sort_vjp: True I0425 20:20:57.566743 137093242799936 pyconfig.py:471] Config param use_dpo: False I0425 20:20:57.566768 137093242799936 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0425 20:20:57.566793 137093242799936 pyconfig.py:471] Config param use_grpo: True I0425 20:20:57.566818 137093242799936 pyconfig.py:471] Config param use_indexer: False I0425 20:20:57.566842 137093242799936 pyconfig.py:471] Config param use_iota_embed: True I0425 20:20:57.566865 137093242799936 pyconfig.py:471] Config param use_jax_splash: False I0425 20:20:57.566903 137093242799936 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0425 20:20:57.566925 137093242799936 pyconfig.py:471] Config param use_mrope: False I0425 
20:20:57.566941 137093242799936 pyconfig.py:471] Config param use_multimodal: False I0425 20:20:57.566955 137093242799936 pyconfig.py:471] Config param use_nnx_pipeline: False I0425 20:20:57.566970 137093242799936 pyconfig.py:471] Config param use_pathways: True I0425 20:20:57.566984 137093242799936 pyconfig.py:471] Config param use_post_attn_norm: False I0425 20:20:57.567000 137093242799936 pyconfig.py:471] Config param use_post_ffw_norm: False I0425 20:20:57.567016 137093242799936 pyconfig.py:471] Config param use_qk_clip: False I0425 20:20:57.567032 137093242799936 pyconfig.py:471] Config param use_qk_norm: False I0425 20:20:57.567045 137093242799936 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0425 20:20:57.567063 137093242799936 pyconfig.py:471] Config param use_qwix_quantization: False I0425 20:20:57.567085 137093242799936 pyconfig.py:471] Config param use_ragged_attention: False I0425 20:20:57.567110 137093242799936 pyconfig.py:471] Config param use_random_routing: False I0425 20:20:57.567132 137093242799936 pyconfig.py:471] Config param use_replicator_service: False I0425 20:20:57.567152 137093242799936 pyconfig.py:471] Config param use_ring_of_experts: False I0425 20:20:57.567175 137093242799936 pyconfig.py:471] Config param use_sft: False I0425 20:20:57.567199 137093242799936 pyconfig.py:471] Config param use_splash_scheduler: False I0425 20:20:57.567222 137093242799936 pyconfig.py:471] Config param use_tokamax_gmm: False I0425 20:20:57.567247 137093242799936 pyconfig.py:471] Config param use_tokamax_splash: False I0425 20:20:57.567272 137093242799936 pyconfig.py:471] Config param use_truncation: True I0425 20:20:57.567297 137093242799936 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0425 20:20:57.567326 137093242799936 pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0425 20:20:57.567349 137093242799936 pyconfig.py:471] Config param use_vertex_tensorboard: False I0425 20:20:57.567374 
137093242799936 pyconfig.py:471] Config param using_pipeline_parallelism: False I0425 20:20:57.567399 137093242799936 pyconfig.py:471] Config param v_head_dim: 128 I0425 20:20:57.567423 137093242799936 pyconfig.py:471] Config param v_norm_with_scale: True I0425 20:20:57.567447 137093242799936 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0425 20:20:57.567474 137093242799936 pyconfig.py:471] Config param vertex_tensorboard_project: I0425 20:20:57.567499 137093242799936 pyconfig.py:471] Config param vertex_tensorboard_region: I0425 20:20:57.567524 137093242799936 pyconfig.py:471] Config param video_path: I0425 20:20:57.567546 137093242799936 pyconfig.py:471] Config param video_placeholder: <|video|> I0425 20:20:57.567569 137093242799936 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0425 20:20:57.567591 137093242799936 pyconfig.py:471] Config param vision_output_length: -1 I0425 20:20:57.567616 137093242799936 pyconfig.py:471] Config param vllm_additional_config: {} I0425 20:20:57.567637 137093242799936 pyconfig.py:471] Config param vllm_hf_config_path: I0425 20:20:57.567661 137093242799936 pyconfig.py:471] Config param vllm_hf_overrides: {} I0425 20:20:57.567682 137093242799936 pyconfig.py:471] Config param vocab_size: 32000 I0425 20:20:57.567704 137093242799936 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0425 20:20:57.567730 137093242799936 pyconfig.py:471] Config param weight_dtype: float32 I0425 20:20:57.567769 137093242799936 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0425 20:20:57.567791 137093242799936 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0425 20:20:57.567815 137093242799936 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0425 20:20:57.567838 137093242799936 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0425 20:20:57.567863 137093242799936 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0425 20:20:57.567902 137093242799936 
pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0425 20:20:57.567928 137093242799936 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0425 20:20:57.567953 137093242799936 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0425 20:20:57.567976 137093242799936 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0425 20:20:57.568001 137093242799936 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0425 20:20:57.568023 137093242799936 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0425 20:20:57.568045 137093242799936 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0425 20:20:57.568067 137093242799936 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0425 20:20:57.568089 137093242799936 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0425 20:20:57.568111 137093242799936 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0425 20:20:57.568133 137093242799936 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0425 20:20:57.568155 137093242799936 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0425 20:20:57.568177 137093242799936 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0425 20:20:57.568198 137093242799936 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0425 20:20:57.568220 137093242799936 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0425 20:20:57.568243 137093242799936 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0425 20:20:57.568269 137093242799936 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0425 20:20:57.568293 137093242799936 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0425 20:20:57.568315 137093242799936 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False I0425 20:20:57.568343 137093242799936 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0425 20:20:57.568368 137093242799936 pyconfig.py:471] Config 
param z_loss_multiplier: 0.0 I0425 20:20:57.568815 137093242799936 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0425 20:20:57.568861 137093242799936 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0425 20:21:01.215321 137093242799936 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0425 20:21:01.218285 137093242799936 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0425 20:21:01.218394 137093242799936 train_distill.py:596] Applying logical axis rules for model initialization and training... I0425 20:21:01.218466 137093242799936 train_distill.py:600] Loading Student from ... I0425 20:21:01.218494 137093242799936 train_distill.py:169] --- Student Configuration --- I0425 20:21:01.218515 137093242799936 train_distill.py:170] Model Name: gpt3-52k I0425 20:21:01.218537 137093242799936 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 20:21:01.218555 137093242799936 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0425 20:21:01.218573 137093242799936 train_distill.py:175] Vocab Size: 32000 I0425 20:21:01.218598 137093242799936 train_distill.py:176] Checkpoint: I0425 20:21:01.218617 137093242799936 train_distill.py:465] Initializing model: gpt3-52k... I0425 20:21:02.625485 137093242799936 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0425 20:21:02.625594 137093242799936 train_distill.py:169] --- Teacher Configuration --- I0425 20:21:02.625624 137093242799936 train_distill.py:170] Model Name: gpt3-52k I0425 20:21:02.625651 137093242799936 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 20:21:02.625670 137093242799936 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0425 20:21:02.625690 137093242799936 train_distill.py:175] Vocab Size: 32000 I0425 20:21:02.625707 137093242799936 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 20:21:02.625726 137093242799936 train_distill.py:465] Initializing model: gpt3-52k... I0425 20:21:03.691725 137093242799936 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:21:03.692182 137093242799936 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7caecd674b00>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:21:03.692241 137093242799936 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0425 20:21:04.245743 137093242799936 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0425 20:21:04.770398 2144 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0425 20:21:05.958174 137093242799936 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0425 20:21:07.952928 137093242799936 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0425 20:21:07.953288 137093242799936 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0425 20:21:09.079477 137093242799936 checkpointer.py:318] Finished restoring checkpoint in 3.51 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0425 20:21:09.771392 137093242799936 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0425 20:21:09.834831 137093242799936 config.py:112] TensorFlow version 2.20.0 available. I0425 20:21:09.835328 137093242799936 config.py:125] JAX version 0.8.3 available. E0425 20:21:11.964960 137093242799936 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0425 20:21:11.965175 137093242799936 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0425 20:21:11.968232 137093242799936 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0425 20:21:11.968306 137093242799936 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0425 20:21:11.968386 137093242799936 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:21:11.968486 137093242799936 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7caecd674b00>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:21:11.968545 137093242799936 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:21:11.968594 137093242799936 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7caecd674b00>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:21:11.968660 137093242799936 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c9880e04380>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988155c440>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b4a0>}, handler_registry=None I0425 
20:21:11.968942 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c9880e04380>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:21:11.968995 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988155c440>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:21:11.969026 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b4a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:21:11.969063 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b0e0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0425 20:21:11.969102 137093242799936 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c9880e04380>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c9880e04380>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988155c440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988155c440>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b4a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b4a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b0e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c988069b0e0>}). 
I0425 20:21:11.969551 137093242799936 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c97e3fff240> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 20:21:14.284101 137093242799936 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints I0425 20:21:14.300472 137093242799936 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c988069b470> I0425 20:21:14.300601 137093242799936 pytree_checkpoint_handler.py:577] 
save_device_host_concurrent_bytes=None I0425 20:21:14.300668 137093242799936 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7caecd674b00>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:21:14.300702 137093242799936 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:21:14.300732 137093242799936 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7caecd674b00>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:21:14.300768 137093242799936 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0425 20:21:14.300822 137093242799936 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137093242799936 count=1 at 0x7ca7001d3940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c988069b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c988069b230>, _write_futures=[]) I0425 20:21:14.301172 137093242799936 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137093242799936 count=1 at 0x7ca7001d3940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c988069b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c988069b230>, _write_futures=[]) I0425 20:21:14.301199 137093242799936 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137093242799936 count=1 at 0x7ca7001d3940>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7c988069b260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7c988069b230>, _write_futures=[]) I0425 20:21:14.301230 137093242799936 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988069b440>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c97e3e3eff0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3f0e0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c97e3e3dfd0>}, handler_registry=None I0425 20:21:14.301326 137093242799936 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988069b440>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:21:14.301360 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c97e3e3eff0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:21:14.301384 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3f0e0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:21:14.301411 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c97e3e3dfd0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0425 20:21:14.301433 137093242799936 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3dc10>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:21:14.301457 137093242799936 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988069b440>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c988069b440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c97e3e3eff0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7c97e3e3eff0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3f0e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3f0e0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c97e3e3dfd0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7c97e3e3dfd0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3dc10>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7c97e3e3dc10>}). I0425 20:21:14.301530 137093242799936 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7c97e3fff380> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 20:21:14.673759 137093242799936 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints I0425 20:21:15.126462 137093242799936 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_linen_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7c9880e060c0> I0425 20:21:15.127110 137093242799936 train_distill.py:691] Starting Distillation Training... I0425 20:21:15.127236 137093242799936 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0425 20:21:15.475644 137093242799936 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0425 20:21:15.477568 136948630931200 grain_pool.py:367] Grain pool will use 1 processes. I0425 20:21:15.504040 136948630931200 grain_pool.py:440] Grain pool will start child processes. I0425 20:21:15.509222 136948630931200 grain_pool.py:448] Grain pool started all child processes. 
2026-04-25 20:21:21.522615: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} I0425 20:21:24.767581 137093242799936 utils.py:86] Train loop finished in: 9.2913 seconds Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module> app.run(main) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run _run_main(main, args) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main sys.exit(main(argv)) ^^^^^^^^^^ File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir) File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill trainer.train(train_iter, eval_iter) File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train train_example = sharding_utils.shard_input( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input return jax.tree.map( ^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr> return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda> lambda x: jax.make_array_from_process_local_data( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data out = [_array_from_process_local_data(data, s, shape) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data return make_array_from_callback(global_shape, sharding, cb) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback per_device_values = api.device_put(per_device_values, devices) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put out_flat = dispatch._batched_device_put_impl( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl return _device_put_sharding_impl(x, aval, device, copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl raise ValueError( 
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=18, 
process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0)} I0425 20:21:25.110014 136948630931200 grain_pool.py:542] Grain pool is exiting. I0425 20:21:25.110114 136948630931200 grain_pool.py:547] Shutting down multiprocessing system. I0425 20:21:26.555464 136948630931200 grain_pool.py:547] Shutting down multiprocessing system. Training: 0%| | 0/5 [00:13<?, ?step/s] /usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' XPK End: Sat Apr 25 20:21:34 UTC 2026 EXIT_CODE=1
XPK Start: Sat Apr 25 20:30:30 UTC 2026 2026-04-25 20:30:47.767562: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} I0425 20:30:51.792989 137607144601408 max_utils.py:273] Attempting to initialize the jax distributed system... INFO:2026-04-25 20:31:00,832:jax._src.distributed:149: Starting JAX distributed service on [::]:8482 I0425 20:31:00.832506 137607144601408 distributed.py:149] Starting JAX distributed service on [::]:8482 INFO:2026-04-25 20:31:00,834:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-8mwiu-slice-job-0-0.mt-07-distill-smoke-8mwiu:8482 I0425 20:31:00.834697 137607144601408 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-8mwiu-slice-job-0-0.mt-07-distill-smoke-8mwiu:8482 I0425 20:31:01.829145 137607144601408 max_utils.py:284] Jax distributed system initialized! I0425 20:31:08.550554 137607144601408 max_utils.py:244] Jax distributed system is already initialized. W0425 20:31:08.684797 137607144601408 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output I0425 20:31:08.744848 137607144601408 max_utils.py:244] Jax distributed system is already initialized. 
I0425 20:31:08.746036 137607144601408 pyconfig.py:471] Config param abort_on_inf_loss: True I0425 20:31:08.746084 137607144601408 pyconfig.py:471] Config param abort_on_nan_loss: True I0425 20:31:08.746122 137607144601408 pyconfig.py:471] Config param act_quantization_calibration_method: absmax I0425 20:31:08.746144 137607144601408 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0 I0425 20:31:08.746162 137607144601408 pyconfig.py:471] Config param activation_function_for_audio: gelu I0425 20:31:08.746181 137607144601408 pyconfig.py:471] Config param activations_in_float32: False I0425 20:31:08.746200 137607144601408 pyconfig.py:471] Config param adam_b1: 0.9 I0425 20:31:08.746218 137607144601408 pyconfig.py:471] Config param adam_b2: 0.95 I0425 20:31:08.746246 137607144601408 pyconfig.py:471] Config param adam_eps: 1e-08 I0425 20:31:08.746269 137607144601408 pyconfig.py:471] Config param adam_eps_root: 0.0 I0425 20:31:08.746287 137607144601408 pyconfig.py:471] Config param adam_weight_decay: 0.1 I0425 20:31:08.746304 137607144601408 pyconfig.py:471] Config param adamw_mask: [] I0425 20:31:08.746321 137607144601408 pyconfig.py:471] Config param add_bos: True I0425 20:31:08.746338 137607144601408 pyconfig.py:471] Config param add_eos: True I0425 20:31:08.746356 137607144601408 pyconfig.py:471] Config param allow_split_physical_axes: False I0425 20:31:08.746372 137607144601408 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3 I0425 20:31:08.746388 137607144601408 pyconfig.py:471] Config param async_checkpointing: True I0425 20:31:08.746404 137607144601408 pyconfig.py:471] Config param async_scheduling: False I0425 20:31:08.746420 137607144601408 pyconfig.py:471] Config param attention: dot_product I0425 20:31:08.746437 137607144601408 pyconfig.py:471] Config param attention_bias: False I0425 20:31:08.746453 137607144601408 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0 I0425 20:31:08.746470 137607144601408 pyconfig.py:471] Config 
param attention_out: RematLocation.REMAT I0425 20:31:08.746491 137607144601408 pyconfig.py:471] Config param attention_output_dim: -1 I0425 20:31:08.746506 137607144601408 pyconfig.py:471] Config param attention_sink: False I0425 20:31:08.746524 137607144601408 pyconfig.py:471] Config param attention_type: global I0425 20:31:08.746557 137607144601408 pyconfig.py:471] Config param attn_logits_soft_cap: None I0425 20:31:08.746572 137607144601408 pyconfig.py:471] Config param audio_path: I0425 20:31:08.746589 137607144601408 pyconfig.py:471] Config param audio_placeholder: <|audio|> I0425 20:31:08.746604 137607144601408 pyconfig.py:471] Config param autoregressive_decode_assert: I0425 20:31:08.746621 137607144601408 pyconfig.py:471] Config param base_config: base.yml I0425 20:31:08.746638 137607144601408 pyconfig.py:471] Config param base_emb_dim: 16 I0425 20:31:08.746653 137607144601408 pyconfig.py:471] Config param base_mlp_dim: 64 I0425 20:31:08.746669 137607144601408 pyconfig.py:471] Config param base_moe_mlp_dim: -1 I0425 20:31:08.746686 137607144601408 pyconfig.py:471] Config param base_num_decoder_layers: 1 I0425 20:31:08.746702 137607144601408 pyconfig.py:471] Config param base_num_kv_heads: 2 I0425 20:31:08.746718 137607144601408 pyconfig.py:471] Config param base_num_query_heads: 2 I0425 20:31:08.746733 137607144601408 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output I0425 20:31:08.746749 137607144601408 pyconfig.py:471] Config param batch_size: 1 I0425 20:31:08.746764 137607144601408 pyconfig.py:471] Config param batch_split_factor: 1 I0425 20:31:08.746780 137607144601408 pyconfig.py:471] Config param beta_fast: 32 I0425 20:31:08.746797 137607144601408 pyconfig.py:471] Config param beta_slow: 1 I0425 20:31:08.746813 137607144601408 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax I0425 20:31:08.746830 137607144601408 pyconfig.py:471] Config param capacity_factor: -1.0 I0425 20:31:08.746846 137607144601408 
pyconfig.py:471] Config param cast_logits_to_fp32: True I0425 20:31:08.746863 137607144601408 pyconfig.py:471] Config param chat_template: I0425 20:31:08.746879 137607144601408 pyconfig.py:471] Config param chat_template_path: I0425 20:31:08.746895 137607144601408 pyconfig.py:471] Config param checkpoint_conversion_fn: None I0425 20:31:08.746912 137607144601408 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/checkpoints/ I0425 20:31:08.746930 137607144601408 pyconfig.py:471] Config param checkpoint_is_quantized: False I0425 20:31:08.746947 137607144601408 pyconfig.py:471] Config param checkpoint_period: 2000 I0425 20:31:08.746963 137607144601408 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96 I0425 20:31:08.746980 137607144601408 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0425 20:31:08.746996 137607144601408 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True I0425 20:31:08.747012 137607144601408 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True I0425 20:31:08.747028 137607144601408 pyconfig.py:471] Config param checkpoint_todelete_full_path: None I0425 20:31:08.747043 137607144601408 pyconfig.py:471] Config param checkpoint_todelete_subdir: None I0425 20:31:08.747060 137607144601408 pyconfig.py:471] Config param chips_per_vm: 4 I0425 20:31:08.747074 137607144601408 pyconfig.py:471] Config param chunk_attn_window_size: 0 I0425 20:31:08.747090 137607144601408 pyconfig.py:471] Config param collect_stack_trace: False I0425 20:31:08.747116 137607144601408 pyconfig.py:471] Config param colocated_python_checkpointing: False I0425 20:31:08.747132 137607144601408 pyconfig.py:471] Config param colocated_python_data_input: False I0425 20:31:08.747147 137607144601408 pyconfig.py:471] Config param compile_topology: I0425 20:31:08.747162 137607144601408 pyconfig.py:471] Config param compile_topology_num_slices: -1 I0425 20:31:08.747178 
137607144601408 pyconfig.py:471] Config param compile_xla_flags: I0425 20:31:08.747193 137607144601408 pyconfig.py:471] Config param compiled_trainstep_file: I0425 20:31:08.747209 137607144601408 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3 I0425 20:31:08.747230 137607144601408 pyconfig.py:471] Config param constant_bound_config: [] I0425 20:31:08.747245 137607144601408 pyconfig.py:471] Config param context: RematLocation.REMAT I0425 20:31:08.747261 137607144601408 pyconfig.py:471] Config param context_parallel_load_balance: True I0425 20:31:08.747276 137607144601408 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO I0425 20:31:08.747294 137607144601408 pyconfig.py:471] Config param context_parallel_size: 1 I0425 20:31:08.747310 137607144601408 pyconfig.py:471] Config param context_parallel_strategy: all_gather I0425 20:31:08.747324 137607144601408 pyconfig.py:471] Config param context_sharding: context I0425 20:31:08.747340 137607144601408 pyconfig.py:471] Config param conv_chunksize_for_audio: 500 I0425 20:31:08.747357 137607144601408 pyconfig.py:471] Config param conv_stride_for_vit: 14 I0425 20:31:08.747372 137607144601408 pyconfig.py:471] Config param convert_checkpoint_if_possible: False I0425 20:31:08.747388 137607144601408 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1 I0425 20:31:08.747402 137607144601408 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1 I0425 20:31:08.747418 137607144601408 pyconfig.py:471] Config param custom_mesh: I0425 20:31:08.747432 137607144601408 pyconfig.py:471] Config param custom_mesh_and_rule: I0425 20:31:08.747447 137607144601408 pyconfig.py:471] Config param d_model_for_audio: 256 I0425 20:31:08.747462 137607144601408 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0425 20:31:08.747482 
137607144601408 pyconfig.py:471] Config param data_shuffle_seed: 0 I0425 20:31:08.747498 137607144601408 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1 I0425 20:31:08.747512 137607144601408 pyconfig.py:471] Config param dataset_path: I0425 20:31:08.747527 137607144601408 pyconfig.py:471] Config param dataset_type: DatasetType.HF I0425 20:31:08.747545 137607144601408 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1 I0425 20:31:08.747560 137607144601408 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1 I0425 20:31:08.747576 137607144601408 pyconfig.py:471] Config param dcn_context_parallelism: 1 I0425 20:31:08.747591 137607144601408 pyconfig.py:471] Config param dcn_data_parallelism: -1 I0425 20:31:08.747606 137607144601408 pyconfig.py:471] Config param dcn_diloco_parallelism: 1 I0425 20:31:08.747621 137607144601408 pyconfig.py:471] Config param dcn_expert_parallelism: 1 I0425 20:31:08.747637 137607144601408 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1 I0425 20:31:08.747652 137607144601408 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1 I0425 20:31:08.747668 137607144601408 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0425 20:31:08.747686 137607144601408 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1 I0425 20:31:08.747703 137607144601408 pyconfig.py:471] Config param dcn_sequence_parallelism: 1 I0425 20:31:08.747717 137607144601408 pyconfig.py:471] Config param dcn_tensor_parallelism: 1 I0425 20:31:08.747733 137607144601408 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1 I0425 20:31:08.747748 137607144601408 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1 I0425 20:31:08.747763 137607144601408 pyconfig.py:471] Config param debug: {'rl': False} I0425 20:31:08.747779 137607144601408 pyconfig.py:471] Config param debug_sharding: False I0425 20:31:08.747795 137607144601408 pyconfig.py:471] Config param 
decode_sampling_nucleus_p: -1 I0425 20:31:08.747809 137607144601408 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0425 20:31:08.747827 137607144601408 pyconfig.py:471] Config param decode_sampling_temperature: 1.0 I0425 20:31:08.747843 137607144601408 pyconfig.py:471] Config param decode_sampling_top_k: 0 I0425 20:31:08.747858 137607144601408 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3 I0425 20:31:08.747874 137607144601408 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE I0425 20:31:08.747890 137607144601408 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: [] I0425 20:31:08.747905 137607144601408 pyconfig.py:471] Config param degenerate_group_masking: True I0425 20:31:08.747920 137607144601408 pyconfig.py:471] Config param dense_init_scale: 1.0 I0425 20:31:08.747936 137607144601408 pyconfig.py:471] Config param diloco_outer_lr: 0.3 I0425 20:31:08.747951 137607144601408 pyconfig.py:471] Config param diloco_outer_momentum: 0.9 I0425 20:31:08.747967 137607144601408 pyconfig.py:471] Config param diloco_sync_period: 36 I0425 20:31:08.747983 137607144601408 pyconfig.py:471] Config param distill_alpha: 0.5 I0425 20:31:08.747998 137607144601408 pyconfig.py:471] Config param distill_alpha_end: None I0425 20:31:08.748014 137607144601408 pyconfig.py:471] Config param distill_alpha_schedule: constant I0425 20:31:08.748030 137607144601408 pyconfig.py:471] Config param distill_beta: 0.0 I0425 20:31:08.748044 137607144601408 pyconfig.py:471] Config param distill_beta_end: None I0425 20:31:08.748060 137607144601408 pyconfig.py:471] Config param distill_beta_schedule: constant I0425 20:31:08.748075 137607144601408 pyconfig.py:471] Config param distill_feature_loss_type: cosine I0425 20:31:08.748090 137607144601408 pyconfig.py:471] Config param distill_layer_indices: None I0425 20:31:08.748116 137607144601408 pyconfig.py:471] Config param distill_temperature: 1.0 I0425 20:31:08.748131 
137607144601408 pyconfig.py:471] Config param distill_temperature_end: None I0425 20:31:08.748147 137607144601408 pyconfig.py:471] Config param distill_temperature_schedule: constant I0425 20:31:08.748162 137607144601408 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256 I0425 20:31:08.748178 137607144601408 pyconfig.py:471] Config param dpo_beta: 0.1 I0425 20:31:08.748195 137607144601408 pyconfig.py:471] Config param dpo_label_smoothing: 0.0 I0425 20:31:08.748209 137607144601408 pyconfig.py:471] Config param dq_reduction_steps: 0 I0425 20:31:08.748229 137607144601408 pyconfig.py:471] Config param dropout_rate: 0.0 I0425 20:31:08.748245 137607144601408 pyconfig.py:471] Config param dtype: bfloat16 I0425 20:31:08.748275 137607144601408 pyconfig.py:471] Config param dtype_mm: float32 I0425 20:31:08.748290 137607144601408 pyconfig.py:471] Config param dump_hlo: False I0425 20:31:08.748306 137607144601408 pyconfig.py:471] Config param dump_hlo_delete_local_after: True I0425 20:31:08.748322 137607144601408 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/xla_dump I0425 20:31:08.748336 137607144601408 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0425 20:31:08.748353 137607144601408 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step I0425 20:31:08.748369 137607144601408 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step I0425 20:31:08.748383 137607144601408 pyconfig.py:471] Config param dump_hlo_upload_all: False I0425 20:31:08.748399 137607144601408 pyconfig.py:471] Config param dump_hlo_xla_flags: I0425 20:31:08.748415 137607144601408 pyconfig.py:471] Config param dump_jaxpr: False I0425 20:31:08.748429 137607144601408 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True I0425 20:31:08.748445 137607144601408 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/jaxpr_dump I0425 20:31:08.748460 
137607144601408 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0425 20:31:08.748476 137607144601408 pyconfig.py:471] Config param dump_step: -1 I0425 20:31:08.748490 137607144601408 pyconfig.py:471] Config param elastic_enabled: False I0425 20:31:08.748506 137607144601408 pyconfig.py:471] Config param elastic_max_retries: 10 I0425 20:31:08.748520 137607144601408 pyconfig.py:471] Config param elastic_timeout_seconds: 300 I0425 20:31:08.748536 137607144601408 pyconfig.py:471] Config param emb_dim: 16 I0425 20:31:08.748550 137607144601408 pyconfig.py:471] Config param enable_autocheckpoint: False I0425 20:31:08.748565 137607144601408 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False I0425 20:31:08.748580 137607144601408 pyconfig.py:471] Config param enable_checkpointing: True I0425 20:31:08.748596 137607144601408 pyconfig.py:471] Config param enable_continuous_checkpointing: False I0425 20:31:08.748612 137607144601408 pyconfig.py:471] Config param enable_data_shuffling: True I0425 20:31:08.748627 137607144601408 pyconfig.py:471] Config param enable_diloco: False I0425 20:31:08.748643 137607144601408 pyconfig.py:471] Config param enable_dp_attention: False I0425 20:31:08.748658 137607144601408 pyconfig.py:471] Config param enable_dropout: False I0425 20:31:08.748674 137607144601408 pyconfig.py:471] Config param enable_emergency_checkpoint: False I0425 20:31:08.748688 137607144601408 pyconfig.py:471] Config param enable_expert_parallel: False I0425 20:31:08.748704 137607144601408 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True I0425 20:31:08.748720 137607144601408 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True I0425 20:31:08.748734 137607144601408 pyconfig.py:471] Config param enable_goodput_recording: False I0425 20:31:08.748749 137607144601408 pyconfig.py:471] Config param enable_jax_profiler: False I0425 20:31:08.748764 137607144601408 pyconfig.py:471] Config param 
enable_llm_inference_pool: False I0425 20:31:08.748779 137607144601408 pyconfig.py:471] Config param enable_model_warmup: False I0425 20:31:08.748795 137607144601408 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False I0425 20:31:08.748810 137607144601408 pyconfig.py:471] Config param enable_nnx: False I0425 20:31:08.748826 137607144601408 pyconfig.py:471] Config param enable_orbax_v1: False I0425 20:31:08.748841 137607144601408 pyconfig.py:471] Config param enable_padding_causal_mask: True I0425 20:31:08.748856 137607144601408 pyconfig.py:471] Config param enable_pathways_goodput: False I0425 20:31:08.748871 137607144601408 pyconfig.py:471] Config param enable_prefix_caching: False I0425 20:31:08.748886 137607144601408 pyconfig.py:471] Config param enable_rampup_batch_size: False I0425 20:31:08.748902 137607144601408 pyconfig.py:471] Config param enable_single_controller: False I0425 20:31:08.748916 137607144601408 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False I0425 20:31:08.748932 137607144601408 pyconfig.py:471] Config param enable_tensorboard: True I0425 20:31:08.748947 137607144601408 pyconfig.py:471] Config param enable_tunix_perf_metrics: False I0425 20:31:08.748962 137607144601408 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4 I0425 20:31:08.748977 137607144601408 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512 I0425 20:31:08.748993 137607144601408 pyconfig.py:471] Config param encoder_layers_for_audio: 2 I0425 20:31:08.749007 137607144601408 pyconfig.py:471] Config param engram: RematLocation.REMAT I0425 20:31:08.749023 137607144601408 pyconfig.py:471] Config param engram_head_dim: 1280 I0425 20:31:08.749038 137607144601408 pyconfig.py:471] Config param engram_kernel_size: 4 I0425 20:31:08.749054 137607144601408 pyconfig.py:471] Config param engram_layers: [] I0425 20:31:08.749068 137607144601408 pyconfig.py:471] Config param engram_max_ngram_size: 3 I0425 20:31:08.749083 
137607144601408 pyconfig.py:471] Config param engram_num_heads: 8 I0425 20:31:08.749117 137607144601408 pyconfig.py:471] Config param engram_seed: 0 I0425 20:31:08.749133 137607144601408 pyconfig.py:471] Config param engram_vocab_bases: [] I0425 20:31:08.749149 137607144601408 pyconfig.py:471] Config param epsilon_high: None I0425 20:31:08.749164 137607144601408 pyconfig.py:471] Config param eval_corr_lst: False I0425 20:31:08.749179 137607144601408 pyconfig.py:471] Config param eval_data_columns: ['text'] I0425 20:31:08.749196 137607144601408 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1 I0425 20:31:08.749210 137607144601408 pyconfig.py:471] Config param eval_image_column: image I0425 20:31:08.749229 137607144601408 pyconfig.py:471] Config param eval_interval: -1 I0425 20:31:08.749244 137607144601408 pyconfig.py:471] Config param eval_make_lst: False I0425 20:31:08.749260 137607144601408 pyconfig.py:471] Config param eval_per_device_batch_size: 2 I0425 20:31:08.749275 137607144601408 pyconfig.py:471] Config param eval_sampling_strategy: greedy I0425 20:31:08.749290 137607144601408 pyconfig.py:471] Config param eval_split: validation I0425 20:31:08.749305 137607144601408 pyconfig.py:471] Config param eval_steps: -1 I0425 20:31:08.749321 137607144601408 pyconfig.py:471] Config param expansion_factor_real_data: -1.0 I0425 20:31:08.749336 137607144601408 pyconfig.py:471] Config param final_logits_soft_cap: None I0425 20:31:08.749352 137607144601408 pyconfig.py:471] Config param first_num_dense_layers: 0 I0425 20:31:08.749367 137607144601408 pyconfig.py:471] Config param float32_gate_logits: False I0425 20:31:08.749383 137607144601408 pyconfig.py:471] Config param float32_logits: False I0425 20:31:08.749399 137607144601408 pyconfig.py:471] Config param float32_qk_product: False I0425 20:31:08.749413 137607144601408 pyconfig.py:471] Config param float32_weight_sum: True I0425 20:31:08.749429 137607144601408 pyconfig.py:471] Config param force_q_layout: 
False I0425 20:31:08.749444 137607144601408 pyconfig.py:471] Config param force_unroll: False I0425 20:31:08.749459 137607144601408 pyconfig.py:471] Config param freeze_audio_encoder_params: True I0425 20:31:08.749474 137607144601408 pyconfig.py:471] Config param freeze_vision_encoder_params: True I0425 20:31:08.749490 137607144601408 pyconfig.py:471] Config param fused_mlp: False I0425 20:31:08.749504 137607144601408 pyconfig.py:471] Config param fused_qkv: True I0425 20:31:08.749519 137607144601408 pyconfig.py:471] Config param gcs_metrics: False I0425 20:31:08.749533 137607144601408 pyconfig.py:471] Config param gdn_chunk_size: 64 I0425 20:31:08.749549 137607144601408 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4 I0425 20:31:08.749565 137607144601408 pyconfig.py:471] Config param gdn_key_head_dim: 128 I0425 20:31:08.749581 137607144601408 pyconfig.py:471] Config param gdn_num_key_heads: 16 I0425 20:31:08.749596 137607144601408 pyconfig.py:471] Config param gdn_num_value_heads: 32 I0425 20:31:08.749611 137607144601408 pyconfig.py:471] Config param gdn_value_head_dim: 128 I0425 20:31:08.749625 137607144601408 pyconfig.py:471] Config param generate_padding_batch_eval: False I0425 20:31:08.749640 137607144601408 pyconfig.py:471] Config param generate_padding_batch_train: False I0425 20:31:08.749655 137607144601408 pyconfig.py:471] Config param generate_slice: v5e-16 I0425 20:31:08.749671 137607144601408 pyconfig.py:471] Config param generation_configs: {} I0425 20:31:08.749686 137607144601408 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64 I0425 20:31:08.749701 137607144601408 pyconfig.py:471] Config param global_batch_size_to_load: 512 I0425 20:31:08.749716 137607144601408 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64 I0425 20:31:08.749731 137607144601408 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0425 20:31:08.749747 137607144601408 pyconfig.py:471] Config param global_batch_size_to_load_start: 
None I0425 20:31:08.749761 137607144601408 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0425 20:31:08.749777 137607144601408 pyconfig.py:471] Config param global_head_dim: 0 I0425 20:31:08.749792 137607144601408 pyconfig.py:471] Config param global_num_kv_heads: 0 I0425 20:31:08.749808 137607144601408 pyconfig.py:471] Config param global_parameter_scale: 1 I0425 20:31:08.749823 137607144601408 pyconfig.py:471] Config param global_rampup_samples: 500 I0425 20:31:08.749839 137607144601408 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0425 20:31:08.749855 137607144601408 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0425 20:31:08.749870 137607144601408 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0425 20:31:08.749886 137607144601408 pyconfig.py:471] Config param grad_dtype: float32 I0425 20:31:08.749920 137607144601408 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0425 20:31:08.749936 137607144601408 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0425 20:31:08.749952 137607144601408 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0425 20:31:08.749967 137607144601408 pyconfig.py:471] Config param grain_eval_files: I0425 20:31:08.749983 137607144601408 pyconfig.py:471] Config param grain_file_type: arrayrecord I0425 20:31:08.749998 137607144601408 pyconfig.py:471] Config param grain_num_threads: 16 I0425 20:31:08.750014 137607144601408 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0425 20:31:08.750029 137607144601408 pyconfig.py:471] Config param grain_packing_type: first_fit I0425 20:31:08.750043 137607144601408 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0425 20:31:08.750059 137607144601408 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1 I0425 20:31:08.750075 137607144601408 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0425 20:31:08.750090 137607144601408 pyconfig.py:471] Config 
param grain_prefetch_buffer_size_eval: 500 I0425 20:31:08.750116 137607144601408 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0425 20:31:08.750132 137607144601408 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0425 20:31:08.750148 137607144601408 pyconfig.py:471] Config param grain_train_files: I0425 20:31:08.750164 137607144601408 pyconfig.py:471] Config param grain_train_mixture_config_path: I0425 20:31:08.750178 137607144601408 pyconfig.py:471] Config param grain_worker_count: 1 I0425 20:31:08.750194 137607144601408 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0425 20:31:08.750209 137607144601408 pyconfig.py:471] Config param grpo_beta: 0.08 I0425 20:31:08.750229 137607144601408 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0425 20:31:08.750245 137607144601408 pyconfig.py:471] Config param hardware: tpu I0425 20:31:08.750261 137607144601408 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0425 20:31:08.750277 137607144601408 pyconfig.py:471] Config param head_dim: 8 I0425 20:31:08.750292 137607144601408 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0425 20:31:08.750307 137607144601408 pyconfig.py:471] Config param hf_data_dir: None I0425 20:31:08.750322 137607144601408 pyconfig.py:471] Config param hf_eval_files: None I0425 20:31:08.750338 137607144601408 pyconfig.py:471] Config param hf_eval_split: None I0425 20:31:08.750354 137607144601408 pyconfig.py:471] Config param hf_name: None I0425 20:31:08.750368 137607144601408 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0425 20:31:08.750383 137607144601408 pyconfig.py:471] Config param hf_train_files: None I0425 20:31:08.750398 137607144601408 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0425 20:31:08.750413 137607144601408 pyconfig.py:471] Config param hide_profiler_step_metric: False I0425 20:31:08.750428 137607144601408 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0425 20:31:08.750443 
137607144601408 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0425 20:31:08.750459 137607144601408 pyconfig.py:471] Config param ici_context_parallelism: 1 I0425 20:31:08.750474 137607144601408 pyconfig.py:471] Config param ici_data_parallelism: 1 I0425 20:31:08.750491 137607144601408 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0425 20:31:08.750505 137607144601408 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0425 20:31:08.750521 137607144601408 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0425 20:31:08.750535 137607144601408 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0425 20:31:08.750550 137607144601408 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0425 20:31:08.750566 137607144601408 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0425 20:31:08.750582 137607144601408 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0425 20:31:08.750597 137607144601408 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0425 20:31:08.750612 137607144601408 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0425 20:31:08.750627 137607144601408 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0425 20:31:08.750642 137607144601408 pyconfig.py:471] Config param image_path: I0425 20:31:08.750658 137607144601408 pyconfig.py:471] Config param image_placeholder: <|image|> I0425 20:31:08.750674 137607144601408 pyconfig.py:471] Config param image_size_for_vit: 896 I0425 20:31:08.750690 137607144601408 pyconfig.py:471] Config param indexer_head_dim: 128 I0425 20:31:08.750705 137607144601408 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0425 20:31:08.750721 137607144601408 pyconfig.py:471] Config param indexer_n_heads: 64 I0425 20:31:08.750736 137607144601408 pyconfig.py:471] Config param indexer_sparse_training: False I0425 20:31:08.750751 137607144601408 pyconfig.py:471] Config param 
indexer_topk: 2048 I0425 20:31:08.750766 137607144601408 pyconfig.py:471] Config param inference_benchmark_test: False I0425 20:31:08.750780 137607144601408 pyconfig.py:471] Config param inference_metadata_file: I0425 20:31:08.750796 137607144601408 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0425 20:31:08.750810 137607144601408 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0425 20:31:08.750826 137607144601408 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0425 20:31:08.750841 137607144601408 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0425 20:31:08.750857 137607144601408 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0425 20:31:08.750872 137607144601408 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0425 20:31:08.750888 137607144601408 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0425 20:31:08.750903 137607144601408 pyconfig.py:471] Config param init_weights_seed: 0 I0425 20:31:08.750918 137607144601408 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0425 20:31:08.750935 137607144601408 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0425 20:31:08.750951 137607144601408 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0425 20:31:08.750967 137607144601408 pyconfig.py:471] Config param internal_compile: False I0425 20:31:08.750983 137607144601408 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0425 20:31:08.750999 137607144601408 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0425 20:31:08.751013 137607144601408 pyconfig.py:471] Config param jax_debug_log_modules: I0425 20:31:08.751029 137607144601408 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0425 20:31:08.751045 137607144601408 
pyconfig.py:471] Config param jax_profiler_port: 9999 I0425 20:31:08.751060 137607144601408 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0425 20:31:08.751075 137607144601408 pyconfig.py:471] Config param kv_cache_buffer: 256 I0425 20:31:08.751091 137607144601408 pyconfig.py:471] Config param kv_lora_rank: 512 I0425 20:31:08.751122 137607144601408 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0425 20:31:08.751141 137607144601408 pyconfig.py:471] Config param kv_quant_dtype: int8 I0425 20:31:08.751155 137607144601408 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0425 20:31:08.751171 137607144601408 pyconfig.py:471] Config param learning_rate: 0.0002 I0425 20:31:08.751186 137607144601408 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0425 20:31:08.751202 137607144601408 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0425 20:31:08.751217 137607144601408 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0425 20:31:08.751236 137607144601408 pyconfig.py:471] Config param load_checkpoint_only_once: False I0425 20:31:08.751251 137607144601408 pyconfig.py:471] Config param load_from_prefill_dir: False I0425 20:31:08.751267 137607144601408 pyconfig.py:471] Config param load_full_state_path: I0425 20:31:08.751283 137607144601408 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 20:31:08.751298 137607144601408 pyconfig.py:471] Config param local_checkpoint_directory: I0425 20:31:08.751313 137607144601408 pyconfig.py:471] Config param local_checkpoint_period: 0 I0425 20:31:08.751328 137607144601408 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0425 20:31:08.751344 137607144601408 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0425 20:31:08.751360 137607144601408 pyconfig.py:471] Config param log_config: True I0425 20:31:08.751375 
137607144601408 pyconfig.py:471] Config param log_period: 10 I0425 20:31:08.751390 137607144601408 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', 
('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), 
('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0425 20:31:08.751469 137607144601408 pyconfig.py:471] Config param logits_dot_in_fp32: False I0425 20:31:08.751484 137607144601408 pyconfig.py:471] Config param logits_via_embedding: True I0425 20:31:08.751501 137607144601408 pyconfig.py:471] Config param lora_input_adapters_path: I0425 20:31:08.751515 137607144601408 pyconfig.py:471] Config param loss_algo: grpo I0425 20:31:08.751531 137607144601408 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0425 20:31:08.751548 137607144601408 pyconfig.py:471] Config param managed_mldiagnostics: False I0425 20:31:08.751564 137607144601408 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/managed-mldiagnostics I0425 20:31:08.751578 137607144601408 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0425 20:31:08.751594 137607144601408 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0425 20:31:08.751611 137607144601408 pyconfig.py:471] Config param max_checkify: False I0425 20:31:08.751626 137607144601408 pyconfig.py:471] Config param max_concurrency: 256 I0425 20:31:08.751641 
137607144601408 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0425 20:31:08.751657 137607144601408 pyconfig.py:471] Config param max_num_batched_tokens: None I0425 20:31:08.751671 137607144601408 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0425 20:31:08.751686 137607144601408 pyconfig.py:471] Config param max_num_images_per_example: -1 I0425 20:31:08.751701 137607144601408 pyconfig.py:471] Config param max_num_seqs: None I0425 20:31:08.751717 137607144601408 pyconfig.py:471] Config param max_position_embeddings: 163840 I0425 20:31:08.751731 137607144601408 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0425 20:31:08.751747 137607144601408 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0425 20:31:08.751763 137607144601408 pyconfig.py:471] Config param max_segments_per_seq: -1 I0425 20:31:08.751779 137607144601408 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0425 20:31:08.751795 137607144601408 pyconfig.py:471] Config param max_target_length: 2048 I0425 20:31:08.751809 137607144601408 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0425 20:31:08.751824 137607144601408 pyconfig.py:471] Config param megablox: True I0425 20:31:08.751840 137607144601408 pyconfig.py:471] Config param merge_gating_gmm: False I0425 20:31:08.751856 137607144601408 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0425 20:31:08.751873 137607144601408 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/metrics/ I0425 20:31:08.751889 137607144601408 pyconfig.py:471] Config param metrics_file: I0425 20:31:08.751906 137607144601408 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0425 20:31:08.751921 137607144601408 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0425 
20:31:08.751938 137607144601408 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0425 20:31:08.751952 137607144601408 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0425 20:31:08.751969 137607144601408 pyconfig.py:471] Config param mla_naive_kvcache: True I0425 20:31:08.751985 137607144601408 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0425 20:31:08.752001 137607144601408 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0425 20:31:08.752017 137607144601408 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0425 20:31:08.752032 137607144601408 pyconfig.py:471] Config param mlp_bias: False I0425 20:31:08.752047 137607144601408 pyconfig.py:471] Config param mlp_dim: 64 I0425 20:31:08.752063 137607144601408 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0425 20:31:08.752078 137607144601408 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0425 20:31:08.752104 137607144601408 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0425 20:31:08.752120 137607144601408 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0425 20:31:08.752135 137607144601408 pyconfig.py:471] Config param moba: False I0425 20:31:08.752151 137607144601408 pyconfig.py:471] Config param moba_chunk_size: 1024 I0425 20:31:08.752166 137607144601408 pyconfig.py:471] Config param moba_topk: 8 I0425 20:31:08.752180 137607144601408 pyconfig.py:471] Config param model_call_mode: I0425 20:31:08.752196 137607144601408 pyconfig.py:471] Config param model_name: gpt3-52k I0425 20:31:08.752211 137607144601408 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0425 20:31:08.752230 137607144601408 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0425 20:31:08.752245 137607144601408 pyconfig.py:471] Config param moe_mlp_dim: -1 I0425 20:31:08.752261 137607144601408 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0425 20:31:08.752277 137607144601408 pyconfig.py:471] Config param 
moe_mlpwi_1: RematLocation.REMAT I0425 20:31:08.752292 137607144601408 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0425 20:31:08.752307 137607144601408 pyconfig.py:471] Config param monitor_goodput: False I0425 20:31:08.752323 137607144601408 pyconfig.py:471] Config param monitor_step_time_deviation: True I0425 20:31:08.752337 137607144601408 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0425 20:31:08.752353 137607144601408 pyconfig.py:471] Config param mscale: 1.0 I0425 20:31:08.752368 137607144601408 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0425 20:31:08.752383 137607144601408 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0425 20:31:08.752397 137607144601408 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0425 20:31:08.752413 137607144601408 pyconfig.py:471] Config param mtp_num_layers: 0 I0425 20:31:08.752428 137607144601408 pyconfig.py:471] Config param mu_dtype: float32 I0425 20:31:08.752452 137607144601408 pyconfig.py:471] Config param multi_sampling: False I0425 20:31:08.752467 137607144601408 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0425 20:31:08.752482 137607144601408 pyconfig.py:471] Config param muon_beta: 0.95 I0425 20:31:08.752498 137607144601408 pyconfig.py:471] Config param muon_consistent_rms: None I0425 20:31:08.752513 137607144601408 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0425 20:31:08.752529 137607144601408 pyconfig.py:471] Config param n_routing_groups: -1 I0425 20:31:08.752544 137607144601408 pyconfig.py:471] Config param n_window_for_audio: 50 I0425 20:31:08.752558 137607144601408 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0425 20:31:08.752574 137607144601408 pyconfig.py:471] Config param nope_layer_interval: -1 I0425 20:31:08.752590 137607144601408 pyconfig.py:471] Config param norm_topk_prob: False I0425 20:31:08.752605 137607144601408 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 
I0425 20:31:08.752623 137607144601408 pyconfig.py:471] Config param normalize_embedding_logits: False I0425 20:31:08.752638 137607144601408 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0425 20:31:08.752653 137607144601408 pyconfig.py:471] Config param num_batches: 4 I0425 20:31:08.752668 137607144601408 pyconfig.py:471] Config param num_channels_for_vit: 3 I0425 20:31:08.752683 137607144601408 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0425 20:31:08.752699 137607144601408 pyconfig.py:471] Config param num_decoder_layers: 1 I0425 20:31:08.752714 137607144601408 pyconfig.py:471] Config param num_diloco_replicas: 1 I0425 20:31:08.752730 137607144601408 pyconfig.py:471] Config param num_epoch: 1 I0425 20:31:08.752744 137607144601408 pyconfig.py:471] Config param num_eval_passes: 1 I0425 20:31:08.752760 137607144601408 pyconfig.py:471] Config param num_experts: 1 I0425 20:31:08.752774 137607144601408 pyconfig.py:471] Config param num_experts_per_tok: 1 I0425 20:31:08.752790 137607144601408 pyconfig.py:471] Config param num_generations: 2 I0425 20:31:08.752805 137607144601408 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0425 20:31:08.752820 137607144601408 pyconfig.py:471] Config param num_iterations: 1 I0425 20:31:08.752835 137607144601408 pyconfig.py:471] Config param num_kv_heads: 2 I0425 20:31:08.752849 137607144601408 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0425 20:31:08.752865 137607144601408 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0425 20:31:08.752880 137607144601408 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0425 20:31:08.752895 137607144601408 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0425 20:31:08.752910 137607144601408 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0425 20:31:08.752925 137607144601408 pyconfig.py:471] Config param num_query_heads: 2 I0425 20:31:08.752939 137607144601408 pyconfig.py:471] Config 
param num_samplers_slices: -1 I0425 20:31:08.752955 137607144601408 pyconfig.py:471] Config param num_slices: 1 I0425 20:31:08.752969 137607144601408 pyconfig.py:471] Config param num_target_devices: 32 I0425 20:31:08.752985 137607144601408 pyconfig.py:471] Config param num_test_batches: 5 I0425 20:31:08.753001 137607144601408 pyconfig.py:471] Config param num_trainer_slices: -1 I0425 20:31:08.753015 137607144601408 pyconfig.py:471] Config param num_vocab_tiling: 1 I0425 20:31:08.753031 137607144601408 pyconfig.py:471] Config param off_policy_steps: 0 I0425 20:31:08.753046 137607144601408 pyconfig.py:471] Config param offline_data_dir: None I0425 20:31:08.753062 137607144601408 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0425 20:31:08.753079 137607144601408 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0425 20:31:08.753103 137607144601408 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0425 20:31:08.753118 137607144601408 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0425 20:31:08.753134 137607144601408 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0425 20:31:08.753149 137607144601408 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0425 20:31:08.753165 137607144601408 pyconfig.py:471] Config param output_dim_for_audio: 512 I0425 20:31:08.753179 137607144601408 pyconfig.py:471] Config param override_logical_axis_rules: False I0425 20:31:08.753195 137607144601408 pyconfig.py:471] Config param override_model_config: True I0425 20:31:08.753209 137607144601408 pyconfig.py:471] Config param packing: True I0425 20:31:08.753228 137607144601408 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0425 20:31:08.753242 137607144601408 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0425 20:31:08.753258 137607144601408 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0425 20:31:08.753273 137607144601408 pyconfig.py:471] Config param 
pagedattn_pages_per_compute_block: 4 I0425 20:31:08.753288 137607144601408 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0425 20:31:08.753302 137607144601408 pyconfig.py:471] Config param param_scan_axis: 1 I0425 20:31:08.753316 137607144601408 pyconfig.py:471] Config param parameter_memory_host_offload: False I0425 20:31:08.753332 137607144601408 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0425 20:31:08.753349 137607144601408 pyconfig.py:471] Config param patch_size_for_vit: 14 I0425 20:31:08.753363 137607144601408 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0425 20:31:08.753379 137607144601408 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0425 20:31:08.753393 137607144601408 pyconfig.py:471] Config param per_device_batch_size: 2 I0425 20:31:08.753409 137607144601408 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0425 20:31:08.753425 137607144601408 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0425 20:31:08.753440 137607144601408 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0425 20:31:08.753455 137607144601408 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0425 20:31:08.753472 137607144601408 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0425 20:31:08.753486 137607144601408 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0425 20:31:08.753503 137607144601408 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0425 20:31:08.753517 137607144601408 pyconfig.py:471] Config param posemb_type_for_vit: learn I0425 20:31:08.753532 137607144601408 pyconfig.py:471] Config param position_id_per_seconds: 25 I0425 20:31:08.753548 137607144601408 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0425 20:31:08.753564 137607144601408 pyconfig.py:471] Config param prefill_cache_dir: I0425 20:31:08.753579 137607144601408 pyconfig.py:471] Config param prefill_chunk_size: 256 I0425 
20:31:08.753595 137607144601408 pyconfig.py:471] Config param prefill_slice: v5e-16 I0425 20:31:08.753612 137607144601408 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0425 20:31:08.753626 137607144601408 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0425 20:31:08.753643 137607144601408 pyconfig.py:471] Config param profile_cleanly: True I0425 20:31:08.753657 137607144601408 pyconfig.py:471] Config param profile_periodically_period: -1 I0425 20:31:08.753672 137607144601408 pyconfig.py:471] Config param profile_power_events: False I0425 20:31:08.753686 137607144601408 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0425 20:31:08.753704 137607144601408 pyconfig.py:471] Config param profiler_steps: 5 I0425 20:31:08.753720 137607144601408 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0425 20:31:08.753736 137607144601408 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0425 20:31:08.753751 137607144601408 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0425 20:31:08.753766 137607144601408 pyconfig.py:471] Config param prometheus_port: 0 I0425 20:31:08.753781 137607144601408 pyconfig.py:471] Config param prompt: I love to I0425 20:31:08.753796 137607144601408 pyconfig.py:471] Config param pure_nnx: False I0425 20:31:08.753811 137607144601408 pyconfig.py:471] Config param pure_nnx_decoder: False I0425 20:31:08.753827 137607144601408 pyconfig.py:471] Config param q_lora_rank: 0 I0425 20:31:08.753841 137607144601408 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0425 20:31:08.753857 137607144601408 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0425 20:31:08.753873 137607144601408 pyconfig.py:471] Config param qk_norm_with_scale: True I0425 20:31:08.753889 137607144601408 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0425 20:31:08.753904 137607144601408 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0425 20:31:08.753919 
137607144601408 pyconfig.py:471] Config param quant_cfg_path: I0425 20:31:08.753935 137607144601408 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0425 20:31:08.753953 137607144601408 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0425 20:31:08.753968 137607144601408 pyconfig.py:471] Config param quantize_kvcache: False I0425 20:31:08.753983 137607144601408 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0425 20:31:08.753998 137607144601408 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0425 20:31:08.754016 137607144601408 pyconfig.py:471] Config param ragged_block_size: 256 I0425 20:31:08.754039 137607144601408 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0425 20:31:08.754066 137607144601408 pyconfig.py:471] Config param rampup_end_step: 0 I0425 20:31:08.754102 137607144601408 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0425 20:31:08.754131 137607144601408 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0425 20:31:08.754157 137607144601408 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0425 20:31:08.754175 137607144601408 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0425 20:31:08.754189 137607144601408 pyconfig.py:471] Config param remat_policy: full I0425 20:31:08.754215 137607144601408 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0425 20:31:08.754247 137607144601408 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0425 20:31:08.754273 137607144601408 pyconfig.py:471] Config param replicate_quant_scale: False I0425 20:31:08.754298 137607144601408 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0425 20:31:08.754322 137607144601408 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0425 20:31:08.754346 137607144601408 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0425 
20:31:08.754370 137607144601408 pyconfig.py:471] Config param reshape_q: False I0425 20:31:08.754394 137607144601408 pyconfig.py:471] Config param return_log_prob: False I0425 20:31:08.754418 137607144601408 pyconfig.py:471] Config param reuse_example_batch: 0 I0425 20:31:08.754442 137607144601408 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0425 20:31:08.754467 137607144601408 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0425 20:31:08.754492 137607144601408 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0425 20:31:08.754516 137607144601408 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0425 20:31:08.754533 137607144601408 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0425 20:31:08.754552 137607144601408 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0425 20:31:08.754578 137607144601408 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0425 20:31:08.754611 137607144601408 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0425 20:31:08.754638 137607144601408 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0425 20:31:08.754664 137607144601408 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0425 20:31:08.754690 137607144601408 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0425 20:31:08.754715 137607144601408 pyconfig.py:471] Config param rope_attention_scaling: False I0425 20:31:08.754737 137607144601408 pyconfig.py:471] Config param rope_factor: 40 I0425 20:31:08.754754 137607144601408 pyconfig.py:471] Config param rope_interleave: True I0425 20:31:08.754770 137607144601408 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0425 20:31:08.754785 
137607144601408 pyconfig.py:471] Config param rope_max_timescale: 10000 I0425 20:31:08.754801 137607144601408 pyconfig.py:471] Config param rope_min_timescale: 1 I0425 20:31:08.754817 137607144601408 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0425 20:31:08.754838 137607144601408 pyconfig.py:471] Config param rope_truncate: True I0425 20:31:08.754863 137607144601408 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0425 20:31:08.754891 137607144601408 pyconfig.py:471] Config param rope_use_scale: True I0425 20:31:08.754918 137607144601408 pyconfig.py:471] Config param routed_bias: False I0425 20:31:08.754944 137607144601408 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0425 20:31:08.754969 137607144601408 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0425 20:31:08.754995 137607144601408 pyconfig.py:471] Config param routed_score_func: I0425 20:31:08.755020 137607144601408 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-20-31 I0425 20:31:08.755045 137607144601408 pyconfig.py:471] Config param sa_block_kv: 512 I0425 20:31:08.755070 137607144601408 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0425 20:31:08.755105 137607144601408 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0425 20:31:08.755132 137607144601408 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0425 20:31:08.755156 137607144601408 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0425 20:31:08.755181 137607144601408 pyconfig.py:471] Config param sa_block_q: 512 I0425 20:31:08.755205 137607144601408 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0425 20:31:08.755235 137607144601408 pyconfig.py:471] Config param sa_block_q_dq: 512 I0425 20:31:08.755260 137607144601408 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0425 20:31:08.755285 137607144601408 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0425 20:31:08.755309 137607144601408 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: 
False I0425 20:31:08.755334 137607144601408 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0425 20:31:08.755358 137607144601408 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0425 20:31:08.755384 137607144601408 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0425 20:31:08.755409 137607144601408 pyconfig.py:471] Config param save_config_to_gcs: False I0425 20:31:08.755433 137607144601408 pyconfig.py:471] Config param save_quantized_params_path: I0425 20:31:08.755458 137607144601408 pyconfig.py:471] Config param scale_embedding_for_audio: True I0425 20:31:08.755483 137607144601408 pyconfig.py:471] Config param scan_layers: True I0425 20:31:08.755507 137607144601408 pyconfig.py:471] Config param scan_layers_per_stage: False I0425 20:31:08.755533 137607144601408 pyconfig.py:471] Config param scan_pipeline_iterations: True I0425 20:31:08.755555 137607144601408 pyconfig.py:471] Config param scan_pipeline_repeats: False I0425 20:31:08.755577 137607144601408 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0425 20:31:08.755601 137607144601408 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0425 20:31:08.755626 137607144601408 pyconfig.py:471] Config param sft_train_on_completion_only: False I0425 20:31:08.755650 137607144601408 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0425 20:31:08.755675 137607144601408 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0425 20:31:08.755702 137607144601408 pyconfig.py:471] Config param shard_optimizer_over_data: False I0425 20:31:08.755724 137607144601408 pyconfig.py:471] Config param sharding_strategy: None I0425 20:31:08.755746 137607144601408 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0425 20:31:08.755766 137607144601408 pyconfig.py:471] Config param shardy: True I0425 20:31:08.755791 137607144601408 pyconfig.py:471] Config param share_kv_projections: False I0425 20:31:08.755814 137607144601408 
pyconfig.py:471] Config param shared_experts: 0 I0425 20:31:08.755835 137607144601408 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0425 20:31:08.755857 137607144601408 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0425 20:31:08.755873 137607144601408 pyconfig.py:471] Config param skip_jax_distributed_system: False I0425 20:31:08.755887 137607144601408 pyconfig.py:471] Config param skip_step_interval: 128 I0425 20:31:08.755906 137607144601408 pyconfig.py:471] Config param skip_step_on_spikes: False I0425 20:31:08.755931 137607144601408 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0425 20:31:08.755956 137607144601408 pyconfig.py:471] Config param sliding_window_size: 0 I0425 20:31:08.755982 137607144601408 pyconfig.py:471] Config param solution_end_token: </answer> I0425 20:31:08.756005 137607144601408 pyconfig.py:471] Config param solution_start_token: <answer> I0425 20:31:08.756026 137607144601408 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0425 20:31:08.756046 137607144601408 pyconfig.py:471] Config param sparse_matmul: True I0425 20:31:08.756063 137607144601408 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0425 20:31:08.756077 137607144601408 pyconfig.py:471] Config param stack_prefill_result_cache: False I0425 20:31:08.756105 137607144601408 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0425 20:31:08.756121 137607144601408 pyconfig.py:471] Config param stack_trace_to_cloud: False I0425 20:31:08.756137 137607144601408 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0425 20:31:08.756151 137607144601408 pyconfig.py:471] Config param steps: 200000 I0425 20:31:08.756167 137607144601408 pyconfig.py:471] Config param stop_strings: None I0425 20:31:08.756182 137607144601408 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0425 20:31:08.756199 137607144601408 pyconfig.py:471] Config param student_params_to_update: None I0425 
20:31:08.756214 137607144601408 pyconfig.py:471] Config param subslice_shape: I0425 20:31:08.756233 137607144601408 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0425 20:31:08.756247 137607144601408 pyconfig.py:471] Config param system_prompt: I0425 20:31:08.756263 137607144601408 pyconfig.py:471] Config param target_eval_loss: 0.0 I0425 20:31:08.756278 137607144601408 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0425 20:31:08.756294 137607144601408 pyconfig.py:471] Config param temperature_tuning: False I0425 20:31:08.756310 137607144601408 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0425 20:31:08.756324 137607144601408 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-20-31/tensorboard/ I0425 20:31:08.756340 137607144601408 pyconfig.py:471] Config param tensors_on_device: None I0425 20:31:08.756357 137607144601408 pyconfig.py:471] Config param tensors_to_offload: None I0425 20:31:08.756371 137607144601408 pyconfig.py:471] Config param test_batch_start_index: 0 I0425 20:31:08.756387 137607144601408 pyconfig.py:471] Config param tile_size_for_vit: 336 I0425 20:31:08.756403 137607144601408 pyconfig.py:471] Config param tokenize_eval_data: True I0425 20:31:08.756418 137607144601408 pyconfig.py:471] Config param tokenize_train_data: True I0425 20:31:08.756433 137607144601408 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0425 20:31:08.756449 137607144601408 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0425 20:31:08.756467 137607144601408 pyconfig.py:471] Config param topk_routing_group: -1 I0425 20:31:08.756484 137607144601408 pyconfig.py:471] Config param train_data_columns: ['text'] I0425 20:31:08.756501 137607144601408 pyconfig.py:471] Config param train_fraction: 1.0 I0425 20:31:08.756517 137607144601408 pyconfig.py:471] Config param train_image_column: image I0425 20:31:08.756531 137607144601408 pyconfig.py:471] 
Config param train_micro_batch_size: -1 I0425 20:31:08.756548 137607144601408 pyconfig.py:471] Config param train_split: train I0425 20:31:08.756562 137607144601408 pyconfig.py:471] Config param trainable_parameters_mask: [] I0425 20:31:08.756578 137607144601408 pyconfig.py:471] Config param trainable_position_size: 2048 I0425 20:31:08.756593 137607144601408 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0425 20:31:08.756609 137607144601408 pyconfig.py:471] Config param upload_all_profiler_results: False I0425 20:31:08.756624 137607144601408 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0425 20:31:08.756640 137607144601408 pyconfig.py:471] Config param use_agentic_rollout: False I0425 20:31:08.756655 137607144601408 pyconfig.py:471] Config param use_audio: False I0425 20:31:08.756670 137607144601408 pyconfig.py:471] Config param use_audio_in_video: False I0425 20:31:08.756686 137607144601408 pyconfig.py:471] Config param use_batch_split_schedule: False I0425 20:31:08.756702 137607144601408 pyconfig.py:471] Config param use_chat_template: False I0425 20:31:08.756717 137607144601408 pyconfig.py:471] Config param use_chunked_prefill: False I0425 20:31:08.756733 137607144601408 pyconfig.py:471] Config param use_custom_sort_vjp: True I0425 20:31:08.756747 137607144601408 pyconfig.py:471] Config param use_dpo: False I0425 20:31:08.756763 137607144601408 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0425 20:31:08.756778 137607144601408 pyconfig.py:471] Config param use_grpo: True I0425 20:31:08.756793 137607144601408 pyconfig.py:471] Config param use_indexer: False I0425 20:31:08.756809 137607144601408 pyconfig.py:471] Config param use_iota_embed: True I0425 20:31:08.756823 137607144601408 pyconfig.py:471] Config param use_jax_splash: False I0425 20:31:08.756839 137607144601408 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0425 20:31:08.756853 137607144601408 pyconfig.py:471] Config param use_mrope: False I0425 
20:31:08.756869 137607144601408 pyconfig.py:471] Config param use_multimodal: False I0425 20:31:08.756883 137607144601408 pyconfig.py:471] Config param use_nnx_pipeline: False I0425 20:31:08.756899 137607144601408 pyconfig.py:471] Config param use_pathways: True I0425 20:31:08.756915 137607144601408 pyconfig.py:471] Config param use_post_attn_norm: False I0425 20:31:08.756929 137607144601408 pyconfig.py:471] Config param use_post_ffw_norm: False I0425 20:31:08.756945 137607144601408 pyconfig.py:471] Config param use_qk_clip: False I0425 20:31:08.756959 137607144601408 pyconfig.py:471] Config param use_qk_norm: False I0425 20:31:08.756975 137607144601408 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0425 20:31:08.756990 137607144601408 pyconfig.py:471] Config param use_qwix_quantization: False I0425 20:31:08.757006 137607144601408 pyconfig.py:471] Config param use_ragged_attention: False I0425 20:31:08.757021 137607144601408 pyconfig.py:471] Config param use_random_routing: False I0425 20:31:08.757036 137607144601408 pyconfig.py:471] Config param use_replicator_service: False I0425 20:31:08.757050 137607144601408 pyconfig.py:471] Config param use_ring_of_experts: False I0425 20:31:08.757066 137607144601408 pyconfig.py:471] Config param use_sft: False I0425 20:31:08.757081 137607144601408 pyconfig.py:471] Config param use_splash_scheduler: False I0425 20:31:08.757112 137607144601408 pyconfig.py:471] Config param use_tokamax_gmm: False I0425 20:31:08.757130 137607144601408 pyconfig.py:471] Config param use_tokamax_splash: False I0425 20:31:08.757145 137607144601408 pyconfig.py:471] Config param use_truncation: True I0425 20:31:08.757163 137607144601408 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0425 20:31:08.757177 137607144601408 pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0425 20:31:08.757192 137607144601408 pyconfig.py:471] Config param use_vertex_tensorboard: False I0425 20:31:08.757208 
137607144601408 pyconfig.py:471] Config param using_pipeline_parallelism: False I0425 20:31:08.757222 137607144601408 pyconfig.py:471] Config param v_head_dim: 128 I0425 20:31:08.757242 137607144601408 pyconfig.py:471] Config param v_norm_with_scale: True I0425 20:31:08.757257 137607144601408 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0425 20:31:08.757273 137607144601408 pyconfig.py:471] Config param vertex_tensorboard_project: I0425 20:31:08.757288 137607144601408 pyconfig.py:471] Config param vertex_tensorboard_region: I0425 20:31:08.757304 137607144601408 pyconfig.py:471] Config param video_path: I0425 20:31:08.757319 137607144601408 pyconfig.py:471] Config param video_placeholder: <|video|> I0425 20:31:08.757335 137607144601408 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0425 20:31:08.757349 137607144601408 pyconfig.py:471] Config param vision_output_length: -1 I0425 20:31:08.757365 137607144601408 pyconfig.py:471] Config param vllm_additional_config: {} I0425 20:31:08.757380 137607144601408 pyconfig.py:471] Config param vllm_hf_config_path: I0425 20:31:08.757395 137607144601408 pyconfig.py:471] Config param vllm_hf_overrides: {} I0425 20:31:08.757410 137607144601408 pyconfig.py:471] Config param vocab_size: 32000 I0425 20:31:08.757426 137607144601408 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0425 20:31:08.757441 137607144601408 pyconfig.py:471] Config param weight_dtype: float32 I0425 20:31:08.757470 137607144601408 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0425 20:31:08.757485 137607144601408 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0425 20:31:08.757501 137607144601408 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0425 20:31:08.757516 137607144601408 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0425 20:31:08.757532 137607144601408 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0425 20:31:08.757547 137607144601408 
pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0425 20:31:08.757563 137607144601408 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0425 20:31:08.757581 137607144601408 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0425 20:31:08.757607 137607144601408 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0425 20:31:08.757633 137607144601408 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0425 20:31:08.757658 137607144601408 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0425 20:31:08.757683 137607144601408 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0425 20:31:08.757709 137607144601408 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0425 20:31:08.757732 137607144601408 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0425 20:31:08.757748 137607144601408 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0425 20:31:08.757773 137607144601408 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0425 20:31:08.757796 137607144601408 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0425 20:31:08.757814 137607144601408 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0425 20:31:08.757829 137607144601408 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0425 20:31:08.757844 137607144601408 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0425 20:31:08.757862 137607144601408 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0425 20:31:08.757879 137607144601408 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0425 20:31:08.757894 137607144601408 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0425 20:31:08.757914 137607144601408 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False I0425 20:31:08.757939 137607144601408 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0425 20:31:08.757964 137607144601408 pyconfig.py:471] Config 
param z_loss_multiplier: 0.0 I0425 20:31:08.758318 137607144601408 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0425 20:31:08.758358 137607144601408 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0425 20:31:12.413176 137607144601408 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0425 20:31:12.416154 137607144601408 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0425 20:31:12.416264 137607144601408 train_distill.py:596] Applying logical axis rules for model initialization and training... I0425 20:31:12.416337 137607144601408 train_distill.py:600] Loading Student from ... I0425 20:31:12.416364 137607144601408 train_distill.py:169] --- Student Configuration --- I0425 20:31:12.416386 137607144601408 train_distill.py:170] Model Name: gpt3-52k I0425 20:31:12.416407 137607144601408 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 20:31:12.416425 137607144601408 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0425 20:31:12.416443 137607144601408 train_distill.py:175] Vocab Size: 32000 I0425 20:31:12.416461 137607144601408 train_distill.py:176] Checkpoint: I0425 20:31:12.416478 137607144601408 train_distill.py:465] Initializing model: gpt3-52k... I0425 20:31:13.698173 137607144601408 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0425 20:31:13.698279 137607144601408 train_distill.py:169] --- Teacher Configuration --- I0425 20:31:13.698307 137607144601408 train_distill.py:170] Model Name: gpt3-52k I0425 20:31:13.698333 137607144601408 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 20:31:13.698355 137607144601408 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0425 20:31:13.698375 137607144601408 train_distill.py:175] Vocab Size: 32000 I0425 20:31:13.698393 137607144601408 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 20:31:13.698411 137607144601408 train_distill.py:465] Initializing model: gpt3-52k... I0425 20:31:14.837927 137607144601408 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:31:14.838406 137607144601408 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d26715c9eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:31:14.838466 137607144601408 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0425 20:31:15.769386 137607144601408 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0425 20:31:16.335219 2135 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0425 20:31:17.911160 137607144601408 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0425 20:31:21.386642 137607144601408 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0425 20:31:21.387005 137607144601408 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0425 20:31:21.439079 137607144601408 checkpointer.py:318] Finished restoring checkpoint in 4.33 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0425 20:31:22.123501 137607144601408 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0425 20:31:22.188062 137607144601408 config.py:112] TensorFlow version 2.20.0 available. I0425 20:31:22.188589 137607144601408 config.py:125] JAX version 0.8.3 available. E0425 20:31:24.255765 137607144601408 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0425 20:31:24.256014 137607144601408 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0425 20:31:24.259280 137607144601408 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0425 20:31:24.259350 137607144601408 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0425 20:31:24.259433 137607144601408 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:31:24.259527 137607144601408 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d26715c9eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:31:24.259586 137607144601408 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:31:24.259636 137607144601408 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d26715c9eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:31:24.259699 137607144601408 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baf60>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baea0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b9bae10>}, handler_registry=None I0425 
20:31:24.259921 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baf60>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:31:24.259972 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baea0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:31:24.260016 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b9bae10>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:31:24.260063 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0c6b735f70>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0425 20:31:24.260112 137607144601408 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baf60>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baf60>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baea0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9baea0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b9bae10>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b9bae10>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0c6b735f70>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0c6b735f70>}). 
I0425 20:31:24.260535 137607144601408 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7d0e6b976480> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 20:31:27.239557 137607144601408 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints I0425 20:31:27.241731 137607144601408 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7d0e6b9bade0> I0425 20:31:27.241847 137607144601408 pytree_checkpoint_handler.py:577] 
save_device_host_concurrent_bytes=None I0425 20:31:27.241912 137607144601408 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d26715c9eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:31:27.241948 137607144601408 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 20:31:27.241977 137607144601408 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d26715c9eb0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 20:31:27.242013 137607144601408 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0425 20:31:27.242065 137607144601408 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137607144601408 count=1 at 0x7d0fc9121100>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d0e6b9babd0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d0e6b9baba0>, _write_futures=[]) I0425 20:31:27.242437 137607144601408 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137607144601408 count=1 at 0x7d0fc9121100>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d0e6b9babd0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d0e6b9baba0>, _write_futures=[]) I0425 20:31:27.242465 137607144601408 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137607144601408 count=1 at 0x7d0fc9121100>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7d0e6b9babd0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7d0e6b9baba0>, _write_futures=[]) I0425 20:31:27.242496 137607144601408 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9badb0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9b9fa0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665580>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d0e6b665820>}, handler_registry=None I0425 20:31:27.242599 137607144601408 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9badb0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:31:27.242631 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9b9fa0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 20:31:27.242653 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665580>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:31:27.242681 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d0e6b665820>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0425 20:31:27.242704 137607144601408 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665040>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 20:31:27.242727 137607144601408 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9badb0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9badb0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9b9fa0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d0e6b9b9fa0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665580>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665580>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d0e6b665820>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7d0e6b665820>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665040>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d0e6b665040>}). I0425 20:31:27.242796 137607144601408 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7d0e6b9765c0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 20:31:28.473176 137607144601408 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints I0425 20:31:28.475355 137607144601408 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260425_201236/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260425_201236_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7d0e6b9ba3c0> I0425 20:31:28.475761 137607144601408 train_distill.py:691] Starting Distillation Training... I0425 20:31:28.475861 137607144601408 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0425 20:31:29.207936 137607144601408 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0425 20:31:29.209843 137463112660736 grain_pool.py:367] Grain pool will use 1 processes. I0425 20:31:29.236400 137463112660736 grain_pool.py:440] Grain pool will start child processes. I0425 20:31:29.241698 137463112660736 grain_pool.py:448] Grain pool started all child processes. 
2026-04-25 20:31:35.275799: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0425 20:31:38.645317 137607144601408 utils.py:86] Train loop finished in: 9.4367 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0)}
I0425 20:31:38.992609 137463112660736 grain_pool.py:542] Grain pool is exiting.
I0425 20:31:38.992708 137463112660736 grain_pool.py:547] Shutting down multiprocessing system.
I0425 20:31:40.473656 137463112660736 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 20:31:51 UTC 2026
EXIT_CODE=1