XPK Start: Fri Apr 24 20:29:35 UTC 2026
2026-04-24 20:29:52.959036: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0424 20:29:56.983273 139001921824576 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-24 20:30:06,023:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0424 20:30:06.023531 139001921824576 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-24 20:30:06,025:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-cbkf2-slice-job-0-0.mt-07-distill-smoke-cbkf2:8482
I0424 20:30:06.025687 139001921824576 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-cbkf2-slice-job-0-0.mt-07-distill-smoke-cbkf2:8482
I0424 20:30:06.972494 139001921824576 max_utils.py:284] Jax distributed system initialized!
I0424 20:30:13.654056 139001921824576 max_utils.py:244] Jax distributed system is already initialized.
W0424 20:30:13.785633 139001921824576 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 20:30:13.844972 139001921824576 max_utils.py:244] Jax distributed system is already initialized.
I0424 20:30:13.846500 139001921824576 pyconfig.py:471] Config param abort_on_inf_loss: True I0424 20:30:13.846556 139001921824576 pyconfig.py:471] Config param abort_on_nan_loss: True I0424 20:30:13.846583 139001921824576 pyconfig.py:471] Config param act_quantization_calibration_method: absmax I0424 20:30:13.846602 139001921824576 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0 I0424 20:30:13.846623 139001921824576 pyconfig.py:471] Config param activation_function_for_audio: gelu I0424 20:30:13.846641 139001921824576 pyconfig.py:471] Config param activations_in_float32: False I0424 20:30:13.846659 139001921824576 pyconfig.py:471] Config param adam_b1: 0.9 I0424 20:30:13.846678 139001921824576 pyconfig.py:471] Config param adam_b2: 0.95 I0424 20:30:13.846696 139001921824576 pyconfig.py:471] Config param adam_eps: 1e-08 I0424 20:30:13.846719 139001921824576 pyconfig.py:471] Config param adam_eps_root: 0.0 I0424 20:30:13.846735 139001921824576 pyconfig.py:471] Config param adam_weight_decay: 0.1 I0424 20:30:13.846753 139001921824576 pyconfig.py:471] Config param adamw_mask: [] I0424 20:30:13.846768 139001921824576 pyconfig.py:471] Config param add_bos: True I0424 20:30:13.846786 139001921824576 pyconfig.py:471] Config param add_eos: True I0424 20:30:13.846802 139001921824576 pyconfig.py:471] Config param allow_split_physical_axes: False I0424 20:30:13.846817 139001921824576 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3 I0424 20:30:13.846832 139001921824576 pyconfig.py:471] Config param async_checkpointing: True I0424 20:30:13.846848 139001921824576 pyconfig.py:471] Config param async_scheduling: False I0424 20:30:13.846865 139001921824576 pyconfig.py:471] Config param attention: dot_product I0424 20:30:13.846881 139001921824576 pyconfig.py:471] Config param attention_bias: False I0424 20:30:13.846899 139001921824576 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0 I0424 20:30:13.846917 139001921824576 pyconfig.py:471] Config 
param attention_out: RematLocation.REMAT I0424 20:30:13.846939 139001921824576 pyconfig.py:471] Config param attention_output_dim: -1 I0424 20:30:13.846956 139001921824576 pyconfig.py:471] Config param attention_sink: False I0424 20:30:13.846971 139001921824576 pyconfig.py:471] Config param attention_type: global I0424 20:30:13.846987 139001921824576 pyconfig.py:471] Config param attn_logits_soft_cap: None I0424 20:30:13.847004 139001921824576 pyconfig.py:471] Config param audio_path: I0424 20:30:13.847021 139001921824576 pyconfig.py:471] Config param audio_placeholder: <|audio|> I0424 20:30:13.847058 139001921824576 pyconfig.py:471] Config param autoregressive_decode_assert: I0424 20:30:13.847077 139001921824576 pyconfig.py:471] Config param base_config: base.yml I0424 20:30:13.847101 139001921824576 pyconfig.py:471] Config param base_emb_dim: 16 I0424 20:30:13.847119 139001921824576 pyconfig.py:471] Config param base_mlp_dim: 64 I0424 20:30:13.847136 139001921824576 pyconfig.py:471] Config param base_moe_mlp_dim: -1 I0424 20:30:13.847151 139001921824576 pyconfig.py:471] Config param base_num_decoder_layers: 1 I0424 20:30:13.847166 139001921824576 pyconfig.py:471] Config param base_num_kv_heads: 2 I0424 20:30:13.847182 139001921824576 pyconfig.py:471] Config param base_num_query_heads: 2 I0424 20:30:13.847197 139001921824576 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output I0424 20:30:13.847213 139001921824576 pyconfig.py:471] Config param batch_size: 1 I0424 20:30:13.847229 139001921824576 pyconfig.py:471] Config param batch_split_factor: 1 I0424 20:30:13.847245 139001921824576 pyconfig.py:471] Config param beta_fast: 32 I0424 20:30:13.847260 139001921824576 pyconfig.py:471] Config param beta_slow: 1 I0424 20:30:13.847276 139001921824576 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax I0424 20:30:13.847293 139001921824576 pyconfig.py:471] Config param capacity_factor: -1.0 I0424 20:30:13.847310 139001921824576 
pyconfig.py:471] Config param cast_logits_to_fp32: True I0424 20:30:13.847327 139001921824576 pyconfig.py:471] Config param chat_template: I0424 20:30:13.847341 139001921824576 pyconfig.py:471] Config param chat_template_path: I0424 20:30:13.847358 139001921824576 pyconfig.py:471] Config param checkpoint_conversion_fn: None I0424 20:30:13.847375 139001921824576 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/checkpoints/ I0424 20:30:13.847391 139001921824576 pyconfig.py:471] Config param checkpoint_is_quantized: False I0424 20:30:13.847407 139001921824576 pyconfig.py:471] Config param checkpoint_period: 2000 I0424 20:30:13.847423 139001921824576 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96 I0424 20:30:13.847438 139001921824576 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0424 20:30:13.847454 139001921824576 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True I0424 20:30:13.847469 139001921824576 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True I0424 20:30:13.847485 139001921824576 pyconfig.py:471] Config param checkpoint_todelete_full_path: None I0424 20:30:13.847500 139001921824576 pyconfig.py:471] Config param checkpoint_todelete_subdir: None I0424 20:30:13.847520 139001921824576 pyconfig.py:471] Config param chips_per_vm: 4 I0424 20:30:13.847535 139001921824576 pyconfig.py:471] Config param chunk_attn_window_size: 0 I0424 20:30:13.847550 139001921824576 pyconfig.py:471] Config param collect_stack_trace: False I0424 20:30:13.847565 139001921824576 pyconfig.py:471] Config param colocated_python_checkpointing: False I0424 20:30:13.847581 139001921824576 pyconfig.py:471] Config param colocated_python_data_input: False I0424 20:30:13.847595 139001921824576 pyconfig.py:471] Config param compile_topology: I0424 20:30:13.847610 139001921824576 pyconfig.py:471] Config param compile_topology_num_slices: -1 I0424 20:30:13.847626 
139001921824576 pyconfig.py:471] Config param compile_xla_flags: I0424 20:30:13.847642 139001921824576 pyconfig.py:471] Config param compiled_trainstep_file: I0424 20:30:13.847656 139001921824576 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3 I0424 20:30:13.847672 139001921824576 pyconfig.py:471] Config param constant_bound_config: [] I0424 20:30:13.847687 139001921824576 pyconfig.py:471] Config param context: RematLocation.REMAT I0424 20:30:13.847702 139001921824576 pyconfig.py:471] Config param context_parallel_load_balance: True I0424 20:30:13.847718 139001921824576 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO I0424 20:30:13.847735 139001921824576 pyconfig.py:471] Config param context_parallel_size: 1 I0424 20:30:13.847751 139001921824576 pyconfig.py:471] Config param context_parallel_strategy: all_gather I0424 20:30:13.847766 139001921824576 pyconfig.py:471] Config param context_sharding: context I0424 20:30:13.847781 139001921824576 pyconfig.py:471] Config param conv_chunksize_for_audio: 500 I0424 20:30:13.847796 139001921824576 pyconfig.py:471] Config param conv_stride_for_vit: 14 I0424 20:30:13.847812 139001921824576 pyconfig.py:471] Config param convert_checkpoint_if_possible: False I0424 20:30:13.847826 139001921824576 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1 I0424 20:30:13.847841 139001921824576 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1 I0424 20:30:13.847856 139001921824576 pyconfig.py:471] Config param custom_mesh: I0424 20:30:13.847872 139001921824576 pyconfig.py:471] Config param custom_mesh_and_rule: I0424 20:30:13.847887 139001921824576 pyconfig.py:471] Config param d_model_for_audio: 256 I0424 20:30:13.847901 139001921824576 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0424 20:30:13.847921 
139001921824576 pyconfig.py:471] Config param data_shuffle_seed: 0 I0424 20:30:13.847935 139001921824576 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1 I0424 20:30:13.847951 139001921824576 pyconfig.py:471] Config param dataset_path: I0424 20:30:13.847966 139001921824576 pyconfig.py:471] Config param dataset_type: DatasetType.HF I0424 20:30:13.847983 139001921824576 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1 I0424 20:30:13.847999 139001921824576 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1 I0424 20:30:13.848013 139001921824576 pyconfig.py:471] Config param dcn_context_parallelism: 1 I0424 20:30:13.848029 139001921824576 pyconfig.py:471] Config param dcn_data_parallelism: -1 I0424 20:30:13.848045 139001921824576 pyconfig.py:471] Config param dcn_diloco_parallelism: 1 I0424 20:30:13.848059 139001921824576 pyconfig.py:471] Config param dcn_expert_parallelism: 1 I0424 20:30:13.848073 139001921824576 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1 I0424 20:30:13.848089 139001921824576 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1 I0424 20:30:13.848124 139001921824576 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0424 20:30:13.848142 139001921824576 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1 I0424 20:30:13.848157 139001921824576 pyconfig.py:471] Config param dcn_sequence_parallelism: 1 I0424 20:30:13.848173 139001921824576 pyconfig.py:471] Config param dcn_tensor_parallelism: 1 I0424 20:30:13.848188 139001921824576 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1 I0424 20:30:13.848204 139001921824576 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1 I0424 20:30:13.848219 139001921824576 pyconfig.py:471] Config param debug: {'rl': False} I0424 20:30:13.848235 139001921824576 pyconfig.py:471] Config param debug_sharding: False I0424 20:30:13.848250 139001921824576 pyconfig.py:471] Config param 
decode_sampling_nucleus_p: -1 I0424 20:30:13.848265 139001921824576 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0424 20:30:13.848283 139001921824576 pyconfig.py:471] Config param decode_sampling_temperature: 1.0 I0424 20:30:13.848299 139001921824576 pyconfig.py:471] Config param decode_sampling_top_k: 0 I0424 20:30:13.848314 139001921824576 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3 I0424 20:30:13.848332 139001921824576 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE I0424 20:30:13.848348 139001921824576 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: [] I0424 20:30:13.848363 139001921824576 pyconfig.py:471] Config param degenerate_group_masking: True I0424 20:30:13.848379 139001921824576 pyconfig.py:471] Config param dense_init_scale: 1.0 I0424 20:30:13.848393 139001921824576 pyconfig.py:471] Config param diloco_outer_lr: 0.3 I0424 20:30:13.848410 139001921824576 pyconfig.py:471] Config param diloco_outer_momentum: 0.9 I0424 20:30:13.848424 139001921824576 pyconfig.py:471] Config param diloco_sync_period: 36 I0424 20:30:13.848440 139001921824576 pyconfig.py:471] Config param distill_alpha: 0.5 I0424 20:30:13.848456 139001921824576 pyconfig.py:471] Config param distill_alpha_end: None I0424 20:30:13.848472 139001921824576 pyconfig.py:471] Config param distill_alpha_schedule: constant I0424 20:30:13.848486 139001921824576 pyconfig.py:471] Config param distill_beta: 0.0 I0424 20:30:13.848505 139001921824576 pyconfig.py:471] Config param distill_beta_end: None I0424 20:30:13.848520 139001921824576 pyconfig.py:471] Config param distill_beta_schedule: constant I0424 20:30:13.848534 139001921824576 pyconfig.py:471] Config param distill_feature_loss_type: cosine I0424 20:30:13.848551 139001921824576 pyconfig.py:471] Config param distill_layer_indices: None I0424 20:30:13.848566 139001921824576 pyconfig.py:471] Config param distill_temperature: 1.0 I0424 20:30:13.848581 
139001921824576 pyconfig.py:471] Config param distill_temperature_end: None I0424 20:30:13.848595 139001921824576 pyconfig.py:471] Config param distill_temperature_schedule: constant I0424 20:30:13.848611 139001921824576 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256 I0424 20:30:13.848625 139001921824576 pyconfig.py:471] Config param dpo_beta: 0.1 I0424 20:30:13.848642 139001921824576 pyconfig.py:471] Config param dpo_label_smoothing: 0.0 I0424 20:30:13.848656 139001921824576 pyconfig.py:471] Config param dq_reduction_steps: 0 I0424 20:30:13.848671 139001921824576 pyconfig.py:471] Config param dropout_rate: 0.0 I0424 20:30:13.848686 139001921824576 pyconfig.py:471] Config param dtype: bfloat16 I0424 20:30:13.848716 139001921824576 pyconfig.py:471] Config param dtype_mm: float32 I0424 20:30:13.848733 139001921824576 pyconfig.py:471] Config param dump_hlo: False I0424 20:30:13.848748 139001921824576 pyconfig.py:471] Config param dump_hlo_delete_local_after: True I0424 20:30:13.848764 139001921824576 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/xla_dump I0424 20:30:13.848779 139001921824576 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0424 20:30:13.848795 139001921824576 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step I0424 20:30:13.848810 139001921824576 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step I0424 20:30:13.848826 139001921824576 pyconfig.py:471] Config param dump_hlo_upload_all: False I0424 20:30:13.848842 139001921824576 pyconfig.py:471] Config param dump_hlo_xla_flags: I0424 20:30:13.848856 139001921824576 pyconfig.py:471] Config param dump_jaxpr: False I0424 20:30:13.848872 139001921824576 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True I0424 20:30:13.848888 139001921824576 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/jaxpr_dump I0424 20:30:13.848902 
139001921824576 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0424 20:30:13.848918 139001921824576 pyconfig.py:471] Config param dump_step: -1 I0424 20:30:13.848934 139001921824576 pyconfig.py:471] Config param elastic_enabled: False I0424 20:30:13.848948 139001921824576 pyconfig.py:471] Config param elastic_max_retries: 10 I0424 20:30:13.848965 139001921824576 pyconfig.py:471] Config param elastic_timeout_seconds: 300 I0424 20:30:13.848980 139001921824576 pyconfig.py:471] Config param emb_dim: 16 I0424 20:30:13.848996 139001921824576 pyconfig.py:471] Config param enable_autocheckpoint: False I0424 20:30:13.849011 139001921824576 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False I0424 20:30:13.849026 139001921824576 pyconfig.py:471] Config param enable_checkpointing: True I0424 20:30:13.849042 139001921824576 pyconfig.py:471] Config param enable_continuous_checkpointing: False I0424 20:30:13.849056 139001921824576 pyconfig.py:471] Config param enable_data_shuffling: True I0424 20:30:13.849072 139001921824576 pyconfig.py:471] Config param enable_diloco: False I0424 20:30:13.849102 139001921824576 pyconfig.py:471] Config param enable_dp_attention: False I0424 20:30:13.849119 139001921824576 pyconfig.py:471] Config param enable_dropout: False I0424 20:30:13.849135 139001921824576 pyconfig.py:471] Config param enable_emergency_checkpoint: False I0424 20:30:13.849149 139001921824576 pyconfig.py:471] Config param enable_expert_parallel: False I0424 20:30:13.849165 139001921824576 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True I0424 20:30:13.849180 139001921824576 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True I0424 20:30:13.849195 139001921824576 pyconfig.py:471] Config param enable_goodput_recording: False I0424 20:30:13.849210 139001921824576 pyconfig.py:471] Config param enable_jax_profiler: False I0424 20:30:13.849225 139001921824576 pyconfig.py:471] Config param 
enable_llm_inference_pool: False I0424 20:30:13.849241 139001921824576 pyconfig.py:471] Config param enable_model_warmup: False I0424 20:30:13.849255 139001921824576 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False I0424 20:30:13.849271 139001921824576 pyconfig.py:471] Config param enable_nnx: False I0424 20:30:13.849285 139001921824576 pyconfig.py:471] Config param enable_orbax_v1: False I0424 20:30:13.849301 139001921824576 pyconfig.py:471] Config param enable_padding_causal_mask: True I0424 20:30:13.849315 139001921824576 pyconfig.py:471] Config param enable_pathways_goodput: False I0424 20:30:13.849331 139001921824576 pyconfig.py:471] Config param enable_prefix_caching: False I0424 20:30:13.849345 139001921824576 pyconfig.py:471] Config param enable_rampup_batch_size: False I0424 20:30:13.849361 139001921824576 pyconfig.py:471] Config param enable_single_controller: False I0424 20:30:13.849375 139001921824576 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False I0424 20:30:13.849391 139001921824576 pyconfig.py:471] Config param enable_tensorboard: True I0424 20:30:13.849406 139001921824576 pyconfig.py:471] Config param enable_tunix_perf_metrics: False I0424 20:30:13.849421 139001921824576 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4 I0424 20:30:13.849435 139001921824576 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512 I0424 20:30:13.849450 139001921824576 pyconfig.py:471] Config param encoder_layers_for_audio: 2 I0424 20:30:13.849465 139001921824576 pyconfig.py:471] Config param engram: RematLocation.REMAT I0424 20:30:13.849481 139001921824576 pyconfig.py:471] Config param engram_head_dim: 1280 I0424 20:30:13.849497 139001921824576 pyconfig.py:471] Config param engram_kernel_size: 4 I0424 20:30:13.849514 139001921824576 pyconfig.py:471] Config param engram_layers: [] I0424 20:30:13.849530 139001921824576 pyconfig.py:471] Config param engram_max_ngram_size: 3 I0424 20:30:13.849545 
139001921824576 pyconfig.py:471] Config param engram_num_heads: 8 I0424 20:30:13.849559 139001921824576 pyconfig.py:471] Config param engram_seed: 0 I0424 20:30:13.849575 139001921824576 pyconfig.py:471] Config param engram_vocab_bases: [] I0424 20:30:13.849590 139001921824576 pyconfig.py:471] Config param epsilon_high: None I0424 20:30:13.849605 139001921824576 pyconfig.py:471] Config param eval_corr_lst: False I0424 20:30:13.849619 139001921824576 pyconfig.py:471] Config param eval_data_columns: ['text'] I0424 20:30:13.849635 139001921824576 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1 I0424 20:30:13.849650 139001921824576 pyconfig.py:471] Config param eval_image_column: image I0424 20:30:13.849664 139001921824576 pyconfig.py:471] Config param eval_interval: -1 I0424 20:30:13.849680 139001921824576 pyconfig.py:471] Config param eval_make_lst: False I0424 20:30:13.849695 139001921824576 pyconfig.py:471] Config param eval_per_device_batch_size: 2 I0424 20:30:13.849711 139001921824576 pyconfig.py:471] Config param eval_sampling_strategy: greedy I0424 20:30:13.849727 139001921824576 pyconfig.py:471] Config param eval_split: validation I0424 20:30:13.849741 139001921824576 pyconfig.py:471] Config param eval_steps: -1 I0424 20:30:13.849757 139001921824576 pyconfig.py:471] Config param expansion_factor_real_data: -1.0 I0424 20:30:13.849772 139001921824576 pyconfig.py:471] Config param final_logits_soft_cap: None I0424 20:30:13.849788 139001921824576 pyconfig.py:471] Config param first_num_dense_layers: 0 I0424 20:30:13.849802 139001921824576 pyconfig.py:471] Config param float32_gate_logits: False I0424 20:30:13.849818 139001921824576 pyconfig.py:471] Config param float32_logits: False I0424 20:30:13.849832 139001921824576 pyconfig.py:471] Config param float32_qk_product: False I0424 20:30:13.849847 139001921824576 pyconfig.py:471] Config param float32_weight_sum: True I0424 20:30:13.849862 139001921824576 pyconfig.py:471] Config param force_q_layout: 
False I0424 20:30:13.849876 139001921824576 pyconfig.py:471] Config param force_unroll: False I0424 20:30:13.849892 139001921824576 pyconfig.py:471] Config param freeze_audio_encoder_params: True I0424 20:30:13.849906 139001921824576 pyconfig.py:471] Config param freeze_vision_encoder_params: True I0424 20:30:13.849922 139001921824576 pyconfig.py:471] Config param fused_mlp: False I0424 20:30:13.849937 139001921824576 pyconfig.py:471] Config param fused_qkv: True I0424 20:30:13.849953 139001921824576 pyconfig.py:471] Config param gcs_metrics: False I0424 20:30:13.849967 139001921824576 pyconfig.py:471] Config param gdn_chunk_size: 64 I0424 20:30:13.849984 139001921824576 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4 I0424 20:30:13.849998 139001921824576 pyconfig.py:471] Config param gdn_key_head_dim: 128 I0424 20:30:13.850014 139001921824576 pyconfig.py:471] Config param gdn_num_key_heads: 16 I0424 20:30:13.850028 139001921824576 pyconfig.py:471] Config param gdn_num_value_heads: 32 I0424 20:30:13.850044 139001921824576 pyconfig.py:471] Config param gdn_value_head_dim: 128 I0424 20:30:13.850060 139001921824576 pyconfig.py:471] Config param generate_padding_batch_eval: False I0424 20:30:13.850075 139001921824576 pyconfig.py:471] Config param generate_padding_batch_train: False I0424 20:30:13.850091 139001921824576 pyconfig.py:471] Config param generate_slice: v5e-16 I0424 20:30:13.850115 139001921824576 pyconfig.py:471] Config param generation_configs: {} I0424 20:30:13.850130 139001921824576 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64 I0424 20:30:13.850145 139001921824576 pyconfig.py:471] Config param global_batch_size_to_load: 512 I0424 20:30:13.850160 139001921824576 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64 I0424 20:30:13.850175 139001921824576 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0424 20:30:13.850191 139001921824576 pyconfig.py:471] Config param global_batch_size_to_load_start: 
None I0424 20:30:13.850205 139001921824576 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0424 20:30:13.850221 139001921824576 pyconfig.py:471] Config param global_head_dim: 0 I0424 20:30:13.850236 139001921824576 pyconfig.py:471] Config param global_num_kv_heads: 0 I0424 20:30:13.850251 139001921824576 pyconfig.py:471] Config param global_parameter_scale: 1 I0424 20:30:13.850267 139001921824576 pyconfig.py:471] Config param global_rampup_samples: 500 I0424 20:30:13.850284 139001921824576 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0424 20:30:13.850298 139001921824576 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0424 20:30:13.850315 139001921824576 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0424 20:30:13.850331 139001921824576 pyconfig.py:471] Config param grad_dtype: float32 I0424 20:30:13.850365 139001921824576 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0424 20:30:13.850381 139001921824576 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0424 20:30:13.850397 139001921824576 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0424 20:30:13.850413 139001921824576 pyconfig.py:471] Config param grain_eval_files: I0424 20:30:13.850428 139001921824576 pyconfig.py:471] Config param grain_file_type: arrayrecord I0424 20:30:13.850444 139001921824576 pyconfig.py:471] Config param grain_num_threads: 16 I0424 20:30:13.850459 139001921824576 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0424 20:30:13.850474 139001921824576 pyconfig.py:471] Config param grain_packing_type: first_fit I0424 20:30:13.850490 139001921824576 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0424 20:30:13.850507 139001921824576 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1 I0424 20:30:13.850524 139001921824576 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0424 20:30:13.850538 139001921824576 pyconfig.py:471] Config 
param grain_prefetch_buffer_size_eval: 500 I0424 20:30:13.850554 139001921824576 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0424 20:30:13.850569 139001921824576 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0424 20:30:13.850584 139001921824576 pyconfig.py:471] Config param grain_train_files: I0424 20:30:13.850599 139001921824576 pyconfig.py:471] Config param grain_train_mixture_config_path: I0424 20:30:13.850614 139001921824576 pyconfig.py:471] Config param grain_worker_count: 1 I0424 20:30:13.850629 139001921824576 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0424 20:30:13.850644 139001921824576 pyconfig.py:471] Config param grpo_beta: 0.08 I0424 20:30:13.850659 139001921824576 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0424 20:30:13.850675 139001921824576 pyconfig.py:471] Config param hardware: tpu I0424 20:30:13.850691 139001921824576 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0424 20:30:13.850706 139001921824576 pyconfig.py:471] Config param head_dim: 8 I0424 20:30:13.850721 139001921824576 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0424 20:30:13.850736 139001921824576 pyconfig.py:471] Config param hf_data_dir: None I0424 20:30:13.850752 139001921824576 pyconfig.py:471] Config param hf_eval_files: None I0424 20:30:13.850766 139001921824576 pyconfig.py:471] Config param hf_eval_split: None I0424 20:30:13.850781 139001921824576 pyconfig.py:471] Config param hf_name: None I0424 20:30:13.850796 139001921824576 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0424 20:30:13.850811 139001921824576 pyconfig.py:471] Config param hf_train_files: None I0424 20:30:13.850826 139001921824576 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0424 20:30:13.850842 139001921824576 pyconfig.py:471] Config param hide_profiler_step_metric: False I0424 20:30:13.850857 139001921824576 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0424 20:30:13.850872 
139001921824576 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0424 20:30:13.850886 139001921824576 pyconfig.py:471] Config param ici_context_parallelism: 1 I0424 20:30:13.850902 139001921824576 pyconfig.py:471] Config param ici_data_parallelism: 1 I0424 20:30:13.850917 139001921824576 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0424 20:30:13.850931 139001921824576 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0424 20:30:13.850947 139001921824576 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0424 20:30:13.850962 139001921824576 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0424 20:30:13.850977 139001921824576 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0424 20:30:13.850994 139001921824576 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0424 20:30:13.851009 139001921824576 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0424 20:30:13.851024 139001921824576 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0424 20:30:13.851039 139001921824576 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0424 20:30:13.851054 139001921824576 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0424 20:30:13.851069 139001921824576 pyconfig.py:471] Config param image_path: I0424 20:30:13.851084 139001921824576 pyconfig.py:471] Config param image_placeholder: <|image|> I0424 20:30:13.851108 139001921824576 pyconfig.py:471] Config param image_size_for_vit: 896 I0424 20:30:13.851122 139001921824576 pyconfig.py:471] Config param indexer_head_dim: 128 I0424 20:30:13.851138 139001921824576 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0424 20:30:13.851154 139001921824576 pyconfig.py:471] Config param indexer_n_heads: 64 I0424 20:30:13.851169 139001921824576 pyconfig.py:471] Config param indexer_sparse_training: False I0424 20:30:13.851184 139001921824576 pyconfig.py:471] Config param 
indexer_topk: 2048 I0424 20:30:13.851200 139001921824576 pyconfig.py:471] Config param inference_benchmark_test: False I0424 20:30:13.851214 139001921824576 pyconfig.py:471] Config param inference_metadata_file: I0424 20:30:13.851230 139001921824576 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0424 20:30:13.851244 139001921824576 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0424 20:30:13.851260 139001921824576 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0424 20:30:13.851274 139001921824576 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0424 20:30:13.851290 139001921824576 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0424 20:30:13.851305 139001921824576 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0424 20:30:13.851320 139001921824576 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0424 20:30:13.851336 139001921824576 pyconfig.py:471] Config param init_weights_seed: 0 I0424 20:30:13.851351 139001921824576 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0424 20:30:13.851367 139001921824576 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0424 20:30:13.851382 139001921824576 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0424 20:30:13.851398 139001921824576 pyconfig.py:471] Config param internal_compile: False I0424 20:30:13.851413 139001921824576 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0424 20:30:13.851428 139001921824576 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0424 20:30:13.851444 139001921824576 pyconfig.py:471] Config param jax_debug_log_modules: I0424 20:30:13.851459 139001921824576 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0424 20:30:13.851475 139001921824576 
pyconfig.py:471] Config param jax_profiler_port: 9999 I0424 20:30:13.851490 139001921824576 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0424 20:30:13.851510 139001921824576 pyconfig.py:471] Config param kv_cache_buffer: 256 I0424 20:30:13.851526 139001921824576 pyconfig.py:471] Config param kv_lora_rank: 512 I0424 20:30:13.851541 139001921824576 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0424 20:30:13.851558 139001921824576 pyconfig.py:471] Config param kv_quant_dtype: int8 I0424 20:30:13.851574 139001921824576 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0424 20:30:13.851590 139001921824576 pyconfig.py:471] Config param learning_rate: 0.0002 I0424 20:30:13.851606 139001921824576 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0424 20:30:13.851621 139001921824576 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0424 20:30:13.851636 139001921824576 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0424 20:30:13.851652 139001921824576 pyconfig.py:471] Config param load_checkpoint_only_once: False I0424 20:30:13.851667 139001921824576 pyconfig.py:471] Config param load_from_prefill_dir: False I0424 20:30:13.851681 139001921824576 pyconfig.py:471] Config param load_full_state_path: I0424 20:30:13.851697 139001921824576 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0424 20:30:13.851713 139001921824576 pyconfig.py:471] Config param local_checkpoint_directory: I0424 20:30:13.851729 139001921824576 pyconfig.py:471] Config param local_checkpoint_period: 0 I0424 20:30:13.851745 139001921824576 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0424 20:30:13.851759 139001921824576 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0424 20:30:13.851775 139001921824576 pyconfig.py:471] Config param log_config: True I0424 20:30:13.851790 
139001921824576 pyconfig.py:471] Config param log_period: 10 I0424 20:30:13.851805 139001921824576 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', 
('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), 
('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0424 20:30:13.851880 139001921824576 pyconfig.py:471] Config param logits_dot_in_fp32: False I0424 20:30:13.851896 139001921824576 pyconfig.py:471] Config param logits_via_embedding: True I0424 20:30:13.851912 139001921824576 pyconfig.py:471] Config param lora_input_adapters_path: I0424 20:30:13.851926 139001921824576 pyconfig.py:471] Config param loss_algo: grpo I0424 20:30:13.851942 139001921824576 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0424 20:30:13.851960 139001921824576 pyconfig.py:471] Config param managed_mldiagnostics: False I0424 20:30:13.851975 139001921824576 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/managed-mldiagnostics I0424 20:30:13.851991 139001921824576 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0424 20:30:13.852007 139001921824576 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0424 20:30:13.852025 139001921824576 pyconfig.py:471] Config param max_checkify: False I0424 20:30:13.852041 139001921824576 pyconfig.py:471] Config param max_concurrency: 256 I0424 20:30:13.852055 
139001921824576 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0424 20:30:13.852071 139001921824576 pyconfig.py:471] Config param max_num_batched_tokens: None I0424 20:30:13.852086 139001921824576 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0424 20:30:13.852111 139001921824576 pyconfig.py:471] Config param max_num_images_per_example: -1 I0424 20:30:13.852125 139001921824576 pyconfig.py:471] Config param max_num_seqs: None I0424 20:30:13.852141 139001921824576 pyconfig.py:471] Config param max_position_embeddings: 163840 I0424 20:30:13.852157 139001921824576 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0424 20:30:13.852172 139001921824576 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0424 20:30:13.852187 139001921824576 pyconfig.py:471] Config param max_segments_per_seq: -1 I0424 20:30:13.852203 139001921824576 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0424 20:30:13.852218 139001921824576 pyconfig.py:471] Config param max_target_length: 2048 I0424 20:30:13.852233 139001921824576 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0424 20:30:13.852249 139001921824576 pyconfig.py:471] Config param megablox: True I0424 20:30:13.852264 139001921824576 pyconfig.py:471] Config param merge_gating_gmm: False I0424 20:30:13.852280 139001921824576 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0424 20:30:13.852298 139001921824576 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/metrics/ I0424 20:30:13.852313 139001921824576 pyconfig.py:471] Config param metrics_file: I0424 20:30:13.852329 139001921824576 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0424 20:30:13.852344 139001921824576 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0424 
20:30:13.852359 139001921824576 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0424 20:30:13.852375 139001921824576 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0424 20:30:13.852390 139001921824576 pyconfig.py:471] Config param mla_naive_kvcache: True I0424 20:30:13.852406 139001921824576 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0424 20:30:13.852421 139001921824576 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0424 20:30:13.852437 139001921824576 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0424 20:30:13.852452 139001921824576 pyconfig.py:471] Config param mlp_bias: False I0424 20:30:13.852468 139001921824576 pyconfig.py:471] Config param mlp_dim: 64 I0424 20:30:13.852483 139001921824576 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0424 20:30:13.852499 139001921824576 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0424 20:30:13.852518 139001921824576 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0424 20:30:13.852535 139001921824576 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0424 20:30:13.852550 139001921824576 pyconfig.py:471] Config param moba: False I0424 20:30:13.852566 139001921824576 pyconfig.py:471] Config param moba_chunk_size: 1024 I0424 20:30:13.852581 139001921824576 pyconfig.py:471] Config param moba_topk: 8 I0424 20:30:13.852596 139001921824576 pyconfig.py:471] Config param model_call_mode: I0424 20:30:13.852612 139001921824576 pyconfig.py:471] Config param model_name: gpt3-52k I0424 20:30:13.852628 139001921824576 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0424 20:30:13.852643 139001921824576 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0424 20:30:13.852659 139001921824576 pyconfig.py:471] Config param moe_mlp_dim: -1 I0424 20:30:13.852675 139001921824576 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0424 20:30:13.852691 139001921824576 pyconfig.py:471] Config param 
moe_mlpwi_1: RematLocation.REMAT I0424 20:30:13.852705 139001921824576 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0424 20:30:13.852721 139001921824576 pyconfig.py:471] Config param monitor_goodput: False I0424 20:30:13.852736 139001921824576 pyconfig.py:471] Config param monitor_step_time_deviation: True I0424 20:30:13.852752 139001921824576 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0424 20:30:13.852767 139001921824576 pyconfig.py:471] Config param mscale: 1.0 I0424 20:30:13.852783 139001921824576 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0424 20:30:13.852799 139001921824576 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0424 20:30:13.852814 139001921824576 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0424 20:30:13.852830 139001921824576 pyconfig.py:471] Config param mtp_num_layers: 0 I0424 20:30:13.852846 139001921824576 pyconfig.py:471] Config param mu_dtype: float32 I0424 20:30:13.852869 139001921824576 pyconfig.py:471] Config param multi_sampling: False I0424 20:30:13.852885 139001921824576 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0424 20:30:13.852901 139001921824576 pyconfig.py:471] Config param muon_beta: 0.95 I0424 20:30:13.852916 139001921824576 pyconfig.py:471] Config param muon_consistent_rms: None I0424 20:30:13.852932 139001921824576 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0424 20:30:13.852947 139001921824576 pyconfig.py:471] Config param n_routing_groups: -1 I0424 20:30:13.852962 139001921824576 pyconfig.py:471] Config param n_window_for_audio: 50 I0424 20:30:13.852978 139001921824576 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0424 20:30:13.852993 139001921824576 pyconfig.py:471] Config param nope_layer_interval: -1 I0424 20:30:13.853008 139001921824576 pyconfig.py:471] Config param norm_topk_prob: False I0424 20:30:13.853023 139001921824576 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 
I0424 20:30:13.853041 139001921824576 pyconfig.py:471] Config param normalize_embedding_logits: False I0424 20:30:13.853057 139001921824576 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0424 20:30:13.853073 139001921824576 pyconfig.py:471] Config param num_batches: 4 I0424 20:30:13.853088 139001921824576 pyconfig.py:471] Config param num_channels_for_vit: 3 I0424 20:30:13.853111 139001921824576 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0424 20:30:13.853126 139001921824576 pyconfig.py:471] Config param num_decoder_layers: 1 I0424 20:30:13.853141 139001921824576 pyconfig.py:471] Config param num_diloco_replicas: 1 I0424 20:30:13.853157 139001921824576 pyconfig.py:471] Config param num_epoch: 1 I0424 20:30:13.853171 139001921824576 pyconfig.py:471] Config param num_eval_passes: 1 I0424 20:30:13.853187 139001921824576 pyconfig.py:471] Config param num_experts: 1 I0424 20:30:13.853201 139001921824576 pyconfig.py:471] Config param num_experts_per_tok: 1 I0424 20:30:13.853217 139001921824576 pyconfig.py:471] Config param num_generations: 2 I0424 20:30:13.853231 139001921824576 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0424 20:30:13.853246 139001921824576 pyconfig.py:471] Config param num_iterations: 1 I0424 20:30:13.853261 139001921824576 pyconfig.py:471] Config param num_kv_heads: 2 I0424 20:30:13.853277 139001921824576 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0424 20:30:13.853291 139001921824576 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0424 20:30:13.853306 139001921824576 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0424 20:30:13.853321 139001921824576 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0424 20:30:13.853337 139001921824576 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0424 20:30:13.853351 139001921824576 pyconfig.py:471] Config param num_query_heads: 2 I0424 20:30:13.853367 139001921824576 pyconfig.py:471] Config 
param num_samplers_slices: -1 I0424 20:30:13.853381 139001921824576 pyconfig.py:471] Config param num_slices: 1 I0424 20:30:13.853397 139001921824576 pyconfig.py:471] Config param num_target_devices: 32 I0424 20:30:13.853412 139001921824576 pyconfig.py:471] Config param num_test_batches: 5 I0424 20:30:13.853427 139001921824576 pyconfig.py:471] Config param num_trainer_slices: -1 I0424 20:30:13.853443 139001921824576 pyconfig.py:471] Config param num_vocab_tiling: 1 I0424 20:30:13.853459 139001921824576 pyconfig.py:471] Config param off_policy_steps: 0 I0424 20:30:13.853473 139001921824576 pyconfig.py:471] Config param offline_data_dir: None I0424 20:30:13.853489 139001921824576 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0424 20:30:13.853508 139001921824576 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0424 20:30:13.853524 139001921824576 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0424 20:30:13.853539 139001921824576 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0424 20:30:13.853554 139001921824576 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0424 20:30:13.853569 139001921824576 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0424 20:30:13.853584 139001921824576 pyconfig.py:471] Config param output_dim_for_audio: 512 I0424 20:30:13.853600 139001921824576 pyconfig.py:471] Config param override_logical_axis_rules: False I0424 20:30:13.853614 139001921824576 pyconfig.py:471] Config param override_model_config: True I0424 20:30:13.853630 139001921824576 pyconfig.py:471] Config param packing: True I0424 20:30:13.853644 139001921824576 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0424 20:30:13.853659 139001921824576 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0424 20:30:13.853674 139001921824576 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0424 20:30:13.853689 139001921824576 pyconfig.py:471] Config param 
pagedattn_pages_per_compute_block: 4 I0424 20:30:13.853704 139001921824576 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0424 20:30:13.853718 139001921824576 pyconfig.py:471] Config param param_scan_axis: 1 I0424 20:30:13.853733 139001921824576 pyconfig.py:471] Config param parameter_memory_host_offload: False I0424 20:30:13.853748 139001921824576 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0424 20:30:13.853763 139001921824576 pyconfig.py:471] Config param patch_size_for_vit: 14 I0424 20:30:13.853779 139001921824576 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0424 20:30:13.853794 139001921824576 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0424 20:30:13.853810 139001921824576 pyconfig.py:471] Config param per_device_batch_size: 2 I0424 20:30:13.853827 139001921824576 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0424 20:30:13.853843 139001921824576 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0424 20:30:13.853858 139001921824576 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0424 20:30:13.853873 139001921824576 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0424 20:30:13.853889 139001921824576 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0424 20:30:13.853903 139001921824576 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0424 20:30:13.853919 139001921824576 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0424 20:30:13.853934 139001921824576 pyconfig.py:471] Config param posemb_type_for_vit: learn I0424 20:30:13.853949 139001921824576 pyconfig.py:471] Config param position_id_per_seconds: 25 I0424 20:30:13.853964 139001921824576 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0424 20:30:13.853979 139001921824576 pyconfig.py:471] Config param prefill_cache_dir: I0424 20:30:13.853994 139001921824576 pyconfig.py:471] Config param prefill_chunk_size: 256 I0424 
20:30:13.854009 139001921824576 pyconfig.py:471] Config param prefill_slice: v5e-16 I0424 20:30:13.854024 139001921824576 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0424 20:30:13.854039 139001921824576 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0424 20:30:13.854054 139001921824576 pyconfig.py:471] Config param profile_cleanly: True I0424 20:30:13.854069 139001921824576 pyconfig.py:471] Config param profile_periodically_period: -1 I0424 20:30:13.854085 139001921824576 pyconfig.py:471] Config param profile_power_events: False I0424 20:30:13.854111 139001921824576 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0424 20:30:13.854128 139001921824576 pyconfig.py:471] Config param profiler_steps: 5 I0424 20:30:13.854144 139001921824576 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0424 20:30:13.854158 139001921824576 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0424 20:30:13.854173 139001921824576 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0424 20:30:13.854189 139001921824576 pyconfig.py:471] Config param prometheus_port: 0 I0424 20:30:13.854203 139001921824576 pyconfig.py:471] Config param prompt: I love to I0424 20:30:13.854219 139001921824576 pyconfig.py:471] Config param pure_nnx: False I0424 20:30:13.854234 139001921824576 pyconfig.py:471] Config param pure_nnx_decoder: False I0424 20:30:13.854249 139001921824576 pyconfig.py:471] Config param q_lora_rank: 0 I0424 20:30:13.854264 139001921824576 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0424 20:30:13.854279 139001921824576 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0424 20:30:13.854298 139001921824576 pyconfig.py:471] Config param qk_norm_with_scale: True I0424 20:30:13.854314 139001921824576 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0424 20:30:13.854330 139001921824576 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0424 20:30:13.854345 
139001921824576 pyconfig.py:471] Config param quant_cfg_path: I0424 20:30:13.854360 139001921824576 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0424 20:30:13.854378 139001921824576 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0424 20:30:13.854393 139001921824576 pyconfig.py:471] Config param quantize_kvcache: False I0424 20:30:13.854408 139001921824576 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0424 20:30:13.854423 139001921824576 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0424 20:30:13.854439 139001921824576 pyconfig.py:471] Config param ragged_block_size: 256 I0424 20:30:13.854454 139001921824576 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0424 20:30:13.854470 139001921824576 pyconfig.py:471] Config param rampup_end_step: 0 I0424 20:30:13.854485 139001921824576 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0424 20:30:13.854500 139001921824576 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0424 20:30:13.854519 139001921824576 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0424 20:30:13.854534 139001921824576 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0424 20:30:13.854550 139001921824576 pyconfig.py:471] Config param remat_policy: full I0424 20:30:13.854565 139001921824576 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0424 20:30:13.854580 139001921824576 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0424 20:30:13.854594 139001921824576 pyconfig.py:471] Config param replicate_quant_scale: False I0424 20:30:13.854610 139001921824576 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0424 20:30:13.854624 139001921824576 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0424 20:30:13.854640 139001921824576 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0424 
20:30:13.854654 139001921824576 pyconfig.py:471] Config param reshape_q: False I0424 20:30:13.854670 139001921824576 pyconfig.py:471] Config param return_log_prob: False I0424 20:30:13.854686 139001921824576 pyconfig.py:471] Config param reuse_example_batch: 0 I0424 20:30:13.854702 139001921824576 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0424 20:30:13.854717 139001921824576 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0424 20:30:13.854732 139001921824576 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0424 20:30:13.854749 139001921824576 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0424 20:30:13.854763 139001921824576 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0424 20:30:13.854780 139001921824576 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0424 20:30:13.854795 139001921824576 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0424 20:30:13.854815 139001921824576 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0424 20:30:13.854831 139001921824576 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0424 20:30:13.854846 139001921824576 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0424 20:30:13.854861 139001921824576 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0424 20:30:13.854876 139001921824576 pyconfig.py:471] Config param rope_attention_scaling: False I0424 20:30:13.854892 139001921824576 pyconfig.py:471] Config param rope_factor: 40 I0424 20:30:13.854907 139001921824576 pyconfig.py:471] Config param rope_interleave: True I0424 20:30:13.854922 139001921824576 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0424 20:30:13.854937 
139001921824576 pyconfig.py:471] Config param rope_max_timescale: 10000 I0424 20:30:13.854953 139001921824576 pyconfig.py:471] Config param rope_min_timescale: 1 I0424 20:30:13.854968 139001921824576 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0424 20:30:13.854983 139001921824576 pyconfig.py:471] Config param rope_truncate: True I0424 20:30:13.854998 139001921824576 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0424 20:30:13.855016 139001921824576 pyconfig.py:471] Config param rope_use_scale: True I0424 20:30:13.855032 139001921824576 pyconfig.py:471] Config param routed_bias: False I0424 20:30:13.855048 139001921824576 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0424 20:30:13.855063 139001921824576 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0424 20:30:13.855078 139001921824576 pyconfig.py:471] Config param routed_score_func: I0424 20:30:13.855106 139001921824576 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-20-30 I0424 20:30:13.855123 139001921824576 pyconfig.py:471] Config param sa_block_kv: 512 I0424 20:30:13.855137 139001921824576 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0424 20:30:13.855152 139001921824576 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0424 20:30:13.855168 139001921824576 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0424 20:30:13.855182 139001921824576 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0424 20:30:13.855197 139001921824576 pyconfig.py:471] Config param sa_block_q: 512 I0424 20:30:13.855212 139001921824576 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0424 20:30:13.855227 139001921824576 pyconfig.py:471] Config param sa_block_q_dq: 512 I0424 20:30:13.855243 139001921824576 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0424 20:30:13.855259 139001921824576 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0424 20:30:13.855273 139001921824576 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: 
False I0424 20:30:13.855289 139001921824576 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0424 20:30:13.855304 139001921824576 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0424 20:30:13.855319 139001921824576 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0424 20:30:13.855335 139001921824576 pyconfig.py:471] Config param save_config_to_gcs: False I0424 20:30:13.855349 139001921824576 pyconfig.py:471] Config param save_quantized_params_path: I0424 20:30:13.855364 139001921824576 pyconfig.py:471] Config param scale_embedding_for_audio: True I0424 20:30:13.855380 139001921824576 pyconfig.py:471] Config param scan_layers: True I0424 20:30:13.855396 139001921824576 pyconfig.py:471] Config param scan_layers_per_stage: False I0424 20:30:13.855410 139001921824576 pyconfig.py:471] Config param scan_pipeline_iterations: True I0424 20:30:13.855425 139001921824576 pyconfig.py:471] Config param scan_pipeline_repeats: False I0424 20:30:13.855439 139001921824576 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0424 20:30:13.855455 139001921824576 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0424 20:30:13.855470 139001921824576 pyconfig.py:471] Config param sft_train_on_completion_only: False I0424 20:30:13.855486 139001921824576 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0424 20:30:13.855500 139001921824576 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0424 20:30:13.855521 139001921824576 pyconfig.py:471] Config param shard_optimizer_over_data: False I0424 20:30:13.855537 139001921824576 pyconfig.py:471] Config param sharding_strategy: None I0424 20:30:13.855551 139001921824576 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0424 20:30:13.855567 139001921824576 pyconfig.py:471] Config param shardy: True I0424 20:30:13.855581 139001921824576 pyconfig.py:471] Config param share_kv_projections: False I0424 20:30:13.855596 139001921824576 
pyconfig.py:471] Config param shared_experts: 0 I0424 20:30:13.855611 139001921824576 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0424 20:30:13.855626 139001921824576 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0424 20:30:13.855640 139001921824576 pyconfig.py:471] Config param skip_jax_distributed_system: False I0424 20:30:13.855656 139001921824576 pyconfig.py:471] Config param skip_step_interval: 128 I0424 20:30:13.855670 139001921824576 pyconfig.py:471] Config param skip_step_on_spikes: False I0424 20:30:13.855685 139001921824576 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0424 20:30:13.855700 139001921824576 pyconfig.py:471] Config param sliding_window_size: 0 I0424 20:30:13.855715 139001921824576 pyconfig.py:471] Config param solution_end_token: </answer> I0424 20:30:13.855730 139001921824576 pyconfig.py:471] Config param solution_start_token: <answer> I0424 20:30:13.855745 139001921824576 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0424 20:30:13.855760 139001921824576 pyconfig.py:471] Config param sparse_matmul: True I0424 20:30:13.855774 139001921824576 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0424 20:30:13.855789 139001921824576 pyconfig.py:471] Config param stack_prefill_result_cache: False I0424 20:30:13.855804 139001921824576 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0424 20:30:13.855819 139001921824576 pyconfig.py:471] Config param stack_trace_to_cloud: False I0424 20:30:13.855834 139001921824576 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0424 20:30:13.855849 139001921824576 pyconfig.py:471] Config param steps: 200000 I0424 20:30:13.855864 139001921824576 pyconfig.py:471] Config param stop_strings: None I0424 20:30:13.855879 139001921824576 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0424 20:30:13.855895 139001921824576 pyconfig.py:471] Config param student_params_to_update: None I0424 
20:30:13.855911 139001921824576 pyconfig.py:471] Config param subslice_shape: I0424 20:30:13.855927 139001921824576 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0424 20:30:13.855942 139001921824576 pyconfig.py:471] Config param system_prompt: I0424 20:30:13.855957 139001921824576 pyconfig.py:471] Config param target_eval_loss: 0.0 I0424 20:30:13.855972 139001921824576 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0424 20:30:13.855988 139001921824576 pyconfig.py:471] Config param temperature_tuning: False I0424 20:30:13.856003 139001921824576 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0424 20:30:13.856018 139001921824576 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-20-30/tensorboard/ I0424 20:30:13.856034 139001921824576 pyconfig.py:471] Config param tensors_on_device: None I0424 20:30:13.856048 139001921824576 pyconfig.py:471] Config param tensors_to_offload: None I0424 20:30:13.856063 139001921824576 pyconfig.py:471] Config param test_batch_start_index: 0 I0424 20:30:13.856078 139001921824576 pyconfig.py:471] Config param tile_size_for_vit: 336 I0424 20:30:13.856102 139001921824576 pyconfig.py:471] Config param tokenize_eval_data: True I0424 20:30:13.856117 139001921824576 pyconfig.py:471] Config param tokenize_train_data: True I0424 20:30:13.856133 139001921824576 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0424 20:30:13.856149 139001921824576 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0424 20:30:13.856167 139001921824576 pyconfig.py:471] Config param topk_routing_group: -1 I0424 20:30:13.856182 139001921824576 pyconfig.py:471] Config param train_data_columns: ['text'] I0424 20:30:13.856198 139001921824576 pyconfig.py:471] Config param train_fraction: 1.0 I0424 20:30:13.856213 139001921824576 pyconfig.py:471] Config param train_image_column: image I0424 20:30:13.856229 139001921824576 pyconfig.py:471] 
Config param train_micro_batch_size: -1 I0424 20:30:13.856245 139001921824576 pyconfig.py:471] Config param train_split: train I0424 20:30:13.856259 139001921824576 pyconfig.py:471] Config param trainable_parameters_mask: [] I0424 20:30:13.856274 139001921824576 pyconfig.py:471] Config param trainable_position_size: 2048 I0424 20:30:13.856288 139001921824576 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0424 20:30:13.856304 139001921824576 pyconfig.py:471] Config param upload_all_profiler_results: False I0424 20:30:13.856320 139001921824576 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0424 20:30:13.856336 139001921824576 pyconfig.py:471] Config param use_agentic_rollout: False I0424 20:30:13.856352 139001921824576 pyconfig.py:471] Config param use_audio: False I0424 20:30:13.856366 139001921824576 pyconfig.py:471] Config param use_audio_in_video: False I0424 20:30:13.856382 139001921824576 pyconfig.py:471] Config param use_batch_split_schedule: False I0424 20:30:13.856398 139001921824576 pyconfig.py:471] Config param use_chat_template: False I0424 20:30:13.856412 139001921824576 pyconfig.py:471] Config param use_chunked_prefill: False I0424 20:30:13.856427 139001921824576 pyconfig.py:471] Config param use_custom_sort_vjp: True I0424 20:30:13.856443 139001921824576 pyconfig.py:471] Config param use_dpo: False I0424 20:30:13.856457 139001921824576 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0424 20:30:13.856473 139001921824576 pyconfig.py:471] Config param use_grpo: True I0424 20:30:13.856488 139001921824576 pyconfig.py:471] Config param use_indexer: False I0424 20:30:13.856507 139001921824576 pyconfig.py:471] Config param use_iota_embed: True I0424 20:30:13.856523 139001921824576 pyconfig.py:471] Config param use_jax_splash: False I0424 20:30:13.856538 139001921824576 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0424 20:30:13.856553 139001921824576 pyconfig.py:471] Config param use_mrope: False I0424 
20:30:13.856568 139001921824576 pyconfig.py:471] Config param use_multimodal: False I0424 20:30:13.856584 139001921824576 pyconfig.py:471] Config param use_nnx_pipeline: False I0424 20:30:13.856598 139001921824576 pyconfig.py:471] Config param use_pathways: True I0424 20:30:13.856613 139001921824576 pyconfig.py:471] Config param use_post_attn_norm: False I0424 20:30:13.856628 139001921824576 pyconfig.py:471] Config param use_post_ffw_norm: False I0424 20:30:13.856644 139001921824576 pyconfig.py:471] Config param use_qk_clip: False I0424 20:30:13.856659 139001921824576 pyconfig.py:471] Config param use_qk_norm: False I0424 20:30:13.856673 139001921824576 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0424 20:30:13.856689 139001921824576 pyconfig.py:471] Config param use_qwix_quantization: False I0424 20:30:13.856703 139001921824576 pyconfig.py:471] Config param use_ragged_attention: False I0424 20:30:13.856719 139001921824576 pyconfig.py:471] Config param use_random_routing: False I0424 20:30:13.856733 139001921824576 pyconfig.py:471] Config param use_replicator_service: False I0424 20:30:13.856749 139001921824576 pyconfig.py:471] Config param use_ring_of_experts: False I0424 20:30:13.856763 139001921824576 pyconfig.py:471] Config param use_sft: False I0424 20:30:13.856778 139001921824576 pyconfig.py:471] Config param use_splash_scheduler: False I0424 20:30:13.856792 139001921824576 pyconfig.py:471] Config param use_tokamax_gmm: False I0424 20:30:13.856808 139001921824576 pyconfig.py:471] Config param use_tokamax_splash: False I0424 20:30:13.856823 139001921824576 pyconfig.py:471] Config param use_truncation: True I0424 20:30:13.856838 139001921824576 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0424 20:30:13.856853 139001921824576 pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0424 20:30:13.856868 139001921824576 pyconfig.py:471] Config param use_vertex_tensorboard: False I0424 20:30:13.856883 
139001921824576 pyconfig.py:471] Config param using_pipeline_parallelism: False I0424 20:30:13.856897 139001921824576 pyconfig.py:471] Config param v_head_dim: 128 I0424 20:30:13.856913 139001921824576 pyconfig.py:471] Config param v_norm_with_scale: True I0424 20:30:13.856927 139001921824576 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0424 20:30:13.856943 139001921824576 pyconfig.py:471] Config param vertex_tensorboard_project: I0424 20:30:13.856958 139001921824576 pyconfig.py:471] Config param vertex_tensorboard_region: I0424 20:30:13.856975 139001921824576 pyconfig.py:471] Config param video_path: I0424 20:30:13.856990 139001921824576 pyconfig.py:471] Config param video_placeholder: <|video|> I0424 20:30:13.857005 139001921824576 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0424 20:30:13.857020 139001921824576 pyconfig.py:471] Config param vision_output_length: -1 I0424 20:30:13.857036 139001921824576 pyconfig.py:471] Config param vllm_additional_config: {} I0424 20:30:13.857051 139001921824576 pyconfig.py:471] Config param vllm_hf_config_path: I0424 20:30:13.857067 139001921824576 pyconfig.py:471] Config param vllm_hf_overrides: {} I0424 20:30:13.857081 139001921824576 pyconfig.py:471] Config param vocab_size: 32000 I0424 20:30:13.857104 139001921824576 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0424 20:30:13.857120 139001921824576 pyconfig.py:471] Config param weight_dtype: float32 I0424 20:30:13.857144 139001921824576 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0424 20:30:13.857159 139001921824576 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0424 20:30:13.857175 139001921824576 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0424 20:30:13.857190 139001921824576 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0424 20:30:13.857206 139001921824576 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0424 20:30:13.857221 139001921824576 
pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0424 20:30:13.857236 139001921824576 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0424 20:30:13.857251 139001921824576 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0424 20:30:13.857267 139001921824576 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0424 20:30:13.857283 139001921824576 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0424 20:30:13.857298 139001921824576 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0424 20:30:13.857314 139001921824576 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0424 20:30:13.857328 139001921824576 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0424 20:30:13.857344 139001921824576 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0424 20:30:13.857359 139001921824576 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0424 20:30:13.857375 139001921824576 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0424 20:30:13.857390 139001921824576 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0424 20:30:13.857405 139001921824576 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0424 20:30:13.857421 139001921824576 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0424 20:30:13.857435 139001921824576 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0424 20:30:13.857451 139001921824576 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0424 20:30:13.857468 139001921824576 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0424 20:30:13.857483 139001921824576 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0424 20:30:13.857498 139001921824576 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False I0424 20:30:13.857517 139001921824576 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0424 20:30:13.857535 139001921824576 pyconfig.py:471] Config 
param z_loss_multiplier: 0.0 I0424 20:30:13.857841 139001921824576 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0424 20:30:13.857876 139001921824576 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0424 20:30:17.526254 139001921824576 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0424 20:30:17.529526 139001921824576 maxtext_utils.py:1565] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0424 20:30:17.529659 139001921824576 train_distill.py:596] Applying logical axis rules for model initialization and training... I0424 20:30:17.529733 139001921824576 train_distill.py:600] Loading Student from ... I0424 20:30:17.529764 139001921824576 train_distill.py:169] --- Student Configuration --- I0424 20:30:17.529786 139001921824576 train_distill.py:170] Model Name: gpt3-52k I0424 20:30:17.529808 139001921824576 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0424 20:30:17.529827 139001921824576 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0424 20:30:17.529846 139001921824576 train_distill.py:175] Vocab Size: 32000 I0424 20:30:17.529862 139001921824576 train_distill.py:176] Checkpoint: I0424 20:30:17.529881 139001921824576 train_distill.py:465] Initializing model: gpt3-52k... I0424 20:30:18.814233 139001921824576 train_distill.py:614] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0424 20:30:18.814345 139001921824576 train_distill.py:169] --- Teacher Configuration --- I0424 20:30:18.814374 139001921824576 train_distill.py:170] Model Name: gpt3-52k I0424 20:30:18.814398 139001921824576 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0424 20:30:18.814425 139001921824576 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0424 20:30:18.814443 139001921824576 train_distill.py:175] Vocab Size: 32000 I0424 20:30:18.814461 139001921824576 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0424 20:30:18.814477 139001921824576 train_distill.py:465] Initializing model: gpt3-52k... I0424 20:30:19.949215 139001921824576 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 20:30:19.949728 139001921824576 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e6b308e8dd0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 20:30:19.949786 139001921824576 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0424 20:30:20.517600 139001921824576 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0424 20:30:21.083168 2135 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0424 20:30:22.220327 139001921824576 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0424 20:30:24.446226 139001921824576 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0424 20:30:24.446607 139001921824576 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0424 20:30:24.753030 139001921824576 checkpointer.py:318] Finished restoring checkpoint in 2.90 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0424 20:30:25.458837 139001921824576 train_distill.py:640] Initializing Data Iterators via MaxText pipeline... I0424 20:30:25.522717 139001921824576 config.py:112] TensorFlow version 2.20.0 available. I0424 20:30:25.523209 139001921824576 config.py:125] JAX version 0.8.3 available. E0424 20:30:28.139124 139001921824576 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0424 20:30:28.139353 139001921824576 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0424 20:30:28.142957 139001921824576 train_distill.py:410] Input Pipeline Checkpointing: DISABLED I0424 20:30:28.143066 139001921824576 train_distill.py:414] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0424 20:30:28.143145 139001921824576 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 20:30:28.143228 139001921824576 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e6b308e8dd0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 20:30:28.143271 139001921824576 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 20:30:28.143304 139001921824576 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e6b308e8dd0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 20:30:28.143350 139001921824576 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e52600769c0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260076750>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260077290>}, handler_registry=None I0424 
20:30:28.143581 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e52600769c0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 20:30:28.143625 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260076750>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 20:30:28.143653 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260077290>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0424 20:30:28.143678 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5397b4e180>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0424 20:30:28.143706 139001921824576 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e52600769c0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e52600769c0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260076750>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260076750>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260077290>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260077290>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5397b4e180>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5397b4e180>}). 
I0424 20:30:28.144191 139001921824576 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e6705dc3380> timeout: 600 secs and primary_host=0 for async checkpoint writes I0424 20:30:30.235603 139001921824576 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints I0424 20:30:30.237836 139001921824576 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e5260076cf0> I0424 20:30:30.237949 139001921824576 pytree_checkpoint_handler.py:577] 
save_device_host_concurrent_bytes=None I0424 20:30:30.238012 139001921824576 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e6b308e8dd0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 20:30:30.238047 139001921824576 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 20:30:30.238077 139001921824576 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e6b308e8dd0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 20:30:30.238129 139001921824576 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0424 20:30:30.238181 139001921824576 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139001921824576 count=1 at 0x7e671f1abec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e5260077050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e52600771d0>, _write_futures=[]) I0424 20:30:30.238542 139001921824576 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139001921824576 count=1 at 0x7e671f1abec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e5260077050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e52600771d0>, _write_futures=[]) I0424 20:30:30.238567 139001921824576 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139001921824576 count=1 at 0x7e671f1abec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e5260077050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e52600771d0>, _write_futures=[]) I0424 20:30:30.238597 139001921824576 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5fefd0df10>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260075d30>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260076060>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e5260077710>}, handler_registry=None I0424 20:30:30.238691 139001921824576 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5fefd0df10>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 20:30:30.238724 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260075d30>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 20:30:30.238747 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260076060>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0424 20:30:30.238774 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e5260077710>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0424 20:30:30.238796 139001921824576 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e52600d8650>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0424 20:30:30.238821 139001921824576 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5fefd0df10>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5fefd0df10>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260075d30>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e5260075d30>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260076060>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e5260076060>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e5260077710>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e5260077710>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e52600d8650>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e52600d8650>}). I0424 20:30:30.238889 139001921824576 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e6705dc34c0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0424 20:30:30.623303 139001921824576 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints I0424 20:30:31.048935 139001921824576 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260424_200844/pt_distill_nnx_xpk_test_pipeline_scan_nnx_20260424_200844_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e5260075df0> I0424 20:30:31.049568 139001921824576 train_distill.py:691] Starting Distillation Training... I0424 20:30:31.049677 139001921824576 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0424 20:30:31.779694 139001921824576 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0424 20:30:31.781680 138858071045888 grain_pool.py:367] Grain pool will use 1 processes. I0424 20:30:31.808471 138858071045888 grain_pool.py:440] Grain pool will start child processes. I0424 20:30:31.813706 138858071045888 grain_pool.py:448] Grain pool started all child processes. 
2026-04-24 20:30:37.855060: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0424 20:30:41.311844 139001921824576 utils.py:86] Train loop finished in: 9.5315 seconds
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 765, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 693, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl
    raise ValueError(
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0)}
I0424 20:30:41.657060 138858071045888 grain_pool.py:542] Grain pool is exiting.
I0424 20:30:41.657175 138858071045888 grain_pool.py:547] Shutting down multiprocessing system.
I0424 20:30:43.120647 138858071045888 grain_pool.py:547] Shutting down multiprocessing system.
Training: 0%| | 0/5 [00:13<?, ?step/s]
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 20:30:51 UTC 2026
EXIT_CODE=1