feat/nnx-post-train-fixes
XPK Start: Fri Apr 24 12:13:42 UTC 2026
2026-04-24 12:13:59.171463: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0424 12:14:03.179915 134763599910720 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-24 12:14:12,219:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0424 12:14:12.219719 134763599910720 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-24 12:14:12,221:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-p98z9-slice-job-0-0.mt-07-distill-smoke-p98z9:8482
I0424 12:14:12.221941 134763599910720 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-p98z9-slice-job-0-0.mt-07-distill-smoke-p98z9:8482
I0424 12:14:13.281163 134763599910720 max_utils.py:284] Jax distributed system initialized!
I0424 12:14:19.712228 134763599910720 max_utils.py:244] Jax distributed system is already initialized.
W0424 12:14:19.843696 134763599910720 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 12:14:19.904165 134763599910720 max_utils.py:244] Jax distributed system is already initialized.
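The rope_scaling warnings above indicate a config whose YaRN fields are integers where the validator wants floats (`factor`, `beta_fast`, `beta_slow`), plus a `rope_theta` key nested inside `rope_scaling` when it is only recognized at the top level of the model config. The sketch below is a hypothetical illustration of that fix, not the actual MaxText/validator code from this run; the dict keys mirror the warning text, and `fix_rope_scaling` is an assumed helper name.

```python
# Hypothetical sketch: coerce YaRN rope_scaling fields to float and hoist
# the unrecognized 'rope_theta' key out, matching the warnings in the log.
bad = {
    "rope_type": "yarn",
    "factor": 40,         # int -> "factor field must be a float >= 1, got 40"
    "beta_fast": 32,      # int -> "beta_fast field must be a float, got 32"
    "beta_slow": 1,       # int -> "beta_slow field must be a float, got 1"
    "rope_theta": 10000,  # "Unrecognized keys in `rope_scaling` ... {'rope_theta'}"
}

def fix_rope_scaling(cfg):
    """Return (cleaned rope_scaling dict, rope_theta to set at top level)."""
    fixed = dict(cfg)
    rope_theta = fixed.pop("rope_theta", None)  # belongs outside rope_scaling
    for key in ("factor", "beta_fast", "beta_slow"):
        if key in fixed:
            fixed[key] = float(fixed[key])      # validator requires floats
    return fixed, rope_theta

good, theta = fix_rope_scaling(bad)
print(good)
# → {'rope_type': 'yarn', 'factor': 40.0, 'beta_fast': 32.0, 'beta_slow': 1.0}
```

With this shape the three "must be a float" warnings and the unrecognized-key warning would no longer fire, assuming the validator follows the behavior the log messages describe.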
I0424 12:14:19.905361 134763599910720 pyconfig.py:471] Config param abort_on_inf_loss: True
Config param abort_on_nan_loss: True
Config param act_quantization_calibration_method: absmax
Config param activation_dropout_for_audio: 0.0
Config param activation_function_for_audio: gelu
Config param activations_in_float32: False
Config param adam_b1: 0.9
Config param adam_b2: 0.95
Config param adam_eps: 1e-08
Config param adam_eps_root: 0.0
Config param adam_weight_decay: 0.1
Config param adamw_mask: []
Config param add_bos: True
Config param add_eos: True
Config param allow_split_physical_axes: False
Config param ar_cache_axis_order: 1,2,0,3
Config param async_checkpointing: True
Config param async_scheduling: False
Config param attention: dot_product
Config param attention_bias: False
Config param attention_dropout_for_audio: 0.0
Config param attention_out: RematLocation.REMAT
Config param attention_output_dim: -1
Config param attention_sink: False
Config param attention_type: global
Config param attn_logits_soft_cap: None
Config param audio_path:
Config param audio_placeholder: <|audio|>
Config param autoregressive_decode_assert:
Config param base_config: base.yml
Config param base_emb_dim: 16
Config param base_mlp_dim: 64
Config param base_moe_mlp_dim: -1
Config param base_num_decoder_layers: 1
Config param base_num_kv_heads: 2
Config param base_num_query_heads: 2
Config param base_output_directory: /deps/maxtext_output
Config param batch_size: 1
Config param batch_split_factor: 1
Config param beta_fast: 32
Config param beta_slow: 1
Config param bwd_quantization_calibration_method: absmax
Config param capacity_factor: -1.0
Config param cast_logits_to_fp32: True
Config param chat_template:
Config param chat_template_path:
Config param checkpoint_conversion_fn: None
Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/checkpoints/
Config param checkpoint_is_quantized: False
Config param checkpoint_period: 2000
Config param checkpoint_storage_concurrent_gb: 96
Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
Config param checkpoint_storage_use_ocdbt: True
Config param checkpoint_storage_use_zarr3: True
Config param checkpoint_todelete_full_path: None
Config param checkpoint_todelete_subdir: None
Config param chips_per_vm: 4
Config param chunk_attn_window_size: 0
Config param collect_stack_trace: False
Config param colocated_python_checkpointing: False
Config param colocated_python_data_input: False
Config param compile_topology:
Config param compile_topology_num_slices: -1
Config param compile_xla_flags:
Config param compiled_trainstep_file:
Config param compute_axis_order: 0,1,2,3
Config param constant_bound_config: []
Config param context: RematLocation.REMAT
Config param context_parallel_load_balance: True
Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
Config param context_parallel_size: 1
Config param context_parallel_strategy: all_gather
Config param context_sharding: context
Config param conv_chunksize_for_audio: 500
Config param conv_stride_for_vit: 14
Config param convert_checkpoint_if_possible: False
Config param cost_estimate_flops_bwd: -1
Config param cost_estimate_flops_fwd: -1
Config param custom_mesh:
Config param custom_mesh_and_rule:
Config param d_model_for_audio: 256
Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
Config param data_shuffle_seed: 0
Config param dataset_name: c4/en:3.0.1
Config param dataset_path:
Config param dataset_type: DatasetType.HF
Config param dcn_autoregressive_parallelism: 1
Config param dcn_context_autoregressive_parallelism: 1
Config param dcn_context_parallelism: 1
Config param dcn_data_parallelism: -1
Config param dcn_diloco_parallelism: 1
Config param dcn_expert_parallelism: 1
Config param dcn_fsdp_parallelism: 1
Config param dcn_fsdp_transpose_parallelism: 1
Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Config param dcn_pipeline_parallelism: 1
Config param dcn_sequence_parallelism: 1
Config param dcn_tensor_parallelism: 1
Config param dcn_tensor_sequence_parallelism: 1
Config param dcn_tensor_transpose_parallelism: 1
Config param debug: {'rl': False}
Config param debug_sharding: False
Config param decode_sampling_nucleus_p: -1
Config param decode_sampling_strategy: SamplingStrategy.GREEDY
Config param decode_sampling_temperature: 1.0
Config param decode_sampling_top_k: 0
Config param decoder_block: DecoderBlockType.GPT3
Config param decoder_layer_input: RematLocation.DEVICE
Config param deepstack_visual_indexes_for_vit: []
Config param degenerate_group_masking: True
Config param dense_init_scale: 1.0
Config param diloco_outer_lr: 0.3
Config param diloco_outer_momentum: 0.9
Config param diloco_sync_period: 36
Config param distill_alpha: 0.5
Config param distill_alpha_end: None
Config param distill_alpha_schedule: constant
Config param distill_beta: 0.0
Config param distill_beta_end: None
Config param distill_beta_schedule: constant
Config param distill_feature_loss_type: cosine
Config param distill_layer_indices: None
Config param distill_temperature: 1.0
Config param distill_temperature_end: None
Config param distill_temperature_schedule: constant
Config param downsample_hidden_size_for_audio: 256
Config param dpo_beta: 0.1
Config param dpo_label_smoothing: 0.0
Config param dq_reduction_steps: 0
Config param dropout_rate: 0.0
Config param dtype: bfloat16
Config param dtype_mm: float32
Config param dump_hlo: False
Config param dump_hlo_delete_local_after: True
Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/xla_dump
Config param dump_hlo_local_dir: /tmp/xla_dump/
Config param dump_hlo_local_module_name: jit_train_step
Config param dump_hlo_module_name: jit_train_step
Config param dump_hlo_upload_all: False
Config param dump_hlo_xla_flags:
Config param dump_jaxpr: False
Config param dump_jaxpr_delete_local_after: True
Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/jaxpr_dump
Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
Config param dump_step: -1
Config param elastic_enabled: False
Config param elastic_max_retries: 10
Config param elastic_timeout_seconds: 300
Config param emb_dim: 16
Config param enable_autocheckpoint: False
Config param enable_checkpoint_cloud_logger: False
Config param enable_checkpointing: True
Config param enable_continuous_checkpointing: False
Config param enable_data_shuffling: True
Config param enable_diloco: False
Config param enable_dp_attention: False
Config param enable_dropout: False
Config param enable_emergency_checkpoint: False
Config param enable_expert_parallel: False
Config param enable_gcp_goodput_metrics: True
Config param enable_gcp_step_deviation_metrics: True
Config param enable_goodput_recording: False
Config param enable_jax_profiler: False
Config param enable_llm_inference_pool: False
Config param enable_model_warmup: False
Config param enable_multi_tier_checkpointing: False
Config param enable_nnx: False
Config param enable_orbax_v1: False
Config param enable_padding_causal_mask: True
Config param enable_pathways_goodput: False
Config param enable_prefix_caching: False
Config param enable_rampup_batch_size: False
Config param enable_single_controller: False
Config param enable_single_replica_ckpt_restoring: False
Config param enable_tensorboard: True
Config param enable_tunix_perf_metrics: False
Config param encoder_attention_heads_for_audio: 4
Config param encoder_ffn_dim_for_audio: 512
Config param encoder_layers_for_audio: 2
Config param engram: RematLocation.REMAT
Config param engram_head_dim: 1280
Config param engram_kernel_size: 4
Config param engram_layers: []
Config param engram_max_ngram_size: 3
Config param engram_num_heads: 8
Config param engram_seed: 0
Config param engram_vocab_bases: []
Config param epsilon_high: None
Config param eval_corr_lst: False
Config param eval_data_columns: ['text']
Config param eval_dataset_name: c4/en:3.0.1
Config param eval_image_column: image
Config param eval_interval: -1
Config param eval_make_lst: False
Config param eval_per_device_batch_size: 2
Config param eval_sampling_strategy: greedy
Config param eval_split: validation
Config param eval_steps: -1
Config param expansion_factor_real_data: -1.0
Config param final_logits_soft_cap: None
Config param first_num_dense_layers: 0
Config param float32_gate_logits: False
Config param float32_logits: False
Config param float32_qk_product: False
Config param float32_weight_sum: True
Config param force_q_layout: False
Config param force_unroll: False
Config param formatting_func_kwargs: {}
Config param formatting_func_path:
Config param freeze_audio_encoder_params: True
Config param freeze_vision_encoder_params: True
Config param fused_mlp: False
Config param fused_qkv: True
Config param gcs_metrics: False
Config param gdn_chunk_size: 64
Config param gdn_conv_kernel_dim: 4
Config param gdn_key_head_dim: 128
Config param gdn_num_key_heads: 16
Config param gdn_num_value_heads: 32
Config param gdn_value_head_dim: 128
Config param generate_padding_batch_eval: False
Config param generate_padding_batch_train: False
Config param generate_slice: v5e-16
Config param generation_configs: {}
Config param global_batch_size_to_eval_on: 64
Config param global_batch_size_to_load: 512
Config param global_batch_size_to_load_eval: 64
Config param global_batch_size_to_load_increment: None
Config param global_batch_size_to_load_start: None
Config param global_batch_size_to_train_on: 512
Config param global_head_dim: 0
Config param global_num_kv_heads: 0
Config param global_parameter_scale: 1
Config param global_rampup_samples: 500
Config param global_rope_max_timescale: -1
Config param global_rope_proportion: 0.25
Config param goodput_upload_interval_seconds: 30
Config param grad_dtype: float32
Config param gradient_accumulation_steps: 8
Config param gradient_clipping_threshold: 1.0
Config param grain_data_source_max_workers: 16
Config param grain_eval_files:
Config param grain_file_type: arrayrecord
Config param grain_num_threads: 16
Config param grain_num_threads_eval: 16
Config param grain_packing_type: first_fit
Config param grain_per_worker_buffer_size: 1
Config param grain_per_worker_buffer_size_eval: 1
Config param grain_prefetch_buffer_size: 500
Config param grain_prefetch_buffer_size_eval: 500
Config param grain_ram_budget_mb: 1024
Config param grain_shuffle_buffer_size: 100
Config param grain_train_files:
Config param grain_train_mixture_config_path:
Config param grain_worker_count: 1
Config param grain_worker_count_eval: 1
Config param grpo_beta: 0.08
Config param grpo_epsilon: 0.2
Config param hardware: tpu
Config param hbm_utilization_vllm: 0.72
Config param head_dim: 8
Config param heartbeat_reporting_interval_in_seconds: 5
Config param hf_data_dir: None
Config param hf_eval_files: None
Config param hf_eval_split: None
Config param hf_name: None
Config param hf_path: OptimalScale/ClimbMix
Config param hf_train_files: None
Config param hidden_size_for_vit: 1408
Config param hide_profiler_step_metric: False
Config param ici_autoregressive_parallelism: 1
Config param ici_context_autoregressive_parallelism: 1
Config param ici_context_parallelism: 1
Config param ici_data_parallelism: 1
Config param ici_diloco_parallelism: 1
Config param ici_expert_parallelism: 1
Config param ici_fsdp_parallelism: -1
Config param ici_fsdp_transpose_parallelism: 1
Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Config param ici_pipeline_parallelism: 1
Config param ici_sequence_parallelism: 1
Config param ici_tensor_parallelism: 1
Config param ici_tensor_sequence_parallelism: 1
Config param ici_tensor_transpose_parallelism: 1
Config param image_path:
Config param image_placeholder: <|image|>
Config param image_size_for_vit: 896
Config param indexer_head_dim: 128
Config param indexer_loss_scaling_factor: 0.0
Config param indexer_n_heads: 64
Config param indexer_sparse_training: False
Config param indexer_topk: 2048
Config param inference_benchmark_test: False
Config param inference_metadata_file:
Config param inference_microbenchmark_log_file_path:
Config param inference_microbenchmark_loop_iters: 10
Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
Config param inference_microbenchmark_stages: prefill,generate
Config param inference_server: MaxtextInterleavedServer
Config param inhomogeneous_layer_cycle_interval: 1
Config param init_weights_seed: 0
Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
Config param interleave_moe_layer_step: 1
Config param intermediate_size_for_vit: 5632
Config param internal_compile: False
Config param internal_compile_num_devices: -1
Config param jax_cache_dir: ~/jax_cache
Config param jax_debug_log_modules:
Config param jax_distributed_initialization_timeout: 300
Config param jax_profiler_port: 9999
Config param key_proj: RematLocation.REMAT
Config param kv_cache_buffer: 256
Config param kv_lora_rank: 512
Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
Config param kv_quant_dtype: int8
Config param kv_wa_proj: RematLocation.REMAT
Config param learning_rate: 0.0002
Config param learning_rate_final_fraction: 0.1
Config param learning_rate_schedule_steps: 200000
Config param load_balance_loss_weight: 0.0
Config param load_checkpoint_only_once: False
Config param load_from_prefill_dir: False
Config param load_full_state_path:
Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
Config param local_checkpoint_directory:
Config param local_checkpoint_period: 0
Config param local_rope_max_timescale: -1
Config param local_rope_proportion: 1.0
Config param log_config: True
Config param log_period: 10
Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 
'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0424 12:14:19.910835 134763599910720 pyconfig.py:471] Config param logits_dot_in_fp32: False I0424 12:14:19.910851 134763599910720 pyconfig.py:471] Config param logits_via_embedding: True I0424 12:14:19.910867 134763599910720 pyconfig.py:471] Config param lora_input_adapters_path: I0424 12:14:19.910882 134763599910720 pyconfig.py:471] Config param loss_algo: grpo I0424 12:14:19.910898 134763599910720 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0424 12:14:19.910916 134763599910720 pyconfig.py:471] Config param managed_mldiagnostics: False I0424 12:14:19.910933 134763599910720 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/managed-mldiagnostics I0424 12:14:19.910949 134763599910720 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0424 12:14:19.910965 134763599910720 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0424 
12:14:19.910986 134763599910720 pyconfig.py:471] Config param max_checkify: False I0424 12:14:19.911001 134763599910720 pyconfig.py:471] Config param max_concurrency: 256 I0424 12:14:19.911017 134763599910720 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0424 12:14:19.911033 134763599910720 pyconfig.py:471] Config param max_num_batched_tokens: None I0424 12:14:19.911047 134763599910720 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0424 12:14:19.911063 134763599910720 pyconfig.py:471] Config param max_num_images_per_example: -1 I0424 12:14:19.911078 134763599910720 pyconfig.py:471] Config param max_num_seqs: None I0424 12:14:19.911093 134763599910720 pyconfig.py:471] Config param max_position_embeddings: 163840 I0424 12:14:19.911108 134763599910720 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0424 12:14:19.911123 134763599910720 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0424 12:14:19.911138 134763599910720 pyconfig.py:471] Config param max_segments_per_seq: -1 I0424 12:14:19.911154 134763599910720 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0424 12:14:19.911170 134763599910720 pyconfig.py:471] Config param max_target_length: 2048 I0424 12:14:19.911185 134763599910720 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0424 12:14:19.911200 134763599910720 pyconfig.py:471] Config param megablox: True I0424 12:14:19.911216 134763599910720 pyconfig.py:471] Config param merge_gating_gmm: False I0424 12:14:19.911231 134763599910720 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0424 12:14:19.911249 134763599910720 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/metrics/ I0424 12:14:19.911264 134763599910720 pyconfig.py:471] Config param metrics_file: I0424 
12:14:19.911279 134763599910720 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0424 12:14:19.911294 134763599910720 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0424 12:14:19.911310 134763599910720 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0424 12:14:19.911325 134763599910720 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0424 12:14:19.911340 134763599910720 pyconfig.py:471] Config param mla_naive_kvcache: True I0424 12:14:19.911356 134763599910720 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0424 12:14:19.911372 134763599910720 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0424 12:14:19.911388 134763599910720 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0424 12:14:19.911404 134763599910720 pyconfig.py:471] Config param mlp_bias: False I0424 12:14:19.911419 134763599910720 pyconfig.py:471] Config param mlp_dim: 64 I0424 12:14:19.911434 134763599910720 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0424 12:14:19.911450 134763599910720 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0424 12:14:19.911465 134763599910720 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0424 12:14:19.911481 134763599910720 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0424 12:14:19.911495 134763599910720 pyconfig.py:471] Config param moba: False I0424 12:14:19.911510 134763599910720 pyconfig.py:471] Config param moba_chunk_size: 1024 I0424 12:14:19.911525 134763599910720 pyconfig.py:471] Config param moba_topk: 8 I0424 12:14:19.911541 134763599910720 pyconfig.py:471] Config param model_call_mode: I0424 12:14:19.911555 134763599910720 pyconfig.py:471] Config param model_name: gpt3-52k I0424 12:14:19.911570 134763599910720 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0424 12:14:19.911586 134763599910720 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0424 12:14:19.911600 134763599910720 pyconfig.py:471] Config 
param moe_mlp_dim: -1 I0424 12:14:19.911616 134763599910720 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0424 12:14:19.911631 134763599910720 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0424 12:14:19.911647 134763599910720 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0424 12:14:19.911672 134763599910720 pyconfig.py:471] Config param monitor_goodput: False I0424 12:14:19.911688 134763599910720 pyconfig.py:471] Config param monitor_step_time_deviation: True I0424 12:14:19.911703 134763599910720 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0424 12:14:19.911719 134763599910720 pyconfig.py:471] Config param mscale: 1.0 I0424 12:14:19.911735 134763599910720 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0424 12:14:19.911751 134763599910720 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0424 12:14:19.911766 134763599910720 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0424 12:14:19.911781 134763599910720 pyconfig.py:471] Config param mtp_num_layers: 0 I0424 12:14:19.911797 134763599910720 pyconfig.py:471] Config param mu_dtype: float32 I0424 12:14:19.911822 134763599910720 pyconfig.py:471] Config param multi_sampling: False I0424 12:14:19.911838 134763599910720 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0424 12:14:19.911854 134763599910720 pyconfig.py:471] Config param muon_beta: 0.95 I0424 12:14:19.911871 134763599910720 pyconfig.py:471] Config param muon_consistent_rms: None I0424 12:14:19.911885 134763599910720 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0424 12:14:19.911901 134763599910720 pyconfig.py:471] Config param n_routing_groups: -1 I0424 12:14:19.911918 134763599910720 pyconfig.py:471] Config param n_window_for_audio: 50 I0424 12:14:19.911933 134763599910720 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0424 12:14:19.911948 134763599910720 pyconfig.py:471] Config param nope_layer_interval: -1 
I0424 12:14:19.911964 134763599910720 pyconfig.py:471] Config param norm_topk_prob: False I0424 12:14:19.911985 134763599910720 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0424 12:14:19.912004 134763599910720 pyconfig.py:471] Config param normalize_embedding_logits: False I0424 12:14:19.912020 134763599910720 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0424 12:14:19.912034 134763599910720 pyconfig.py:471] Config param num_batches: 4 I0424 12:14:19.912051 134763599910720 pyconfig.py:471] Config param num_channels_for_vit: 3 I0424 12:14:19.912067 134763599910720 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0424 12:14:19.912083 134763599910720 pyconfig.py:471] Config param num_decoder_layers: 1 I0424 12:14:19.912097 134763599910720 pyconfig.py:471] Config param num_diloco_replicas: 1 I0424 12:14:19.912113 134763599910720 pyconfig.py:471] Config param num_epoch: 1 I0424 12:14:19.912127 134763599910720 pyconfig.py:471] Config param num_eval_passes: 1 I0424 12:14:19.912143 134763599910720 pyconfig.py:471] Config param num_experts: 1 I0424 12:14:19.912158 134763599910720 pyconfig.py:471] Config param num_experts_per_tok: 1 I0424 12:14:19.912174 134763599910720 pyconfig.py:471] Config param num_generations: 2 I0424 12:14:19.912189 134763599910720 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0424 12:14:19.912203 134763599910720 pyconfig.py:471] Config param num_iterations: 1 I0424 12:14:19.912219 134763599910720 pyconfig.py:471] Config param num_kv_heads: 2 I0424 12:14:19.912234 134763599910720 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0424 12:14:19.912249 134763599910720 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0424 12:14:19.912265 134763599910720 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0424 12:14:19.912279 134763599910720 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0424 12:14:19.912295 134763599910720 pyconfig.py:471] Config 
param num_position_embeddings_for_vit: 1024 I0424 12:14:19.912309 134763599910720 pyconfig.py:471] Config param num_query_heads: 2 I0424 12:14:19.912325 134763599910720 pyconfig.py:471] Config param num_samplers_slices: -1 I0424 12:14:19.912340 134763599910720 pyconfig.py:471] Config param num_slices: 1 I0424 12:14:19.912356 134763599910720 pyconfig.py:471] Config param num_target_devices: 32 I0424 12:14:19.912371 134763599910720 pyconfig.py:471] Config param num_test_batches: 5 I0424 12:14:19.912386 134763599910720 pyconfig.py:471] Config param num_trainer_slices: -1 I0424 12:14:19.912401 134763599910720 pyconfig.py:471] Config param num_vocab_tiling: 1 I0424 12:14:19.912416 134763599910720 pyconfig.py:471] Config param off_policy_steps: 0 I0424 12:14:19.912431 134763599910720 pyconfig.py:471] Config param offline_data_dir: None I0424 12:14:19.912446 134763599910720 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0424 12:14:19.912464 134763599910720 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0424 12:14:19.912479 134763599910720 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0424 12:14:19.912495 134763599910720 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0424 12:14:19.912510 134763599910720 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0424 12:14:19.912525 134763599910720 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0424 12:14:19.912541 134763599910720 pyconfig.py:471] Config param output_dim_for_audio: 512 I0424 12:14:19.912555 134763599910720 pyconfig.py:471] Config param override_logical_axis_rules: False I0424 12:14:19.912571 134763599910720 pyconfig.py:471] Config param override_model_config: True I0424 12:14:19.912585 134763599910720 pyconfig.py:471] Config param packing: True I0424 12:14:19.912601 134763599910720 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0424 12:14:19.912616 134763599910720 pyconfig.py:471] Config param 
pagedattn_max_pages_per_group: -1 I0424 12:14:19.912631 134763599910720 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0424 12:14:19.912646 134763599910720 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0424 12:14:19.912670 134763599910720 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0424 12:14:19.912684 134763599910720 pyconfig.py:471] Config param param_scan_axis: 1 I0424 12:14:19.912700 134763599910720 pyconfig.py:471] Config param parameter_memory_host_offload: False I0424 12:14:19.912714 134763599910720 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0424 12:14:19.912729 134763599910720 pyconfig.py:471] Config param patch_size_for_vit: 14 I0424 12:14:19.912744 134763599910720 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0424 12:14:19.912759 134763599910720 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0424 12:14:19.912775 134763599910720 pyconfig.py:471] Config param per_device_batch_size: 2 I0424 12:14:19.912791 134763599910720 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0424 12:14:19.912807 134763599910720 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0424 12:14:19.912822 134763599910720 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0424 12:14:19.912838 134763599910720 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0424 12:14:19.912852 134763599910720 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0424 12:14:19.912868 134763599910720 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0424 12:14:19.912884 134763599910720 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0424 12:14:19.912900 134763599910720 pyconfig.py:471] Config param posemb_type_for_vit: learn I0424 12:14:19.912914 134763599910720 pyconfig.py:471] Config param position_id_per_seconds: 25 I0424 12:14:19.912930 134763599910720 pyconfig.py:471] Config param prefill_cache_axis_order: 
1,2,0,3 I0424 12:14:19.912944 134763599910720 pyconfig.py:471] Config param prefill_cache_dir: I0424 12:14:19.912960 134763599910720 pyconfig.py:471] Config param prefill_chunk_size: 256 I0424 12:14:19.912978 134763599910720 pyconfig.py:471] Config param prefill_slice: v5e-16 I0424 12:14:19.912994 134763599910720 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0424 12:14:19.913008 134763599910720 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0424 12:14:19.913023 134763599910720 pyconfig.py:471] Config param prefuse_moe_weights: False I0424 12:14:19.913039 134763599910720 pyconfig.py:471] Config param profile_cleanly: True I0424 12:14:19.913054 134763599910720 pyconfig.py:471] Config param profile_periodically_period: -1 I0424 12:14:19.913070 134763599910720 pyconfig.py:471] Config param profile_power_events: False I0424 12:14:19.913086 134763599910720 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0424 12:14:19.913102 134763599910720 pyconfig.py:471] Config param profiler_steps: 5 I0424 12:14:19.913117 134763599910720 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0424 12:14:19.913133 134763599910720 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0424 12:14:19.913148 134763599910720 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0424 12:14:19.913163 134763599910720 pyconfig.py:471] Config param prometheus_port: 0 I0424 12:14:19.913179 134763599910720 pyconfig.py:471] Config param prompt: I love to I0424 12:14:19.913194 134763599910720 pyconfig.py:471] Config param pure_nnx: False I0424 12:14:19.913210 134763599910720 pyconfig.py:471] Config param pure_nnx_decoder: False I0424 12:14:19.913225 134763599910720 pyconfig.py:471] Config param q_lora_rank: 0 I0424 12:14:19.913240 134763599910720 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0424 12:14:19.913256 134763599910720 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0424 12:14:19.913270 
134763599910720 pyconfig.py:471] Config param qk_norm_with_scale: True I0424 12:14:19.913286 134763599910720 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0424 12:14:19.913301 134763599910720 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0424 12:14:19.913317 134763599910720 pyconfig.py:471] Config param quant_cfg_path: I0424 12:14:19.913332 134763599910720 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0424 12:14:19.913349 134763599910720 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0424 12:14:19.913366 134763599910720 pyconfig.py:471] Config param quantize_kvcache: False I0424 12:14:19.913383 134763599910720 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0424 12:14:19.913398 134763599910720 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0424 12:14:19.913414 134763599910720 pyconfig.py:471] Config param ragged_block_size: 256 I0424 12:14:19.913429 134763599910720 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0424 12:14:19.913444 134763599910720 pyconfig.py:471] Config param rampup_end_step: 0 I0424 12:14:19.913460 134763599910720 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0424 12:14:19.913474 134763599910720 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0424 12:14:19.913490 134763599910720 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0424 12:14:19.913506 134763599910720 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0424 12:14:19.913522 134763599910720 pyconfig.py:471] Config param remat_policy: full I0424 12:14:19.913536 134763599910720 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0424 12:14:19.913551 134763599910720 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0424 12:14:19.913565 134763599910720 pyconfig.py:471] Config param replicate_quant_scale: False I0424 12:14:19.913582 134763599910720 pyconfig.py:471] Config param 
replicator_backup_interval_minutes: 0 I0424 12:14:19.913598 134763599910720 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0424 12:14:19.913612 134763599910720 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0424 12:14:19.913628 134763599910720 pyconfig.py:471] Config param reshape_q: False I0424 12:14:19.913644 134763599910720 pyconfig.py:471] Config param return_log_prob: False I0424 12:14:19.913667 134763599910720 pyconfig.py:471] Config param reuse_example_batch: 0 I0424 12:14:19.913683 134763599910720 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0424 12:14:19.913698 134763599910720 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0424 12:14:19.913713 134763599910720 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0424 12:14:19.913729 134763599910720 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0424 12:14:19.913745 134763599910720 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0424 12:14:19.913761 134763599910720 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0424 12:14:19.913777 134763599910720 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0424 12:14:19.913797 134763599910720 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0424 12:14:19.913812 134763599910720 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0424 12:14:19.913828 134763599910720 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0424 12:14:19.913843 134763599910720 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0424 12:14:19.913858 134763599910720 pyconfig.py:471] Config param rope_attention_scaling: False I0424 12:14:19.913873 
134763599910720 pyconfig.py:471] Config param rope_factor: 40 I0424 12:14:19.913888 134763599910720 pyconfig.py:471] Config param rope_interleave: True I0424 12:14:19.913904 134763599910720 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0424 12:14:19.913918 134763599910720 pyconfig.py:471] Config param rope_max_timescale: 10000 I0424 12:14:19.913934 134763599910720 pyconfig.py:471] Config param rope_min_timescale: 1 I0424 12:14:19.913950 134763599910720 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0424 12:14:19.913964 134763599910720 pyconfig.py:471] Config param rope_truncate: True I0424 12:14:19.913982 134763599910720 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0424 12:14:19.914000 134763599910720 pyconfig.py:471] Config param rope_use_scale: True I0424 12:14:19.914016 134763599910720 pyconfig.py:471] Config param routed_bias: False I0424 12:14:19.914030 134763599910720 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0424 12:14:19.914046 134763599910720 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0424 12:14:19.914061 134763599910720 pyconfig.py:471] Config param routed_score_func: I0424 12:14:19.914077 134763599910720 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-12-14 I0424 12:14:19.914093 134763599910720 pyconfig.py:471] Config param sa_block_kv: 512 I0424 12:14:19.914107 134763599910720 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0424 12:14:19.914123 134763599910720 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0424 12:14:19.914139 134763599910720 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0424 12:14:19.914155 134763599910720 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0424 12:14:19.914171 134763599910720 pyconfig.py:471] Config param sa_block_q: 512 I0424 12:14:19.914186 134763599910720 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0424 12:14:19.914202 134763599910720 pyconfig.py:471] Config param sa_block_q_dq: 512 I0424 
12:14:19.914218 134763599910720 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0424 12:14:19.914233 134763599910720 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0424 12:14:19.914249 134763599910720 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0424 12:14:19.914265 134763599910720 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0424 12:14:19.914279 134763599910720 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0424 12:14:19.914295 134763599910720 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0424 12:14:19.914311 134763599910720 pyconfig.py:471] Config param save_config_to_gcs: False I0424 12:14:19.914325 134763599910720 pyconfig.py:471] Config param save_quantized_params_path: I0424 12:14:19.914341 134763599910720 pyconfig.py:471] Config param scale_embedding_for_audio: True I0424 12:14:19.914355 134763599910720 pyconfig.py:471] Config param scan_layers: True I0424 12:14:19.914371 134763599910720 pyconfig.py:471] Config param scan_layers_per_stage: False I0424 12:14:19.914386 134763599910720 pyconfig.py:471] Config param scan_pipeline_iterations: True I0424 12:14:19.914401 134763599910720 pyconfig.py:471] Config param scan_pipeline_repeats: False I0424 12:14:19.914416 134763599910720 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0424 12:14:19.914432 134763599910720 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0424 12:14:19.914446 134763599910720 pyconfig.py:471] Config param sft_train_on_completion_only: False I0424 12:14:19.914462 134763599910720 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0424 12:14:19.914476 134763599910720 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0424 12:14:19.914493 134763599910720 pyconfig.py:471] Config param shard_optimizer_over_data: False I0424 12:14:19.914508 134763599910720 pyconfig.py:471] Config param sharding_strategy: None I0424 12:14:19.914524 
134763599910720 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0424 12:14:19.914539 134763599910720 pyconfig.py:471] Config param shardy: True I0424 12:14:19.914554 134763599910720 pyconfig.py:471] Config param share_kv_projections: False I0424 12:14:19.914570 134763599910720 pyconfig.py:471] Config param shared_experts: 0 I0424 12:14:19.914584 134763599910720 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0424 12:14:19.914600 134763599910720 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0424 12:14:19.914614 134763599910720 pyconfig.py:471] Config param skip_jax_distributed_system: False I0424 12:14:19.914630 134763599910720 pyconfig.py:471] Config param skip_step_interval: 128 I0424 12:14:19.914644 134763599910720 pyconfig.py:471] Config param skip_step_on_spikes: False I0424 12:14:19.914669 134763599910720 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0424 12:14:19.914685 134763599910720 pyconfig.py:471] Config param sliding_window_size: 0 I0424 12:14:19.914700 134763599910720 pyconfig.py:471] Config param solution_end_token: </answer> I0424 12:14:19.914716 134763599910720 pyconfig.py:471] Config param solution_start_token: <answer> I0424 12:14:19.914731 134763599910720 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0424 12:14:19.914746 134763599910720 pyconfig.py:471] Config param sparse_matmul: True I0424 12:14:19.914763 134763599910720 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0424 12:14:19.914778 134763599910720 pyconfig.py:471] Config param stack_prefill_result_cache: False I0424 12:14:19.914795 134763599910720 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0424 12:14:19.914811 134763599910720 pyconfig.py:471] Config param stack_trace_to_cloud: False I0424 12:14:19.914827 134763599910720 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0424 12:14:19.914842 134763599910720 pyconfig.py:471] Config param steps: 200000 I0424 12:14:19.914858 
134763599910720 pyconfig.py:471] Config param stop_strings: None
I0424 12:14:19.914874 134763599910720 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 12:14:19.914890 134763599910720 pyconfig.py:471] Config param student_params_to_update: None
I0424 12:14:19.914906 134763599910720 pyconfig.py:471] Config param subslice_shape:
I0424 12:14:19.914921 134763599910720 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 12:14:19.914947 134763599910720 pyconfig.py:471] Config param system_prompt:
I0424 12:14:19.914963 134763599910720 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 12:14:19.914980 134763599910720 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 12:14:19.914996 134763599910720 pyconfig.py:471] Config param temperature_tuning: False
I0424 12:14:19.915012 134763599910720 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 12:14:19.915028 134763599910720 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-14/tensorboard/
I0424 12:14:19.915044 134763599910720 pyconfig.py:471] Config param tensors_on_device: None
I0424 12:14:19.915058 134763599910720 pyconfig.py:471] Config param tensors_to_offload: None
I0424 12:14:19.915074 134763599910720 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 12:14:19.915088 134763599910720 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 12:14:19.915103 134763599910720 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 12:14:19.915119 134763599910720 pyconfig.py:471] Config param tokenize_train_data: True
I0424 12:14:19.915134 134763599910720 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 12:14:19.915150 134763599910720 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 12:14:19.915168 134763599910720 pyconfig.py:471] Config param topk_routing_group: -1
I0424 12:14:19.915183 134763599910720 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 12:14:19.915198 134763599910720 pyconfig.py:471] Config param train_fraction: 1.0
I0424 12:14:19.915214 134763599910720 pyconfig.py:471] Config param train_image_column: image
I0424 12:14:19.915229 134763599910720 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 12:14:19.915245 134763599910720 pyconfig.py:471] Config param train_split: train
I0424 12:14:19.915259 134763599910720 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 12:14:19.915275 134763599910720 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 12:14:19.915290 134763599910720 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 12:14:19.915307 134763599910720 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 12:14:19.915321 134763599910720 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 12:14:19.915337 134763599910720 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 12:14:19.915354 134763599910720 pyconfig.py:471] Config param use_audio: False
I0424 12:14:19.915369 134763599910720 pyconfig.py:471] Config param use_audio_in_video: False
I0424 12:14:19.915385 134763599910720 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 12:14:19.915400 134763599910720 pyconfig.py:471] Config param use_chat_template: False
I0424 12:14:19.915415 134763599910720 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 12:14:19.915430 134763599910720 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 12:14:19.915445 134763599910720 pyconfig.py:471] Config param use_dpo: False
I0424 12:14:19.915459 134763599910720 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 12:14:19.915476 134763599910720 pyconfig.py:471] Config param use_grpo: True
I0424 12:14:19.915491 134763599910720 pyconfig.py:471] Config param use_indexer: False
I0424 12:14:19.915506 134763599910720 pyconfig.py:471] Config param use_iota_embed: True
I0424 12:14:19.915522 134763599910720 pyconfig.py:471] Config param use_jax_splash: False
I0424 12:14:19.915536 134763599910720 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 12:14:19.915552 134763599910720 pyconfig.py:471] Config param use_mrope: False
I0424 12:14:19.915566 134763599910720 pyconfig.py:471] Config param use_multimodal: False
I0424 12:14:19.915582 134763599910720 pyconfig.py:471] Config param use_pathways: True
I0424 12:14:19.915596 134763599910720 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 12:14:19.915612 134763599910720 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 12:14:19.915628 134763599910720 pyconfig.py:471] Config param use_qk_clip: False
I0424 12:14:19.915642 134763599910720 pyconfig.py:471] Config param use_qk_norm: False
I0424 12:14:19.915672 134763599910720 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 12:14:19.915689 134763599910720 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 12:14:19.915705 134763599910720 pyconfig.py:471] Config param use_ragged_attention: False
I0424 12:14:19.915721 134763599910720 pyconfig.py:471] Config param use_random_routing: False
I0424 12:14:19.915736 134763599910720 pyconfig.py:471] Config param use_replicator_service: False
I0424 12:14:19.915751 134763599910720 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 12:14:19.915766 134763599910720 pyconfig.py:471] Config param use_sft: False
I0424 12:14:19.915782 134763599910720 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 12:14:19.915796 134763599910720 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 12:14:19.915812 134763599910720 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 12:14:19.915826 134763599910720 pyconfig.py:471] Config param use_truncation: True
I0424 12:14:19.915842 134763599910720 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 12:14:19.915856 134763599910720 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 12:14:19.915872 134763599910720 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 12:14:19.915886 134763599910720 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 12:14:19.915902 134763599910720 pyconfig.py:471] Config param v_head_dim: 128
I0424 12:14:19.915916 134763599910720 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 12:14:19.915931 134763599910720 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 12:14:19.915948 134763599910720 pyconfig.py:471] Config param vertex_tensorboard_project:
I0424 12:14:19.915963 134763599910720 pyconfig.py:471] Config param vertex_tensorboard_region:
I0424 12:14:19.915981 134763599910720 pyconfig.py:471] Config param video_path:
I0424 12:14:19.915997 134763599910720 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 12:14:19.916011 134763599910720 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 12:14:19.916026 134763599910720 pyconfig.py:471] Config param vision_output_length: -1
I0424 12:14:19.916041 134763599910720 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 12:14:19.916057 134763599910720 pyconfig.py:471] Config param vllm_hf_config_path:
I0424 12:14:19.916074 134763599910720 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 12:14:19.916088 134763599910720 pyconfig.py:471] Config param vocab_size: 32000
I0424 12:14:19.916105 134763599910720 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 12:14:19.916121 134763599910720 pyconfig.py:471] Config param weight_dtype: float32
I0424 12:14:19.916146 134763599910720 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 12:14:19.916163 134763599910720 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 12:14:19.916179 134763599910720 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 12:14:19.916195 134763599910720 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 12:14:19.916211 134763599910720 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 12:14:19.916226 134763599910720 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 12:14:19.916241 134763599910720 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 12:14:19.916255 134763599910720 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 12:14:19.916271 134763599910720 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 12:14:19.916286 134763599910720 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 12:14:19.916301 134763599910720 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 12:14:19.916316 134763599910720 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 12:14:19.916331 134763599910720 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 12:14:19.916346 134763599910720 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 12:14:19.916362 134763599910720 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 12:14:19.916377 134763599910720 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 12:14:19.916394 134763599910720 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 12:14:19.916408 134763599910720 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 12:14:19.916424 134763599910720 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 12:14:19.916438 134763599910720 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 12:14:19.916454 134763599910720 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 12:14:19.916473 134763599910720 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 12:14:19.916487 134763599910720 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0424 12:14:19.916502 134763599910720 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 12:14:19.916518 134763599910720 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 12:14:19.916536 134763599910720 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 12:14:19.916871 134763599910720 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 12:14:19.916908 134763599910720 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 12:14:23.526765 134763599910720 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 12:14:23.529805 134763599910720 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 12:14:23.529925 134763599910720 train_distill.py:608] Applying logical axis rules for model initialization and training...
I0424 12:14:23.529996 134763599910720 train_distill.py:612] Loading Student from ...
I0424 12:14:23.530025 134763599910720 train_distill.py:169] --- Student Configuration ---
I0424 12:14:23.530046 134763599910720 train_distill.py:170] Model Name: gpt3-52k
I0424 12:14:23.530068 134763599910720 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0424 12:14:23.530086 134763599910720 train_distill.py:174] Attention Heads: 2 Query, 2 KV
I0424 12:14:23.530103 134763599910720 train_distill.py:175] Vocab Size: 32000
I0424 12:14:23.530120 134763599910720 train_distill.py:176] Checkpoint:
I0424 12:14:23.530138 134763599910720 train_distill.py:477] Initializing model: gpt3-52k...
I0424 12:14:24.892444 134763599910720 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
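The `_schedule.py:129` warning above says a polynomial schedule with non-positive `transition_steps` degenerates into a constant schedule at `init_value`. A minimal sketch of that behavior, assuming an optax-style `polynomial_schedule(init_value, end_value, power, transition_steps)` signature (this toy is not optax itself):

```python
def polynomial_schedule(init_value, end_value, power, transition_steps):
    """Toy polynomial decay schedule mirroring the logged warning's semantics."""
    if transition_steps <= 0:
        # Matches the warning: non-positive transition_steps yields a
        # constant schedule with value init_value.
        return lambda step: init_value

    def schedule(step):
        # Clip progress to [0, 1], then decay polynomially from init to end.
        frac = 1.0 - min(max(step / transition_steps, 0.0), 1.0)
        return (init_value - end_value) * (frac ** power) + end_value

    return schedule

constant = polynomial_schedule(0.1, 0.0, power=1, transition_steps=0)
decaying = polynomial_schedule(1.0, 0.0, power=1, transition_steps=10)
```

With `transition_steps=0` every step returns 0.1, which is likely why this smoke-test run sees the warning rather than an error.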
I0424 12:14:24.892547 134763599910720 train_distill.py:169] --- Teacher Configuration ---
I0424 12:14:24.892576 134763599910720 train_distill.py:170] Model Name: gpt3-52k
I0424 12:14:24.892598 134763599910720 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0424 12:14:24.892620 134763599910720 train_distill.py:174] Attention Heads: 2 Query, 2 KV
I0424 12:14:24.892646 134763599910720 train_distill.py:175] Vocab Size: 32000
I0424 12:14:24.892676 134763599910720 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 12:14:24.892696 134763599910720 train_distill.py:477] Initializing model: gpt3-52k...
I0424 12:14:25.922031 134763599910720 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:14:25.922453 134763599910720 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a9060db0ce0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:14:25.922513 134763599910720 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 12:14:26.451388 134763599910720 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 12:14:26.991782 2124 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 12:14:28.118183 134763599910720 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 12:14:30.192122 134763599910720 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 12:14:30.192480 134763599910720 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 12:14:30.699513 134763599910720 checkpointer.py:318] Finished restoring checkpoint in 2.95 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 12:14:31.382269 134763599910720 train_distill.py:652] Initializing Data Iterators via MaxText pipeline...
I0424 12:14:31.445345 134763599910720 config.py:112] TensorFlow version 2.20.0 available.
I0424 12:14:31.445847 134763599910720 config.py:125] JAX version 0.8.3 available.
E0424 12:14:33.455706 134763599910720 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0424 12:14:33.455920 134763599910720 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0424 12:14:33.458882 134763599910720 train_distill.py:422] Input Pipeline Checkpointing: DISABLED
I0424 12:14:33.458941 134763599910720 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0424 12:14:33.459001 134763599910720 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:14:33.459079 134763599910720 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a9060db0ce0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:14:33.459121 134763599910720 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:14:33.459153 134763599910720 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a9060db0ce0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:14:33.459195 134763599910720 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986d1c9b0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986e8fa40>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986e8f9b0>}, handler_registry=None
I0424 12:14:33.459389 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986d1c9b0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:14:33.459429 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986e8fa40>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:14:33.459454 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986e8f9b0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:14:33.459477 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee31d0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:14:33.459503 134763599910720 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986d1c9b0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986d1c9b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986e8fa40>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986e8fa40>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986e8f9b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986e8f9b0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee31d0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee31d0>}). 
I0424 12:14:33.459914 134763599910720 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a79870fb7e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 12:14:35.833997 134763599910720 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints
I0424 12:14:35.843926 134763599910720 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a7986e8f980>
I0424 12:14:35.844045 134763599910720 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:14:35.844110 134763599910720 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a9060db0ce0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:14:35.844146 134763599910720 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:14:35.844177 134763599910720 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a9060db0ce0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:14:35.844217 134763599910720 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 12:14:35.844268 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a79ccc91140>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a79cc317140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986e8cd10>, _write_futures=[])
I0424 12:14:35.844619 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a79ccc91140>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a79cc317140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986e8cd10>, _write_futures=[])
I0424 12:14:35.844647 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a79ccc91140>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a79cc317140>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986e8cd10>, _write_futures=[])
I0424 12:14:35.844694 134763599910720 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a79cc3a2c30>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986ee3da0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee2330>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a7986ee2900>}, handler_registry=None
I0424 12:14:35.844793 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a79cc3a2c30>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:14:35.844826 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986ee3da0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:14:35.844851 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee2330>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:14:35.844879 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a7986ee2900>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 12:14:35.844902 134763599910720 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee1f70>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:14:35.844925 134763599910720 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a79cc3a2c30>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a79cc3a2c30>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986ee3da0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a7986ee3da0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee2330>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee2330>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a7986ee2900>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a7986ee2900>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee1f70>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a7986ee1f70>}).
I0424 12:14:35.844995 134763599910720 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a79870fb920> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 12:14:36.234893 134763599910720 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints
I0424 12:14:36.272171 134763599910720 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a7986d32fc0>
I0424 12:14:36.272686 134763599910720 train_distill.py:703] Starting Distillation Training...
I0424 12:14:36.272787 134763599910720 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 12:14:37.002600 134763599910720 peft_trainer.py:594] Compiled train_step cache size: 0
I0424 12:14:37.004314 134620431505152 grain_pool.py:367] Grain pool will use 1 processes.
I0424 12:14:37.031280 134620431505152 grain_pool.py:440] Grain pool will start child processes.
I0424 12:14:37.036638 134620431505152 grain_pool.py:448] Grain pool started all child processes.
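The mesh logged by peft_trainer.py:584 is consistent with the `Num_devices: 32, shape (1, 4, 1, 8, ...)` line earlier in this log: the product of the axis sizes must equal the device count. A quick check, with the axis names and sizes taken verbatim from the log line:

```python
# Mesh axis sizes as reported in the "Training with mesh" log entry.
mesh_axes = {
    "diloco": 1, "data": 4, "stage": 1, "fsdp": 8, "fsdp_transpose": 1,
    "sequence": 1, "context": 1, "context_autoregressive": 1,
    "tensor": 1, "tensor_transpose": 1, "tensor_sequence": 1,
    "expert": 1, "autoregressive": 1,
}

# The total device count is the product of all axis sizes.
num_devices = 1
for size in mesh_axes.values():
    num_devices *= size
```

Here only `data` (4) and `fsdp` (8) are non-trivial, so the run shards across 4-way data parallelism and 8-way FSDP.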
2026-04-24 12:14:43.081040: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value()
  current_step = model.training_step.value
I0424 12:14:49.532822 134763599910720 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 12:14:49.534730 134763599910720 checkpoint_manager.py:1501] [process=5] Saving checkpoint at step 1
I0424 12:14:49.537879 134763599910720 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/1.
I0424 12:14:50.255514 134763599910720 signaling_client.py:364] Using JaxDistributedSignalingClient
I0424 12:14:50.256613 134763599910720 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
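The `rope_scaling` warnings repeated above describe three concrete problems: a stray `rope_theta` key inside the yarn config, and integer values where floats are required. A sketch that re-creates those checks and shows a config that would pass them; the validator itself and the allowed-key set are illustrative reconstructions from the warning text, not the actual library code:

```python
# Keys the warnings imply are accepted for 'rope_type'='yarn' (illustrative).
ALLOWED_YARN_KEYS = {"rope_type", "factor", "beta_fast", "beta_slow"}

def check_yarn_rope_scaling(cfg):
    """Return a list of problems, mirroring the warnings seen in this log."""
    problems = []
    extra = set(cfg) - ALLOWED_YARN_KEYS
    if extra:
        problems.append(f"Unrecognized keys for 'rope_type'='yarn': {extra}")
    # factor must be a float >= 1 (an int like 40 trips the check).
    if not isinstance(cfg.get("factor"), float) or cfg["factor"] < 1:
        problems.append(f"factor must be a float >= 1, got {cfg.get('factor')}")
    # beta_fast / beta_slow must be floats (ints like 32 and 1 trip it).
    for key in ("beta_fast", "beta_slow"):
        if not isinstance(cfg.get(key), float):
            problems.append(f"{key} must be a float, got {cfg.get(key)}")
    return problems

# The shape of config that produced the warnings: ints plus a stray rope_theta.
bad = {"rope_type": "yarn", "factor": 40, "beta_fast": 32,
       "beta_slow": 1, "rope_theta": 500000.0}
# Cleaned-up version: floats everywhere, rope_theta moved out of rope_scaling.
good = {"rope_type": "yarn", "factor": 40.0, "beta_fast": 32.0, "beta_slow": 1.0}
```

The `rope_theta` value shown in `bad` is a placeholder; the log does not report what value the run actually used, only that the key does not belong under `rope_scaling`.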
I0424 12:14:50.256683 134763599910720 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0424 12:14:50.932822 134763599910720 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.677396s
I0424 12:14:50.934344 134763599910720 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 571.056 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 936 milliseconds) (per-host)
I0424 12:14:50.934403 134763599910720 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.937069s (batch_requests_ready=0.254539s, total_serialization_initiated=0.682424s, others=0.000106s)
I0424 12:14:50.935491 134763599910720 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0424 12:14:50.935547 134763599910720 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0424 12:14:50.939902 134763599910720 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.005384s
I0424 12:14:50.940011 134763599910720 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 283.108 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 944 milliseconds) (per-host)
I0424 12:14:50.940056 134763599910720 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.945045s (batch_requests_ready=0.937989s, total_serialization_initiated=0.006985s, others=0.000071s)
I0424 12:14:50.940146 134763599910720 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.949039s (all_items=0.000023s, per_item={'model_params': '0.00001836', 'optimizer_state': '0.00000429'}, temp_paths=0.949016)
I0424 12:14:50.941112 134614519813888 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started.
I0424 12:14:50.941279 134763599910720 async_checkpointer.py:561] Finished blocking save. Time taken: 1.406480s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/1.
I0424 12:14:50.952685 134763599910720 checkpoint_manager.py:1549] [process=5][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize
I0424 12:14:50.952970 134614536599296 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save.
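The `blocking_gbytes_per_sec` lines are just total bytes written over blocking wall time; a quick check against the step-1 optimizer-state save (numbers copied from the log above; the small residual presumably comes from Orbax timing a slightly different window than the `async_save` total):

```python
total_kib = 535.1          # "total gbytes: 535.1 KiB" from the log line
blocking_secs = 0.937069   # "Initiated Pytree async_save. Time taken: 0.937069s"

rate_kib_per_s = total_kib / blocking_secs

# The log reports 571.056 KiB/s for this save; we land within a fraction
# of a KiB/s of that figure.
assert abs(rate_kib_per_s - 571.056) < 1.0
```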
I0424 12:14:50.953127 134763599910720 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1777032889.5327876, 'wait_for_prev_duration_secs': 0.00013208389282226562, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1777032889.5347712, 'checkpointer_blocking_duration_secs': 1.4066107273101807, 'get_old_steps_start_time': 1777032890.9414027, 'get_old_steps_duration_secs': 8.296966552734375e-05, 'checkpoint_manager_blocking_start_time': 1777032888.5634387, 'checkpoint_manager_blocking_duration_secs': 2.3896546363830566}
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value()
  current_step = model.training_step.value
I0424 12:14:54.145635 134763599910720 peft_trainer.py:474] Train step 1 training loss: 15.963623 - training perplexity: 8568669.000000
I0424 12:14:54.166398 134763599910720 peft_trainer.py:474] Train step 2 training loss: 15.943937 - training perplexity: 8401639.000000
I0424 12:14:54.191308 134763599910720 peft_trainer.py:474] Train step 3 training loss: 15.973638 - training perplexity: 8654912.000000
I0424 12:14:54.210690 134763599910720 peft_trainer.py:474] Train step 4 training loss: 15.952717 - training perplexity: 8475726.000000
I0424 12:14:54.216256 134763599910720 peft_trainer.py:733] Train loop finished in: 17.2132 seconds
I0424 12:14:54.216719 134763599910720 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/...
I0424 12:14:55.538059 134612915312384 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_5
I0424 12:14:55.682706 134614528206592 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_5
I0424 12:14:55.683828 134614519813888 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 47.028 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host)
I0424 12:14:55.683964 134614519813888 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 94.091 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0424 12:14:55.684000 134614519813888 async_checkpointer.py:90] [process=5][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.742781s.
I0424 12:15:06.307051 134763599910720 checkpoint_manager.py:1994] [process=5][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0424 12:15:07.130701 134614519813888 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 16.189463s.
I0424 12:15:07.131011 134614536599296 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save.
I0424 12:15:07.131130 134614536599296 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save.
I0424 12:15:07.131177 134614536599296 checkpoint_manager.py:2103] [process=5][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts...
I0424 12:15:07.132769 134614536599296 checkpoint_manager.py:2112] [process=5][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts.
I0424 12:15:07.132948 134763599910720 checkpoint_manager.py:2006] [process=5][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1.
I0424 12:15:07.134567 134763599910720 checkpoint_manager.py:1501] [process=5] Saving checkpoint at step 5
I0424 12:15:07.138056 134763599910720 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/5.
I0424 12:15:07.665554 134763599910720 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
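The per-step perplexity printed by the trainer is just exp(cross-entropy loss), which is easy to confirm from the step-1 numbers in the log:

```python
import math

loss = 15.963623        # "Train step 1 training loss" from the log
perplexity = 8568669.0  # "training perplexity" from the same line

# exp(15.963623) ≈ 8.5687e6, matching the logged perplexity to ~1e-5 relative.
assert math.isclose(math.exp(loss), perplexity, rel_tol=1e-4)
```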
I0424 12:15:07.665666 134763599910720 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0424 12:15:08.308600 134763599910720 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.643959s
I0424 12:15:08.310095 134763599910720 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 597.693 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 895 milliseconds) (per-host)
I0424 12:15:08.310156 134763599910720 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.895307s (batch_requests_ready=0.248152s, total_serialization_initiated=0.647056s, others=0.000098s)
I0424 12:15:08.311261 134763599910720 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0424 12:15:08.311318 134763599910720 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0424 12:15:08.316111 134763599910720 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.005878s
I0424 12:15:08.316214 134763599910720 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 296.343 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 902 milliseconds) (per-host)
I0424 12:15:08.316264 134763599910720 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.902842s (batch_requests_ready=0.895365s, total_serialization_initiated=0.007406s, others=0.000072s)
I0424 12:15:08.316341 134763599910720 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.906778s (all_items=0.000013s, per_item={'model_params': '0.00001001', 'optimizer_state': '0.00000286'}, temp_paths=0.906765)
I0424 12:15:08.317342 134621471676160 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started.
I0424 12:15:08.317468 134763599910720 async_checkpointer.py:561] Finished blocking save. Time taken: 1.182830s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/5.
I0424 12:15:08.352777 134763599910720 checkpoint_manager.py:1549] [process=5][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize
I0424 12:15:08.353063 134614536599296 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save.
I0424 12:15:08.353228 134763599910720 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1777032906.30701, 'wait_for_prev_duration_secs': 0.8260409832000732, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1777032907.1346064, 'checkpointer_blocking_duration_secs': 1.183027744293213, 'get_old_steps_start_time': 1777032908.317665, 'get_old_steps_duration_secs': 8.463859558105469e-05, 'checkpoint_manager_blocking_start_time': 1777032894.2186382, 'checkpoint_manager_blocking_duration_secs': 14.134541273117065}
I0424 12:15:08.353399 134763599910720 checkpoint_manager.py:1994] [process=5][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0424 12:15:12.474618 134611285591808 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_5
I0424 12:15:12.502099 134612915312384 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_5
I0424 12:15:12.503050 134621471676160 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 52.565 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host)
I0424 12:15:12.503208 134621471676160 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 105.154 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0424 12:15:12.503247 134621471676160 async_checkpointer.py:90] [process=5][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.185702s.
I0424 12:15:22.415789 134621471676160 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 14.098229s.
I0424 12:15:22.416089 134614536599296 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save.
I0424 12:15:22.416211 134614536599296 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save.
I0424 12:15:22.416259 134614536599296 checkpoint_manager.py:2103] [process=5][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts...
I0424 12:15:22.417625 134614536599296 checkpoint_manager.py:2112] [process=5][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts.
I0424 12:15:22.417803 134763599910720 checkpoint_manager.py:2006] [process=5][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5.
I0424 12:15:22.417921 134763599910720 train_distill.py:724] Final checkpoint saved.
I0424 12:15:22.420140 134763599910720 peft_trainer.py:474] Train step 5 training loss: 15.949528 - training perplexity: 8448739.000000
I0424 12:15:22.420522 134763599910720 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0424 12:15:22.420594 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a7986ea4880>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a7986ee2240>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986ee1f10>, _write_futures=[])
I0424 12:15:22.420642 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a7986ea4880>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a7986ee2240>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986ee1f10>, _write_futures=[])
I0424 12:15:22.420686 134763599910720 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134763599910720 count=1 at 0x7a7986ea4880>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a7986ee2240>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a7986ee1f10>, _write_futures=[])
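The save lifecycle the log traces for each checkpoint (blocking device-to-host copy, then a background commit thread, then a "Save Finalize" completion step) can be sketched in plain Python threading. This is an illustration of the pattern only, not the Orbax implementation; `async_save`, `commit`, and `on_finalized` are hypothetical names:

```python
import threading


def async_save(arrays, commit, on_finalized):
    """Sketch of the blocking-then-background save pattern seen in the log."""
    # Blocking phase ("D2H"): snapshot everything to host memory so training
    # can safely mutate the originals as soon as this function returns.
    host_copies = [bytes(a) for a in arrays]

    def background():
        for chunk in host_copies:   # slow durable write ("Handler Commit")
            commit(chunk)
        on_finalized()              # "Save Finalize" step

    t = threading.Thread(target=background, name="async_save")
    t.start()
    return t                        # caller resumes training immediately


written = []
done = threading.Event()
t = async_save([b"layer0", b"layer1"], written.append, done.set)
# ... training steps would run here while the write proceeds ...
t.join()
assert done.is_set() and written == [b"layer0", b"layer1"]
```

This is why "Finished blocking save. Time taken: 1.18s" can coexist with "Background save thread done. Time taken: 14.10s": only the host copy blocks the train loop, while the GCS write finishes later.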
I0424 12:15:22.420731 134763599910720 train_distill.py:734] Distillation Complete.
I0424 12:15:22.719268 134620431505152 grain_pool.py:547] Shutting down multiprocessing system.
I0424 12:15:24.183759 134620431505152 grain_pool.py:542] Grain pool is exiting.
I0424 12:15:24.183867 134620431505152 grain_pool.py:547] Shutting down multiprocessing system.
I0424 12:15:24.183929 134620431505152 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Fri Apr 24 12:15:32 UTC 2026
EXIT_CODE=0
XPK Start: Fri Apr 24 12:29:34 UTC 2026
2026-04-24 12:29:53.088084: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
I0424 12:29:59.518248 139204572510016 max_utils.py:273] Attempting to initialize the jax distributed system...
I0424 12:30:08.557706 139204572510016 distributed.py:149] Starting JAX distributed service on [::]:8482
I0424 12:30:08.559996 139204572510016 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-mp06h-slice-job-0-0.mt-07-distill-smoke-mp06h:8482
I0424 12:30:09.628017 139204572510016 max_utils.py:284] Jax distributed system initialized!
I0424 12:30:16.020037 139204572510016 max_utils.py:244] Jax distributed system is already initialized.
W0424 12:30:16.151197 139204572510016 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0424 12:30:16.212115 139204572510016 max_utils.py:244] Jax distributed system is already initialized.
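The rope complaints in both runs are type errors: 40, 32, and 1 are ints where the checker wants floats, and `rope_theta` is not a recognized yarn key. A tiny validator that reproduces those messages (hypothetical; the real checks live in the upstream config code, and the allowed-key set here is an assumption):

```python
def validate_yarn_rope(params):
    """Hypothetical re-creation of the checks behind the log messages."""
    allowed = {"rope_type", "factor", "beta_fast", "beta_slow",
               "original_max_position_embeddings"}
    errors = []
    unknown = set(params) - allowed
    if unknown:
        errors.append(
            f"Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {unknown}")
    for field in ("factor", "beta_fast", "beta_slow"):
        # An int like 40 fails this check; 40.0 passes.
        if field in params and not isinstance(params[field], float):
            errors.append(
                f"`rope_scaling`'s {field} field must be a float, got {params[field]}")
    return errors


# The config the log complains about: int-valued fields plus a stray key.
bad = {"rope_type": "yarn", "factor": 40, "beta_fast": 32,
       "beta_slow": 1, "rope_theta": 10000.0}
# The fix: float literals, stray key moved out of rope_scaling.
good = {"rope_type": "yarn", "factor": 40.0, "beta_fast": 32.0, "beta_slow": 1.0}

assert len(validate_yarn_rope(bad)) == 4
assert validate_yarn_rope(good) == []
```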
I0424 12:30:16.213351 139204572510016 pyconfig.py:471] Config param abort_on_inf_loss: True
I0424 12:30:16.213401 139204572510016 pyconfig.py:471] Config param abort_on_nan_loss: True
I0424 12:30:16.213427 139204572510016 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0424 12:30:16.213448 139204572510016 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0424 12:30:16.213469 139204572510016 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0424 12:30:16.213487 139204572510016 pyconfig.py:471] Config param activations_in_float32: False
I0424 12:30:16.213503 139204572510016 pyconfig.py:471] Config param adam_b1: 0.9
I0424 12:30:16.213523 139204572510016 pyconfig.py:471] Config param adam_b2: 0.95
I0424 12:30:16.213540 139204572510016 pyconfig.py:471] Config param adam_eps: 1e-08
I0424 12:30:16.213563 139204572510016 pyconfig.py:471] Config param adam_eps_root: 0.0
I0424 12:30:16.213580 139204572510016 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0424 12:30:16.213597 139204572510016 pyconfig.py:471] Config param adamw_mask: []
I0424 12:30:16.213613 139204572510016 pyconfig.py:471] Config param add_bos: True
I0424 12:30:16.213629 139204572510016 pyconfig.py:471] Config param add_eos: True
I0424 12:30:16.213647 139204572510016 pyconfig.py:471] Config param allow_split_physical_axes: False
I0424 12:30:16.213661 139204572510016 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0424 12:30:16.213679 139204572510016 pyconfig.py:471] Config param async_checkpointing: True
I0424 12:30:16.213693 139204572510016 pyconfig.py:471] Config param async_scheduling: False
I0424 12:30:16.213710 139204572510016 pyconfig.py:471] Config param attention: dot_product
I0424 12:30:16.213725 139204572510016 pyconfig.py:471] Config param attention_bias: False
I0424 12:30:16.213742 139204572510016 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0424 12:30:16.213758 139204572510016 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0424 12:30:16.213779 139204572510016 pyconfig.py:471] Config param attention_output_dim: -1
I0424 12:30:16.213793 139204572510016 pyconfig.py:471] Config param attention_sink: False
I0424 12:30:16.213810 139204572510016 pyconfig.py:471] Config param attention_type: global
I0424 12:30:16.213825 139204572510016 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0424 12:30:16.213842 139204572510016 pyconfig.py:471] Config param audio_path:
I0424 12:30:16.213857 139204572510016 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0424 12:30:16.213873 139204572510016 pyconfig.py:471] Config param autoregressive_decode_assert:
I0424 12:30:16.213888 139204572510016 pyconfig.py:471] Config param base_config: base.yml
I0424 12:30:16.213904 139204572510016 pyconfig.py:471] Config param base_emb_dim: 16
I0424 12:30:16.213921 139204572510016 pyconfig.py:471] Config param base_mlp_dim: 64
I0424 12:30:16.213935 139204572510016 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0424 12:30:16.213952 139204572510016 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0424 12:30:16.213967 139204572510016 pyconfig.py:471] Config param base_num_kv_heads: 2
I0424 12:30:16.213992 139204572510016 pyconfig.py:471] Config param base_num_query_heads: 2
I0424 12:30:16.214007 139204572510016 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0424 12:30:16.214024 139204572510016 pyconfig.py:471] Config param batch_size: 1
I0424 12:30:16.214039 139204572510016 pyconfig.py:471] Config param batch_split_factor: 1
I0424 12:30:16.214056 139204572510016 pyconfig.py:471] Config param beta_fast: 32
I0424 12:30:16.214072 139204572510016 pyconfig.py:471] Config param beta_slow: 1
I0424 12:30:16.214088 139204572510016 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0424 12:30:16.214132 139204572510016 pyconfig.py:471] Config param capacity_factor: -1.0
I0424 12:30:16.214149 139204572510016 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0424 12:30:16.214164 139204572510016 pyconfig.py:471] Config param chat_template:
I0424 12:30:16.214181 139204572510016 pyconfig.py:471] Config param chat_template_path:
I0424 12:30:16.214198 139204572510016 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0424 12:30:16.214216 139204572510016 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/checkpoints/
I0424 12:30:16.214233 139204572510016 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0424 12:30:16.214447 139204572510016 pyconfig.py:471] Config param checkpoint_period: 2000
I0424 12:30:16.214493 139204572510016 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0424 12:30:16.214521 139204572510016 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0424 12:30:16.214551 139204572510016 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0424 12:30:16.214568 139204572510016 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0424 12:30:16.214587 139204572510016 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0424 12:30:16.214606 139204572510016 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0424 12:30:16.214622 139204572510016 pyconfig.py:471] Config param chips_per_vm: 4
I0424 12:30:16.214637 139204572510016 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0424 12:30:16.214653 139204572510016 pyconfig.py:471] Config param collect_stack_trace: False
I0424 12:30:16.214668 139204572510016 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0424 12:30:16.214684 139204572510016 pyconfig.py:471] Config param colocated_python_data_input: False
I0424 12:30:16.214698 139204572510016 pyconfig.py:471] Config param compile_topology:
I0424 12:30:16.214714 139204572510016 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0424 12:30:16.214729 139204572510016 pyconfig.py:471] Config param compile_xla_flags:
I0424 12:30:16.214744 139204572510016 pyconfig.py:471] Config param compiled_trainstep_file:
I0424 12:30:16.214759 139204572510016 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0424 12:30:16.214775 139204572510016 pyconfig.py:471] Config param constant_bound_config: []
I0424 12:30:16.214790 139204572510016 pyconfig.py:471] Config param context: RematLocation.REMAT
I0424 12:30:16.214810 139204572510016 pyconfig.py:471] Config param context_parallel_load_balance: True
I0424 12:30:16.214827 139204572510016 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0424 12:30:16.214846 139204572510016 pyconfig.py:471] Config param context_parallel_size: 1
I0424 12:30:16.214861 139204572510016 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0424 12:30:16.214878 139204572510016 pyconfig.py:471] Config param context_sharding: context
I0424 12:30:16.214892 139204572510016 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0424 12:30:16.214908 139204572510016 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0424 12:30:16.214922 139204572510016 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0424 12:30:16.214938 139204572510016 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0424 12:30:16.214952 139204572510016 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0424 12:30:16.214969 139204572510016 pyconfig.py:471] Config param custom_mesh:
I0424 12:30:16.214983 139204572510016 pyconfig.py:471] Config param custom_mesh_and_rule:
I0424 12:30:16.214999 139204572510016 pyconfig.py:471] Config param d_model_for_audio: 256
I0424 12:30:16.215015 139204572510016 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0424 12:30:16.215039 139204572510016 pyconfig.py:471] Config param data_shuffle_seed: 0
I0424 12:30:16.215053 139204572510016 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0424 12:30:16.215069 139204572510016 pyconfig.py:471] Config param dataset_path:
I0424 12:30:16.215084 139204572510016 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0424 12:30:16.215123 139204572510016 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0424 12:30:16.215138 139204572510016 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0424 12:30:16.215153 139204572510016 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0424 12:30:16.215168 139204572510016 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0424 12:30:16.215184 139204572510016 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0424 12:30:16.215199 139204572510016 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0424 12:30:16.215215 139204572510016 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0424 12:30:16.215229 139204572510016 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0424 12:30:16.215246 139204572510016 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0424 12:30:16.215264 139204572510016 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0424 12:30:16.215278 139204572510016 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0424 12:30:16.215293 139204572510016 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0424 12:30:16.215307 139204572510016 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0424 12:30:16.215324 139204572510016 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0424 12:30:16.215338 139204572510016 pyconfig.py:471] Config param debug: {'rl': False}
I0424 12:30:16.215355 139204572510016 pyconfig.py:471] Config param debug_sharding: False
I0424 12:30:16.215369 139204572510016 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0424 12:30:16.215385 139204572510016 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0424 12:30:16.215404 139204572510016 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0424 12:30:16.215422 139204572510016 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0424 12:30:16.215435 139204572510016 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0424 12:30:16.215453 139204572510016 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0424 12:30:16.215469 139204572510016 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0424 12:30:16.215485 139204572510016 pyconfig.py:471] Config param degenerate_group_masking: True
I0424 12:30:16.215501 139204572510016 pyconfig.py:471] Config param dense_init_scale: 1.0
I0424 12:30:16.215517 139204572510016 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0424 12:30:16.215534 139204572510016 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0424 12:30:16.215555 139204572510016 pyconfig.py:471] Config param diloco_sync_period: 36
I0424 12:30:16.215570 139204572510016 pyconfig.py:471] Config param distill_alpha: 0.5
I0424 12:30:16.215586 139204572510016 pyconfig.py:471] Config param distill_alpha_end: None
I0424 12:30:16.215601 139204572510016 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0424 12:30:16.215623 139204572510016 pyconfig.py:471] Config param distill_beta: 0.0
I0424 12:30:16.215646 139204572510016 pyconfig.py:471] Config param distill_beta_end: None
I0424 12:30:16.215670 139204572510016 pyconfig.py:471] Config param distill_beta_schedule: constant
I0424 12:30:16.215692 139204572510016 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0424 12:30:16.215713 139204572510016 pyconfig.py:471] Config param distill_layer_indices: None
I0424 12:30:16.215733 139204572510016 pyconfig.py:471] Config param distill_temperature: 1.0
I0424 12:30:16.215754 139204572510016 pyconfig.py:471] Config param distill_temperature_end: None
I0424 12:30:16.215775 139204572510016 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0424 12:30:16.215796 139204572510016 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0424 12:30:16.215814 139204572510016 pyconfig.py:471] Config param dpo_beta: 0.1
I0424 12:30:16.215834 139204572510016 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0424 12:30:16.215854 139204572510016 pyconfig.py:471] Config param dq_reduction_steps: 0
I0424 12:30:16.215874 139204572510016 pyconfig.py:471] Config param dropout_rate: 0.0
I0424 12:30:16.215895 139204572510016 pyconfig.py:471] Config param dtype: bfloat16
I0424 12:30:16.215940 139204572510016 pyconfig.py:471] Config param dtype_mm: float32
I0424 12:30:16.215959 139204572510016 pyconfig.py:471] Config param dump_hlo: False
I0424 12:30:16.215979 139204572510016 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0424 12:30:16.216000 139204572510016 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/xla_dump
I0424 12:30:16.216022 139204572510016 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0424 12:30:16.216042 139204572510016 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0424 12:30:16.216062 139204572510016 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0424 12:30:16.216082 139204572510016 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0424 12:30:16.216126 139204572510016 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0424 12:30:16.216148 139204572510016 pyconfig.py:471] Config param dump_jaxpr: False
I0424 12:30:16.216168 139204572510016 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0424 12:30:16.216189 139204572510016 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/jaxpr_dump
I0424 12:30:16.216209 139204572510016 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0424 12:30:16.216228 139204572510016 pyconfig.py:471] Config param dump_step: -1
I0424 12:30:16.216249 139204572510016 pyconfig.py:471] Config param elastic_enabled: False
I0424 12:30:16.216270 139204572510016 pyconfig.py:471] Config param elastic_max_retries: 10
I0424 12:30:16.216293 139204572510016 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0424 12:30:16.216312 139204572510016 pyconfig.py:471] Config param emb_dim: 16
I0424 12:30:16.216332 139204572510016 pyconfig.py:471] Config param enable_autocheckpoint: False
I0424 12:30:16.216354 139204572510016 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0424 12:30:16.216373 139204572510016 pyconfig.py:471] Config param enable_checkpointing: True
I0424 12:30:16.216395 139204572510016 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0424 12:30:16.216413 139204572510016 pyconfig.py:471] Config param enable_data_shuffling: True
I0424 12:30:16.216432 139204572510016 pyconfig.py:471] Config param enable_diloco: False
I0424 12:30:16.216452 139204572510016 pyconfig.py:471] Config param enable_dp_attention: False
I0424 12:30:16.216470 139204572510016 pyconfig.py:471] Config param enable_dropout: False
I0424 12:30:16.216492 139204572510016 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0424 12:30:16.216511 139204572510016 pyconfig.py:471] Config param enable_expert_parallel: False
I0424 12:30:16.216532 139204572510016 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0424 12:30:16.216557 139204572510016 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0424 12:30:16.216581 139204572510016 pyconfig.py:471] Config param enable_goodput_recording: False
I0424 12:30:16.216600 139204572510016 pyconfig.py:471] Config param enable_jax_profiler: False
I0424 12:30:16.216624 139204572510016 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0424 12:30:16.216644 139204572510016 pyconfig.py:471] Config param enable_model_warmup: False
I0424 12:30:16.216663 139204572510016 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0424 12:30:16.216683 139204572510016 pyconfig.py:471] Config param enable_nnx: False
I0424 12:30:16.216702 139204572510016 pyconfig.py:471] Config param enable_orbax_v1: False
I0424 12:30:16.216722 139204572510016 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0424 12:30:16.216742 139204572510016 pyconfig.py:471] Config param enable_pathways_goodput: False
I0424 12:30:16.216761 139204572510016 pyconfig.py:471] Config param enable_prefix_caching: False
I0424 12:30:16.216781 139204572510016 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0424 12:30:16.216803 139204572510016 pyconfig.py:471] Config param enable_single_controller: False
I0424 12:30:16.216825 139204572510016 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0424 12:30:16.216845 139204572510016 pyconfig.py:471] Config param enable_tensorboard: True
I0424 12:30:16.216866 139204572510016 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0424 12:30:16.216890 139204572510016 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0424 12:30:16.216908 139204572510016 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0424 12:30:16.216924 139204572510016 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0424 12:30:16.216938 139204572510016 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0424 12:30:16.216955 139204572510016 pyconfig.py:471] Config param engram_head_dim: 1280
I0424 12:30:16.216969 139204572510016 pyconfig.py:471] Config param engram_kernel_size: 4
I0424 12:30:16.216985 139204572510016 pyconfig.py:471] Config param engram_layers: []
I0424 12:30:16.217000 139204572510016 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0424 12:30:16.217015 139204572510016 pyconfig.py:471] Config param engram_num_heads: 8
I0424 12:30:16.217031 139204572510016 pyconfig.py:471] Config param engram_seed: 0
I0424 12:30:16.217045 139204572510016 pyconfig.py:471] Config param engram_vocab_bases: []
I0424 12:30:16.217060 139204572510016 pyconfig.py:471] Config param epsilon_high: None
I0424 12:30:16.217074 139204572510016 pyconfig.py:471] Config param eval_corr_lst: False
I0424 12:30:16.217090 139204572510016 pyconfig.py:471] Config param eval_data_columns: ['text']
I0424 12:30:16.217122 139204572510016 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0424 12:30:16.217136 139204572510016 pyconfig.py:471] Config param eval_image_column: image
I0424 12:30:16.217151 139204572510016 pyconfig.py:471] Config param eval_interval: -1
I0424 12:30:16.217167 139204572510016 pyconfig.py:471] Config param eval_make_lst: False
I0424 12:30:16.217181 139204572510016 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0424 12:30:16.217197 139204572510016 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0424 12:30:16.217212 139204572510016 pyconfig.py:471] Config param eval_split: validation
I0424 12:30:16.217227 139204572510016 pyconfig.py:471] Config param eval_steps: -1
I0424 12:30:16.217241 139204572510016 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0424 12:30:16.217258 139204572510016 pyconfig.py:471] Config param final_logits_soft_cap: None
I0424 12:30:16.217272 139204572510016 pyconfig.py:471] Config param first_num_dense_layers: 0
I0424 12:30:16.217290 139204572510016 pyconfig.py:471] Config param float32_gate_logits: False
I0424 12:30:16.217305 139204572510016 pyconfig.py:471] Config param float32_logits: False
I0424 12:30:16.217321 139204572510016 pyconfig.py:471] Config param float32_qk_product: False
I0424 12:30:16.217335 139204572510016 pyconfig.py:471] Config param float32_weight_sum: True
I0424 12:30:16.217351 139204572510016 pyconfig.py:471] Config param force_q_layout: False
I0424 12:30:16.217366 139204572510016 pyconfig.py:471] Config param force_unroll: False
I0424 12:30:16.217381 139204572510016 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0424 12:30:16.217399 139204572510016 pyconfig.py:471] Config param formatting_func_path:
I0424 12:30:16.217414 139204572510016 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0424 12:30:16.217429 139204572510016 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0424 12:30:16.217444 139204572510016 pyconfig.py:471] Config param fused_mlp: False
I0424 12:30:16.217459 139204572510016 pyconfig.py:471] Config param fused_qkv: True
I0424 12:30:16.217475 139204572510016 pyconfig.py:471] Config param gcs_metrics: False
I0424 12:30:16.217489 139204572510016 pyconfig.py:471] Config param gdn_chunk_size: 64
I0424 12:30:16.217505 139204572510016 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0424 12:30:16.217520 139204572510016 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0424 12:30:16.217540 139204572510016 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0424 12:30:16.217554 139204572510016 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0424 12:30:16.217569 139204572510016 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0424 12:30:16.217585 139204572510016 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0424 12:30:16.217599 139204572510016 pyconfig.py:471] Config param generate_padding_batch_train: False
I0424 12:30:16.217615 139204572510016 pyconfig.py:471] Config param generate_slice: v5e-16
I0424 12:30:16.217629 139204572510016 pyconfig.py:471] Config param generation_configs: {}
I0424 12:30:16.217645 139204572510016 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0424 12:30:16.217659 139204572510016 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0424 12:30:16.217674 139204572510016 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0424 12:30:16.217689
139204572510016 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0424 12:30:16.217705 139204572510016 pyconfig.py:471] Config param global_batch_size_to_load_start: None I0424 12:30:16.217719 139204572510016 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0424 12:30:16.217735 139204572510016 pyconfig.py:471] Config param global_head_dim: 0 I0424 12:30:16.217750 139204572510016 pyconfig.py:471] Config param global_num_kv_heads: 0 I0424 12:30:16.217766 139204572510016 pyconfig.py:471] Config param global_parameter_scale: 1 I0424 12:30:16.217780 139204572510016 pyconfig.py:471] Config param global_rampup_samples: 500 I0424 12:30:16.217796 139204572510016 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0424 12:30:16.217812 139204572510016 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0424 12:30:16.217828 139204572510016 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0424 12:30:16.217843 139204572510016 pyconfig.py:471] Config param grad_dtype: float32 I0424 12:30:16.217885 139204572510016 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0424 12:30:16.217904 139204572510016 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0424 12:30:16.217920 139204572510016 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0424 12:30:16.217936 139204572510016 pyconfig.py:471] Config param grain_eval_files: I0424 12:30:16.217952 139204572510016 pyconfig.py:471] Config param grain_file_type: arrayrecord I0424 12:30:16.217969 139204572510016 pyconfig.py:471] Config param grain_num_threads: 16 I0424 12:30:16.217984 139204572510016 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0424 12:30:16.218000 139204572510016 pyconfig.py:471] Config param grain_packing_type: first_fit I0424 12:30:16.218015 139204572510016 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0424 12:30:16.218031 139204572510016 pyconfig.py:471] Config param 
grain_per_worker_buffer_size_eval: 1 I0424 12:30:16.218045 139204572510016 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0424 12:30:16.218064 139204572510016 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500 I0424 12:30:16.218078 139204572510016 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0424 12:30:16.218094 139204572510016 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0424 12:30:16.218122 139204572510016 pyconfig.py:471] Config param grain_train_files: I0424 12:30:16.218137 139204572510016 pyconfig.py:471] Config param grain_train_mixture_config_path: I0424 12:30:16.218152 139204572510016 pyconfig.py:471] Config param grain_worker_count: 1 I0424 12:30:16.218168 139204572510016 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0424 12:30:16.218182 139204572510016 pyconfig.py:471] Config param grpo_beta: 0.08 I0424 12:30:16.218198 139204572510016 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0424 12:30:16.218213 139204572510016 pyconfig.py:471] Config param hardware: tpu I0424 12:30:16.218229 139204572510016 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0424 12:30:16.218243 139204572510016 pyconfig.py:471] Config param head_dim: 8 I0424 12:30:16.218259 139204572510016 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0424 12:30:16.218274 139204572510016 pyconfig.py:471] Config param hf_data_dir: None I0424 12:30:16.218290 139204572510016 pyconfig.py:471] Config param hf_eval_files: None I0424 12:30:16.218304 139204572510016 pyconfig.py:471] Config param hf_eval_split: None I0424 12:30:16.218320 139204572510016 pyconfig.py:471] Config param hf_name: None I0424 12:30:16.218334 139204572510016 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0424 12:30:16.218350 139204572510016 pyconfig.py:471] Config param hf_train_files: None I0424 12:30:16.218364 139204572510016 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0424 12:30:16.218380 
139204572510016 pyconfig.py:471] Config param hide_profiler_step_metric: False I0424 12:30:16.218394 139204572510016 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0424 12:30:16.218410 139204572510016 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0424 12:30:16.218424 139204572510016 pyconfig.py:471] Config param ici_context_parallelism: 1 I0424 12:30:16.218438 139204572510016 pyconfig.py:471] Config param ici_data_parallelism: 1 I0424 12:30:16.218452 139204572510016 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0424 12:30:16.218466 139204572510016 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0424 12:30:16.218482 139204572510016 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0424 12:30:16.218496 139204572510016 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0424 12:30:16.218511 139204572510016 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0424 12:30:16.218527 139204572510016 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0424 12:30:16.218547 139204572510016 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0424 12:30:16.218561 139204572510016 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0424 12:30:16.218576 139204572510016 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0424 12:30:16.218590 139204572510016 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0424 12:30:16.218607 139204572510016 pyconfig.py:471] Config param image_path: I0424 12:30:16.218621 139204572510016 pyconfig.py:471] Config param image_placeholder: <|image|> I0424 12:30:16.218637 139204572510016 pyconfig.py:471] Config param image_size_for_vit: 896 I0424 12:30:16.218651 139204572510016 pyconfig.py:471] Config param indexer_head_dim: 128 I0424 12:30:16.218667 139204572510016 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0424 12:30:16.218681 139204572510016 pyconfig.py:471] 
Config param indexer_n_heads: 64 I0424 12:30:16.218697 139204572510016 pyconfig.py:471] Config param indexer_sparse_training: False I0424 12:30:16.218711 139204572510016 pyconfig.py:471] Config param indexer_topk: 2048 I0424 12:30:16.218727 139204572510016 pyconfig.py:471] Config param inference_benchmark_test: False I0424 12:30:16.218741 139204572510016 pyconfig.py:471] Config param inference_metadata_file: I0424 12:30:16.218757 139204572510016 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0424 12:30:16.218771 139204572510016 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0424 12:30:16.218786 139204572510016 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0424 12:30:16.218801 139204572510016 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0424 12:30:16.218818 139204572510016 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0424 12:30:16.218832 139204572510016 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0424 12:30:16.218848 139204572510016 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0424 12:30:16.218862 139204572510016 pyconfig.py:471] Config param init_weights_seed: 0 I0424 12:30:16.218878 139204572510016 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0424 12:30:16.218894 139204572510016 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0424 12:30:16.218910 139204572510016 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0424 12:30:16.218924 139204572510016 pyconfig.py:471] Config param internal_compile: False I0424 12:30:16.218940 139204572510016 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0424 12:30:16.218954 139204572510016 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0424 12:30:16.218970 139204572510016 
pyconfig.py:471] Config param jax_debug_log_modules: I0424 12:30:16.218984 139204572510016 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0424 12:30:16.219000 139204572510016 pyconfig.py:471] Config param jax_profiler_port: 9999 I0424 12:30:16.219014 139204572510016 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0424 12:30:16.219031 139204572510016 pyconfig.py:471] Config param kv_cache_buffer: 256 I0424 12:30:16.219045 139204572510016 pyconfig.py:471] Config param kv_lora_rank: 512 I0424 12:30:16.219061 139204572510016 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0424 12:30:16.219081 139204572510016 pyconfig.py:471] Config param kv_quant_dtype: int8 I0424 12:30:16.219104 139204572510016 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0424 12:30:16.219119 139204572510016 pyconfig.py:471] Config param learning_rate: 0.0002 I0424 12:30:16.219135 139204572510016 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0424 12:30:16.219150 139204572510016 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0424 12:30:16.219164 139204572510016 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0424 12:30:16.219181 139204572510016 pyconfig.py:471] Config param load_checkpoint_only_once: False I0424 12:30:16.219197 139204572510016 pyconfig.py:471] Config param load_from_prefill_dir: False I0424 12:30:16.219213 139204572510016 pyconfig.py:471] Config param load_full_state_path: I0424 12:30:16.219228 139204572510016 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0424 12:30:16.219243 139204572510016 pyconfig.py:471] Config param local_checkpoint_directory: I0424 12:30:16.219258 139204572510016 pyconfig.py:471] Config param local_checkpoint_period: 0 I0424 12:30:16.219274 139204572510016 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0424 
12:30:16.219289 139204572510016 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0424 12:30:16.219305 139204572510016 pyconfig.py:471] Config param log_config: True I0424 12:30:16.219320 139204572510016 pyconfig.py:471] Config param log_period: 10 I0424 12:30:16.219336 139204572510016 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 
'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0424 12:30:16.219444 139204572510016 pyconfig.py:471] Config param logits_dot_in_fp32: False I0424 12:30:16.219478 139204572510016 pyconfig.py:471] Config param logits_via_embedding: True I0424 12:30:16.219496 139204572510016 pyconfig.py:471] Config param lora_input_adapters_path: I0424 12:30:16.219513 139204572510016 pyconfig.py:471] Config param loss_algo: grpo I0424 12:30:16.219528 139204572510016 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0424 12:30:16.219549 139204572510016 pyconfig.py:471] Config param managed_mldiagnostics: False I0424 12:30:16.219563 139204572510016 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/managed-mldiagnostics I0424 12:30:16.219578 139204572510016 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0424 12:30:16.219594 139204572510016 
pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0424 12:30:16.219612 139204572510016 pyconfig.py:471] Config param max_checkify: False I0424 12:30:16.219626 139204572510016 pyconfig.py:471] Config param max_concurrency: 256 I0424 12:30:16.219642 139204572510016 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0424 12:30:16.219660 139204572510016 pyconfig.py:471] Config param max_num_batched_tokens: None I0424 12:30:16.219682 139204572510016 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0424 12:30:16.219697 139204572510016 pyconfig.py:471] Config param max_num_images_per_example: -1 I0424 12:30:16.219713 139204572510016 pyconfig.py:471] Config param max_num_seqs: None I0424 12:30:16.219728 139204572510016 pyconfig.py:471] Config param max_position_embeddings: 163840 I0424 12:30:16.219743 139204572510016 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0424 12:30:16.219758 139204572510016 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0424 12:30:16.219774 139204572510016 pyconfig.py:471] Config param max_segments_per_seq: -1 I0424 12:30:16.219788 139204572510016 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0424 12:30:16.219803 139204572510016 pyconfig.py:471] Config param max_target_length: 2048 I0424 12:30:16.219818 139204572510016 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0424 12:30:16.219834 139204572510016 pyconfig.py:471] Config param megablox: True I0424 12:30:16.219849 139204572510016 pyconfig.py:471] Config param merge_gating_gmm: False I0424 12:30:16.219865 139204572510016 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0424 12:30:16.219882 139204572510016 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/metrics/ I0424 
12:30:16.219899 139204572510016 pyconfig.py:471] Config param metrics_file: I0424 12:30:16.219913 139204572510016 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0424 12:30:16.219928 139204572510016 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0424 12:30:16.219943 139204572510016 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0424 12:30:16.219958 139204572510016 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0424 12:30:16.219973 139204572510016 pyconfig.py:471] Config param mla_naive_kvcache: True I0424 12:30:16.219989 139204572510016 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0424 12:30:16.220003 139204572510016 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0424 12:30:16.220020 139204572510016 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0424 12:30:16.220034 139204572510016 pyconfig.py:471] Config param mlp_bias: False I0424 12:30:16.220049 139204572510016 pyconfig.py:471] Config param mlp_dim: 64 I0424 12:30:16.220063 139204572510016 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0424 12:30:16.220080 139204572510016 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0424 12:30:16.220102 139204572510016 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0424 12:30:16.220118 139204572510016 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0424 12:30:16.220133 139204572510016 pyconfig.py:471] Config param moba: False I0424 12:30:16.220149 139204572510016 pyconfig.py:471] Config param moba_chunk_size: 1024 I0424 12:30:16.220163 139204572510016 pyconfig.py:471] Config param moba_topk: 8 I0424 12:30:16.220179 139204572510016 pyconfig.py:471] Config param model_call_mode: I0424 12:30:16.220194 139204572510016 pyconfig.py:471] Config param model_name: gpt3-52k I0424 12:30:16.220210 139204572510016 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0424 12:30:16.220224 139204572510016 pyconfig.py:471] Config param 
moe_fsdp_use_two_stage_all_gather: False I0424 12:30:16.220240 139204572510016 pyconfig.py:471] Config param moe_mlp_dim: -1 I0424 12:30:16.220254 139204572510016 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0424 12:30:16.220270 139204572510016 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0424 12:30:16.220285 139204572510016 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0424 12:30:16.220301 139204572510016 pyconfig.py:471] Config param monitor_goodput: False I0424 12:30:16.220316 139204572510016 pyconfig.py:471] Config param monitor_step_time_deviation: True I0424 12:30:16.220332 139204572510016 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0424 12:30:16.220346 139204572510016 pyconfig.py:471] Config param mscale: 1.0 I0424 12:30:16.220362 139204572510016 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0424 12:30:16.220377 139204572510016 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0424 12:30:16.220393 139204572510016 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0424 12:30:16.220407 139204572510016 pyconfig.py:471] Config param mtp_num_layers: 0 I0424 12:30:16.220423 139204572510016 pyconfig.py:471] Config param mu_dtype: float32 I0424 12:30:16.220448 139204572510016 pyconfig.py:471] Config param multi_sampling: False I0424 12:30:16.220463 139204572510016 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0424 12:30:16.220480 139204572510016 pyconfig.py:471] Config param muon_beta: 0.95 I0424 12:30:16.220496 139204572510016 pyconfig.py:471] Config param muon_consistent_rms: None I0424 12:30:16.220513 139204572510016 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0424 12:30:16.220530 139204572510016 pyconfig.py:471] Config param n_routing_groups: -1 I0424 12:30:16.220549 139204572510016 pyconfig.py:471] Config param n_window_for_audio: 50 I0424 12:30:16.220564 139204572510016 pyconfig.py:471] Config param 
n_window_infer_for_audio: 800 I0424 12:30:16.220580 139204572510016 pyconfig.py:471] Config param nope_layer_interval: -1 I0424 12:30:16.220596 139204572510016 pyconfig.py:471] Config param norm_topk_prob: False I0424 12:30:16.220611 139204572510016 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0424 12:30:16.220632 139204572510016 pyconfig.py:471] Config param normalize_embedding_logits: False I0424 12:30:16.220648 139204572510016 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0424 12:30:16.220664 139204572510016 pyconfig.py:471] Config param num_batches: 4 I0424 12:30:16.220680 139204572510016 pyconfig.py:471] Config param num_channels_for_vit: 3 I0424 12:30:16.220696 139204572510016 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0424 12:30:16.220710 139204572510016 pyconfig.py:471] Config param num_decoder_layers: 1 I0424 12:30:16.220726 139204572510016 pyconfig.py:471] Config param num_diloco_replicas: 1 I0424 12:30:16.220741 139204572510016 pyconfig.py:471] Config param num_epoch: 1 I0424 12:30:16.220756 139204572510016 pyconfig.py:471] Config param num_eval_passes: 1 I0424 12:30:16.220772 139204572510016 pyconfig.py:471] Config param num_experts: 1 I0424 12:30:16.220786 139204572510016 pyconfig.py:471] Config param num_experts_per_tok: 1 I0424 12:30:16.220802 139204572510016 pyconfig.py:471] Config param num_generations: 2 I0424 12:30:16.220816 139204572510016 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0424 12:30:16.220832 139204572510016 pyconfig.py:471] Config param num_iterations: 1 I0424 12:30:16.220847 139204572510016 pyconfig.py:471] Config param num_kv_heads: 2 I0424 12:30:16.220862 139204572510016 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0424 12:30:16.220878 139204572510016 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0424 12:30:16.220892 139204572510016 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0424 12:30:16.220908 139204572510016 
pyconfig.py:471] Config param num_pipeline_repeats: -1 I0424 12:30:16.220922 139204572510016 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0424 12:30:16.220938 139204572510016 pyconfig.py:471] Config param num_query_heads: 2 I0424 12:30:16.220954 139204572510016 pyconfig.py:471] Config param num_samplers_slices: -1 I0424 12:30:16.220969 139204572510016 pyconfig.py:471] Config param num_slices: 1 I0424 12:30:16.220984 139204572510016 pyconfig.py:471] Config param num_target_devices: 32 I0424 12:30:16.220999 139204572510016 pyconfig.py:471] Config param num_test_batches: 5 I0424 12:30:16.221014 139204572510016 pyconfig.py:471] Config param num_trainer_slices: -1 I0424 12:30:16.221030 139204572510016 pyconfig.py:471] Config param num_vocab_tiling: 1 I0424 12:30:16.221046 139204572510016 pyconfig.py:471] Config param off_policy_steps: 0 I0424 12:30:16.221061 139204572510016 pyconfig.py:471] Config param offline_data_dir: None I0424 12:30:16.221076 139204572510016 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0424 12:30:16.221106 139204572510016 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0424 12:30:16.221123 139204572510016 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0424 12:30:16.221138 139204572510016 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0424 12:30:16.221153 139204572510016 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0424 12:30:16.221169 139204572510016 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0424 12:30:16.221184 139204572510016 pyconfig.py:471] Config param output_dim_for_audio: 512 I0424 12:30:16.221199 139204572510016 pyconfig.py:471] Config param override_logical_axis_rules: False I0424 12:30:16.221214 139204572510016 pyconfig.py:471] Config param override_model_config: True I0424 12:30:16.221229 139204572510016 pyconfig.py:471] Config param packing: True I0424 12:30:16.221245 139204572510016 pyconfig.py:471] 
Config param pagedattn_head_dim_alignment: 128 I0424 12:30:16.221259 139204572510016 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0424 12:30:16.221275 139204572510016 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0424 12:30:16.221289 139204572510016 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0424 12:30:16.221305 139204572510016 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0424 12:30:16.221319 139204572510016 pyconfig.py:471] Config param param_scan_axis: 1 I0424 12:30:16.221335 139204572510016 pyconfig.py:471] Config param parameter_memory_host_offload: False I0424 12:30:16.221350 139204572510016 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0424 12:30:16.221365 139204572510016 pyconfig.py:471] Config param patch_size_for_vit: 14 I0424 12:30:16.221379 139204572510016 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0424 12:30:16.221395 139204572510016 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0424 12:30:16.221412 139204572510016 pyconfig.py:471] Config param per_device_batch_size: 2 I0424 12:30:16.221426 139204572510016 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0424 12:30:16.221442 139204572510016 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0424 12:30:16.221458 139204572510016 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0424 12:30:16.221472 139204572510016 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0424 12:30:16.221488 139204572510016 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0424 12:30:16.221503 139204572510016 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0424 12:30:16.221519 139204572510016 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0424 12:30:16.221534 139204572510016 pyconfig.py:471] Config param posemb_type_for_vit: learn I0424 12:30:16.221556 139204572510016 pyconfig.py:471] Config param 
position_id_per_seconds: 25 I0424 12:30:16.221573 139204572510016 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0424 12:30:16.221587 139204572510016 pyconfig.py:471] Config param prefill_cache_dir: I0424 12:30:16.221603 139204572510016 pyconfig.py:471] Config param prefill_chunk_size: 256 I0424 12:30:16.221617 139204572510016 pyconfig.py:471] Config param prefill_slice: v5e-16 I0424 12:30:16.221633 139204572510016 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0424 12:30:16.221648 139204572510016 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0424 12:30:16.221664 139204572510016 pyconfig.py:471] Config param prefuse_moe_weights: False I0424 12:30:16.221681 139204572510016 pyconfig.py:471] Config param profile_cleanly: True I0424 12:30:16.221695 139204572510016 pyconfig.py:471] Config param profile_periodically_period: -1 I0424 12:30:16.221711 139204572510016 pyconfig.py:471] Config param profile_power_events: False I0424 12:30:16.221726 139204572510016 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0424 12:30:16.221743 139204572510016 pyconfig.py:471] Config param profiler_steps: 5 I0424 12:30:16.221759 139204572510016 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0424 12:30:16.221774 139204572510016 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0424 12:30:16.221789 139204572510016 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0424 12:30:16.221805 139204572510016 pyconfig.py:471] Config param prometheus_port: 0 I0424 12:30:16.221819 139204572510016 pyconfig.py:471] Config param prompt: I love to I0424 12:30:16.221835 139204572510016 pyconfig.py:471] Config param pure_nnx: False I0424 12:30:16.221849 139204572510016 pyconfig.py:471] Config param pure_nnx_decoder: False I0424 12:30:16.221865 139204572510016 pyconfig.py:471] Config param q_lora_rank: 0 I0424 12:30:16.221879 139204572510016 pyconfig.py:471] Config param qk_clip_threshold: 
100.0
I0424 12:30:16.221896 139204572510016 pyconfig.py:471] Config param qk_nope_head_dim: 128
I0424 12:30:16.221911 139204572510016 pyconfig.py:471] Config param qk_norm_with_scale: True
I0424 12:30:16.221927 139204572510016 pyconfig.py:471] Config param qk_rope_head_dim: 64
I0424 12:30:16.221941 139204572510016 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT
I0424 12:30:16.221957 139204572510016 pyconfig.py:471] Config param quant_cfg_path:
I0424 12:30:16.221972 139204572510016 pyconfig.py:471] Config param quantization: QuantizationType.NONE
I0424 12:30:16.221991 139204572510016 pyconfig.py:471] Config param quantization_local_shard_count: 4
I0424 12:30:16.222006 139204572510016 pyconfig.py:471] Config param quantize_kvcache: False
I0424 12:30:16.222021 139204572510016 pyconfig.py:471] Config param query_proj: RematLocation.REMAT
I0424 12:30:16.222036 139204572510016 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT
I0424 12:30:16.222053 139204572510016 pyconfig.py:471] Config param ragged_block_size: 256
I0424 12:30:16.222068 139204572510016 pyconfig.py:471] Config param ragged_buffer_factor: -1.0
I0424 12:30:16.222084 139204572510016 pyconfig.py:471] Config param rampup_end_step: 0
I0424 12:30:16.222110 139204572510016 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None
I0424 12:30:16.222126 139204572510016 pyconfig.py:471] Config param reasoning_end_token: </reasoning>
I0424 12:30:16.222143 139204572510016 pyconfig.py:471] Config param reasoning_start_token: <reasoning>
I0424 12:30:16.222160 139204572510016 pyconfig.py:471] Config param record_internal_nn_metrics: 0
I0424 12:30:16.222177 139204572510016 pyconfig.py:471] Config param remat_policy: full
I0424 12:30:16.222192 139204572510016 pyconfig.py:471] Config param remat_policy_for_vit: minimal
I0424 12:30:16.222206 139204572510016 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True
I0424 12:30:16.222224 139204572510016 pyconfig.py:471] Config param replicate_quant_scale: False
I0424 12:30:16.222239 139204572510016 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0424 12:30:16.222255 139204572510016 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0424 12:30:16.222270 139204572510016 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0424 12:30:16.222286 139204572510016 pyconfig.py:471] Config param reshape_q: False
I0424 12:30:16.222300 139204572510016 pyconfig.py:471] Config param return_log_prob: False
I0424 12:30:16.222316 139204572510016 pyconfig.py:471] Config param reuse_example_batch: 0
I0424 12:30:16.222330 139204572510016 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0424 12:30:16.222347 139204572510016 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0424 12:30:16.222362 139204572510016 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0424 12:30:16.222378 139204572510016 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0424 12:30:16.222393 139204572510016 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0424 12:30:16.222409 139204572510016 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0424 12:30:16.222424 139204572510016 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0424 12:30:16.222446 139204572510016 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0424 12:30:16.222461 139204572510016 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0424 12:30:16.222478 139204572510016 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0424 12:30:16.222493 139204572510016 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0424
12:30:16.222509 139204572510016 pyconfig.py:471] Config param rope_attention_scaling: False
I0424 12:30:16.222523 139204572510016 pyconfig.py:471] Config param rope_factor: 40
I0424 12:30:16.222544 139204572510016 pyconfig.py:471] Config param rope_interleave: True
I0424 12:30:16.222559 139204572510016 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0424 12:30:16.222575 139204572510016 pyconfig.py:471] Config param rope_max_timescale: 10000
I0424 12:30:16.222590 139204572510016 pyconfig.py:471] Config param rope_min_timescale: 1
I0424 12:30:16.222606 139204572510016 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0424 12:30:16.222621 139204572510016 pyconfig.py:471] Config param rope_truncate: True
I0424 12:30:16.222636 139204572510016 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0424 12:30:16.222654 139204572510016 pyconfig.py:471] Config param rope_use_scale: True
I0424 12:30:16.222670 139204572510016 pyconfig.py:471] Config param routed_bias: False
I0424 12:30:16.222686 139204572510016 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0424 12:30:16.222700 139204572510016 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0424 12:30:16.222716 139204572510016 pyconfig.py:471] Config param routed_score_func:
I0424 12:30:16.222732 139204572510016 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-24-12-30
I0424 12:30:16.222747 139204572510016 pyconfig.py:471] Config param sa_block_kv: 512
I0424 12:30:16.222763 139204572510016 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0424 12:30:16.222777 139204572510016 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0424 12:30:16.222793 139204572510016 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0424 12:30:16.222807 139204572510016 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0424 12:30:16.222823 139204572510016 pyconfig.py:471] Config param sa_block_q: 512
I0424 12:30:16.222838 139204572510016 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0424 12:30:16.222852 139204572510016 pyconfig.py:471] Config param sa_block_q_dq: 512
I0424 12:30:16.222868 139204572510016 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0424 12:30:16.222884 139204572510016 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0424 12:30:16.222898 139204572510016 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0424 12:30:16.222914 139204572510016 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0424 12:30:16.222929 139204572510016 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0424 12:30:16.222945 139204572510016 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0424 12:30:16.222959 139204572510016 pyconfig.py:471] Config param save_config_to_gcs: False
I0424 12:30:16.222975 139204572510016 pyconfig.py:471] Config param save_quantized_params_path:
I0424 12:30:16.222990 139204572510016 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0424 12:30:16.223006 139204572510016 pyconfig.py:471] Config param scan_layers: True
I0424 12:30:16.223020 139204572510016 pyconfig.py:471] Config param scan_layers_per_stage: False
I0424 12:30:16.223036 139204572510016 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0424 12:30:16.223050 139204572510016 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0424 12:30:16.223066 139204572510016 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0424 12:30:16.223081 139204572510016 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0424 12:30:16.223108 139204572510016 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0424 12:30:16.223125 139204572510016 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0424 12:30:16.223140 139204572510016 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0424 12:30:16.223157 139204572510016 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0424
12:30:16.223172 139204572510016 pyconfig.py:471] Config param sharding_strategy: None
I0424 12:30:16.223188 139204572510016 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0424 12:30:16.223205 139204572510016 pyconfig.py:471] Config param shardy: True
I0424 12:30:16.223219 139204572510016 pyconfig.py:471] Config param share_kv_projections: False
I0424 12:30:16.223235 139204572510016 pyconfig.py:471] Config param shared_experts: 0
I0424 12:30:16.223250 139204572510016 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0424 12:30:16.223266 139204572510016 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0424 12:30:16.223280 139204572510016 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0424 12:30:16.223296 139204572510016 pyconfig.py:471] Config param skip_step_interval: 128
I0424 12:30:16.223310 139204572510016 pyconfig.py:471] Config param skip_step_on_spikes: False
I0424 12:30:16.223326 139204572510016 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0424 12:30:16.223341 139204572510016 pyconfig.py:471] Config param sliding_window_size: 0
I0424 12:30:16.223357 139204572510016 pyconfig.py:471] Config param solution_end_token: </answer>
I0424 12:30:16.223371 139204572510016 pyconfig.py:471] Config param solution_start_token: <answer>
I0424 12:30:16.223387 139204572510016 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0424 12:30:16.223401 139204572510016 pyconfig.py:471] Config param sparse_matmul: True
I0424 12:30:16.223417 139204572510016 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0424 12:30:16.223433 139204572510016 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0424 12:30:16.223447 139204572510016 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0424 12:30:16.223463 139204572510016 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0424 12:30:16.223477 139204572510016 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0424 12:30:16.223493 139204572510016 pyconfig.py:471] Config param steps: 200000
I0424 12:30:16.223507 139204572510016 pyconfig.py:471] Config param stop_strings: None
I0424 12:30:16.223523 139204572510016 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0424 12:30:16.223544 139204572510016 pyconfig.py:471] Config param student_params_to_update: None
I0424 12:30:16.223560 139204572510016 pyconfig.py:471] Config param subslice_shape:
I0424 12:30:16.223574 139204572510016 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0424 12:30:16.223590 139204572510016 pyconfig.py:471] Config param system_prompt:
I0424 12:30:16.223604 139204572510016 pyconfig.py:471] Config param target_eval_loss: 0.0
I0424 12:30:16.223621 139204572510016 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0424 12:30:16.223639 139204572510016 pyconfig.py:471] Config param temperature_tuning: False
I0424 12:30:16.223653 139204572510016 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0424 12:30:16.223669 139204572510016 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-24-12-30/tensorboard/
I0424 12:30:16.223686 139204572510016 pyconfig.py:471] Config param tensors_on_device: None
I0424 12:30:16.223702 139204572510016 pyconfig.py:471] Config param tensors_to_offload: None
I0424 12:30:16.223716 139204572510016 pyconfig.py:471] Config param test_batch_start_index: 0
I0424 12:30:16.223732 139204572510016 pyconfig.py:471] Config param tile_size_for_vit: 336
I0424 12:30:16.223746 139204572510016 pyconfig.py:471] Config param tokenize_eval_data: True
I0424 12:30:16.223762 139204572510016 pyconfig.py:471] Config param tokenize_train_data: True
I0424 12:30:16.223777 139204572510016 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0424 12:30:16.223793 139204572510016 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0424 12:30:16.223811
139204572510016 pyconfig.py:471] Config param topk_routing_group: -1
I0424 12:30:16.223825 139204572510016 pyconfig.py:471] Config param train_data_columns: ['text']
I0424 12:30:16.223843 139204572510016 pyconfig.py:471] Config param train_fraction: 1.0
I0424 12:30:16.223857 139204572510016 pyconfig.py:471] Config param train_image_column: image
I0424 12:30:16.223873 139204572510016 pyconfig.py:471] Config param train_micro_batch_size: -1
I0424 12:30:16.223887 139204572510016 pyconfig.py:471] Config param train_split: train
I0424 12:30:16.223903 139204572510016 pyconfig.py:471] Config param trainable_parameters_mask: []
I0424 12:30:16.223918 139204572510016 pyconfig.py:471] Config param trainable_position_size: 2048
I0424 12:30:16.223934 139204572510016 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0424 12:30:16.223950 139204572510016 pyconfig.py:471] Config param upload_all_profiler_results: False
I0424 12:30:16.223963 139204572510016 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0424 12:30:16.223979 139204572510016 pyconfig.py:471] Config param use_agentic_rollout: False
I0424 12:30:16.223994 139204572510016 pyconfig.py:471] Config param use_audio: False
I0424 12:30:16.224009 139204572510016 pyconfig.py:471] Config param use_audio_in_video: False
I0424 12:30:16.224024 139204572510016 pyconfig.py:471] Config param use_batch_split_schedule: False
I0424 12:30:16.224038 139204572510016 pyconfig.py:471] Config param use_chat_template: False
I0424 12:30:16.224055 139204572510016 pyconfig.py:471] Config param use_chunked_prefill: False
I0424 12:30:16.224070 139204572510016 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0424 12:30:16.224086 139204572510016 pyconfig.py:471] Config param use_dpo: False
I0424 12:30:16.224108 139204572510016 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0424 12:30:16.224124 139204572510016 pyconfig.py:471] Config param use_grpo: True
I0424 12:30:16.224138 139204572510016 pyconfig.py:471] Config param use_indexer: False
I0424 12:30:16.224154 139204572510016 pyconfig.py:471] Config param use_iota_embed: True
I0424 12:30:16.224169 139204572510016 pyconfig.py:471] Config param use_jax_splash: False
I0424 12:30:16.224185 139204572510016 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0424 12:30:16.224200 139204572510016 pyconfig.py:471] Config param use_mrope: False
I0424 12:30:16.224215 139204572510016 pyconfig.py:471] Config param use_multimodal: False
I0424 12:30:16.224231 139204572510016 pyconfig.py:471] Config param use_pathways: True
I0424 12:30:16.224246 139204572510016 pyconfig.py:471] Config param use_post_attn_norm: False
I0424 12:30:16.224261 139204572510016 pyconfig.py:471] Config param use_post_ffw_norm: False
I0424 12:30:16.224275 139204572510016 pyconfig.py:471] Config param use_qk_clip: False
I0424 12:30:16.224291 139204572510016 pyconfig.py:471] Config param use_qk_norm: False
I0424 12:30:16.224305 139204572510016 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0424 12:30:16.224321 139204572510016 pyconfig.py:471] Config param use_qwix_quantization: False
I0424 12:30:16.224335 139204572510016 pyconfig.py:471] Config param use_ragged_attention: False
I0424 12:30:16.224351 139204572510016 pyconfig.py:471] Config param use_random_routing: False
I0424 12:30:16.224366 139204572510016 pyconfig.py:471] Config param use_replicator_service: False
I0424 12:30:16.224381 139204572510016 pyconfig.py:471] Config param use_ring_of_experts: False
I0424 12:30:16.224396 139204572510016 pyconfig.py:471] Config param use_sft: False
I0424 12:30:16.224412 139204572510016 pyconfig.py:471] Config param use_splash_scheduler: False
I0424 12:30:16.224426 139204572510016 pyconfig.py:471] Config param use_tokamax_gmm: False
I0424 12:30:16.224442 139204572510016 pyconfig.py:471] Config param use_tokamax_splash: False
I0424 12:30:16.224457 139204572510016 pyconfig.py:471] Config param use_truncation: True
I0424 12:30:16.224472 139204572510016
pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0424 12:30:16.224486 139204572510016 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0424 12:30:16.224502 139204572510016 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0424 12:30:16.224517 139204572510016 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0424 12:30:16.224532 139204572510016 pyconfig.py:471] Config param v_head_dim: 128
I0424 12:30:16.224552 139204572510016 pyconfig.py:471] Config param v_norm_with_scale: True
I0424 12:30:16.224567 139204572510016 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0424 12:30:16.224582 139204572510016 pyconfig.py:471] Config param vertex_tensorboard_project:
I0424 12:30:16.224598 139204572510016 pyconfig.py:471] Config param vertex_tensorboard_region:
I0424 12:30:16.224614 139204572510016 pyconfig.py:471] Config param video_path:
I0424 12:30:16.224628 139204572510016 pyconfig.py:471] Config param video_placeholder: <|video|>
I0424 12:30:16.224644 139204572510016 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0424 12:30:16.224658 139204572510016 pyconfig.py:471] Config param vision_output_length: -1
I0424 12:30:16.224674 139204572510016 pyconfig.py:471] Config param vllm_additional_config: {}
I0424 12:30:16.224690 139204572510016 pyconfig.py:471] Config param vllm_hf_config_path:
I0424 12:30:16.224705 139204572510016 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0424 12:30:16.224721 139204572510016 pyconfig.py:471] Config param vocab_size: 32000
I0424 12:30:16.224735 139204572510016 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0424 12:30:16.224752 139204572510016 pyconfig.py:471] Config param weight_dtype: float32
I0424 12:30:16.224777 139204572510016 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0424 12:30:16.224792 139204572510016 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0424 12:30:16.224807 139204572510016 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0424 12:30:16.224822 139204572510016 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0424 12:30:16.224838 139204572510016 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0424 12:30:16.224854 139204572510016 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0424 12:30:16.224868 139204572510016 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0424 12:30:16.224884 139204572510016 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0424 12:30:16.224898 139204572510016 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0424 12:30:16.224914 139204572510016 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0424 12:30:16.224929 139204572510016 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0424 12:30:16.224943 139204572510016 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0424 12:30:16.224958 139204572510016 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0424 12:30:16.224974 139204572510016 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0424 12:30:16.224988 139204572510016 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0424 12:30:16.225003 139204572510016 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0424 12:30:16.225018 139204572510016 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0424 12:30:16.225034 139204572510016 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0424 12:30:16.225048 139204572510016 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0424 12:30:16.225064 139204572510016 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0424 12:30:16.225078 139204572510016 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0424 12:30:16.225106 139204572510016 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0424 12:30:16.225122 139204572510016 pyconfig.py:471] Config param
xprof_e2e_enable_fw_thermal_event: False
I0424 12:30:16.225137 139204572510016 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0424 12:30:16.225152 139204572510016 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0424 12:30:16.225171 139204572510016 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0424 12:30:16.225684 139204572510016 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0424 12:30:16.225727 139204572510016 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0424 12:30:16.424091 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 12:30:16.536410 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 12:30:16.643306 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 12:30:16.753740 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 12:30:16.866543 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 12:30:16.973860 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0424 12:30:17.088721 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0424 12:30:17.205578 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0424 12:30:17.847542 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 12:30:17.955623 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0424 12:30:18.242249 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0424 12:30:18.369615 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 12:30:18.482303 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0424 12:30:18.602080 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0424 12:30:18.694280 139204572510016 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0424 12:30:18.701184 139204572510016 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0424 12:30:18.701333 139204572510016 train_distill.py:594] Applying logical axis rules for model initialization and training...
I0424 12:30:18.701407 139204572510016 train_distill.py:598] Loading Student from ...
I0424 12:30:18.701436 139204572510016 train_distill.py:170] --- Student Configuration ---
I0424 12:30:18.701456 139204572510016 train_distill.py:171] Model Name: gpt3-52k
I0424 12:30:18.701478 139204572510016 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0424 12:30:18.701497 139204572510016 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0424 12:30:18.701516 139204572510016 train_distill.py:176] Vocab Size: 32000
I0424 12:30:18.701534 139204572510016 train_distill.py:177] Checkpoint:
I0424 12:30:18.701553 139204572510016 train_distill.py:463] Initializing model: gpt3-52k...
I0424 12:30:20.357028 139204572510016 train_distill.py:612] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0424 12:30:20.357148 139204572510016 train_distill.py:170] --- Teacher Configuration ---
I0424 12:30:20.357178 139204572510016 train_distill.py:171] Model Name: gpt3-52k
I0424 12:30:20.357204 139204572510016 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0424 12:30:20.357225 139204572510016 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0424 12:30:20.357245 139204572510016 train_distill.py:176] Vocab Size: 32000
I0424 12:30:20.357262 139204572510016 train_distill.py:177] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0424 12:30:20.357281 139204572510016 train_distill.py:463] Initializing model: gpt3-52k...
I0424 12:30:21.424192 139204572510016 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0424 12:30:21.424350 139204572510016 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e9a5efbc410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0424 12:30:21.424408 139204572510016 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0424 12:30:21.975232 139204572510016 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0424 12:30:22.527007 1970 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0424 12:30:23.540428 139204572510016 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0424 12:30:25.558202 139204572510016 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0424 12:30:25.558612 139204572510016 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0424 12:30:28.415608 139204572510016 checkpointer.py:318] Finished restoring checkpoint in 5.25 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0424 12:30:29.149979 139204572510016 train_distill.py:638] Initializing Data Iterators via MaxText pipeline...
I0424 12:30:29.212905 139204572510016 config.py:112] TensorFlow version 2.20.0 available.
I0424 12:30:29.213404 139204572510016 config.py:125] JAX version 0.9.2 available.
I0424 12:30:29.607367 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0424 12:30:29.615870 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 12:30:29.623713 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0424 12:30:29.732566 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 12:30:30.053443 139204572510016 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0424 12:30:30.167991 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0424 12:30:30.276301 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0424 12:30:30.603306 139204572510016 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0424 12:30:30.709961 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0424 12:30:30.822354 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0424 12:30:30.964246 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0424 12:30:31.130358 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0424 12:30:31.234907 139204572510016 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0424 12:30:31.341013 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0424 12:30:31.449221 139204572510016 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false
"HTTP/1.1 200 OK" E0424 12:30:31.542356 139204572510016 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0424 12:30:31.542570 139204572510016 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. I0424 12:30:31.545611 139204572510016 train_distill.py:408] Input Pipeline Checkpointing: DISABLED I0424 12:30:31.545671 139204572510016 train_distill.py:412] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0424 12:30:31.545735 139204572510016 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 12:30:31.545813 139204572510016 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e9a5efbc410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 12:30:31.545855 139204572510016 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 12:30:31.545884 139204572510016 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e9a5efbc410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 12:30:31.545925 139204572510016 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e9489966c00>, 
'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e948966d9d0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e570b5f0>}, handler_registry=None I0424 12:30:31.546125 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e9489966c00>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 12:30:31.546166 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e948966d9d0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0424 12:30:31.546193 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e570b5f0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0424 12:30:31.546217 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e948923e750>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0424 12:30:31.546243 139204572510016 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e9489966c00>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e9489966c00>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e948966d9d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e948966d9d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e570b5f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e570b5f0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e948923e750>, ('metrics', <class 
'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e948923e750>}). I0424 12:30:31.546625 139204572510016 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e818c108220> timeout: 600 secs and primary_host=0 for async checkpoint writes I0424 12:30:33.084211 139204572510016 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints I0424 12:30:33.106153 139204572510016 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), 
root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e82e570b5c0> I0424 12:30:33.106279 139204572510016 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 12:30:33.106352 139204572510016 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e9a5efbc410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 12:30:33.106404 139204572510016 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0424 12:30:33.106450 139204572510016 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e9a5efbc410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0424 12:30:33.106501 139204572510016 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0424 12:30:33.106573 139204572510016 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139204572510016 count=1 at 0x7e81a4340c80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e948923dc70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e82e570b3b0>, _write_futures=[])
I0424 12:30:33.106963 139204572510016 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139204572510016 count=1 at 0x7e81a4340c80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e948923dc70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e82e570b3b0>, _write_futures=[])
I0424 12:30:33.106993 139204572510016 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=139204572510016 count=1 at 0x7e81a4340c80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e948923dc70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e82e570b3b0>, _write_futures=[])
I0424 12:30:33.107036 139204572510016 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e570b590>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e568e1b0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2ed0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e82e44a3f50>}, handler_registry=None
I0424 12:30:33.107155 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e570b590>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:30:33.107191 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e568e1b0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0424 12:30:33.107214 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2ed0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:30:33.107247 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e82e44a3f50>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0424 12:30:33.107275 139204572510016 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2a20>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0424 12:30:33.107297 139204572510016 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e570b590>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e570b590>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e568e1b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e82e568e1b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2ed0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2ed0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e82e44a3f50>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e82e44a3f50>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2a20>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e82e44a2a20>}).
I0424 12:30:33.107370 139204572510016 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e818c108360> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0424 12:30:33.486491 139204572510016 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints
I0424 12:30:33.499366 139204572510016 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260424_120707/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260424_120707_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e81a41cf830>
I0424 12:30:33.499790 139204572510016 train_distill.py:689] Starting Distillation Training...
I0424 12:30:33.499897 139204572510016 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0424 12:30:33.968382 139204572510016 peft_trainer.py:594] Compiled train_step cache size: 0
I0424 12:30:33.970036 139049683638016 grain_pool.py:367] Grain pool will use 1 processes.
I0424 12:30:34.025596 139049683638016 grain_pool.py:440] Grain pool will start child processes.
I0424 12:30:34.031444 139049683638016 grain_pool.py:448] Grain pool started all child processes.
2026-04-24 12:30:40.535454: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 793, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 789, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 691, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding.
Got value with devices {TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, 
coords=(1,2,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0)}
I0424 12:30:44.698386 139049683638016 grain_pool.py:542] Grain pool is exiting.
I0424 12:30:44.698487 139049683638016 grain_pool.py:547] Shutting down multiprocessing system.
I0424 12:30:46.410725 139049683638016 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Fri Apr 24 12:30:56 UTC 2026
EXIT_CODE=1
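Note on the fatal ValueError above: the run dies in `tunix.sft.sharding_utils.shard_input`, where `jax.make_array_from_process_local_data` ends up calling `jax.device_put` with a concrete Device target while the target sharding spans all 32 TPU devices across 8 processes. A minimal single-process sketch of the underlying constraint (illustrative only; this is not the MaxText/tunix code, and on one host every array is fully addressable so both spellings succeed):

```python
# Sketch of the `jax.device_put` constraint behind the ValueError above.
# A concrete Device target is only valid when the input is fully
# addressable from the current process; a Sharding target is the general,
# multi-host-safe spelling.
import jax
import jax.numpy as jnp
from jax.sharding import SingleDeviceSharding

x = jnp.arange(8)
print(x.is_fully_addressable)  # single-process arrays are always fully addressable

dev = jax.devices()[0]
a = jax.device_put(x, dev)                        # Device target: valid here
b = jax.device_put(x, SingleDeviceSharding(dev))  # Sharding target: always valid
print(bool((a == b).all()))
```

In a multi-host run, each process holds only its local shard of the global batch, so the Device-target path raises the error seen in the traceback; the fix belongs on the library side (placing data via a Sharding rather than per-device `device_put`).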