XPK Start: Wed Apr 22 12:54:27 UTC 2026
2026-04-22 12:54:44.226306: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
I0422 12:54:48.415966 136017910646592 max_utils.py:273] Attempting to initialize the jax distributed system...
I0422 12:54:57.455343 136017910646592 distributed.py:149] Starting JAX distributed service on [::]:8482
I0422 12:54:57.457572 136017910646592 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-4f4l1-slice-job-0-0.mt-07-distill-smoke-4f4l1:8482
I0422 12:54:58.188399 136017910646592 max_utils.py:284] Jax distributed system initialized!
I0422 12:55:05.294265 136017910646592 max_utils.py:244] Jax distributed system is already initialized.
W0422 12:55:05.425008 136017910646592 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0422 12:55:05.485073 136017910646592 max_utils.py:244] Jax distributed system is already initialized.
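The `rope_scaling` warnings above come from config validation: for 'rope_type'='yarn' the checker rejects unknown keys such as `rope_theta` and expects `factor`, `beta_fast`, and `beta_slow` to be Python floats (the logged values 40, 32, and 1 are ints, hence the warnings). A minimal sketch of that style of check — a hypothetical helper reproducing the logged messages, not the actual validator code:

```python
def validate_yarn_rope_scaling(rope_scaling):
    """Return warning strings for a yarn-style rope_scaling dict.

    Illustrative re-implementation of the checks behind the warnings
    above; the allowed-key set is an assumption.
    """
    allowed = {"rope_type", "factor", "beta_fast", "beta_slow",
               "original_max_position_embeddings", "attention_factor"}
    warnings = []
    unknown = set(rope_scaling) - allowed
    if unknown:
        warnings.append(
            f"Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {unknown}")
    # `factor` must be a true float >= 1; an int like 40 fails the check.
    factor = rope_scaling.get("factor")
    if not isinstance(factor, float) or factor < 1.0:
        warnings.append(
            f"`rope_scaling`'s factor field must be a float >= 1, got {factor}")
    # beta_fast / beta_slow must likewise be floats, not ints.
    for field in ("beta_fast", "beta_slow"):
        value = rope_scaling.get(field)
        if not isinstance(value, float):
            warnings.append(
                f"`rope_scaling`'s {field} field must be a float, got {value}")
    return warnings
```

Passing the values seen in the log (`factor=40`, `beta_fast=32`, `beta_slow=1`, plus a stray `rope_theta`) reproduces all four warnings; converting the numbers to floats and dropping `rope_theta` silences them.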
I0422 12:55:05.486241 136017910646592 pyconfig.py:471] Config param abort_on_inf_loss: True
abort_on_nan_loss: True
act_quantization_calibration_method: absmax
activation_dropout_for_audio: 0.0
activation_function_for_audio: gelu
activations_in_float32: False
adam_b1: 0.9
adam_b2: 0.95
adam_eps: 1e-08
adam_eps_root: 0.0
adam_weight_decay: 0.1
adamw_mask: []
add_bos: True
add_eos: True
allow_split_physical_axes: False
ar_cache_axis_order: 1,2,0,3
async_checkpointing: True
async_scheduling: False
attention: dot_product
attention_bias: False
attention_dropout_for_audio: 0.0
attention_out: RematLocation.REMAT
attention_output_dim: -1
attention_sink: False
attention_type: global
attn_logits_soft_cap: None
audio_path:
audio_placeholder: <|audio|>
autoregressive_decode_assert:
base_config: base.yml
base_emb_dim: 16
base_mlp_dim: 64
base_moe_mlp_dim: -1
base_num_decoder_layers: 1
base_num_kv_heads: 2
base_num_query_heads: 2
base_output_directory: /deps/maxtext_output
batch_size: 1
batch_split_factor: 1
beta_fast: 32
beta_slow: 1
bwd_quantization_calibration_method: absmax
capacity_factor: -1.0
cast_logits_to_fp32: True
chat_template:
chat_template_path:
checkpoint_conversion_fn: None
checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/checkpoints/
checkpoint_is_quantized: False
checkpoint_period: 2000
checkpoint_storage_concurrent_gb: 96
checkpoint_storage_target_data_file_size_bytes: 2147483648
checkpoint_storage_use_ocdbt: True
checkpoint_storage_use_zarr3: True
checkpoint_todelete_full_path: None
checkpoint_todelete_subdir: None
chips_per_vm: 4
chunk_attn_window_size: 0
collect_stack_trace: False
colocated_python_checkpointing: False
colocated_python_data_input: False
compile_topology:
compile_topology_num_slices: -1
compile_xla_flags:
compiled_trainstep_file:
compute_axis_order: 0,1,2,3
constant_bound_config: []
context: RematLocation.REMAT
context_parallel_load_balance: True
context_parallel_reorder_strategy: ReorderStrategy.AUTO
context_parallel_size: 1
context_parallel_strategy: all_gather
context_sharding: context
conv_chunksize_for_audio: 500
conv_stride_for_vit: 14
convert_checkpoint_if_possible: False
cost_estimate_flops_bwd: -1
cost_estimate_flops_fwd: -1
custom_mesh:
custom_mesh_and_rule:
d_model_for_audio: 256
data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
data_shuffle_seed: 0
dataset_name: c4/en:3.0.1
dataset_path:
dataset_type: DatasetType.HF
dcn_autoregressive_parallelism: 1
dcn_context_autoregressive_parallelism: 1
dcn_context_parallelism: 1
dcn_data_parallelism: -1
dcn_diloco_parallelism: 1
dcn_expert_parallelism: 1
dcn_fsdp_parallelism: 1
dcn_fsdp_transpose_parallelism: 1
dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
dcn_pipeline_parallelism: 1
dcn_sequence_parallelism: 1
dcn_tensor_parallelism: 1
dcn_tensor_sequence_parallelism: 1
dcn_tensor_transpose_parallelism: 1
debug: {'rl': False}
debug_sharding: False
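In the DCN parallelism grid above, `dcn_data_parallelism: -1` (the `-1` inside `dcn_parallelism`) acts as a wildcard: one axis may be left unspecified and is filled with whatever degree remains after the fixed axes are accounted for, so that the product of all axes equals the available slice count. A sketch of that resolution, under that assumption (illustrative, not the actual mesh-building routine):

```python
import math

def fill_unspecified_parallelism(grid, total):
    """Replace a single -1 entry so the grid's product equals `total`.

    Assumes at most one -1; raises if the fixed axes don't divide `total`.
    """
    if grid.count(-1) > 1:
        raise ValueError("at most one parallelism axis may be -1")
    fixed = math.prod(d for d in grid if d != -1)
    if -1 not in grid:
        if fixed != total:
            raise ValueError(f"grid product {fixed} != {total} devices")
        return list(grid)
    remainder, leftover = divmod(total, fixed)
    if leftover:
        raise ValueError(f"{total} not divisible by fixed product {fixed}")
    return [remainder if d == -1 else d for d in grid]
```

For example, with two slices the grid `[1, -1, 1, ...]` resolves to data parallelism 2; the ICI grid further down resolves its `-1` (the fsdp axis) against the per-slice chip count the same way.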
decode_sampling_nucleus_p: -1
decode_sampling_strategy: SamplingStrategy.GREEDY
decode_sampling_temperature: 1.0
decode_sampling_top_k: 0
decoder_block: DecoderBlockType.GPT3
decoder_layer_input: RematLocation.DEVICE
deepstack_visual_indexes_for_vit: []
degenerate_group_masking: True
dense_init_scale: 1.0
diloco_outer_lr: 0.3
diloco_outer_momentum: 0.9
diloco_sync_period: 36
distill_alpha: 0.5
distill_alpha_end: None
distill_alpha_schedule: constant
distill_beta: 0.0
distill_beta_end: None
distill_beta_schedule: constant
distill_feature_loss_type: cosine
distill_layer_indices: None
distill_temperature: 1.0
distill_temperature_end: None
distill_temperature_schedule: constant
downsample_hidden_size_for_audio: 256
dpo_beta: 0.1
dpo_label_smoothing: 0.0
dq_reduction_steps: 0
dropout_rate: 0.0
dtype: bfloat16
dtype_mm: float32
dump_hlo: False
dump_hlo_delete_local_after: True
dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/xla_dump
dump_hlo_local_dir: /tmp/xla_dump/
dump_hlo_local_module_name: jit_train_step
dump_hlo_module_name: jit_train_step
dump_hlo_upload_all: False
dump_hlo_xla_flags:
dump_jaxpr: False
dump_jaxpr_delete_local_after: True
dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/jaxpr_dump
dump_jaxpr_local_dir: /tmp/jaxpr_dump/
dump_step: -1
elastic_enabled: False
elastic_max_retries: 10
elastic_timeout_seconds: 300
emb_dim: 16
enable_autocheckpoint: False
enable_checkpoint_cloud_logger: False
enable_checkpointing: True
enable_continuous_checkpointing: False
enable_data_shuffling: True
enable_diloco: False
enable_dp_attention: False
enable_dropout: False
enable_emergency_checkpoint: False
enable_expert_parallel: False
enable_gcp_goodput_metrics: True
enable_gcp_step_deviation_metrics: True
enable_goodput_recording: False
enable_jax_profiler: False
enable_llm_inference_pool: False
enable_model_warmup: False
enable_multi_tier_checkpointing: False
enable_nnx: False
enable_orbax_v1: False
enable_padding_causal_mask: True
enable_pathways_goodput: False
enable_prefix_caching: False
enable_rampup_batch_size: False
enable_single_controller: False
enable_single_replica_ckpt_restoring: False
enable_tensorboard: True
enable_tunix_perf_metrics: False
encoder_attention_heads_for_audio: 4
encoder_ffn_dim_for_audio: 512
encoder_layers_for_audio: 2
engram: RematLocation.REMAT
engram_head_dim: 1280
engram_kernel_size: 4
engram_layers: []
engram_max_ngram_size: 3
engram_num_heads: 8
engram_seed: 0
engram_vocab_bases: []
epsilon_high: None
eval_corr_lst: False
eval_data_columns: ['text']
eval_dataset_name: c4/en:3.0.1
eval_image_column: image
eval_interval: -1
eval_make_lst: False
eval_per_device_batch_size: 2
eval_sampling_strategy: greedy
eval_split: validation
eval_steps: -1
expansion_factor_real_data: -1.0
final_logits_soft_cap: None
first_num_dense_layers: 0
float32_gate_logits: False
float32_logits: False
float32_qk_product: False
float32_weight_sum: True
force_q_layout: False
force_unroll: False
formatting_func_kwargs: {}
formatting_func_path:
freeze_audio_encoder_params: True
freeze_vision_encoder_params: True
fused_mlp: False
fused_qkv: True
gcs_metrics: False
gdn_chunk_size: 64
gdn_conv_kernel_dim: 4
gdn_key_head_dim: 128
gdn_num_key_heads: 16
gdn_num_value_heads: 32
gdn_value_head_dim: 128
generate_padding_batch_eval: False
generate_padding_batch_train: False
generate_slice: v5e-16
generation_configs: {}
global_batch_size_to_eval_on: 64
global_batch_size_to_load: 512
global_batch_size_to_load_eval: 64
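The batch-size values logged here follow simple per-device bookkeeping: `global_batch_size_to_eval_on: 64` equals `eval_per_device_batch_size: 2` times a device count of 32 (the device count is an inference from these two numbers, not itself logged in this chunk). A minimal sketch of that arithmetic:

```python
def global_batch_size(per_device_batch_size, num_devices):
    # Global batch is the per-device batch replicated across all devices.
    return per_device_batch_size * num_devices

def per_device_batch_size(global_batch, num_devices):
    # Inverse direction; the global batch must divide evenly across devices.
    quotient, remainder = divmod(global_batch, num_devices)
    if remainder:
        raise ValueError(f"{global_batch} not divisible across {num_devices} devices")
    return quotient
```

Under the same 32-device assumption, `global_batch_size_to_train_on: 512` would correspond to 16 examples per device per optimizer step.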
global_batch_size_to_load_increment: None
global_batch_size_to_load_start: None
global_batch_size_to_train_on: 512
global_head_dim: 0
global_num_kv_heads: 0
global_parameter_scale: 1
global_rampup_samples: 500
global_rope_max_timescale: -1
global_rope_proportion: 0.25
goodput_upload_interval_seconds: 30
grad_dtype: float32
gradient_accumulation_steps: 8
gradient_clipping_threshold: 1.0
grain_data_source_max_workers: 16
grain_eval_files:
grain_file_type: arrayrecord
grain_num_threads: 16
grain_num_threads_eval: 16
grain_packing_type: first_fit
grain_per_worker_buffer_size: 1
grain_per_worker_buffer_size_eval: 1
grain_prefetch_buffer_size: 500
grain_prefetch_buffer_size_eval: 500
grain_ram_budget_mb: 1024
grain_shuffle_buffer_size: 100
grain_train_files:
grain_train_mixture_config_path:
grain_worker_count: 1
grain_worker_count_eval: 1
grpo_beta: 0.08
grpo_epsilon: 0.2
hardware: tpu
hbm_utilization_vllm: 0.72
head_dim: 8
heartbeat_reporting_interval_in_seconds: 5
hf_data_dir: None
hf_eval_files: None
hf_eval_split: None
hf_name: None
hf_path: OptimalScale/ClimbMix
hf_train_files: None
hidden_size_for_vit: 1408
hide_profiler_step_metric: False
ici_autoregressive_parallelism: 1
ici_context_autoregressive_parallelism: 1
ici_context_parallelism: 1
ici_data_parallelism: 1
ici_diloco_parallelism: 1
ici_expert_parallelism: 1
ici_fsdp_parallelism: -1
ici_fsdp_transpose_parallelism: 1
ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
ici_pipeline_parallelism: 1
ici_sequence_parallelism: 1
ici_tensor_parallelism: 1
ici_tensor_sequence_parallelism: 1
ici_tensor_transpose_parallelism: 1
image_path:
image_placeholder: <|image|>
image_size_for_vit: 896
indexer_head_dim: 128
indexer_loss_scaling_factor: 0.0
indexer_n_heads: 64
indexer_sparse_training: False
indexer_topk: 2048
inference_benchmark_test: False
inference_metadata_file:
inference_microbenchmark_log_file_path:
inference_microbenchmark_loop_iters: 10
inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
inference_microbenchmark_stages: prefill,generate
inference_server: MaxtextInterleavedServer
inhomogeneous_layer_cycle_interval: 1
init_weights_seed: 0
input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
interleave_moe_layer_step: 1
intermediate_size_for_vit: 5632
internal_compile: False
internal_compile_num_devices: -1
jax_cache_dir: ~/jax_cache
jax_debug_log_modules:
jax_distributed_initialization_timeout: 300
jax_profiler_port: 9999
key_proj: RematLocation.REMAT
kv_cache_buffer: 256
kv_lora_rank: 512
kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
kv_quant_dtype: int8
kv_wa_proj: RematLocation.REMAT
learning_rate: 0.0002
learning_rate_final_fraction: 0.1
learning_rate_schedule_steps: 200000
load_balance_loss_weight: 0.0
load_checkpoint_only_once: False
load_from_prefill_dir: False
load_full_state_path:
load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
local_checkpoint_directory:
local_checkpoint_period: 0
local_rope_max_timescale: -1
local_rope_proportion: 1.0
log_config: True
log_period: 10
logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 
'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0422 12:55:05.492072 136017910646592 pyconfig.py:471] Config param logits_dot_in_fp32: False I0422 12:55:05.492089 136017910646592 pyconfig.py:471] Config param logits_via_embedding: True I0422 12:55:05.492105 136017910646592 pyconfig.py:471] Config param lora_input_adapters_path: I0422 12:55:05.492121 136017910646592 pyconfig.py:471] Config param loss_algo: grpo I0422 12:55:05.492136 136017910646592 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0422 12:55:05.492156 136017910646592 pyconfig.py:471] Config param managed_mldiagnostics: False I0422 12:55:05.492171 136017910646592 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/managed-mldiagnostics I0422 12:55:05.492186 136017910646592 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0422 12:55:05.492201 136017910646592 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0422 
12:55:05.492219 136017910646592 pyconfig.py:471] Config param max_checkify: False I0422 12:55:05.492238 136017910646592 pyconfig.py:471] Config param max_concurrency: 256 I0422 12:55:05.492254 136017910646592 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0422 12:55:05.492269 136017910646592 pyconfig.py:471] Config param max_num_batched_tokens: None I0422 12:55:05.492284 136017910646592 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0422 12:55:05.492300 136017910646592 pyconfig.py:471] Config param max_num_images_per_example: -1 I0422 12:55:05.492314 136017910646592 pyconfig.py:471] Config param max_num_seqs: None I0422 12:55:05.492330 136017910646592 pyconfig.py:471] Config param max_position_embeddings: 163840 I0422 12:55:05.492344 136017910646592 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0422 12:55:05.492360 136017910646592 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0422 12:55:05.492374 136017910646592 pyconfig.py:471] Config param max_segments_per_seq: -1 I0422 12:55:05.492390 136017910646592 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0422 12:55:05.492404 136017910646592 pyconfig.py:471] Config param max_target_length: 2048 I0422 12:55:05.492420 136017910646592 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0422 12:55:05.492435 136017910646592 pyconfig.py:471] Config param megablox: True I0422 12:55:05.492450 136017910646592 pyconfig.py:471] Config param merge_gating_gmm: False I0422 12:55:05.492466 136017910646592 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0422 12:55:05.492485 136017910646592 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/metrics/ I0422 12:55:05.492501 136017910646592 pyconfig.py:471] Config param metrics_file: I0422 
12:55:05.492516 136017910646592 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0422 12:55:05.492531 136017910646592 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0422 12:55:05.492546 136017910646592 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0422 12:55:05.492561 136017910646592 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0422 12:55:05.492577 136017910646592 pyconfig.py:471] Config param mla_naive_kvcache: True I0422 12:55:05.492592 136017910646592 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0422 12:55:05.492608 136017910646592 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0422 12:55:05.492624 136017910646592 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0422 12:55:05.492639 136017910646592 pyconfig.py:471] Config param mlp_bias: False I0422 12:55:05.492666 136017910646592 pyconfig.py:471] Config param mlp_dim: 64 I0422 12:55:05.492682 136017910646592 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0422 12:55:05.492697 136017910646592 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0422 12:55:05.492712 136017910646592 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0422 12:55:05.492727 136017910646592 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0422 12:55:05.492743 136017910646592 pyconfig.py:471] Config param moba: False I0422 12:55:05.492758 136017910646592 pyconfig.py:471] Config param moba_chunk_size: 1024 I0422 12:55:05.492774 136017910646592 pyconfig.py:471] Config param moba_topk: 8 I0422 12:55:05.492789 136017910646592 pyconfig.py:471] Config param model_call_mode: I0422 12:55:05.492803 136017910646592 pyconfig.py:471] Config param model_name: gpt3-52k I0422 12:55:05.492819 136017910646592 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0422 12:55:05.492834 136017910646592 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0422 12:55:05.492850 136017910646592 pyconfig.py:471] Config 
param moe_mlp_dim: -1 I0422 12:55:05.492865 136017910646592 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0422 12:55:05.492881 136017910646592 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0422 12:55:05.492896 136017910646592 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0422 12:55:05.492911 136017910646592 pyconfig.py:471] Config param monitor_goodput: False I0422 12:55:05.492925 136017910646592 pyconfig.py:471] Config param monitor_step_time_deviation: True I0422 12:55:05.492941 136017910646592 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0422 12:55:05.492957 136017910646592 pyconfig.py:471] Config param mscale: 1.0 I0422 12:55:05.492972 136017910646592 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0422 12:55:05.492987 136017910646592 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0422 12:55:05.493002 136017910646592 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0422 12:55:05.493019 136017910646592 pyconfig.py:471] Config param mtp_num_layers: 0 I0422 12:55:05.493034 136017910646592 pyconfig.py:471] Config param mu_dtype: float32 I0422 12:55:05.493058 136017910646592 pyconfig.py:471] Config param multi_sampling: False I0422 12:55:05.493074 136017910646592 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0422 12:55:05.493090 136017910646592 pyconfig.py:471] Config param muon_beta: 0.95 I0422 12:55:05.493106 136017910646592 pyconfig.py:471] Config param muon_consistent_rms: None I0422 12:55:05.493121 136017910646592 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0422 12:55:05.493136 136017910646592 pyconfig.py:471] Config param n_routing_groups: -1 I0422 12:55:05.493151 136017910646592 pyconfig.py:471] Config param n_window_for_audio: 50 I0422 12:55:05.493167 136017910646592 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0422 12:55:05.493182 136017910646592 pyconfig.py:471] Config param nope_layer_interval: -1 
I0422 12:55:05.493198 136017910646592 pyconfig.py:471] Config param norm_topk_prob: False I0422 12:55:05.493213 136017910646592 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0422 12:55:05.493238 136017910646592 pyconfig.py:471] Config param normalize_embedding_logits: False I0422 12:55:05.493253 136017910646592 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0422 12:55:05.493269 136017910646592 pyconfig.py:471] Config param num_batches: 4 I0422 12:55:05.493284 136017910646592 pyconfig.py:471] Config param num_channels_for_vit: 3 I0422 12:55:05.493299 136017910646592 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0422 12:55:05.493314 136017910646592 pyconfig.py:471] Config param num_decoder_layers: 1 I0422 12:55:05.493329 136017910646592 pyconfig.py:471] Config param num_diloco_replicas: 1 I0422 12:55:05.493344 136017910646592 pyconfig.py:471] Config param num_epoch: 1 I0422 12:55:05.493360 136017910646592 pyconfig.py:471] Config param num_eval_passes: 1 I0422 12:55:05.493374 136017910646592 pyconfig.py:471] Config param num_experts: 1 I0422 12:55:05.493390 136017910646592 pyconfig.py:471] Config param num_experts_per_tok: 1 I0422 12:55:05.493405 136017910646592 pyconfig.py:471] Config param num_generations: 2 I0422 12:55:05.493420 136017910646592 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0422 12:55:05.493435 136017910646592 pyconfig.py:471] Config param num_iterations: 1 I0422 12:55:05.493450 136017910646592 pyconfig.py:471] Config param num_kv_heads: 2 I0422 12:55:05.493465 136017910646592 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0422 12:55:05.493481 136017910646592 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0422 12:55:05.493496 136017910646592 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0422 12:55:05.493512 136017910646592 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0422 12:55:05.493527 136017910646592 pyconfig.py:471] Config 
param num_position_embeddings_for_vit: 1024 I0422 12:55:05.493543 136017910646592 pyconfig.py:471] Config param num_query_heads: 2 I0422 12:55:05.493558 136017910646592 pyconfig.py:471] Config param num_samplers_slices: -1 I0422 12:55:05.493574 136017910646592 pyconfig.py:471] Config param num_slices: 1 I0422 12:55:05.493588 136017910646592 pyconfig.py:471] Config param num_target_devices: 32 I0422 12:55:05.493604 136017910646592 pyconfig.py:471] Config param num_test_batches: 5 I0422 12:55:05.493620 136017910646592 pyconfig.py:471] Config param num_trainer_slices: -1 I0422 12:55:05.493636 136017910646592 pyconfig.py:471] Config param num_vocab_tiling: 1 I0422 12:55:05.493659 136017910646592 pyconfig.py:471] Config param off_policy_steps: 0 I0422 12:55:05.493675 136017910646592 pyconfig.py:471] Config param offline_data_dir: None I0422 12:55:05.493690 136017910646592 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0422 12:55:05.493707 136017910646592 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0422 12:55:05.493723 136017910646592 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0422 12:55:05.493738 136017910646592 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0422 12:55:05.493753 136017910646592 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0422 12:55:05.493767 136017910646592 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0422 12:55:05.493783 136017910646592 pyconfig.py:471] Config param output_dim_for_audio: 512 I0422 12:55:05.493797 136017910646592 pyconfig.py:471] Config param override_logical_axis_rules: False I0422 12:55:05.493813 136017910646592 pyconfig.py:471] Config param override_model_config: True I0422 12:55:05.493828 136017910646592 pyconfig.py:471] Config param packing: True I0422 12:55:05.493844 136017910646592 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0422 12:55:05.493858 136017910646592 pyconfig.py:471] Config param 
pagedattn_max_pages_per_group: -1 I0422 12:55:05.493874 136017910646592 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0422 12:55:05.493888 136017910646592 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0422 12:55:05.493903 136017910646592 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0422 12:55:05.493917 136017910646592 pyconfig.py:471] Config param param_scan_axis: 1 I0422 12:55:05.493932 136017910646592 pyconfig.py:471] Config param parameter_memory_host_offload: False I0422 12:55:05.493947 136017910646592 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0422 12:55:05.493962 136017910646592 pyconfig.py:471] Config param patch_size_for_vit: 14 I0422 12:55:05.493977 136017910646592 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0422 12:55:05.493993 136017910646592 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0422 12:55:05.494009 136017910646592 pyconfig.py:471] Config param per_device_batch_size: 2 I0422 12:55:05.494024 136017910646592 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0422 12:55:05.494040 136017910646592 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0422 12:55:05.494055 136017910646592 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0422 12:55:05.494071 136017910646592 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0422 12:55:05.494087 136017910646592 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0422 12:55:05.494102 136017910646592 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0422 12:55:05.494118 136017910646592 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0422 12:55:05.494134 136017910646592 pyconfig.py:471] Config param posemb_type_for_vit: learn I0422 12:55:05.494149 136017910646592 pyconfig.py:471] Config param position_id_per_seconds: 25 I0422 12:55:05.494165 136017910646592 pyconfig.py:471] Config param prefill_cache_axis_order: 
1,2,0,3 I0422 12:55:05.494181 136017910646592 pyconfig.py:471] Config param prefill_cache_dir: I0422 12:55:05.494196 136017910646592 pyconfig.py:471] Config param prefill_chunk_size: 256 I0422 12:55:05.494212 136017910646592 pyconfig.py:471] Config param prefill_slice: v5e-16 I0422 12:55:05.494230 136017910646592 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0422 12:55:05.494246 136017910646592 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0422 12:55:05.494260 136017910646592 pyconfig.py:471] Config param prefuse_moe_weights: False I0422 12:55:05.494275 136017910646592 pyconfig.py:471] Config param profile_cleanly: True I0422 12:55:05.494290 136017910646592 pyconfig.py:471] Config param profile_periodically_period: -1 I0422 12:55:05.494305 136017910646592 pyconfig.py:471] Config param profile_power_events: False I0422 12:55:05.494320 136017910646592 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0422 12:55:05.494337 136017910646592 pyconfig.py:471] Config param profiler_steps: 5 I0422 12:55:05.494352 136017910646592 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0422 12:55:05.494368 136017910646592 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0422 12:55:05.494382 136017910646592 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0422 12:55:05.494397 136017910646592 pyconfig.py:471] Config param prometheus_port: 0 I0422 12:55:05.494413 136017910646592 pyconfig.py:471] Config param prompt: I love to I0422 12:55:05.494428 136017910646592 pyconfig.py:471] Config param pure_nnx: False I0422 12:55:05.494442 136017910646592 pyconfig.py:471] Config param pure_nnx_decoder: False I0422 12:55:05.494458 136017910646592 pyconfig.py:471] Config param q_lora_rank: 0 I0422 12:55:05.494472 136017910646592 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0422 12:55:05.494489 136017910646592 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0422 12:55:05.494505 
136017910646592 pyconfig.py:471] Config param qk_norm_with_scale: True I0422 12:55:05.494519 136017910646592 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0422 12:55:05.494536 136017910646592 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0422 12:55:05.494552 136017910646592 pyconfig.py:471] Config param quant_cfg_path: I0422 12:55:05.494566 136017910646592 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0422 12:55:05.494583 136017910646592 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0422 12:55:05.494598 136017910646592 pyconfig.py:471] Config param quantize_kvcache: False I0422 12:55:05.494613 136017910646592 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0422 12:55:05.494628 136017910646592 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0422 12:55:05.494644 136017910646592 pyconfig.py:471] Config param ragged_block_size: 256 I0422 12:55:05.494670 136017910646592 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0422 12:55:05.494686 136017910646592 pyconfig.py:471] Config param rampup_end_step: 0 I0422 12:55:05.494702 136017910646592 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0422 12:55:05.494717 136017910646592 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0422 12:55:05.494731 136017910646592 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0422 12:55:05.494747 136017910646592 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0422 12:55:05.494763 136017910646592 pyconfig.py:471] Config param remat_policy: full I0422 12:55:05.494778 136017910646592 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0422 12:55:05.494793 136017910646592 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0422 12:55:05.494808 136017910646592 pyconfig.py:471] Config param replicate_quant_scale: False I0422 12:55:05.494823 136017910646592 pyconfig.py:471] Config param 
replicator_backup_interval_minutes: 0 I0422 12:55:05.494837 136017910646592 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0422 12:55:05.494853 136017910646592 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0422 12:55:05.494868 136017910646592 pyconfig.py:471] Config param reshape_q: False I0422 12:55:05.494882 136017910646592 pyconfig.py:471] Config param return_log_prob: False I0422 12:55:05.494898 136017910646592 pyconfig.py:471] Config param reuse_example_batch: 0 I0422 12:55:05.494913 136017910646592 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0422 12:55:05.494942 136017910646592 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0422 12:55:05.494958 136017910646592 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0422 12:55:05.494973 136017910646592 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0422 12:55:05.494988 136017910646592 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0422 12:55:05.495004 136017910646592 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0422 12:55:05.495018 136017910646592 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0422 12:55:05.495040 136017910646592 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0422 12:55:05.495056 136017910646592 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0422 12:55:05.495072 136017910646592 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0422 12:55:05.495086 136017910646592 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0422 12:55:05.495101 136017910646592 pyconfig.py:471] Config param rope_attention_scaling: False I0422 12:55:05.495117 
136017910646592 pyconfig.py:471] Config param rope_factor: 40 I0422 12:55:05.495132 136017910646592 pyconfig.py:471] Config param rope_interleave: True I0422 12:55:05.495147 136017910646592 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0422 12:55:05.495163 136017910646592 pyconfig.py:471] Config param rope_max_timescale: 10000 I0422 12:55:05.495177 136017910646592 pyconfig.py:471] Config param rope_min_timescale: 1 I0422 12:55:05.495192 136017910646592 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0422 12:55:05.495208 136017910646592 pyconfig.py:471] Config param rope_truncate: True I0422 12:55:05.495223 136017910646592 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0422 12:55:05.495245 136017910646592 pyconfig.py:471] Config param rope_use_scale: True I0422 12:55:05.495260 136017910646592 pyconfig.py:471] Config param routed_bias: False I0422 12:55:05.495275 136017910646592 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0422 12:55:05.495290 136017910646592 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0422 12:55:05.495305 136017910646592 pyconfig.py:471] Config param routed_score_func: I0422 12:55:05.495319 136017910646592 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-22-12-55 I0422 12:55:05.495335 136017910646592 pyconfig.py:471] Config param sa_block_kv: 512 I0422 12:55:05.495350 136017910646592 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0422 12:55:05.495365 136017910646592 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0422 12:55:05.495380 136017910646592 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0422 12:55:05.495395 136017910646592 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0422 12:55:05.495410 136017910646592 pyconfig.py:471] Config param sa_block_q: 512 I0422 12:55:05.495424 136017910646592 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0422 12:55:05.495439 136017910646592 pyconfig.py:471] Config param sa_block_q_dq: 512 I0422 
12:55:05.495453 136017910646592 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0422 12:55:05.495469 136017910646592 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0422 12:55:05.495483 136017910646592 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0422 12:55:05.495498 136017910646592 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0422 12:55:05.495513 136017910646592 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0422 12:55:05.495528 136017910646592 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0422 12:55:05.495543 136017910646592 pyconfig.py:471] Config param save_config_to_gcs: False I0422 12:55:05.495558 136017910646592 pyconfig.py:471] Config param save_quantized_params_path: I0422 12:55:05.495573 136017910646592 pyconfig.py:471] Config param scale_embedding_for_audio: True I0422 12:55:05.495589 136017910646592 pyconfig.py:471] Config param scan_layers: True I0422 12:55:05.495604 136017910646592 pyconfig.py:471] Config param scan_layers_per_stage: False I0422 12:55:05.495618 136017910646592 pyconfig.py:471] Config param scan_pipeline_iterations: True I0422 12:55:05.495633 136017910646592 pyconfig.py:471] Config param scan_pipeline_repeats: False I0422 12:55:05.495657 136017910646592 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0422 12:55:05.495672 136017910646592 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0422 12:55:05.495687 136017910646592 pyconfig.py:471] Config param sft_train_on_completion_only: False I0422 12:55:05.495701 136017910646592 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0422 12:55:05.495716 136017910646592 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0422 12:55:05.495732 136017910646592 pyconfig.py:471] Config param shard_optimizer_over_data: False I0422 12:55:05.495748 136017910646592 pyconfig.py:471] Config param sharding_strategy: None I0422 12:55:05.495762 
136017910646592 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0422 12:55:05.495778 136017910646592 pyconfig.py:471] Config param shardy: True I0422 12:55:05.495793 136017910646592 pyconfig.py:471] Config param share_kv_projections: False I0422 12:55:05.495808 136017910646592 pyconfig.py:471] Config param shared_experts: 0 I0422 12:55:05.495822 136017910646592 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0422 12:55:05.495836 136017910646592 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0422 12:55:05.495852 136017910646592 pyconfig.py:471] Config param skip_jax_distributed_system: False I0422 12:55:05.495866 136017910646592 pyconfig.py:471] Config param skip_step_interval: 128 I0422 12:55:05.495881 136017910646592 pyconfig.py:471] Config param skip_step_on_spikes: False I0422 12:55:05.495896 136017910646592 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0422 12:55:05.495911 136017910646592 pyconfig.py:471] Config param sliding_window_size: 0 I0422 12:55:05.495925 136017910646592 pyconfig.py:471] Config param solution_end_token: </answer> I0422 12:55:05.495940 136017910646592 pyconfig.py:471] Config param solution_start_token: <answer> I0422 12:55:05.495954 136017910646592 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0422 12:55:05.495970 136017910646592 pyconfig.py:471] Config param sparse_matmul: True I0422 12:55:05.495984 136017910646592 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0422 12:55:05.495999 136017910646592 pyconfig.py:471] Config param stack_prefill_result_cache: False I0422 12:55:05.496014 136017910646592 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0422 12:55:05.496029 136017910646592 pyconfig.py:471] Config param stack_trace_to_cloud: False I0422 12:55:05.496044 136017910646592 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0422 12:55:05.496058 136017910646592 pyconfig.py:471] Config param steps: 200000 I0422 12:55:05.496073 
136017910646592 pyconfig.py:471] Config param stop_strings: None
I0422 12:55:05.496088 136017910646592 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0422 12:55:05.496104 136017910646592 pyconfig.py:471] Config param student_params_to_update: None
I0422 12:55:05.496119 136017910646592 pyconfig.py:471] Config param subslice_shape:
I0422 12:55:05.496134 136017910646592 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0422 12:55:05.496150 136017910646592 pyconfig.py:471] Config param system_prompt:
I0422 12:55:05.496165 136017910646592 pyconfig.py:471] Config param target_eval_loss: 0.0
I0422 12:55:05.496181 136017910646592 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0422 12:55:05.496197 136017910646592 pyconfig.py:471] Config param temperature_tuning: False
I0422 12:55:05.496212 136017910646592 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0422 12:55:05.496231 136017910646592 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-22-12-55/tensorboard/
I0422 12:55:05.496246 136017910646592 pyconfig.py:471] Config param tensors_on_device: None
I0422 12:55:05.496261 136017910646592 pyconfig.py:471] Config param tensors_to_offload: None
I0422 12:55:05.496276 136017910646592 pyconfig.py:471] Config param test_batch_start_index: 0
I0422 12:55:05.496291 136017910646592 pyconfig.py:471] Config param tile_size_for_vit: 336
I0422 12:55:05.496305 136017910646592 pyconfig.py:471] Config param tokenize_eval_data: True
I0422 12:55:05.496321 136017910646592 pyconfig.py:471] Config param tokenize_train_data: True
I0422 12:55:05.496335 136017910646592 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0422 12:55:05.496351 136017910646592 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0422 12:55:05.496368 136017910646592 pyconfig.py:471] Config param topk_routing_group: -1
I0422 12:55:05.496382 136017910646592 pyconfig.py:471] Config param train_data_columns: ['text']
I0422 12:55:05.496398 136017910646592 pyconfig.py:471] Config param train_fraction: 1.0
I0422 12:55:05.496413 136017910646592 pyconfig.py:471] Config param train_image_column: image
I0422 12:55:05.496427 136017910646592 pyconfig.py:471] Config param train_micro_batch_size: -1
I0422 12:55:05.496442 136017910646592 pyconfig.py:471] Config param train_split: train
I0422 12:55:05.496456 136017910646592 pyconfig.py:471] Config param trainable_parameters_mask: []
I0422 12:55:05.496472 136017910646592 pyconfig.py:471] Config param trainable_position_size: 2048
I0422 12:55:05.496487 136017910646592 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0422 12:55:05.496502 136017910646592 pyconfig.py:471] Config param upload_all_profiler_results: False
I0422 12:55:05.496518 136017910646592 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0422 12:55:05.496532 136017910646592 pyconfig.py:471] Config param use_agentic_rollout: False
I0422 12:55:05.496548 136017910646592 pyconfig.py:471] Config param use_audio: False
I0422 12:55:05.496563 136017910646592 pyconfig.py:471] Config param use_audio_in_video: False
I0422 12:55:05.496579 136017910646592 pyconfig.py:471] Config param use_batch_split_schedule: False
I0422 12:55:05.496593 136017910646592 pyconfig.py:471] Config param use_chat_template: False
I0422 12:55:05.496608 136017910646592 pyconfig.py:471] Config param use_chunked_prefill: False
I0422 12:55:05.496624 136017910646592 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0422 12:55:05.496638 136017910646592 pyconfig.py:471] Config param use_dpo: False
I0422 12:55:05.496663 136017910646592 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0422 12:55:05.496678 136017910646592 pyconfig.py:471] Config param use_grpo: True
I0422 12:55:05.496693 136017910646592 pyconfig.py:471] Config param use_indexer: False
I0422 12:55:05.496707 136017910646592 pyconfig.py:471] Config param use_iota_embed: True
I0422 12:55:05.496723 136017910646592 pyconfig.py:471] Config param use_jax_splash: False
I0422 12:55:05.496738 136017910646592 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0422 12:55:05.496752 136017910646592 pyconfig.py:471] Config param use_mrope: False
I0422 12:55:05.496767 136017910646592 pyconfig.py:471] Config param use_multimodal: False
I0422 12:55:05.496782 136017910646592 pyconfig.py:471] Config param use_pathways: True
I0422 12:55:05.496797 136017910646592 pyconfig.py:471] Config param use_post_attn_norm: False
I0422 12:55:05.496813 136017910646592 pyconfig.py:471] Config param use_post_ffw_norm: False
I0422 12:55:05.496828 136017910646592 pyconfig.py:471] Config param use_qk_clip: False
I0422 12:55:05.496843 136017910646592 pyconfig.py:471] Config param use_qk_norm: False
I0422 12:55:05.496858 136017910646592 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0422 12:55:05.496874 136017910646592 pyconfig.py:471] Config param use_qwix_quantization: False
I0422 12:55:05.496889 136017910646592 pyconfig.py:471] Config param use_ragged_attention: False
I0422 12:55:05.496904 136017910646592 pyconfig.py:471] Config param use_random_routing: False
I0422 12:55:05.496920 136017910646592 pyconfig.py:471] Config param use_replicator_service: False
I0422 12:55:05.496935 136017910646592 pyconfig.py:471] Config param use_ring_of_experts: False
I0422 12:55:05.496952 136017910646592 pyconfig.py:471] Config param use_sft: False
I0422 12:55:05.496968 136017910646592 pyconfig.py:471] Config param use_splash_scheduler: False
I0422 12:55:05.496982 136017910646592 pyconfig.py:471] Config param use_tokamax_gmm: False
I0422 12:55:05.496997 136017910646592 pyconfig.py:471] Config param use_tokamax_splash: False
I0422 12:55:05.497012 136017910646592 pyconfig.py:471] Config param use_truncation: True
I0422 12:55:05.497026 136017910646592 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0422 12:55:05.497041 136017910646592 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0422 12:55:05.497056 136017910646592 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0422 12:55:05.497071 136017910646592 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0422 12:55:05.497086 136017910646592 pyconfig.py:471] Config param v_head_dim: 128
I0422 12:55:05.497101 136017910646592 pyconfig.py:471] Config param v_norm_with_scale: True
I0422 12:55:05.497115 136017910646592 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0422 12:55:05.497131 136017910646592 pyconfig.py:471] Config param vertex_tensorboard_project:
I0422 12:55:05.497145 136017910646592 pyconfig.py:471] Config param vertex_tensorboard_region:
I0422 12:55:05.497161 136017910646592 pyconfig.py:471] Config param video_path:
I0422 12:55:05.497175 136017910646592 pyconfig.py:471] Config param video_placeholder: <|video|>
I0422 12:55:05.497189 136017910646592 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0422 12:55:05.497205 136017910646592 pyconfig.py:471] Config param vision_output_length: -1
I0422 12:55:05.497219 136017910646592 pyconfig.py:471] Config param vllm_additional_config: {}
I0422 12:55:05.497238 136017910646592 pyconfig.py:471] Config param vllm_hf_config_path:
I0422 12:55:05.497253 136017910646592 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0422 12:55:05.497267 136017910646592 pyconfig.py:471] Config param vocab_size: 32000
I0422 12:55:05.497283 136017910646592 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0422 12:55:05.497299 136017910646592 pyconfig.py:471] Config param weight_dtype: float32
I0422 12:55:05.497323 136017910646592 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0422 12:55:05.497340 136017910646592 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0422 12:55:05.497354 136017910646592 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0422 12:55:05.497370 136017910646592 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0422 12:55:05.497385 136017910646592 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0422 12:55:05.497400 136017910646592 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0422 12:55:05.497414 136017910646592 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0422 12:55:05.497430 136017910646592 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0422 12:55:05.497445 136017910646592 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0422 12:55:05.497459 136017910646592 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0422 12:55:05.497474 136017910646592 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0422 12:55:05.497489 136017910646592 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0422 12:55:05.497504 136017910646592 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0422 12:55:05.497520 136017910646592 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0422 12:55:05.497534 136017910646592 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0422 12:55:05.497550 136017910646592 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0422 12:55:05.497565 136017910646592 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0422 12:55:05.497580 136017910646592 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0422 12:55:05.497595 136017910646592 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0422 12:55:05.497610 136017910646592 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0422 12:55:05.497626 136017910646592 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0422 12:55:05.497643 136017910646592 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0422 12:55:05.497667 136017910646592 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0422 12:55:05.497683 136017910646592 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0422 12:55:05.497699 136017910646592 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0422 12:55:05.497716 136017910646592 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0422 12:55:05.498203 136017910646592 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0422 12:55:05.498246 136017910646592 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0422 12:55:09.139218 136017910646592 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0422 12:55:09.142225 136017910646592 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0422 12:55:09.142351 136017910646592 train_distill.py:608] Applying logical axis rules for model initialization and training...
I0422 12:55:09.142423 136017910646592 train_distill.py:612] Loading Student from ...
I0422 12:55:09.142451 136017910646592 train_distill.py:169] --- Student Configuration ---
I0422 12:55:09.142473 136017910646592 train_distill.py:170] Model Name: gpt3-52k
I0422 12:55:09.142494 136017910646592 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0422 12:55:09.142513 136017910646592 train_distill.py:174] Attention Heads: 2 Query, 2 KV
I0422 12:55:09.142530 136017910646592 train_distill.py:175] Vocab Size: 32000
I0422 12:55:09.142547 136017910646592 train_distill.py:176] Checkpoint:
I0422 12:55:09.142565 136017910646592 train_distill.py:477] Initializing model: gpt3-52k...
I0422 12:55:10.501821 136017910646592 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0422 12:55:10.501932 136017910646592 train_distill.py:169] --- Teacher Configuration ---
I0422 12:55:10.501961 136017910646592 train_distill.py:170] Model Name: gpt3-52k
I0422 12:55:10.501986 136017910646592 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0422 12:55:10.502011 136017910646592 train_distill.py:174] Attention Heads: 2 Query, 2 KV
I0422 12:55:10.502032 136017910646592 train_distill.py:175] Vocab Size: 32000
I0422 12:55:10.502052 136017910646592 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0422 12:55:10.502071 136017910646592 train_distill.py:477] Initializing model: gpt3-52k...
I0422 12:55:11.528985 136017910646592 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 12:55:11.529431 136017910646592 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7bb46ba35be0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 12:55:11.529490 136017910646592 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0422 12:55:12.032305 136017910646592 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0422 12:55:12.551841 2142 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0422 12:55:13.352817 136017910646592 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0422 12:55:15.126493 136017910646592 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0422 12:55:15.126890 136017910646592 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0422 12:55:16.735419 136017910646592 checkpointer.py:318] Finished restoring checkpoint in 3.76 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0422 12:55:17.418205 136017910646592 train_distill.py:652] Initializing Data Iterators via MaxText pipeline...
I0422 12:55:17.481556 136017910646592 config.py:112] TensorFlow version 2.20.0 available.
I0422 12:55:17.482065 136017910646592 config.py:125] JAX version 0.8.3 available.
E0422 12:55:19.469093 136017910646592 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0422 12:55:19.469308 136017910646592 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0422 12:55:19.472381 136017910646592 train_distill.py:422] Input Pipeline Checkpointing: DISABLED
I0422 12:55:19.472445 136017910646592 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0422 12:55:19.472508 136017910646592 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 12:55:19.472584 136017910646592 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7bb46ba35be0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 12:55:19.472624 136017910646592 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 12:55:19.472669 136017910646592 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7bb46ba35be0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 12:55:19.472713 136017910646592 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7baa8065d880>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce5f87230>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f9b0>}, handler_registry=None
I0422 12:55:19.472903 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7baa8065d880>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 12:55:19.472943 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce5f87230>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 12:55:19.472975 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f9b0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 12:55:19.472999 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f590>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 12:55:19.473025 136017910646592 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7baa8065d880>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7baa8065d880>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce5f87230>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce5f87230>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f9b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f9b0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f590>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4a6f590>}). 
I0422 12:55:19.473418 136017910646592 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b9ce4a0b7e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 12:55:27.546467 136017910646592 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints
I0422 12:55:27.565359 136017910646592 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b9ce4a6f920>
I0422 12:55:27.565491 136017910646592 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 12:55:27.565553 136017910646592 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7bb46ba35be0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 12:55:27.565590 136017910646592 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 12:55:27.565622 136017910646592 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7bb46ba35be0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 12:55:27.565670 136017910646592 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 12:55:27.565722 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7bb35de38580>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce4a6f710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4a6f6e0>, _write_futures=[])
I0422 12:55:27.566080 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7bb35de38580>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce4a6f710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4a6f6e0>, _write_futures=[])
I0422 12:55:27.566107 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7bb35de38580>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce4a6f710>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4a6f6e0>, _write_futures=[])
I0422 12:55:27.566137 136017910646592 checkpoint_manager.py:702] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4a6f8f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4867170>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866c60>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b9ce4866570>}, handler_registry=None
I0422 12:55:27.566240 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4a6f8f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 12:55:27.566274 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4867170>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 12:55:27.566296 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866c60>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 12:55:27.566324 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b9ce4866570>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0422 12:55:27.566348 136017910646592 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866150>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 12:55:27.566372 136017910646592 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4a6f8f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4a6f8f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4867170>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b9ce4867170>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866c60>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866c60>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b9ce4866570>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b9ce4866570>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866150>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b9ce4866150>}).
I0422 12:55:27.566443 136017910646592 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b9ce4a0b920> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 12:55:27.946499 136017910646592 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints
I0422 12:55:27.961585 136017910646592 checkpoint_manager.py:921] [process=5][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b9ce4a6f2c0>
I0422 12:55:27.962040 136017910646592 train_distill.py:703] Starting Distillation Training...
I0422 12:55:27.962140 136017910646592 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0422 12:55:28.684983 136017910646592 peft_trainer.py:594] Compiled train_step cache size: 0
I0422 12:55:28.686637 135873723078400 grain_pool.py:367] Grain pool will use 1 processes.
I0422 12:55:28.712894 135873723078400 grain_pool.py:440] Grain pool will start child processes.
I0422 12:55:28.718143 135873723078400 grain_pool.py:448] Grain pool started all child processes.
2026-04-22 12:55:34.720259: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value()
  current_step = model.training_step.value
I0422 12:55:41.197088 136017910646592 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 12:55:41.199338 136017910646592 checkpoint_manager.py:1501] [process=5] Saving checkpoint at step 1
I0422 12:55:41.202505 136017910646592 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1.
I0422 12:55:41.758711 136017910646592 signaling_client.py:364] Using JaxDistributedSignalingClient
I0422 12:55:41.759753 136017910646592 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0422 12:55:41.759809 136017910646592 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 12:55:42.427602 136017910646592 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.669227s
I0422 12:55:42.430478 136017910646592 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 288.760 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 926 milliseconds) (per-host)
I0422 12:55:42.430552 136017910646592 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.926602s (batch_requests_ready=0.250882s, total_serialization_initiated=0.675593s, others=0.000127s)
I0422 12:55:42.431319 136017910646592 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0422 12:55:42.431376 136017910646592 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 12:55:42.439478 136017910646592 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.008857s
I0422 12:55:42.439594 136017910646592 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 573.189 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 933 milliseconds) (per-host)
I0422 12:55:42.439639 136017910646592 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.933553s (batch_requests_ready=0.921706s, total_serialization_initiated=0.011773s, others=0.000074s)
I0422 12:55:42.439743 136017910646592 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.939689s (all_items=0.000022s, per_item={'model_params': '0.00001836', 'optimizer_state': '0.00000405'}, temp_paths=0.939667)
I0422 12:55:42.440727 135867330983680 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started.
I0422 12:55:42.440896 136017910646592 async_checkpointer.py:561] Finished blocking save. Time taken: 1.241488s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1.
I0422 12:55:42.663301 136017910646592 checkpoint_manager.py:1549] [process=5][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize
I0422 12:55:42.663698 135867828172544 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save.
I0422 12:55:42.663839 136017910646592 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776862541.1970558, 'wait_for_prev_duration_secs': 0.00013637542724609375, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776862541.1993785, 'checkpointer_blocking_duration_secs': 1.2416222095489502, 'get_old_steps_start_time': 1776862542.4410224, 'get_old_steps_duration_secs': 8.177757263183594e-05, 'checkpoint_manager_blocking_start_time': 1776862540.1807055, 'checkpoint_manager_blocking_duration_secs': 2.4830970764160156}
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value()
  current_step = model.training_step.value
I0422 12:55:46.011006 136017910646592 peft_trainer.py:474] Train step 1 training loss: 15.963623 - training perplexity: 8568669.000000
I0422 12:55:46.031438 136017910646592 peft_trainer.py:474] Train step 2 training loss: 15.943937 - training perplexity: 8401639.000000
I0422 12:55:46.056514 136017910646592 peft_trainer.py:474] Train step 3 training loss: 15.973638 - training perplexity: 8654912.000000
I0422 12:55:46.077093 136017910646592 peft_trainer.py:474] Train step 4 training loss: 15.952717 - training perplexity: 8475726.000000
I0422 12:55:46.081989 136017910646592 peft_trainer.py:733] Train loop finished in: 17.3965 seconds
I0422 12:55:46.082452 136017910646592 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/...
I0422 12:55:47.053892 135866206885632 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_5
I0422 12:55:47.054998 135867330983680 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 48.195 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host)
I0422 12:55:47.546693 135867811387136 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_5
I0422 12:55:47.547867 135867330983680 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 88.561 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 6 seconds) (per-host)
I0422 12:55:47.547931 135867330983680 async_checkpointer.py:90] [process=5][thread=async_save] 4 Handler Commit operations completed. Time taken: 5.107094s.
I0422 12:55:58.366756 136017910646592 checkpoint_manager.py:1994] [process=5][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0422 12:55:59.230052 135867330983680 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 16.789199s.
I0422 12:55:59.230355 135867828172544 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save.
I0422 12:55:59.230476 135867828172544 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save.
I0422 12:55:59.230524 135867828172544 checkpoint_manager.py:2103] [process=5][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts...
I0422 12:55:59.232294 135867828172544 checkpoint_manager.py:2112] [process=5][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts.
I0422 12:55:59.232415 136017910646592 checkpoint_manager.py:2006] [process=5][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1.
I0422 12:55:59.233949 136017910646592 checkpoint_manager.py:1501] [process=5] Saving checkpoint at step 5
I0422 12:55:59.237481 136017910646592 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5.
I0422 12:55:59.786011 136017910646592 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0422 12:55:59.786110 136017910646592 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 12:56:00.426178 136017910646592 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.641554s
I0422 12:56:00.427833 136017910646592 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 298.389 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 896 milliseconds) (per-host)
I0422 12:56:00.427894 136017910646592 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.896685s (batch_requests_ready=0.251894s, total_serialization_initiated=0.644689s, others=0.000102s)
I0422 12:56:00.428771 136017910646592 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0422 12:56:00.428825 136017910646592 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 12:56:00.436929 136017910646592 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.008940s
I0422 12:56:00.437036 136017910646592 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 591.634 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 904 milliseconds) (per-host)
I0422 12:56:00.437080 136017910646592 base_pytree_checkpoint_handler.py:732] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.904443s (batch_requests_ready=0.893718s, total_serialization_initiated=0.010659s, others=0.000066s)
I0422 12:56:00.437206 136017910646592 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.909915s (all_items=0.000014s, per_item={'model_params': '0.00001121', 'optimizer_state': '0.00000262'}, temp_paths=0.909901)
I0422 12:56:00.438166 135867819779840 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started.
I0422 12:56:00.438337 136017910646592 async_checkpointer.py:561] Finished blocking save. Time taken: 1.204314s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5.
I0422 12:56:00.479436 136017910646592 checkpoint_manager.py:1549] [process=5][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize
I0422 12:56:00.479778 135867330983680 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save.
I0422 12:56:00.479947 136017910646592 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776862558.366717, 'wait_for_prev_duration_secs': 0.8657562732696533, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776862559.2339902, 'checkpointer_blocking_duration_secs': 1.2044544219970703, 'get_old_steps_start_time': 1776862560.438469, 'get_old_steps_duration_secs': 8.106231689453125e-05, 'checkpoint_manager_blocking_start_time': 1776862546.0846229, 'checkpoint_manager_blocking_duration_secs': 14.395289897918701}
I0422 12:56:00.480119 136017910646592 checkpoint_manager.py:1994] [process=5][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0422 12:56:04.569566 135866206885632 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_5
I0422 12:56:04.584495 135864621451008 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_5
I0422 12:56:04.585395 135867819779840 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 52.934 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host)
I0422 12:56:04.585554 135867819779840 base_pytree_checkpoint_handler.py:128] [process=5] /jax/checkpoint/write/gbytes_per_sec: 105.892 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0422 12:56:04.585594 135867819779840 async_checkpointer.py:90] [process=5][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.147313s.
I0422 12:56:13.018258 135867819779840 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 12.579961s.
I0422 12:56:13.018611 135867330983680 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save.
I0422 12:56:13.018742 135867330983680 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save.
I0422 12:56:13.018791 135867330983680 checkpoint_manager.py:2103] [process=5][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts...
I0422 12:56:13.020680 135867330983680 checkpoint_manager.py:2112] [process=5][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts.
I0422 12:56:13.020862 136017910646592 checkpoint_manager.py:2006] [process=5][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5.
I0422 12:56:13.020984 136017910646592 train_distill.py:724] Final checkpoint saved.
I0422 12:56:13.023260 136017910646592 peft_trainer.py:474] Train step 5 training loss: 15.949528 - training perplexity: 8448739.000000
I0422 12:56:13.023636 136017910646592 checkpoint_manager.py:1983] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 12:56:13.023725 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7b9ce49fd600>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce48674d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4866ae0>, _write_futures=[])
I0422 12:56:13.023773 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7b9ce49fd600>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce48674d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4866ae0>, _write_futures=[])
I0422 12:56:13.023801 136017910646592 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=136017910646592 count=1 at 0x7b9ce49fd600>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b9ce48674d0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b9ce4866ae0>, _write_futures=[])
I0422 12:56:13.023847 136017910646592 train_distill.py:734] Distillation Complete.
I0422 12:56:13.085232 135873723078400 grain_pool.py:547] Shutting down multiprocessing system.
I0422 12:56:14.521100 135873723078400 grain_pool.py:542] Grain pool is exiting.
I0422 12:56:14.521208 135873723078400 grain_pool.py:547] Shutting down multiprocessing system.
I0422 12:56:14.521270 135873723078400 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Wed Apr 22 12:56:22 UTC 2026
EXIT_CODE=0