feat/nnx-post-train-fixes
XPK Start: Thu Apr 23 12:59:29 UTC 2026
2026-04-23 12:59:46.552023: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0423 12:59:50.929174 138819277952832 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-23 12:59:59,968:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0423 12:59:59.968068 138819277952832 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-23 12:59:59,973:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-7lmkx-slice-job-0-0.mt-07-distill-smoke-7lmkx:8482
I0423 12:59:59.973568 138819277952832 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-7lmkx-slice-job-0-0.mt-07-distill-smoke-7lmkx:8482
I0423 13:00:01.172903 138819277952832 max_utils.py:284] Jax distributed system initialized!
I0423 13:00:07.888718 138819277952832 max_utils.py:244] Jax distributed system is already initialized.
W0423 13:00:08.021780 138819277952832 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0423 13:00:08.082703 138819277952832 max_utils.py:244] Jax distributed system is already initialized.
I0423 13:00:08.083906 138819277952832 pyconfig.py:471] Config param abort_on_inf_loss: True
I0423 13:00:08.083971 138819277952832 pyconfig.py:471] Config param abort_on_nan_loss: True
I0423 13:00:08.083997 138819277952832 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0423 13:00:08.084018 138819277952832 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0423 13:00:08.084038 138819277952832 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0423 13:00:08.084056 138819277952832 pyconfig.py:471] Config param activations_in_float32: False
I0423 13:00:08.084074 138819277952832 pyconfig.py:471] Config param adam_b1: 0.9
I0423 13:00:08.084096 138819277952832 pyconfig.py:471] Config param adam_b2: 0.95
I0423 13:00:08.084115 138819277952832 pyconfig.py:471] Config param adam_eps: 1e-08
I0423 13:00:08.084139 138819277952832 pyconfig.py:471] Config param adam_eps_root: 0.0
I0423 13:00:08.084156 138819277952832 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0423 13:00:08.084174 138819277952832 pyconfig.py:471] Config param adamw_mask: []
I0423 13:00:08.084191 138819277952832 pyconfig.py:471] Config param add_bos: True
I0423 13:00:08.084208 138819277952832 pyconfig.py:471] Config param add_eos: True
I0423 13:00:08.084224 138819277952832 pyconfig.py:471] Config param allow_split_physical_axes: False
I0423 13:00:08.084241 138819277952832 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0423 13:00:08.084258 138819277952832 pyconfig.py:471] Config param async_checkpointing: True
I0423 13:00:08.084275 138819277952832 pyconfig.py:471] Config param async_scheduling: False
I0423 13:00:08.084292 138819277952832 pyconfig.py:471] Config param attention: dot_product
I0423 13:00:08.084307 138819277952832 pyconfig.py:471] Config param attention_bias: False
I0423 13:00:08.084325 138819277952832 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0423 13:00:08.084341 138819277952832 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0423 13:00:08.084361 138819277952832 pyconfig.py:471] Config param attention_output_dim: -1
I0423 13:00:08.084377 138819277952832 pyconfig.py:471] Config param attention_sink: False
I0423 13:00:08.084394 138819277952832 pyconfig.py:471] Config param attention_type: global
I0423 13:00:08.084412 138819277952832 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0423 13:00:08.084430 138819277952832 pyconfig.py:471] Config param audio_path:
I0423 13:00:08.084445 138819277952832 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0423 13:00:08.084462 138819277952832 pyconfig.py:471] Config param autoregressive_decode_assert:
I0423 13:00:08.084478 138819277952832 pyconfig.py:471] Config param base_config: base.yml
I0423 13:00:08.084493 138819277952832 pyconfig.py:471] Config param base_emb_dim: 16
I0423 13:00:08.084510 138819277952832 pyconfig.py:471] Config param base_mlp_dim: 64
I0423 13:00:08.084526 138819277952832 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0423 13:00:08.084565 138819277952832 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0423 13:00:08.084583 138819277952832 pyconfig.py:471] Config param base_num_kv_heads: 2
I0423 13:00:08.084599 138819277952832 pyconfig.py:471] Config param base_num_query_heads: 2
I0423 13:00:08.084615 138819277952832 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0423 13:00:08.084631 138819277952832 pyconfig.py:471] Config param batch_size: 1
I0423 13:00:08.084646 138819277952832 pyconfig.py:471] Config param batch_split_factor: 1
I0423 13:00:08.084662 138819277952832 pyconfig.py:471] Config param beta_fast: 32
I0423 13:00:08.084678 138819277952832 pyconfig.py:471] Config param beta_slow: 1
I0423 13:00:08.084700 138819277952832 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0423 13:00:08.084718 138819277952832 pyconfig.py:471] Config param capacity_factor: -1.0
I0423 13:00:08.084734 138819277952832 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0423 13:00:08.084750 138819277952832 pyconfig.py:471] Config param chat_template:
I0423 13:00:08.084766 138819277952832 pyconfig.py:471] Config param chat_template_path:
I0423 13:00:08.084783 138819277952832 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0423 13:00:08.084801 138819277952832 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/checkpoints/
I0423 13:00:08.084818 138819277952832 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0423 13:00:08.084833 138819277952832 pyconfig.py:471] Config param checkpoint_period: 2000
I0423 13:00:08.084849 138819277952832 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0423 13:00:08.084865 138819277952832 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0423 13:00:08.084882 138819277952832 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0423 13:00:08.084897 138819277952832 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0423 13:00:08.084913 138819277952832 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0423 13:00:08.084939 138819277952832 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0423 13:00:08.084955 138819277952832 pyconfig.py:471] Config param chips_per_vm: 4
I0423 13:00:08.084969 138819277952832 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0423 13:00:08.084985 138819277952832 pyconfig.py:471] Config param collect_stack_trace: False
I0423 13:00:08.085000 138819277952832 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0423 13:00:08.085016 138819277952832 pyconfig.py:471] Config param colocated_python_data_input: False
I0423 13:00:08.085031 138819277952832 pyconfig.py:471] Config param compile_topology:
I0423 13:00:08.085047 138819277952832 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0423 13:00:08.085062 138819277952832 pyconfig.py:471] Config param compile_xla_flags:
I0423 13:00:08.085078 138819277952832 pyconfig.py:471] Config param compiled_trainstep_file:
I0423 13:00:08.085094 138819277952832 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0423 13:00:08.085108 138819277952832 pyconfig.py:471] Config param constant_bound_config: []
I0423 13:00:08.085124 138819277952832 pyconfig.py:471] Config param context: RematLocation.REMAT
I0423 13:00:08.085139 138819277952832 pyconfig.py:471] Config param context_parallel_load_balance: True
I0423 13:00:08.085156 138819277952832 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0423 13:00:08.085173 138819277952832 pyconfig.py:471] Config param context_parallel_size: 1
I0423 13:00:08.085189 138819277952832 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0423 13:00:08.085205 138819277952832 pyconfig.py:471] Config param context_sharding: context
I0423 13:00:08.085222 138819277952832 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0423 13:00:08.085237 138819277952832 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0423 13:00:08.085254 138819277952832 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0423 13:00:08.085269 138819277952832 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0423 13:00:08.085286 138819277952832 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0423 13:00:08.085300 138819277952832 pyconfig.py:471] Config param custom_mesh:
I0423 13:00:08.085316 138819277952832 pyconfig.py:471] Config param custom_mesh_and_rule:
I0423 13:00:08.085332 138819277952832 pyconfig.py:471] Config param d_model_for_audio: 256
I0423 13:00:08.085348 138819277952832 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0423 13:00:08.085368 138819277952832 pyconfig.py:471] Config param data_shuffle_seed: 0
I0423 13:00:08.085385 138819277952832 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0423 13:00:08.085400 138819277952832 pyconfig.py:471] Config param dataset_path:
I0423 13:00:08.085417 138819277952832 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0423 13:00:08.085434 138819277952832 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0423 13:00:08.085451 138819277952832 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0423 13:00:08.085465 138819277952832 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0423 13:00:08.085481 138819277952832 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0423 13:00:08.085496 138819277952832 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0423 13:00:08.085512 138819277952832 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0423 13:00:08.085527 138819277952832 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0423 13:00:08.085543 138819277952832 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0423 13:00:08.085558 138819277952832 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 13:00:08.085575 138819277952832 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0423 13:00:08.085590 138819277952832 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0423 13:00:08.085605 138819277952832 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0423 13:00:08.085620 138819277952832 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0423 13:00:08.085636 138819277952832 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0423 13:00:08.085651 138819277952832 pyconfig.py:471] Config param debug: {'rl': False}
I0423 13:00:08.085668 138819277952832 pyconfig.py:471] Config param debug_sharding: False
I0423 13:00:08.085683 138819277952832 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0423 13:00:08.085702 138819277952832 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0423 13:00:08.085722 138819277952832 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0423 13:00:08.085738 138819277952832 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0423 13:00:08.085754 138819277952832 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0423 13:00:08.085770 138819277952832 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0423 13:00:08.085786 138819277952832 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0423 13:00:08.085802 138819277952832 pyconfig.py:471] Config param degenerate_group_masking: True
I0423 13:00:08.085818 138819277952832 pyconfig.py:471] Config param dense_init_scale: 1.0
I0423 13:00:08.085833 138819277952832 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0423 13:00:08.085850 138819277952832 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0423 13:00:08.085865 138819277952832 pyconfig.py:471] Config param diloco_sync_period: 36
I0423 13:00:08.085882 138819277952832 pyconfig.py:471] Config param distill_alpha: 0.5
I0423 13:00:08.085897 138819277952832 pyconfig.py:471] Config param distill_alpha_end: None
I0423 13:00:08.085914 138819277952832 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0423 13:00:08.085941 138819277952832 pyconfig.py:471] Config param distill_beta: 0.0
I0423 13:00:08.085958 138819277952832 pyconfig.py:471] Config param distill_beta_end: None
I0423 13:00:08.085973 138819277952832 pyconfig.py:471] Config param distill_beta_schedule: constant
I0423 13:00:08.085989 138819277952832 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0423 13:00:08.086004 138819277952832 pyconfig.py:471] Config param distill_layer_indices: None
I0423 13:00:08.086019 138819277952832 pyconfig.py:471] Config param distill_temperature: 1.0
I0423 13:00:08.086036 138819277952832 pyconfig.py:471] Config param distill_temperature_end: None
I0423 13:00:08.086051 138819277952832 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0423 13:00:08.086067 138819277952832 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0423 13:00:08.086082 138819277952832 pyconfig.py:471] Config param dpo_beta: 0.1
I0423 13:00:08.086099 138819277952832 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0423 13:00:08.086115 138819277952832 pyconfig.py:471] Config param dq_reduction_steps: 0
I0423 13:00:08.086130 138819277952832 pyconfig.py:471] Config param dropout_rate: 0.0
I0423 13:00:08.086146 138819277952832 pyconfig.py:471] Config param dtype: bfloat16
I0423 13:00:08.086177 138819277952832 pyconfig.py:471] Config param dtype_mm: float32
I0423 13:00:08.086193 138819277952832 pyconfig.py:471] Config param dump_hlo: False
I0423 13:00:08.086208 138819277952832 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0423 13:00:08.086224 138819277952832 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/xla_dump
I0423 13:00:08.086239 138819277952832 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0423 13:00:08.086255 138819277952832 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0423 13:00:08.086270 138819277952832 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0423 13:00:08.086286 138819277952832 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0423 13:00:08.086301 138819277952832 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0423 13:00:08.086317 138819277952832 pyconfig.py:471] Config param dump_jaxpr: False
I0423 13:00:08.086332 138819277952832 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0423 13:00:08.086348 138819277952832 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/jaxpr_dump
I0423 13:00:08.086363 138819277952832 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0423 13:00:08.086379 138819277952832 pyconfig.py:471] Config param dump_step: -1
I0423 13:00:08.086395 138819277952832 pyconfig.py:471] Config param elastic_enabled: False
I0423 13:00:08.086410 138819277952832 pyconfig.py:471] Config param elastic_max_retries: 10
I0423 13:00:08.086425 138819277952832 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0423 13:00:08.086441 138819277952832 pyconfig.py:471] Config param emb_dim: 16
I0423 13:00:08.086457 138819277952832 pyconfig.py:471] Config param enable_autocheckpoint: False
I0423 13:00:08.086472 138819277952832 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0423 13:00:08.086488 138819277952832 pyconfig.py:471] Config param enable_checkpointing: True
I0423 13:00:08.086504 138819277952832 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0423 13:00:08.086518 138819277952832 pyconfig.py:471] Config param enable_data_shuffling: True
I0423 13:00:08.086534 138819277952832 pyconfig.py:471] Config param enable_diloco: False
I0423 13:00:08.086549 138819277952832 pyconfig.py:471] Config param enable_dp_attention: False
I0423 13:00:08.086565 138819277952832 pyconfig.py:471] Config param enable_dropout: False
I0423 13:00:08.086581 138819277952832 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0423 13:00:08.086598 138819277952832 pyconfig.py:471] Config param enable_expert_parallel: False
I0423 13:00:08.086613 138819277952832 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0423 13:00:08.086630 138819277952832 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0423 13:00:08.086645 138819277952832 pyconfig.py:471] Config param enable_goodput_recording: False
I0423 13:00:08.086661 138819277952832 pyconfig.py:471] Config param enable_jax_profiler: False
I0423 13:00:08.086676 138819277952832 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0423 13:00:08.086695 138819277952832 pyconfig.py:471] Config param enable_model_warmup: False
I0423 13:00:08.086709 138819277952832 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0423 13:00:08.086725 138819277952832 pyconfig.py:471] Config param enable_nnx: False
I0423 13:00:08.086742 138819277952832 pyconfig.py:471] Config param enable_orbax_v1: False
I0423 13:00:08.086756 138819277952832 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0423 13:00:08.086772 138819277952832 pyconfig.py:471] Config param enable_pathways_goodput: False
I0423 13:00:08.086787 138819277952832 pyconfig.py:471] Config param enable_prefix_caching: False
I0423 13:00:08.086802 138819277952832 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0423 13:00:08.086817 138819277952832 pyconfig.py:471] Config param enable_single_controller: False
I0423 13:00:08.086833 138819277952832 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0423 13:00:08.086848 138819277952832 pyconfig.py:471] Config param enable_tensorboard: True
I0423 13:00:08.086864 138819277952832 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0423 13:00:08.086879 138819277952832 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0423 13:00:08.086894 138819277952832 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0423 13:00:08.086910 138819277952832 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0423 13:00:08.086934 138819277952832 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0423 13:00:08.086951 138819277952832 pyconfig.py:471] Config param engram_head_dim: 1280
I0423 13:00:08.086967 138819277952832 pyconfig.py:471] Config param engram_kernel_size: 4
I0423 13:00:08.086983 138819277952832 pyconfig.py:471] Config param engram_layers: []
I0423 13:00:08.086999 138819277952832 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0423 13:00:08.087014 138819277952832 pyconfig.py:471] Config param engram_num_heads: 8
I0423 13:00:08.087030 138819277952832 pyconfig.py:471] Config param engram_seed: 0
I0423 13:00:08.087046 138819277952832 pyconfig.py:471] Config param engram_vocab_bases: []
I0423 13:00:08.087061 138819277952832 pyconfig.py:471] Config param epsilon_high: None
I0423 13:00:08.087077 138819277952832 pyconfig.py:471] Config param eval_corr_lst: False
I0423 13:00:08.087091 138819277952832 pyconfig.py:471] Config param eval_data_columns: ['text']
I0423 13:00:08.087108 138819277952832 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0423 13:00:08.087124 138819277952832 pyconfig.py:471] Config param eval_image_column: image
I0423 13:00:08.087139 138819277952832 pyconfig.py:471] Config param eval_interval: -1
I0423 13:00:08.087155 138819277952832 pyconfig.py:471] Config param eval_make_lst: False
I0423 13:00:08.087170 138819277952832 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0423 13:00:08.087185 138819277952832 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0423 13:00:08.087201 138819277952832 pyconfig.py:471] Config param eval_split: validation
I0423 13:00:08.087215 138819277952832 pyconfig.py:471] Config param eval_steps: -1
I0423 13:00:08.087231 138819277952832 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0423 13:00:08.087247 138819277952832 pyconfig.py:471] Config param final_logits_soft_cap: None
I0423 13:00:08.087262 138819277952832 pyconfig.py:471] Config param first_num_dense_layers: 0
I0423 13:00:08.087278 138819277952832 pyconfig.py:471] Config param float32_gate_logits: False
I0423 13:00:08.087293 138819277952832 pyconfig.py:471] Config param float32_logits: False
I0423 13:00:08.087308 138819277952832 pyconfig.py:471] Config param float32_qk_product: False
I0423 13:00:08.087324 138819277952832 pyconfig.py:471] Config param float32_weight_sum: True
I0423 13:00:08.087339 138819277952832 pyconfig.py:471] Config param force_q_layout: False
I0423 13:00:08.087355 138819277952832 pyconfig.py:471] Config param force_unroll: False
I0423 13:00:08.087371 138819277952832 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0423 13:00:08.087385 138819277952832 pyconfig.py:471] Config param formatting_func_path:
I0423 13:00:08.087401 138819277952832 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0423 13:00:08.087415 138819277952832 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0423 13:00:08.087431 138819277952832 pyconfig.py:471] Config param fused_mlp: False
I0423 13:00:08.087446 138819277952832 pyconfig.py:471] Config param fused_qkv: True
I0423 13:00:08.087462 138819277952832 pyconfig.py:471] Config param gcs_metrics: False
I0423 13:00:08.087476 138819277952832 pyconfig.py:471] Config param gdn_chunk_size: 64
I0423 13:00:08.087492 138819277952832 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0423 13:00:08.087507 138819277952832 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0423 13:00:08.087522 138819277952832 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0423 13:00:08.087538 138819277952832 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0423 13:00:08.087553 138819277952832 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0423 13:00:08.087568 138819277952832 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0423 13:00:08.087584 138819277952832 pyconfig.py:471] Config param generate_padding_batch_train: False
I0423 13:00:08.087599 138819277952832 pyconfig.py:471] Config param generate_slice: v5e-16
I0423 13:00:08.087615 138819277952832 pyconfig.py:471] Config param generation_configs: {}
I0423 13:00:08.087630 138819277952832 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0423 13:00:08.087646 138819277952832 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0423 13:00:08.087662 138819277952832 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0423 13:00:08.087677 138819277952832 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0423 13:00:08.087699 138819277952832 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0423 13:00:08.087715 138819277952832 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0423 13:00:08.087732 138819277952832 pyconfig.py:471] Config param global_head_dim: 0
I0423 13:00:08.087747 138819277952832 pyconfig.py:471] Config param global_num_kv_heads: 0
I0423 13:00:08.087764 138819277952832 pyconfig.py:471] Config param global_parameter_scale: 1
I0423 13:00:08.087779 138819277952832 pyconfig.py:471] Config param global_rampup_samples: 500
I0423 13:00:08.087796 138819277952832 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0423 13:00:08.087810 138819277952832 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0423 13:00:08.087827 138819277952832 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0423 13:00:08.087842 138819277952832 pyconfig.py:471] Config param grad_dtype: float32
I0423 13:00:08.087877 138819277952832 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0423 13:00:08.087893 138819277952832 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0423 13:00:08.087909 138819277952832 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0423 13:00:08.087933 138819277952832 pyconfig.py:471] Config param grain_eval_files:
I0423 13:00:08.087950 138819277952832 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0423 13:00:08.087966 138819277952832 pyconfig.py:471] Config param grain_num_threads: 16
I0423 13:00:08.087982 138819277952832 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0423 13:00:08.087998 138819277952832 pyconfig.py:471] Config param grain_packing_type: first_fit
I0423 13:00:08.088013 138819277952832 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0423 13:00:08.088029 138819277952832 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0423 13:00:08.088044 138819277952832 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0423 13:00:08.088060 138819277952832 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0423 13:00:08.088075 138819277952832 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0423 13:00:08.088091 138819277952832 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0423 13:00:08.088106 138819277952832 pyconfig.py:471] Config param grain_train_files:
I0423 13:00:08.088122 138819277952832 pyconfig.py:471] Config param grain_train_mixture_config_path:
I0423 13:00:08.088137 138819277952832 pyconfig.py:471] Config param grain_worker_count: 1
I0423 13:00:08.088153 138819277952832 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0423 13:00:08.088168 138819277952832 pyconfig.py:471] Config param grpo_beta: 0.08
I0423 13:00:08.088184 138819277952832 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0423 13:00:08.088199 138819277952832 pyconfig.py:471] Config param hardware: tpu
I0423 13:00:08.088215 138819277952832 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0423 13:00:08.088230 138819277952832 pyconfig.py:471] Config param head_dim: 8
I0423 13:00:08.088247 138819277952832 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0423 13:00:08.088261 138819277952832 pyconfig.py:471] Config param hf_data_dir: None
I0423 13:00:08.088277 138819277952832 pyconfig.py:471] Config param hf_eval_files: None
I0423 13:00:08.088293 138819277952832 pyconfig.py:471] Config param hf_eval_split: None
I0423 13:00:08.088308 138819277952832 pyconfig.py:471] Config param hf_name: None
I0423 13:00:08.088325 138819277952832 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0423 13:00:08.088340 138819277952832 pyconfig.py:471] Config param hf_train_files: None
I0423 13:00:08.088356 138819277952832 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0423 13:00:08.088371 138819277952832 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0423 13:00:08.088386 138819277952832 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0423 13:00:08.088401 138819277952832 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0423 13:00:08.088417 138819277952832 pyconfig.py:471] Config param ici_context_parallelism: 1
I0423 13:00:08.088432 138819277952832 pyconfig.py:471] Config param ici_data_parallelism: 1
I0423 13:00:08.088448 138819277952832 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0423 13:00:08.088463 138819277952832 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0423 13:00:08.088479 138819277952832 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0423 13:00:08.088493 138819277952832 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0423 13:00:08.088509 138819277952832 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 13:00:08.088526 138819277952832 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0423 13:00:08.088541 138819277952832 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0423 13:00:08.088557 138819277952832 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0423 13:00:08.088572 138819277952832 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0423 13:00:08.088588 138819277952832 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0423 13:00:08.088604 138819277952832 pyconfig.py:471] Config param image_path:
I0423 13:00:08.088618 138819277952832 pyconfig.py:471] Config param image_placeholder: <|image|>
I0423 13:00:08.088634 138819277952832 pyconfig.py:471] Config param image_size_for_vit: 896
I0423 13:00:08.088649 138819277952832 pyconfig.py:471] Config param indexer_head_dim: 128
I0423 13:00:08.088665 138819277952832 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0423 13:00:08.088681 138819277952832 pyconfig.py:471] Config param indexer_n_heads: 64
I0423 13:00:08.088759 138819277952832 pyconfig.py:471] Config param indexer_sparse_training: False
I0423 13:00:08.088778 138819277952832 pyconfig.py:471] Config param indexer_topk: 2048
I0423 13:00:08.088796 138819277952832 pyconfig.py:471] Config param inference_benchmark_test: False
I0423 13:00:08.088811 138819277952832 pyconfig.py:471] Config param inference_metadata_file:
I0423 13:00:08.088827 138819277952832 pyconfig.py:471] Config param inference_microbenchmark_log_file_path:
I0423 13:00:08.088842 138819277952832 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0423 13:00:08.088858 138819277952832 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0423 13:00:08.088874 138819277952832 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0423 13:00:08.088890 138819277952832 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0423 13:00:08.088905 138819277952832 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0423 13:00:08.088921 138819277952832 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0423 13:00:08.088948 138819277952832 pyconfig.py:471] Config param init_weights_seed: 0
I0423 13:00:08.088964 138819277952832 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0423 13:00:08.088979 138819277952832 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0423 13:00:08.088996 138819277952832 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0423 13:00:08.089012 138819277952832 pyconfig.py:471] Config param internal_compile: False
I0423 13:00:08.089029 138819277952832 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0423 13:00:08.089045 138819277952832 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0423 13:00:08.089060 138819277952832 pyconfig.py:471] Config param jax_debug_log_modules:
I0423 13:00:08.089076 138819277952832 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0423 13:00:08.089092 138819277952832 pyconfig.py:471] Config param jax_profiler_port: 9999
I0423 13:00:08.089108 138819277952832 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0423 13:00:08.089124 138819277952832 pyconfig.py:471] Config param kv_cache_buffer: 256
I0423 13:00:08.089140 138819277952832 pyconfig.py:471] Config param kv_lora_rank: 512
I0423 13:00:08.089155 138819277952832 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0423 13:00:08.089174 138819277952832 pyconfig.py:471] Config param kv_quant_dtype: int8
I0423 13:00:08.089188 138819277952832 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0423 13:00:08.089205 138819277952832 pyconfig.py:471] Config param learning_rate: 0.0002
I0423 13:00:08.089221 138819277952832 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0423 13:00:08.089236 138819277952832 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0423 13:00:08.089252 138819277952832 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0423 13:00:08.089267 138819277952832 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0423 13:00:08.089282 138819277952832 pyconfig.py:471] Config param load_from_prefill_dir: False
I0423 13:00:08.089296 138819277952832 pyconfig.py:471] Config param load_full_state_path:
I0423 13:00:08.089312 138819277952832 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 13:00:08.089327 138819277952832 pyconfig.py:471] Config param local_checkpoint_directory:
I0423 13:00:08.089343 138819277952832 pyconfig.py:471] Config param local_checkpoint_period: 0
I0423 13:00:08.089358 138819277952832 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0423 13:00:08.089374 138819277952832 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0423 13:00:08.089389 138819277952832 pyconfig.py:471] Config param log_config: True
I0423 13:00:08.089404 138819277952832 pyconfig.py:471] Config param log_period: 10
I0423 13:00:08.089421 138819277952832 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 
'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0423 13:00:08.089494 138819277952832 pyconfig.py:471] Config param logits_dot_in_fp32: False I0423 13:00:08.089511 138819277952832 pyconfig.py:471] Config param logits_via_embedding: True I0423 13:00:08.089527 138819277952832 pyconfig.py:471] Config param lora_input_adapters_path: I0423 13:00:08.089542 138819277952832 pyconfig.py:471] Config param loss_algo: grpo I0423 13:00:08.089558 138819277952832 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0423 13:00:08.089574 138819277952832 pyconfig.py:471] Config param managed_mldiagnostics: False I0423 13:00:08.089590 138819277952832 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/managed-mldiagnostics I0423 13:00:08.089605 138819277952832 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0423 13:00:08.089621 138819277952832 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0423 
13:00:08.089639 138819277952832 pyconfig.py:471] Config param max_checkify: False I0423 13:00:08.089653 138819277952832 pyconfig.py:471] Config param max_concurrency: 256 I0423 13:00:08.089669 138819277952832 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0423 13:00:08.089684 138819277952832 pyconfig.py:471] Config param max_num_batched_tokens: None I0423 13:00:08.089705 138819277952832 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0423 13:00:08.089720 138819277952832 pyconfig.py:471] Config param max_num_images_per_example: -1 I0423 13:00:08.089736 138819277952832 pyconfig.py:471] Config param max_num_seqs: None I0423 13:00:08.089750 138819277952832 pyconfig.py:471] Config param max_position_embeddings: 163840 I0423 13:00:08.089766 138819277952832 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0423 13:00:08.089781 138819277952832 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0423 13:00:08.089797 138819277952832 pyconfig.py:471] Config param max_segments_per_seq: -1 I0423 13:00:08.089811 138819277952832 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0423 13:00:08.089827 138819277952832 pyconfig.py:471] Config param max_target_length: 2048 I0423 13:00:08.089842 138819277952832 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0423 13:00:08.089858 138819277952832 pyconfig.py:471] Config param megablox: True I0423 13:00:08.089873 138819277952832 pyconfig.py:471] Config param merge_gating_gmm: False I0423 13:00:08.089888 138819277952832 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0423 13:00:08.089905 138819277952832 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/metrics/ I0423 13:00:08.089920 138819277952832 pyconfig.py:471] Config param metrics_file: I0423 
13:00:08.089946 138819277952832 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0423 13:00:08.089963 138819277952832 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0423 13:00:08.089978 138819277952832 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0423 13:00:08.089994 138819277952832 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0423 13:00:08.090008 138819277952832 pyconfig.py:471] Config param mla_naive_kvcache: True I0423 13:00:08.090025 138819277952832 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0423 13:00:08.090040 138819277952832 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0423 13:00:08.090056 138819277952832 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0423 13:00:08.090071 138819277952832 pyconfig.py:471] Config param mlp_bias: False I0423 13:00:08.090086 138819277952832 pyconfig.py:471] Config param mlp_dim: 64 I0423 13:00:08.090102 138819277952832 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0423 13:00:08.090117 138819277952832 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0423 13:00:08.090135 138819277952832 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0423 13:00:08.090149 138819277952832 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0423 13:00:08.090165 138819277952832 pyconfig.py:471] Config param moba: False I0423 13:00:08.090180 138819277952832 pyconfig.py:471] Config param moba_chunk_size: 1024 I0423 13:00:08.090195 138819277952832 pyconfig.py:471] Config param moba_topk: 8 I0423 13:00:08.090210 138819277952832 pyconfig.py:471] Config param model_call_mode: I0423 13:00:08.090226 138819277952832 pyconfig.py:471] Config param model_name: gpt3-52k I0423 13:00:08.090243 138819277952832 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0423 13:00:08.090258 138819277952832 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0423 13:00:08.090275 138819277952832 pyconfig.py:471] Config 
param moe_mlp_dim: -1 I0423 13:00:08.090290 138819277952832 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0423 13:00:08.090306 138819277952832 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0423 13:00:08.090322 138819277952832 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0423 13:00:08.090338 138819277952832 pyconfig.py:471] Config param monitor_goodput: False I0423 13:00:08.090352 138819277952832 pyconfig.py:471] Config param monitor_step_time_deviation: True I0423 13:00:08.090368 138819277952832 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0423 13:00:08.090383 138819277952832 pyconfig.py:471] Config param mscale: 1.0 I0423 13:00:08.090399 138819277952832 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0423 13:00:08.090414 138819277952832 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0423 13:00:08.090430 138819277952832 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0423 13:00:08.090446 138819277952832 pyconfig.py:471] Config param mtp_num_layers: 0 I0423 13:00:08.090462 138819277952832 pyconfig.py:471] Config param mu_dtype: float32 I0423 13:00:08.090488 138819277952832 pyconfig.py:471] Config param multi_sampling: False I0423 13:00:08.090504 138819277952832 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0423 13:00:08.090520 138819277952832 pyconfig.py:471] Config param muon_beta: 0.95 I0423 13:00:08.090537 138819277952832 pyconfig.py:471] Config param muon_consistent_rms: None I0423 13:00:08.090552 138819277952832 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0423 13:00:08.090568 138819277952832 pyconfig.py:471] Config param n_routing_groups: -1 I0423 13:00:08.090584 138819277952832 pyconfig.py:471] Config param n_window_for_audio: 50 I0423 13:00:08.090600 138819277952832 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0423 13:00:08.090615 138819277952832 pyconfig.py:471] Config param nope_layer_interval: -1 
I0423 13:00:08.090631 138819277952832 pyconfig.py:471] Config param norm_topk_prob: False I0423 13:00:08.090646 138819277952832 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0423 13:00:08.090664 138819277952832 pyconfig.py:471] Config param normalize_embedding_logits: False I0423 13:00:08.090680 138819277952832 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0423 13:00:08.090700 138819277952832 pyconfig.py:471] Config param num_batches: 4 I0423 13:00:08.090717 138819277952832 pyconfig.py:471] Config param num_channels_for_vit: 3 I0423 13:00:08.090732 138819277952832 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0423 13:00:08.090748 138819277952832 pyconfig.py:471] Config param num_decoder_layers: 1 I0423 13:00:08.090763 138819277952832 pyconfig.py:471] Config param num_diloco_replicas: 1 I0423 13:00:08.090780 138819277952832 pyconfig.py:471] Config param num_epoch: 1 I0423 13:00:08.090794 138819277952832 pyconfig.py:471] Config param num_eval_passes: 1 I0423 13:00:08.090810 138819277952832 pyconfig.py:471] Config param num_experts: 1 I0423 13:00:08.090825 138819277952832 pyconfig.py:471] Config param num_experts_per_tok: 1 I0423 13:00:08.090841 138819277952832 pyconfig.py:471] Config param num_generations: 2 I0423 13:00:08.090855 138819277952832 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0423 13:00:08.090871 138819277952832 pyconfig.py:471] Config param num_iterations: 1 I0423 13:00:08.090886 138819277952832 pyconfig.py:471] Config param num_kv_heads: 2 I0423 13:00:08.090902 138819277952832 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0423 13:00:08.090918 138819277952832 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0423 13:00:08.090941 138819277952832 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0423 13:00:08.090958 138819277952832 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0423 13:00:08.090973 138819277952832 pyconfig.py:471] Config 
param num_position_embeddings_for_vit: 1024 I0423 13:00:08.090989 138819277952832 pyconfig.py:471] Config param num_query_heads: 2 I0423 13:00:08.091004 138819277952832 pyconfig.py:471] Config param num_samplers_slices: -1 I0423 13:00:08.091020 138819277952832 pyconfig.py:471] Config param num_slices: 1 I0423 13:00:08.091034 138819277952832 pyconfig.py:471] Config param num_target_devices: 32 I0423 13:00:08.091049 138819277952832 pyconfig.py:471] Config param num_test_batches: 5 I0423 13:00:08.091065 138819277952832 pyconfig.py:471] Config param num_trainer_slices: -1 I0423 13:00:08.091080 138819277952832 pyconfig.py:471] Config param num_vocab_tiling: 1 I0423 13:00:08.091096 138819277952832 pyconfig.py:471] Config param off_policy_steps: 0 I0423 13:00:08.091112 138819277952832 pyconfig.py:471] Config param offline_data_dir: None I0423 13:00:08.091127 138819277952832 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0423 13:00:08.091145 138819277952832 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0423 13:00:08.091160 138819277952832 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0423 13:00:08.091176 138819277952832 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0423 13:00:08.091192 138819277952832 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0423 13:00:08.091208 138819277952832 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0423 13:00:08.091223 138819277952832 pyconfig.py:471] Config param output_dim_for_audio: 512 I0423 13:00:08.091239 138819277952832 pyconfig.py:471] Config param override_logical_axis_rules: False I0423 13:00:08.091253 138819277952832 pyconfig.py:471] Config param override_model_config: True I0423 13:00:08.091269 138819277952832 pyconfig.py:471] Config param packing: True I0423 13:00:08.091284 138819277952832 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0423 13:00:08.091300 138819277952832 pyconfig.py:471] Config param 
pagedattn_max_pages_per_group: -1 I0423 13:00:08.091315 138819277952832 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0423 13:00:08.091331 138819277952832 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0423 13:00:08.091346 138819277952832 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0423 13:00:08.091361 138819277952832 pyconfig.py:471] Config param param_scan_axis: 1 I0423 13:00:08.091377 138819277952832 pyconfig.py:471] Config param parameter_memory_host_offload: False I0423 13:00:08.091392 138819277952832 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0423 13:00:08.091407 138819277952832 pyconfig.py:471] Config param patch_size_for_vit: 14 I0423 13:00:08.091423 138819277952832 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0423 13:00:08.091438 138819277952832 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0423 13:00:08.091455 138819277952832 pyconfig.py:471] Config param per_device_batch_size: 2 I0423 13:00:08.091470 138819277952832 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0423 13:00:08.091486 138819277952832 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0423 13:00:08.091500 138819277952832 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0423 13:00:08.091517 138819277952832 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0423 13:00:08.091531 138819277952832 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0423 13:00:08.091547 138819277952832 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0423 13:00:08.091562 138819277952832 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0423 13:00:08.091578 138819277952832 pyconfig.py:471] Config param posemb_type_for_vit: learn I0423 13:00:08.091593 138819277952832 pyconfig.py:471] Config param position_id_per_seconds: 25 I0423 13:00:08.091610 138819277952832 pyconfig.py:471] Config param prefill_cache_axis_order: 
1,2,0,3 I0423 13:00:08.091624 138819277952832 pyconfig.py:471] Config param prefill_cache_dir: I0423 13:00:08.091640 138819277952832 pyconfig.py:471] Config param prefill_chunk_size: 256 I0423 13:00:08.091655 138819277952832 pyconfig.py:471] Config param prefill_slice: v5e-16 I0423 13:00:08.091671 138819277952832 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0423 13:00:08.091685 138819277952832 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0423 13:00:08.091704 138819277952832 pyconfig.py:471] Config param prefuse_moe_weights: False I0423 13:00:08.091721 138819277952832 pyconfig.py:471] Config param profile_cleanly: True I0423 13:00:08.091735 138819277952832 pyconfig.py:471] Config param profile_periodically_period: -1 I0423 13:00:08.091751 138819277952832 pyconfig.py:471] Config param profile_power_events: False I0423 13:00:08.091766 138819277952832 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0423 13:00:08.091784 138819277952832 pyconfig.py:471] Config param profiler_steps: 5 I0423 13:00:08.091800 138819277952832 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0423 13:00:08.091814 138819277952832 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0423 13:00:08.091830 138819277952832 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0423 13:00:08.091845 138819277952832 pyconfig.py:471] Config param prometheus_port: 0 I0423 13:00:08.091861 138819277952832 pyconfig.py:471] Config param prompt: I love to I0423 13:00:08.091876 138819277952832 pyconfig.py:471] Config param pure_nnx: False I0423 13:00:08.091892 138819277952832 pyconfig.py:471] Config param pure_nnx_decoder: False I0423 13:00:08.091906 138819277952832 pyconfig.py:471] Config param q_lora_rank: 0 I0423 13:00:08.091922 138819277952832 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0423 13:00:08.091951 138819277952832 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0423 13:00:08.091966 
138819277952832 pyconfig.py:471] Config param qk_norm_with_scale: True I0423 13:00:08.091982 138819277952832 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0423 13:00:08.091997 138819277952832 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0423 13:00:08.092014 138819277952832 pyconfig.py:471] Config param quant_cfg_path: I0423 13:00:08.092029 138819277952832 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0423 13:00:08.092047 138819277952832 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0423 13:00:08.092061 138819277952832 pyconfig.py:471] Config param quantize_kvcache: False I0423 13:00:08.092077 138819277952832 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0423 13:00:08.092093 138819277952832 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0423 13:00:08.092109 138819277952832 pyconfig.py:471] Config param ragged_block_size: 256 I0423 13:00:08.092124 138819277952832 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0423 13:00:08.092140 138819277952832 pyconfig.py:471] Config param rampup_end_step: 0 I0423 13:00:08.092156 138819277952832 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0423 13:00:08.092172 138819277952832 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0423 13:00:08.092188 138819277952832 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0423 13:00:08.092203 138819277952832 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0423 13:00:08.092219 138819277952832 pyconfig.py:471] Config param remat_policy: full I0423 13:00:08.092234 138819277952832 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0423 13:00:08.092250 138819277952832 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0423 13:00:08.092264 138819277952832 pyconfig.py:471] Config param replicate_quant_scale: False I0423 13:00:08.092280 138819277952832 pyconfig.py:471] Config param 
replicator_backup_interval_minutes: 0 I0423 13:00:08.092295 138819277952832 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0423 13:00:08.092311 138819277952832 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0423 13:00:08.092327 138819277952832 pyconfig.py:471] Config param reshape_q: False I0423 13:00:08.092342 138819277952832 pyconfig.py:471] Config param return_log_prob: False I0423 13:00:08.092358 138819277952832 pyconfig.py:471] Config param reuse_example_batch: 0 I0423 13:00:08.092374 138819277952832 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0423 13:00:08.092389 138819277952832 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0423 13:00:08.092405 138819277952832 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0423 13:00:08.092420 138819277952832 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0423 13:00:08.092437 138819277952832 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0423 13:00:08.092452 138819277952832 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0423 13:00:08.092468 138819277952832 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0423 13:00:08.092489 138819277952832 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0423 13:00:08.092507 138819277952832 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0423 13:00:08.092522 138819277952832 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0423 13:00:08.092538 138819277952832 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0423 13:00:08.092554 138819277952832 pyconfig.py:471] Config param rope_attention_scaling: False I0423 13:00:08.092570 
138819277952832 pyconfig.py:471] Config param rope_factor: 40 I0423 13:00:08.092584 138819277952832 pyconfig.py:471] Config param rope_interleave: True I0423 13:00:08.092600 138819277952832 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0423 13:00:08.092615 138819277952832 pyconfig.py:471] Config param rope_max_timescale: 10000 I0423 13:00:08.092631 138819277952832 pyconfig.py:471] Config param rope_min_timescale: 1 I0423 13:00:08.092646 138819277952832 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0423 13:00:08.092662 138819277952832 pyconfig.py:471] Config param rope_truncate: True I0423 13:00:08.092677 138819277952832 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0423 13:00:08.092699 138819277952832 pyconfig.py:471] Config param rope_use_scale: True I0423 13:00:08.092715 138819277952832 pyconfig.py:471] Config param routed_bias: False I0423 13:00:08.092732 138819277952832 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0423 13:00:08.092749 138819277952832 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0423 13:00:08.092764 138819277952832 pyconfig.py:471] Config param routed_score_func: I0423 13:00:08.092781 138819277952832 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-13-00 I0423 13:00:08.092796 138819277952832 pyconfig.py:471] Config param sa_block_kv: 512 I0423 13:00:08.092811 138819277952832 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0423 13:00:08.092828 138819277952832 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0423 13:00:08.092843 138819277952832 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0423 13:00:08.092859 138819277952832 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0423 13:00:08.092874 138819277952832 pyconfig.py:471] Config param sa_block_q: 512 I0423 13:00:08.092890 138819277952832 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0423 13:00:08.092904 138819277952832 pyconfig.py:471] Config param sa_block_q_dq: 512 I0423 
13:00:08.092920 138819277952832 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0423 13:00:08.092945 138819277952832 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0423 13:00:08.092961 138819277952832 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0423 13:00:08.092977 138819277952832 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0423 13:00:08.092993 138819277952832 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0423 13:00:08.093009 138819277952832 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0423 13:00:08.093024 138819277952832 pyconfig.py:471] Config param save_config_to_gcs: False I0423 13:00:08.093040 138819277952832 pyconfig.py:471] Config param save_quantized_params_path: I0423 13:00:08.093055 138819277952832 pyconfig.py:471] Config param scale_embedding_for_audio: True I0423 13:00:08.093071 138819277952832 pyconfig.py:471] Config param scan_layers: True I0423 13:00:08.093085 138819277952832 pyconfig.py:471] Config param scan_layers_per_stage: False I0423 13:00:08.093101 138819277952832 pyconfig.py:471] Config param scan_pipeline_iterations: True I0423 13:00:08.093117 138819277952832 pyconfig.py:471] Config param scan_pipeline_repeats: False I0423 13:00:08.093132 138819277952832 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0423 13:00:08.093147 138819277952832 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0423 13:00:08.093162 138819277952832 pyconfig.py:471] Config param sft_train_on_completion_only: False I0423 13:00:08.093178 138819277952832 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0423 13:00:08.093193 138819277952832 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0423 13:00:08.093210 138819277952832 pyconfig.py:471] Config param shard_optimizer_over_data: False I0423 13:00:08.093226 138819277952832 pyconfig.py:471] Config param sharding_strategy: None I0423 13:00:08.093240 
138819277952832 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0423 13:00:08.093257 138819277952832 pyconfig.py:471] Config param shardy: True I0423 13:00:08.093271 138819277952832 pyconfig.py:471] Config param share_kv_projections: False I0423 13:00:08.093295 138819277952832 pyconfig.py:471] Config param shared_experts: 0 I0423 13:00:08.093310 138819277952832 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0423 13:00:08.093325 138819277952832 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0423 13:00:08.093342 138819277952832 pyconfig.py:471] Config param skip_jax_distributed_system: False I0423 13:00:08.093356 138819277952832 pyconfig.py:471] Config param skip_step_interval: 128 I0423 13:00:08.093372 138819277952832 pyconfig.py:471] Config param skip_step_on_spikes: False I0423 13:00:08.093387 138819277952832 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0423 13:00:08.093403 138819277952832 pyconfig.py:471] Config param sliding_window_size: 0 I0423 13:00:08.093418 138819277952832 pyconfig.py:471] Config param solution_end_token: </answer> I0423 13:00:08.093434 138819277952832 pyconfig.py:471] Config param solution_start_token: <answer> I0423 13:00:08.093448 138819277952832 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0423 13:00:08.093464 138819277952832 pyconfig.py:471] Config param sparse_matmul: True I0423 13:00:08.093480 138819277952832 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0423 13:00:08.093495 138819277952832 pyconfig.py:471] Config param stack_prefill_result_cache: False I0423 13:00:08.093510 138819277952832 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0423 13:00:08.093525 138819277952832 pyconfig.py:471] Config param stack_trace_to_cloud: False I0423 13:00:08.093541 138819277952832 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0423 13:00:08.093556 138819277952832 pyconfig.py:471] Config param steps: 200000 I0423 13:00:08.093572 
138819277952832 pyconfig.py:471] Config param stop_strings: None I0423 13:00:08.093587 138819277952832 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0423 13:00:08.093604 138819277952832 pyconfig.py:471] Config param student_params_to_update: None I0423 13:00:08.093619 138819277952832 pyconfig.py:471] Config param subslice_shape: I0423 13:00:08.093635 138819277952832 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0423 13:00:08.093651 138819277952832 pyconfig.py:471] Config param system_prompt: I0423 13:00:08.093666 138819277952832 pyconfig.py:471] Config param target_eval_loss: 0.0 I0423 13:00:08.093681 138819277952832 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0423 13:00:08.093702 138819277952832 pyconfig.py:471] Config param temperature_tuning: False I0423 13:00:08.093717 138819277952832 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0423 13:00:08.093733 138819277952832 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-00/tensorboard/ I0423 13:00:08.093748 138819277952832 pyconfig.py:471] Config param tensors_on_device: None I0423 13:00:08.093763 138819277952832 pyconfig.py:471] Config param tensors_to_offload: None I0423 13:00:08.093778 138819277952832 pyconfig.py:471] Config param test_batch_start_index: 0 I0423 13:00:08.093794 138819277952832 pyconfig.py:471] Config param tile_size_for_vit: 336 I0423 13:00:08.093809 138819277952832 pyconfig.py:471] Config param tokenize_eval_data: True I0423 13:00:08.093825 138819277952832 pyconfig.py:471] Config param tokenize_train_data: True I0423 13:00:08.093839 138819277952832 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0423 13:00:08.093855 138819277952832 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0423 13:00:08.093873 138819277952832 pyconfig.py:471] Config param topk_routing_group: -1 I0423 13:00:08.093888 138819277952832 
pyconfig.py:471] Config param train_data_columns: ['text'] I0423 13:00:08.093904 138819277952832 pyconfig.py:471] Config param train_fraction: 1.0 I0423 13:00:08.093920 138819277952832 pyconfig.py:471] Config param train_image_column: image I0423 13:00:08.093943 138819277952832 pyconfig.py:471] Config param train_micro_batch_size: -1 I0423 13:00:08.093959 138819277952832 pyconfig.py:471] Config param train_split: train I0423 13:00:08.093975 138819277952832 pyconfig.py:471] Config param trainable_parameters_mask: [] I0423 13:00:08.093989 138819277952832 pyconfig.py:471] Config param trainable_position_size: 2048 I0423 13:00:08.094006 138819277952832 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0423 13:00:08.094022 138819277952832 pyconfig.py:471] Config param upload_all_profiler_results: False I0423 13:00:08.094038 138819277952832 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0423 13:00:08.094053 138819277952832 pyconfig.py:471] Config param use_agentic_rollout: False I0423 13:00:08.094069 138819277952832 pyconfig.py:471] Config param use_audio: False I0423 13:00:08.094083 138819277952832 pyconfig.py:471] Config param use_audio_in_video: False I0423 13:00:08.094099 138819277952832 pyconfig.py:471] Config param use_batch_split_schedule: False I0423 13:00:08.094114 138819277952832 pyconfig.py:471] Config param use_chat_template: False I0423 13:00:08.094130 138819277952832 pyconfig.py:471] Config param use_chunked_prefill: False I0423 13:00:08.094145 138819277952832 pyconfig.py:471] Config param use_custom_sort_vjp: True I0423 13:00:08.094161 138819277952832 pyconfig.py:471] Config param use_dpo: False I0423 13:00:08.094177 138819277952832 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0423 13:00:08.094193 138819277952832 pyconfig.py:471] Config param use_grpo: True I0423 13:00:08.094209 138819277952832 pyconfig.py:471] Config param use_indexer: False I0423 13:00:08.094224 138819277952832 pyconfig.py:471] Config param 
use_iota_embed: True I0423 13:00:08.094240 138819277952832 pyconfig.py:471] Config param use_jax_splash: False I0423 13:00:08.094256 138819277952832 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0423 13:00:08.094271 138819277952832 pyconfig.py:471] Config param use_mrope: False I0423 13:00:08.094287 138819277952832 pyconfig.py:471] Config param use_multimodal: False I0423 13:00:08.094303 138819277952832 pyconfig.py:471] Config param use_pathways: True I0423 13:00:08.094318 138819277952832 pyconfig.py:471] Config param use_post_attn_norm: False I0423 13:00:08.094334 138819277952832 pyconfig.py:471] Config param use_post_ffw_norm: False I0423 13:00:08.094349 138819277952832 pyconfig.py:471] Config param use_qk_clip: False I0423 13:00:08.094364 138819277952832 pyconfig.py:471] Config param use_qk_norm: False I0423 13:00:08.094380 138819277952832 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0423 13:00:08.094394 138819277952832 pyconfig.py:471] Config param use_qwix_quantization: False I0423 13:00:08.094410 138819277952832 pyconfig.py:471] Config param use_ragged_attention: False I0423 13:00:08.094425 138819277952832 pyconfig.py:471] Config param use_random_routing: False I0423 13:00:08.094441 138819277952832 pyconfig.py:471] Config param use_replicator_service: False I0423 13:00:08.094455 138819277952832 pyconfig.py:471] Config param use_ring_of_experts: False I0423 13:00:08.094471 138819277952832 pyconfig.py:471] Config param use_sft: False I0423 13:00:08.094486 138819277952832 pyconfig.py:471] Config param use_splash_scheduler: False I0423 13:00:08.094502 138819277952832 pyconfig.py:471] Config param use_tokamax_gmm: False I0423 13:00:08.094517 138819277952832 pyconfig.py:471] Config param use_tokamax_splash: False I0423 13:00:08.094533 138819277952832 pyconfig.py:471] Config param use_truncation: True I0423 13:00:08.094547 138819277952832 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0423 13:00:08.094564 138819277952832 
pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0423 13:00:08.094579 138819277952832 pyconfig.py:471] Config param use_vertex_tensorboard: False I0423 13:00:08.094594 138819277952832 pyconfig.py:471] Config param using_pipeline_parallelism: False I0423 13:00:08.094610 138819277952832 pyconfig.py:471] Config param v_head_dim: 128 I0423 13:00:08.094625 138819277952832 pyconfig.py:471] Config param v_norm_with_scale: True I0423 13:00:08.094640 138819277952832 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0423 13:00:08.094656 138819277952832 pyconfig.py:471] Config param vertex_tensorboard_project: I0423 13:00:08.094671 138819277952832 pyconfig.py:471] Config param vertex_tensorboard_region: I0423 13:00:08.094687 138819277952832 pyconfig.py:471] Config param video_path: I0423 13:00:08.094705 138819277952832 pyconfig.py:471] Config param video_placeholder: <|video|> I0423 13:00:08.094721 138819277952832 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0423 13:00:08.094735 138819277952832 pyconfig.py:471] Config param vision_output_length: -1 I0423 13:00:08.094752 138819277952832 pyconfig.py:471] Config param vllm_additional_config: {} I0423 13:00:08.094768 138819277952832 pyconfig.py:471] Config param vllm_hf_config_path: I0423 13:00:08.094783 138819277952832 pyconfig.py:471] Config param vllm_hf_overrides: {} I0423 13:00:08.094799 138819277952832 pyconfig.py:471] Config param vocab_size: 32000 I0423 13:00:08.094814 138819277952832 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0423 13:00:08.094830 138819277952832 pyconfig.py:471] Config param weight_dtype: float32 I0423 13:00:08.094856 138819277952832 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0423 13:00:08.094871 138819277952832 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0423 13:00:08.094887 138819277952832 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0423 13:00:08.094904 
138819277952832 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0423 13:00:08.094920 138819277952832 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0423 13:00:08.094946 138819277952832 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0423 13:00:08.094962 138819277952832 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0423 13:00:08.094977 138819277952832 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0423 13:00:08.094993 138819277952832 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0423 13:00:08.095009 138819277952832 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0423 13:00:08.095024 138819277952832 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0423 13:00:08.095039 138819277952832 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0423 13:00:08.095054 138819277952832 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0423 13:00:08.095070 138819277952832 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0423 13:00:08.095086 138819277952832 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0423 13:00:08.095101 138819277952832 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0423 13:00:08.095117 138819277952832 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0423 13:00:08.095132 138819277952832 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0423 13:00:08.095147 138819277952832 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0423 13:00:08.095163 138819277952832 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0423 13:00:08.095179 138819277952832 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0423 13:00:08.095197 138819277952832 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0423 13:00:08.095212 138819277952832 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0423 13:00:08.095228 138819277952832 pyconfig.py:471] Config 
param xprof_e2e_enable_fw_throttle_event: False I0423 13:00:08.095243 138819277952832 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0423 13:00:08.095261 138819277952832 pyconfig.py:471] Config param z_loss_multiplier: 0.0 I0423 13:00:08.095576 138819277952832 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0423 13:00:08.095611 138819277952832 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0423 13:00:11.766002 138819277952832 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0423 13:00:11.769094 138819277952832 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0423 13:00:11.769220 138819277952832 train_distill.py:608] Applying logical axis rules for model initialization and training... I0423 13:00:11.769293 138819277952832 train_distill.py:612] Loading Student from ... I0423 13:00:11.769323 138819277952832 train_distill.py:169] --- Student Configuration --- I0423 13:00:11.769345 138819277952832 train_distill.py:170] Model Name: gpt3-52k I0423 13:00:11.769368 138819277952832 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 13:00:11.769387 138819277952832 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0423 13:00:11.769405 138819277952832 train_distill.py:175] Vocab Size: 32000 I0423 13:00:11.769428 138819277952832 train_distill.py:176] Checkpoint: I0423 13:00:11.769447 138819277952832 train_distill.py:477] Initializing model: gpt3-52k... I0423 13:00:13.140629 138819277952832 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0423 13:00:13.140739 138819277952832 train_distill.py:169] --- Teacher Configuration --- I0423 13:00:13.140767 138819277952832 train_distill.py:170] Model Name: gpt3-52k I0423 13:00:13.140791 138819277952832 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 13:00:13.140814 138819277952832 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0423 13:00:13.140833 138819277952832 train_distill.py:175] Vocab Size: 32000 I0423 13:00:13.140853 138819277952832 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0423 13:00:13.140872 138819277952832 train_distill.py:477] Initializing model: gpt3-52k... I0423 13:00:14.177534 138819277952832 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:00:14.177978 138819277952832 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e40aa25dca0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:00:14.178039 138819277952832 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0423 13:00:14.703804 138819277952832 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0423 13:00:15.262089 2136 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0423 13:00:16.408087 138819277952832 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0423 13:00:18.600100 138819277952832 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0423 13:00:18.600483 138819277952832 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0423 13:00:19.263489 138819277952832 checkpointer.py:318] Finished restoring checkpoint in 3.23 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0423 13:00:19.948423 138819277952832 train_distill.py:652] Initializing Data Iterators via MaxText pipeline... I0423 13:00:20.012689 138819277952832 config.py:112] TensorFlow version 2.20.0 available. I0423 13:00:20.013212 138819277952832 config.py:125] JAX version 0.8.3 available. E0423 13:00:22.221735 138819277952832 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0423 13:00:22.221972 138819277952832 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0423 13:00:22.224998 138819277952832 train_distill.py:422] Input Pipeline Checkpointing: DISABLED I0423 13:00:22.225057 138819277952832 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0423 13:00:22.225120 138819277952832 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:00:22.225197 138819277952832 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e40aa25dca0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:00:22.225238 138819277952832 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:00:22.225269 138819277952832 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e40aa25dca0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:00:22.225311 138819277952832 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e368e19f380>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb5d8440>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb5d80e0>}, handler_registry=None I0423 
13:00:22.225509 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e368e19f380>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:00:22.225551 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb5d8440>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:00:22.225578 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb5d80e0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:00:22.225603 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e2a0d64eb70>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0423 13:00:22.225631 138819277952832 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e368e19f380>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e368e19f380>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb5d8440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb5d8440>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb5d80e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb5d80e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e2a0d64eb70>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e2a0d64eb70>}). 
I0423 13:00:22.226054 138819277952832 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e28eb36b7e0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 13:00:24.514165 138819277952832 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints I0423 13:00:24.533222 138819277952832 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e28eb5d8110> I0423 13:00:24.533354 138819277952832 pytree_checkpoint_handler.py:577] 
save_device_host_concurrent_bytes=None I0423 13:00:24.533421 138819277952832 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e40aa25dca0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:00:24.533458 138819277952832 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:00:24.533498 138819277952832 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7e40aa25dca0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:00:24.533536 138819277952832 checkpoint_manager.py:1983] [process=3][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0423 13:00:24.533588 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e26a4339ac0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb5d82c0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb5d82f0>, _write_futures=[]) I0423 13:00:24.533968 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e26a4339ac0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb5d82c0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb5d82f0>, _write_futures=[]) I0423 13:00:24.533997 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e26a4339ac0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb5d82c0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb5d82f0>, _write_futures=[]) I0423 13:00:24.534029 138819277952832 checkpoint_manager.py:702] [process=3][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e2a0ce178f0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb23f8f0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23fb00>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e28eb23f2c0>}, handler_registry=None I0423 13:00:24.534134 138819277952832 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e2a0ce178f0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:00:24.534168 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb23f8f0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:00:24.534193 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23fb00>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:00:24.534221 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e28eb23f2c0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0423 13:00:24.534244 138819277952832 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23e8a0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:00:24.534269 138819277952832 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e2a0ce178f0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e2a0ce178f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb23f8f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7e28eb23f8f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23fb00>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23fb00>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e28eb23f2c0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7e28eb23f2c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23e8a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7e28eb23e8a0>}). I0423 13:00:24.534339 138819277952832 async_checkpointer.py:177] [process=3][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7e28eb36b920> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 13:00:24.911252 138819277952832 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints I0423 13:00:25.285905 138819277952832 checkpoint_manager.py:921] [process=3][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7e2a0c94f830> I0423 13:00:25.286526 138819277952832 train_distill.py:703] Starting Distillation Training... I0423 13:00:25.286635 138819277952832 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0423 13:00:26.016175 138819277952832 peft_trainer.py:594] Compiled train_step cache size: 0 I0423 13:00:26.017793 138675216176896 grain_pool.py:367] Grain pool will use 1 processes. I0423 13:00:26.044195 138675216176896 grain_pool.py:440] Grain pool will start child processes. I0423 13:00:26.049457 138675216176896 grain_pool.py:448] Grain pool started all child processes. 
2026-04-23 13:00:32.086674: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} /deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value() current_step = model.training_step.value I0423 13:00:38.397490 138819277952832 checkpoint_manager.py:1983] [process=3][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0423 13:00:38.399842 138819277952832 checkpoint_manager.py:1501] [process=3] Saving checkpoint at step 1 I0423 13:00:38.402865 138819277952832 async_checkpointer.py:452] [process=3] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1. I0423 13:00:38.955012 138819277952832 signaling_client.py:364] Using JaxDistributedSignalingClient I0423 13:00:38.956034 138819277952832 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array. 
I0423 13:00:38.956095 138819277952832 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:00:39.632847 138819277952832 base_pytree_checkpoint_handler.py:153] [process=3][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.677937s I0423 13:00:39.634477 138819277952832 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/blocking_gbytes_per_sec: 575.749 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 929 milliseconds) (per-host) I0423 13:00:39.634540 138819277952832 base_pytree_checkpoint_handler.py:732] [process=3][thread=MainThread] Initiated Pytree async_save. Time taken: 0.929436s (batch_requests_ready=0.246083s, total_serialization_initiated=0.683244s, others=0.000109s) I0423 13:00:39.635596 138819277952832 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array. I0423 13:00:39.635650 138819277952832 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:00:39.640424 138819277952832 base_pytree_checkpoint_handler.py:153] [process=3][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.005769s I0423 13:00:39.640530 138819277952832 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/blocking_gbytes_per_sec: 285.262 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 937 milliseconds) (per-host) I0423 13:00:39.640573 138819277952832 base_pytree_checkpoint_handler.py:732] [process=3][thread=MainThread] Initiated Pytree async_save. 
Time taken: 0.937907s (batch_requests_ready=0.930373s, total_serialization_initiated=0.007465s, others=0.000069s) I0423 13:00:39.640681 138819277952832 composite_checkpoint_handler.py:715] [process=3][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.941912s (all_items=0.000023s, per_item={'model_params': '0.00001884', 'optimizer_state': '0.00000429'}, temp_paths=0.941889) I0423 13:00:39.641700 138669293037312 async_checkpointer.py:79] [process=3][thread=async_save] Background save thread started. I0423 13:00:39.641874 138819277952832 async_checkpointer.py:561] Finished blocking save. Time taken: 1.241955s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1. I0423 13:00:39.675400 138819277952832 checkpoint_manager.py:1549] [process=3][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize I0423 13:00:39.675706 138669309822720 async_checkpointer.py:265] [process=3][thread=save_finalize] Waiting for background save thread=async_save. 
I0423 13:00:39.675860 138819277952832 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776949238.3974662, 'wait_for_prev_duration_secs': 0.00011444091796875, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776949238.399887, 'checkpointer_blocking_duration_secs': 1.2421057224273682, 'get_old_steps_start_time': 1776949239.642015, 'get_old_steps_duration_secs': 8.130073547363281e-05, 'checkpoint_manager_blocking_start_time': 1776949238.3882062, 'checkpoint_manager_blocking_duration_secs': 1.287621021270752} /deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] 
For other Variable types use: variable.get_value() current_step = model.training_step.value I0423 13:00:42.963537 138819277952832 peft_trainer.py:474] Train step 1 training loss: 15.963623 - training perplexity: 8568669.000000 I0423 13:00:42.983771 138819277952832 peft_trainer.py:474] Train step 2 training loss: 15.943937 - training perplexity: 8401639.000000 I0423 13:00:43.008636 138819277952832 peft_trainer.py:474] Train step 3 training loss: 15.973638 - training perplexity: 8654912.000000 I0423 13:00:43.029178 138819277952832 peft_trainer.py:474] Train step 4 training loss: 15.952717 - training perplexity: 8475726.000000 I0423 13:00:43.034556 138819277952832 peft_trainer.py:733] Train loop finished in: 17.0179 seconds I0423 13:00:43.035021 138819277952832 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/... I0423 13:00:44.631125 138669301430016 array_metadata_store.py:203] [process=3][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_3 I0423 13:00:44.632389 138669293037312 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/gbytes_per_sec: 45.118 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:00:44.769472 138676273133312 array_metadata_store.py:203] [process=3][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_3 I0423 13:00:44.770713 138669293037312 base_pytree_checkpoint_handler.py:128] [process=3] 
/jax/checkpoint/write/gbytes_per_sec: 88.214 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 6 seconds) (per-host) I0423 13:00:44.770828 138669293037312 async_checkpointer.py:90] [process=3][thread=async_save] 4 Handler Commit operations completed. Time taken: 5.129011s. I0423 13:00:54.986907 138819277952832 checkpoint_manager.py:1994] [process=3][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. I0423 13:00:56.977653 138669293037312 async_checkpointer.py:144] [process=3][thread=async_save] Background save thread done. Time taken: 17.335821s. I0423 13:00:56.977986 138669309822720 async_checkpointer.py:273] [process=3][thread=save_finalize] Done with waiting for background save thread=async_save. I0423 13:00:56.978109 138669309822720 async_checkpointer.py:283] [process=3][thread=save_finalize] No errors found in background save thread=async_save. I0423 13:00:56.978158 138669309822720 checkpoint_manager.py:2103] [process=3][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts... I0423 13:00:56.979884 138669309822720 checkpoint_manager.py:2112] [process=3][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts. I0423 13:00:56.980075 138819277952832 checkpoint_manager.py:2006] [process=3][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1. W0423 13:00:56.980201 138819277952832 checkpoint_manager.py:1441] Waiting for previous save to complete took 1.993312 seconds. If this number is high, consider checkpointing less frequently. 
I0423 13:00:56.981814 138819277952832 checkpoint_manager.py:1501] [process=3] Saving checkpoint at step 5 I0423 13:00:56.985178 138819277952832 async_checkpointer.py:452] [process=3] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5. I0423 13:00:57.933134 138819277952832 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array. I0423 13:00:57.933236 138819277952832 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:00:58.595269 138819277952832 base_pytree_checkpoint_handler.py:153] [process=3][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.663116s I0423 13:00:58.596801 138819277952832 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/blocking_gbytes_per_sec: 586.548 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 912 milliseconds) (per-host) I0423 13:00:58.596863 138819277952832 base_pytree_checkpoint_handler.py:732] [process=3][thread=MainThread] Initiated Pytree async_save. Time taken: 0.912321s (batch_requests_ready=0.245911s, total_serialization_initiated=0.666308s, others=0.000103s) I0423 13:00:58.598022 138819277952832 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array. 
I0423 13:00:58.598079 138819277952832 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:00:58.602853 138819277952832 base_pytree_checkpoint_handler.py:153] [process=3][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.005897s I0423 13:00:58.602973 138819277952832 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/blocking_gbytes_per_sec: 290.834 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 919 milliseconds) (per-host) I0423 13:00:58.603017 138819277952832 base_pytree_checkpoint_handler.py:732] [process=3][thread=MainThread] Initiated Pytree async_save. Time taken: 0.919941s (batch_requests_ready=0.912367s, total_serialization_initiated=0.007504s, others=0.000070s) I0423 13:00:58.603097 138819277952832 composite_checkpoint_handler.py:715] [process=3][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.923838s (all_items=0.000013s, per_item={'model_params': '0.00001049', 'optimizer_state': '0.00000262'}, temp_paths=0.923825) I0423 13:00:58.604096 138669309822720 async_checkpointer.py:79] [process=3][thread=async_save] Background save thread started. I0423 13:00:58.604252 138819277952832 async_checkpointer.py:561] Finished blocking save. Time taken: 1.622365s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5. I0423 13:00:58.638483 138819277952832 checkpoint_manager.py:1549] [process=3][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize I0423 13:00:58.638803 138669293037312 async_checkpointer.py:265] [process=3][thread=save_finalize] Waiting for background save thread=async_save. 
I0423 13:00:58.638985 138819277952832 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776949254.9868677, 'wait_for_prev_duration_secs': 1.993312120437622, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776949256.981853, 'checkpointer_blocking_duration_secs': 1.6225054264068604, 'get_old_steps_start_time': 1776949258.604381, 'get_old_steps_duration_secs': 8.082389831542969e-05, 'checkpoint_manager_blocking_start_time': 1776949243.0370445, 'checkpoint_manager_blocking_duration_secs': 15.60190725326538} I0423 13:00:58.639160 138819277952832 checkpoint_manager.py:1994] [process=3][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. 
I0423 13:01:02.758916 138669301430016 array_metadata_store.py:203] [process=3][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_3 I0423 13:01:02.760193 138669309822720 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/gbytes_per_sec: 52.695 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:01:02.815129 138676273133312 array_metadata_store.py:203] [process=3][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_3 I0423 13:01:02.816344 138669309822720 base_pytree_checkpoint_handler.py:128] [process=3] /jax/checkpoint/write/gbytes_per_sec: 104.266 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:01:02.816440 138669309822720 async_checkpointer.py:90] [process=3][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.212234s. I0423 13:01:13.850396 138669309822720 async_checkpointer.py:144] [process=3][thread=async_save] Background save thread done. Time taken: 15.246176s. I0423 13:01:13.850717 138669293037312 async_checkpointer.py:273] [process=3][thread=save_finalize] Done with waiting for background save thread=async_save. I0423 13:01:13.850852 138669293037312 async_checkpointer.py:283] [process=3][thread=save_finalize] No errors found in background save thread=async_save. I0423 13:01:13.850909 138669293037312 checkpoint_manager.py:2103] [process=3][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts... 
I0423 13:01:13.852545 138669293037312 checkpoint_manager.py:2112] [process=3][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts. I0423 13:01:13.852789 138819277952832 checkpoint_manager.py:2006] [process=3][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5. I0423 13:01:13.852953 138819277952832 train_distill.py:724] Final checkpoint saved. I0423 13:01:13.855172 138819277952832 peft_trainer.py:474] Train step 5 training loss: 15.949528 - training perplexity: 8448739.000000 I0423 13:01:13.855563 138819277952832 checkpoint_manager.py:1983] [process=3][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0423 13:01:13.855637 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e2a0c525d80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb23eb70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb23e3c0>, _write_futures=[]) I0423 13:01:13.855685 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e2a0c525d80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb23eb70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb23e3c0>, _write_futures=[]) I0423 13:01:13.855714 138819277952832 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=138819277952832 count=1 at 0x7e2a0c525d80>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7e28eb23eb70>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7e28eb23e3c0>, _write_futures=[]) 
I0423 13:01:13.855758 138819277952832 train_distill.py:734] Distillation Complete.
I0423 13:01:14.039521 138675216176896 grain_pool.py:547] Shutting down multiprocessing system.
I0423 13:01:15.507535 138675216176896 grain_pool.py:542] Grain pool is exiting.
I0423 13:01:15.507646 138675216176896 grain_pool.py:547] Shutting down multiprocessing system.
I0423 13:01:15.507708 138675216176896 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Thu Apr 23 13:01:24 UTC 2026
EXIT_CODE=0
XPK Start: Thu Apr 23 13:13:41 UTC 2026 2026-04-23 13:13:58.733175: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} I0423 13:14:02.772321 137380724926272 max_utils.py:273] Attempting to initialize the jax distributed system... INFO:2026-04-23 13:14:11,812:jax._src.distributed:149: Starting JAX distributed service on [::]:8482 I0423 13:14:11.812060 137380724926272 distributed.py:149] Starting JAX distributed service on [::]:8482 INFO:2026-04-23 13:14:11,814:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-ret7f-slice-job-0-0.mt-07-distill-smoke-ret7f:8482 I0423 13:14:11.814386 137380724926272 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-ret7f-slice-job-0-0.mt-07-distill-smoke-ret7f:8482 I0423 13:14:12.745243 137380724926272 max_utils.py:284] Jax distributed system initialized! I0423 13:14:19.271490 137380724926272 max_utils.py:244] Jax distributed system is already initialized. W0423 13:14:19.401405 137380724926272 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output I0423 13:14:19.461344 137380724926272 max_utils.py:244] Jax distributed system is already initialized. 
I0423 13:14:19.462539 137380724926272 pyconfig.py:471] Config param abort_on_inf_loss: True I0423 13:14:19.462592 137380724926272 pyconfig.py:471] Config param abort_on_nan_loss: True I0423 13:14:19.462628 137380724926272 pyconfig.py:471] Config param act_quantization_calibration_method: absmax I0423 13:14:19.462659 137380724926272 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0 I0423 13:14:19.462690 137380724926272 pyconfig.py:471] Config param activation_function_for_audio: gelu I0423 13:14:19.462719 137380724926272 pyconfig.py:471] Config param activations_in_float32: False I0423 13:14:19.462747 137380724926272 pyconfig.py:471] Config param adam_b1: 0.9 I0423 13:14:19.462778 137380724926272 pyconfig.py:471] Config param adam_b2: 0.95 I0423 13:14:19.462806 137380724926272 pyconfig.py:471] Config param adam_eps: 1e-08 I0423 13:14:19.462840 137380724926272 pyconfig.py:471] Config param adam_eps_root: 0.0 I0423 13:14:19.462867 137380724926272 pyconfig.py:471] Config param adam_weight_decay: 0.1 I0423 13:14:19.462894 137380724926272 pyconfig.py:471] Config param adamw_mask: [] I0423 13:14:19.462919 137380724926272 pyconfig.py:471] Config param add_bos: True I0423 13:14:19.462946 137380724926272 pyconfig.py:471] Config param add_eos: True I0423 13:14:19.462971 137380724926272 pyconfig.py:471] Config param allow_split_physical_axes: False I0423 13:14:19.462997 137380724926272 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3 I0423 13:14:19.463023 137380724926272 pyconfig.py:471] Config param async_checkpointing: True I0423 13:14:19.463049 137380724926272 pyconfig.py:471] Config param async_scheduling: False I0423 13:14:19.463075 137380724926272 pyconfig.py:471] Config param attention: dot_product I0423 13:14:19.463130 137380724926272 pyconfig.py:471] Config param attention_bias: False I0423 13:14:19.463160 137380724926272 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0 I0423 13:14:19.463186 137380724926272 pyconfig.py:471] Config 
param attention_out: RematLocation.REMAT I0423 13:14:19.463217 137380724926272 pyconfig.py:471] Config param attention_output_dim: -1 I0423 13:14:19.463244 137380724926272 pyconfig.py:471] Config param attention_sink: False I0423 13:14:19.463272 137380724926272 pyconfig.py:471] Config param attention_type: global I0423 13:14:19.463298 137380724926272 pyconfig.py:471] Config param attn_logits_soft_cap: None I0423 13:14:19.463325 137380724926272 pyconfig.py:471] Config param audio_path: I0423 13:14:19.463350 137380724926272 pyconfig.py:471] Config param audio_placeholder: <|audio|> I0423 13:14:19.463376 137380724926272 pyconfig.py:471] Config param autoregressive_decode_assert: I0423 13:14:19.463402 137380724926272 pyconfig.py:471] Config param base_config: base.yml I0423 13:14:19.463427 137380724926272 pyconfig.py:471] Config param base_emb_dim: 16 I0423 13:14:19.463453 137380724926272 pyconfig.py:471] Config param base_mlp_dim: 64 I0423 13:14:19.463479 137380724926272 pyconfig.py:471] Config param base_moe_mlp_dim: -1 I0423 13:14:19.463507 137380724926272 pyconfig.py:471] Config param base_num_decoder_layers: 1 I0423 13:14:19.463531 137380724926272 pyconfig.py:471] Config param base_num_kv_heads: 2 I0423 13:14:19.463558 137380724926272 pyconfig.py:471] Config param base_num_query_heads: 2 I0423 13:14:19.463583 137380724926272 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output I0423 13:14:19.463608 137380724926272 pyconfig.py:471] Config param batch_size: 1 I0423 13:14:19.463634 137380724926272 pyconfig.py:471] Config param batch_split_factor: 1 I0423 13:14:19.463659 137380724926272 pyconfig.py:471] Config param beta_fast: 32 I0423 13:14:19.463684 137380724926272 pyconfig.py:471] Config param beta_slow: 1 I0423 13:14:19.463709 137380724926272 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax I0423 13:14:19.463736 137380724926272 pyconfig.py:471] Config param capacity_factor: -1.0 I0423 13:14:19.463763 137380724926272 
pyconfig.py:471] Config param cast_logits_to_fp32: True I0423 13:14:19.463789 137380724926272 pyconfig.py:471] Config param chat_template: I0423 13:14:19.463815 137380724926272 pyconfig.py:471] Config param chat_template_path: I0423 13:14:19.463840 137380724926272 pyconfig.py:471] Config param checkpoint_conversion_fn: None I0423 13:14:19.463866 137380724926272 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/checkpoints/ I0423 13:14:19.463892 137380724926272 pyconfig.py:471] Config param checkpoint_is_quantized: False I0423 13:14:19.463916 137380724926272 pyconfig.py:471] Config param checkpoint_period: 2000 I0423 13:14:19.463940 137380724926272 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96 I0423 13:14:19.463966 137380724926272 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0423 13:14:19.463992 137380724926272 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True I0423 13:14:19.464016 137380724926272 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True I0423 13:14:19.464040 137380724926272 pyconfig.py:471] Config param checkpoint_todelete_full_path: None I0423 13:14:19.464065 137380724926272 pyconfig.py:471] Config param checkpoint_todelete_subdir: None I0423 13:14:19.464090 137380724926272 pyconfig.py:471] Config param chips_per_vm: 4 I0423 13:14:19.464122 137380724926272 pyconfig.py:471] Config param chunk_attn_window_size: 0 I0423 13:14:19.464148 137380724926272 pyconfig.py:471] Config param collect_stack_trace: False I0423 13:14:19.464174 137380724926272 pyconfig.py:471] Config param colocated_python_checkpointing: False I0423 13:14:19.464200 137380724926272 pyconfig.py:471] Config param colocated_python_data_input: False I0423 13:14:19.464226 137380724926272 pyconfig.py:471] Config param compile_topology: I0423 13:14:19.464249 137380724926272 pyconfig.py:471] Config param compile_topology_num_slices: -1 I0423 13:14:19.464272 
137380724926272 pyconfig.py:471] Config param compile_xla_flags: I0423 13:14:19.464298 137380724926272 pyconfig.py:471] Config param compiled_trainstep_file: I0423 13:14:19.464323 137380724926272 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3 I0423 13:14:19.464346 137380724926272 pyconfig.py:471] Config param constant_bound_config: [] I0423 13:14:19.464371 137380724926272 pyconfig.py:471] Config param context: RematLocation.REMAT I0423 13:14:19.464397 137380724926272 pyconfig.py:471] Config param context_parallel_load_balance: True I0423 13:14:19.464421 137380724926272 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO I0423 13:14:19.464448 137380724926272 pyconfig.py:471] Config param context_parallel_size: 1 I0423 13:14:19.464474 137380724926272 pyconfig.py:471] Config param context_parallel_strategy: all_gather I0423 13:14:19.464498 137380724926272 pyconfig.py:471] Config param context_sharding: context I0423 13:14:19.464527 137380724926272 pyconfig.py:471] Config param conv_chunksize_for_audio: 500 I0423 13:14:19.464552 137380724926272 pyconfig.py:471] Config param conv_stride_for_vit: 14 I0423 13:14:19.464577 137380724926272 pyconfig.py:471] Config param convert_checkpoint_if_possible: False I0423 13:14:19.464602 137380724926272 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1 I0423 13:14:19.464627 137380724926272 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1 I0423 13:14:19.464652 137380724926272 pyconfig.py:471] Config param custom_mesh: I0423 13:14:19.464676 137380724926272 pyconfig.py:471] Config param custom_mesh_and_rule: I0423 13:14:19.464701 137380724926272 pyconfig.py:471] Config param d_model_for_audio: 256 I0423 13:14:19.464725 137380724926272 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0423 13:14:19.464755 
137380724926272 pyconfig.py:471] Config param data_shuffle_seed: 0 I0423 13:14:19.464781 137380724926272 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1 I0423 13:14:19.464805 137380724926272 pyconfig.py:471] Config param dataset_path: I0423 13:14:19.464830 137380724926272 pyconfig.py:471] Config param dataset_type: DatasetType.HF I0423 13:14:19.464858 137380724926272 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1 I0423 13:14:19.464883 137380724926272 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1 I0423 13:14:19.464907 137380724926272 pyconfig.py:471] Config param dcn_context_parallelism: 1 I0423 13:14:19.464932 137380724926272 pyconfig.py:471] Config param dcn_data_parallelism: -1 I0423 13:14:19.464955 137380724926272 pyconfig.py:471] Config param dcn_diloco_parallelism: 1 I0423 13:14:19.464980 137380724926272 pyconfig.py:471] Config param dcn_expert_parallelism: 1 I0423 13:14:19.465004 137380724926272 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1 I0423 13:14:19.465029 137380724926272 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1 I0423 13:14:19.465052 137380724926272 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0423 13:14:19.465078 137380724926272 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1 I0423 13:14:19.465113 137380724926272 pyconfig.py:471] Config param dcn_sequence_parallelism: 1 I0423 13:14:19.465137 137380724926272 pyconfig.py:471] Config param dcn_tensor_parallelism: 1 I0423 13:14:19.465159 137380724926272 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1 I0423 13:14:19.465183 137380724926272 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1 I0423 13:14:19.465208 137380724926272 pyconfig.py:471] Config param debug: {'rl': False} I0423 13:14:19.465233 137380724926272 pyconfig.py:471] Config param debug_sharding: False I0423 13:14:19.465257 137380724926272 pyconfig.py:471] Config param 
decode_sampling_nucleus_p: -1 I0423 13:14:19.465283 137380724926272 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0423 13:14:19.465311 137380724926272 pyconfig.py:471] Config param decode_sampling_temperature: 1.0 I0423 13:14:19.465336 137380724926272 pyconfig.py:471] Config param decode_sampling_top_k: 0 I0423 13:14:19.465359 137380724926272 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3 I0423 13:14:19.465386 137380724926272 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE I0423 13:14:19.465412 137380724926272 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: [] I0423 13:14:19.465437 137380724926272 pyconfig.py:471] Config param degenerate_group_masking: True I0423 13:14:19.465459 137380724926272 pyconfig.py:471] Config param dense_init_scale: 1.0 I0423 13:14:19.465483 137380724926272 pyconfig.py:471] Config param diloco_outer_lr: 0.3 I0423 13:14:19.465514 137380724926272 pyconfig.py:471] Config param diloco_outer_momentum: 0.9 I0423 13:14:19.465539 137380724926272 pyconfig.py:471] Config param diloco_sync_period: 36 I0423 13:14:19.465564 137380724926272 pyconfig.py:471] Config param distill_alpha: 0.5 I0423 13:14:19.465590 137380724926272 pyconfig.py:471] Config param distill_alpha_end: None I0423 13:14:19.465615 137380724926272 pyconfig.py:471] Config param distill_alpha_schedule: constant I0423 13:14:19.465640 137380724926272 pyconfig.py:471] Config param distill_beta: 0.0 I0423 13:14:19.465665 137380724926272 pyconfig.py:471] Config param distill_beta_end: None I0423 13:14:19.465690 137380724926272 pyconfig.py:471] Config param distill_beta_schedule: constant I0423 13:14:19.465715 137380724926272 pyconfig.py:471] Config param distill_feature_loss_type: cosine I0423 13:14:19.465739 137380724926272 pyconfig.py:471] Config param distill_layer_indices: None I0423 13:14:19.465764 137380724926272 pyconfig.py:471] Config param distill_temperature: 1.0 I0423 13:14:19.465789 
137380724926272 pyconfig.py:471] Config param distill_temperature_end: None I0423 13:14:19.465814 137380724926272 pyconfig.py:471] Config param distill_temperature_schedule: constant I0423 13:14:19.465838 137380724926272 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256 I0423 13:14:19.465859 137380724926272 pyconfig.py:471] Config param dpo_beta: 0.1 I0423 13:14:19.465884 137380724926272 pyconfig.py:471] Config param dpo_label_smoothing: 0.0 I0423 13:14:19.465909 137380724926272 pyconfig.py:471] Config param dq_reduction_steps: 0 I0423 13:14:19.465933 137380724926272 pyconfig.py:471] Config param dropout_rate: 0.0 I0423 13:14:19.465956 137380724926272 pyconfig.py:471] Config param dtype: bfloat16 I0423 13:14:19.465999 137380724926272 pyconfig.py:471] Config param dtype_mm: float32 I0423 13:14:19.466025 137380724926272 pyconfig.py:471] Config param dump_hlo: False I0423 13:14:19.466048 137380724926272 pyconfig.py:471] Config param dump_hlo_delete_local_after: True I0423 13:14:19.466073 137380724926272 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/xla_dump I0423 13:14:19.466110 137380724926272 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0423 13:14:19.466135 137380724926272 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step I0423 13:14:19.466159 137380724926272 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step I0423 13:14:19.466183 137380724926272 pyconfig.py:471] Config param dump_hlo_upload_all: False I0423 13:14:19.466208 137380724926272 pyconfig.py:471] Config param dump_hlo_xla_flags: I0423 13:14:19.466233 137380724926272 pyconfig.py:471] Config param dump_jaxpr: False I0423 13:14:19.466258 137380724926272 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True I0423 13:14:19.466281 137380724926272 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/jaxpr_dump I0423 13:14:19.466305 
137380724926272 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0423 13:14:19.466330 137380724926272 pyconfig.py:471] Config param dump_step: -1 I0423 13:14:19.466354 137380724926272 pyconfig.py:471] Config param elastic_enabled: False I0423 13:14:19.466378 137380724926272 pyconfig.py:471] Config param elastic_max_retries: 10 I0423 13:14:19.466403 137380724926272 pyconfig.py:471] Config param elastic_timeout_seconds: 300 I0423 13:14:19.466428 137380724926272 pyconfig.py:471] Config param emb_dim: 16 I0423 13:14:19.466452 137380724926272 pyconfig.py:471] Config param enable_autocheckpoint: False I0423 13:14:19.466476 137380724926272 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False I0423 13:14:19.466499 137380724926272 pyconfig.py:471] Config param enable_checkpointing: True I0423 13:14:19.466528 137380724926272 pyconfig.py:471] Config param enable_continuous_checkpointing: False I0423 13:14:19.466552 137380724926272 pyconfig.py:471] Config param enable_data_shuffling: True I0423 13:14:19.466576 137380724926272 pyconfig.py:471] Config param enable_diloco: False I0423 13:14:19.466601 137380724926272 pyconfig.py:471] Config param enable_dp_attention: False I0423 13:14:19.466626 137380724926272 pyconfig.py:471] Config param enable_dropout: False I0423 13:14:19.466652 137380724926272 pyconfig.py:471] Config param enable_emergency_checkpoint: False I0423 13:14:19.466677 137380724926272 pyconfig.py:471] Config param enable_expert_parallel: False I0423 13:14:19.466702 137380724926272 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True I0423 13:14:19.466727 137380724926272 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True I0423 13:14:19.466752 137380724926272 pyconfig.py:471] Config param enable_goodput_recording: False I0423 13:14:19.466776 137380724926272 pyconfig.py:471] Config param enable_jax_profiler: False I0423 13:14:19.466801 137380724926272 pyconfig.py:471] Config param 
enable_llm_inference_pool: False I0423 13:14:19.466826 137380724926272 pyconfig.py:471] Config param enable_model_warmup: False I0423 13:14:19.466849 137380724926272 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False I0423 13:14:19.466873 137380724926272 pyconfig.py:471] Config param enable_nnx: False I0423 13:14:19.466898 137380724926272 pyconfig.py:471] Config param enable_orbax_v1: False I0423 13:14:19.466922 137380724926272 pyconfig.py:471] Config param enable_padding_causal_mask: True I0423 13:14:19.466947 137380724926272 pyconfig.py:471] Config param enable_pathways_goodput: False I0423 13:14:19.466971 137380724926272 pyconfig.py:471] Config param enable_prefix_caching: False I0423 13:14:19.466996 137380724926272 pyconfig.py:471] Config param enable_rampup_batch_size: False I0423 13:14:19.467020 137380724926272 pyconfig.py:471] Config param enable_single_controller: False I0423 13:14:19.467045 137380724926272 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False I0423 13:14:19.467070 137380724926272 pyconfig.py:471] Config param enable_tensorboard: True I0423 13:14:19.467105 137380724926272 pyconfig.py:471] Config param enable_tunix_perf_metrics: False I0423 13:14:19.467130 137380724926272 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4 I0423 13:14:19.467154 137380724926272 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512 I0423 13:14:19.467178 137380724926272 pyconfig.py:471] Config param encoder_layers_for_audio: 2 I0423 13:14:19.467203 137380724926272 pyconfig.py:471] Config param engram: RematLocation.REMAT I0423 13:14:19.467229 137380724926272 pyconfig.py:471] Config param engram_head_dim: 1280 I0423 13:14:19.467253 137380724926272 pyconfig.py:471] Config param engram_kernel_size: 4 I0423 13:14:19.467278 137380724926272 pyconfig.py:471] Config param engram_layers: [] I0423 13:14:19.467303 137380724926272 pyconfig.py:471] Config param engram_max_ngram_size: 3 I0423 13:14:19.467328 
137380724926272 pyconfig.py:471] Config param engram_num_heads: 8 I0423 13:14:19.467352 137380724926272 pyconfig.py:471] Config param engram_seed: 0 I0423 13:14:19.467377 137380724926272 pyconfig.py:471] Config param engram_vocab_bases: [] I0423 13:14:19.467401 137380724926272 pyconfig.py:471] Config param epsilon_high: None I0423 13:14:19.467427 137380724926272 pyconfig.py:471] Config param eval_corr_lst: False I0423 13:14:19.467450 137380724926272 pyconfig.py:471] Config param eval_data_columns: ['text'] I0423 13:14:19.467476 137380724926272 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1 I0423 13:14:19.467504 137380724926272 pyconfig.py:471] Config param eval_image_column: image I0423 13:14:19.467529 137380724926272 pyconfig.py:471] Config param eval_interval: -1 I0423 13:14:19.467552 137380724926272 pyconfig.py:471] Config param eval_make_lst: False I0423 13:14:19.467577 137380724926272 pyconfig.py:471] Config param eval_per_device_batch_size: 2 I0423 13:14:19.467600 137380724926272 pyconfig.py:471] Config param eval_sampling_strategy: greedy I0423 13:14:19.467626 137380724926272 pyconfig.py:471] Config param eval_split: validation I0423 13:14:19.467654 137380724926272 pyconfig.py:471] Config param eval_steps: -1 I0423 13:14:19.467680 137380724926272 pyconfig.py:471] Config param expansion_factor_real_data: -1.0 I0423 13:14:19.467706 137380724926272 pyconfig.py:471] Config param final_logits_soft_cap: None I0423 13:14:19.467731 137380724926272 pyconfig.py:471] Config param first_num_dense_layers: 0 I0423 13:14:19.467754 137380724926272 pyconfig.py:471] Config param float32_gate_logits: False I0423 13:14:19.467779 137380724926272 pyconfig.py:471] Config param float32_logits: False I0423 13:14:19.467804 137380724926272 pyconfig.py:471] Config param float32_qk_product: False I0423 13:14:19.467828 137380724926272 pyconfig.py:471] Config param float32_weight_sum: True I0423 13:14:19.467853 137380724926272 pyconfig.py:471] Config param force_q_layout: 
False I0423 13:14:19.467877 137380724926272 pyconfig.py:471] Config param force_unroll: False I0423 13:14:19.467902 137380724926272 pyconfig.py:471] Config param formatting_func_kwargs: {} I0423 13:14:19.467927 137380724926272 pyconfig.py:471] Config param formatting_func_path: I0423 13:14:19.467951 137380724926272 pyconfig.py:471] Config param freeze_audio_encoder_params: True I0423 13:14:19.467976 137380724926272 pyconfig.py:471] Config param freeze_vision_encoder_params: True I0423 13:14:19.468001 137380724926272 pyconfig.py:471] Config param fused_mlp: False I0423 13:14:19.468025 137380724926272 pyconfig.py:471] Config param fused_qkv: True I0423 13:14:19.468050 137380724926272 pyconfig.py:471] Config param gcs_metrics: False I0423 13:14:19.468075 137380724926272 pyconfig.py:471] Config param gdn_chunk_size: 64 I0423 13:14:19.468111 137380724926272 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4 I0423 13:14:19.468136 137380724926272 pyconfig.py:471] Config param gdn_key_head_dim: 128 I0423 13:14:19.468160 137380724926272 pyconfig.py:471] Config param gdn_num_key_heads: 16 I0423 13:14:19.468185 137380724926272 pyconfig.py:471] Config param gdn_num_value_heads: 32 I0423 13:14:19.468209 137380724926272 pyconfig.py:471] Config param gdn_value_head_dim: 128 I0423 13:14:19.468234 137380724926272 pyconfig.py:471] Config param generate_padding_batch_eval: False I0423 13:14:19.468258 137380724926272 pyconfig.py:471] Config param generate_padding_batch_train: False I0423 13:14:19.468284 137380724926272 pyconfig.py:471] Config param generate_slice: v5e-16 I0423 13:14:19.468309 137380724926272 pyconfig.py:471] Config param generation_configs: {} I0423 13:14:19.468334 137380724926272 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64 I0423 13:14:19.468358 137380724926272 pyconfig.py:471] Config param global_batch_size_to_load: 512 I0423 13:14:19.468383 137380724926272 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64 I0423 13:14:19.468407 
137380724926272 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0423 13:14:19.468431 137380724926272 pyconfig.py:471] Config param global_batch_size_to_load_start: None I0423 13:14:19.468456 137380724926272 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0423 13:14:19.468481 137380724926272 pyconfig.py:471] Config param global_head_dim: 0 I0423 13:14:19.468509 137380724926272 pyconfig.py:471] Config param global_num_kv_heads: 0 I0423 13:14:19.468534 137380724926272 pyconfig.py:471] Config param global_parameter_scale: 1 I0423 13:14:19.468559 137380724926272 pyconfig.py:471] Config param global_rampup_samples: 500 I0423 13:14:19.468584 137380724926272 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0423 13:14:19.468610 137380724926272 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0423 13:14:19.468636 137380724926272 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0423 13:14:19.468662 137380724926272 pyconfig.py:471] Config param grad_dtype: float32 I0423 13:14:19.468714 137380724926272 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0423 13:14:19.468742 137380724926272 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0423 13:14:19.468768 137380724926272 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0423 13:14:19.468794 137380724926272 pyconfig.py:471] Config param grain_eval_files: I0423 13:14:19.468819 137380724926272 pyconfig.py:471] Config param grain_file_type: arrayrecord I0423 13:14:19.468844 137380724926272 pyconfig.py:471] Config param grain_num_threads: 16 I0423 13:14:19.468870 137380724926272 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0423 13:14:19.468895 137380724926272 pyconfig.py:471] Config param grain_packing_type: first_fit I0423 13:14:19.468920 137380724926272 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0423 13:14:19.468944 137380724926272 pyconfig.py:471] Config param 
grain_per_worker_buffer_size_eval: 1 I0423 13:14:19.468967 137380724926272 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0423 13:14:19.468991 137380724926272 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500 I0423 13:14:19.469016 137380724926272 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0423 13:14:19.469041 137380724926272 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0423 13:14:19.469063 137380724926272 pyconfig.py:471] Config param grain_train_files: I0423 13:14:19.469105 137380724926272 pyconfig.py:471] Config param grain_train_mixture_config_path: I0423 13:14:19.469133 137380724926272 pyconfig.py:471] Config param grain_worker_count: 1 I0423 13:14:19.469158 137380724926272 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0423 13:14:19.469182 137380724926272 pyconfig.py:471] Config param grpo_beta: 0.08 I0423 13:14:19.469209 137380724926272 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0423 13:14:19.469234 137380724926272 pyconfig.py:471] Config param hardware: tpu I0423 13:14:19.469259 137380724926272 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0423 13:14:19.469285 137380724926272 pyconfig.py:471] Config param head_dim: 8 I0423 13:14:19.469311 137380724926272 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0423 13:14:19.469336 137380724926272 pyconfig.py:471] Config param hf_data_dir: None I0423 13:14:19.469361 137380724926272 pyconfig.py:471] Config param hf_eval_files: None I0423 13:14:19.469386 137380724926272 pyconfig.py:471] Config param hf_eval_split: None I0423 13:14:19.469410 137380724926272 pyconfig.py:471] Config param hf_name: None I0423 13:14:19.469435 137380724926272 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0423 13:14:19.469460 137380724926272 pyconfig.py:471] Config param hf_train_files: None I0423 13:14:19.469484 137380724926272 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0423 13:14:19.469513 
137380724926272 pyconfig.py:471] Config param hide_profiler_step_metric: False I0423 13:14:19.469539 137380724926272 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0423 13:14:19.469563 137380724926272 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0423 13:14:19.469588 137380724926272 pyconfig.py:471] Config param ici_context_parallelism: 1 I0423 13:14:19.469613 137380724926272 pyconfig.py:471] Config param ici_data_parallelism: 1 I0423 13:14:19.469637 137380724926272 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0423 13:14:19.469660 137380724926272 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0423 13:14:19.469684 137380724926272 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0423 13:14:19.469708 137380724926272 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0423 13:14:19.469733 137380724926272 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0423 13:14:19.469759 137380724926272 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0423 13:14:19.469784 137380724926272 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0423 13:14:19.469808 137380724926272 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0423 13:14:19.469832 137380724926272 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0423 13:14:19.469857 137380724926272 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0423 13:14:19.469882 137380724926272 pyconfig.py:471] Config param image_path: I0423 13:14:19.469907 137380724926272 pyconfig.py:471] Config param image_placeholder: <|image|> I0423 13:14:19.469932 137380724926272 pyconfig.py:471] Config param image_size_for_vit: 896 I0423 13:14:19.469957 137380724926272 pyconfig.py:471] Config param indexer_head_dim: 128 I0423 13:14:19.469980 137380724926272 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0423 13:14:19.470005 137380724926272 pyconfig.py:471] 
Config param indexer_n_heads: 64 I0423 13:14:19.470027 137380724926272 pyconfig.py:471] Config param indexer_sparse_training: False I0423 13:14:19.470051 137380724926272 pyconfig.py:471] Config param indexer_topk: 2048 I0423 13:14:19.470076 137380724926272 pyconfig.py:471] Config param inference_benchmark_test: False I0423 13:14:19.470110 137380724926272 pyconfig.py:471] Config param inference_metadata_file: I0423 13:14:19.470136 137380724926272 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0423 13:14:19.470160 137380724926272 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0423 13:14:19.470185 137380724926272 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0423 13:14:19.470211 137380724926272 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0423 13:14:19.470236 137380724926272 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0423 13:14:19.470261 137380724926272 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0423 13:14:19.470286 137380724926272 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0423 13:14:19.470311 137380724926272 pyconfig.py:471] Config param init_weights_seed: 0 I0423 13:14:19.470334 137380724926272 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0423 13:14:19.470359 137380724926272 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0423 13:14:19.470381 137380724926272 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0423 13:14:19.470405 137380724926272 pyconfig.py:471] Config param internal_compile: False I0423 13:14:19.470429 137380724926272 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0423 13:14:19.470454 137380724926272 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0423 13:14:19.470479 137380724926272 
pyconfig.py:471] Config param jax_debug_log_modules: I0423 13:14:19.470508 137380724926272 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0423 13:14:19.470533 137380724926272 pyconfig.py:471] Config param jax_profiler_port: 9999 I0423 13:14:19.470558 137380724926272 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0423 13:14:19.470584 137380724926272 pyconfig.py:471] Config param kv_cache_buffer: 256 I0423 13:14:19.470608 137380724926272 pyconfig.py:471] Config param kv_lora_rank: 512 I0423 13:14:19.470633 137380724926272 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0423 13:14:19.470660 137380724926272 pyconfig.py:471] Config param kv_quant_dtype: int8 I0423 13:14:19.470684 137380724926272 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0423 13:14:19.470710 137380724926272 pyconfig.py:471] Config param learning_rate: 0.0002 I0423 13:14:19.470735 137380724926272 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0423 13:14:19.470760 137380724926272 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0423 13:14:19.470785 137380724926272 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0423 13:14:19.470810 137380724926272 pyconfig.py:471] Config param load_checkpoint_only_once: False I0423 13:14:19.470834 137380724926272 pyconfig.py:471] Config param load_from_prefill_dir: False I0423 13:14:19.470859 137380724926272 pyconfig.py:471] Config param load_full_state_path: I0423 13:14:19.470883 137380724926272 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0423 13:14:19.470908 137380724926272 pyconfig.py:471] Config param local_checkpoint_directory: I0423 13:14:19.470932 137380724926272 pyconfig.py:471] Config param local_checkpoint_period: 0 I0423 13:14:19.470957 137380724926272 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0423 
13:14:19.470982 137380724926272 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0423 13:14:19.471007 137380724926272 pyconfig.py:471] Config param log_config: True I0423 13:14:19.471032 137380724926272 pyconfig.py:471] Config param log_period: 10 I0423 13:14:19.471057 137380724926272 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 
'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0423 13:14:19.471171 137380724926272 pyconfig.py:471] Config param logits_dot_in_fp32: False I0423 13:14:19.471201 137380724926272 pyconfig.py:471] Config param logits_via_embedding: True I0423 13:14:19.471228 137380724926272 pyconfig.py:471] Config param lora_input_adapters_path: I0423 13:14:19.471253 137380724926272 pyconfig.py:471] Config param loss_algo: grpo I0423 13:14:19.471277 137380724926272 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0423 13:14:19.471305 137380724926272 pyconfig.py:471] Config param managed_mldiagnostics: False I0423 13:14:19.471330 137380724926272 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/managed-mldiagnostics I0423 13:14:19.471355 137380724926272 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0423 13:14:19.471380 137380724926272 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0423 
13:14:19.471408 137380724926272 pyconfig.py:471] Config param max_checkify: False I0423 13:14:19.471433 137380724926272 pyconfig.py:471] Config param max_concurrency: 256 I0423 13:14:19.471458 137380724926272 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0423 13:14:19.471483 137380724926272 pyconfig.py:471] Config param max_num_batched_tokens: None I0423 13:14:19.471513 137380724926272 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0423 13:14:19.471538 137380724926272 pyconfig.py:471] Config param max_num_images_per_example: -1 I0423 13:14:19.471563 137380724926272 pyconfig.py:471] Config param max_num_seqs: None I0423 13:14:19.471588 137380724926272 pyconfig.py:471] Config param max_position_embeddings: 163840 I0423 13:14:19.471611 137380724926272 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0423 13:14:19.471637 137380724926272 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0423 13:14:19.471661 137380724926272 pyconfig.py:471] Config param max_segments_per_seq: -1 I0423 13:14:19.471686 137380724926272 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0423 13:14:19.471709 137380724926272 pyconfig.py:471] Config param max_target_length: 2048 I0423 13:14:19.471733 137380724926272 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0423 13:14:19.471758 137380724926272 pyconfig.py:471] Config param megablox: True I0423 13:14:19.471783 137380724926272 pyconfig.py:471] Config param merge_gating_gmm: False I0423 13:14:19.471806 137380724926272 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0423 13:14:19.471834 137380724926272 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/metrics/ I0423 13:14:19.471859 137380724926272 pyconfig.py:471] Config param metrics_file: I0423 
13:14:19.471884 137380724926272 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0423 13:14:19.471908 137380724926272 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0423 13:14:19.471933 137380724926272 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0423 13:14:19.471957 137380724926272 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0423 13:14:19.471983 137380724926272 pyconfig.py:471] Config param mla_naive_kvcache: True I0423 13:14:19.472007 137380724926272 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0423 13:14:19.472032 137380724926272 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0423 13:14:19.472058 137380724926272 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0423 13:14:19.472083 137380724926272 pyconfig.py:471] Config param mlp_bias: False I0423 13:14:19.472118 137380724926272 pyconfig.py:471] Config param mlp_dim: 64 I0423 13:14:19.472144 137380724926272 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0423 13:14:19.472169 137380724926272 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0423 13:14:19.472193 137380724926272 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0423 13:14:19.472217 137380724926272 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0423 13:14:19.472241 137380724926272 pyconfig.py:471] Config param moba: False I0423 13:14:19.472265 137380724926272 pyconfig.py:471] Config param moba_chunk_size: 1024 I0423 13:14:19.472289 137380724926272 pyconfig.py:471] Config param moba_topk: 8 I0423 13:14:19.472313 137380724926272 pyconfig.py:471] Config param model_call_mode: I0423 13:14:19.472338 137380724926272 pyconfig.py:471] Config param model_name: gpt3-52k I0423 13:14:19.472362 137380724926272 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0423 13:14:19.472387 137380724926272 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0423 13:14:19.472410 137380724926272 pyconfig.py:471] Config 
param moe_mlp_dim: -1 I0423 13:14:19.472433 137380724926272 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0423 13:14:19.472458 137380724926272 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0423 13:14:19.472483 137380724926272 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0423 13:14:19.472512 137380724926272 pyconfig.py:471] Config param monitor_goodput: False I0423 13:14:19.472537 137380724926272 pyconfig.py:471] Config param monitor_step_time_deviation: True I0423 13:14:19.472559 137380724926272 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0423 13:14:19.472584 137380724926272 pyconfig.py:471] Config param mscale: 1.0 I0423 13:14:19.472608 137380724926272 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0423 13:14:19.472632 137380724926272 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0423 13:14:19.472657 137380724926272 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0423 13:14:19.472684 137380724926272 pyconfig.py:471] Config param mtp_num_layers: 0 I0423 13:14:19.472709 137380724926272 pyconfig.py:471] Config param mu_dtype: float32 I0423 13:14:19.472748 137380724926272 pyconfig.py:471] Config param multi_sampling: False I0423 13:14:19.472775 137380724926272 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0423 13:14:19.472799 137380724926272 pyconfig.py:471] Config param muon_beta: 0.95 I0423 13:14:19.472825 137380724926272 pyconfig.py:471] Config param muon_consistent_rms: None I0423 13:14:19.472851 137380724926272 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0423 13:14:19.472876 137380724926272 pyconfig.py:471] Config param n_routing_groups: -1 I0423 13:14:19.472900 137380724926272 pyconfig.py:471] Config param n_window_for_audio: 50 I0423 13:14:19.472925 137380724926272 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0423 13:14:19.472950 137380724926272 pyconfig.py:471] Config param nope_layer_interval: -1 
I0423 13:14:19.472975 137380724926272 pyconfig.py:471] Config param norm_topk_prob: False I0423 13:14:19.473000 137380724926272 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0423 13:14:19.473028 137380724926272 pyconfig.py:471] Config param normalize_embedding_logits: False I0423 13:14:19.473053 137380724926272 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0423 13:14:19.473078 137380724926272 pyconfig.py:471] Config param num_batches: 4 I0423 13:14:19.473116 137380724926272 pyconfig.py:471] Config param num_channels_for_vit: 3 I0423 13:14:19.473142 137380724926272 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0423 13:14:19.473166 137380724926272 pyconfig.py:471] Config param num_decoder_layers: 1 I0423 13:14:19.473193 137380724926272 pyconfig.py:471] Config param num_diloco_replicas: 1 I0423 13:14:19.473218 137380724926272 pyconfig.py:471] Config param num_epoch: 1 I0423 13:14:19.473243 137380724926272 pyconfig.py:471] Config param num_eval_passes: 1 I0423 13:14:19.473266 137380724926272 pyconfig.py:471] Config param num_experts: 1 I0423 13:14:19.473290 137380724926272 pyconfig.py:471] Config param num_experts_per_tok: 1 I0423 13:14:19.473315 137380724926272 pyconfig.py:471] Config param num_generations: 2 I0423 13:14:19.473339 137380724926272 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0423 13:14:19.473364 137380724926272 pyconfig.py:471] Config param num_iterations: 1 I0423 13:14:19.473389 137380724926272 pyconfig.py:471] Config param num_kv_heads: 2 I0423 13:14:19.473414 137380724926272 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0423 13:14:19.473438 137380724926272 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0423 13:14:19.473463 137380724926272 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0423 13:14:19.473487 137380724926272 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0423 13:14:19.473519 137380724926272 pyconfig.py:471] Config 
param num_position_embeddings_for_vit: 1024 I0423 13:14:19.473544 137380724926272 pyconfig.py:471] Config param num_query_heads: 2 I0423 13:14:19.473569 137380724926272 pyconfig.py:471] Config param num_samplers_slices: -1 I0423 13:14:19.473594 137380724926272 pyconfig.py:471] Config param num_slices: 1 I0423 13:14:19.473617 137380724926272 pyconfig.py:471] Config param num_target_devices: 32 I0423 13:14:19.473641 137380724926272 pyconfig.py:471] Config param num_test_batches: 5 I0423 13:14:19.473667 137380724926272 pyconfig.py:471] Config param num_trainer_slices: -1 I0423 13:14:19.473691 137380724926272 pyconfig.py:471] Config param num_vocab_tiling: 1 I0423 13:14:19.473715 137380724926272 pyconfig.py:471] Config param off_policy_steps: 0 I0423 13:14:19.473740 137380724926272 pyconfig.py:471] Config param offline_data_dir: None I0423 13:14:19.473763 137380724926272 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0423 13:14:19.473790 137380724926272 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0423 13:14:19.473814 137380724926272 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0423 13:14:19.473839 137380724926272 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0423 13:14:19.473865 137380724926272 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0423 13:14:19.473890 137380724926272 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0423 13:14:19.473915 137380724926272 pyconfig.py:471] Config param output_dim_for_audio: 512 I0423 13:14:19.473940 137380724926272 pyconfig.py:471] Config param override_logical_axis_rules: False I0423 13:14:19.473965 137380724926272 pyconfig.py:471] Config param override_model_config: True I0423 13:14:19.473989 137380724926272 pyconfig.py:471] Config param packing: True I0423 13:14:19.474012 137380724926272 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0423 13:14:19.474037 137380724926272 pyconfig.py:471] Config param 
pagedattn_max_pages_per_group: -1 I0423 13:14:19.474061 137380724926272 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0423 13:14:19.474086 137380724926272 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0423 13:14:19.474118 137380724926272 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0423 13:14:19.474142 137380724926272 pyconfig.py:471] Config param param_scan_axis: 1 I0423 13:14:19.474167 137380724926272 pyconfig.py:471] Config param parameter_memory_host_offload: False I0423 13:14:19.474191 137380724926272 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0423 13:14:19.474215 137380724926272 pyconfig.py:471] Config param patch_size_for_vit: 14 I0423 13:14:19.474238 137380724926272 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0423 13:14:19.474264 137380724926272 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0423 13:14:19.474290 137380724926272 pyconfig.py:471] Config param per_device_batch_size: 2 I0423 13:14:19.474315 137380724926272 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0423 13:14:19.474340 137380724926272 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0423 13:14:19.474365 137380724926272 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0423 13:14:19.474390 137380724926272 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0423 13:14:19.474412 137380724926272 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0423 13:14:19.474437 137380724926272 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0423 13:14:19.474461 137380724926272 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0423 13:14:19.474487 137380724926272 pyconfig.py:471] Config param posemb_type_for_vit: learn I0423 13:14:19.474515 137380724926272 pyconfig.py:471] Config param position_id_per_seconds: 25 I0423 13:14:19.474540 137380724926272 pyconfig.py:471] Config param prefill_cache_axis_order: 
1,2,0,3 I0423 13:14:19.474565 137380724926272 pyconfig.py:471] Config param prefill_cache_dir: I0423 13:14:19.474590 137380724926272 pyconfig.py:471] Config param prefill_chunk_size: 256 I0423 13:14:19.474615 137380724926272 pyconfig.py:471] Config param prefill_slice: v5e-16 I0423 13:14:19.474639 137380724926272 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0423 13:14:19.474664 137380724926272 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0423 13:14:19.474689 137380724926272 pyconfig.py:471] Config param prefuse_moe_weights: False I0423 13:14:19.474713 137380724926272 pyconfig.py:471] Config param profile_cleanly: True I0423 13:14:19.474737 137380724926272 pyconfig.py:471] Config param profile_periodically_period: -1 I0423 13:14:19.474762 137380724926272 pyconfig.py:471] Config param profile_power_events: False I0423 13:14:19.474787 137380724926272 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0423 13:14:19.474815 137380724926272 pyconfig.py:471] Config param profiler_steps: 5 I0423 13:14:19.474839 137380724926272 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0423 13:14:19.474864 137380724926272 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0423 13:14:19.474889 137380724926272 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0423 13:14:19.474913 137380724926272 pyconfig.py:471] Config param prometheus_port: 0 I0423 13:14:19.474938 137380724926272 pyconfig.py:471] Config param prompt: I love to I0423 13:14:19.474962 137380724926272 pyconfig.py:471] Config param pure_nnx: False I0423 13:14:19.474987 137380724926272 pyconfig.py:471] Config param pure_nnx_decoder: False I0423 13:14:19.475011 137380724926272 pyconfig.py:471] Config param q_lora_rank: 0 I0423 13:14:19.475036 137380724926272 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0423 13:14:19.475061 137380724926272 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0423 13:14:19.475086 
137380724926272 pyconfig.py:471] Config param qk_norm_with_scale: True I0423 13:14:19.475126 137380724926272 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0423 13:14:19.475152 137380724926272 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0423 13:14:19.475178 137380724926272 pyconfig.py:471] Config param quant_cfg_path: I0423 13:14:19.475202 137380724926272 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0423 13:14:19.475230 137380724926272 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0423 13:14:19.475255 137380724926272 pyconfig.py:471] Config param quantize_kvcache: False I0423 13:14:19.475278 137380724926272 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0423 13:14:19.475303 137380724926272 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0423 13:14:19.475329 137380724926272 pyconfig.py:471] Config param ragged_block_size: 256 I0423 13:14:19.475354 137380724926272 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0423 13:14:19.475379 137380724926272 pyconfig.py:471] Config param rampup_end_step: 0 I0423 13:14:19.475404 137380724926272 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0423 13:14:19.475429 137380724926272 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0423 13:14:19.475453 137380724926272 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0423 13:14:19.475479 137380724926272 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0423 13:14:19.475507 137380724926272 pyconfig.py:471] Config param remat_policy: full I0423 13:14:19.475532 137380724926272 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0423 13:14:19.475557 137380724926272 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0423 13:14:19.475582 137380724926272 pyconfig.py:471] Config param replicate_quant_scale: False I0423 13:14:19.475606 137380724926272 pyconfig.py:471] Config param 
replicator_backup_interval_minutes: 0 I0423 13:14:19.475631 137380724926272 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0423 13:14:19.475656 137380724926272 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0423 13:14:19.475680 137380724926272 pyconfig.py:471] Config param reshape_q: False I0423 13:14:19.475707 137380724926272 pyconfig.py:471] Config param return_log_prob: False I0423 13:14:19.475733 137380724926272 pyconfig.py:471] Config param reuse_example_batch: 0 I0423 13:14:19.475758 137380724926272 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0423 13:14:19.475784 137380724926272 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0423 13:14:19.475811 137380724926272 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0423 13:14:19.475837 137380724926272 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0423 13:14:19.475862 137380724926272 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0423 13:14:19.475886 137380724926272 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0423 13:14:19.475911 137380724926272 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0423 13:14:19.475942 137380724926272 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0423 13:14:19.475967 137380724926272 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0423 13:14:19.475992 137380724926272 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0423 13:14:19.476017 137380724926272 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0423 13:14:19.476042 137380724926272 pyconfig.py:471] Config param rope_attention_scaling: False I0423 13:14:19.476064 
137380724926272 pyconfig.py:471] Config param rope_factor: 40 I0423 13:14:19.476089 137380724926272 pyconfig.py:471] Config param rope_interleave: True I0423 13:14:19.476123 137380724926272 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0423 13:14:19.476149 137380724926272 pyconfig.py:471] Config param rope_max_timescale: 10000 I0423 13:14:19.476173 137380724926272 pyconfig.py:471] Config param rope_min_timescale: 1 I0423 13:14:19.476198 137380724926272 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0423 13:14:19.476220 137380724926272 pyconfig.py:471] Config param rope_truncate: True I0423 13:14:19.476245 137380724926272 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0423 13:14:19.476272 137380724926272 pyconfig.py:471] Config param rope_use_scale: True I0423 13:14:19.476297 137380724926272 pyconfig.py:471] Config param routed_bias: False I0423 13:14:19.476321 137380724926272 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0423 13:14:19.476346 137380724926272 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0423 13:14:19.476371 137380724926272 pyconfig.py:471] Config param routed_score_func: I0423 13:14:19.476396 137380724926272 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-13-14 I0423 13:14:19.476421 137380724926272 pyconfig.py:471] Config param sa_block_kv: 512 I0423 13:14:19.476444 137380724926272 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0423 13:14:19.476468 137380724926272 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0423 13:14:19.476493 137380724926272 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0423 13:14:19.476521 137380724926272 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0423 13:14:19.476545 137380724926272 pyconfig.py:471] Config param sa_block_q: 512 I0423 13:14:19.476570 137380724926272 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0423 13:14:19.476594 137380724926272 pyconfig.py:471] Config param sa_block_q_dq: 512 I0423 
13:14:19.476619 137380724926272 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0423 13:14:19.476643 137380724926272 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0423 13:14:19.476669 137380724926272 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0423 13:14:19.476694 137380724926272 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0423 13:14:19.476718 137380724926272 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0423 13:14:19.476744 137380724926272 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0423 13:14:19.476768 137380724926272 pyconfig.py:471] Config param save_config_to_gcs: False I0423 13:14:19.476793 137380724926272 pyconfig.py:471] Config param save_quantized_params_path: I0423 13:14:19.476818 137380724926272 pyconfig.py:471] Config param scale_embedding_for_audio: True I0423 13:14:19.476843 137380724926272 pyconfig.py:471] Config param scan_layers: True I0423 13:14:19.476866 137380724926272 pyconfig.py:471] Config param scan_layers_per_stage: False I0423 13:14:19.476892 137380724926272 pyconfig.py:471] Config param scan_pipeline_iterations: True I0423 13:14:19.476917 137380724926272 pyconfig.py:471] Config param scan_pipeline_repeats: False I0423 13:14:19.476942 137380724926272 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0423 13:14:19.476964 137380724926272 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0423 13:14:19.476988 137380724926272 pyconfig.py:471] Config param sft_train_on_completion_only: False I0423 13:14:19.477013 137380724926272 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0423 13:14:19.477038 137380724926272 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0423 13:14:19.477063 137380724926272 pyconfig.py:471] Config param shard_optimizer_over_data: False I0423 13:14:19.477087 137380724926272 pyconfig.py:471] Config param sharding_strategy: None I0423 13:14:19.477123 
137380724926272 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0423 13:14:19.477149 137380724926272 pyconfig.py:471] Config param shardy: True I0423 13:14:19.477174 137380724926272 pyconfig.py:471] Config param share_kv_projections: False I0423 13:14:19.477197 137380724926272 pyconfig.py:471] Config param shared_experts: 0 I0423 13:14:19.477220 137380724926272 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0423 13:14:19.477245 137380724926272 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0423 13:14:19.477270 137380724926272 pyconfig.py:471] Config param skip_jax_distributed_system: False I0423 13:14:19.477294 137380724926272 pyconfig.py:471] Config param skip_step_interval: 128 I0423 13:14:19.477319 137380724926272 pyconfig.py:471] Config param skip_step_on_spikes: False I0423 13:14:19.477344 137380724926272 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0423 13:14:19.477368 137380724926272 pyconfig.py:471] Config param sliding_window_size: 0 I0423 13:14:19.477393 137380724926272 pyconfig.py:471] Config param solution_end_token: </answer> I0423 13:14:19.477417 137380724926272 pyconfig.py:471] Config param solution_start_token: <answer> I0423 13:14:19.477442 137380724926272 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0423 13:14:19.477466 137380724926272 pyconfig.py:471] Config param sparse_matmul: True I0423 13:14:19.477491 137380724926272 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0423 13:14:19.477518 137380724926272 pyconfig.py:471] Config param stack_prefill_result_cache: False I0423 13:14:19.477543 137380724926272 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0423 13:14:19.477564 137380724926272 pyconfig.py:471] Config param stack_trace_to_cloud: False I0423 13:14:19.477588 137380724926272 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0423 13:14:19.477612 137380724926272 pyconfig.py:471] Config param steps: 200000 I0423 13:14:19.477637 
137380724926272 pyconfig.py:471] Config param stop_strings: None I0423 13:14:19.477660 137380724926272 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0423 13:14:19.477685 137380724926272 pyconfig.py:471] Config param student_params_to_update: None I0423 13:14:19.477710 137380724926272 pyconfig.py:471] Config param subslice_shape: I0423 13:14:19.477735 137380724926272 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0423 13:14:19.477760 137380724926272 pyconfig.py:471] Config param system_prompt: I0423 13:14:19.477785 137380724926272 pyconfig.py:471] Config param target_eval_loss: 0.0 I0423 13:14:19.477810 137380724926272 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0423 13:14:19.477836 137380724926272 pyconfig.py:471] Config param temperature_tuning: False I0423 13:14:19.477860 137380724926272 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0423 13:14:19.477885 137380724926272 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-13-14/tensorboard/ I0423 13:14:19.477910 137380724926272 pyconfig.py:471] Config param tensors_on_device: None I0423 13:14:19.477935 137380724926272 pyconfig.py:471] Config param tensors_to_offload: None I0423 13:14:19.477957 137380724926272 pyconfig.py:471] Config param test_batch_start_index: 0 I0423 13:14:19.477982 137380724926272 pyconfig.py:471] Config param tile_size_for_vit: 336 I0423 13:14:19.478006 137380724926272 pyconfig.py:471] Config param tokenize_eval_data: True I0423 13:14:19.478031 137380724926272 pyconfig.py:471] Config param tokenize_train_data: True I0423 13:14:19.478054 137380724926272 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0423 13:14:19.478078 137380724926272 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0423 13:14:19.478115 137380724926272 pyconfig.py:471] Config param topk_routing_group: -1 I0423 13:14:19.478141 137380724926272 
pyconfig.py:471] Config param train_data_columns: ['text'] I0423 13:14:19.478165 137380724926272 pyconfig.py:471] Config param train_fraction: 1.0 I0423 13:14:19.478191 137380724926272 pyconfig.py:471] Config param train_image_column: image I0423 13:14:19.478216 137380724926272 pyconfig.py:471] Config param train_micro_batch_size: -1 I0423 13:14:19.478239 137380724926272 pyconfig.py:471] Config param train_split: train I0423 13:14:19.478264 137380724926272 pyconfig.py:471] Config param trainable_parameters_mask: [] I0423 13:14:19.478290 137380724926272 pyconfig.py:471] Config param trainable_position_size: 2048 I0423 13:14:19.478314 137380724926272 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0423 13:14:19.478341 137380724926272 pyconfig.py:471] Config param upload_all_profiler_results: False I0423 13:14:19.478365 137380724926272 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0423 13:14:19.478389 137380724926272 pyconfig.py:471] Config param use_agentic_rollout: False I0423 13:14:19.478414 137380724926272 pyconfig.py:471] Config param use_audio: False I0423 13:14:19.478440 137380724926272 pyconfig.py:471] Config param use_audio_in_video: False I0423 13:14:19.478462 137380724926272 pyconfig.py:471] Config param use_batch_split_schedule: False I0423 13:14:19.478486 137380724926272 pyconfig.py:471] Config param use_chat_template: False I0423 13:14:19.478514 137380724926272 pyconfig.py:471] Config param use_chunked_prefill: False I0423 13:14:19.478540 137380724926272 pyconfig.py:471] Config param use_custom_sort_vjp: True I0423 13:14:19.478565 137380724926272 pyconfig.py:471] Config param use_dpo: False I0423 13:14:19.478591 137380724926272 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0423 13:14:19.478615 137380724926272 pyconfig.py:471] Config param use_grpo: True I0423 13:14:19.478638 137380724926272 pyconfig.py:471] Config param use_indexer: False I0423 13:14:19.478662 137380724926272 pyconfig.py:471] Config param 
use_iota_embed: True I0423 13:14:19.478687 137380724926272 pyconfig.py:471] Config param use_jax_splash: False I0423 13:14:19.478711 137380724926272 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0423 13:14:19.478734 137380724926272 pyconfig.py:471] Config param use_mrope: False I0423 13:14:19.478758 137380724926272 pyconfig.py:471] Config param use_multimodal: False I0423 13:14:19.478782 137380724926272 pyconfig.py:471] Config param use_pathways: True I0423 13:14:19.478807 137380724926272 pyconfig.py:471] Config param use_post_attn_norm: False I0423 13:14:19.478829 137380724926272 pyconfig.py:471] Config param use_post_ffw_norm: False I0423 13:14:19.478853 137380724926272 pyconfig.py:471] Config param use_qk_clip: False I0423 13:14:19.478878 137380724926272 pyconfig.py:471] Config param use_qk_norm: False I0423 13:14:19.478902 137380724926272 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0423 13:14:19.478927 137380724926272 pyconfig.py:471] Config param use_qwix_quantization: False I0423 13:14:19.478949 137380724926272 pyconfig.py:471] Config param use_ragged_attention: False I0423 13:14:19.478974 137380724926272 pyconfig.py:471] Config param use_random_routing: False I0423 13:14:19.478998 137380724926272 pyconfig.py:471] Config param use_replicator_service: False I0423 13:14:19.479021 137380724926272 pyconfig.py:471] Config param use_ring_of_experts: False I0423 13:14:19.479045 137380724926272 pyconfig.py:471] Config param use_sft: False I0423 13:14:19.479070 137380724926272 pyconfig.py:471] Config param use_splash_scheduler: False I0423 13:14:19.479105 137380724926272 pyconfig.py:471] Config param use_tokamax_gmm: False I0423 13:14:19.479131 137380724926272 pyconfig.py:471] Config param use_tokamax_splash: False I0423 13:14:19.479156 137380724926272 pyconfig.py:471] Config param use_truncation: True I0423 13:14:19.479181 137380724926272 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0423 13:14:19.479205 137380724926272 
pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0423 13:14:19.479230 137380724926272 pyconfig.py:471] Config param use_vertex_tensorboard: False I0423 13:14:19.479254 137380724926272 pyconfig.py:471] Config param using_pipeline_parallelism: False I0423 13:14:19.479277 137380724926272 pyconfig.py:471] Config param v_head_dim: 128 I0423 13:14:19.479301 137380724926272 pyconfig.py:471] Config param v_norm_with_scale: True I0423 13:14:19.479323 137380724926272 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0423 13:14:19.479348 137380724926272 pyconfig.py:471] Config param vertex_tensorboard_project: I0423 13:14:19.479372 137380724926272 pyconfig.py:471] Config param vertex_tensorboard_region: I0423 13:14:19.479397 137380724926272 pyconfig.py:471] Config param video_path: I0423 13:14:19.479421 137380724926272 pyconfig.py:471] Config param video_placeholder: <|video|> I0423 13:14:19.479446 137380724926272 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0423 13:14:19.479470 137380724926272 pyconfig.py:471] Config param vision_output_length: -1 I0423 13:14:19.479494 137380724926272 pyconfig.py:471] Config param vllm_additional_config: {} I0423 13:14:19.479524 137380724926272 pyconfig.py:471] Config param vllm_hf_config_path: I0423 13:14:19.479549 137380724926272 pyconfig.py:471] Config param vllm_hf_overrides: {} I0423 13:14:19.479573 137380724926272 pyconfig.py:471] Config param vocab_size: 32000 I0423 13:14:19.479598 137380724926272 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0423 13:14:19.479622 137380724926272 pyconfig.py:471] Config param weight_dtype: float32 I0423 13:14:19.479660 137380724926272 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0423 13:14:19.479685 137380724926272 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0423 13:14:19.479711 137380724926272 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0423 13:14:19.479735 
137380724926272 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0423 13:14:19.479759 137380724926272 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0423 13:14:19.479784 137380724926272 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0423 13:14:19.479806 137380724926272 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0423 13:14:19.479830 137380724926272 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0423 13:14:19.479855 137380724926272 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0423 13:14:19.479879 137380724926272 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0423 13:14:19.479902 137380724926272 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0423 13:14:19.479927 137380724926272 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0423 13:14:19.479951 137380724926272 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0423 13:14:19.479975 137380724926272 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0423 13:14:19.479997 137380724926272 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0423 13:14:19.480021 137380724926272 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0423 13:14:19.480046 137380724926272 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0423 13:14:19.480070 137380724926272 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0423 13:14:19.480105 137380724926272 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0423 13:14:19.480130 137380724926272 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0423 13:14:19.480154 137380724926272 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0423 13:14:19.480181 137380724926272 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0423 13:14:19.480207 137380724926272 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0423 13:14:19.480232 137380724926272 pyconfig.py:471] Config 
param xprof_e2e_enable_fw_throttle_event: False I0423 13:14:19.480257 137380724926272 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0423 13:14:19.480284 137380724926272 pyconfig.py:471] Config param z_loss_multiplier: 0.0 I0423 13:14:19.480653 137380724926272 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0423 13:14:19.480696 137380724926272 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0423 13:14:23.203809 137380724926272 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0423 13:14:23.206887 137380724926272 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0423 13:14:23.207034 137380724926272 train_distill.py:608] Applying logical axis rules for model initialization and training... I0423 13:14:23.207153 137380724926272 train_distill.py:612] Loading Student from ... I0423 13:14:23.207196 137380724926272 train_distill.py:169] --- Student Configuration --- I0423 13:14:23.207231 137380724926272 train_distill.py:170] Model Name: gpt3-52k I0423 13:14:23.207270 137380724926272 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 13:14:23.207296 137380724926272 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0423 13:14:23.207315 137380724926272 train_distill.py:175] Vocab Size: 32000 I0423 13:14:23.207332 137380724926272 train_distill.py:176] Checkpoint: I0423 13:14:23.207348 137380724926272 train_distill.py:477] Initializing model: gpt3-52k... I0423 13:14:24.475392 137380724926272 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0423 13:14:24.475500 137380724926272 train_distill.py:169] --- Teacher Configuration --- I0423 13:14:24.475527 137380724926272 train_distill.py:170] Model Name: gpt3-52k I0423 13:14:24.475550 137380724926272 train_distill.py:171] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 13:14:24.475573 137380724926272 train_distill.py:174] Attention Heads: 2 Query, 2 KV I0423 13:14:24.475594 137380724926272 train_distill.py:175] Vocab Size: 32000 I0423 13:14:24.475614 137380724926272 train_distill.py:176] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0423 13:14:24.475634 137380724926272 train_distill.py:477] Initializing model: gpt3-52k... I0423 13:14:25.592561 137380724926272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:14:25.592989 137380724926272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7cf1bccb6180>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:14:25.593045 137380724926272 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0423 13:14:26.115203 137380724926272 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0423 13:14:26.652715 2135 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0423 13:14:28.199202 137380724926272 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0423 13:14:30.385305 137380724926272 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0423 13:14:30.385672 137380724926272 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0423 13:14:30.702067 137380724926272 checkpointer.py:318] Finished restoring checkpoint in 3.28 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0423 13:14:31.394543 137380724926272 train_distill.py:652] Initializing Data Iterators via MaxText pipeline... I0423 13:14:31.459282 137380724926272 config.py:112] TensorFlow version 2.20.0 available. I0423 13:14:31.459796 137380724926272 config.py:125] JAX version 0.8.3 available. E0423 13:14:33.494585 137380724926272 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0423 13:14:33.494810 137380724926272 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0423 13:14:33.497897 137380724926272 train_distill.py:422] Input Pipeline Checkpointing: DISABLED I0423 13:14:33.497963 137380724926272 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0423 13:14:33.498027 137380724926272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:14:33.498118 137380724926272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7cf1bccb6180>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:14:33.498162 137380724926272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:14:33.498194 137380724926272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7cf1bccb6180>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:14:33.498238 137380724926272 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7500>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb650ef9b0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da73e0>}, handler_registry=None I0423 
13:14:33.498440 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7500>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:14:33.498486 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb650ef9b0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:14:33.498513 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da73e0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:14:33.498537 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da7020>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0423 13:14:33.498563 137380724926272 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7500>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7500>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb650ef9b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb650ef9b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da73e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da73e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da7020>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56da7020>}). 
I0423 13:14:33.498971 137380724926272 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7cdb56e78220> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 13:14:36.233714 137380724926272 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints I0423 13:14:36.663551 137380724926272 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7cdb56da73b0> I0423 13:14:36.663720 137380724926272 pytree_checkpoint_handler.py:577] 
save_device_host_concurrent_bytes=None I0423 13:14:36.663788 137380724926272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7cf1bccb6180>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:14:36.663827 137380724926272 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 13:14:36.663858 137380724926272 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7cf1bccb6180>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 13:14:36.663893 137380724926272 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0423 13:14:36.663964 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb651b1200>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56da71a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56da7170>, _write_futures=[]) I0423 13:14:36.664386 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb651b1200>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56da71a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56da7170>, _write_futures=[]) I0423 13:14:36.664417 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb651b1200>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56da71a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56da7170>, _write_futures=[]) I0423 13:14:36.664453 137380724926272 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7380>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da6480>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01d90>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cdb56e02360>}, handler_registry=None I0423 13:14:36.664584 137380724926272 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7380>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:14:36.664628 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da6480>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 13:14:36.664654 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01d90>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:14:36.664683 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cdb56e02360>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0423 13:14:36.664706 137380724926272 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01af0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 13:14:36.664732 137380724926272 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7380>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da7380>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da6480>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7cdb56da6480>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01d90>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01d90>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cdb56e02360>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7cdb56e02360>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01af0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7cdb56e01af0>}). I0423 13:14:36.664808 137380724926272 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7cdb56e78360> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 13:14:37.037829 137380724926272 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints I0423 13:14:37.048750 137380724926272 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, 
prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7cdb56da6690> I0423 13:14:37.049203 137380724926272 train_distill.py:703] Starting Distillation Training... I0423 13:14:37.049309 137380724926272 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0423 13:14:37.913335 137380724926272 peft_trainer.py:594] Compiled train_step cache size: 0 I0423 13:14:37.915080 137237316478720 grain_pool.py:367] Grain pool will use 1 processes. I0423 13:14:37.942090 137237316478720 grain_pool.py:440] Grain pool will start child processes. I0423 13:14:37.947321 137237316478720 grain_pool.py:448] Grain pool started all child processes. 
2026-04-23 13:14:43.981943: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} /deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] For other Variable types use: variable.get_value() current_step = model.training_step.value I0423 13:14:50.124435 137380724926272 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0423 13:14:50.126633 137380724926272 checkpoint_manager.py:1501] [process=6] Saving checkpoint at step 1 I0423 13:14:50.129708 137380724926272 async_checkpointer.py:452] [process=6] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1. I0423 13:14:50.709839 137380724926272 signaling_client.py:364] Using JaxDistributedSignalingClient I0423 13:14:50.710731 137380724926272 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array. 
I0423 13:14:50.710791 137380724926272 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:14:51.385946 137380724926272 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.676190s I0423 13:14:51.389828 137380724926272 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 558.631 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 957 milliseconds) (per-host) I0423 13:14:51.389901 137380724926272 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.958036s (batch_requests_ready=0.274060s, total_serialization_initiated=0.683855s, others=0.000121s) I0423 13:14:51.391851 137380724926272 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array. I0423 13:14:51.391912 137380724926272 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:14:51.397293 137380724926272 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.007284s I0423 13:14:51.397410 137380724926272 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 276.196 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 968 milliseconds) (per-host) I0423 13:14:51.397454 137380724926272 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. 
Time taken: 0.968810s (batch_requests_ready=0.957498s, total_serialization_initiated=0.011238s, others=0.000074s) I0423 13:14:51.397573 137380724926272 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.972824s (all_items=0.000023s, per_item={'model_params': '0.00001907', 'optimizer_state': '0.00000429'}, temp_paths=0.972801) I0423 13:14:51.398393 137230899205888 async_checkpointer.py:79] [process=6][thread=async_save] Background save thread started. I0423 13:14:51.398516 137380724926272 async_checkpointer.py:561] Finished blocking save. Time taken: 1.271822s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1. I0423 13:14:51.400253 137380724926272 checkpoint_manager.py:1549] [process=6][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize I0423 13:14:51.400518 137231404787456 async_checkpointer.py:265] [process=6][thread=save_finalize] Waiting for background save thread=async_save. 
I0423 13:14:51.400672 137380724926272 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776950090.1244128, 'wait_for_prev_duration_secs': 0.00011110305786132812, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776950090.1266744, 'checkpointer_blocking_duration_secs': 1.271977186203003, 'get_old_steps_start_time': 1776950091.3986685, 'get_old_steps_duration_secs': 8.082389831542969e-05, 'checkpoint_manager_blocking_start_time': 1776950090.1180573, 'checkpoint_manager_blocking_duration_secs': 1.2825839519500732} /deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use: variable[...] 
For other Variable types use: variable.get_value() current_step = model.training_step.value I0423 13:14:54.537606 137380724926272 peft_trainer.py:474] Train step 1 training loss: 15.990566 - training perplexity: 8802675.000000 I0423 13:14:54.558562 137380724926272 peft_trainer.py:474] Train step 2 training loss: 15.974588 - training perplexity: 8663145.000000 I0423 13:14:54.589455 137380724926272 peft_trainer.py:474] Train step 3 training loss: 16.008877 - training perplexity: 8965342.000000 I0423 13:14:54.609530 137380724926272 peft_trainer.py:474] Train step 4 training loss: 16.001873 - training perplexity: 8902770.000000 I0423 13:14:54.614575 137380724926272 peft_trainer.py:733] Train loop finished in: 16.7008 seconds I0423 13:14:54.615024 137380724926272 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/... I0423 13:14:55.760531 137231396394752 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_6 I0423 13:14:55.810447 137230907598592 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_6 I0423 13:14:55.811518 137230899205888 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 49.707 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:14:55.811668 137230899205888 base_pytree_checkpoint_handler.py:128] [process=6] 
/jax/checkpoint/write/gbytes_per_sec: 99.469 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:14:55.811707 137230899205888 async_checkpointer.py:90] [process=6][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.413128s. I0423 13:15:06.345960 137380724926272 checkpoint_manager.py:1994] [process=6][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. I0423 13:15:07.389572 137230899205888 async_checkpointer.py:144] [process=6][thread=async_save] Background save thread done. Time taken: 15.990976s. I0423 13:15:07.389865 137231404787456 async_checkpointer.py:273] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save. I0423 13:15:07.389986 137231404787456 async_checkpointer.py:283] [process=6][thread=save_finalize] No errors found in background save thread=async_save. I0423 13:15:07.390035 137231404787456 checkpoint_manager.py:2103] [process=6][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts... I0423 13:15:07.391604 137231404787456 checkpoint_manager.py:2112] [process=6][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts. I0423 13:15:07.391737 137380724926272 checkpoint_manager.py:2006] [process=6][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1. W0423 13:15:07.391812 137380724926272 checkpoint_manager.py:1441] Waiting for previous save to complete took 1.045871 seconds. If this number is high, consider checkpointing less frequently. 
I0423 13:15:07.393364 137380724926272 checkpoint_manager.py:1501] [process=6] Saving checkpoint at step 5 I0423 13:15:07.396787 137380724926272 async_checkpointer.py:452] [process=6] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5. I0423 13:15:07.923105 137380724926272 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array. I0423 13:15:07.923206 137380724926272 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:15:08.587385 137380724926272 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.665383s I0423 13:15:08.591005 137380724926272 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 588.340 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 909 milliseconds) (per-host) I0423 13:15:08.591068 137380724926272 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.909649s (batch_requests_ready=0.239016s, total_serialization_initiated=0.670530s, others=0.000103s) I0423 13:15:08.592943 137380724926272 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array. 
I0423 13:15:08.593002 137380724926272 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0423 13:15:08.597869 137380724926272 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.006672s I0423 13:15:08.597975 137380724926272 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 291.033 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 919 milliseconds) (per-host) I0423 13:15:08.598018 137380724926272 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.919416s (batch_requests_ready=0.908958s, total_serialization_initiated=0.010394s, others=0.000064s) I0423 13:15:08.598153 137380724926272 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.923780s (all_items=0.000014s, per_item={'model_params': '0.00001121', 'optimizer_state': '0.00000286'}, temp_paths=0.923766) I0423 13:15:08.599126 137230915991296 async_checkpointer.py:79] [process=6][thread=async_save] Background save thread started. I0423 13:15:08.599284 137380724926272 async_checkpointer.py:561] Finished blocking save. Time taken: 1.205851s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5. I0423 13:15:08.627909 137380724926272 checkpoint_manager.py:1549] [process=6][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize I0423 13:15:08.628219 137231404787456 async_checkpointer.py:265] [process=6][thread=save_finalize] Waiting for background save thread=async_save. 
I0423 13:15:08.628404 137380724926272 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776950106.3459198, 'wait_for_prev_duration_secs': 1.0458705425262451, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776950107.3934033, 'checkpointer_blocking_duration_secs': 1.2059948444366455, 'get_old_steps_start_time': 1776950108.5994186, 'get_old_steps_duration_secs': 8.487701416015625e-05, 'checkpoint_manager_blocking_start_time': 1776950094.6191497, 'checkpoint_manager_blocking_duration_secs': 14.00921893119812} I0423 13:15:08.628596 137380724926272 checkpoint_manager.py:1994] [process=6][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. 
I0423 13:15:13.152869 137281469163264 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_6 I0423 13:15:13.154114 137230915991296 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 48.866 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:15:13.217005 137231396394752 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124550/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260423_124550_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_6 I0423 13:15:13.218245 137230915991296 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 96.650 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host) I0423 13:15:13.218383 137230915991296 async_checkpointer.py:90] [process=6][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.619140s. I0423 13:15:23.971322 137230915991296 async_checkpointer.py:144] [process=6][thread=async_save] Background save thread done. Time taken: 15.372058s. I0423 13:15:23.971626 137231404787456 async_checkpointer.py:273] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save. I0423 13:15:23.971758 137231404787456 async_checkpointer.py:283] [process=6][thread=save_finalize] No errors found in background save thread=async_save. I0423 13:15:23.971807 137231404787456 checkpoint_manager.py:2103] [process=6][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts... 
I0423 13:15:23.973335 137231404787456 checkpoint_manager.py:2112] [process=6][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts. I0423 13:15:23.973534 137380724926272 checkpoint_manager.py:2006] [process=6][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5. I0423 13:15:23.973663 137380724926272 train_distill.py:724] Final checkpoint saved. I0423 13:15:23.976142 137380724926272 peft_trainer.py:474] Train step 5 training loss: 15.987230 - training perplexity: 8773359.000000 I0423 13:15:23.976613 137380724926272 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0423 13:15:23.976699 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb6553c900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56e02840>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56e01eb0>, _write_futures=[]) I0423 13:15:23.976755 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb6553c900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56e02840>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56e01eb0>, _write_futures=[]) I0423 13:15:23.976792 137380724926272 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=137380724926272 count=1 at 0x7cdb6553c900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7cdb56e02840>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7cdb56e01eb0>, _write_futures=[]) 
I0423 13:15:23.976846 137380724926272 train_distill.py:734] Distillation Complete.
I0423 13:15:24.119185 137237316478720 grain_pool.py:547] Shutting down multiprocessing system.
I0423 13:15:26.091928 137237316478720 grain_pool.py:542] Grain pool is exiting.
I0423 13:15:26.092033 137237316478720 grain_pool.py:547] Shutting down multiprocessing system.
I0423 13:15:26.092109 137237316478720 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Thu Apr 23 13:15:36 UTC 2026
EXIT_CODE=0