MaxView

Log Summary

XPK Start: Wed Apr 22 13:05:38 UTC 2026
2026-04-22 13:05:55.805195: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0422 13:05:59.430211 134234063824704 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-22 13:06:08,469:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0422 13:06:08.469258 134234063824704 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-22 13:06:08,471:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-v2uxs-slice-job-0-0.mt-07-distill-smoke-v2uxs:8482
I0422 13:06:08.471575 134234063824704 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-v2uxs-slice-job-0-0.mt-07-distill-smoke-v2uxs:8482
I0422 13:06:09.715779 134234063824704 max_utils.py:284] Jax distributed system initialized!
I0422 13:06:15.003356 134234063824704 max_utils.py:244] Jax distributed system is already initialized.
I0422 13:06:15.475959 134234063824704 max_utils.py:244] Jax distributed system is already initialized.
I0422 13:06:15.477457 134234063824704 pyconfig.py:432] Config param abort_on_inf_loss: True
I0422 13:06:15.477519 134234063824704 pyconfig.py:432] Config param abort_on_nan_loss: True
I0422 13:06:15.477560 134234063824704 pyconfig.py:432] Config param act_quantization_calibration_method: absmax
I0422 13:06:15.477591 134234063824704 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0
I0422 13:06:15.477620 134234063824704 pyconfig.py:432] Config param activation_function_for_audio: gelu
I0422 13:06:15.477647 134234063824704 pyconfig.py:432] Config param activations_in_float32: False
I0422 13:06:15.477674 134234063824704 pyconfig.py:432] Config param adam_b1: 0.9
I0422 13:06:15.477703 134234063824704 pyconfig.py:432] Config param adam_b2: 0.95
I0422 13:06:15.477728 134234063824704 pyconfig.py:432] Config param adam_eps: 1e-08
I0422 13:06:15.477760 134234063824704 pyconfig.py:432] Config param adam_eps_root: 0.0
I0422 13:06:15.477785 134234063824704 pyconfig.py:432] Config param adam_weight_decay: 0.1
I0422 13:06:15.477810 134234063824704 pyconfig.py:432] Config param adamw_mask: []
I0422 13:06:15.477834 134234063824704 pyconfig.py:432] Config param add_bos: True
I0422 13:06:15.477859 134234063824704 pyconfig.py:432] Config param add_eos: True
I0422 13:06:15.477882 134234063824704 pyconfig.py:432] Config param allow_split_physical_axes: False
I0422 13:06:15.477906 134234063824704 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3
I0422 13:06:15.477931 134234063824704 pyconfig.py:432] Config param async_checkpointing: True
I0422 13:06:15.477954 134234063824704 pyconfig.py:432] Config param async_scheduling: False
I0422 13:06:15.477978 134234063824704 pyconfig.py:432] Config param attention: dot_product
I0422 13:06:15.478001 134234063824704 pyconfig.py:432] Config param attention_bias: False
I0422 13:06:15.478026 134234063824704 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0
I0422 13:06:15.478053 134234063824704 pyconfig.py:432] Config param attention_out: RematLocation.REMAT
I0422 13:06:15.478084 134234063824704 pyconfig.py:432] Config param attention_output_dim: -1
I0422 13:06:15.478123 134234063824704 pyconfig.py:432] Config param attention_sink: False
I0422 13:06:15.478141 134234063824704 pyconfig.py:432] Config param attention_type: global
I0422 13:06:15.478157 134234063824704 pyconfig.py:432] Config param attn_logits_soft_cap: None
I0422 13:06:15.478176 134234063824704 pyconfig.py:432] Config param audio_path: 
I0422 13:06:15.478192 134234063824704 pyconfig.py:432] Config param audio_placeholder: <|audio|>
I0422 13:06:15.478209 134234063824704 pyconfig.py:432] Config param autoregressive_decode_assert: 
I0422 13:06:15.478225 134234063824704 pyconfig.py:432] Config param base_config: base.yml
I0422 13:06:15.478242 134234063824704 pyconfig.py:432] Config param base_emb_dim: 16
I0422 13:06:15.478258 134234063824704 pyconfig.py:432] Config param base_mlp_dim: 64
I0422 13:06:15.478278 134234063824704 pyconfig.py:432] Config param base_moe_mlp_dim: -1
I0422 13:06:15.478304 134234063824704 pyconfig.py:432] Config param base_num_decoder_layers: 1
I0422 13:06:15.478332 134234063824704 pyconfig.py:432] Config param base_num_kv_heads: 2
I0422 13:06:15.478359 134234063824704 pyconfig.py:432] Config param base_num_query_heads: 2
I0422 13:06:15.478386 134234063824704 pyconfig.py:432] Config param base_output_directory: 
I0422 13:06:15.478411 134234063824704 pyconfig.py:432] Config param batch_size: 1
I0422 13:06:15.478435 134234063824704 pyconfig.py:432] Config param batch_split_factor: 1
I0422 13:06:15.478461 134234063824704 pyconfig.py:432] Config param beta_fast: 32
I0422 13:06:15.478486 134234063824704 pyconfig.py:432] Config param beta_slow: 1
I0422 13:06:15.478508 134234063824704 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax
I0422 13:06:15.478524 134234063824704 pyconfig.py:432] Config param capacity_factor: -1.0
I0422 13:06:15.478550 134234063824704 pyconfig.py:432] Config param cast_logits_to_fp32: True
I0422 13:06:15.478577 134234063824704 pyconfig.py:432] Config param chat_template: 
I0422 13:06:15.478600 134234063824704 pyconfig.py:432] Config param chat_template_path: 
I0422 13:06:15.478624 134234063824704 pyconfig.py:432] Config param checkpoint_conversion_fn: None
I0422 13:06:15.478643 134234063824704 pyconfig.py:432] Config param checkpoint_dir: None
I0422 13:06:15.478662 134234063824704 pyconfig.py:432] Config param checkpoint_is_quantized: False
I0422 13:06:15.478679 134234063824704 pyconfig.py:432] Config param checkpoint_period: 2000
I0422 13:06:15.478700 134234063824704 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96
I0422 13:06:15.478725 134234063824704 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0422 13:06:15.478752 134234063824704 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True
I0422 13:06:15.478778 134234063824704 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True
I0422 13:06:15.478805 134234063824704 pyconfig.py:432] Config param checkpoint_todelete_full_path: None
I0422 13:06:15.478830 134234063824704 pyconfig.py:432] Config param checkpoint_todelete_subdir: None
I0422 13:06:15.478852 134234063824704 pyconfig.py:432] Config param chips_per_vm: 4
I0422 13:06:15.478874 134234063824704 pyconfig.py:432] Config param chunk_attn_window_size: 0
I0422 13:06:15.478897 134234063824704 pyconfig.py:432] Config param collect_stack_trace: False
I0422 13:06:15.478922 134234063824704 pyconfig.py:432] Config param colocated_python_checkpointing: False
I0422 13:06:15.478947 134234063824704 pyconfig.py:432] Config param colocated_python_data_input: False
I0422 13:06:15.478972 134234063824704 pyconfig.py:432] Config param compile_topology: 
I0422 13:06:15.478993 134234063824704 pyconfig.py:432] Config param compile_topology_num_slices: -1
I0422 13:06:15.479015 134234063824704 pyconfig.py:432] Config param compile_xla_flags: 
I0422 13:06:15.479035 134234063824704 pyconfig.py:432] Config param compiled_trainstep_file: 
I0422 13:06:15.479052 134234063824704 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3
I0422 13:06:15.479068 134234063824704 pyconfig.py:432] Config param constant_bound_config: []
I0422 13:06:15.479082 134234063824704 pyconfig.py:432] Config param context: RematLocation.REMAT
I0422 13:06:15.479112 134234063824704 pyconfig.py:432] Config param context_parallel_load_balance: True
I0422 13:06:15.479139 134234063824704 pyconfig.py:432] Config param context_parallel_size: 1
I0422 13:06:15.479162 134234063824704 pyconfig.py:432] Config param context_parallel_strategy: all_gather
I0422 13:06:15.479185 134234063824704 pyconfig.py:432] Config param context_sharding: context
I0422 13:06:15.479209 134234063824704 pyconfig.py:432] Config param conv_chunksize_for_audio: 500
I0422 13:06:15.479235 134234063824704 pyconfig.py:432] Config param conv_stride_for_vit: 14
I0422 13:06:15.479258 134234063824704 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1
I0422 13:06:15.479282 134234063824704 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1
I0422 13:06:15.479307 134234063824704 pyconfig.py:432] Config param custom_mesh: 
I0422 13:06:15.479330 134234063824704 pyconfig.py:432] Config param custom_mesh_and_rule: 
I0422 13:06:15.479354 134234063824704 pyconfig.py:432] Config param d_model_for_audio: 256
I0422 13:06:15.479379 134234063824704 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0422 13:06:15.479411 134234063824704 pyconfig.py:432] Config param data_shuffle_seed: 0
I0422 13:06:15.479437 134234063824704 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1
I0422 13:06:15.479461 134234063824704 pyconfig.py:432] Config param dataset_path: 
I0422 13:06:15.479519 134234063824704 pyconfig.py:432] Config param dataset_type: DatasetType.HF
I0422 13:06:15.479552 134234063824704 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1
I0422 13:06:15.479579 134234063824704 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1
I0422 13:06:15.479603 134234063824704 pyconfig.py:432] Config param dcn_context_parallelism: 1
I0422 13:06:15.479624 134234063824704 pyconfig.py:432] Config param dcn_data_parallelism: -1
I0422 13:06:15.479647 134234063824704 pyconfig.py:432] Config param dcn_diloco_parallelism: 1
I0422 13:06:15.479671 134234063824704 pyconfig.py:432] Config param dcn_expert_parallelism: 1
I0422 13:06:15.479695 134234063824704 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1
I0422 13:06:15.479719 134234063824704 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1
I0422 13:06:15.479743 134234063824704 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0422 13:06:15.479766 134234063824704 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1
I0422 13:06:15.479789 134234063824704 pyconfig.py:432] Config param dcn_sequence_parallelism: 1
I0422 13:06:15.479812 134234063824704 pyconfig.py:432] Config param dcn_tensor_parallelism: 1
I0422 13:06:15.479834 134234063824704 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1
I0422 13:06:15.479856 134234063824704 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1
I0422 13:06:15.479878 134234063824704 pyconfig.py:432] Config param debug: {'rl': False}
I0422 13:06:15.479895 134234063824704 pyconfig.py:432] Config param debug_sharding: False
I0422 13:06:15.479910 134234063824704 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1
I0422 13:06:15.479926 134234063824704 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0422 13:06:15.479943 134234063824704 pyconfig.py:432] Config param decode_sampling_temperature: 1.0
I0422 13:06:15.479958 134234063824704 pyconfig.py:432] Config param decode_sampling_top_k: 0
I0422 13:06:15.479974 134234063824704 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3
I0422 13:06:15.479990 134234063824704 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE
I0422 13:06:15.480007 134234063824704 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: []
I0422 13:06:15.480023 134234063824704 pyconfig.py:432] Config param degenerate_group_masking: True
I0422 13:06:15.480038 134234063824704 pyconfig.py:432] Config param dense_init_scale: 1.0
I0422 13:06:15.480053 134234063824704 pyconfig.py:432] Config param diloco_outer_lr: 0.3
I0422 13:06:15.480070 134234063824704 pyconfig.py:432] Config param diloco_outer_momentum: 0.9
I0422 13:06:15.480086 134234063824704 pyconfig.py:432] Config param diloco_sync_period: 36
I0422 13:06:15.480116 134234063824704 pyconfig.py:432] Config param distill_alpha: 0.5
I0422 13:06:15.480132 134234063824704 pyconfig.py:432] Config param distill_alpha_end: None
I0422 13:06:15.480148 134234063824704 pyconfig.py:432] Config param distill_alpha_schedule: constant
I0422 13:06:15.480166 134234063824704 pyconfig.py:432] Config param distill_beta: 0.0
I0422 13:06:15.480180 134234063824704 pyconfig.py:432] Config param distill_beta_end: None
I0422 13:06:15.480197 134234063824704 pyconfig.py:432] Config param distill_beta_schedule: constant
I0422 13:06:15.480212 134234063824704 pyconfig.py:432] Config param distill_feature_loss_type: cosine
I0422 13:06:15.480228 134234063824704 pyconfig.py:432] Config param distill_layer_indices: None
I0422 13:06:15.480242 134234063824704 pyconfig.py:432] Config param distill_temperature: 1.0
I0422 13:06:15.480258 134234063824704 pyconfig.py:432] Config param distill_temperature_end: None
I0422 13:06:15.480273 134234063824704 pyconfig.py:432] Config param distill_temperature_schedule: constant
I0422 13:06:15.480288 134234063824704 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256
I0422 13:06:15.480304 134234063824704 pyconfig.py:432] Config param dpo_beta: 0.1
I0422 13:06:15.480319 134234063824704 pyconfig.py:432] Config param dpo_label_smoothing: 0.0
I0422 13:06:15.480335 134234063824704 pyconfig.py:432] Config param dq_reduction_steps: 0
I0422 13:06:15.480349 134234063824704 pyconfig.py:432] Config param dropout_rate: 0.0
I0422 13:06:15.480365 134234063824704 pyconfig.py:432] Config param dtype: bfloat16
I0422 13:06:15.480396 134234063824704 pyconfig.py:432] Config param dtype_mm: float32
I0422 13:06:15.480412 134234063824704 pyconfig.py:432] Config param dump_hlo: False
I0422 13:06:15.480427 134234063824704 pyconfig.py:432] Config param dump_hlo_delete_local_after: True
I0422 13:06:15.480442 134234063824704 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-22-13-06/xla_dump
I0422 13:06:15.480458 134234063824704 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0422 13:06:15.480472 134234063824704 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step
I0422 13:06:15.480488 134234063824704 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step
I0422 13:06:15.480503 134234063824704 pyconfig.py:432] Config param dump_hlo_upload_all: False
I0422 13:06:15.480526 134234063824704 pyconfig.py:432] Config param dump_hlo_xla_flags: 
I0422 13:06:15.480554 134234063824704 pyconfig.py:432] Config param dump_jaxpr: False
I0422 13:06:15.480570 134234063824704 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True
I0422 13:06:15.480592 134234063824704 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-22-13-06/jaxpr_dump
I0422 13:06:15.480617 134234063824704 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0422 13:06:15.480641 134234063824704 pyconfig.py:432] Config param dump_step: -1
I0422 13:06:15.480661 134234063824704 pyconfig.py:432] Config param elastic_enabled: False
I0422 13:06:15.480677 134234063824704 pyconfig.py:432] Config param elastic_max_retries: 10
I0422 13:06:15.480696 134234063824704 pyconfig.py:432] Config param elastic_timeout_seconds: 300
I0422 13:06:15.480721 134234063824704 pyconfig.py:432] Config param emb_dim: 16
I0422 13:06:15.480742 134234063824704 pyconfig.py:432] Config param enable_autocheckpoint: False
I0422 13:06:15.480764 134234063824704 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False
I0422 13:06:15.480789 134234063824704 pyconfig.py:432] Config param enable_checkpointing: True
I0422 13:06:15.480814 134234063824704 pyconfig.py:432] Config param enable_continuous_checkpointing: False
I0422 13:06:15.480839 134234063824704 pyconfig.py:432] Config param enable_data_shuffling: True
I0422 13:06:15.480863 134234063824704 pyconfig.py:432] Config param enable_diloco: False
I0422 13:06:15.480888 134234063824704 pyconfig.py:432] Config param enable_dp_attention: False
I0422 13:06:15.480912 134234063824704 pyconfig.py:432] Config param enable_dropout: False
I0422 13:06:15.480937 134234063824704 pyconfig.py:432] Config param enable_emergency_checkpoint: False
I0422 13:06:15.480961 134234063824704 pyconfig.py:432] Config param enable_expert_parallel: False
I0422 13:06:15.480985 134234063824704 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True
I0422 13:06:15.481010 134234063824704 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True
I0422 13:06:15.481034 134234063824704 pyconfig.py:432] Config param enable_goodput_recording: False
I0422 13:06:15.481056 134234063824704 pyconfig.py:432] Config param enable_jax_profiler: False
I0422 13:06:15.481081 134234063824704 pyconfig.py:432] Config param enable_llm_inference_pool: False
I0422 13:06:15.481116 134234063824704 pyconfig.py:432] Config param enable_model_warmup: False
I0422 13:06:15.481142 134234063824704 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False
I0422 13:06:15.481166 134234063824704 pyconfig.py:432] Config param enable_nnx: False
I0422 13:06:15.481191 134234063824704 pyconfig.py:432] Config param enable_orbax_v1: False
I0422 13:06:15.481216 134234063824704 pyconfig.py:432] Config param enable_padding_causal_mask: True
I0422 13:06:15.481240 134234063824704 pyconfig.py:432] Config param enable_pathways_goodput: False
I0422 13:06:15.481264 134234063824704 pyconfig.py:432] Config param enable_prefix_caching: False
I0422 13:06:15.481287 134234063824704 pyconfig.py:432] Config param enable_rampup_batch_size: False
I0422 13:06:15.481309 134234063824704 pyconfig.py:432] Config param enable_single_controller: False
I0422 13:06:15.481330 134234063824704 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False
I0422 13:06:15.481353 134234063824704 pyconfig.py:432] Config param enable_tensorboard: True
I0422 13:06:15.481375 134234063824704 pyconfig.py:432] Config param enable_tunix_perf_metrics: False
I0422 13:06:15.481396 134234063824704 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4
I0422 13:06:15.481419 134234063824704 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512
I0422 13:06:15.481441 134234063824704 pyconfig.py:432] Config param encoder_layers_for_audio: 2
I0422 13:06:15.481463 134234063824704 pyconfig.py:432] Config param engram: RematLocation.REMAT
I0422 13:06:15.481487 134234063824704 pyconfig.py:432] Config param engram_head_dim: 1280
I0422 13:06:15.481509 134234063824704 pyconfig.py:432] Config param engram_kernel_size: 4
I0422 13:06:15.481540 134234063824704 pyconfig.py:432] Config param engram_layers: []
I0422 13:06:15.481563 134234063824704 pyconfig.py:432] Config param engram_max_ngram_size: 3
I0422 13:06:15.481587 134234063824704 pyconfig.py:432] Config param engram_num_heads: 8
I0422 13:06:15.481608 134234063824704 pyconfig.py:432] Config param engram_seed: 0
I0422 13:06:15.481631 134234063824704 pyconfig.py:432] Config param engram_vocab_bases: []
I0422 13:06:15.481655 134234063824704 pyconfig.py:432] Config param epsilon_high: None
I0422 13:06:15.481678 134234063824704 pyconfig.py:432] Config param eval_corr_lst: False
I0422 13:06:15.481700 134234063824704 pyconfig.py:432] Config param eval_data_columns: ['text']
I0422 13:06:15.481724 134234063824704 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1
I0422 13:06:15.481747 134234063824704 pyconfig.py:432] Config param eval_image_column: image
I0422 13:06:15.481771 134234063824704 pyconfig.py:432] Config param eval_interval: -1
I0422 13:06:15.481793 134234063824704 pyconfig.py:432] Config param eval_make_lst: False
I0422 13:06:15.481816 134234063824704 pyconfig.py:432] Config param eval_per_device_batch_size: 2
I0422 13:06:15.481839 134234063824704 pyconfig.py:432] Config param eval_sampling_strategy: greedy
I0422 13:06:15.481862 134234063824704 pyconfig.py:432] Config param eval_split: validation
I0422 13:06:15.481885 134234063824704 pyconfig.py:432] Config param eval_steps: -1
I0422 13:06:15.481908 134234063824704 pyconfig.py:432] Config param expansion_factor_real_data: -1.0
I0422 13:06:15.481932 134234063824704 pyconfig.py:432] Config param final_logits_soft_cap: None
I0422 13:06:15.481955 134234063824704 pyconfig.py:432] Config param first_num_dense_layers: 0
I0422 13:06:15.481978 134234063824704 pyconfig.py:432] Config param float32_gate_logits: False
I0422 13:06:15.482001 134234063824704 pyconfig.py:432] Config param float32_logits: False
I0422 13:06:15.482023 134234063824704 pyconfig.py:432] Config param float32_qk_product: False
I0422 13:06:15.482046 134234063824704 pyconfig.py:432] Config param float32_weight_sum: True
I0422 13:06:15.482069 134234063824704 pyconfig.py:432] Config param force_q_layout: False
I0422 13:06:15.482107 134234063824704 pyconfig.py:432] Config param force_unroll: False
I0422 13:06:15.482132 134234063824704 pyconfig.py:432] Config param freeze_audio_encoder_params: True
I0422 13:06:15.482155 134234063824704 pyconfig.py:432] Config param freeze_vision_encoder_params: True
I0422 13:06:15.482178 134234063824704 pyconfig.py:432] Config param fused_mlp: False
I0422 13:06:15.482202 134234063824704 pyconfig.py:432] Config param fused_qkv: True
I0422 13:06:15.482225 134234063824704 pyconfig.py:432] Config param gcs_metrics: False
I0422 13:06:15.482248 134234063824704 pyconfig.py:432] Config param gdn_chunk_size: 64
I0422 13:06:15.482271 134234063824704 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4
I0422 13:06:15.482294 134234063824704 pyconfig.py:432] Config param gdn_key_head_dim: 128
I0422 13:06:15.482317 134234063824704 pyconfig.py:432] Config param gdn_num_key_heads: 16
I0422 13:06:15.482340 134234063824704 pyconfig.py:432] Config param gdn_num_value_heads: 32
I0422 13:06:15.482363 134234063824704 pyconfig.py:432] Config param gdn_value_head_dim: 128
I0422 13:06:15.482386 134234063824704 pyconfig.py:432] Config param generate_padding_batch_eval: False
I0422 13:06:15.482409 134234063824704 pyconfig.py:432] Config param generate_padding_batch_train: False
I0422 13:06:15.482432 134234063824704 pyconfig.py:432] Config param generate_slice: v5e-16
I0422 13:06:15.482455 134234063824704 pyconfig.py:432] Config param generation_configs: {}
I0422 13:06:15.482479 134234063824704 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64
I0422 13:06:15.482502 134234063824704 pyconfig.py:432] Config param global_batch_size_to_load: 512
I0422 13:06:15.482524 134234063824704 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64
I0422 13:06:15.482553 134234063824704 pyconfig.py:432] Config param global_batch_size_to_load_increment: None
I0422 13:06:15.482578 134234063824704 pyconfig.py:432] Config param global_batch_size_to_load_start: None
I0422 13:06:15.482604 134234063824704 pyconfig.py:432] Config param global_batch_size_to_train_on: 512
I0422 13:06:15.482627 134234063824704 pyconfig.py:432] Config param global_head_dim: 0
I0422 13:06:15.482650 134234063824704 pyconfig.py:432] Config param global_num_kv_heads: 0
I0422 13:06:15.482675 134234063824704 pyconfig.py:432] Config param global_parameter_scale: 1
I0422 13:06:15.482702 134234063824704 pyconfig.py:432] Config param global_rampup_samples: 500
I0422 13:06:15.482726 134234063824704 pyconfig.py:432] Config param global_rope_max_timescale: -1
I0422 13:06:15.482751 134234063824704 pyconfig.py:432] Config param global_rope_proportion: 0.25
I0422 13:06:15.482778 134234063824704 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30
I0422 13:06:15.482802 134234063824704 pyconfig.py:432] Config param grad_dtype: float32
I0422 13:06:15.482852 134234063824704 pyconfig.py:432] Config param gradient_accumulation_steps: 8
I0422 13:06:15.482877 134234063824704 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0
I0422 13:06:15.482902 134234063824704 pyconfig.py:432] Config param grain_data_source_max_workers: 16
I0422 13:06:15.482927 134234063824704 pyconfig.py:432] Config param grain_eval_files: 
I0422 13:06:15.482952 134234063824704 pyconfig.py:432] Config param grain_file_type: arrayrecord
I0422 13:06:15.482976 134234063824704 pyconfig.py:432] Config param grain_num_threads: 16
I0422 13:06:15.483001 134234063824704 pyconfig.py:432] Config param grain_num_threads_eval: 16
I0422 13:06:15.483025 134234063824704 pyconfig.py:432] Config param grain_packing_type: first_fit
I0422 13:06:15.483050 134234063824704 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1
I0422 13:06:15.483076 134234063824704 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1
I0422 13:06:15.483112 134234063824704 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500
I0422 13:06:15.483139 134234063824704 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500
I0422 13:06:15.483162 134234063824704 pyconfig.py:432] Config param grain_ram_budget_mb: 1024
I0422 13:06:15.483186 134234063824704 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100
I0422 13:06:15.483211 134234063824704 pyconfig.py:432] Config param grain_train_files: 
I0422 13:06:15.483235 134234063824704 pyconfig.py:432] Config param grain_train_mixture_config_path: 
I0422 13:06:15.483260 134234063824704 pyconfig.py:432] Config param grain_worker_count: 1
I0422 13:06:15.483284 134234063824704 pyconfig.py:432] Config param grain_worker_count_eval: 1
I0422 13:06:15.483309 134234063824704 pyconfig.py:432] Config param grpo_beta: 0.08
I0422 13:06:15.483334 134234063824704 pyconfig.py:432] Config param grpo_epsilon: 0.2
I0422 13:06:15.483360 134234063824704 pyconfig.py:432] Config param hardware: tpu
I0422 13:06:15.483384 134234063824704 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72
I0422 13:06:15.483410 134234063824704 pyconfig.py:432] Config param head_dim: 8
I0422 13:06:15.483435 134234063824704 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5
I0422 13:06:15.483460 134234063824704 pyconfig.py:432] Config param hf_data_dir: None
I0422 13:06:15.483483 134234063824704 pyconfig.py:432] Config param hf_eval_files: None
I0422 13:06:15.483509 134234063824704 pyconfig.py:432] Config param hf_eval_split: None
I0422 13:06:15.483538 134234063824704 pyconfig.py:432] Config param hf_name: None
I0422 13:06:15.483563 134234063824704 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix
I0422 13:06:15.483587 134234063824704 pyconfig.py:432] Config param hf_train_files: None
I0422 13:06:15.483613 134234063824704 pyconfig.py:432] Config param hidden_size_for_vit: 1408
I0422 13:06:15.483637 134234063824704 pyconfig.py:432] Config param hide_profiler_step_metric: False
I0422 13:06:15.483662 134234063824704 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1
I0422 13:06:15.483687 134234063824704 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1
I0422 13:06:15.483712 134234063824704 pyconfig.py:432] Config param ici_context_parallelism: 1
I0422 13:06:15.483736 134234063824704 pyconfig.py:432] Config param ici_data_parallelism: 1
I0422 13:06:15.483763 134234063824704 pyconfig.py:432] Config param ici_diloco_parallelism: 1
I0422 13:06:15.483788 134234063824704 pyconfig.py:432] Config param ici_expert_parallelism: 1
I0422 13:06:15.483814 134234063824704 pyconfig.py:432] Config param ici_fsdp_parallelism: -1
I0422 13:06:15.483839 134234063824704 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1
I0422 13:06:15.483865 134234063824704 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0422 13:06:15.483893 134234063824704 pyconfig.py:432] Config param ici_pipeline_parallelism: 1
I0422 13:06:15.483918 134234063824704 pyconfig.py:432] Config param ici_sequence_parallelism: 1
I0422 13:06:15.483942 134234063824704 pyconfig.py:432] Config param ici_tensor_parallelism: 1
I0422 13:06:15.483967 134234063824704 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1
I0422 13:06:15.483991 134234063824704 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1
I0422 13:06:15.484016 134234063824704 pyconfig.py:432] Config param image_path: 
I0422 13:06:15.484041 134234063824704 pyconfig.py:432] Config param image_placeholder: <|image|>
I0422 13:06:15.484066 134234063824704 pyconfig.py:432] Config param image_size_for_vit: 896
I0422 13:06:15.484090 134234063824704 pyconfig.py:432] Config param indexer_head_dim: 128
I0422 13:06:15.484124 134234063824704 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0
I0422 13:06:15.484148 134234063824704 pyconfig.py:432] Config param indexer_n_heads: 64
I0422 13:06:15.484173 134234063824704 pyconfig.py:432] Config param indexer_sparse_training: False
I0422 13:06:15.484198 134234063824704 pyconfig.py:432] Config param indexer_topk: 2048
I0422 13:06:15.484222 134234063824704 pyconfig.py:432] Config param inference_benchmark_test: False
I0422 13:06:15.484246 134234063824704 pyconfig.py:432] Config param inference_metadata_file: 
I0422 13:06:15.484271 134234063824704 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: 
I0422 13:06:15.484294 134234063824704 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10
I0422 13:06:15.484318 134234063824704 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0422 13:06:15.484344 134234063824704 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0422 13:06:15.484369 134234063824704 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate
I0422 13:06:15.484394 134234063824704 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer
I0422 13:06:15.484416 134234063824704 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1
I0422 13:06:15.484438 134234063824704 pyconfig.py:432] Config param init_weights_seed: 0
I0422 13:06:15.484459 134234063824704 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0422 13:06:15.484486 134234063824704 pyconfig.py:432] Config param interleave_moe_layer_step: 1
I0422 13:06:15.484510 134234063824704 pyconfig.py:432] Config param intermediate_size_for_vit: 5632
I0422 13:06:15.484540 134234063824704 pyconfig.py:432] Config param internal_compile: False
I0422 13:06:15.484563 134234063824704 pyconfig.py:432] Config param internal_compile_num_devices: -1
I0422 13:06:15.484585 134234063824704 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache
I0422 13:06:15.484603 134234063824704 pyconfig.py:432] Config param jax_debug_log_modules: 
I0422 13:06:15.484618 134234063824704 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300
I0422 13:06:15.484639 134234063824704 pyconfig.py:432] Config param jax_profiler_port: 9999
I0422 13:06:15.484663 134234063824704 pyconfig.py:432] Config param key_proj: RematLocation.REMAT
I0422 13:06:15.484690 134234063824704 pyconfig.py:432] Config param kv_cache_buffer: 256
I0422 13:06:15.484714 134234063824704 pyconfig.py:432] Config param kv_lora_rank: 512
I0422 13:06:15.484738 134234063824704 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0422 13:06:15.484765 134234063824704 pyconfig.py:432] Config param kv_quant_dtype: int8
I0422 13:06:15.484790 134234063824704 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT
I0422 13:06:15.484815 134234063824704 pyconfig.py:432] Config param learning_rate: 0.0002
I0422 13:06:15.484841 134234063824704 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1
I0422 13:06:15.484866 134234063824704 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000
I0422 13:06:15.484891 134234063824704 pyconfig.py:432] Config param load_balance_loss_weight: 0.0
I0422 13:06:15.484916 134234063824704 pyconfig.py:432] Config param load_checkpoint_only_once: False
I0422 13:06:15.484940 134234063824704 pyconfig.py:432] Config param load_from_prefill_dir: False
I0422 13:06:15.484964 134234063824704 pyconfig.py:432] Config param load_full_state_path: 
I0422 13:06:15.484988 134234063824704 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0422 13:06:15.485013 134234063824704 pyconfig.py:432] Config param local_checkpoint_directory: 
I0422 13:06:15.485037 134234063824704 pyconfig.py:432] Config param local_checkpoint_period: 0
I0422 13:06:15.485060 134234063824704 pyconfig.py:432] Config param local_rope_max_timescale: -1
I0422 13:06:15.485084 134234063824704 pyconfig.py:432] Config param local_rope_proportion: 1.0
I0422 13:06:15.485120 134234063824704 pyconfig.py:432] Config param log_config: True
I0422 13:06:15.485144 134234063824704 pyconfig.py:432] Config param log_period: 10
I0422 13:06:15.485168 134234063824704 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0422 13:06:15.485260 134234063824704 pyconfig.py:432] Config param logits_dot_in_fp32: False
I0422 13:06:15.485278 134234063824704 pyconfig.py:432] Config param logits_via_embedding: True
I0422 13:06:15.485296 134234063824704 pyconfig.py:432] Config param lora_input_adapters_path: 
I0422 13:06:15.485316 134234063824704 pyconfig.py:432] Config param loss_algo: grpo
I0422 13:06:15.485341 134234063824704 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0422 13:06:15.485366 134234063824704 pyconfig.py:432] Config param managed_mldiagnostics: False
I0422 13:06:15.485388 134234063824704 pyconfig.py:432] Config param managed_mldiagnostics_dir: None
I0422 13:06:15.485405 134234063824704 pyconfig.py:432] Config param managed_mldiagnostics_run_group: 
I0422 13:06:15.485421 134234063824704 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT
I0422 13:06:15.485440 134234063824704 pyconfig.py:432] Config param max_checkify: False
I0422 13:06:15.485456 134234063824704 pyconfig.py:432] Config param max_concurrency: 256
I0422 13:06:15.485471 134234063824704 pyconfig.py:432] Config param max_corpus_chars: 10000000
I0422 13:06:15.485491 134234063824704 pyconfig.py:432] Config param max_num_batched_tokens: None
I0422 13:06:15.485516 134234063824704 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None
I0422 13:06:15.485542 134234063824704 pyconfig.py:432] Config param max_num_images_per_example: -1
I0422 13:06:15.485564 134234063824704 pyconfig.py:432] Config param max_num_seqs: None
I0422 13:06:15.485583 134234063824704 pyconfig.py:432] Config param max_position_embeddings: 163840
I0422 13:06:15.485598 134234063824704 pyconfig.py:432] Config param max_prefill_predict_length: 64
I0422 13:06:15.485617 134234063824704 pyconfig.py:432] Config param max_sample_len_for_audio: 10000
I0422 13:06:15.485642 134234063824704 pyconfig.py:432] Config param max_segments_per_seq: -1
I0422 13:06:15.485665 134234063824704 pyconfig.py:432] Config param max_source_positions_for_audio: 1500
I0422 13:06:15.485689 134234063824704 pyconfig.py:432] Config param max_target_length: 2048
I0422 13:06:15.485713 134234063824704 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0
I0422 13:06:15.485739 134234063824704 pyconfig.py:432] Config param megablox: True
I0422 13:06:15.485765 134234063824704 pyconfig.py:432] Config param merge_gating_gmm: False
I0422 13:06:15.485790 134234063824704 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0422 13:06:15.485817 134234063824704 pyconfig.py:432] Config param metrics_dir: None
I0422 13:06:15.485839 134234063824704 pyconfig.py:432] Config param metrics_file: 
I0422 13:06:15.485862 134234063824704 pyconfig.py:432] Config param mhc_expansion_rate: 1
I0422 13:06:15.485885 134234063824704 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64
I0422 13:06:15.485909 134234063824704 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64
I0422 13:06:15.485933 134234063824704 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT
I0422 13:06:15.485959 134234063824704 pyconfig.py:432] Config param mla_naive_kvcache: True
I0422 13:06:15.485984 134234063824704 pyconfig.py:432] Config param mla_q: RematLocation.REMAT
I0422 13:06:15.486009 134234063824704 pyconfig.py:432] Config param mlp_activations: ['gelu']
I0422 13:06:15.486034 134234063824704 pyconfig.py:432] Config param mlp_activations_limit: -1.0
I0422 13:06:15.486059 134234063824704 pyconfig.py:432] Config param mlp_bias: False
I0422 13:06:15.486084 134234063824704 pyconfig.py:432] Config param mlp_dim: 64
I0422 13:06:15.486120 134234063824704 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT
I0422 13:06:15.486145 134234063824704 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT
I0422 13:06:15.486170 134234063824704 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT
I0422 13:06:15.486196 134234063824704 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT
I0422 13:06:15.486221 134234063824704 pyconfig.py:432] Config param moba: False
I0422 13:06:15.486245 134234063824704 pyconfig.py:432] Config param moba_chunk_size: 1024
I0422 13:06:15.486270 134234063824704 pyconfig.py:432] Config param moba_topk: 8
I0422 13:06:15.486295 134234063824704 pyconfig.py:432] Config param model_call_mode: 
I0422 13:06:15.486319 134234063824704 pyconfig.py:432] Config param model_name: gpt3-52k
I0422 13:06:15.486344 134234063824704 pyconfig.py:432] Config param moe_expert_input_dim: -1
I0422 13:06:15.486369 134234063824704 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False
I0422 13:06:15.486392 134234063824704 pyconfig.py:432] Config param moe_mlp_dim: -1
I0422 13:06:15.486417 134234063824704 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT
I0422 13:06:15.486442 134234063824704 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT
I0422 13:06:15.486467 134234063824704 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT
I0422 13:06:15.486493 134234063824704 pyconfig.py:432] Config param monitor_goodput: False
I0422 13:06:15.486518 134234063824704 pyconfig.py:432] Config param monitor_step_time_deviation: True
I0422 13:06:15.486550 134234063824704 pyconfig.py:432] Config param mrope_section: [24, 20, 20]
I0422 13:06:15.486577 134234063824704 pyconfig.py:432] Config param mscale: 1.0
I0422 13:06:15.486603 134234063824704 pyconfig.py:432] Config param mtc_data_parallelism: 0
I0422 13:06:15.486627 134234063824704 pyconfig.py:432] Config param mtp_eval_target_module: 0
I0422 13:06:15.486653 134234063824704 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1
I0422 13:06:15.486678 134234063824704 pyconfig.py:432] Config param mtp_num_layers: 0
I0422 13:06:15.486702 134234063824704 pyconfig.py:432] Config param mu_dtype: float32
I0422 13:06:15.486740 134234063824704 pyconfig.py:432] Config param multi_sampling: False
I0422 13:06:15.486765 134234063824704 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0422 13:06:15.486790 134234063824704 pyconfig.py:432] Config param muon_beta: 0.95
I0422 13:06:15.486816 134234063824704 pyconfig.py:432] Config param muon_consistent_rms: None
I0422 13:06:15.486841 134234063824704 pyconfig.py:432] Config param muon_weight_decay: 0.0
I0422 13:06:15.486866 134234063824704 pyconfig.py:432] Config param n_routing_groups: -1
I0422 13:06:15.486891 134234063824704 pyconfig.py:432] Config param n_window_for_audio: 50
I0422 13:06:15.486915 134234063824704 pyconfig.py:432] Config param n_window_infer_for_audio: 800
I0422 13:06:15.486940 134234063824704 pyconfig.py:432] Config param nope_layer_interval: -1
I0422 13:06:15.486965 134234063824704 pyconfig.py:432] Config param norm_topk_prob: False
I0422 13:06:15.486990 134234063824704 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05
I0422 13:06:15.487017 134234063824704 pyconfig.py:432] Config param normalize_embedding_logits: False
I0422 13:06:15.487043 134234063824704 pyconfig.py:432] Config param num_attention_heads_for_vit: 16
I0422 13:06:15.487068 134234063824704 pyconfig.py:432] Config param num_batches: 4
I0422 13:06:15.487104 134234063824704 pyconfig.py:432] Config param num_channels_for_vit: 3
I0422 13:06:15.487130 134234063824704 pyconfig.py:432] Config param num_conv_layers_for_audio: 3
I0422 13:06:15.487155 134234063824704 pyconfig.py:432] Config param num_decoder_layers: 1
I0422 13:06:15.487180 134234063824704 pyconfig.py:432] Config param num_diloco_replicas: 1
I0422 13:06:15.487205 134234063824704 pyconfig.py:432] Config param num_epoch: 1
I0422 13:06:15.487230 134234063824704 pyconfig.py:432] Config param num_eval_passes: 1
I0422 13:06:15.487255 134234063824704 pyconfig.py:432] Config param num_experts: 1
I0422 13:06:15.487279 134234063824704 pyconfig.py:432] Config param num_experts_per_tok: 1
I0422 13:06:15.487303 134234063824704 pyconfig.py:432] Config param num_generations: 2
I0422 13:06:15.487328 134234063824704 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34
I0422 13:06:15.487353 134234063824704 pyconfig.py:432] Config param num_iterations: 1
I0422 13:06:15.487378 134234063824704 pyconfig.py:432] Config param num_kv_heads: 2
I0422 13:06:15.487403 134234063824704 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1
I0422 13:06:15.487427 134234063824704 pyconfig.py:432] Config param num_mel_bins_for_audio: 128
I0422 13:06:15.487451 134234063824704 pyconfig.py:432] Config param num_pipeline_microbatches: -1
I0422 13:06:15.487476 134234063824704 pyconfig.py:432] Config param num_pipeline_repeats: -1
I0422 13:06:15.487500 134234063824704 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024
I0422 13:06:15.487525 134234063824704 pyconfig.py:432] Config param num_query_heads: 2
I0422 13:06:15.487554 134234063824704 pyconfig.py:432] Config param num_samplers_slices: -1
I0422 13:06:15.487579 134234063824704 pyconfig.py:432] Config param num_slices: 1
I0422 13:06:15.487603 134234063824704 pyconfig.py:432] Config param num_target_devices: 32
I0422 13:06:15.487627 134234063824704 pyconfig.py:432] Config param num_test_batches: 5
I0422 13:06:15.487652 134234063824704 pyconfig.py:432] Config param num_trainer_slices: -1
I0422 13:06:15.487677 134234063824704 pyconfig.py:432] Config param num_vocab_tiling: 1
I0422 13:06:15.487700 134234063824704 pyconfig.py:432] Config param off_policy_steps: 0
I0422 13:06:15.487725 134234063824704 pyconfig.py:432] Config param offline_data_dir: None
I0422 13:06:15.487749 134234063824704 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX
I0422 13:06:15.487778 134234063824704 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False
I0422 13:06:15.487803 134234063824704 pyconfig.py:432] Config param optimizer_memory_host_offload: False
I0422 13:06:15.487828 134234063824704 pyconfig.py:432] Config param original_max_position_embeddings: 4096
I0422 13:06:15.487855 134234063824704 pyconfig.py:432] Config param out_hidden_size_for_vit: 512
I0422 13:06:15.487880 134234063824704 pyconfig.py:432] Config param out_proj: RematLocation.REMAT
I0422 13:06:15.487905 134234063824704 pyconfig.py:432] Config param output_dim_for_audio: 512
I0422 13:06:15.487929 134234063824704 pyconfig.py:432] Config param override_logical_axis_rules: False
I0422 13:06:15.487953 134234063824704 pyconfig.py:432] Config param override_model_config: True
I0422 13:06:15.487977 134234063824704 pyconfig.py:432] Config param packing: True
I0422 13:06:15.488001 134234063824704 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128
I0422 13:06:15.488025 134234063824704 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1
I0422 13:06:15.488050 134234063824704 pyconfig.py:432] Config param pagedattn_num_pages: 64
I0422 13:06:15.488074 134234063824704 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4
I0422 13:06:15.488107 134234063824704 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32
I0422 13:06:15.488132 134234063824704 pyconfig.py:432] Config param param_scan_axis: 1
I0422 13:06:15.488156 134234063824704 pyconfig.py:432] Config param parameter_memory_host_offload: False
I0422 13:06:15.488181 134234063824704 pyconfig.py:432] Config param partial_rotary_factor: 1.0
I0422 13:06:15.488206 134234063824704 pyconfig.py:432] Config param patch_size_for_vit: 14
I0422 13:06:15.488230 134234063824704 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0
I0422 13:06:15.488255 134234063824704 pyconfig.py:432] Config param penalty_incorrect_format: -0.5
I0422 13:06:15.488280 134234063824704 pyconfig.py:432] Config param per_device_batch_size: 2
I0422 13:06:15.488305 134234063824704 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0
I0422 13:06:15.488331 134234063824704 pyconfig.py:432] Config param per_device_batch_size_start: 4.0
I0422 13:06:15.488356 134234063824704 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False
I0422 13:06:15.488381 134234063824704 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False
I0422 13:06:15.488408 134234063824704 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False
I0422 13:06:15.488433 134234063824704 pyconfig.py:432] Config param pipeline_parallel_layers: 1
I0422 13:06:15.488456 134234063824704 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5
I0422 13:06:15.488480 134234063824704 pyconfig.py:432] Config param posemb_type_for_vit: learn
I0422 13:06:15.488505 134234063824704 pyconfig.py:432] Config param position_id_per_seconds: 25
I0422 13:06:15.488530 134234063824704 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3
I0422 13:06:15.488561 134234063824704 pyconfig.py:432] Config param prefill_cache_dir: 
I0422 13:06:15.488586 134234063824704 pyconfig.py:432] Config param prefill_chunk_size: 256
I0422 13:06:15.488611 134234063824704 pyconfig.py:432] Config param prefill_slice: v5e-16
I0422 13:06:15.488636 134234063824704 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000
I0422 13:06:15.488662 134234063824704 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000
I0422 13:06:15.488687 134234063824704 pyconfig.py:432] Config param profile_cleanly: True
I0422 13:06:15.488711 134234063824704 pyconfig.py:432] Config param profile_periodically_period: -1
I0422 13:06:15.488735 134234063824704 pyconfig.py:432] Config param profile_power_events: False
I0422 13:06:15.488759 134234063824704 pyconfig.py:432] Config param profiler: ProfilerType.NONE
I0422 13:06:15.488787 134234063824704 pyconfig.py:432] Config param profiler_steps: 5
I0422 13:06:15.488812 134234063824704 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0
I0422 13:06:15.488837 134234063824704 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096
I0422 13:06:15.488862 134234063824704 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096
I0422 13:06:15.488886 134234063824704 pyconfig.py:432] Config param prometheus_port: 0
I0422 13:06:15.488910 134234063824704 pyconfig.py:432] Config param prompt: I love to
I0422 13:06:15.488933 134234063824704 pyconfig.py:432] Config param pure_nnx: False
I0422 13:06:15.488957 134234063824704 pyconfig.py:432] Config param pure_nnx_decoder: False
I0422 13:06:15.488981 134234063824704 pyconfig.py:432] Config param q_lora_rank: 0
I0422 13:06:15.489005 134234063824704 pyconfig.py:432] Config param qk_clip_threshold: 100.0
I0422 13:06:15.489030 134234063824704 pyconfig.py:432] Config param qk_nope_head_dim: 128
I0422 13:06:15.489054 134234063824704 pyconfig.py:432] Config param qk_norm_with_scale: True
I0422 13:06:15.489078 134234063824704 pyconfig.py:432] Config param qk_rope_head_dim: 64
I0422 13:06:15.489140 134234063824704 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT
I0422 13:06:15.489165 134234063824704 pyconfig.py:432] Config param quant_cfg_path: 
I0422 13:06:15.489188 134234063824704 pyconfig.py:432] Config param quantization: QuantizationType.NONE
I0422 13:06:15.489214 134234063824704 pyconfig.py:432] Config param quantization_local_shard_count: 4
I0422 13:06:15.489239 134234063824704 pyconfig.py:432] Config param quantize_kvcache: False
I0422 13:06:15.489262 134234063824704 pyconfig.py:432] Config param query_proj: RematLocation.REMAT
I0422 13:06:15.489285 134234063824704 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT
I0422 13:06:15.489303 134234063824704 pyconfig.py:432] Config param ragged_block_size: 256
I0422 13:06:15.489320 134234063824704 pyconfig.py:432] Config param ragged_buffer_factor: -1.0
I0422 13:06:15.489336 134234063824704 pyconfig.py:432] Config param rampup_end_step: 0
I0422 13:06:15.489352 134234063824704 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None
I0422 13:06:15.489372 134234063824704 pyconfig.py:432] Config param reasoning_end_token: </reasoning>
I0422 13:06:15.489398 134234063824704 pyconfig.py:432] Config param reasoning_start_token: <reasoning>
I0422 13:06:15.489422 134234063824704 pyconfig.py:432] Config param record_internal_nn_metrics: 0
I0422 13:06:15.489443 134234063824704 pyconfig.py:432] Config param remat_policy: full
I0422 13:06:15.489463 134234063824704 pyconfig.py:432] Config param remat_policy_for_vit: minimal
I0422 13:06:15.489480 134234063824704 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True
I0422 13:06:15.489505 134234063824704 pyconfig.py:432] Config param replicate_quant_scale: False
I0422 13:06:15.489529 134234063824704 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0
I0422 13:06:15.489561 134234063824704 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0422 13:06:15.489585 134234063824704 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False
I0422 13:06:15.489606 134234063824704 pyconfig.py:432] Config param reshape_q: False
I0422 13:06:15.489629 134234063824704 pyconfig.py:432] Config param return_log_prob: False
I0422 13:06:15.489645 134234063824704 pyconfig.py:432] Config param reuse_example_batch: 0
I0422 13:06:15.489660 134234063824704 pyconfig.py:432] Config param reward_exact_answer: 5.0
I0422 13:06:15.489675 134234063824704 pyconfig.py:432] Config param reward_exact_format_match: 3.0
I0422 13:06:15.489691 134234063824704 pyconfig.py:432] Config param reward_partial_format_match: 0.5
I0422 13:06:15.489712 134234063824704 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5
I0422 13:06:15.489739 134234063824704 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25
I0422 13:06:15.489766 134234063824704 pyconfig.py:432] Config param reward_white_space_format_match: 1.5
I0422 13:06:15.489792 134234063824704 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0422 13:06:15.489824 134234063824704 pyconfig.py:432] Config param rollout_data_parallelism: -1
I0422 13:06:15.489851 134234063824704 pyconfig.py:432] Config param rollout_expert_parallelism: 1
I0422 13:06:15.489873 134234063824704 pyconfig.py:432] Config param rollout_micro_batch_size: -1
I0422 13:06:15.489896 134234063824704 pyconfig.py:432] Config param rollout_tensor_parallelism: -1
I0422 13:06:15.489918 134234063824704 pyconfig.py:432] Config param rope_attention_scaling: False
I0422 13:06:15.489940 134234063824704 pyconfig.py:432] Config param rope_factor: 40
I0422 13:06:15.489962 134234063824704 pyconfig.py:432] Config param rope_interleave: True
I0422 13:06:15.489984 134234063824704 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0
I0422 13:06:15.490007 134234063824704 pyconfig.py:432] Config param rope_max_timescale: 10000
I0422 13:06:15.490029 134234063824704 pyconfig.py:432] Config param rope_min_timescale: 1
I0422 13:06:15.490051 134234063824704 pyconfig.py:432] Config param rope_theta_for_vit: 10000
I0422 13:06:15.490073 134234063824704 pyconfig.py:432] Config param rope_truncate: True
I0422 13:06:15.490110 134234063824704 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT
I0422 13:06:15.490137 134234063824704 pyconfig.py:432] Config param rope_use_scale: True
I0422 13:06:15.490161 134234063824704 pyconfig.py:432] Config param routed_bias: False
I0422 13:06:15.490184 134234063824704 pyconfig.py:432] Config param routed_bias_update_rate: 0.0
I0422 13:06:15.490208 134234063824704 pyconfig.py:432] Config param routed_scaling_factor: 1.0
I0422 13:06:15.490231 134234063824704 pyconfig.py:432] Config param routed_score_func: 
I0422 13:06:15.490255 134234063824704 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-22-13-06
I0422 13:06:15.490278 134234063824704 pyconfig.py:432] Config param sa_block_kv: 512
I0422 13:06:15.490301 134234063824704 pyconfig.py:432] Config param sa_block_kv_compute: 512
I0422 13:06:15.490325 134234063824704 pyconfig.py:432] Config param sa_block_kv_dkv: 512
I0422 13:06:15.490348 134234063824704 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512
I0422 13:06:15.490371 134234063824704 pyconfig.py:432] Config param sa_block_kv_dq: 512
I0422 13:06:15.490394 134234063824704 pyconfig.py:432] Config param sa_block_q: 512
I0422 13:06:15.490417 134234063824704 pyconfig.py:432] Config param sa_block_q_dkv: 512
I0422 13:06:15.490440 134234063824704 pyconfig.py:432] Config param sa_block_q_dq: 512
I0422 13:06:15.490463 134234063824704 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR
I0422 13:06:15.490487 134234063824704 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR
I0422 13:06:15.490509 134234063824704 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False
I0422 13:06:15.490537 134234063824704 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR
I0422 13:06:15.490560 134234063824704 pyconfig.py:432] Config param sampler_devices_fraction: 0.5
I0422 13:06:15.490583 134234063824704 pyconfig.py:432] Config param save_checkpoint_on_completion: True
I0422 13:06:15.490606 134234063824704 pyconfig.py:432] Config param save_config_to_gcs: False
I0422 13:06:15.490630 134234063824704 pyconfig.py:432] Config param save_quantized_params_path: 
I0422 13:06:15.490653 134234063824704 pyconfig.py:432] Config param scale_embedding_for_audio: True
I0422 13:06:15.490676 134234063824704 pyconfig.py:432] Config param scan_layers: True
I0422 13:06:15.490699 134234063824704 pyconfig.py:432] Config param scan_layers_per_stage: False
I0422 13:06:15.490722 134234063824704 pyconfig.py:432] Config param scan_pipeline_iterations: True
I0422 13:06:15.490745 134234063824704 pyconfig.py:432] Config param scan_pipeline_repeats: False
I0422 13:06:15.490769 134234063824704 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False
I0422 13:06:15.490792 134234063824704 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True
I0422 13:06:15.490815 134234063824704 pyconfig.py:432] Config param sft_train_on_completion_only: False
I0422 13:06:15.490836 134234063824704 pyconfig.py:432] Config param shard_exp_on_fsdp: False
I0422 13:06:15.490859 134234063824704 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO
I0422 13:06:15.490886 134234063824704 pyconfig.py:432] Config param shard_optimizer_over_data: False
I0422 13:06:15.490909 134234063824704 pyconfig.py:432] Config param sharding_strategy: None
I0422 13:06:15.490933 134234063824704 pyconfig.py:432] Config param sharding_tolerance: 0.02
I0422 13:06:15.490956 134234063824704 pyconfig.py:432] Config param shardy: True
I0422 13:06:15.490979 134234063824704 pyconfig.py:432] Config param share_kv_projections: False
I0422 13:06:15.491002 134234063824704 pyconfig.py:432] Config param shared_experts: 0
I0422 13:06:15.491025 134234063824704 pyconfig.py:432] Config param sinkhorn_iterations: 20
I0422 13:06:15.491049 134234063824704 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1
I0422 13:06:15.491071 134234063824704 pyconfig.py:432] Config param skip_jax_distributed_system: False
I0422 13:06:15.491105 134234063824704 pyconfig.py:432] Config param skip_step_interval: 128
I0422 13:06:15.491129 134234063824704 pyconfig.py:432] Config param skip_step_on_spikes: False
I0422 13:06:15.491154 134234063824704 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0
I0422 13:06:15.491181 134234063824704 pyconfig.py:432] Config param sliding_window_size: 0
I0422 13:06:15.491206 134234063824704 pyconfig.py:432] Config param solution_end_token: </answer>
I0422 13:06:15.491233 134234063824704 pyconfig.py:432] Config param solution_start_token: <answer>
I0422 13:06:15.491257 134234063824704 pyconfig.py:432] Config param source_checkpoint_layout: orbax
I0422 13:06:15.491278 134234063824704 pyconfig.py:432] Config param sparse_matmul: True
I0422 13:06:15.491299 134234063824704 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2
I0422 13:06:15.491316 134234063824704 pyconfig.py:432] Config param stack_prefill_result_cache: False
I0422 13:06:15.491330 134234063824704 pyconfig.py:432] Config param stack_trace_interval_seconds: 600
I0422 13:06:15.491346 134234063824704 pyconfig.py:432] Config param stack_trace_to_cloud: False
I0422 13:06:15.491360 134234063824704 pyconfig.py:432] Config param step_deviation_interval_seconds: 30
I0422 13:06:15.491376 134234063824704 pyconfig.py:432] Config param steps: 200000
I0422 13:06:15.491391 134234063824704 pyconfig.py:432] Config param stop_strings: None
I0422 13:06:15.491406 134234063824704 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0422 13:06:15.491422 134234063824704 pyconfig.py:432] Config param student_params_to_update: None
I0422 13:06:15.491438 134234063824704 pyconfig.py:432] Config param subslice_shape: 
I0422 13:06:15.491454 134234063824704 pyconfig.py:432] Config param swap_space_vllm_gb: 2
I0422 13:06:15.491470 134234063824704 pyconfig.py:432] Config param system_prompt: 
I0422 13:06:15.491484 134234063824704 pyconfig.py:432] Config param target_eval_loss: 0.0
I0422 13:06:15.491501 134234063824704 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0422 13:06:15.491517 134234063824704 pyconfig.py:432] Config param temperature_tuning: False
I0422 13:06:15.491536 134234063824704 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2
I0422 13:06:15.491552 134234063824704 pyconfig.py:432] Config param tensorboard_dir: None
I0422 13:06:15.491568 134234063824704 pyconfig.py:432] Config param tensors_on_device: None
I0422 13:06:15.491584 134234063824704 pyconfig.py:432] Config param tensors_to_offload: None
I0422 13:06:15.491600 134234063824704 pyconfig.py:432] Config param test_batch_start_index: 0
I0422 13:06:15.491615 134234063824704 pyconfig.py:432] Config param tile_size_for_vit: 336
I0422 13:06:15.491630 134234063824704 pyconfig.py:432] Config param tokenize_eval_data: True
I0422 13:06:15.491646 134234063824704 pyconfig.py:432] Config param tokenize_train_data: True
I0422 13:06:15.491660 134234063824704 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0422 13:06:15.491677 134234063824704 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0422 13:06:15.491695 134234063824704 pyconfig.py:432] Config param topk_routing_group: -1
I0422 13:06:15.491711 134234063824704 pyconfig.py:432] Config param train_data_columns: ['text']
I0422 13:06:15.491727 134234063824704 pyconfig.py:432] Config param train_fraction: 1.0
I0422 13:06:15.491744 134234063824704 pyconfig.py:432] Config param train_image_column: image
I0422 13:06:15.491758 134234063824704 pyconfig.py:432] Config param train_micro_batch_size: -1
I0422 13:06:15.491774 134234063824704 pyconfig.py:432] Config param train_split: train
I0422 13:06:15.491788 134234063824704 pyconfig.py:432] Config param trainable_parameters_mask: []
I0422 13:06:15.491804 134234063824704 pyconfig.py:432] Config param trainable_position_size: 2048
I0422 13:06:15.491819 134234063824704 pyconfig.py:432] Config param trainer_devices_fraction: 0.5
I0422 13:06:15.491836 134234063824704 pyconfig.py:432] Config param upload_all_profiler_results: False
I0422 13:06:15.491852 134234063824704 pyconfig.py:432] Config param use_2d_fsdp_sharding: False
I0422 13:06:15.491869 134234063824704 pyconfig.py:432] Config param use_agentic_rollout: False
I0422 13:06:15.491883 134234063824704 pyconfig.py:432] Config param use_audio: False
I0422 13:06:15.491899 134234063824704 pyconfig.py:432] Config param use_audio_in_video: False
I0422 13:06:15.491915 134234063824704 pyconfig.py:432] Config param use_batch_split_schedule: False
I0422 13:06:15.491929 134234063824704 pyconfig.py:432] Config param use_chat_template: False
I0422 13:06:15.491944 134234063824704 pyconfig.py:432] Config param use_chunked_prefill: False
I0422 13:06:15.491959 134234063824704 pyconfig.py:432] Config param use_custom_sort_vjp: True
I0422 13:06:15.491974 134234063824704 pyconfig.py:432] Config param use_dpo: False
I0422 13:06:15.491990 134234063824704 pyconfig.py:432] Config param use_gather_mosaic_kernel: False
I0422 13:06:15.492005 134234063824704 pyconfig.py:432] Config param use_grpo: True
I0422 13:06:15.492021 134234063824704 pyconfig.py:432] Config param use_indexer: False
I0422 13:06:15.492035 134234063824704 pyconfig.py:432] Config param use_iota_embed: True
I0422 13:06:15.492051 134234063824704 pyconfig.py:432] Config param use_jax_splash: False
I0422 13:06:15.492068 134234063824704 pyconfig.py:432] Config param use_max_logit_estimate: -1
I0422 13:06:15.492082 134234063824704 pyconfig.py:432] Config param use_mrope: False
I0422 13:06:15.492109 134234063824704 pyconfig.py:432] Config param use_multimodal: False
I0422 13:06:15.492125 134234063824704 pyconfig.py:432] Config param use_pathways: True
I0422 13:06:15.492141 134234063824704 pyconfig.py:432] Config param use_post_attn_norm: False
I0422 13:06:15.492156 134234063824704 pyconfig.py:432] Config param use_post_ffw_norm: False
I0422 13:06:15.492171 134234063824704 pyconfig.py:432] Config param use_qk_clip: False
I0422 13:06:15.492187 134234063824704 pyconfig.py:432] Config param use_qk_norm: False
I0422 13:06:15.492203 134234063824704 pyconfig.py:432] Config param use_qk_norm_in_gdn: True
I0422 13:06:15.492219 134234063824704 pyconfig.py:432] Config param use_qwix_quantization: False
I0422 13:06:15.492235 134234063824704 pyconfig.py:432] Config param use_ragged_attention: False
I0422 13:06:15.492250 134234063824704 pyconfig.py:432] Config param use_random_routing: False
I0422 13:06:15.492265 134234063824704 pyconfig.py:432] Config param use_replicator_service: False
I0422 13:06:15.492281 134234063824704 pyconfig.py:432] Config param use_ring_of_experts: False
I0422 13:06:15.492297 134234063824704 pyconfig.py:432] Config param use_sft: False
I0422 13:06:15.492313 134234063824704 pyconfig.py:432] Config param use_splash_scheduler: False
I0422 13:06:15.492328 134234063824704 pyconfig.py:432] Config param use_tokamax_gmm: False
I0422 13:06:15.492343 134234063824704 pyconfig.py:432] Config param use_tokamax_splash: False
I0422 13:06:15.492358 134234063824704 pyconfig.py:432] Config param use_truncation: True
I0422 13:06:15.492374 134234063824704 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False
I0422 13:06:15.492390 134234063824704 pyconfig.py:432] Config param use_untrainable_positional_embedding: False
I0422 13:06:15.492411 134234063824704 pyconfig.py:432] Config param use_vertex_tensorboard: False
I0422 13:06:15.492435 134234063824704 pyconfig.py:432] Config param using_pipeline_parallelism: False
I0422 13:06:15.492460 134234063824704 pyconfig.py:432] Config param v_head_dim: 128
I0422 13:06:15.492482 134234063824704 pyconfig.py:432] Config param v_norm_with_scale: True
I0422 13:06:15.492498 134234063824704 pyconfig.py:432] Config param value_proj: RematLocation.REMAT
I0422 13:06:15.492516 134234063824704 pyconfig.py:432] Config param vertex_tensorboard_project: 
I0422 13:06:15.492543 134234063824704 pyconfig.py:432] Config param vertex_tensorboard_region: 
I0422 13:06:15.492568 134234063824704 pyconfig.py:432] Config param video_path: 
I0422 13:06:15.492592 134234063824704 pyconfig.py:432] Config param video_placeholder: <|video|>
I0422 13:06:15.492616 134234063824704 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096
I0422 13:06:15.492641 134234063824704 pyconfig.py:432] Config param vision_output_length: -1
I0422 13:06:15.492665 134234063824704 pyconfig.py:432] Config param vllm_additional_config: {}
I0422 13:06:15.492690 134234063824704 pyconfig.py:432] Config param vllm_hf_config_path: 
I0422 13:06:15.492714 134234063824704 pyconfig.py:432] Config param vllm_hf_overrides: {}
I0422 13:06:15.492738 134234063824704 pyconfig.py:432] Config param vocab_size: 32000
I0422 13:06:15.492761 134234063824704 pyconfig.py:432] Config param warmup_steps_fraction: 0.1
I0422 13:06:15.492788 134234063824704 pyconfig.py:432] Config param weight_dtype: float32
I0422 13:06:15.492830 134234063824704 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax
I0422 13:06:15.492856 134234063824704 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512
I0422 13:06:15.492882 134234063824704 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024
I0422 13:06:15.492908 134234063824704 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024
I0422 13:06:15.492933 134234063824704 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512
I0422 13:06:15.492958 134234063824704 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024
I0422 13:06:15.492984 134234063824704 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024
I0422 13:06:15.493008 134234063824704 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512
I0422 13:06:15.493032 134234063824704 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024
I0422 13:06:15.493056 134234063824704 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024
I0422 13:06:15.493080 134234063824704 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512
I0422 13:06:15.493118 134234063824704 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024
I0422 13:06:15.493142 134234063824704 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024
I0422 13:06:15.493167 134234063824704 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512
I0422 13:06:15.493192 134234063824704 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024
I0422 13:06:15.493218 134234063824704 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024
I0422 13:06:15.493244 134234063824704 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512
I0422 13:06:15.493270 134234063824704 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024
I0422 13:06:15.493295 134234063824704 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024
I0422 13:06:15.493321 134234063824704 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1
I0422 13:06:15.493345 134234063824704 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0422 13:06:15.493367 134234063824704 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False
I0422 13:06:15.493383 134234063824704 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False
I0422 13:06:15.493398 134234063824704 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False
I0422 13:06:15.493414 134234063824704 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0
I0422 13:06:15.493432 134234063824704 pyconfig.py:432] Config param z_loss_multiplier: 0.0
I0422 13:06:15.493808 134234063824704 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0422 13:06:15.493858 134234063824704 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0422 13:06:19.417582 134234063824704 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0422 13:06:19.420611 134234063824704 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0422 13:06:19.420732 134234063824704 train_distill.py:608] Applying logical axis rules for model initialization and training...
I0422 13:06:19.420804 134234063824704 train_distill.py:612] Loading Student from ...
I0422 13:06:19.420833 134234063824704 train_distill.py:169] --- Student Configuration ---
I0422 13:06:19.420856 134234063824704 train_distill.py:170]   Model Name:      gpt3-52k
I0422 13:06:19.420878 134234063824704 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0422 13:06:19.420897 134234063824704 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0422 13:06:19.420916 134234063824704 train_distill.py:175]   Vocab Size:      32000
I0422 13:06:19.420934 134234063824704 train_distill.py:176]   Checkpoint:      
I0422 13:06:19.420953 134234063824704 train_distill.py:477] Initializing model: gpt3-52k...
I0422 13:06:20.693515 134234063824704 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0422 13:06:20.693621 134234063824704 train_distill.py:169] --- Teacher Configuration ---
I0422 13:06:20.693648 134234063824704 train_distill.py:170]   Model Name:      gpt3-52k
I0422 13:06:20.693672 134234063824704 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0422 13:06:20.693693 134234063824704 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0422 13:06:20.693713 134234063824704 train_distill.py:175]   Vocab Size:      32000
I0422 13:06:20.693730 134234063824704 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0422 13:06:20.693749 134234063824704 train_distill.py:477] Initializing model: gpt3-52k...
I0422 13:06:21.859416 134234063824704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 13:06:21.859863 134234063824704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a151610e2a0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 13:06:21.859923 134234063824704 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0422 13:06:22.412320 134234063824704 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0422 13:06:22.947969    2125 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0422 13:06:24.116325 134234063824704 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0422 13:06:26.270449 134234063824704 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0422 13:06:26.270842 134234063824704 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0422 13:06:26.824538 134234063824704 checkpointer.py:318] Finished restoring checkpoint in 3.08 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0422 13:06:27.514820 134234063824704 train_distill.py:652] Initializing Data Iterators via MaxText pipeline...
I0422 13:06:27.579390 134234063824704 config.py:112] TensorFlow version 2.20.0 available.
I0422 13:06:27.579923 134234063824704 config.py:125] JAX version 0.8.3 available.
E0422 13:06:29.649948 134234063824704 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0422 13:06:29.650187 134234063824704 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0422 13:06:29.653304 134234063824704 train_distill.py:422] Input Pipeline Checkpointing: DISABLED
I0422 13:06:29.653368 134234063824704 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0422 13:06:29.653430 134234063824704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 13:06:29.653513 134234063824704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a151610e2a0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 13:06:29.653554 134234063824704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 13:06:29.653586 134234063824704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a151610e2a0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 13:06:29.653629 134234063824704 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0f703a5fd0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0ba8237e90>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c47620>}, handler_registry=None
I0422 13:06:29.653819 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0f703a5fd0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 13:06:29.653861 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0ba8237e90>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 13:06:29.653888 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c47620>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 13:06:29.653913 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79fd2f8a6300>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 13:06:29.653940 134234063824704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0f703a5fd0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0f703a5fd0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0ba8237e90>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0ba8237e90>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c47620>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c47620>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79fd2f8a6300>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x79fd2f8a6300>}).
I0422 13:06:29.654366 134234063824704 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a0b01d959e0> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 13:06:32.305812 134234063824704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints
I0422 13:06:32.316063 134234063824704 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a0b01c475f0>
I0422 13:06:32.316196 134234063824704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 13:06:32.316262 134234063824704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a151610e2a0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 13:06:32.316298 134234063824704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0422 13:06:32.316333 134234063824704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a151610e2a0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0422 13:06:32.316370 134234063824704 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 13:06:32.316419 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x7a0ba8227080>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c473e0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c473b0>, _write_futures=[])
I0422 13:06:32.316773 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x7a0ba8227080>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c473e0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c473b0>, _write_futures=[])
I0422 13:06:32.316800 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x7a0ba8227080>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c473e0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c473b0>, _write_futures=[])
I0422 13:06:32.316832 134234063824704 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79fd2f859d00>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0b01c45cd0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45a90>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a0b01c46960>}, handler_registry=None
I0422 13:06:32.316929 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79fd2f859d00>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 13:06:32.316962 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0b01c45cd0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0422 13:06:32.316985 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45a90>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 13:06:32.317012 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a0b01c46960>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0422 13:06:32.317035 134234063824704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45220>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0422 13:06:32.317060 134234063824704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79fd2f859d00>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x79fd2f859d00>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0b01c45cd0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a0b01c45cd0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45a90>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45a90>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a0b01c46960>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7a0b01c46960>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45220>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a0b01c45220>}).
I0422 13:06:32.317141 134234063824704 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a0b01d95b20> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0422 13:06:33.129243 134234063824704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints
I0422 13:06:33.139197 134234063824704 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a0f70397230>
I0422 13:06:33.139699 134234063824704 train_distill.py:703] Starting Distillation Training...
I0422 13:06:33.139800 134234063824704 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0422 13:06:33.261906 134234063824704 peft_trainer.py:594] Compiled train_step cache size: 0
I0422 13:06:33.263593 134091202610944 grain_pool.py:367] Grain pool will use 1 processes.
I0422 13:06:33.290120 134091202610944 grain_pool.py:440] Grain pool will start child processes.
I0422 13:06:33.295902 134091202610944 grain_pool.py:448] Grain pool started all child processes.
2026-04-22 13:06:39.344586: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use:

  variable[...]

For other Variable types use:

  variable.get_value()

  current_step = model.training_step.value
I0422 13:06:45.723172 134234063824704 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 13:06:45.726345 134234063824704 checkpoint_manager.py:1501] [process=6] Saving checkpoint at step 1
I0422 13:06:45.729509 134234063824704 async_checkpointer.py:452] [process=6] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1.
I0422 13:06:46.422651 134234063824704 signaling_client.py:364] Using JaxDistributedSignalingClient
I0422 13:06:46.423770 134234063824704 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0422 13:06:46.423829 134234063824704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 13:06:47.097388 134234063824704 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.674828s
I0422 13:06:47.101299 134234063824704 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 567.114 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 943 milliseconds) (per-host)
I0422 13:06:47.101367 134234063824704 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.943706s (batch_requests_ready=0.260178s, total_serialization_initiated=0.683409s, others=0.000118s)
I0422 13:06:47.103294 134234063824704 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array.
I0422 13:06:47.103355 134234063824704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 13:06:47.108627 134234063824704 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.007146s
I0422 13:06:47.108738 134234063824704 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 280.336 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 954 milliseconds) (per-host)
I0422 13:06:47.108786 134234063824704 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.954502s (batch_requests_ready=0.943289s, total_serialization_initiated=0.011141s, others=0.000072s)
I0422 13:06:47.108955 134234063824704 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.958670s (all_items=0.000023s, per_item={'model_params': '0.00001884', 'optimizer_state': '0.00000405'}, temp_paths=0.958647)
I0422 13:06:47.109987 134084205729536 async_checkpointer.py:79] [process=6][thread=async_save] Background save thread started.
I0422 13:06:47.110174 134234063824704 async_checkpointer.py:561] Finished blocking save. Time taken: 1.383754s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1.
I0422 13:06:47.116829 134234063824704 checkpoint_manager.py:1549] [process=6][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize
I0422 13:06:47.117063 134084239300352 async_checkpointer.py:265] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0422 13:06:47.117235 134234063824704 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776863205.723149, 'wait_for_prev_duration_secs': 0.00010991096496582031, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776863205.7263858, 'checkpointer_blocking_duration_secs': 1.3838975429534912, 'get_old_steps_start_time': 1776863207.1103072, 'get_old_steps_duration_secs': 8.058547973632812e-05, 'checkpoint_manager_blocking_start_time': 1776863205.7168176, 'checkpoint_manager_blocking_duration_secs': 1.4003827571868896}
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use:

  variable[...]

For other Variable types use:

  variable.get_value()

  current_step = model.training_step.value
I0422 13:06:50.315065 134234063824704 peft_trainer.py:474] Train step 1 training loss: 15.990566  - training perplexity: 8802675.000000
I0422 13:06:50.335635 134234063824704 peft_trainer.py:474] Train step 2 training loss: 15.974588  - training perplexity: 8663145.000000
I0422 13:06:50.366178 134234063824704 peft_trainer.py:474] Train step 3 training loss: 16.008877  - training perplexity: 8965342.000000
I0422 13:06:50.385915 134234063824704 peft_trainer.py:474] Train step 4 training loss: 16.001873  - training perplexity: 8902770.000000
I0422 13:06:50.390866 134234063824704 peft_trainer.py:733] Train loop finished in: 17.1285 seconds
I0422 13:06:50.391349 134234063824704 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/...
I0422 13:06:51.544021 134084214122240 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_6
I0422 13:06:51.545217 134084205729536 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 49.633 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 5 seconds) (per-host)
I0422 13:06:56.539340 134086345144064 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_6
I0422 13:06:56.540570 134084205729536 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 51.539 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 10 seconds) (per-host)
I0422 13:06:56.540699 134084205729536 async_checkpointer.py:90] [process=6][thread=async_save] 4 Handler Commit operations completed. Time taken: 9.430598s.
I0422 13:07:02.019013 134234063824704 checkpoint_manager.py:1994] [process=6][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0422 13:07:05.886790 134084205729536 async_checkpointer.py:144] [process=6][thread=async_save] Background save thread done. Time taken: 18.776674s.
I0422 13:07:05.887117 134084239300352 async_checkpointer.py:273] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0422 13:07:05.887270 134084239300352 async_checkpointer.py:283] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0422 13:07:05.887330 134084239300352 checkpoint_manager.py:2103] [process=6][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts...
I0422 13:07:05.889051 134084239300352 checkpoint_manager.py:2112] [process=6][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts.
I0422 13:07:05.889240 134234063824704 checkpoint_manager.py:2006] [process=6][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1.
W0422 13:07:05.889367 134234063824704 checkpoint_manager.py:1441] Waiting for previous save to complete took 3.870373 seconds. If this number is high, consider checkpointing less frequently.
I0422 13:07:05.890763 134234063824704 checkpoint_manager.py:1501] [process=6] Saving checkpoint at step 5
I0422 13:07:05.894373 134234063824704 async_checkpointer.py:452] [process=6] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5.
I0422 13:07:06.840269 134234063824704 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0422 13:07:06.840373 134234063824704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 13:07:07.505397 134234063824704 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.666266s
I0422 13:07:07.509125 134234063824704 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 590.184 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 906 milliseconds) (per-host)
I0422 13:07:07.509191 134234063824704 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.906833s (batch_requests_ready=0.235171s, total_serialization_initiated=0.671534s, others=0.000128s)
I0422 13:07:07.511146 134234063824704 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array.
I0422 13:07:07.511206 134234063824704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0422 13:07:07.516537 134234063824704 base_pytree_checkpoint_handler.py:153] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.007224s
I0422 13:07:07.516645 134234063824704 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/blocking_gbytes_per_sec: 291.741 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 917 milliseconds) (per-host)
I0422 13:07:07.516688 134234063824704 base_pytree_checkpoint_handler.py:732] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.917185s (batch_requests_ready=0.906068s, total_serialization_initiated=0.011051s, others=0.000066s)
I0422 13:07:07.516809 134234063824704 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.921475s (all_items=0.000014s, per_item={'model_params': '0.00001144', 'optimizer_state': '0.00000262'}, temp_paths=0.921461)
I0422 13:07:07.517622 134082637854464 async_checkpointer.py:79] [process=6][thread=async_save] Background save thread started.
I0422 13:07:07.517753 134234063824704 async_checkpointer.py:561] Finished blocking save. Time taken: 1.626923s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5.
I0422 13:07:07.551069 134234063824704 checkpoint_manager.py:1549] [process=6][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize
I0422 13:07:07.551372 134084239300352 async_checkpointer.py:265] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0422 13:07:07.551543 134234063824704 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776863222.0189717, 'wait_for_prev_duration_secs': 3.870372772216797, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776863225.8908026, 'checkpointer_blocking_duration_secs': 1.627098798751831, 'get_old_steps_start_time': 1776863227.5179193, 'get_old_steps_duration_secs': 6.103515625e-05, 'checkpoint_manager_blocking_start_time': 1776863210.395568, 'checkpoint_manager_blocking_duration_secs': 17.155941009521484}
I0422 13:07:07.551713 134234063824704 checkpoint_manager.py:1994] [process=6][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0422 13:07:12.071382 134086345144064 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_6
I0422 13:07:12.121109 134084214122240 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260422_123915/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260422_123915_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_6
I0422 13:07:12.122262 134082637854464 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 48.448 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 5 seconds) (per-host)
I0422 13:07:12.122430 134082637854464 base_pytree_checkpoint_handler.py:128] [process=6] /jax/checkpoint/write/gbytes_per_sec: 96.942 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0422 13:07:12.122469 134082637854464 async_checkpointer.py:90] [process=6][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.604642s.
I0422 13:07:22.703430 134082637854464 async_checkpointer.py:144] [process=6][thread=async_save] Background save thread done. Time taken: 15.185586s.
I0422 13:07:22.703778 134084239300352 async_checkpointer.py:273] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0422 13:07:22.703891 134084239300352 async_checkpointer.py:283] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0422 13:07:22.703941 134084239300352 checkpoint_manager.py:2103] [process=6][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts...
I0422 13:07:22.705662 134084239300352 checkpoint_manager.py:2112] [process=6][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts.
I0422 13:07:22.705826 134234063824704 checkpoint_manager.py:2006] [process=6][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5.
I0422 13:07:22.705950 134234063824704 train_distill.py:724] Final checkpoint saved.
I0422 13:07:22.708311 134234063824704 peft_trainer.py:474] Train step 5 training loss: 15.987230  - training perplexity: 8773359.000000
I0422 13:07:22.708689 134234063824704 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0422 13:07:22.708764 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x79fd2f851f00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c45310>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c453a0>, _write_futures=[])
I0422 13:07:22.708814 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x79fd2f851f00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c45310>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c453a0>, _write_futures=[])
I0422 13:07:22.708844 134234063824704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=134234063824704 count=1 at 0x79fd2f851f00>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7a0b01c45310>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7a0b01c453a0>, _write_futures=[])
I0422 13:07:22.708888 134234063824704 train_distill.py:734] Distillation Complete.
I0422 13:07:22.896083 134091202610944 grain_pool.py:547] Shutting down multiprocessing system.
I0422 13:07:24.342618 134091202610944 grain_pool.py:542] Grain pool is exiting.
I0422 13:07:24.342731 134091202610944 grain_pool.py:547] Shutting down multiprocessing system.
I0422 13:07:24.342793 134091202610944 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Wed Apr 22 13:07:33 UTC 2026
EXIT_CODE=0