MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  feat/nnx-post-train-fixes

Metric | Linen (e27fc1e97) | NNX (e27fc1e97) | Diff (NNX − Linen)

Diff = NNX value − Linen value. Green means NNX improved on that metric; red means NNX regressed.
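The diff convention above can be sketched as a minimal helper (the metric name and values below are hypothetical illustrations, not taken from this run; whether a positive diff counts as an improvement depends on the metric's direction):

```python
def diff_row(metric, linen, nnx, higher_is_better=True):
    """Compute the Diff (NNX − Linen) column and its color per the legend."""
    diff = nnx - linen
    # A metric like accuracy improves when diff > 0; a loss improves when diff < 0.
    improved = diff > 0 if higher_is_better else diff < 0
    color = "green" if improved else ("red" if diff != 0 else "neutral")
    return diff, color

# Hypothetical example: eval loss is lower-is-better, so a negative diff is green.
d, c = diff_row("eval_loss", linen=2.41, nnx=2.38, higher_is_better=False)
```

For a loss-type metric, NNX landing below Linen yields a negative diff that the legend colors green.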

Linen  ·  e27fc1e97  ·  feat_nnx_post_train_fixes_20260421_114106
XPK Start: Tue Apr 21 11:55:45 UTC 2026
2026-04-21 11:56:02.745574: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0421 11:56:06.325611 132915091408704 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-21 11:56:15,365:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0421 11:56:15.365546 132915091408704 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-21 11:56:15,367:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-s9nkr-slice-job-0-0.mt-07-distill-smoke-s9nkr:8482
I0421 11:56:15.367972 132915091408704 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-s9nkr-slice-job-0-0.mt-07-distill-smoke-s9nkr:8482
I0421 11:56:16.707696 132915091408704 max_utils.py:284] Jax distributed system initialized!
I0421 11:56:22.105066 132915091408704 max_utils.py:244] Jax distributed system is already initialized.
I0421 11:56:22.575030 132915091408704 max_utils.py:244] Jax distributed system is already initialized.
I0421 11:56:22.576203 132915091408704 pyconfig.py:432] Config param abort_on_inf_loss: True
I0421 11:56:22.576251 132915091408704 pyconfig.py:432] Config param abort_on_nan_loss: True
I0421 11:56:22.576277 132915091408704 pyconfig.py:432] Config param act_quantization_calibration_method: absmax
I0421 11:56:22.576299 132915091408704 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0
I0421 11:56:22.576323 132915091408704 pyconfig.py:432] Config param activation_function_for_audio: gelu
I0421 11:56:22.576342 132915091408704 pyconfig.py:432] Config param activations_in_float32: False
I0421 11:56:22.576363 132915091408704 pyconfig.py:432] Config param adam_b1: 0.9
I0421 11:56:22.576383 132915091408704 pyconfig.py:432] Config param adam_b2: 0.95
I0421 11:56:22.576401 132915091408704 pyconfig.py:432] Config param adam_eps: 1e-08
I0421 11:56:22.576424 132915091408704 pyconfig.py:432] Config param adam_eps_root: 0.0
I0421 11:56:22.576438 132915091408704 pyconfig.py:432] Config param adam_weight_decay: 0.1
I0421 11:56:22.576455 132915091408704 pyconfig.py:432] Config param adamw_mask: []
I0421 11:56:22.576471 132915091408704 pyconfig.py:432] Config param add_bos: True
I0421 11:56:22.576488 132915091408704 pyconfig.py:432] Config param add_eos: True
I0421 11:56:22.576502 132915091408704 pyconfig.py:432] Config param allow_split_physical_axes: False
I0421 11:56:22.576518 132915091408704 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3
I0421 11:56:22.576534 132915091408704 pyconfig.py:432] Config param async_checkpointing: True
I0421 11:56:22.576551 132915091408704 pyconfig.py:432] Config param async_scheduling: False
I0421 11:56:22.576568 132915091408704 pyconfig.py:432] Config param attention: dot_product
I0421 11:56:22.576585 132915091408704 pyconfig.py:432] Config param attention_bias: False
I0421 11:56:22.576602 132915091408704 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0
I0421 11:56:22.576620 132915091408704 pyconfig.py:432] Config param attention_out: RematLocation.REMAT
I0421 11:56:22.576654 132915091408704 pyconfig.py:432] Config param attention_output_dim: -1
I0421 11:56:22.576672 132915091408704 pyconfig.py:432] Config param attention_sink: False
I0421 11:56:22.576687 132915091408704 pyconfig.py:432] Config param attention_type: global
I0421 11:56:22.576704 132915091408704 pyconfig.py:432] Config param attn_logits_soft_cap: None
I0421 11:56:22.576720 132915091408704 pyconfig.py:432] Config param audio_path: 
I0421 11:56:22.576736 132915091408704 pyconfig.py:432] Config param audio_placeholder: <|audio|>
I0421 11:56:22.576752 132915091408704 pyconfig.py:432] Config param autoregressive_decode_assert: 
I0421 11:56:22.576768 132915091408704 pyconfig.py:432] Config param base_config: base.yml
I0421 11:56:22.576783 132915091408704 pyconfig.py:432] Config param base_emb_dim: 16
I0421 11:56:22.576799 132915091408704 pyconfig.py:432] Config param base_mlp_dim: 64
I0421 11:56:22.576814 132915091408704 pyconfig.py:432] Config param base_moe_mlp_dim: -1
I0421 11:56:22.576830 132915091408704 pyconfig.py:432] Config param base_num_decoder_layers: 1
I0421 11:56:22.576845 132915091408704 pyconfig.py:432] Config param base_num_kv_heads: 2
I0421 11:56:22.576861 132915091408704 pyconfig.py:432] Config param base_num_query_heads: 2
I0421 11:56:22.576875 132915091408704 pyconfig.py:432] Config param base_output_directory: 
I0421 11:56:22.576891 132915091408704 pyconfig.py:432] Config param batch_size: 1
I0421 11:56:22.576907 132915091408704 pyconfig.py:432] Config param batch_split_factor: 1
I0421 11:56:22.576922 132915091408704 pyconfig.py:432] Config param beta_fast: 32
I0421 11:56:22.576938 132915091408704 pyconfig.py:432] Config param beta_slow: 1
I0421 11:56:22.576954 132915091408704 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax
I0421 11:56:22.576970 132915091408704 pyconfig.py:432] Config param capacity_factor: -1.0
I0421 11:56:22.576986 132915091408704 pyconfig.py:432] Config param cast_logits_to_fp32: True
I0421 11:56:22.577002 132915091408704 pyconfig.py:432] Config param chat_template: 
I0421 11:56:22.577017 132915091408704 pyconfig.py:432] Config param chat_template_path: 
I0421 11:56:22.577034 132915091408704 pyconfig.py:432] Config param checkpoint_conversion_fn: None
I0421 11:56:22.577050 132915091408704 pyconfig.py:432] Config param checkpoint_dir: None
I0421 11:56:22.577069 132915091408704 pyconfig.py:432] Config param checkpoint_is_quantized: False
I0421 11:56:22.577085 132915091408704 pyconfig.py:432] Config param checkpoint_period: 2000
I0421 11:56:22.577102 132915091408704 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96
I0421 11:56:22.577118 132915091408704 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0421 11:56:22.577135 132915091408704 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True
I0421 11:56:22.577150 132915091408704 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True
I0421 11:56:22.577165 132915091408704 pyconfig.py:432] Config param checkpoint_todelete_full_path: None
I0421 11:56:22.577207 132915091408704 pyconfig.py:432] Config param checkpoint_todelete_subdir: None
I0421 11:56:22.577224 132915091408704 pyconfig.py:432] Config param chips_per_vm: 4
I0421 11:56:22.577239 132915091408704 pyconfig.py:432] Config param chunk_attn_window_size: 0
I0421 11:56:22.577255 132915091408704 pyconfig.py:432] Config param collect_stack_trace: False
I0421 11:56:22.577269 132915091408704 pyconfig.py:432] Config param colocated_python_checkpointing: False
I0421 11:56:22.577285 132915091408704 pyconfig.py:432] Config param colocated_python_data_input: False
I0421 11:56:22.577299 132915091408704 pyconfig.py:432] Config param compile_topology: 
I0421 11:56:22.577315 132915091408704 pyconfig.py:432] Config param compile_topology_num_slices: -1
I0421 11:56:22.577335 132915091408704 pyconfig.py:432] Config param compile_xla_flags: 
I0421 11:56:22.577349 132915091408704 pyconfig.py:432] Config param compiled_trainstep_file: 
I0421 11:56:22.577364 132915091408704 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3
I0421 11:56:22.577378 132915091408704 pyconfig.py:432] Config param constant_bound_config: []
I0421 11:56:22.577393 132915091408704 pyconfig.py:432] Config param context: RematLocation.REMAT
I0421 11:56:22.577410 132915091408704 pyconfig.py:432] Config param context_parallel_load_balance: True
I0421 11:56:22.577426 132915091408704 pyconfig.py:432] Config param context_parallel_size: 1
I0421 11:56:22.577441 132915091408704 pyconfig.py:432] Config param context_parallel_strategy: all_gather
I0421 11:56:22.577456 132915091408704 pyconfig.py:432] Config param context_sharding: context
I0421 11:56:22.577472 132915091408704 pyconfig.py:432] Config param conv_chunksize_for_audio: 500
I0421 11:56:22.577488 132915091408704 pyconfig.py:432] Config param conv_stride_for_vit: 14
I0421 11:56:22.577502 132915091408704 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1
I0421 11:56:22.577518 132915091408704 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1
I0421 11:56:22.577533 132915091408704 pyconfig.py:432] Config param custom_mesh: 
I0421 11:56:22.577547 132915091408704 pyconfig.py:432] Config param custom_mesh_and_rule: 
I0421 11:56:22.577562 132915091408704 pyconfig.py:432] Config param d_model_for_audio: 256
I0421 11:56:22.577577 132915091408704 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0421 11:56:22.577597 132915091408704 pyconfig.py:432] Config param data_shuffle_seed: 0
I0421 11:56:22.577611 132915091408704 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1
I0421 11:56:22.577627 132915091408704 pyconfig.py:432] Config param dataset_path: 
I0421 11:56:22.577650 132915091408704 pyconfig.py:432] Config param dataset_type: DatasetType.HF
I0421 11:56:22.577666 132915091408704 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1
I0421 11:56:22.577682 132915091408704 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1
I0421 11:56:22.577696 132915091408704 pyconfig.py:432] Config param dcn_context_parallelism: 1
I0421 11:56:22.577711 132915091408704 pyconfig.py:432] Config param dcn_data_parallelism: -1
I0421 11:56:22.577725 132915091408704 pyconfig.py:432] Config param dcn_diloco_parallelism: 1
I0421 11:56:22.577742 132915091408704 pyconfig.py:432] Config param dcn_expert_parallelism: 1
I0421 11:56:22.577756 132915091408704 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1
I0421 11:56:22.577772 132915091408704 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1
I0421 11:56:22.577786 132915091408704 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 11:56:22.577802 132915091408704 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1
I0421 11:56:22.577818 132915091408704 pyconfig.py:432] Config param dcn_sequence_parallelism: 1
I0421 11:56:22.577831 132915091408704 pyconfig.py:432] Config param dcn_tensor_parallelism: 1
I0421 11:56:22.577847 132915091408704 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1
I0421 11:56:22.577860 132915091408704 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1
I0421 11:56:22.577876 132915091408704 pyconfig.py:432] Config param debug: {'rl': False}
I0421 11:56:22.577891 132915091408704 pyconfig.py:432] Config param debug_sharding: False
I0421 11:56:22.577907 132915091408704 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1
I0421 11:56:22.577922 132915091408704 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0421 11:56:22.577939 132915091408704 pyconfig.py:432] Config param decode_sampling_temperature: 1.0
I0421 11:56:22.577955 132915091408704 pyconfig.py:432] Config param decode_sampling_top_k: 0
I0421 11:56:22.577970 132915091408704 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3
I0421 11:56:22.577988 132915091408704 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE
I0421 11:56:22.578005 132915091408704 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: []
I0421 11:56:22.578020 132915091408704 pyconfig.py:432] Config param degenerate_group_masking: True
I0421 11:56:22.578035 132915091408704 pyconfig.py:432] Config param dense_init_scale: 1.0
I0421 11:56:22.578051 132915091408704 pyconfig.py:432] Config param diloco_outer_lr: 0.3
I0421 11:56:22.578067 132915091408704 pyconfig.py:432] Config param diloco_outer_momentum: 0.9
I0421 11:56:22.578081 132915091408704 pyconfig.py:432] Config param diloco_sync_period: 36
I0421 11:56:22.578095 132915091408704 pyconfig.py:432] Config param distill_alpha: 0.5
I0421 11:56:22.578111 132915091408704 pyconfig.py:432] Config param distill_alpha_end: None
I0421 11:56:22.578125 132915091408704 pyconfig.py:432] Config param distill_alpha_schedule: constant
I0421 11:56:22.578142 132915091408704 pyconfig.py:432] Config param distill_beta: 0.0
I0421 11:56:22.578157 132915091408704 pyconfig.py:432] Config param distill_beta_end: None
I0421 11:56:22.578172 132915091408704 pyconfig.py:432] Config param distill_beta_schedule: constant
I0421 11:56:22.578187 132915091408704 pyconfig.py:432] Config param distill_feature_loss_type: cosine
I0421 11:56:22.578201 132915091408704 pyconfig.py:432] Config param distill_layer_indices: None
I0421 11:56:22.578217 132915091408704 pyconfig.py:432] Config param distill_temperature: 1.0
I0421 11:56:22.578231 132915091408704 pyconfig.py:432] Config param distill_temperature_end: None
I0421 11:56:22.578246 132915091408704 pyconfig.py:432] Config param distill_temperature_schedule: constant
I0421 11:56:22.578260 132915091408704 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256
I0421 11:56:22.578276 132915091408704 pyconfig.py:432] Config param dpo_beta: 0.1
I0421 11:56:22.578291 132915091408704 pyconfig.py:432] Config param dpo_label_smoothing: 0.0
I0421 11:56:22.578305 132915091408704 pyconfig.py:432] Config param dq_reduction_steps: 0
I0421 11:56:22.578325 132915091408704 pyconfig.py:432] Config param dropout_rate: 0.0
I0421 11:56:22.578340 132915091408704 pyconfig.py:432] Config param dtype: bfloat16
I0421 11:56:22.578370 132915091408704 pyconfig.py:432] Config param dtype_mm: float32
I0421 11:56:22.578385 132915091408704 pyconfig.py:432] Config param dump_hlo: False
I0421 11:56:22.578400 132915091408704 pyconfig.py:432] Config param dump_hlo_delete_local_after: True
I0421 11:56:22.578414 132915091408704 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-21-11-56/xla_dump
I0421 11:56:22.578429 132915091408704 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0421 11:56:22.578444 132915091408704 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step
I0421 11:56:22.578457 132915091408704 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step
I0421 11:56:22.578472 132915091408704 pyconfig.py:432] Config param dump_hlo_upload_all: False
I0421 11:56:22.578485 132915091408704 pyconfig.py:432] Config param dump_hlo_xla_flags: 
I0421 11:56:22.578501 132915091408704 pyconfig.py:432] Config param dump_jaxpr: False
I0421 11:56:22.578515 132915091408704 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True
I0421 11:56:22.578530 132915091408704 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-21-11-56/jaxpr_dump
I0421 11:56:22.578545 132915091408704 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0421 11:56:22.578558 132915091408704 pyconfig.py:432] Config param dump_step: -1
I0421 11:56:22.578574 132915091408704 pyconfig.py:432] Config param elastic_enabled: False
I0421 11:56:22.578588 132915091408704 pyconfig.py:432] Config param elastic_max_retries: 10
I0421 11:56:22.578603 132915091408704 pyconfig.py:432] Config param elastic_timeout_seconds: 300
I0421 11:56:22.578617 132915091408704 pyconfig.py:432] Config param emb_dim: 16
I0421 11:56:22.578631 132915091408704 pyconfig.py:432] Config param enable_autocheckpoint: False
I0421 11:56:22.578656 132915091408704 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False
I0421 11:56:22.578670 132915091408704 pyconfig.py:432] Config param enable_checkpointing: True
I0421 11:56:22.578686 132915091408704 pyconfig.py:432] Config param enable_continuous_checkpointing: False
I0421 11:56:22.578700 132915091408704 pyconfig.py:432] Config param enable_data_shuffling: True
I0421 11:56:22.578716 132915091408704 pyconfig.py:432] Config param enable_diloco: False
I0421 11:56:22.578731 132915091408704 pyconfig.py:432] Config param enable_dp_attention: False
I0421 11:56:22.578746 132915091408704 pyconfig.py:432] Config param enable_dropout: False
I0421 11:56:22.578761 132915091408704 pyconfig.py:432] Config param enable_emergency_checkpoint: False
I0421 11:56:22.578775 132915091408704 pyconfig.py:432] Config param enable_expert_parallel: False
I0421 11:56:22.578790 132915091408704 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True
I0421 11:56:22.578804 132915091408704 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True
I0421 11:56:22.578819 132915091408704 pyconfig.py:432] Config param enable_goodput_recording: False
I0421 11:56:22.578833 132915091408704 pyconfig.py:432] Config param enable_jax_profiler: False
I0421 11:56:22.578847 132915091408704 pyconfig.py:432] Config param enable_llm_inference_pool: False
I0421 11:56:22.578862 132915091408704 pyconfig.py:432] Config param enable_model_warmup: False
I0421 11:56:22.578876 132915091408704 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False
I0421 11:56:22.578891 132915091408704 pyconfig.py:432] Config param enable_nnx: False
I0421 11:56:22.578906 132915091408704 pyconfig.py:432] Config param enable_orbax_v1: False
I0421 11:56:22.578921 132915091408704 pyconfig.py:432] Config param enable_padding_causal_mask: True
I0421 11:56:22.578935 132915091408704 pyconfig.py:432] Config param enable_pathways_goodput: False
I0421 11:56:22.578950 132915091408704 pyconfig.py:432] Config param enable_prefix_caching: False
I0421 11:56:22.578965 132915091408704 pyconfig.py:432] Config param enable_rampup_batch_size: False
I0421 11:56:22.578980 132915091408704 pyconfig.py:432] Config param enable_single_controller: False
I0421 11:56:22.578996 132915091408704 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False
I0421 11:56:22.579010 132915091408704 pyconfig.py:432] Config param enable_tensorboard: True
I0421 11:56:22.579024 132915091408704 pyconfig.py:432] Config param enable_tunix_perf_metrics: False
I0421 11:56:22.579040 132915091408704 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4
I0421 11:56:22.579055 132915091408704 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512
I0421 11:56:22.579069 132915091408704 pyconfig.py:432] Config param encoder_layers_for_audio: 2
I0421 11:56:22.579084 132915091408704 pyconfig.py:432] Config param engram: RematLocation.REMAT
I0421 11:56:22.579099 132915091408704 pyconfig.py:432] Config param engram_head_dim: 1280
I0421 11:56:22.579115 132915091408704 pyconfig.py:432] Config param engram_kernel_size: 4
I0421 11:56:22.579129 132915091408704 pyconfig.py:432] Config param engram_layers: []
I0421 11:56:22.579144 132915091408704 pyconfig.py:432] Config param engram_max_ngram_size: 3
I0421 11:56:22.579158 132915091408704 pyconfig.py:432] Config param engram_num_heads: 8
I0421 11:56:22.579173 132915091408704 pyconfig.py:432] Config param engram_seed: 0
I0421 11:56:22.579187 132915091408704 pyconfig.py:432] Config param engram_vocab_bases: []
I0421 11:56:22.579203 132915091408704 pyconfig.py:432] Config param epsilon_high: None
I0421 11:56:22.579217 132915091408704 pyconfig.py:432] Config param eval_corr_lst: False
I0421 11:56:22.579232 132915091408704 pyconfig.py:432] Config param eval_data_columns: ['text']
I0421 11:56:22.579247 132915091408704 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1
I0421 11:56:22.579263 132915091408704 pyconfig.py:432] Config param eval_image_column: image
I0421 11:56:22.579277 132915091408704 pyconfig.py:432] Config param eval_interval: -1
I0421 11:56:22.579292 132915091408704 pyconfig.py:432] Config param eval_make_lst: False
I0421 11:56:22.579306 132915091408704 pyconfig.py:432] Config param eval_per_device_batch_size: 2
I0421 11:56:22.579325 132915091408704 pyconfig.py:432] Config param eval_sampling_strategy: greedy
I0421 11:56:22.579340 132915091408704 pyconfig.py:432] Config param eval_split: validation
I0421 11:56:22.579354 132915091408704 pyconfig.py:432] Config param eval_steps: -1
I0421 11:56:22.579368 132915091408704 pyconfig.py:432] Config param expansion_factor_real_data: -1.0
I0421 11:56:22.579385 132915091408704 pyconfig.py:432] Config param final_logits_soft_cap: None
I0421 11:56:22.579399 132915091408704 pyconfig.py:432] Config param first_num_dense_layers: 0
I0421 11:56:22.579414 132915091408704 pyconfig.py:432] Config param float32_gate_logits: False
I0421 11:56:22.579428 132915091408704 pyconfig.py:432] Config param float32_logits: False
I0421 11:56:22.579443 132915091408704 pyconfig.py:432] Config param float32_qk_product: False
I0421 11:56:22.579457 132915091408704 pyconfig.py:432] Config param float32_weight_sum: True
I0421 11:56:22.579473 132915091408704 pyconfig.py:432] Config param force_q_layout: False
I0421 11:56:22.579488 132915091408704 pyconfig.py:432] Config param force_unroll: False
I0421 11:56:22.579503 132915091408704 pyconfig.py:432] Config param freeze_audio_encoder_params: True
I0421 11:56:22.579517 132915091408704 pyconfig.py:432] Config param freeze_vision_encoder_params: True
I0421 11:56:22.579532 132915091408704 pyconfig.py:432] Config param fused_mlp: False
I0421 11:56:22.579547 132915091408704 pyconfig.py:432] Config param fused_qkv: True
I0421 11:56:22.579561 132915091408704 pyconfig.py:432] Config param gcs_metrics: False
I0421 11:56:22.579576 132915091408704 pyconfig.py:432] Config param gdn_chunk_size: 64
I0421 11:56:22.579591 132915091408704 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4
I0421 11:56:22.579606 132915091408704 pyconfig.py:432] Config param gdn_key_head_dim: 128
I0421 11:56:22.579621 132915091408704 pyconfig.py:432] Config param gdn_num_key_heads: 16
I0421 11:56:22.579635 132915091408704 pyconfig.py:432] Config param gdn_num_value_heads: 32
I0421 11:56:22.579662 132915091408704 pyconfig.py:432] Config param gdn_value_head_dim: 128
I0421 11:56:22.579676 132915091408704 pyconfig.py:432] Config param generate_padding_batch_eval: False
I0421 11:56:22.579692 132915091408704 pyconfig.py:432] Config param generate_padding_batch_train: False
I0421 11:56:22.579707 132915091408704 pyconfig.py:432] Config param generate_slice: v5e-16
I0421 11:56:22.579721 132915091408704 pyconfig.py:432] Config param generation_configs: {}
I0421 11:56:22.579736 132915091408704 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64
I0421 11:56:22.579751 132915091408704 pyconfig.py:432] Config param global_batch_size_to_load: 512
I0421 11:56:22.579767 132915091408704 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64
I0421 11:56:22.579782 132915091408704 pyconfig.py:432] Config param global_batch_size_to_load_increment: None
I0421 11:56:22.579796 132915091408704 pyconfig.py:432] Config param global_batch_size_to_load_start: None
I0421 11:56:22.579812 132915091408704 pyconfig.py:432] Config param global_batch_size_to_train_on: 512
I0421 11:56:22.579826 132915091408704 pyconfig.py:432] Config param global_head_dim: 0
I0421 11:56:22.579841 132915091408704 pyconfig.py:432] Config param global_num_kv_heads: 0
I0421 11:56:22.579856 132915091408704 pyconfig.py:432] Config param global_parameter_scale: 1
I0421 11:56:22.579870 132915091408704 pyconfig.py:432] Config param global_rampup_samples: 500
I0421 11:56:22.579886 132915091408704 pyconfig.py:432] Config param global_rope_max_timescale: -1
I0421 11:56:22.579899 132915091408704 pyconfig.py:432] Config param global_rope_proportion: 0.25
I0421 11:56:22.579915 132915091408704 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30
I0421 11:56:22.579930 132915091408704 pyconfig.py:432] Config param grad_dtype: float32
I0421 11:56:22.579963 132915091408704 pyconfig.py:432] Config param gradient_accumulation_steps: 8
I0421 11:56:22.579979 132915091408704 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0
I0421 11:56:22.579994 132915091408704 pyconfig.py:432] Config param grain_data_source_max_workers: 16
I0421 11:56:22.580009 132915091408704 pyconfig.py:432] Config param grain_eval_files: 
I0421 11:56:22.580025 132915091408704 pyconfig.py:432] Config param grain_file_type: arrayrecord
I0421 11:56:22.580040 132915091408704 pyconfig.py:432] Config param grain_num_threads: 16
I0421 11:56:22.580056 132915091408704 pyconfig.py:432] Config param grain_num_threads_eval: 16
I0421 11:56:22.580072 132915091408704 pyconfig.py:432] Config param grain_packing_type: first_fit
I0421 11:56:22.580087 132915091408704 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1
I0421 11:56:22.580103 132915091408704 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1
I0421 11:56:22.580117 132915091408704 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500
I0421 11:56:22.580133 132915091408704 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500
I0421 11:56:22.580147 132915091408704 pyconfig.py:432] Config param grain_ram_budget_mb: 1024
I0421 11:56:22.580163 132915091408704 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100
I0421 11:56:22.580177 132915091408704 pyconfig.py:432] Config param grain_train_files: 
I0421 11:56:22.580192 132915091408704 pyconfig.py:432] Config param grain_train_mixture_config_path: 
I0421 11:56:22.580208 132915091408704 pyconfig.py:432] Config param grain_worker_count: 1
I0421 11:56:22.580222 132915091408704 pyconfig.py:432] Config param grain_worker_count_eval: 1
I0421 11:56:22.580238 132915091408704 pyconfig.py:432] Config param grpo_beta: 0.08
I0421 11:56:22.580252 132915091408704 pyconfig.py:432] Config param grpo_epsilon: 0.2
I0421 11:56:22.580268 132915091408704 pyconfig.py:432] Config param hardware: tpu
I0421 11:56:22.580283 132915091408704 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72
I0421 11:56:22.580300 132915091408704 pyconfig.py:432] Config param head_dim: 8
I0421 11:56:22.580315 132915091408704 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5
I0421 11:56:22.580335 132915091408704 pyconfig.py:432] Config param hf_data_dir: None
I0421 11:56:22.580349 132915091408704 pyconfig.py:432] Config param hf_eval_files: None
I0421 11:56:22.580365 132915091408704 pyconfig.py:432] Config param hf_eval_split: None
I0421 11:56:22.580379 132915091408704 pyconfig.py:432] Config param hf_name: None
I0421 11:56:22.580394 132915091408704 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix
I0421 11:56:22.580410 132915091408704 pyconfig.py:432] Config param hf_train_files: None
I0421 11:56:22.580425 132915091408704 pyconfig.py:432] Config param hidden_size_for_vit: 1408
I0421 11:56:22.580439 132915091408704 pyconfig.py:432] Config param hide_profiler_step_metric: False
I0421 11:56:22.580454 132915091408704 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1
I0421 11:56:22.580468 132915091408704 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1
I0421 11:56:22.580483 132915091408704 pyconfig.py:432] Config param ici_context_parallelism: 1
I0421 11:56:22.580497 132915091408704 pyconfig.py:432] Config param ici_data_parallelism: 1
I0421 11:56:22.580513 132915091408704 pyconfig.py:432] Config param ici_diloco_parallelism: 1
I0421 11:56:22.580527 132915091408704 pyconfig.py:432] Config param ici_expert_parallelism: 1
I0421 11:56:22.580543 132915091408704 pyconfig.py:432] Config param ici_fsdp_parallelism: -1
I0421 11:56:22.580557 132915091408704 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1
I0421 11:56:22.580572 132915091408704 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 11:56:22.580588 132915091408704 pyconfig.py:432] Config param ici_pipeline_parallelism: 1
I0421 11:56:22.580603 132915091408704 pyconfig.py:432] Config param ici_sequence_parallelism: 1
I0421 11:56:22.580616 132915091408704 pyconfig.py:432] Config param ici_tensor_parallelism: 1
I0421 11:56:22.580631 132915091408704 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1
I0421 11:56:22.580655 132915091408704 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1
I0421 11:56:22.580670 132915091408704 pyconfig.py:432] Config param image_path: 
I0421 11:56:22.580685 132915091408704 pyconfig.py:432] Config param image_placeholder: <|image|>
I0421 11:56:22.580701 132915091408704 pyconfig.py:432] Config param image_size_for_vit: 896
I0421 11:56:22.580716 132915091408704 pyconfig.py:432] Config param indexer_head_dim: 128
I0421 11:56:22.580730 132915091408704 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0
I0421 11:56:22.580746 132915091408704 pyconfig.py:432] Config param indexer_n_heads: 64
I0421 11:56:22.580760 132915091408704 pyconfig.py:432] Config param indexer_sparse_training: False
I0421 11:56:22.580775 132915091408704 pyconfig.py:432] Config param indexer_topk: 2048
I0421 11:56:22.580790 132915091408704 pyconfig.py:432] Config param inference_benchmark_test: False
I0421 11:56:22.580806 132915091408704 pyconfig.py:432] Config param inference_metadata_file: 
I0421 11:56:22.580820 132915091408704 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: 
I0421 11:56:22.580836 132915091408704 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10
I0421 11:56:22.580850 132915091408704 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0421 11:56:22.580866 132915091408704 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0421 11:56:22.580882 132915091408704 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate
I0421 11:56:22.580896 132915091408704 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer
I0421 11:56:22.580912 132915091408704 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1
I0421 11:56:22.580927 132915091408704 pyconfig.py:432] Config param init_weights_seed: 0
I0421 11:56:22.580942 132915091408704 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0421 11:56:22.580959 132915091408704 pyconfig.py:432] Config param interleave_moe_layer_step: 1
I0421 11:56:22.580973 132915091408704 pyconfig.py:432] Config param intermediate_size_for_vit: 5632
I0421 11:56:22.580988 132915091408704 pyconfig.py:432] Config param internal_compile: False
I0421 11:56:22.581003 132915091408704 pyconfig.py:432] Config param internal_compile_num_devices: -1
I0421 11:56:22.581017 132915091408704 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache
I0421 11:56:22.581032 132915091408704 pyconfig.py:432] Config param jax_debug_log_modules: 
I0421 11:56:22.581048 132915091408704 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300
I0421 11:56:22.581062 132915091408704 pyconfig.py:432] Config param jax_profiler_port: 9999
I0421 11:56:22.581078 132915091408704 pyconfig.py:432] Config param key_proj: RematLocation.REMAT
I0421 11:56:22.581093 132915091408704 pyconfig.py:432] Config param kv_cache_buffer: 256
I0421 11:56:22.581108 132915091408704 pyconfig.py:432] Config param kv_lora_rank: 512
I0421 11:56:22.581123 132915091408704 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0421 11:56:22.581140 132915091408704 pyconfig.py:432] Config param kv_quant_dtype: int8
I0421 11:56:22.581154 132915091408704 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT
I0421 11:56:22.581170 132915091408704 pyconfig.py:432] Config param learning_rate: 0.0002
I0421 11:56:22.581185 132915091408704 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1
I0421 11:56:22.581200 132915091408704 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000
I0421 11:56:22.581214 132915091408704 pyconfig.py:432] Config param load_balance_loss_weight: 0.0
I0421 11:56:22.581230 132915091408704 pyconfig.py:432] Config param load_checkpoint_only_once: False
I0421 11:56:22.581244 132915091408704 pyconfig.py:432] Config param load_from_prefill_dir: False
I0421 11:56:22.581259 132915091408704 pyconfig.py:432] Config param load_full_state_path: 
I0421 11:56:22.581274 132915091408704 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 11:56:22.581289 132915091408704 pyconfig.py:432] Config param local_checkpoint_directory: 
I0421 11:56:22.581303 132915091408704 pyconfig.py:432] Config param local_checkpoint_period: 0
I0421 11:56:22.581322 132915091408704 pyconfig.py:432] Config param local_rope_max_timescale: -1
I0421 11:56:22.581336 132915091408704 pyconfig.py:432] Config param local_rope_proportion: 1.0
I0421 11:56:22.581352 132915091408704 pyconfig.py:432] Config param log_config: True
I0421 11:56:22.581367 132915091408704 pyconfig.py:432] Config param log_period: 10
I0421 11:56:22.581382 132915091408704 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 
'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0421 11:56:22.581452 132915091408704 pyconfig.py:432] Config param logits_dot_in_fp32: False
I0421 11:56:22.581467 132915091408704 pyconfig.py:432] Config param logits_via_embedding: True
I0421 11:56:22.581483 132915091408704 pyconfig.py:432] Config param lora_input_adapters_path: 
I0421 11:56:22.581497 132915091408704 pyconfig.py:432] Config param loss_algo: grpo
I0421 11:56:22.581513 132915091408704 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0421 11:56:22.581530 132915091408704 pyconfig.py:432] Config param managed_mldiagnostics: False
I0421 11:56:22.581545 132915091408704 pyconfig.py:432] Config param managed_mldiagnostics_dir: None
I0421 11:56:22.581560 132915091408704 pyconfig.py:432] Config param managed_mldiagnostics_run_group: 
I0421 11:56:22.581574 132915091408704 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT
I0421 11:56:22.581592 132915091408704 pyconfig.py:432] Config param max_checkify: False
I0421 11:56:22.581606 132915091408704 pyconfig.py:432] Config param max_concurrency: 256
I0421 11:56:22.581622 132915091408704 pyconfig.py:432] Config param max_corpus_chars: 10000000
I0421 11:56:22.581636 132915091408704 pyconfig.py:432] Config param max_num_batched_tokens: None
I0421 11:56:22.581662 132915091408704 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None
I0421 11:56:22.581676 132915091408704 pyconfig.py:432] Config param max_num_images_per_example: -1
I0421 11:56:22.581692 132915091408704 pyconfig.py:432] Config param max_num_seqs: None
I0421 11:56:22.581707 132915091408704 pyconfig.py:432] Config param max_position_embeddings: 163840
I0421 11:56:22.581721 132915091408704 pyconfig.py:432] Config param max_prefill_predict_length: 64
I0421 11:56:22.581737 132915091408704 pyconfig.py:432] Config param max_sample_len_for_audio: 10000
I0421 11:56:22.581751 132915091408704 pyconfig.py:432] Config param max_segments_per_seq: -1
I0421 11:56:22.581766 132915091408704 pyconfig.py:432] Config param max_source_positions_for_audio: 1500
I0421 11:56:22.581780 132915091408704 pyconfig.py:432] Config param max_target_length: 2048
I0421 11:56:22.581795 132915091408704 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0
I0421 11:56:22.581810 132915091408704 pyconfig.py:432] Config param megablox: True
I0421 11:56:22.581825 132915091408704 pyconfig.py:432] Config param merge_gating_gmm: False
I0421 11:56:22.581839 132915091408704 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0421 11:56:22.581857 132915091408704 pyconfig.py:432] Config param metrics_dir: None
I0421 11:56:22.581871 132915091408704 pyconfig.py:432] Config param metrics_file: 
I0421 11:56:22.581887 132915091408704 pyconfig.py:432] Config param mhc_expansion_rate: 1
I0421 11:56:22.581901 132915091408704 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64
I0421 11:56:22.581916 132915091408704 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64
I0421 11:56:22.581931 132915091408704 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT
I0421 11:56:22.581946 132915091408704 pyconfig.py:432] Config param mla_naive_kvcache: True
I0421 11:56:22.581961 132915091408704 pyconfig.py:432] Config param mla_q: RematLocation.REMAT
I0421 11:56:22.581976 132915091408704 pyconfig.py:432] Config param mlp_activations: ['gelu']
I0421 11:56:22.581992 132915091408704 pyconfig.py:432] Config param mlp_activations_limit: -1.0
I0421 11:56:22.582007 132915091408704 pyconfig.py:432] Config param mlp_bias: False
I0421 11:56:22.582021 132915091408704 pyconfig.py:432] Config param mlp_dim: 64
I0421 11:56:22.582036 132915091408704 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT
I0421 11:56:22.582051 132915091408704 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT
I0421 11:56:22.582066 132915091408704 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT
I0421 11:56:22.582082 132915091408704 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT
I0421 11:56:22.582095 132915091408704 pyconfig.py:432] Config param moba: False
I0421 11:56:22.582111 132915091408704 pyconfig.py:432] Config param moba_chunk_size: 1024
I0421 11:56:22.582125 132915091408704 pyconfig.py:432] Config param moba_topk: 8
I0421 11:56:22.582139 132915091408704 pyconfig.py:432] Config param model_call_mode: 
I0421 11:56:22.582154 132915091408704 pyconfig.py:432] Config param model_name: gpt3-52k
I0421 11:56:22.582169 132915091408704 pyconfig.py:432] Config param moe_expert_input_dim: -1
I0421 11:56:22.582184 132915091408704 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False
I0421 11:56:22.582198 132915091408704 pyconfig.py:432] Config param moe_mlp_dim: -1
I0421 11:56:22.582212 132915091408704 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT
I0421 11:56:22.582228 132915091408704 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT
I0421 11:56:22.582244 132915091408704 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT
I0421 11:56:22.582260 132915091408704 pyconfig.py:432] Config param monitor_goodput: False
I0421 11:56:22.582275 132915091408704 pyconfig.py:432] Config param monitor_step_time_deviation: True
I0421 11:56:22.582290 132915091408704 pyconfig.py:432] Config param mrope_section: [24, 20, 20]
I0421 11:56:22.582307 132915091408704 pyconfig.py:432] Config param mscale: 1.0
I0421 11:56:22.582323 132915091408704 pyconfig.py:432] Config param mtc_data_parallelism: 0
I0421 11:56:22.582339 132915091408704 pyconfig.py:432] Config param mtp_eval_target_module: 0
I0421 11:56:22.582354 132915091408704 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1
I0421 11:56:22.582369 132915091408704 pyconfig.py:432] Config param mtp_num_layers: 0
I0421 11:56:22.582384 132915091408704 pyconfig.py:432] Config param mu_dtype: float32
I0421 11:56:22.582407 132915091408704 pyconfig.py:432] Config param multi_sampling: False
I0421 11:56:22.582422 132915091408704 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0421 11:56:22.582437 132915091408704 pyconfig.py:432] Config param muon_beta: 0.95
I0421 11:56:22.582453 132915091408704 pyconfig.py:432] Config param muon_consistent_rms: None
I0421 11:56:22.582469 132915091408704 pyconfig.py:432] Config param muon_weight_decay: 0.0
I0421 11:56:22.582484 132915091408704 pyconfig.py:432] Config param n_routing_groups: -1
I0421 11:56:22.582498 132915091408704 pyconfig.py:432] Config param n_window_for_audio: 50
I0421 11:56:22.582513 132915091408704 pyconfig.py:432] Config param n_window_infer_for_audio: 800
I0421 11:56:22.582527 132915091408704 pyconfig.py:432] Config param nope_layer_interval: -1
I0421 11:56:22.582543 132915091408704 pyconfig.py:432] Config param norm_topk_prob: False
I0421 11:56:22.582558 132915091408704 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05
I0421 11:56:22.582575 132915091408704 pyconfig.py:432] Config param normalize_embedding_logits: False
I0421 11:56:22.582589 132915091408704 pyconfig.py:432] Config param num_attention_heads_for_vit: 16
I0421 11:56:22.582605 132915091408704 pyconfig.py:432] Config param num_batches: 4
I0421 11:56:22.582619 132915091408704 pyconfig.py:432] Config param num_channels_for_vit: 3
I0421 11:56:22.582634 132915091408704 pyconfig.py:432] Config param num_conv_layers_for_audio: 3
I0421 11:56:22.582660 132915091408704 pyconfig.py:432] Config param num_decoder_layers: 1
I0421 11:56:22.582674 132915091408704 pyconfig.py:432] Config param num_diloco_replicas: 1
I0421 11:56:22.582689 132915091408704 pyconfig.py:432] Config param num_epoch: 1
I0421 11:56:22.582703 132915091408704 pyconfig.py:432] Config param num_eval_passes: 1
I0421 11:56:22.582717 132915091408704 pyconfig.py:432] Config param num_experts: 1
I0421 11:56:22.582731 132915091408704 pyconfig.py:432] Config param num_experts_per_tok: 1
I0421 11:56:22.582748 132915091408704 pyconfig.py:432] Config param num_generations: 2
I0421 11:56:22.582762 132915091408704 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34
I0421 11:56:22.582778 132915091408704 pyconfig.py:432] Config param num_iterations: 1
I0421 11:56:22.582792 132915091408704 pyconfig.py:432] Config param num_kv_heads: 2
I0421 11:56:22.582807 132915091408704 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1
I0421 11:56:22.582821 132915091408704 pyconfig.py:432] Config param num_mel_bins_for_audio: 128
I0421 11:56:22.582837 132915091408704 pyconfig.py:432] Config param num_pipeline_microbatches: -1
I0421 11:56:22.582851 132915091408704 pyconfig.py:432] Config param num_pipeline_repeats: -1
I0421 11:56:22.582866 132915091408704 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024
I0421 11:56:22.582880 132915091408704 pyconfig.py:432] Config param num_query_heads: 2
I0421 11:56:22.582896 132915091408704 pyconfig.py:432] Config param num_samplers_slices: -1
I0421 11:56:22.582910 132915091408704 pyconfig.py:432] Config param num_slices: 1
I0421 11:56:22.582925 132915091408704 pyconfig.py:432] Config param num_target_devices: 32
I0421 11:56:22.582939 132915091408704 pyconfig.py:432] Config param num_test_batches: 5
I0421 11:56:22.582954 132915091408704 pyconfig.py:432] Config param num_trainer_slices: -1
I0421 11:56:22.582968 132915091408704 pyconfig.py:432] Config param num_vocab_tiling: 1
I0421 11:56:22.582983 132915091408704 pyconfig.py:432] Config param off_policy_steps: 0
I0421 11:56:22.582998 132915091408704 pyconfig.py:432] Config param offline_data_dir: None
I0421 11:56:22.583013 132915091408704 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX
I0421 11:56:22.583029 132915091408704 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False
I0421 11:56:22.583044 132915091408704 pyconfig.py:432] Config param optimizer_memory_host_offload: False
I0421 11:56:22.583058 132915091408704 pyconfig.py:432] Config param original_max_position_embeddings: 4096
I0421 11:56:22.583074 132915091408704 pyconfig.py:432] Config param out_hidden_size_for_vit: 512
I0421 11:56:22.583089 132915091408704 pyconfig.py:432] Config param out_proj: RematLocation.REMAT
I0421 11:56:22.583103 132915091408704 pyconfig.py:432] Config param output_dim_for_audio: 512
I0421 11:56:22.583118 132915091408704 pyconfig.py:432] Config param override_logical_axis_rules: False
I0421 11:56:22.583132 132915091408704 pyconfig.py:432] Config param override_model_config: True
I0421 11:56:22.583147 132915091408704 pyconfig.py:432] Config param packing: True
I0421 11:56:22.583161 132915091408704 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128
I0421 11:56:22.583176 132915091408704 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1
I0421 11:56:22.583190 132915091408704 pyconfig.py:432] Config param pagedattn_num_pages: 64
I0421 11:56:22.583206 132915091408704 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4
I0421 11:56:22.583220 132915091408704 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32
I0421 11:56:22.583235 132915091408704 pyconfig.py:432] Config param param_scan_axis: 1
I0421 11:56:22.583249 132915091408704 pyconfig.py:432] Config param parameter_memory_host_offload: False
I0421 11:56:22.583265 132915091408704 pyconfig.py:432] Config param partial_rotary_factor: 1.0
I0421 11:56:22.583280 132915091408704 pyconfig.py:432] Config param patch_size_for_vit: 14
I0421 11:56:22.583294 132915091408704 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0
I0421 11:56:22.583309 132915091408704 pyconfig.py:432] Config param penalty_incorrect_format: -0.5
I0421 11:56:22.583326 132915091408704 pyconfig.py:432] Config param per_device_batch_size: 2
I0421 11:56:22.583341 132915091408704 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0
I0421 11:56:22.583356 132915091408704 pyconfig.py:432] Config param per_device_batch_size_start: 4.0
I0421 11:56:22.583373 132915091408704 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False
I0421 11:56:22.583387 132915091408704 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False
I0421 11:56:22.583402 132915091408704 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False
I0421 11:56:22.583416 132915091408704 pyconfig.py:432] Config param pipeline_parallel_layers: 1
I0421 11:56:22.583431 132915091408704 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5
I0421 11:56:22.583446 132915091408704 pyconfig.py:432] Config param posemb_type_for_vit: learn
I0421 11:56:22.583461 132915091408704 pyconfig.py:432] Config param position_id_per_seconds: 25
I0421 11:56:22.583475 132915091408704 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3
I0421 11:56:22.583491 132915091408704 pyconfig.py:432] Config param prefill_cache_dir: 
I0421 11:56:22.583506 132915091408704 pyconfig.py:432] Config param prefill_chunk_size: 256
I0421 11:56:22.583520 132915091408704 pyconfig.py:432] Config param prefill_slice: v5e-16
I0421 11:56:22.583535 132915091408704 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000
I0421 11:56:22.583549 132915091408704 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000
I0421 11:56:22.583565 132915091408704 pyconfig.py:432] Config param profile_cleanly: True
I0421 11:56:22.583580 132915091408704 pyconfig.py:432] Config param profile_periodically_period: -1
I0421 11:56:22.583594 132915091408704 pyconfig.py:432] Config param profile_power_events: False
I0421 11:56:22.583609 132915091408704 pyconfig.py:432] Config param profiler: ProfilerType.NONE
I0421 11:56:22.583627 132915091408704 pyconfig.py:432] Config param profiler_steps: 5
I0421 11:56:22.583653 132915091408704 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0
I0421 11:56:22.583668 132915091408704 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096
I0421 11:56:22.583684 132915091408704 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096
I0421 11:56:22.583698 132915091408704 pyconfig.py:432] Config param prometheus_port: 0
I0421 11:56:22.583713 132915091408704 pyconfig.py:432] Config param prompt: I love to
I0421 11:56:22.583729 132915091408704 pyconfig.py:432] Config param pure_nnx: False
I0421 11:56:22.583744 132915091408704 pyconfig.py:432] Config param pure_nnx_decoder: False
I0421 11:56:22.583758 132915091408704 pyconfig.py:432] Config param q_lora_rank: 0
I0421 11:56:22.583773 132915091408704 pyconfig.py:432] Config param qk_clip_threshold: 100.0
I0421 11:56:22.583787 132915091408704 pyconfig.py:432] Config param qk_nope_head_dim: 128
I0421 11:56:22.583803 132915091408704 pyconfig.py:432] Config param qk_norm_with_scale: True
I0421 11:56:22.583819 132915091408704 pyconfig.py:432] Config param qk_rope_head_dim: 64
I0421 11:56:22.583834 132915091408704 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT
I0421 11:56:22.583848 132915091408704 pyconfig.py:432] Config param quant_cfg_path: 
I0421 11:56:22.583864 132915091408704 pyconfig.py:432] Config param quantization: QuantizationType.NONE
I0421 11:56:22.583882 132915091408704 pyconfig.py:432] Config param quantization_local_shard_count: 4
I0421 11:56:22.583897 132915091408704 pyconfig.py:432] Config param quantize_kvcache: False
I0421 11:56:22.583911 132915091408704 pyconfig.py:432] Config param query_proj: RematLocation.REMAT
I0421 11:56:22.583927 132915091408704 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT
I0421 11:56:22.583941 132915091408704 pyconfig.py:432] Config param ragged_block_size: 256
I0421 11:56:22.583956 132915091408704 pyconfig.py:432] Config param ragged_buffer_factor: -1.0
I0421 11:56:22.583971 132915091408704 pyconfig.py:432] Config param rampup_end_step: 0
I0421 11:56:22.583986 132915091408704 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None
I0421 11:56:22.584001 132915091408704 pyconfig.py:432] Config param reasoning_end_token: </reasoning>
I0421 11:56:22.584016 132915091408704 pyconfig.py:432] Config param reasoning_start_token: <reasoning>
I0421 11:56:22.584033 132915091408704 pyconfig.py:432] Config param record_internal_nn_metrics: 0
I0421 11:56:22.584047 132915091408704 pyconfig.py:432] Config param remat_policy: full
I0421 11:56:22.584063 132915091408704 pyconfig.py:432] Config param remat_policy_for_vit: minimal
I0421 11:56:22.584077 132915091408704 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True
I0421 11:56:22.584091 132915091408704 pyconfig.py:432] Config param replicate_quant_scale: False
I0421 11:56:22.584106 132915091408704 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0
I0421 11:56:22.584120 132915091408704 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0421 11:56:22.584136 132915091408704 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False
I0421 11:56:22.584150 132915091408704 pyconfig.py:432] Config param reshape_q: False
I0421 11:56:22.584165 132915091408704 pyconfig.py:432] Config param return_log_prob: False
I0421 11:56:22.584181 132915091408704 pyconfig.py:432] Config param reuse_example_batch: 0
I0421 11:56:22.584195 132915091408704 pyconfig.py:432] Config param reward_exact_answer: 5.0
I0421 11:56:22.584210 132915091408704 pyconfig.py:432] Config param reward_exact_format_match: 3.0
I0421 11:56:22.584226 132915091408704 pyconfig.py:432] Config param reward_partial_format_match: 0.5
I0421 11:56:22.584240 132915091408704 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5
I0421 11:56:22.584255 132915091408704 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25
I0421 11:56:22.584270 132915091408704 pyconfig.py:432] Config param reward_white_space_format_match: 1.5
I0421 11:56:22.584286 132915091408704 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0421 11:56:22.584306 132915091408704 pyconfig.py:432] Config param rollout_data_parallelism: -1
I0421 11:56:22.584325 132915091408704 pyconfig.py:432] Config param rollout_expert_parallelism: 1
I0421 11:56:22.584341 132915091408704 pyconfig.py:432] Config param rollout_micro_batch_size: -1
I0421 11:56:22.584356 132915091408704 pyconfig.py:432] Config param rollout_tensor_parallelism: -1
I0421 11:56:22.584370 132915091408704 pyconfig.py:432] Config param rope_attention_scaling: False
I0421 11:56:22.584385 132915091408704 pyconfig.py:432] Config param rope_factor: 40
I0421 11:56:22.584401 132915091408704 pyconfig.py:432] Config param rope_interleave: True
I0421 11:56:22.584415 132915091408704 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0
I0421 11:56:22.584431 132915091408704 pyconfig.py:432] Config param rope_max_timescale: 10000
I0421 11:56:22.584446 132915091408704 pyconfig.py:432] Config param rope_min_timescale: 1
I0421 11:56:22.584462 132915091408704 pyconfig.py:432] Config param rope_theta_for_vit: 10000
I0421 11:56:22.584476 132915091408704 pyconfig.py:432] Config param rope_truncate: True
I0421 11:56:22.584491 132915091408704 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT
I0421 11:56:22.584508 132915091408704 pyconfig.py:432] Config param rope_use_scale: True
I0421 11:56:22.584524 132915091408704 pyconfig.py:432] Config param routed_bias: False
I0421 11:56:22.584538 132915091408704 pyconfig.py:432] Config param routed_bias_update_rate: 0.0
I0421 11:56:22.584554 132915091408704 pyconfig.py:432] Config param routed_scaling_factor: 1.0
I0421 11:56:22.584568 132915091408704 pyconfig.py:432] Config param routed_score_func: 
I0421 11:56:22.584583 132915091408704 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-21-11-56
I0421 11:56:22.584597 132915091408704 pyconfig.py:432] Config param sa_block_kv: 512
I0421 11:56:22.584612 132915091408704 pyconfig.py:432] Config param sa_block_kv_compute: 512
I0421 11:56:22.584626 132915091408704 pyconfig.py:432] Config param sa_block_kv_dkv: 512
I0421 11:56:22.584654 132915091408704 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512
I0421 11:56:22.584669 132915091408704 pyconfig.py:432] Config param sa_block_kv_dq: 512
I0421 11:56:22.584684 132915091408704 pyconfig.py:432] Config param sa_block_q: 512
I0421 11:56:22.584698 132915091408704 pyconfig.py:432] Config param sa_block_q_dkv: 512
I0421 11:56:22.584714 132915091408704 pyconfig.py:432] Config param sa_block_q_dq: 512
I0421 11:56:22.584728 132915091408704 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR
I0421 11:56:22.584743 132915091408704 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR
I0421 11:56:22.584758 132915091408704 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False
I0421 11:56:22.584772 132915091408704 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR
I0421 11:56:22.584788 132915091408704 pyconfig.py:432] Config param sampler_devices_fraction: 0.5
I0421 11:56:22.584802 132915091408704 pyconfig.py:432] Config param save_checkpoint_on_completion: True
I0421 11:56:22.584818 132915091408704 pyconfig.py:432] Config param save_config_to_gcs: False
I0421 11:56:22.584832 132915091408704 pyconfig.py:432] Config param save_quantized_params_path: 
I0421 11:56:22.584847 132915091408704 pyconfig.py:432] Config param scale_embedding_for_audio: True
I0421 11:56:22.584861 132915091408704 pyconfig.py:432] Config param scan_layers: True
I0421 11:56:22.584877 132915091408704 pyconfig.py:432] Config param scan_layers_per_stage: False
I0421 11:56:22.584891 132915091408704 pyconfig.py:432] Config param scan_pipeline_iterations: True
I0421 11:56:22.584905 132915091408704 pyconfig.py:432] Config param scan_pipeline_repeats: False
I0421 11:56:22.584920 132915091408704 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False
I0421 11:56:22.584934 132915091408704 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True
I0421 11:56:22.584949 132915091408704 pyconfig.py:432] Config param sft_train_on_completion_only: False
I0421 11:56:22.584963 132915091408704 pyconfig.py:432] Config param shard_exp_on_fsdp: False
I0421 11:56:22.584977 132915091408704 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO
I0421 11:56:22.584995 132915091408704 pyconfig.py:432] Config param shard_optimizer_over_data: False
I0421 11:56:22.585010 132915091408704 pyconfig.py:432] Config param sharding_strategy: None
I0421 11:56:22.585025 132915091408704 pyconfig.py:432] Config param sharding_tolerance: 0.02
I0421 11:56:22.585040 132915091408704 pyconfig.py:432] Config param shardy: True
I0421 11:56:22.585055 132915091408704 pyconfig.py:432] Config param share_kv_projections: False
I0421 11:56:22.585070 132915091408704 pyconfig.py:432] Config param shared_experts: 0
I0421 11:56:22.585086 132915091408704 pyconfig.py:432] Config param sinkhorn_iterations: 20
I0421 11:56:22.585100 132915091408704 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1
I0421 11:56:22.585115 132915091408704 pyconfig.py:432] Config param skip_jax_distributed_system: False
I0421 11:56:22.585130 132915091408704 pyconfig.py:432] Config param skip_step_interval: 128
I0421 11:56:22.585145 132915091408704 pyconfig.py:432] Config param skip_step_on_spikes: False
I0421 11:56:22.585160 132915091408704 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0
I0421 11:56:22.585175 132915091408704 pyconfig.py:432] Config param sliding_window_size: 0
I0421 11:56:22.585189 132915091408704 pyconfig.py:432] Config param solution_end_token: </answer>
I0421 11:56:22.585205 132915091408704 pyconfig.py:432] Config param solution_start_token: <answer>
I0421 11:56:22.585220 132915091408704 pyconfig.py:432] Config param source_checkpoint_layout: orbax
I0421 11:56:22.585234 132915091408704 pyconfig.py:432] Config param sparse_matmul: True
I0421 11:56:22.585249 132915091408704 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2
I0421 11:56:22.585264 132915091408704 pyconfig.py:432] Config param stack_prefill_result_cache: False
I0421 11:56:22.585279 132915091408704 pyconfig.py:432] Config param stack_trace_interval_seconds: 600
I0421 11:56:22.585294 132915091408704 pyconfig.py:432] Config param stack_trace_to_cloud: False
I0421 11:56:22.585308 132915091408704 pyconfig.py:432] Config param step_deviation_interval_seconds: 30
I0421 11:56:22.585326 132915091408704 pyconfig.py:432] Config param steps: 200000
I0421 11:56:22.585342 132915091408704 pyconfig.py:432] Config param stop_strings: None
I0421 11:56:22.585356 132915091408704 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0421 11:56:22.585372 132915091408704 pyconfig.py:432] Config param student_params_to_update: None
I0421 11:56:22.585387 132915091408704 pyconfig.py:432] Config param subslice_shape: 
I0421 11:56:22.585402 132915091408704 pyconfig.py:432] Config param swap_space_vllm_gb: 2
I0421 11:56:22.585417 132915091408704 pyconfig.py:432] Config param system_prompt: 
I0421 11:56:22.585432 132915091408704 pyconfig.py:432] Config param target_eval_loss: 0.0
I0421 11:56:22.585446 132915091408704 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0421 11:56:22.585462 132915091408704 pyconfig.py:432] Config param temperature_tuning: False
I0421 11:56:22.585476 132915091408704 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2
I0421 11:56:22.585492 132915091408704 pyconfig.py:432] Config param tensorboard_dir: None
I0421 11:56:22.585507 132915091408704 pyconfig.py:432] Config param tensors_on_device: None
I0421 11:56:22.585522 132915091408704 pyconfig.py:432] Config param tensors_to_offload: None
I0421 11:56:22.585536 132915091408704 pyconfig.py:432] Config param test_batch_start_index: 0
I0421 11:56:22.585551 132915091408704 pyconfig.py:432] Config param tile_size_for_vit: 336
I0421 11:56:22.585566 132915091408704 pyconfig.py:432] Config param tokenize_eval_data: True
I0421 11:56:22.585581 132915091408704 pyconfig.py:432] Config param tokenize_train_data: True
I0421 11:56:22.585595 132915091408704 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0421 11:56:22.585610 132915091408704 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0421 11:56:22.585627 132915091408704 pyconfig.py:432] Config param topk_routing_group: -1
I0421 11:56:22.585650 132915091408704 pyconfig.py:432] Config param train_data_columns: ['text']
I0421 11:56:22.585666 132915091408704 pyconfig.py:432] Config param train_fraction: 1.0
I0421 11:56:22.585681 132915091408704 pyconfig.py:432] Config param train_image_column: image
I0421 11:56:22.585696 132915091408704 pyconfig.py:432] Config param train_micro_batch_size: -1
I0421 11:56:22.585711 132915091408704 pyconfig.py:432] Config param train_split: train
I0421 11:56:22.585725 132915091408704 pyconfig.py:432] Config param trainable_parameters_mask: []
I0421 11:56:22.585741 132915091408704 pyconfig.py:432] Config param trainable_position_size: 2048
I0421 11:56:22.585756 132915091408704 pyconfig.py:432] Config param trainer_devices_fraction: 0.5
I0421 11:56:22.585771 132915091408704 pyconfig.py:432] Config param upload_all_profiler_results: False
I0421 11:56:22.585786 132915091408704 pyconfig.py:432] Config param use_2d_fsdp_sharding: False
I0421 11:56:22.585801 132915091408704 pyconfig.py:432] Config param use_agentic_rollout: False
I0421 11:56:22.585814 132915091408704 pyconfig.py:432] Config param use_audio: False
I0421 11:56:22.585830 132915091408704 pyconfig.py:432] Config param use_audio_in_video: False
I0421 11:56:22.585845 132915091408704 pyconfig.py:432] Config param use_batch_split_schedule: False
I0421 11:56:22.585859 132915091408704 pyconfig.py:432] Config param use_chat_template: False
I0421 11:56:22.585875 132915091408704 pyconfig.py:432] Config param use_chunked_prefill: False
I0421 11:56:22.585889 132915091408704 pyconfig.py:432] Config param use_custom_sort_vjp: True
I0421 11:56:22.585903 132915091408704 pyconfig.py:432] Config param use_dpo: False
I0421 11:56:22.585918 132915091408704 pyconfig.py:432] Config param use_gather_mosaic_kernel: False
I0421 11:56:22.585932 132915091408704 pyconfig.py:432] Config param use_grpo: True
I0421 11:56:22.585947 132915091408704 pyconfig.py:432] Config param use_indexer: False
I0421 11:56:22.585963 132915091408704 pyconfig.py:432] Config param use_iota_embed: True
I0421 11:56:22.585978 132915091408704 pyconfig.py:432] Config param use_jax_splash: False
I0421 11:56:22.585992 132915091408704 pyconfig.py:432] Config param use_max_logit_estimate: -1
I0421 11:56:22.586007 132915091408704 pyconfig.py:432] Config param use_mrope: False
I0421 11:56:22.586021 132915091408704 pyconfig.py:432] Config param use_multimodal: False
I0421 11:56:22.586036 132915091408704 pyconfig.py:432] Config param use_pathways: True
I0421 11:56:22.586050 132915091408704 pyconfig.py:432] Config param use_post_attn_norm: False
I0421 11:56:22.586066 132915091408704 pyconfig.py:432] Config param use_post_ffw_norm: False
I0421 11:56:22.586080 132915091408704 pyconfig.py:432] Config param use_qk_clip: False
I0421 11:56:22.586095 132915091408704 pyconfig.py:432] Config param use_qk_norm: False
I0421 11:56:22.586111 132915091408704 pyconfig.py:432] Config param use_qk_norm_in_gdn: True
I0421 11:56:22.586125 132915091408704 pyconfig.py:432] Config param use_qwix_quantization: False
I0421 11:56:22.586140 132915091408704 pyconfig.py:432] Config param use_ragged_attention: False
I0421 11:56:22.586154 132915091408704 pyconfig.py:432] Config param use_random_routing: False
I0421 11:56:22.586169 132915091408704 pyconfig.py:432] Config param use_replicator_service: False
I0421 11:56:22.586184 132915091408704 pyconfig.py:432] Config param use_ring_of_experts: False
I0421 11:56:22.586199 132915091408704 pyconfig.py:432] Config param use_sft: False
I0421 11:56:22.586213 132915091408704 pyconfig.py:432] Config param use_splash_scheduler: False
I0421 11:56:22.586239 132915091408704 pyconfig.py:432] Config param use_tokamax_gmm: False
I0421 11:56:22.586255 132915091408704 pyconfig.py:432] Config param use_tokamax_splash: False
I0421 11:56:22.586270 132915091408704 pyconfig.py:432] Config param use_truncation: True
I0421 11:56:22.586285 132915091408704 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False
I0421 11:56:22.586299 132915091408704 pyconfig.py:432] Config param use_untrainable_positional_embedding: False
I0421 11:56:22.586314 132915091408704 pyconfig.py:432] Config param use_vertex_tensorboard: False
I0421 11:56:22.586333 132915091408704 pyconfig.py:432] Config param using_pipeline_parallelism: False
I0421 11:56:22.586347 132915091408704 pyconfig.py:432] Config param v_head_dim: 128
I0421 11:56:22.586362 132915091408704 pyconfig.py:432] Config param v_norm_with_scale: True
I0421 11:56:22.586377 132915091408704 pyconfig.py:432] Config param value_proj: RematLocation.REMAT
I0421 11:56:22.586392 132915091408704 pyconfig.py:432] Config param vertex_tensorboard_project: 
I0421 11:56:22.586407 132915091408704 pyconfig.py:432] Config param vertex_tensorboard_region: 
I0421 11:56:22.586421 132915091408704 pyconfig.py:432] Config param video_path: 
I0421 11:56:22.586435 132915091408704 pyconfig.py:432] Config param video_placeholder: <|video|>
I0421 11:56:22.586450 132915091408704 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096
I0421 11:56:22.586464 132915091408704 pyconfig.py:432] Config param vision_output_length: -1
I0421 11:56:22.586479 132915091408704 pyconfig.py:432] Config param vllm_additional_config: {}
I0421 11:56:22.586495 132915091408704 pyconfig.py:432] Config param vllm_hf_config_path: 
I0421 11:56:22.586509 132915091408704 pyconfig.py:432] Config param vllm_hf_overrides: {}
I0421 11:56:22.586525 132915091408704 pyconfig.py:432] Config param vocab_size: 32000
I0421 11:56:22.586539 132915091408704 pyconfig.py:432] Config param warmup_steps_fraction: 0.1
I0421 11:56:22.586554 132915091408704 pyconfig.py:432] Config param weight_dtype: float32
I0421 11:56:22.586576 132915091408704 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax
I0421 11:56:22.586591 132915091408704 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512
I0421 11:56:22.586606 132915091408704 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024
I0421 11:56:22.586621 132915091408704 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024
I0421 11:56:22.586635 132915091408704 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512
I0421 11:56:22.586657 132915091408704 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024
I0421 11:56:22.586672 132915091408704 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024
I0421 11:56:22.586687 132915091408704 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512
I0421 11:56:22.586701 132915091408704 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024
I0421 11:56:22.586716 132915091408704 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024
I0421 11:56:22.586730 132915091408704 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512
I0421 11:56:22.586745 132915091408704 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024
I0421 11:56:22.586760 132915091408704 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024
I0421 11:56:22.586775 132915091408704 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512
I0421 11:56:22.586789 132915091408704 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024
I0421 11:56:22.586804 132915091408704 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024
I0421 11:56:22.586818 132915091408704 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512
I0421 11:56:22.586833 132915091408704 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024
I0421 11:56:22.586847 132915091408704 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024
I0421 11:56:22.586862 132915091408704 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1
I0421 11:56:22.586877 132915091408704 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0421 11:56:22.586893 132915091408704 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False
I0421 11:56:22.586908 132915091408704 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False
I0421 11:56:22.586923 132915091408704 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False
I0421 11:56:22.586937 132915091408704 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0
I0421 11:56:22.586954 132915091408704 pyconfig.py:432] Config param z_loss_multiplier: 0.0
I0421 11:56:22.587307 132915091408704 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0421 11:56:22.587347 132915091408704 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0421 11:56:26.626177 132915091408704 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0421 11:56:26.629189 132915091408704 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0421 11:56:26.629307 132915091408704 train_distill.py:608] Applying logical axis rules for model initialization and training...
I0421 11:56:26.629379 132915091408704 train_distill.py:612] Loading Student from ...
I0421 11:56:26.629407 132915091408704 train_distill.py:169] --- Student Configuration ---
I0421 11:56:26.629428 132915091408704 train_distill.py:170]   Model Name:      gpt3-52k
I0421 11:56:26.629449 132915091408704 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 11:56:26.629467 132915091408704 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 11:56:26.629484 132915091408704 train_distill.py:175]   Vocab Size:      32000
I0421 11:56:26.629501 132915091408704 train_distill.py:176]   Checkpoint:      
I0421 11:56:26.629524 132915091408704 train_distill.py:477] Initializing model: gpt3-52k...
I0421 11:56:28.029580 132915091408704 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0421 11:56:28.029695 132915091408704 train_distill.py:169] --- Teacher Configuration ---
I0421 11:56:28.029723 132915091408704 train_distill.py:170]   Model Name:      gpt3-52k
I0421 11:56:28.029746 132915091408704 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 11:56:28.029767 132915091408704 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 11:56:28.029785 132915091408704 train_distill.py:175]   Vocab Size:      32000
I0421 11:56:28.029804 132915091408704 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 11:56:28.029823 132915091408704 train_distill.py:477] Initializing model: gpt3-52k...
I0421 11:56:29.098298 132915091408704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 11:56:29.098752 132915091408704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e1fd4d1e50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 11:56:29.098811 132915091408704 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0421 11:56:29.614618 132915091408704 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0421 11:56:30.590670    2085 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0421 11:56:31.724815 132915091408704 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0421 11:56:34.225456 132915091408704 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0421 11:56:34.225995 132915091408704 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0421 11:56:34.284867 132915091408704 checkpointer.py:318] Finished restoring checkpoint in 2.94 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0421 11:56:34.977324 132915091408704 train_distill.py:652] Initializing Data Iterators via MaxText pipeline...
I0421 11:56:35.041019 132915091408704 config.py:112] TensorFlow version 2.20.0 available.
I0421 11:56:35.041513 132915091408704 config.py:125] JAX version 0.8.3 available.
E0421 11:56:37.075127 132915091408704 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0421 11:56:37.075346 132915091408704 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0421 11:56:37.078364 132915091408704 train_distill.py:422] Input Pipeline Checkpointing: DISABLED
I0421 11:56:37.078426 132915091408704 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0421 11:56:37.078487 132915091408704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 11:56:37.078568 132915091408704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e1fd4d1e50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 11:56:37.078608 132915091408704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 11:56:37.078653 132915091408704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e1fd4d1e50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 11:56:37.078697 132915091408704 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8200>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd83e0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c79efd8980>}, handler_registry=None
I0421 11:56:37.078887 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8200>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 11:56:37.078927 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd83e0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 11:56:37.078954 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c79efd8980>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 11:56:37.078977 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78d994109b80>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 11:56:37.079004 132915091408704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8200>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8200>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd83e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd83e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c79efd8980>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c79efd8980>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78d994109b80>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78d994109b80>}).
I0421 11:56:37.079409 132915091408704 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x78c71eff8c20> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 11:56:40.005038 132915091408704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints
I0421 11:56:40.040076 132915091408704 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78c79efd8950>
I0421 11:56:40.040204 132915091408704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 11:56:40.040270 132915091408704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e1fd4d1e50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 11:56:40.040321 132915091408704 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 11:56:40.040369 132915091408704 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78e1fd4d1e50>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 11:56:40.040420 132915091408704 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 11:56:40.040498 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef6d480>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c79efd9580>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c79efd92e0>, _write_futures=[])
I0421 11:56:40.040912 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef6d480>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c79efd9580>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c79efd92e0>, _write_futures=[])
I0421 11:56:40.040943 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef6d480>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c79efd9580>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c79efd92e0>, _write_futures=[])
I0421 11:56:40.040977 132915091408704 checkpoint_manager.py:702] [process=4][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8920>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c71ef9fa70>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9e300>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c71ef9e900>}, handler_registry=None
I0421 11:56:40.041100 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8920>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 11:56:40.041153 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c71ef9fa70>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 11:56:40.041195 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9e300>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 11:56:40.041239 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c71ef9e900>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0421 11:56:40.041276 132915091408704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9fbc0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 11:56:40.041312 132915091408704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8920>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c79efd8920>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c71ef9fa70>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78c71ef9fa70>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9e300>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9e300>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c71ef9e900>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78c71ef9e900>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9fbc0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78c71ef9fbc0>}).
I0421 11:56:40.041409 132915091408704 async_checkpointer.py:177] [process=4][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x78c71eff8d60> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 11:56:40.426063 132915091408704 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints
I0421 11:56:40.848942 132915091408704 checkpoint_manager.py:921] [process=4][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78d99416adb0>
I0421 11:56:40.849529 132915091408704 train_distill.py:703] Starting Distillation Training...
I0421 11:56:40.849714 132915091408704 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0421 11:56:41.201780 132915091408704 peft_trainer.py:594] Compiled train_step cache size: 0
I0421 11:56:41.203452 132774467983104 grain_pool.py:367] Grain pool will use 1 processes.
I0421 11:56:41.230426 132774467983104 grain_pool.py:440] Grain pool will start child processes.
I0421 11:56:41.235541 132774467983104 grain_pool.py:448] Grain pool started all child processes.
2026-04-21 11:56:47.240771: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use:

  variable[...]

For other Variable types use:

  variable.get_value()

  current_step = model.training_step.value
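The DeprecationWarning above comes from the Flax NNX Variable API: attribute-style `.value` access is being replaced by indexing (`variable[...]`) for array-valued variables, or `variable.get_value()` otherwise. A minimal sketch of the migration, using a hypothetical stand-in class rather than the real `flax.nnx.Variable` (whose internals may differ by version):

```python
import warnings

class Variable:
    """Stand-in mimicking only the access paths named in the warning;
    NOT the real flax.nnx.Variable."""

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        # Deprecated path, as in: current_step = model.training_step.value
        warnings.warn("'.value' access is now deprecated", DeprecationWarning)
        return self._value

    def __getitem__(self, idx):
        # Preferred for Variable[Array]: variable[...]
        return self._value if idx is Ellipsis else self._value[idx]

    def get_value(self):
        # Preferred for other Variable types
        return self._value


training_step = Variable(1)

# Old (warns): current_step = training_step.value
# New:
current_step = training_step[...]
assert current_step == 1
assert training_step.get_value() == 1
```

With this shape, `train_distill.py:281` would change from `model.training_step.value` to `model.training_step[...]`, silencing the warning without changing the value read.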
I0421 11:56:53.691217 132915091408704 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 11:56:53.693050 132915091408704 checkpoint_manager.py:1501] [process=4] Saving checkpoint at step 1
I0421 11:56:53.696366 132915091408704 async_checkpointer.py:452] [process=4] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1.
I0421 11:56:54.642632 132915091408704 signaling_client.py:364] Using JaxDistributedSignalingClient
I0421 11:56:54.643626 132915091408704 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0421 11:56:54.643694 132915091408704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 11:56:55.299752 132915091408704 base_pytree_checkpoint_handler.py:153] [process=4][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.657440s
I0421 11:56:55.301318 132915091408704 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/blocking_gbytes_per_sec: 292.855 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 913 milliseconds) (per-host)
I0421 11:56:55.301381 132915091408704 base_pytree_checkpoint_handler.py:732] [process=4][thread=MainThread] Initiated Pytree async_save. Time taken: 0.913734s (batch_requests_ready=0.251172s, total_serialization_initiated=0.662453s, others=0.000110s)
I0421 11:56:55.302214 132915091408704 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0421 11:56:55.302273 132915091408704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 11:56:55.310679 132915091408704 base_pytree_checkpoint_handler.py:153] [process=4][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.009209s
I0421 11:56:55.310785 132915091408704 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/blocking_gbytes_per_sec: 581.010 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 921 milliseconds) (per-host)
I0421 11:56:55.310829 132915091408704 base_pytree_checkpoint_handler.py:732] [process=4][thread=MainThread] Initiated Pytree async_save. Time taken: 0.921089s (batch_requests_ready=0.910184s, total_serialization_initiated=0.010838s, others=0.000067s)
I0421 11:56:55.310936 132915091408704 composite_checkpoint_handler.py:715] [process=4][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.927230s (all_items=0.000021s, per_item={'model_params': '0.00001740', 'optimizer_state': '0.00000381'}, temp_paths=0.927209)
I0421 11:56:55.311954 132768021702400 async_checkpointer.py:79] [process=4][thread=async_save] Background save thread started.
I0421 11:56:55.312121 132915091408704 async_checkpointer.py:561] Finished blocking save. Time taken: 1.619000s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1.
I0421 11:56:55.661831 132915091408704 checkpoint_manager.py:1549] [process=4][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize
I0421 11:56:55.662262 132775508154112 async_checkpointer.py:265] [process=4][thread=save_finalize] Waiting for background save thread=async_save.
I0421 11:56:55.662404 132915091408704 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776772613.6911833, 'wait_for_prev_duration_secs': 0.000133514404296875, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776772613.693089, 'checkpointer_blocking_duration_secs': 1.6191368103027344, 'get_old_steps_start_time': 1776772615.3122494, 'get_old_steps_duration_secs': 8.320808410644531e-05, 'checkpoint_manager_blocking_start_time': 1776772613.5898967, 'checkpoint_manager_blocking_duration_secs': 2.072472333908081}
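The step-1 save above follows Orbax's async pattern: a short blocking phase on the main thread (the device-to-host transfer, ~1.6 s of `checkpointer_blocking_duration_secs`), then a background `async_save` thread finishes the slow commit while training continues, with `wait_until_finished` joining it later. A simplified, library-free sketch of that blocking-then-background structure (the real Orbax machinery handles sharded arrays, GCS writes, and multi-host barriers):

```python
import threading


def async_save(arrays, commit):
    """Blocking phase: snapshot data on the caller's thread, then hand the
    slow commit off to a background thread and return immediately."""
    snapshot = [a.copy() for a in arrays]  # stands in for the D2H transfer

    done = threading.Event()

    def background():
        commit(snapshot)  # stands in for the storage write / finalize
        done.set()

    threading.Thread(target=background, name="async_save").start()
    return done  # caller blocks on done.wait() only when it must


saved = []
done = async_save([[1, 2, 3]], lambda snap: saved.extend(snap))
# ... training steps proceed here while the save commits ...
done.wait()  # analogous to CheckpointManager.wait_until_finished()
assert saved == [[1, 2, 3]]
```

This is why the log interleaves train-step lines with checkpoint commit lines: only the snapshot is on the critical path.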
I0421 11:56:58.872431 132915091408704 peft_trainer.py:474] Train step 1 training loss: 15.963623  - training perplexity: 8568669.000000
I0421 11:56:58.892655 132915091408704 peft_trainer.py:474] Train step 2 training loss: 15.943937  - training perplexity: 8401639.000000
I0421 11:56:58.917175 132915091408704 peft_trainer.py:474] Train step 3 training loss: 15.973638  - training perplexity: 8654912.000000
I0421 11:56:58.936390 132915091408704 peft_trainer.py:474] Train step 4 training loss: 15.952717  - training perplexity: 8475726.000000
I0421 11:56:58.940882 132915091408704 peft_trainer.py:733] Train loop finished in: 17.7387 seconds
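The loss/perplexity pairs logged by `peft_trainer` are consistent with perplexity = exp(loss); a quick check against the step-1 values above:

```python
import math

# Step 1 logged: training loss 15.963623, training perplexity 8568669.0
loss = 15.963623
perplexity = math.exp(loss)

# Agrees with the logged value to well within float32 rounding.
assert abs(perplexity - 8568669.0) / 8568669.0 < 1e-3
```

The same relation holds for steps 2 through 4, so the near-constant loss across this 4-step smoke run maps to near-constant perplexity, as expected for an untrained 1-layer model.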
I0421 11:56:58.941313 132915091408704 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/...
I0421 11:57:00.453261 132816505706240 array_metadata_store.py:203] [process=4][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_4
I0421 11:57:01.273452 132765869655808 array_metadata_store.py:203] [process=4][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_4
I0421 11:57:01.274703 132768021702400 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/gbytes_per_sec: 38.851 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 6 seconds) (per-host)
I0421 11:57:01.274857 132768021702400 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/gbytes_per_sec: 77.722 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 6 seconds) (per-host)
I0421 11:57:01.274896 132768021702400 async_checkpointer.py:90] [process=4][thread=async_save] 4 Handler Commit operations completed. Time taken: 5.962833s.
I0421 11:57:10.799210 132915091408704 checkpoint_manager.py:1994] [process=4][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0421 11:57:12.398907 132768021702400 async_checkpointer.py:144] [process=4][thread=async_save] Background save thread done. Time taken: 17.086827s.
I0421 11:57:12.399219 132775508154112 async_checkpointer.py:273] [process=4][thread=save_finalize] Done with waiting for background save thread=async_save.
I0421 11:57:12.399344 132775508154112 async_checkpointer.py:283] [process=4][thread=save_finalize] No errors found in background save thread=async_save.
I0421 11:57:12.399390 132775508154112 checkpoint_manager.py:2103] [process=4][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts...
I0421 11:57:12.401290 132775508154112 checkpoint_manager.py:2112] [process=4][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts.
I0421 11:57:12.401472 132915091408704 checkpoint_manager.py:2006] [process=4][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1.
W0421 11:57:12.401615 132915091408704 checkpoint_manager.py:1441] Waiting for previous save to complete took 1.602417 seconds. If this number is high, consider checkpointing less frequently.
I0421 11:57:12.403207 132915091408704 checkpoint_manager.py:1501] [process=4] Saving checkpoint at step 5
I0421 11:57:12.406715 132915091408704 async_checkpointer.py:452] [process=4] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5.
I0421 11:57:12.947975 132915091408704 jax_array_handlers.py:347] Scheduling D2H of 22 prioritized jax.Array.
I0421 11:57:12.948071 132915091408704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 11:57:13.605412 132915091408704 base_pytree_checkpoint_handler.py:153] [process=4][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.658593s
I0421 11:57:13.607225 132915091408704 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/blocking_gbytes_per_sec: 293.030 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 913 milliseconds) (per-host)
I0421 11:57:13.607290 132915091408704 base_pytree_checkpoint_handler.py:732] [process=4][thread=MainThread] Initiated Pytree async_save. Time taken: 0.913189s (batch_requests_ready=0.250994s, total_serialization_initiated=0.662092s, others=0.000103s)
I0421 11:57:13.608089 132915091408704 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0421 11:57:13.608146 132915091408704 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 11:57:13.618217 132915091408704 base_pytree_checkpoint_handler.py:153] [process=4][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.010829s
I0421 11:57:13.618336 132915091408704 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/blocking_gbytes_per_sec: 579.909 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 922 milliseconds) (per-host)
I0421 11:57:13.618380 132915091408704 base_pytree_checkpoint_handler.py:732] [process=4][thread=MainThread] Initiated Pytree async_save. Time taken: 0.922845s (batch_requests_ready=0.910047s, total_serialization_initiated=0.012724s, others=0.000074s)
I0421 11:57:13.618496 132915091408704 composite_checkpoint_handler.py:715] [process=4][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.928512s (all_items=0.000012s, per_item={'model_params': '0.00000978', 'optimizer_state': '0.00000238'}, temp_paths=0.928500)
I0421 11:57:13.619491 132766406526720 async_checkpointer.py:79] [process=4][thread=async_save] Background save thread started.
I0421 11:57:13.619667 132915091408704 async_checkpointer.py:561] Finished blocking save. Time taken: 1.216378s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5.
I0421 11:57:13.642632 132915091408704 checkpoint_manager.py:1549] [process=4][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize
I0421 11:57:13.642933 132768021702400 async_checkpointer.py:265] [process=4][thread=save_finalize] Waiting for background save thread=async_save.
I0421 11:57:13.643099 132915091408704 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776772630.79917, 'wait_for_prev_duration_secs': 1.602417230606079, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776772632.403247, 'checkpointer_blocking_duration_secs': 1.2165255546569824, 'get_old_steps_start_time': 1776772633.6197963, 'get_old_steps_duration_secs': 7.915496826171875e-05, 'checkpoint_manager_blocking_start_time': 1776772618.9431396, 'checkpoint_manager_blocking_duration_secs': 14.699926376342773}
I0421 11:57:13.643270 132915091408704 checkpoint_manager.py:1994] [process=4][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0421 11:57:18.354816 132764770735872 array_metadata_store.py:203] [process=4][thread=array_type_handler] Wrote 22 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_4
I0421 11:57:18.355936 132766406526720 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/gbytes_per_sec: 47.258 KiB/s (total gbytes: 267.6 KiB) (time elapsed: 5 seconds) (per-host)
I0421 11:57:18.364829 132816505706240 array_metadata_store.py:203] [process=4][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_4
I0421 11:57:18.365839 132766406526720 base_pytree_checkpoint_handler.py:128] [process=4] /jax/checkpoint/write/gbytes_per_sec: 94.374 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0421 11:57:18.365887 132766406526720 async_checkpointer.py:90] [process=4][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.746280s.
I0421 11:57:29.329934 132766406526720 async_checkpointer.py:144] [process=4][thread=async_save] Background save thread done. Time taken: 15.710311s.
I0421 11:57:29.330250 132768021702400 async_checkpointer.py:273] [process=4][thread=save_finalize] Done with waiting for background save thread=async_save.
I0421 11:57:29.330373 132768021702400 async_checkpointer.py:283] [process=4][thread=save_finalize] No errors found in background save thread=async_save.
I0421 11:57:29.330420 132768021702400 checkpoint_manager.py:2103] [process=4][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts...
I0421 11:57:29.332122 132768021702400 checkpoint_manager.py:2112] [process=4][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts.
I0421 11:57:29.332299 132915091408704 checkpoint_manager.py:2006] [process=4][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5.
I0421 11:57:29.332416 132915091408704 train_distill.py:724] Final checkpoint saved.
I0421 11:57:29.334597 132915091408704 peft_trainer.py:474] Train step 5 training loss: 15.949528  - training perplexity: 8448739.000000
I0421 11:57:29.335007 132915091408704 checkpoint_manager.py:1983] [process=4][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 11:57:29.335079 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef0bec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c71ef9dfa0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c71ef9df40>, _write_futures=[])
I0421 11:57:29.335127 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef0bec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c71ef9dfa0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c71ef9df40>, _write_futures=[])
I0421 11:57:29.335153 132915091408704 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132915091408704 count=1 at 0x78c71ef0bec0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78c71ef9dfa0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78c71ef9df40>, _write_futures=[])
I0421 11:57:29.335196 132915091408704 train_distill.py:734] Distillation Complete.
I0421 11:57:29.449062 132774467983104 grain_pool.py:547] Shutting down multiprocessing system.
I0421 11:57:31.377781 132774467983104 grain_pool.py:542] Grain pool is exiting.
I0421 11:57:31.377887 132774467983104 grain_pool.py:547] Shutting down multiprocessing system.
I0421 11:57:31.377947 132774467983104 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Tue Apr 21 11:57:40 UTC 2026
EXIT_CODE=0
NNX  ·  e27fc1e97  ·  feat_nnx_post_train_fixes_20260421_114106  ·  full log
XPK Start: Tue Apr 21 12:06:23 UTC 2026
2026-04-21 12:06:40.712581: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0421 12:06:44.317013 132528837793600 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-21 12:06:53,356:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0421 12:06:53.356404 132528837793600 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-21 12:06:53,358:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-clkfo-slice-job-0-0.mt-07-distill-smoke-clkfo:8482
I0421 12:06:53.358831 132528837793600 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-clkfo-slice-job-0-0.mt-07-distill-smoke-clkfo:8482
I0421 12:06:54.419385 132528837793600 max_utils.py:284] Jax distributed system initialized!
I0421 12:07:00.568473 132528837793600 max_utils.py:244] Jax distributed system is already initialized.
I0421 12:07:01.042128 132528837793600 max_utils.py:244] Jax distributed system is already initialized.
I0421 12:07:01.043320 132528837793600 pyconfig.py:432] Config param abort_on_inf_loss: True
I0421 12:07:01.043370 132528837793600 pyconfig.py:432] Config param abort_on_nan_loss: True
I0421 12:07:01.043398 132528837793600 pyconfig.py:432] Config param act_quantization_calibration_method: absmax
I0421 12:07:01.043420 132528837793600 pyconfig.py:432] Config param activation_dropout_for_audio: 0.0
I0421 12:07:01.043441 132528837793600 pyconfig.py:432] Config param activation_function_for_audio: gelu
I0421 12:07:01.043459 132528837793600 pyconfig.py:432] Config param activations_in_float32: False
I0421 12:07:01.043479 132528837793600 pyconfig.py:432] Config param adam_b1: 0.9
I0421 12:07:01.043497 132528837793600 pyconfig.py:432] Config param adam_b2: 0.95
I0421 12:07:01.043514 132528837793600 pyconfig.py:432] Config param adam_eps: 1e-08
I0421 12:07:01.043537 132528837793600 pyconfig.py:432] Config param adam_eps_root: 0.0
I0421 12:07:01.043555 132528837793600 pyconfig.py:432] Config param adam_weight_decay: 0.1
I0421 12:07:01.043572 132528837793600 pyconfig.py:432] Config param adamw_mask: []
I0421 12:07:01.043589 132528837793600 pyconfig.py:432] Config param add_bos: True
I0421 12:07:01.043606 132528837793600 pyconfig.py:432] Config param add_eos: True
I0421 12:07:01.043623 132528837793600 pyconfig.py:432] Config param allow_split_physical_axes: False
I0421 12:07:01.043640 132528837793600 pyconfig.py:432] Config param ar_cache_axis_order: 1,2,0,3
I0421 12:07:01.043657 132528837793600 pyconfig.py:432] Config param async_checkpointing: True
I0421 12:07:01.043673 132528837793600 pyconfig.py:432] Config param async_scheduling: False
I0421 12:07:01.043689 132528837793600 pyconfig.py:432] Config param attention: dot_product
I0421 12:07:01.043706 132528837793600 pyconfig.py:432] Config param attention_bias: False
I0421 12:07:01.043721 132528837793600 pyconfig.py:432] Config param attention_dropout_for_audio: 0.0
I0421 12:07:01.043752 132528837793600 pyconfig.py:432] Config param attention_out: RematLocation.REMAT
I0421 12:07:01.043773 132528837793600 pyconfig.py:432] Config param attention_output_dim: -1
I0421 12:07:01.043794 132528837793600 pyconfig.py:432] Config param attention_sink: False
I0421 12:07:01.043812 132528837793600 pyconfig.py:432] Config param attention_type: global
I0421 12:07:01.043829 132528837793600 pyconfig.py:432] Config param attn_logits_soft_cap: None
I0421 12:07:01.043846 132528837793600 pyconfig.py:432] Config param audio_path: 
I0421 12:07:01.043861 132528837793600 pyconfig.py:432] Config param audio_placeholder: <|audio|>
I0421 12:07:01.043877 132528837793600 pyconfig.py:432] Config param autoregressive_decode_assert: 
I0421 12:07:01.043895 132528837793600 pyconfig.py:432] Config param base_config: base.yml
I0421 12:07:01.043911 132528837793600 pyconfig.py:432] Config param base_emb_dim: 16
I0421 12:07:01.043928 132528837793600 pyconfig.py:432] Config param base_mlp_dim: 64
I0421 12:07:01.043944 132528837793600 pyconfig.py:432] Config param base_moe_mlp_dim: -1
I0421 12:07:01.043960 132528837793600 pyconfig.py:432] Config param base_num_decoder_layers: 1
I0421 12:07:01.043975 132528837793600 pyconfig.py:432] Config param base_num_kv_heads: 2
I0421 12:07:01.043991 132528837793600 pyconfig.py:432] Config param base_num_query_heads: 2
I0421 12:07:01.044008 132528837793600 pyconfig.py:432] Config param base_output_directory: 
I0421 12:07:01.044023 132528837793600 pyconfig.py:432] Config param batch_size: 1
I0421 12:07:01.044039 132528837793600 pyconfig.py:432] Config param batch_split_factor: 1
I0421 12:07:01.044055 132528837793600 pyconfig.py:432] Config param beta_fast: 32
I0421 12:07:01.044071 132528837793600 pyconfig.py:432] Config param beta_slow: 1
I0421 12:07:01.044087 132528837793600 pyconfig.py:432] Config param bwd_quantization_calibration_method: absmax
I0421 12:07:01.044103 132528837793600 pyconfig.py:432] Config param capacity_factor: -1.0
I0421 12:07:01.044120 132528837793600 pyconfig.py:432] Config param cast_logits_to_fp32: True
I0421 12:07:01.044136 132528837793600 pyconfig.py:432] Config param chat_template: 
I0421 12:07:01.044152 132528837793600 pyconfig.py:432] Config param chat_template_path: 
I0421 12:07:01.044170 132528837793600 pyconfig.py:432] Config param checkpoint_conversion_fn: None
I0421 12:07:01.044187 132528837793600 pyconfig.py:432] Config param checkpoint_dir: None
I0421 12:07:01.044205 132528837793600 pyconfig.py:432] Config param checkpoint_is_quantized: False
I0421 12:07:01.044222 132528837793600 pyconfig.py:432] Config param checkpoint_period: 2000
I0421 12:07:01.044238 132528837793600 pyconfig.py:432] Config param checkpoint_storage_concurrent_gb: 96
I0421 12:07:01.044255 132528837793600 pyconfig.py:432] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0421 12:07:01.044272 132528837793600 pyconfig.py:432] Config param checkpoint_storage_use_ocdbt: True
I0421 12:07:01.044289 132528837793600 pyconfig.py:432] Config param checkpoint_storage_use_zarr3: True
I0421 12:07:01.044305 132528837793600 pyconfig.py:432] Config param checkpoint_todelete_full_path: None
I0421 12:07:01.044320 132528837793600 pyconfig.py:432] Config param checkpoint_todelete_subdir: None
I0421 12:07:01.044336 132528837793600 pyconfig.py:432] Config param chips_per_vm: 4
I0421 12:07:01.044352 132528837793600 pyconfig.py:432] Config param chunk_attn_window_size: 0
I0421 12:07:01.044367 132528837793600 pyconfig.py:432] Config param collect_stack_trace: False
I0421 12:07:01.044383 132528837793600 pyconfig.py:432] Config param colocated_python_checkpointing: False
I0421 12:07:01.044400 132528837793600 pyconfig.py:432] Config param colocated_python_data_input: False
I0421 12:07:01.044418 132528837793600 pyconfig.py:432] Config param compile_topology: 
I0421 12:07:01.044434 132528837793600 pyconfig.py:432] Config param compile_topology_num_slices: -1
I0421 12:07:01.044450 132528837793600 pyconfig.py:432] Config param compile_xla_flags: 
I0421 12:07:01.044465 132528837793600 pyconfig.py:432] Config param compiled_trainstep_file: 
I0421 12:07:01.044480 132528837793600 pyconfig.py:432] Config param compute_axis_order: 0,1,2,3
I0421 12:07:01.044496 132528837793600 pyconfig.py:432] Config param constant_bound_config: []
I0421 12:07:01.044512 132528837793600 pyconfig.py:432] Config param context: RematLocation.REMAT
I0421 12:07:01.044529 132528837793600 pyconfig.py:432] Config param context_parallel_load_balance: True
I0421 12:07:01.044544 132528837793600 pyconfig.py:432] Config param context_parallel_size: 1
I0421 12:07:01.044559 132528837793600 pyconfig.py:432] Config param context_parallel_strategy: all_gather
I0421 12:07:01.044575 132528837793600 pyconfig.py:432] Config param context_sharding: context
I0421 12:07:01.044598 132528837793600 pyconfig.py:432] Config param conv_chunksize_for_audio: 500
I0421 12:07:01.044614 132528837793600 pyconfig.py:432] Config param conv_stride_for_vit: 14
I0421 12:07:01.044630 132528837793600 pyconfig.py:432] Config param cost_estimate_flops_bwd: -1
I0421 12:07:01.044646 132528837793600 pyconfig.py:432] Config param cost_estimate_flops_fwd: -1
I0421 12:07:01.044661 132528837793600 pyconfig.py:432] Config param custom_mesh: 
I0421 12:07:01.044677 132528837793600 pyconfig.py:432] Config param custom_mesh_and_rule: 
I0421 12:07:01.044692 132528837793600 pyconfig.py:432] Config param d_model_for_audio: 256
I0421 12:07:01.044708 132528837793600 pyconfig.py:432] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0421 12:07:01.044728 132528837793600 pyconfig.py:432] Config param data_shuffle_seed: 0
I0421 12:07:01.044753 132528837793600 pyconfig.py:432] Config param dataset_name: c4/en:3.0.1
I0421 12:07:01.044769 132528837793600 pyconfig.py:432] Config param dataset_path: 
I0421 12:07:01.044790 132528837793600 pyconfig.py:432] Config param dataset_type: DatasetType.HF
I0421 12:07:01.044807 132528837793600 pyconfig.py:432] Config param dcn_autoregressive_parallelism: 1
I0421 12:07:01.044823 132528837793600 pyconfig.py:432] Config param dcn_context_autoregressive_parallelism: 1
I0421 12:07:01.044839 132528837793600 pyconfig.py:432] Config param dcn_context_parallelism: 1
I0421 12:07:01.044854 132528837793600 pyconfig.py:432] Config param dcn_data_parallelism: -1
I0421 12:07:01.044870 132528837793600 pyconfig.py:432] Config param dcn_diloco_parallelism: 1
I0421 12:07:01.044886 132528837793600 pyconfig.py:432] Config param dcn_expert_parallelism: 1
I0421 12:07:01.044900 132528837793600 pyconfig.py:432] Config param dcn_fsdp_parallelism: 1
I0421 12:07:01.044915 132528837793600 pyconfig.py:432] Config param dcn_fsdp_transpose_parallelism: 1
I0421 12:07:01.044932 132528837793600 pyconfig.py:432] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 12:07:01.044949 132528837793600 pyconfig.py:432] Config param dcn_pipeline_parallelism: 1
I0421 12:07:01.044965 132528837793600 pyconfig.py:432] Config param dcn_sequence_parallelism: 1
I0421 12:07:01.044981 132528837793600 pyconfig.py:432] Config param dcn_tensor_parallelism: 1
I0421 12:07:01.044995 132528837793600 pyconfig.py:432] Config param dcn_tensor_sequence_parallelism: 1
I0421 12:07:01.045011 132528837793600 pyconfig.py:432] Config param dcn_tensor_transpose_parallelism: 1
I0421 12:07:01.045026 132528837793600 pyconfig.py:432] Config param debug: {'rl': False}
I0421 12:07:01.045043 132528837793600 pyconfig.py:432] Config param debug_sharding: False
I0421 12:07:01.045060 132528837793600 pyconfig.py:432] Config param decode_sampling_nucleus_p: -1
I0421 12:07:01.045075 132528837793600 pyconfig.py:432] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0421 12:07:01.045092 132528837793600 pyconfig.py:432] Config param decode_sampling_temperature: 1.0
I0421 12:07:01.045108 132528837793600 pyconfig.py:432] Config param decode_sampling_top_k: 0
I0421 12:07:01.045123 132528837793600 pyconfig.py:432] Config param decoder_block: DecoderBlockType.GPT3
I0421 12:07:01.045140 132528837793600 pyconfig.py:432] Config param decoder_layer_input: RematLocation.DEVICE
I0421 12:07:01.045157 132528837793600 pyconfig.py:432] Config param deepstack_visual_indexes_for_vit: []
I0421 12:07:01.045173 132528837793600 pyconfig.py:432] Config param degenerate_group_masking: True
I0421 12:07:01.045187 132528837793600 pyconfig.py:432] Config param dense_init_scale: 1.0
I0421 12:07:01.045203 132528837793600 pyconfig.py:432] Config param diloco_outer_lr: 0.3
I0421 12:07:01.045220 132528837793600 pyconfig.py:432] Config param diloco_outer_momentum: 0.9
I0421 12:07:01.045235 132528837793600 pyconfig.py:432] Config param diloco_sync_period: 36
I0421 12:07:01.045250 132528837793600 pyconfig.py:432] Config param distill_alpha: 0.5
I0421 12:07:01.045266 132528837793600 pyconfig.py:432] Config param distill_alpha_end: None
I0421 12:07:01.045282 132528837793600 pyconfig.py:432] Config param distill_alpha_schedule: constant
I0421 12:07:01.045299 132528837793600 pyconfig.py:432] Config param distill_beta: 0.0
I0421 12:07:01.045314 132528837793600 pyconfig.py:432] Config param distill_beta_end: None
I0421 12:07:01.045330 132528837793600 pyconfig.py:432] Config param distill_beta_schedule: constant
I0421 12:07:01.045346 132528837793600 pyconfig.py:432] Config param distill_feature_loss_type: cosine
I0421 12:07:01.045361 132528837793600 pyconfig.py:432] Config param distill_layer_indices: None
I0421 12:07:01.045376 132528837793600 pyconfig.py:432] Config param distill_temperature: 1.0
I0421 12:07:01.045392 132528837793600 pyconfig.py:432] Config param distill_temperature_end: None
I0421 12:07:01.045408 132528837793600 pyconfig.py:432] Config param distill_temperature_schedule: constant
I0421 12:07:01.045424 132528837793600 pyconfig.py:432] Config param downsample_hidden_size_for_audio: 256
I0421 12:07:01.045440 132528837793600 pyconfig.py:432] Config param dpo_beta: 0.1
I0421 12:07:01.045456 132528837793600 pyconfig.py:432] Config param dpo_label_smoothing: 0.0
I0421 12:07:01.045472 132528837793600 pyconfig.py:432] Config param dq_reduction_steps: 0
I0421 12:07:01.045488 132528837793600 pyconfig.py:432] Config param dropout_rate: 0.0
I0421 12:07:01.045504 132528837793600 pyconfig.py:432] Config param dtype: bfloat16
I0421 12:07:01.045535 132528837793600 pyconfig.py:432] Config param dtype_mm: float32
I0421 12:07:01.045551 132528837793600 pyconfig.py:432] Config param dump_hlo: False
I0421 12:07:01.045567 132528837793600 pyconfig.py:432] Config param dump_hlo_delete_local_after: True
I0421 12:07:01.045582 132528837793600 pyconfig.py:432] Config param dump_hlo_gcs_dir: gpt3-52k_2026-04-21-12-07/xla_dump
I0421 12:07:01.045598 132528837793600 pyconfig.py:432] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0421 12:07:01.045613 132528837793600 pyconfig.py:432] Config param dump_hlo_local_module_name: jit_train_step
I0421 12:07:01.045628 132528837793600 pyconfig.py:432] Config param dump_hlo_module_name: jit_train_step
I0421 12:07:01.045644 132528837793600 pyconfig.py:432] Config param dump_hlo_upload_all: False
I0421 12:07:01.045660 132528837793600 pyconfig.py:432] Config param dump_hlo_xla_flags: 
I0421 12:07:01.045675 132528837793600 pyconfig.py:432] Config param dump_jaxpr: False
I0421 12:07:01.045691 132528837793600 pyconfig.py:432] Config param dump_jaxpr_delete_local_after: True
I0421 12:07:01.045707 132528837793600 pyconfig.py:432] Config param dump_jaxpr_gcs_dir: gpt3-52k_2026-04-21-12-07/jaxpr_dump
I0421 12:07:01.045723 132528837793600 pyconfig.py:432] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0421 12:07:01.045749 132528837793600 pyconfig.py:432] Config param dump_step: -1
I0421 12:07:01.045765 132528837793600 pyconfig.py:432] Config param elastic_enabled: False
I0421 12:07:01.045785 132528837793600 pyconfig.py:432] Config param elastic_max_retries: 10
I0421 12:07:01.045801 132528837793600 pyconfig.py:432] Config param elastic_timeout_seconds: 300
I0421 12:07:01.045817 132528837793600 pyconfig.py:432] Config param emb_dim: 16
I0421 12:07:01.045832 132528837793600 pyconfig.py:432] Config param enable_autocheckpoint: False
I0421 12:07:01.045848 132528837793600 pyconfig.py:432] Config param enable_checkpoint_cloud_logger: False
I0421 12:07:01.045863 132528837793600 pyconfig.py:432] Config param enable_checkpointing: True
I0421 12:07:01.045879 132528837793600 pyconfig.py:432] Config param enable_continuous_checkpointing: False
I0421 12:07:01.045895 132528837793600 pyconfig.py:432] Config param enable_data_shuffling: True
I0421 12:07:01.045910 132528837793600 pyconfig.py:432] Config param enable_diloco: False
I0421 12:07:01.045926 132528837793600 pyconfig.py:432] Config param enable_dp_attention: False
I0421 12:07:01.045941 132528837793600 pyconfig.py:432] Config param enable_dropout: False
I0421 12:07:01.045956 132528837793600 pyconfig.py:432] Config param enable_emergency_checkpoint: False
I0421 12:07:01.045971 132528837793600 pyconfig.py:432] Config param enable_expert_parallel: False
I0421 12:07:01.045987 132528837793600 pyconfig.py:432] Config param enable_gcp_goodput_metrics: True
I0421 12:07:01.046002 132528837793600 pyconfig.py:432] Config param enable_gcp_step_deviation_metrics: True
I0421 12:07:01.046018 132528837793600 pyconfig.py:432] Config param enable_goodput_recording: False
I0421 12:07:01.046033 132528837793600 pyconfig.py:432] Config param enable_jax_profiler: False
I0421 12:07:01.046047 132528837793600 pyconfig.py:432] Config param enable_llm_inference_pool: False
I0421 12:07:01.046063 132528837793600 pyconfig.py:432] Config param enable_model_warmup: False
I0421 12:07:01.046077 132528837793600 pyconfig.py:432] Config param enable_multi_tier_checkpointing: False
I0421 12:07:01.046092 132528837793600 pyconfig.py:432] Config param enable_nnx: False
I0421 12:07:01.046108 132528837793600 pyconfig.py:432] Config param enable_orbax_v1: False
I0421 12:07:01.046124 132528837793600 pyconfig.py:432] Config param enable_padding_causal_mask: True
I0421 12:07:01.046140 132528837793600 pyconfig.py:432] Config param enable_pathways_goodput: False
I0421 12:07:01.046154 132528837793600 pyconfig.py:432] Config param enable_prefix_caching: False
I0421 12:07:01.046170 132528837793600 pyconfig.py:432] Config param enable_rampup_batch_size: False
I0421 12:07:01.046185 132528837793600 pyconfig.py:432] Config param enable_single_controller: False
I0421 12:07:01.046201 132528837793600 pyconfig.py:432] Config param enable_single_replica_ckpt_restoring: False
I0421 12:07:01.046217 132528837793600 pyconfig.py:432] Config param enable_tensorboard: True
I0421 12:07:01.046233 132528837793600 pyconfig.py:432] Config param enable_tunix_perf_metrics: False
I0421 12:07:01.046248 132528837793600 pyconfig.py:432] Config param encoder_attention_heads_for_audio: 4
I0421 12:07:01.046264 132528837793600 pyconfig.py:432] Config param encoder_ffn_dim_for_audio: 512
I0421 12:07:01.046280 132528837793600 pyconfig.py:432] Config param encoder_layers_for_audio: 2
I0421 12:07:01.046297 132528837793600 pyconfig.py:432] Config param engram: RematLocation.REMAT
I0421 12:07:01.046313 132528837793600 pyconfig.py:432] Config param engram_head_dim: 1280
I0421 12:07:01.046328 132528837793600 pyconfig.py:432] Config param engram_kernel_size: 4
I0421 12:07:01.046344 132528837793600 pyconfig.py:432] Config param engram_layers: []
I0421 12:07:01.046360 132528837793600 pyconfig.py:432] Config param engram_max_ngram_size: 3
I0421 12:07:01.046375 132528837793600 pyconfig.py:432] Config param engram_num_heads: 8
I0421 12:07:01.046391 132528837793600 pyconfig.py:432] Config param engram_seed: 0
I0421 12:07:01.046407 132528837793600 pyconfig.py:432] Config param engram_vocab_bases: []
I0421 12:07:01.046422 132528837793600 pyconfig.py:432] Config param epsilon_high: None
I0421 12:07:01.046438 132528837793600 pyconfig.py:432] Config param eval_corr_lst: False
I0421 12:07:01.046453 132528837793600 pyconfig.py:432] Config param eval_data_columns: ['text']
I0421 12:07:01.046469 132528837793600 pyconfig.py:432] Config param eval_dataset_name: c4/en:3.0.1
I0421 12:07:01.046485 132528837793600 pyconfig.py:432] Config param eval_image_column: image
I0421 12:07:01.046501 132528837793600 pyconfig.py:432] Config param eval_interval: -1
I0421 12:07:01.046515 132528837793600 pyconfig.py:432] Config param eval_make_lst: False
I0421 12:07:01.046531 132528837793600 pyconfig.py:432] Config param eval_per_device_batch_size: 2
I0421 12:07:01.046546 132528837793600 pyconfig.py:432] Config param eval_sampling_strategy: greedy
I0421 12:07:01.046562 132528837793600 pyconfig.py:432] Config param eval_split: validation
I0421 12:07:01.046577 132528837793600 pyconfig.py:432] Config param eval_steps: -1
I0421 12:07:01.046592 132528837793600 pyconfig.py:432] Config param expansion_factor_real_data: -1.0
I0421 12:07:01.046607 132528837793600 pyconfig.py:432] Config param final_logits_soft_cap: None
I0421 12:07:01.046623 132528837793600 pyconfig.py:432] Config param first_num_dense_layers: 0
I0421 12:07:01.046638 132528837793600 pyconfig.py:432] Config param float32_gate_logits: False
I0421 12:07:01.046653 132528837793600 pyconfig.py:432] Config param float32_logits: False
I0421 12:07:01.046669 132528837793600 pyconfig.py:432] Config param float32_qk_product: False
I0421 12:07:01.046684 132528837793600 pyconfig.py:432] Config param float32_weight_sum: True
I0421 12:07:01.046700 132528837793600 pyconfig.py:432] Config param force_q_layout: False
I0421 12:07:01.046715 132528837793600 pyconfig.py:432] Config param force_unroll: False
I0421 12:07:01.046738 132528837793600 pyconfig.py:432] Config param freeze_audio_encoder_params: True
I0421 12:07:01.046754 132528837793600 pyconfig.py:432] Config param freeze_vision_encoder_params: True
I0421 12:07:01.046770 132528837793600 pyconfig.py:432] Config param fused_mlp: False
I0421 12:07:01.046789 132528837793600 pyconfig.py:432] Config param fused_qkv: True
I0421 12:07:01.046804 132528837793600 pyconfig.py:432] Config param gcs_metrics: False
I0421 12:07:01.046819 132528837793600 pyconfig.py:432] Config param gdn_chunk_size: 64
I0421 12:07:01.046836 132528837793600 pyconfig.py:432] Config param gdn_conv_kernel_dim: 4
I0421 12:07:01.046852 132528837793600 pyconfig.py:432] Config param gdn_key_head_dim: 128
I0421 12:07:01.046868 132528837793600 pyconfig.py:432] Config param gdn_num_key_heads: 16
I0421 12:07:01.046884 132528837793600 pyconfig.py:432] Config param gdn_num_value_heads: 32
I0421 12:07:01.046898 132528837793600 pyconfig.py:432] Config param gdn_value_head_dim: 128
I0421 12:07:01.046914 132528837793600 pyconfig.py:432] Config param generate_padding_batch_eval: False
I0421 12:07:01.046929 132528837793600 pyconfig.py:432] Config param generate_padding_batch_train: False
I0421 12:07:01.046945 132528837793600 pyconfig.py:432] Config param generate_slice: v5e-16
I0421 12:07:01.046960 132528837793600 pyconfig.py:432] Config param generation_configs: {}
I0421 12:07:01.046977 132528837793600 pyconfig.py:432] Config param global_batch_size_to_eval_on: 64
I0421 12:07:01.046991 132528837793600 pyconfig.py:432] Config param global_batch_size_to_load: 512
I0421 12:07:01.047007 132528837793600 pyconfig.py:432] Config param global_batch_size_to_load_eval: 64
I0421 12:07:01.047023 132528837793600 pyconfig.py:432] Config param global_batch_size_to_load_increment: None
I0421 12:07:01.047037 132528837793600 pyconfig.py:432] Config param global_batch_size_to_load_start: None
I0421 12:07:01.047054 132528837793600 pyconfig.py:432] Config param global_batch_size_to_train_on: 512
I0421 12:07:01.047070 132528837793600 pyconfig.py:432] Config param global_head_dim: 0
I0421 12:07:01.047084 132528837793600 pyconfig.py:432] Config param global_num_kv_heads: 0
I0421 12:07:01.047100 132528837793600 pyconfig.py:432] Config param global_parameter_scale: 1
I0421 12:07:01.047117 132528837793600 pyconfig.py:432] Config param global_rampup_samples: 500
I0421 12:07:01.047133 132528837793600 pyconfig.py:432] Config param global_rope_max_timescale: -1
I0421 12:07:01.047148 132528837793600 pyconfig.py:432] Config param global_rope_proportion: 0.25
I0421 12:07:01.047166 132528837793600 pyconfig.py:432] Config param goodput_upload_interval_seconds: 30
I0421 12:07:01.047182 132528837793600 pyconfig.py:432] Config param grad_dtype: float32
I0421 12:07:01.047217 132528837793600 pyconfig.py:432] Config param gradient_accumulation_steps: 8
I0421 12:07:01.047234 132528837793600 pyconfig.py:432] Config param gradient_clipping_threshold: 1.0
I0421 12:07:01.047250 132528837793600 pyconfig.py:432] Config param grain_data_source_max_workers: 16
I0421 12:07:01.047266 132528837793600 pyconfig.py:432] Config param grain_eval_files: 
I0421 12:07:01.047282 132528837793600 pyconfig.py:432] Config param grain_file_type: arrayrecord
I0421 12:07:01.047299 132528837793600 pyconfig.py:432] Config param grain_num_threads: 16
I0421 12:07:01.047315 132528837793600 pyconfig.py:432] Config param grain_num_threads_eval: 16
I0421 12:07:01.047331 132528837793600 pyconfig.py:432] Config param grain_packing_type: first_fit
I0421 12:07:01.047348 132528837793600 pyconfig.py:432] Config param grain_per_worker_buffer_size: 1
I0421 12:07:01.047364 132528837793600 pyconfig.py:432] Config param grain_per_worker_buffer_size_eval: 1
I0421 12:07:01.047379 132528837793600 pyconfig.py:432] Config param grain_prefetch_buffer_size: 500
I0421 12:07:01.047396 132528837793600 pyconfig.py:432] Config param grain_prefetch_buffer_size_eval: 500
I0421 12:07:01.047412 132528837793600 pyconfig.py:432] Config param grain_ram_budget_mb: 1024
I0421 12:07:01.047428 132528837793600 pyconfig.py:432] Config param grain_shuffle_buffer_size: 100
I0421 12:07:01.047443 132528837793600 pyconfig.py:432] Config param grain_train_files: 
I0421 12:07:01.047459 132528837793600 pyconfig.py:432] Config param grain_train_mixture_config_path: 
I0421 12:07:01.047474 132528837793600 pyconfig.py:432] Config param grain_worker_count: 1
I0421 12:07:01.047489 132528837793600 pyconfig.py:432] Config param grain_worker_count_eval: 1
I0421 12:07:01.047505 132528837793600 pyconfig.py:432] Config param grpo_beta: 0.08
I0421 12:07:01.047522 132528837793600 pyconfig.py:432] Config param grpo_epsilon: 0.2
I0421 12:07:01.047538 132528837793600 pyconfig.py:432] Config param hardware: tpu
I0421 12:07:01.047554 132528837793600 pyconfig.py:432] Config param hbm_utilization_vllm: 0.72
I0421 12:07:01.047569 132528837793600 pyconfig.py:432] Config param head_dim: 8
I0421 12:07:01.047585 132528837793600 pyconfig.py:432] Config param heartbeat_reporting_interval_in_seconds: 5
I0421 12:07:01.047601 132528837793600 pyconfig.py:432] Config param hf_data_dir: None
I0421 12:07:01.047617 132528837793600 pyconfig.py:432] Config param hf_eval_files: None
I0421 12:07:01.047633 132528837793600 pyconfig.py:432] Config param hf_eval_split: None
I0421 12:07:01.047649 132528837793600 pyconfig.py:432] Config param hf_name: None
I0421 12:07:01.047664 132528837793600 pyconfig.py:432] Config param hf_path: OptimalScale/ClimbMix
I0421 12:07:01.047679 132528837793600 pyconfig.py:432] Config param hf_train_files: None
I0421 12:07:01.047695 132528837793600 pyconfig.py:432] Config param hidden_size_for_vit: 1408
I0421 12:07:01.047709 132528837793600 pyconfig.py:432] Config param hide_profiler_step_metric: False
I0421 12:07:01.047724 132528837793600 pyconfig.py:432] Config param ici_autoregressive_parallelism: 1
I0421 12:07:01.047748 132528837793600 pyconfig.py:432] Config param ici_context_autoregressive_parallelism: 1
I0421 12:07:01.047762 132528837793600 pyconfig.py:432] Config param ici_context_parallelism: 1
I0421 12:07:01.047782 132528837793600 pyconfig.py:432] Config param ici_data_parallelism: 1
I0421 12:07:01.047820 132528837793600 pyconfig.py:432] Config param ici_diloco_parallelism: 1
I0421 12:07:01.047834 132528837793600 pyconfig.py:432] Config param ici_expert_parallelism: 1
I0421 12:07:01.047850 132528837793600 pyconfig.py:432] Config param ici_fsdp_parallelism: -1
I0421 12:07:01.047865 132528837793600 pyconfig.py:432] Config param ici_fsdp_transpose_parallelism: 1
I0421 12:07:01.047880 132528837793600 pyconfig.py:432] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0421 12:07:01.047898 132528837793600 pyconfig.py:432] Config param ici_pipeline_parallelism: 1
I0421 12:07:01.047913 132528837793600 pyconfig.py:432] Config param ici_sequence_parallelism: 1
I0421 12:07:01.047929 132528837793600 pyconfig.py:432] Config param ici_tensor_parallelism: 1
I0421 12:07:01.047943 132528837793600 pyconfig.py:432] Config param ici_tensor_sequence_parallelism: 1
I0421 12:07:01.047958 132528837793600 pyconfig.py:432] Config param ici_tensor_transpose_parallelism: 1
I0421 12:07:01.047973 132528837793600 pyconfig.py:432] Config param image_path: 
I0421 12:07:01.047988 132528837793600 pyconfig.py:432] Config param image_placeholder: <|image|>
I0421 12:07:01.048004 132528837793600 pyconfig.py:432] Config param image_size_for_vit: 896
I0421 12:07:01.048019 132528837793600 pyconfig.py:432] Config param indexer_head_dim: 128
I0421 12:07:01.048035 132528837793600 pyconfig.py:432] Config param indexer_loss_scaling_factor: 0.0
I0421 12:07:01.048050 132528837793600 pyconfig.py:432] Config param indexer_n_heads: 64
I0421 12:07:01.048066 132528837793600 pyconfig.py:432] Config param indexer_sparse_training: False
I0421 12:07:01.048080 132528837793600 pyconfig.py:432] Config param indexer_topk: 2048
I0421 12:07:01.048096 132528837793600 pyconfig.py:432] Config param inference_benchmark_test: False
I0421 12:07:01.048110 132528837793600 pyconfig.py:432] Config param inference_metadata_file: 
I0421 12:07:01.048126 132528837793600 pyconfig.py:432] Config param inference_microbenchmark_log_file_path: 
I0421 12:07:01.048142 132528837793600 pyconfig.py:432] Config param inference_microbenchmark_loop_iters: 10
I0421 12:07:01.048157 132528837793600 pyconfig.py:432] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0421 12:07:01.048173 132528837793600 pyconfig.py:432] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0421 12:07:01.048189 132528837793600 pyconfig.py:432] Config param inference_microbenchmark_stages: prefill,generate
I0421 12:07:01.048204 132528837793600 pyconfig.py:432] Config param inference_server: MaxtextInterleavedServer
I0421 12:07:01.048219 132528837793600 pyconfig.py:432] Config param inhomogeneous_layer_cycle_interval: 1
I0421 12:07:01.048235 132528837793600 pyconfig.py:432] Config param init_weights_seed: 0
I0421 12:07:01.048251 132528837793600 pyconfig.py:432] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0421 12:07:01.048268 132528837793600 pyconfig.py:432] Config param interleave_moe_layer_step: 1
I0421 12:07:01.048283 132528837793600 pyconfig.py:432] Config param intermediate_size_for_vit: 5632
I0421 12:07:01.048299 132528837793600 pyconfig.py:432] Config param internal_compile: False
I0421 12:07:01.048314 132528837793600 pyconfig.py:432] Config param internal_compile_num_devices: -1
I0421 12:07:01.048330 132528837793600 pyconfig.py:432] Config param jax_cache_dir: ~/jax_cache
I0421 12:07:01.048344 132528837793600 pyconfig.py:432] Config param jax_debug_log_modules: 
I0421 12:07:01.048360 132528837793600 pyconfig.py:432] Config param jax_distributed_initialization_timeout: 300
I0421 12:07:01.048376 132528837793600 pyconfig.py:432] Config param jax_profiler_port: 9999
I0421 12:07:01.048391 132528837793600 pyconfig.py:432] Config param key_proj: RematLocation.REMAT
I0421 12:07:01.048408 132528837793600 pyconfig.py:432] Config param kv_cache_buffer: 256
I0421 12:07:01.048424 132528837793600 pyconfig.py:432] Config param kv_lora_rank: 512
I0421 12:07:01.048440 132528837793600 pyconfig.py:432] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0421 12:07:01.048459 132528837793600 pyconfig.py:432] Config param kv_quant_dtype: int8
I0421 12:07:01.048475 132528837793600 pyconfig.py:432] Config param kv_wa_proj: RematLocation.REMAT
I0421 12:07:01.048491 132528837793600 pyconfig.py:432] Config param learning_rate: 0.0002
I0421 12:07:01.048507 132528837793600 pyconfig.py:432] Config param learning_rate_final_fraction: 0.1
I0421 12:07:01.048522 132528837793600 pyconfig.py:432] Config param learning_rate_schedule_steps: 200000
I0421 12:07:01.048537 132528837793600 pyconfig.py:432] Config param load_balance_loss_weight: 0.0
I0421 12:07:01.048552 132528837793600 pyconfig.py:432] Config param load_checkpoint_only_once: False
I0421 12:07:01.048568 132528837793600 pyconfig.py:432] Config param load_from_prefill_dir: False
I0421 12:07:01.048583 132528837793600 pyconfig.py:432] Config param load_full_state_path: 
I0421 12:07:01.048598 132528837793600 pyconfig.py:432] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 12:07:01.048613 132528837793600 pyconfig.py:432] Config param local_checkpoint_directory: 
I0421 12:07:01.048628 132528837793600 pyconfig.py:432] Config param local_checkpoint_period: 0
I0421 12:07:01.048644 132528837793600 pyconfig.py:432] Config param local_rope_max_timescale: -1
I0421 12:07:01.048660 132528837793600 pyconfig.py:432] Config param local_rope_proportion: 1.0
I0421 12:07:01.048675 132528837793600 pyconfig.py:432] Config param log_config: True
I0421 12:07:01.048690 132528837793600 pyconfig.py:432] Config param log_period: 10
I0421 12:07:01.048705 132528837793600 pyconfig.py:432] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp'))
I0421 12:07:01.048795 132528837793600 pyconfig.py:432] Config param logits_dot_in_fp32: False
I0421 12:07:01.048822 132528837793600 pyconfig.py:432] Config param logits_via_embedding: True
I0421 12:07:01.048838 132528837793600 pyconfig.py:432] Config param lora_input_adapters_path: 
I0421 12:07:01.048854 132528837793600 pyconfig.py:432] Config param loss_algo: grpo
I0421 12:07:01.048870 132528837793600 pyconfig.py:432] Config param lr_schedule_type: LearningRateScheduleType.COSINE
I0421 12:07:01.048888 132528837793600 pyconfig.py:432] Config param managed_mldiagnostics: False
I0421 12:07:01.048904 132528837793600 pyconfig.py:432] Config param managed_mldiagnostics_dir: None
I0421 12:07:01.048920 132528837793600 pyconfig.py:432] Config param managed_mldiagnostics_run_group: 
I0421 12:07:01.048934 132528837793600 pyconfig.py:432] Config param matmul_precision: MatmulPrecision.DEFAULT
I0421 12:07:01.048952 132528837793600 pyconfig.py:432] Config param max_checkify: False
I0421 12:07:01.048968 132528837793600 pyconfig.py:432] Config param max_concurrency: 256
I0421 12:07:01.048984 132528837793600 pyconfig.py:432] Config param max_corpus_chars: 10000000
I0421 12:07:01.048999 132528837793600 pyconfig.py:432] Config param max_num_batched_tokens: None
I0421 12:07:01.049014 132528837793600 pyconfig.py:432] Config param max_num_checkpoints_to_keep: None
I0421 12:07:01.049029 132528837793600 pyconfig.py:432] Config param max_num_images_per_example: -1
I0421 12:07:01.049044 132528837793600 pyconfig.py:432] Config param max_num_seqs: None
I0421 12:07:01.049060 132528837793600 pyconfig.py:432] Config param max_position_embeddings: 163840
I0421 12:07:01.049076 132528837793600 pyconfig.py:432] Config param max_prefill_predict_length: 64
I0421 12:07:01.049092 132528837793600 pyconfig.py:432] Config param max_sample_len_for_audio: 10000
I0421 12:07:01.049107 132528837793600 pyconfig.py:432] Config param max_segments_per_seq: -1
I0421 12:07:01.049122 132528837793600 pyconfig.py:432] Config param max_source_positions_for_audio: 1500
I0421 12:07:01.049138 132528837793600 pyconfig.py:432] Config param max_target_length: 2048
I0421 12:07:01.049154 132528837793600 pyconfig.py:432] Config param max_timescale_for_audio: 10000.0
I0421 12:07:01.049170 132528837793600 pyconfig.py:432] Config param megablox: True
I0421 12:07:01.049184 132528837793600 pyconfig.py:432] Config param merge_gating_gmm: False
I0421 12:07:01.049199 132528837793600 pyconfig.py:432] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive']
I0421 12:07:01.049216 132528837793600 pyconfig.py:432] Config param metrics_dir: None
I0421 12:07:01.049232 132528837793600 pyconfig.py:432] Config param metrics_file: 
I0421 12:07:01.049248 132528837793600 pyconfig.py:432] Config param mhc_expansion_rate: 1
I0421 12:07:01.049264 132528837793600 pyconfig.py:432] Config param micro_batch_size_to_eval_on: 64
I0421 12:07:01.049280 132528837793600 pyconfig.py:432] Config param micro_batch_size_to_train_on: 64
I0421 12:07:01.049296 132528837793600 pyconfig.py:432] Config param mla_kv: RematLocation.REMAT
I0421 12:07:01.049313 132528837793600 pyconfig.py:432] Config param mla_naive_kvcache: True
I0421 12:07:01.049329 132528837793600 pyconfig.py:432] Config param mla_q: RematLocation.REMAT
I0421 12:07:01.049345 132528837793600 pyconfig.py:432] Config param mlp_activations: ['gelu']
I0421 12:07:01.049361 132528837793600 pyconfig.py:432] Config param mlp_activations_limit: -1.0
I0421 12:07:01.049377 132528837793600 pyconfig.py:432] Config param mlp_bias: False
I0421 12:07:01.049392 132528837793600 pyconfig.py:432] Config param mlp_dim: 64
I0421 12:07:01.049408 132528837793600 pyconfig.py:432] Config param mlpwi: RematLocation.REMAT
I0421 12:07:01.049424 132528837793600 pyconfig.py:432] Config param mlpwi_0: RematLocation.REMAT
I0421 12:07:01.049439 132528837793600 pyconfig.py:432] Config param mlpwi_1: RematLocation.REMAT
I0421 12:07:01.049455 132528837793600 pyconfig.py:432] Config param mlpwo: RematLocation.REMAT
I0421 12:07:01.049470 132528837793600 pyconfig.py:432] Config param moba: False
I0421 12:07:01.049484 132528837793600 pyconfig.py:432] Config param moba_chunk_size: 1024
I0421 12:07:01.049500 132528837793600 pyconfig.py:432] Config param moba_topk: 8
I0421 12:07:01.049515 132528837793600 pyconfig.py:432] Config param model_call_mode: 
I0421 12:07:01.049530 132528837793600 pyconfig.py:432] Config param model_name: gpt3-52k
I0421 12:07:01.049546 132528837793600 pyconfig.py:432] Config param moe_expert_input_dim: -1
I0421 12:07:01.049560 132528837793600 pyconfig.py:432] Config param moe_fsdp_use_two_stage_all_gather: False
I0421 12:07:01.049576 132528837793600 pyconfig.py:432] Config param moe_mlp_dim: -1
I0421 12:07:01.049590 132528837793600 pyconfig.py:432] Config param moe_mlpwi_0: RematLocation.REMAT
I0421 12:07:01.049606 132528837793600 pyconfig.py:432] Config param moe_mlpwi_1: RematLocation.REMAT
I0421 12:07:01.049621 132528837793600 pyconfig.py:432] Config param moe_mlpwo: RematLocation.REMAT
I0421 12:07:01.049637 132528837793600 pyconfig.py:432] Config param monitor_goodput: False
I0421 12:07:01.049651 132528837793600 pyconfig.py:432] Config param monitor_step_time_deviation: True
I0421 12:07:01.049667 132528837793600 pyconfig.py:432] Config param mrope_section: [24, 20, 20]
I0421 12:07:01.049682 132528837793600 pyconfig.py:432] Config param mscale: 1.0
I0421 12:07:01.049698 132528837793600 pyconfig.py:432] Config param mtc_data_parallelism: 0
I0421 12:07:01.049714 132528837793600 pyconfig.py:432] Config param mtp_eval_target_module: 0
I0421 12:07:01.049729 132528837793600 pyconfig.py:432] Config param mtp_loss_scaling_factor: 0.1
I0421 12:07:01.049757 132528837793600 pyconfig.py:432] Config param mtp_num_layers: 0
I0421 12:07:01.049771 132528837793600 pyconfig.py:432] Config param mu_dtype: float32
I0421 12:07:01.049799 132528837793600 pyconfig.py:432] Config param multi_sampling: False
I0421 12:07:01.049816 132528837793600 pyconfig.py:432] Config param multi_tier_checkpointing_backup_interval_minutes: 0
I0421 12:07:01.049830 132528837793600 pyconfig.py:432] Config param muon_beta: 0.95
I0421 12:07:01.049846 132528837793600 pyconfig.py:432] Config param muon_consistent_rms: None
I0421 12:07:01.049860 132528837793600 pyconfig.py:432] Config param muon_weight_decay: 0.0
I0421 12:07:01.049876 132528837793600 pyconfig.py:432] Config param n_routing_groups: -1
I0421 12:07:01.049890 132528837793600 pyconfig.py:432] Config param n_window_for_audio: 50
I0421 12:07:01.049906 132528837793600 pyconfig.py:432] Config param n_window_infer_for_audio: 800
I0421 12:07:01.049922 132528837793600 pyconfig.py:432] Config param nope_layer_interval: -1
I0421 12:07:01.049938 132528837793600 pyconfig.py:432] Config param norm_topk_prob: False
I0421 12:07:01.049952 132528837793600 pyconfig.py:432] Config param normalization_layer_epsilon: 1e-05
I0421 12:07:01.049970 132528837793600 pyconfig.py:432] Config param normalize_embedding_logits: False
I0421 12:07:01.049986 132528837793600 pyconfig.py:432] Config param num_attention_heads_for_vit: 16
I0421 12:07:01.050000 132528837793600 pyconfig.py:432] Config param num_batches: 4
I0421 12:07:01.050016 132528837793600 pyconfig.py:432] Config param num_channels_for_vit: 3
I0421 12:07:01.050030 132528837793600 pyconfig.py:432] Config param num_conv_layers_for_audio: 3
I0421 12:07:01.050045 132528837793600 pyconfig.py:432] Config param num_decoder_layers: 1
I0421 12:07:01.050060 132528837793600 pyconfig.py:432] Config param num_diloco_replicas: 1
I0421 12:07:01.050075 132528837793600 pyconfig.py:432] Config param num_epoch: 1
I0421 12:07:01.050090 132528837793600 pyconfig.py:432] Config param num_eval_passes: 1
I0421 12:07:01.050105 132528837793600 pyconfig.py:432] Config param num_experts: 1
I0421 12:07:01.050119 132528837793600 pyconfig.py:432] Config param num_experts_per_tok: 1
I0421 12:07:01.050135 132528837793600 pyconfig.py:432] Config param num_generations: 2
I0421 12:07:01.050151 132528837793600 pyconfig.py:432] Config param num_hidden_layers_for_vit: 34
I0421 12:07:01.050166 132528837793600 pyconfig.py:432] Config param num_iterations: 1
I0421 12:07:01.050182 132528837793600 pyconfig.py:432] Config param num_kv_heads: 2
I0421 12:07:01.050196 132528837793600 pyconfig.py:432] Config param num_layers_per_pipeline_stage: 1
I0421 12:07:01.050211 132528837793600 pyconfig.py:432] Config param num_mel_bins_for_audio: 128
I0421 12:07:01.050227 132528837793600 pyconfig.py:432] Config param num_pipeline_microbatches: -1
I0421 12:07:01.050243 132528837793600 pyconfig.py:432] Config param num_pipeline_repeats: -1
I0421 12:07:01.050258 132528837793600 pyconfig.py:432] Config param num_position_embeddings_for_vit: 1024
I0421 12:07:01.050274 132528837793600 pyconfig.py:432] Config param num_query_heads: 2
I0421 12:07:01.050288 132528837793600 pyconfig.py:432] Config param num_samplers_slices: -1
I0421 12:07:01.050304 132528837793600 pyconfig.py:432] Config param num_slices: 1
I0421 12:07:01.050320 132528837793600 pyconfig.py:432] Config param num_target_devices: 32
I0421 12:07:01.050335 132528837793600 pyconfig.py:432] Config param num_test_batches: 5
I0421 12:07:01.050350 132528837793600 pyconfig.py:432] Config param num_trainer_slices: -1
I0421 12:07:01.050366 132528837793600 pyconfig.py:432] Config param num_vocab_tiling: 1
I0421 12:07:01.050380 132528837793600 pyconfig.py:432] Config param off_policy_steps: 0
I0421 12:07:01.050396 132528837793600 pyconfig.py:432] Config param offline_data_dir: None
I0421 12:07:01.050410 132528837793600 pyconfig.py:432] Config param opt_type: OptimizerType.ADAM_PAX
I0421 12:07:01.050428 132528837793600 pyconfig.py:432] Config param optimize_mesh_for_tpu_v6e: False
I0421 12:07:01.050444 132528837793600 pyconfig.py:432] Config param optimizer_memory_host_offload: False
I0421 12:07:01.050459 132528837793600 pyconfig.py:432] Config param original_max_position_embeddings: 4096
I0421 12:07:01.050474 132528837793600 pyconfig.py:432] Config param out_hidden_size_for_vit: 512
I0421 12:07:01.050489 132528837793600 pyconfig.py:432] Config param out_proj: RematLocation.REMAT
I0421 12:07:01.050504 132528837793600 pyconfig.py:432] Config param output_dim_for_audio: 512
I0421 12:07:01.050520 132528837793600 pyconfig.py:432] Config param override_logical_axis_rules: False
I0421 12:07:01.050534 132528837793600 pyconfig.py:432] Config param override_model_config: True
I0421 12:07:01.050550 132528837793600 pyconfig.py:432] Config param packing: True
I0421 12:07:01.050565 132528837793600 pyconfig.py:432] Config param pagedattn_head_dim_alignment: 128
I0421 12:07:01.050580 132528837793600 pyconfig.py:432] Config param pagedattn_max_pages_per_group: -1
I0421 12:07:01.050596 132528837793600 pyconfig.py:432] Config param pagedattn_num_pages: 64
I0421 12:07:01.050611 132528837793600 pyconfig.py:432] Config param pagedattn_pages_per_compute_block: 4
I0421 12:07:01.050626 132528837793600 pyconfig.py:432] Config param pagedattn_tokens_per_page: 32
I0421 12:07:01.050641 132528837793600 pyconfig.py:432] Config param param_scan_axis: 1
I0421 12:07:01.050656 132528837793600 pyconfig.py:432] Config param parameter_memory_host_offload: False
I0421 12:07:01.050671 132528837793600 pyconfig.py:432] Config param partial_rotary_factor: 1.0
I0421 12:07:01.050686 132528837793600 pyconfig.py:432] Config param patch_size_for_vit: 14
I0421 12:07:01.050702 132528837793600 pyconfig.py:432] Config param penalty_incorrect_answer: -1.0
I0421 12:07:01.050716 132528837793600 pyconfig.py:432] Config param penalty_incorrect_format: -0.5
I0421 12:07:01.050740 132528837793600 pyconfig.py:432] Config param per_device_batch_size: 2
I0421 12:07:01.050756 132528837793600 pyconfig.py:432] Config param per_device_batch_size_increment: 2.0
I0421 12:07:01.050771 132528837793600 pyconfig.py:432] Config param per_device_batch_size_start: 4.0
I0421 12:07:01.050790 132528837793600 pyconfig.py:432] Config param pipeline_delay_activation_forwarding: False
I0421 12:07:01.050804 132528837793600 pyconfig.py:432] Config param pipeline_fsdp_ag_once: False
I0421 12:07:01.050820 132528837793600 pyconfig.py:432] Config param pipeline_fsdp_ag_per_repeat: False
I0421 12:07:01.050834 132528837793600 pyconfig.py:432] Config param pipeline_parallel_layers: 1
I0421 12:07:01.050850 132528837793600 pyconfig.py:432] Config param pixel_shuffle_ratio_for_vit: 0.5
I0421 12:07:01.050867 132528837793600 pyconfig.py:432] Config param posemb_type_for_vit: learn
I0421 12:07:01.050883 132528837793600 pyconfig.py:432] Config param position_id_per_seconds: 25
I0421 12:07:01.050897 132528837793600 pyconfig.py:432] Config param prefill_cache_axis_order: 1,2,0,3
I0421 12:07:01.050913 132528837793600 pyconfig.py:432] Config param prefill_cache_dir: 
I0421 12:07:01.050927 132528837793600 pyconfig.py:432] Config param prefill_chunk_size: 256
I0421 12:07:01.050943 132528837793600 pyconfig.py:432] Config param prefill_slice: v5e-16
I0421 12:07:01.050958 132528837793600 pyconfig.py:432] Config param prefix_caching_dram_byte: 100000000000
I0421 12:07:01.050975 132528837793600 pyconfig.py:432] Config param prefix_caching_hbm_byte: 10000000000
I0421 12:07:01.050990 132528837793600 pyconfig.py:432] Config param profile_cleanly: True
I0421 12:07:01.051006 132528837793600 pyconfig.py:432] Config param profile_periodically_period: -1
I0421 12:07:01.051020 132528837793600 pyconfig.py:432] Config param profile_power_events: False
I0421 12:07:01.051036 132528837793600 pyconfig.py:432] Config param profiler: ProfilerType.NONE
I0421 12:07:01.051053 132528837793600 pyconfig.py:432] Config param profiler_steps: 5
I0421 12:07:01.051069 132528837793600 pyconfig.py:432] Config param projector_dropout_for_vit: 0.0
I0421 12:07:01.051084 132528837793600 pyconfig.py:432] Config param projector_input_dim_for_vit: 4096
I0421 12:07:01.051100 132528837793600 pyconfig.py:432] Config param projector_output_dim_for_vit: 4096
I0421 12:07:01.051114 132528837793600 pyconfig.py:432] Config param prometheus_port: 0
I0421 12:07:01.051130 132528837793600 pyconfig.py:432] Config param prompt: I love to
I0421 12:07:01.051144 132528837793600 pyconfig.py:432] Config param pure_nnx: False
I0421 12:07:01.051160 132528837793600 pyconfig.py:432] Config param pure_nnx_decoder: False
I0421 12:07:01.051174 132528837793600 pyconfig.py:432] Config param q_lora_rank: 0
I0421 12:07:01.051190 132528837793600 pyconfig.py:432] Config param qk_clip_threshold: 100.0
I0421 12:07:01.051204 132528837793600 pyconfig.py:432] Config param qk_nope_head_dim: 128
I0421 12:07:01.051220 132528837793600 pyconfig.py:432] Config param qk_norm_with_scale: True
I0421 12:07:01.051236 132528837793600 pyconfig.py:432] Config param qk_rope_head_dim: 64
I0421 12:07:01.051250 132528837793600 pyconfig.py:432] Config param qkv_proj: RematLocation.REMAT
I0421 12:07:01.051266 132528837793600 pyconfig.py:432] Config param quant_cfg_path: 
I0421 12:07:01.051282 132528837793600 pyconfig.py:432] Config param quantization: QuantizationType.NONE
I0421 12:07:01.051298 132528837793600 pyconfig.py:432] Config param quantization_local_shard_count: 4
I0421 12:07:01.051314 132528837793600 pyconfig.py:432] Config param quantize_kvcache: False
I0421 12:07:01.051330 132528837793600 pyconfig.py:432] Config param query_proj: RematLocation.REMAT
I0421 12:07:01.051344 132528837793600 pyconfig.py:432] Config param query_wa_proj: RematLocation.REMAT
I0421 12:07:01.051360 132528837793600 pyconfig.py:432] Config param ragged_block_size: 256
I0421 12:07:01.051375 132528837793600 pyconfig.py:432] Config param ragged_buffer_factor: -1.0
I0421 12:07:01.051389 132528837793600 pyconfig.py:432] Config param rampup_end_step: 0
I0421 12:07:01.051405 132528837793600 pyconfig.py:432] Config param rampup_samples_per_increment_to_load: None
I0421 12:07:01.051422 132528837793600 pyconfig.py:432] Config param reasoning_end_token: </reasoning>
I0421 12:07:01.051437 132528837793600 pyconfig.py:432] Config param reasoning_start_token: <reasoning>
I0421 12:07:01.051453 132528837793600 pyconfig.py:432] Config param record_internal_nn_metrics: 0
I0421 12:07:01.051468 132528837793600 pyconfig.py:432] Config param remat_policy: full
I0421 12:07:01.051483 132528837793600 pyconfig.py:432] Config param remat_policy_for_vit: minimal
I0421 12:07:01.051498 132528837793600 pyconfig.py:432] Config param remove_size_one_mesh_axis_from_type: True
I0421 12:07:01.051514 132528837793600 pyconfig.py:432] Config param replicate_quant_scale: False
I0421 12:07:01.051529 132528837793600 pyconfig.py:432] Config param replicator_backup_interval_minutes: 0
I0421 12:07:01.051544 132528837793600 pyconfig.py:432] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0421 12:07:01.051560 132528837793600 pyconfig.py:432] Config param report_performance_metric_for_gcp_monitoring: False
I0421 12:07:01.051576 132528837793600 pyconfig.py:432] Config param reshape_q: False
I0421 12:07:01.051590 132528837793600 pyconfig.py:432] Config param return_log_prob: False
I0421 12:07:01.051605 132528837793600 pyconfig.py:432] Config param reuse_example_batch: 0
I0421 12:07:01.051622 132528837793600 pyconfig.py:432] Config param reward_exact_answer: 5.0
I0421 12:07:01.051636 132528837793600 pyconfig.py:432] Config param reward_exact_format_match: 3.0
I0421 12:07:01.051652 132528837793600 pyconfig.py:432] Config param reward_partial_format_match: 0.5
I0421 12:07:01.051667 132528837793600 pyconfig.py:432] Config param reward_ratio_guess_to_answer_high: 0.5
I0421 12:07:01.051681 132528837793600 pyconfig.py:432] Config param reward_ratio_guess_to_answer_low: 0.25
I0421 12:07:01.051698 132528837793600 pyconfig.py:432] Config param reward_white_space_format_match: 1.5
I0421 12:07:01.051714 132528837793600 pyconfig.py:432] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0421 12:07:01.051743 132528837793600 pyconfig.py:432] Config param rollout_data_parallelism: -1
I0421 12:07:01.051759 132528837793600 pyconfig.py:432] Config param rollout_expert_parallelism: 1
I0421 12:07:01.051774 132528837793600 pyconfig.py:432] Config param rollout_micro_batch_size: -1
I0421 12:07:01.051793 132528837793600 pyconfig.py:432] Config param rollout_tensor_parallelism: -1
I0421 12:07:01.051808 132528837793600 pyconfig.py:432] Config param rope_attention_scaling: False
I0421 12:07:01.051823 132528837793600 pyconfig.py:432] Config param rope_factor: 40
I0421 12:07:01.051838 132528837793600 pyconfig.py:432] Config param rope_interleave: True
I0421 12:07:01.051854 132528837793600 pyconfig.py:432] Config param rope_linear_scaling_factor: 1.0
I0421 12:07:01.051869 132528837793600 pyconfig.py:432] Config param rope_max_timescale: 10000
I0421 12:07:01.051884 132528837793600 pyconfig.py:432] Config param rope_min_timescale: 1
I0421 12:07:01.051899 132528837793600 pyconfig.py:432] Config param rope_theta_for_vit: 10000
I0421 12:07:01.051914 132528837793600 pyconfig.py:432] Config param rope_truncate: True
I0421 12:07:01.051930 132528837793600 pyconfig.py:432] Config param rope_type: RopeType.DEFAULT
I0421 12:07:01.051947 132528837793600 pyconfig.py:432] Config param rope_use_scale: True
I0421 12:07:01.051963 132528837793600 pyconfig.py:432] Config param routed_bias: False
I0421 12:07:01.051979 132528837793600 pyconfig.py:432] Config param routed_bias_update_rate: 0.0
I0421 12:07:01.051993 132528837793600 pyconfig.py:432] Config param routed_scaling_factor: 1.0
I0421 12:07:01.052009 132528837793600 pyconfig.py:432] Config param routed_score_func: 
I0421 12:07:01.052023 132528837793600 pyconfig.py:432] Config param run_name: gpt3-52k_2026-04-21-12-07
I0421 12:07:01.052039 132528837793600 pyconfig.py:432] Config param sa_block_kv: 512
I0421 12:07:01.052053 132528837793600 pyconfig.py:432] Config param sa_block_kv_compute: 512
I0421 12:07:01.052070 132528837793600 pyconfig.py:432] Config param sa_block_kv_dkv: 512
I0421 12:07:01.052084 132528837793600 pyconfig.py:432] Config param sa_block_kv_dkv_compute: 512
I0421 12:07:01.052098 132528837793600 pyconfig.py:432] Config param sa_block_kv_dq: 512
I0421 12:07:01.052114 132528837793600 pyconfig.py:432] Config param sa_block_q: 512
I0421 12:07:01.052130 132528837793600 pyconfig.py:432] Config param sa_block_q_dkv: 512
I0421 12:07:01.052145 132528837793600 pyconfig.py:432] Config param sa_block_q_dq: 512
I0421 12:07:01.052161 132528837793600 pyconfig.py:432] Config param sa_k_layout: HEAD_DIM_MINOR
I0421 12:07:01.052177 132528837793600 pyconfig.py:432] Config param sa_q_layout: HEAD_DIM_MINOR
I0421 12:07:01.052193 132528837793600 pyconfig.py:432] Config param sa_use_fused_bwd_kernel: False
I0421 12:07:01.052207 132528837793600 pyconfig.py:432] Config param sa_v_layout: HEAD_DIM_MINOR
I0421 12:07:01.052222 132528837793600 pyconfig.py:432] Config param sampler_devices_fraction: 0.5
I0421 12:07:01.052239 132528837793600 pyconfig.py:432] Config param save_checkpoint_on_completion: True
I0421 12:07:01.052253 132528837793600 pyconfig.py:432] Config param save_config_to_gcs: False
I0421 12:07:01.052269 132528837793600 pyconfig.py:432] Config param save_quantized_params_path: 
I0421 12:07:01.052285 132528837793600 pyconfig.py:432] Config param scale_embedding_for_audio: True
I0421 12:07:01.052299 132528837793600 pyconfig.py:432] Config param scan_layers: True
I0421 12:07:01.052315 132528837793600 pyconfig.py:432] Config param scan_layers_per_stage: False
I0421 12:07:01.052330 132528837793600 pyconfig.py:432] Config param scan_pipeline_iterations: True
I0421 12:07:01.052345 132528837793600 pyconfig.py:432] Config param scan_pipeline_repeats: False
I0421 12:07:01.052359 132528837793600 pyconfig.py:432] Config param set_remat_policy_on_layers_per_stage: False
I0421 12:07:01.052375 132528837793600 pyconfig.py:432] Config param set_remat_policy_on_pipeline_iterations: True
I0421 12:07:01.052397 132528837793600 pyconfig.py:432] Config param sft_train_on_completion_only: False
I0421 12:07:01.052412 132528837793600 pyconfig.py:432] Config param shard_exp_on_fsdp: False
I0421 12:07:01.052428 132528837793600 pyconfig.py:432] Config param shard_mode: ShardMode.AUTO
I0421 12:07:01.052443 132528837793600 pyconfig.py:432] Config param shard_optimizer_over_data: False
I0421 12:07:01.052459 132528837793600 pyconfig.py:432] Config param sharding_strategy: None
I0421 12:07:01.052474 132528837793600 pyconfig.py:432] Config param sharding_tolerance: 0.02
I0421 12:07:01.052491 132528837793600 pyconfig.py:432] Config param shardy: True
I0421 12:07:01.052505 132528837793600 pyconfig.py:432] Config param share_kv_projections: False
I0421 12:07:01.052520 132528837793600 pyconfig.py:432] Config param shared_experts: 0
I0421 12:07:01.052536 132528837793600 pyconfig.py:432] Config param sinkhorn_iterations: 20
I0421 12:07:01.052550 132528837793600 pyconfig.py:432] Config param skip_first_n_steps_for_profiler: 1
I0421 12:07:01.052566 132528837793600 pyconfig.py:432] Config param skip_jax_distributed_system: False
I0421 12:07:01.052580 132528837793600 pyconfig.py:432] Config param skip_step_interval: 128
I0421 12:07:01.052596 132528837793600 pyconfig.py:432] Config param skip_step_on_spikes: False
I0421 12:07:01.052611 132528837793600 pyconfig.py:432] Config param skip_step_scaling_factor: 6.0
I0421 12:07:01.052626 132528837793600 pyconfig.py:432] Config param sliding_window_size: 0
I0421 12:07:01.052643 132528837793600 pyconfig.py:432] Config param solution_end_token: </answer>
I0421 12:07:01.052659 132528837793600 pyconfig.py:432] Config param solution_start_token: <answer>
I0421 12:07:01.052676 132528837793600 pyconfig.py:432] Config param source_checkpoint_layout: orbax
I0421 12:07:01.052691 132528837793600 pyconfig.py:432] Config param sparse_matmul: True
I0421 12:07:01.052706 132528837793600 pyconfig.py:432] Config param spatial_merge_size_for_vit: 2
I0421 12:07:01.052721 132528837793600 pyconfig.py:432] Config param stack_prefill_result_cache: False
I0421 12:07:01.052747 132528837793600 pyconfig.py:432] Config param stack_trace_interval_seconds: 600
I0421 12:07:01.052762 132528837793600 pyconfig.py:432] Config param stack_trace_to_cloud: False
I0421 12:07:01.052781 132528837793600 pyconfig.py:432] Config param step_deviation_interval_seconds: 30
I0421 12:07:01.052796 132528837793600 pyconfig.py:432] Config param steps: 200000
I0421 12:07:01.052814 132528837793600 pyconfig.py:432] Config param stop_strings: None
I0421 12:07:01.052830 132528837793600 pyconfig.py:432] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0421 12:07:01.052846 132528837793600 pyconfig.py:432] Config param student_params_to_update: None
I0421 12:07:01.052861 132528837793600 pyconfig.py:432] Config param subslice_shape: 
I0421 12:07:01.052878 132528837793600 pyconfig.py:432] Config param swap_space_vllm_gb: 2
I0421 12:07:01.052892 132528837793600 pyconfig.py:432] Config param system_prompt: 
I0421 12:07:01.052907 132528837793600 pyconfig.py:432] Config param target_eval_loss: 0.0
I0421 12:07:01.052923 132528837793600 pyconfig.py:432] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0421 12:07:01.052938 132528837793600 pyconfig.py:432] Config param temperature_tuning: False
I0421 12:07:01.052954 132528837793600 pyconfig.py:432] Config param temporal_patch_size_for_vit: 2
I0421 12:07:01.052968 132528837793600 pyconfig.py:432] Config param tensorboard_dir: None
I0421 12:07:01.052984 132528837793600 pyconfig.py:432] Config param tensors_on_device: None
I0421 12:07:01.052999 132528837793600 pyconfig.py:432] Config param tensors_to_offload: None
I0421 12:07:01.053013 132528837793600 pyconfig.py:432] Config param test_batch_start_index: 0
I0421 12:07:01.053029 132528837793600 pyconfig.py:432] Config param tile_size_for_vit: 336
I0421 12:07:01.053044 132528837793600 pyconfig.py:432] Config param tokenize_eval_data: True
I0421 12:07:01.053059 132528837793600 pyconfig.py:432] Config param tokenize_train_data: True
I0421 12:07:01.053073 132528837793600 pyconfig.py:432] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0421 12:07:01.053089 132528837793600 pyconfig.py:432] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0421 12:07:01.053106 132528837793600 pyconfig.py:432] Config param topk_routing_group: -1
I0421 12:07:01.053122 132528837793600 pyconfig.py:432] Config param train_data_columns: ['text']
I0421 12:07:01.053139 132528837793600 pyconfig.py:432] Config param train_fraction: 1.0
I0421 12:07:01.053155 132528837793600 pyconfig.py:432] Config param train_image_column: image
I0421 12:07:01.053169 132528837793600 pyconfig.py:432] Config param train_micro_batch_size: -1
I0421 12:07:01.053185 132528837793600 pyconfig.py:432] Config param train_split: train
I0421 12:07:01.053199 132528837793600 pyconfig.py:432] Config param trainable_parameters_mask: []
I0421 12:07:01.053214 132528837793600 pyconfig.py:432] Config param trainable_position_size: 2048
I0421 12:07:01.053230 132528837793600 pyconfig.py:432] Config param trainer_devices_fraction: 0.5
I0421 12:07:01.053245 132528837793600 pyconfig.py:432] Config param upload_all_profiler_results: False
I0421 12:07:01.053261 132528837793600 pyconfig.py:432] Config param use_2d_fsdp_sharding: False
I0421 12:07:01.053275 132528837793600 pyconfig.py:432] Config param use_agentic_rollout: False
I0421 12:07:01.053290 132528837793600 pyconfig.py:432] Config param use_audio: False
I0421 12:07:01.053306 132528837793600 pyconfig.py:432] Config param use_audio_in_video: False
I0421 12:07:01.053322 132528837793600 pyconfig.py:432] Config param use_batch_split_schedule: False
I0421 12:07:01.053336 132528837793600 pyconfig.py:432] Config param use_chat_template: False
I0421 12:07:01.053352 132528837793600 pyconfig.py:432] Config param use_chunked_prefill: False
I0421 12:07:01.053367 132528837793600 pyconfig.py:432] Config param use_custom_sort_vjp: True
I0421 12:07:01.053382 132528837793600 pyconfig.py:432] Config param use_dpo: False
I0421 12:07:01.053396 132528837793600 pyconfig.py:432] Config param use_gather_mosaic_kernel: False
I0421 12:07:01.053412 132528837793600 pyconfig.py:432] Config param use_grpo: True
I0421 12:07:01.053428 132528837793600 pyconfig.py:432] Config param use_indexer: False
I0421 12:07:01.053443 132528837793600 pyconfig.py:432] Config param use_iota_embed: True
I0421 12:07:01.053457 132528837793600 pyconfig.py:432] Config param use_jax_splash: False
I0421 12:07:01.053472 132528837793600 pyconfig.py:432] Config param use_max_logit_estimate: -1
I0421 12:07:01.053487 132528837793600 pyconfig.py:432] Config param use_mrope: False
I0421 12:07:01.053503 132528837793600 pyconfig.py:432] Config param use_multimodal: False
I0421 12:07:01.053518 132528837793600 pyconfig.py:432] Config param use_pathways: True
I0421 12:07:01.053532 132528837793600 pyconfig.py:432] Config param use_post_attn_norm: False
I0421 12:07:01.053548 132528837793600 pyconfig.py:432] Config param use_post_ffw_norm: False
I0421 12:07:01.053564 132528837793600 pyconfig.py:432] Config param use_qk_clip: False
I0421 12:07:01.053579 132528837793600 pyconfig.py:432] Config param use_qk_norm: False
I0421 12:07:01.053593 132528837793600 pyconfig.py:432] Config param use_qk_norm_in_gdn: True
I0421 12:07:01.053609 132528837793600 pyconfig.py:432] Config param use_qwix_quantization: False
I0421 12:07:01.053623 132528837793600 pyconfig.py:432] Config param use_ragged_attention: False
I0421 12:07:01.053639 132528837793600 pyconfig.py:432] Config param use_random_routing: False
I0421 12:07:01.053654 132528837793600 pyconfig.py:432] Config param use_replicator_service: False
I0421 12:07:01.053669 132528837793600 pyconfig.py:432] Config param use_ring_of_experts: False
I0421 12:07:01.053684 132528837793600 pyconfig.py:432] Config param use_sft: False
I0421 12:07:01.053698 132528837793600 pyconfig.py:432] Config param use_splash_scheduler: False
I0421 12:07:01.053712 132528837793600 pyconfig.py:432] Config param use_tokamax_gmm: False
I0421 12:07:01.053728 132528837793600 pyconfig.py:432] Config param use_tokamax_splash: False
I0421 12:07:01.053750 132528837793600 pyconfig.py:432] Config param use_truncation: True
I0421 12:07:01.053766 132528837793600 pyconfig.py:432] Config param use_tunix_gradient_accumulation: False
I0421 12:07:01.053783 132528837793600 pyconfig.py:432] Config param use_untrainable_positional_embedding: False
I0421 12:07:01.053799 132528837793600 pyconfig.py:432] Config param use_vertex_tensorboard: False
I0421 12:07:01.053813 132528837793600 pyconfig.py:432] Config param using_pipeline_parallelism: False
I0421 12:07:01.053829 132528837793600 pyconfig.py:432] Config param v_head_dim: 128
I0421 12:07:01.053845 132528837793600 pyconfig.py:432] Config param v_norm_with_scale: True
I0421 12:07:01.053859 132528837793600 pyconfig.py:432] Config param value_proj: RematLocation.REMAT
I0421 12:07:01.053875 132528837793600 pyconfig.py:432] Config param vertex_tensorboard_project: 
I0421 12:07:01.053889 132528837793600 pyconfig.py:432] Config param vertex_tensorboard_region: 
I0421 12:07:01.053905 132528837793600 pyconfig.py:432] Config param video_path: 
I0421 12:07:01.053920 132528837793600 pyconfig.py:432] Config param video_placeholder: <|video|>
I0421 12:07:01.053935 132528837793600 pyconfig.py:432] Config param vision_output_dim_for_vit: 4096
I0421 12:07:01.053951 132528837793600 pyconfig.py:432] Config param vision_output_length: -1
I0421 12:07:01.053966 132528837793600 pyconfig.py:432] Config param vllm_additional_config: {}
I0421 12:07:01.053982 132528837793600 pyconfig.py:432] Config param vllm_hf_config_path: 
I0421 12:07:01.053996 132528837793600 pyconfig.py:432] Config param vllm_hf_overrides: {}
I0421 12:07:01.054011 132528837793600 pyconfig.py:432] Config param vocab_size: 32000
I0421 12:07:01.054026 132528837793600 pyconfig.py:432] Config param warmup_steps_fraction: 0.1
I0421 12:07:01.054043 132528837793600 pyconfig.py:432] Config param weight_dtype: float32
I0421 12:07:01.054064 132528837793600 pyconfig.py:432] Config param weight_quantization_calibration_method: absmax
I0421 12:07:01.054079 132528837793600 pyconfig.py:432] Config param wi_tile_dlhs_batch_seq: 512
I0421 12:07:01.054094 132528837793600 pyconfig.py:432] Config param wi_tile_dlhs_embed_dim: 1024
I0421 12:07:01.054109 132528837793600 pyconfig.py:432] Config param wi_tile_dlhs_mlp_dim: 1024
I0421 12:07:01.054124 132528837793600 pyconfig.py:432] Config param wi_tile_drhs_batch_seq: 512
I0421 12:07:01.054139 132528837793600 pyconfig.py:432] Config param wi_tile_drhs_embed_dim: 1024
I0421 12:07:01.054155 132528837793600 pyconfig.py:432] Config param wi_tile_drhs_mlp_dim: 1024
I0421 12:07:01.054170 132528837793600 pyconfig.py:432] Config param wi_tile_fwd_batch_seq: 512
I0421 12:07:01.054185 132528837793600 pyconfig.py:432] Config param wi_tile_fwd_embed_dim: 1024
I0421 12:07:01.054202 132528837793600 pyconfig.py:432] Config param wi_tile_fwd_mlp_dim: 1024
I0421 12:07:01.054216 132528837793600 pyconfig.py:432] Config param wo_tile_dlhs_batch_seq: 512
I0421 12:07:01.054232 132528837793600 pyconfig.py:432] Config param wo_tile_dlhs_embed_dim: 1024
I0421 12:07:01.054248 132528837793600 pyconfig.py:432] Config param wo_tile_dlhs_mlp_dim: 1024
I0421 12:07:01.054263 132528837793600 pyconfig.py:432] Config param wo_tile_drhs_batch_seq: 512
I0421 12:07:01.054278 132528837793600 pyconfig.py:432] Config param wo_tile_drhs_embed_dim: 1024
I0421 12:07:01.054294 132528837793600 pyconfig.py:432] Config param wo_tile_drhs_mlp_dim: 1024
I0421 12:07:01.054308 132528837793600 pyconfig.py:432] Config param wo_tile_fwd_batch_seq: 512
I0421 12:07:01.054323 132528837793600 pyconfig.py:432] Config param wo_tile_fwd_embed_dim: 1024
I0421 12:07:01.054339 132528837793600 pyconfig.py:432] Config param wo_tile_fwd_mlp_dim: 1024
I0421 12:07:01.054353 132528837793600 pyconfig.py:432] Config param wsd_decay_steps_fraction: 0.1
I0421 12:07:01.054369 132528837793600 pyconfig.py:432] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0421 12:07:01.054386 132528837793600 pyconfig.py:432] Config param xprof_e2e_enable_fw_power_level_event: False
I0421 12:07:01.054401 132528837793600 pyconfig.py:432] Config param xprof_e2e_enable_fw_thermal_event: False
I0421 12:07:01.054417 132528837793600 pyconfig.py:432] Config param xprof_e2e_enable_fw_throttle_event: False
I0421 12:07:01.054432 132528837793600 pyconfig.py:432] Config param xprof_tpu_power_trace_level: 0
I0421 12:07:01.054448 132528837793600 pyconfig.py:432] Config param z_loss_multiplier: 0.0
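The block above is the end of the `pyconfig.py` config dump, where every parameter is emitted as one `Config param key: value` line. When diffing two runs (e.g. the Linen vs NNX comparison this log feeds), it can be handy to pull that dump back into a dict. A minimal sketch, assuming only the log format visible above — `parse_config_dump` is a hypothetical helper, not part of MaxText, and values are kept as raw strings:

```python
import re

# Matches the tail of a pyconfig.py dump line: "pyconfig.py:432] Config param key: value"
CONFIG_RE = re.compile(r"pyconfig\.py:\d+\] Config param (\w+): (.*)$")

def parse_config_dump(lines):
    """Collect 'Config param' log lines into a {name: raw-string-value} dict."""
    params = {}
    for line in lines:
        m = CONFIG_RE.search(line)
        if m:
            params[m.group(1)] = m.group(2)
    return params

sample = [
    "I0421 12:07:01.050000 132528837793600 pyconfig.py:432] Config param num_batches: 4",
    "I0421 12:07:01.054026 132528837793600 pyconfig.py:432] Config param warmup_steps_fraction: 0.1",
]
print(parse_config_dump(sample))
# → {'num_batches': '4', 'warmup_steps_fraction': '0.1'}
```

Two such dicts can then be compared with a plain dict diff to spot parameters that changed between branches.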
I0421 12:07:01.054797 132528837793600 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0421 12:07:01.054834 132528837793600 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0421 12:07:04.853672 132528837793600 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0421 12:07:04.856702 132528837793600 maxtext_utils.py:1718] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
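The `maxtext_utils.py` line above reports `Num_devices: 32` alongside the mesh shape `(1, 4, 1, 8, ...)`. The product of the mesh axes must equal the device count, which is worth checking when reading these logs; a one-line verification using the values from the log line:

```python
import math

# Values copied from the maxtext_utils.py log line above.
mesh_shape = (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
num_devices = 32

# The mesh must tile exactly onto the devices: 4 * 8 = 32.
assert math.prod(mesh_shape) == num_devices
```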
I0421 12:07:04.856836 132528837793600 train_distill.py:608] Applying logical axis rules for model initialization and training...
I0421 12:07:04.856909 132528837793600 train_distill.py:612] Loading Student from ...
I0421 12:07:04.856937 132528837793600 train_distill.py:169] --- Student Configuration ---
I0421 12:07:04.856960 132528837793600 train_distill.py:170]   Model Name:      gpt3-52k
I0421 12:07:04.856981 132528837793600 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 12:07:04.857004 132528837793600 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 12:07:04.857023 132528837793600 train_distill.py:175]   Vocab Size:      32000
I0421 12:07:04.857040 132528837793600 train_distill.py:176]   Checkpoint:      
I0421 12:07:04.857058 132528837793600 train_distill.py:477] Initializing model: gpt3-52k...
I0421 12:07:06.123166 132528837793600 train_distill.py:626] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0421 12:07:06.123273 132528837793600 train_distill.py:169] --- Teacher Configuration ---
I0421 12:07:06.123301 132528837793600 train_distill.py:170]   Model Name:      gpt3-52k
I0421 12:07:06.123325 132528837793600 train_distill.py:171]   Dimensions:      1 Layers, 16 Emb Dim, 8 Head Dim
I0421 12:07:06.123345 132528837793600 train_distill.py:174]   Attention Heads: 2 Query, 2 KV
I0421 12:07:06.123365 132528837793600 train_distill.py:175]   Vocab Size:      32000
I0421 12:07:06.123382 132528837793600 train_distill.py:176]   Checkpoint:      gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0421 12:07:06.123401 132528837793600 train_distill.py:477] Initializing model: gpt3-52k...
I0421 12:07:07.271403 132528837793600 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 12:07:07.271847 132528837793600 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78880ecd5910>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 12:07:07.271906 132528837793600 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0421 12:07:07.799982 132528837793600 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0421 12:07:08.347514    2190 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0421 12:07:09.915121 132528837793600 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0421 12:07:12.087010 132528837793600 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0421 12:07:12.087378 132528837793600 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0421 12:07:12.679553 132528837793600 checkpointer.py:318] Finished restoring checkpoint in 3.56 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0421 12:07:13.368643 132528837793600 train_distill.py:652] Initializing Data Iterators via MaxText pipeline...
I0421 12:07:13.433447 132528837793600 config.py:112] TensorFlow version 2.20.0 available.
I0421 12:07:13.433944 132528837793600 config.py:125] JAX version 0.8.3 available.
E0421 12:07:15.534079 132528837793600 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0421 12:07:15.534299 132528837793600 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0421 12:07:15.537320 132528837793600 train_distill.py:422] Input Pipeline Checkpointing: DISABLED
I0421 12:07:15.537385 132528837793600 train_distill.py:426] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0421 12:07:15.537449 132528837793600 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 12:07:15.537527 132528837793600 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78880ecd5910>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 12:07:15.537569 132528837793600 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 12:07:15.537600 132528837793600 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78880ecd5910>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 12:07:15.537642 132528837793600 checkpoint_manager.py:702] [process=7][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x787ccfcc1520>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f350>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05f290>}, handler_registry=None
I0421 12:07:15.537873 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x787ccfcc1520>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 12:07:15.537919 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f350>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 12:07:15.537946 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05f290>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 12:07:15.537970 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05eed0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 12:07:15.537998 132528837793600 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x787ccfcc1520>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x787ccfcc1520>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f350>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f350>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05f290>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05f290>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05eed0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05eed0>}).
I0421 12:07:15.538474 132528837793600 async_checkpointer.py:177] [process=7][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7867d0099620> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 12:07:18.008699 132528837793600 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints
I0421 12:07:18.463287 132528837793600 checkpoint_manager.py:921] [process=7][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78680c05f260>
I0421 12:07:18.463460 132528837793600 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 12:07:18.463533 132528837793600 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78880ecd5910>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 12:07:18.463569 132528837793600 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0421 12:07:18.463602 132528837793600 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78880ecd5910>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0421 12:07:18.463638 132528837793600 checkpoint_manager.py:1983] [process=7][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 12:07:18.463693 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x7871a642ab40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05f050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05f020>, _write_futures=[])
I0421 12:07:18.464090 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x7871a642ab40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05f050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05f020>, _write_futures=[])
I0421 12:07:18.464118 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x7871a642ab40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05f050>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05f020>, _write_futures=[])
I0421 12:07:18.464151 132528837793600 checkpoint_manager.py:702] [process=7][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f230>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05e630>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05d7c0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78680c05d2b0>}, handler_registry=None
I0421 12:07:18.464259 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f230>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 12:07:18.464295 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05e630>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0421 12:07:18.464318 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05d7c0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 12:07:18.464346 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78680c05d2b0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0421 12:07:18.464369 132528837793600 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05cce0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0421 12:07:18.464394 132528837793600 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f230>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05f230>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05e630>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78680c05e630>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05d7c0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05d7c0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78680c05d2b0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x78680c05d2b0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05cce0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x78680c05cce0>}).
I0421 12:07:18.464464 132528837793600 async_checkpointer.py:177] [process=7][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7867d0099760> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0421 12:07:18.839772 132528837793600 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints
I0421 12:07:18.852114 132528837793600 checkpoint_manager.py:921] [process=7][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x78680c05cf50>
I0421 12:07:18.852577 132528837793600 train_distill.py:703] Starting Distillation Training...
I0421 12:07:18.852684 132528837793600 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
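The mesh logged above factors the device grid across named parallelism axes; the product of the axis sizes gives the total chip count for the run (here only `data=4` and `fsdp=8` are non-trivial). A quick consistency check, with the axis sizes transcribed from the log line:

```python
from math import prod

# Axis sizes copied from the peft_trainer.py mesh log line above.
mesh_axes = {
    "diloco": 1, "data": 4, "stage": 1, "fsdp": 8, "fsdp_transpose": 1,
    "sequence": 1, "context": 1, "context_autoregressive": 1,
    "tensor": 1, "tensor_transpose": 1, "tensor_sequence": 1,
    "expert": 1, "autoregressive": 1,
}

# Total devices = product over all mesh axes.
total_devices = prod(mesh_axes.values())
print(total_devices)  # 32
```

So this smoke run spans 32 chips, sharded 4-way on the data axis and 8-way on the fsdp axis.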
I0421 12:07:18.973792 132528837793600 peft_trainer.py:594] Compiled train_step cache size: 0
I0421 12:07:18.975484 132385141139200 grain_pool.py:367] Grain pool will use 1 processes.
I0421 12:07:19.002618 132385141139200 grain_pool.py:440] Grain pool will start child processes.
I0421 12:07:19.007725 132385141139200 grain_pool.py:448] Grain pool started all child processes.
2026-04-21 12:07:25.058097: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
/deps/src/maxtext/trainers/post_train/distillation/train_distill.py:281: DeprecationWarning: '.value' access is now deprecated. For Variable[Array] instances use:

  variable[...]

For other Variable types use:

  variable.get_value()

  current_step = model.training_step.value
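The DeprecationWarning above points at `train_distill.py:281`, where `model.training_step.value` uses the old Flax NNX `Variable` accessor. The migration the warning asks for can be sketched with a stand-in class (illustrative only; this is not the real `flax.nnx.Variable`, whose semantics may differ):

```python
# Minimal stand-in mimicking the slice of the NNX Variable API
# that the warning describes (hypothetical, for illustration).
class Variable:
    def __init__(self, value):
        self._value = value

    def __getitem__(self, key):
        # `variable[...]` — the recommended read for Variable[Array] instances
        return self._value

    def get_value(self):
        # recommended read for other Variable types
        return self._value


training_step = Variable(5)

# Deprecated pattern flagged in the log:
#   current_step = model.training_step.value
# Recommended replacements:
current_step = training_step[...]
current_step_alt = training_step.get_value()
```

Either form silences the warning; for array-valued variables the indexing form (`variable[...]`) is the one the message recommends.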
I0421 12:07:31.332397 132528837793600 checkpoint_manager.py:1983] [process=7][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 12:07:31.334380 132528837793600 checkpoint_manager.py:1501] [process=7] Saving checkpoint at step 1
I0421 12:07:31.337377 132528837793600 async_checkpointer.py:452] [process=7] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1.
I0421 12:07:31.870894 132528837793600 signaling_client.py:364] Using JaxDistributedSignalingClient
I0421 12:07:31.871953 132528837793600 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0421 12:07:31.872012 132528837793600 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 12:07:32.537761 132528837793600 base_pytree_checkpoint_handler.py:153] [process=7][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.666957s
I0421 12:07:32.542562 132528837793600 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/blocking_gbytes_per_sec: 574.358 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 931 milliseconds) (per-host)
I0421 12:07:32.542628 132528837793600 base_pytree_checkpoint_handler.py:732] [process=7][thread=MainThread] Initiated Pytree async_save. Time taken: 0.931692s (batch_requests_ready=0.256245s, total_serialization_initiated=0.675333s, others=0.000114s)
I0421 12:07:32.544438 132528837793600 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array.
I0421 12:07:32.544494 132528837793600 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 12:07:32.549323 132528837793600 base_pytree_checkpoint_handler.py:153] [process=7][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.006618s
I0421 12:07:32.549433 132528837793600 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/blocking_gbytes_per_sec: 284.015 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 941 milliseconds) (per-host)
I0421 12:07:32.549478 132528837793600 base_pytree_checkpoint_handler.py:732] [process=7][thread=MainThread] Initiated Pytree async_save. Time taken: 0.942028s (batch_requests_ready=0.930490s, total_serialization_initiated=0.011469s, others=0.000070s)
I0421 12:07:32.549571 132528837793600 composite_checkpoint_handler.py:715] [process=7][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.946015s (all_items=0.000023s, per_item={'model_params': '0.00001860', 'optimizer_state': '0.00000405'}, temp_paths=0.945992)
I0421 12:07:32.550561 132378172581632 async_checkpointer.py:79] [process=7][thread=async_save] Background save thread started.
I0421 12:07:32.550748 132528837793600 async_checkpointer.py:561] Finished blocking save. Time taken: 1.216280s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1.
I0421 12:07:32.576971 132528837793600 checkpoint_manager.py:1549] [process=7][thread=MainThread][step=1] Starting CheckpointManager Save Finalize thread=save_finalize
I0421 12:07:32.577257 132378197759744 async_checkpointer.py:265] [process=7][thread=save_finalize] Waiting for background save thread=async_save.
I0421 12:07:32.577411 132528837793600 standard_logger.py:34] {'step': 1, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776773251.3323634, 'wait_for_prev_duration_secs': 0.0001361370086669922, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776773251.33442, 'checkpointer_blocking_duration_secs': 1.2164409160614014, 'get_old_steps_start_time': 1776773252.5508857, 'get_old_steps_duration_secs': 7.987022399902344e-05, 'checkpoint_manager_blocking_start_time': 1776773251.0773523, 'checkpoint_manager_blocking_duration_secs': 1.5000245571136475}
I0421 12:07:35.693797 132528837793600 peft_trainer.py:474] Train step 1 training loss: 15.990566  - training perplexity: 8802675.000000
I0421 12:07:35.714111 132528837793600 peft_trainer.py:474] Train step 2 training loss: 15.974588  - training perplexity: 8663145.000000
I0421 12:07:35.744526 132528837793600 peft_trainer.py:474] Train step 3 training loss: 16.008877  - training perplexity: 8965342.000000
I0421 12:07:35.763965 132528837793600 peft_trainer.py:474] Train step 4 training loss: 16.001873  - training perplexity: 8902770.000000
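The per-step perplexities logged above are consistent with perplexity = exp(loss). A quick check, with the loss/perplexity pairs transcribed from the step 1-4 log lines (the small tolerance covers float32 rounding in the logged values):

```python
import math

# (training loss, training perplexity) pairs from train steps 1-4 above.
steps = [
    (15.990566, 8802675.0),
    (15.974588, 8663145.0),
    (16.008877, 8965342.0),
    (16.001873, 8902770.0),
]

# Perplexity is the exponential of the (cross-entropy) loss.
for loss, ppl in steps:
    assert math.isclose(math.exp(loss), ppl, rel_tol=1e-4)
```

A loss near 16 nats, i.e. perplexity near e^16 ≈ 8.9M, is what an untrained model over a vocabulary of comparable size would produce, which is expected for a 5-step smoke test from a fresh seed checkpoint.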
I0421 12:07:35.768807 132528837793600 peft_trainer.py:733] Train loop finished in: 16.7945 seconds
I0421 12:07:35.769370 132528837793600 train_distill.py:712] Saving final checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/...
I0421 12:07:37.781939 132385706333952 array_metadata_store.py:203] [process=7][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1/optimizer_state/array_metadatas/process_7
I0421 12:07:37.840428 132378180974336 array_metadata_store.py:203] [process=7][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/1/model_params/array_metadatas/process_7
I0421 12:07:37.841520 132378172581632 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/gbytes_per_sec: 42.915 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 6 seconds) (per-host)
I0421 12:07:37.841630 132378172581632 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/gbytes_per_sec: 85.875 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 6 seconds) (per-host)
I0421 12:07:37.841668 132378172581632 async_checkpointer.py:90] [process=7][thread=async_save] 4 Handler Commit operations completed. Time taken: 5.290997s.
I0421 12:07:48.109959 132528837793600 checkpoint_manager.py:1994] [process=7][thread=MainThread][step=1][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0421 12:07:48.979501 132378172581632 async_checkpointer.py:144] [process=7][thread=async_save] Background save thread done. Time taken: 16.428812s.
I0421 12:07:48.979861 132378197759744 async_checkpointer.py:273] [process=7][thread=save_finalize] Done with waiting for background save thread=async_save.
I0421 12:07:48.979979 132378197759744 async_checkpointer.py:283] [process=7][thread=save_finalize] No errors found in background save thread=async_save.
I0421 12:07:48.980029 132378197759744 checkpoint_manager.py:2103] [process=7][thread=save_finalize][step=1] CheckpointManager Save Finalize is syncing with other hosts...
I0421 12:07:48.981458 132378197759744 checkpoint_manager.py:2112] [process=7][thread=save_finalize][step=1] CheckpointManager Save Finalize is done on all hosts.
I0421 12:07:48.981636 132528837793600 checkpoint_manager.py:2006] [process=7][thread=MainThread][step=1][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=1.
I0421 12:07:48.983293 132528837793600 checkpoint_manager.py:1501] [process=7] Saving checkpoint at step 5
I0421 12:07:48.986681 132528837793600 async_checkpointer.py:452] [process=7] Started async saving checkpoint to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5.
I0421 12:07:49.529242 132528837793600 jax_array_handlers.py:347] Scheduling D2H of 37 prioritized jax.Array.
I0421 12:07:49.529339 132528837793600 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 12:07:50.184609 132528837793600 base_pytree_checkpoint_handler.py:153] [process=7][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.656202s
I0421 12:07:50.188256 132528837793600 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/blocking_gbytes_per_sec: 585.673 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 913 milliseconds) (per-host)
I0421 12:07:50.188319 132528837793600 base_pytree_checkpoint_handler.py:732] [process=7][thread=MainThread] Initiated Pytree async_save. Time taken: 0.913687s (batch_requests_ready=0.252215s, total_serialization_initiated=0.661368s, others=0.000103s)
I0421 12:07:50.190164 132528837793600 jax_array_handlers.py:347] Scheduling D2H of 46 prioritized jax.Array.
I0421 12:07:50.190222 132528837793600 replica_slices.py:410] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0421 12:07:50.195029 132528837793600 base_pytree_checkpoint_handler.py:153] [process=7][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.006599s
I0421 12:07:50.195132 132528837793600 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/blocking_gbytes_per_sec: 289.707 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 923 milliseconds) (per-host)
I0421 12:07:50.195176 132528837793600 base_pytree_checkpoint_handler.py:732] [process=7][thread=MainThread] Initiated Pytree async_save. Time taken: 0.923516s (batch_requests_ready=0.913138s, total_serialization_initiated=0.010311s, others=0.000066s)
I0421 12:07:50.195298 132528837793600 composite_checkpoint_handler.py:715] [process=7][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.927532s (all_items=0.000014s, per_item={'model_params': '0.00001121', 'optimizer_state': '0.00000286'}, temp_paths=0.927518)
I0421 12:07:50.196250 132377099613952 async_checkpointer.py:79] [process=7][thread=async_save] Background save thread started.
I0421 12:07:50.196407 132528837793600 async_checkpointer.py:561] Finished blocking save. Time taken: 1.213042s. Continuing background save to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5.
I0421 12:07:50.222283 132528837793600 checkpoint_manager.py:1549] [process=7][thread=MainThread][step=5] Starting CheckpointManager Save Finalize thread=save_finalize
I0421 12:07:50.222575 132378197759744 async_checkpointer.py:265] [process=7][thread=save_finalize] Waiting for background save thread=async_save.
I0421 12:07:50.222758 132528837793600 standard_logger.py:34] {'step': 5, 'event_type': 'save', 'directory': 'gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776773268.1099184, 'wait_for_prev_duration_secs': 0.8718478679656982, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776773268.983333, 'checkpointer_blocking_duration_secs': 1.213179111480713, 'get_old_steps_start_time': 1776773270.1965382, 'get_old_steps_duration_secs': 7.891654968261719e-05, 'checkpoint_manager_blocking_start_time': 1776773255.7732153, 'checkpoint_manager_blocking_duration_secs': 14.449496269226074}
I0421 12:07:50.222946 132528837793600 checkpoint_manager.py:1994] [process=7][thread=MainThread][step=5][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0421 12:07:54.718913 132378189367040 array_metadata_store.py:203] [process=7][thread=array_type_handler] Wrote 46 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5/model_params/array_metadatas/process_7
I0421 12:07:54.720081 132377099613952 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/gbytes_per_sec: 49.103 KiB/s (total gbytes: 267.5 KiB) (time elapsed: 5 seconds) (per-host)
I0421 12:07:54.755824 132385706333952 array_metadata_store.py:203] [process=7][thread=array_type_handler] Wrote 37 array_metadata.ArrayMetadata to gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260421_114106/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260421_114106_07_distill_smoke/checkpoints/5/optimizer_state/array_metadatas/process_7
I0421 12:07:54.756923 132377099613952 base_pytree_checkpoint_handler.py:128] [process=7] /jax/checkpoint/write/gbytes_per_sec: 97.599 KiB/s (total gbytes: 535.1 KiB) (time elapsed: 5 seconds) (per-host)
I0421 12:07:54.757021 132377099613952 async_checkpointer.py:90] [process=7][thread=async_save] 4 Handler Commit operations completed. Time taken: 4.560652s.
I0421 12:08:06.003826 132377099613952 async_checkpointer.py:144] [process=7][thread=async_save] Background save thread done. Time taken: 15.807448s.
I0421 12:08:06.004141 132378197759744 async_checkpointer.py:273] [process=7][thread=save_finalize] Done with waiting for background save thread=async_save.
I0421 12:08:06.004258 132378197759744 async_checkpointer.py:283] [process=7][thread=save_finalize] No errors found in background save thread=async_save.
I0421 12:08:06.004304 132378197759744 checkpoint_manager.py:2103] [process=7][thread=save_finalize][step=5] CheckpointManager Save Finalize is syncing with other hosts...
I0421 12:08:06.005948 132378197759744 checkpoint_manager.py:2112] [process=7][thread=save_finalize][step=5] CheckpointManager Save Finalize is done on all hosts.
I0421 12:08:06.006130 132528837793600 checkpoint_manager.py:2006] [process=7][thread=MainThread][step=5][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=5.
I0421 12:08:06.006251 132528837793600 train_distill.py:724] Final checkpoint saved.
I0421 12:08:06.008454 132528837793600 peft_trainer.py:474] Train step 5 training loss: 15.987230  - training perplexity: 8773359.000000
I0421 12:08:06.008838 132528837793600 checkpoint_manager.py:1983] [process=7][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0421 12:08:06.008913 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x78682c0748c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05d0a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05cc80>, _write_futures=[])
I0421 12:08:06.008961 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x78682c0748c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05d0a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05cc80>, _write_futures=[])
I0421 12:08:06.008991 132528837793600 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132528837793600 count=1 at 0x78682c0748c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x78680c05d0a0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x78680c05cc80>, _write_futures=[])
I0421 12:08:06.009044 132528837793600 train_distill.py:734] Distillation Complete.
I0421 12:08:06.273888 132385141139200 grain_pool.py:547] Shutting down multiprocessing system.
I0421 12:08:07.711265 132385141139200 grain_pool.py:542] Grain pool is exiting.
I0421 12:08:07.711373 132385141139200 grain_pool.py:547] Shutting down multiprocessing system.
I0421 12:08:07.711437 132385141139200 grain_pool.py:547] Shutting down multiprocessing system.
XPK End: Tue Apr 21 12:08:18 UTC 2026
EXIT_CODE=0