XPK Start: Thu Apr 23 09:59:38 UTC 2026
2026-04-23 09:59:55.406479: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
I0423 09:59:59.401236 132148589201216 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-23 10:00:08,442:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0423 10:00:08.442309 132148589201216 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-23 10:00:08,444:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-p5x10-slice-job-0-0.mt-07-distill-smoke-p5x10:8482
I0423 10:00:08.444617 132148589201216 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-p5x10-slice-job-0-0.mt-07-distill-smoke-p5x10:8482
I0423 10:00:09.534562 132148589201216 max_utils.py:284] Jax distributed system initialized!
I0423 10:00:16.313964 132148589201216 max_utils.py:244] Jax distributed system is already initialized.
W0423 10:00:16.444896 132148589201216 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0423 10:00:16.505388 132148589201216 max_utils.py:244] Jax distributed system is already initialized.
I0423 10:00:16.506591 132148589201216 pyconfig.py:471] Config param abort_on_inf_loss: True
I0423 10:00:16.506642 132148589201216 pyconfig.py:471] Config param abort_on_nan_loss: True
I0423 10:00:16.506667 132148589201216 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0423 10:00:16.506688 132148589201216 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0423 10:00:16.506707 132148589201216 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0423 10:00:16.506727 132148589201216 pyconfig.py:471] Config param activations_in_float32: False
I0423 10:00:16.506746 132148589201216 pyconfig.py:471] Config param adam_b1: 0.9
I0423 10:00:16.506763 132148589201216 pyconfig.py:471] Config param adam_b2: 0.95
I0423 10:00:16.506781 132148589201216 pyconfig.py:471] Config param adam_eps: 1e-08
I0423 10:00:16.506804 132148589201216 pyconfig.py:471] Config param adam_eps_root: 0.0
I0423 10:00:16.506820 132148589201216 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0423 10:00:16.506837 132148589201216 pyconfig.py:471] Config param adamw_mask: []
I0423 10:00:16.506853 132148589201216 pyconfig.py:471] Config param add_bos: True
I0423 10:00:16.506870 132148589201216 pyconfig.py:471] Config param add_eos: True
I0423 10:00:16.506886 132148589201216 pyconfig.py:471] Config param allow_split_physical_axes: False
I0423 10:00:16.506901 132148589201216 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0423 10:00:16.506917 132148589201216 pyconfig.py:471] Config param async_checkpointing: True
I0423 10:00:16.506932 132148589201216 pyconfig.py:471] Config param async_scheduling: False
I0423 10:00:16.506947 132148589201216 pyconfig.py:471] Config param attention: dot_product
I0423 10:00:16.506963 132148589201216 pyconfig.py:471] Config param attention_bias: False
I0423 10:00:16.506979 132148589201216 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0423 10:00:16.506996 132148589201216 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0423 10:00:16.507016 132148589201216 pyconfig.py:471] Config param attention_output_dim: -1
I0423 10:00:16.507032 132148589201216 pyconfig.py:471] Config param attention_sink: False
I0423 10:00:16.507048 132148589201216 pyconfig.py:471] Config param attention_type: global
I0423 10:00:16.507063 132148589201216 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0423 10:00:16.507081 132148589201216 pyconfig.py:471] Config param audio_path:
I0423 10:00:16.507109 132148589201216 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0423 10:00:16.507127 132148589201216 pyconfig.py:471] Config param autoregressive_decode_assert:
I0423 10:00:16.507142 132148589201216 pyconfig.py:471] Config param base_config: base.yml
I0423 10:00:16.507158 132148589201216 pyconfig.py:471] Config param base_emb_dim: 16
I0423 10:00:16.507173 132148589201216 pyconfig.py:471] Config param base_mlp_dim: 64
I0423 10:00:16.507189 132148589201216 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0423 10:00:16.507205 132148589201216 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0423 10:00:16.507224 132148589201216 pyconfig.py:471] Config param base_num_kv_heads: 2
I0423 10:00:16.507241 132148589201216 pyconfig.py:471] Config param base_num_query_heads: 2
I0423 10:00:16.507257 132148589201216 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0423 10:00:16.507274 132148589201216 pyconfig.py:471] Config param batch_size: 1
I0423 10:00:16.507291 132148589201216 pyconfig.py:471] Config param batch_split_factor: 1
I0423 10:00:16.507305 132148589201216 pyconfig.py:471] Config param beta_fast: 32
I0423 10:00:16.507322 132148589201216 pyconfig.py:471] Config param beta_slow: 1
I0423 10:00:16.507341 132148589201216 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0423 10:00:16.507358 132148589201216 pyconfig.py:471] Config param capacity_factor: -1.0
I0423 10:00:16.507373 132148589201216 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0423 10:00:16.507390 132148589201216 pyconfig.py:471] Config param chat_template:
I0423 10:00:16.507406 132148589201216 pyconfig.py:471] Config param chat_template_path:
I0423 10:00:16.507423 132148589201216 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0423 10:00:16.507440 132148589201216 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/checkpoints/
I0423 10:00:16.507457 132148589201216 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0423 10:00:16.507471 132148589201216 pyconfig.py:471] Config param checkpoint_period: 2000
I0423 10:00:16.507487 132148589201216 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0423 10:00:16.507502 132148589201216 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0423 10:00:16.507518 132148589201216 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0423 10:00:16.507533 132148589201216 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0423 10:00:16.507550 132148589201216 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0423 10:00:16.507565 132148589201216 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0423 10:00:16.507581 132148589201216 pyconfig.py:471] Config param chips_per_vm: 4
I0423 10:00:16.507595 132148589201216 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0423 10:00:16.507611 132148589201216 pyconfig.py:471] Config param collect_stack_trace: False
I0423 10:00:16.507625 132148589201216 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0423 10:00:16.507641 132148589201216 pyconfig.py:471] Config param colocated_python_data_input: False
I0423 10:00:16.507655 132148589201216 pyconfig.py:471] Config param compile_topology:
I0423 10:00:16.507670 132148589201216 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0423 10:00:16.507685 132148589201216 pyconfig.py:471] Config param compile_xla_flags:
I0423 10:00:16.507700 132148589201216 pyconfig.py:471] Config param compiled_trainstep_file:
I0423 10:00:16.507715 132148589201216 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0423 10:00:16.507730 132148589201216 pyconfig.py:471] Config param constant_bound_config: []
I0423 10:00:16.507745 132148589201216 pyconfig.py:471] Config param context: RematLocation.REMAT
I0423 10:00:16.507760 132148589201216 pyconfig.py:471] Config param context_parallel_load_balance: True
I0423 10:00:16.507776 132148589201216 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0423 10:00:16.507793 132148589201216 pyconfig.py:471] Config param context_parallel_size: 1
I0423 10:00:16.507807 132148589201216 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0423 10:00:16.507823 132148589201216 pyconfig.py:471] Config param context_sharding: context
I0423 10:00:16.507838 132148589201216 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0423 10:00:16.507852 132148589201216 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0423 10:00:16.507867 132148589201216 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0423 10:00:16.507883 132148589201216 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0423 10:00:16.507897 132148589201216 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0423 10:00:16.507912 132148589201216 pyconfig.py:471] Config param custom_mesh:
I0423 10:00:16.507927 132148589201216 pyconfig.py:471] Config param custom_mesh_and_rule:
I0423 10:00:16.507942 132148589201216 pyconfig.py:471] Config param d_model_for_audio: 256
I0423 10:00:16.507956 132148589201216 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0423 10:00:16.507976 132148589201216 pyconfig.py:471] Config param data_shuffle_seed: 0
I0423 10:00:16.507991 132148589201216 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0423 10:00:16.508007 132148589201216 pyconfig.py:471] Config param dataset_path:
I0423 10:00:16.508022 132148589201216 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0423 10:00:16.508038 132148589201216 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0423 10:00:16.508054 132148589201216 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0423 10:00:16.508069 132148589201216 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0423 10:00:16.508083 132148589201216 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0423 10:00:16.508113 132148589201216 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0423 10:00:16.508130 132148589201216 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0423 10:00:16.508145 132148589201216 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0423 10:00:16.508161 132148589201216 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0423 10:00:16.508178 132148589201216 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 10:00:16.508196 132148589201216 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0423 10:00:16.508213 132148589201216 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0423 10:00:16.508229 132148589201216 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0423 10:00:16.508243 132148589201216 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0423 10:00:16.508259 132148589201216 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0423 10:00:16.508274 132148589201216 pyconfig.py:471] Config param debug: {'rl': False}
I0423 10:00:16.508290 132148589201216 pyconfig.py:471] Config param debug_sharding: False
I0423 10:00:16.508305 132148589201216 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0423 10:00:16.508320 132148589201216 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0423 10:00:16.508342 132148589201216 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0423 10:00:16.508357 132148589201216 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0423 10:00:16.508373 132148589201216 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0423 10:00:16.508390 132148589201216 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0423 10:00:16.508405 132148589201216 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0423 10:00:16.508421 132148589201216 pyconfig.py:471] Config param degenerate_group_masking: True
I0423 10:00:16.508435 132148589201216 pyconfig.py:471] Config param dense_init_scale: 1.0
I0423 10:00:16.508450 132148589201216 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0423 10:00:16.508466 132148589201216 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0423 10:00:16.508482 132148589201216 pyconfig.py:471] Config param diloco_sync_period: 36
I0423 10:00:16.508497 132148589201216 pyconfig.py:471] Config param distill_alpha: 0.5
I0423 10:00:16.508513 132148589201216 pyconfig.py:471] Config param distill_alpha_end: None
I0423 10:00:16.508527 132148589201216 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0423 10:00:16.508543 132148589201216 pyconfig.py:471] Config param distill_beta: 0.0
I0423 10:00:16.508558 132148589201216 pyconfig.py:471] Config param distill_beta_end: None
I0423 10:00:16.508573 132148589201216 pyconfig.py:471] Config param distill_beta_schedule: constant
I0423 10:00:16.508587 132148589201216 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0423 10:00:16.508602 132148589201216 pyconfig.py:471] Config param distill_layer_indices: None
I0423 10:00:16.508616 132148589201216 pyconfig.py:471] Config param distill_temperature: 1.0
I0423 10:00:16.508653 132148589201216 pyconfig.py:471] Config param distill_temperature_end: None
I0423 10:00:16.508670 132148589201216 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0423 10:00:16.508685 132148589201216 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0423 10:00:16.508700 132148589201216 pyconfig.py:471] Config param dpo_beta: 0.1
I0423 10:00:16.508719 132148589201216 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0423 10:00:16.508742 132148589201216 pyconfig.py:471] Config param dq_reduction_steps: 0
I0423 10:00:16.508766 132148589201216 pyconfig.py:471] Config param dropout_rate: 0.0
I0423 10:00:16.508791 132148589201216 pyconfig.py:471] Config param dtype: bfloat16
I0423 10:00:16.508823 132148589201216 pyconfig.py:471] Config param dtype_mm: float32
I0423 10:00:16.508840 132148589201216 pyconfig.py:471] Config param dump_hlo: False
I0423 10:00:16.508856 132148589201216 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0423 10:00:16.508872 132148589201216 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/xla_dump
I0423 10:00:16.508889 132148589201216 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0423 10:00:16.508904 132148589201216 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0423 10:00:16.508919 132148589201216 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0423 10:00:16.508935 132148589201216 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0423 10:00:16.508951 132148589201216 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0423 10:00:16.508966 132148589201216 pyconfig.py:471] Config param dump_jaxpr: False
I0423 10:00:16.508981 132148589201216 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0423 10:00:16.508997 132148589201216 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/jaxpr_dump
I0423 10:00:16.509012 132148589201216 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0423 10:00:16.509027 132148589201216 pyconfig.py:471] Config param dump_step: -1
I0423 10:00:16.509043 132148589201216 pyconfig.py:471] Config param elastic_enabled: False
I0423 10:00:16.509057 132148589201216 pyconfig.py:471] Config param elastic_max_retries: 10
I0423 10:00:16.509073 132148589201216 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0423 10:00:16.509109 132148589201216 pyconfig.py:471] Config param emb_dim: 16
I0423 10:00:16.509124 132148589201216 pyconfig.py:471] Config param enable_autocheckpoint: False
I0423 10:00:16.509140 132148589201216 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0423 10:00:16.509154 132148589201216 pyconfig.py:471] Config param enable_checkpointing: True
I0423 10:00:16.509170 132148589201216 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0423 10:00:16.509185 132148589201216 pyconfig.py:471] Config param enable_data_shuffling: True
I0423 10:00:16.509200 132148589201216 pyconfig.py:471] Config param enable_diloco: False
I0423 10:00:16.509217 132148589201216 pyconfig.py:471] Config param enable_dp_attention: False
I0423 10:00:16.509232 132148589201216 pyconfig.py:471] Config param enable_dropout: False
I0423 10:00:16.509246 132148589201216 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0423 10:00:16.509262 132148589201216 pyconfig.py:471] Config param enable_expert_parallel: False
I0423 10:00:16.509277 132148589201216 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0423 10:00:16.509293 132148589201216 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0423 10:00:16.509307 132148589201216 pyconfig.py:471] Config param enable_goodput_recording: False
I0423 10:00:16.509322 132148589201216 pyconfig.py:471] Config param enable_jax_profiler: False
I0423 10:00:16.509341 132148589201216 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0423 10:00:16.509356 132148589201216 pyconfig.py:471] Config param enable_model_warmup: False
I0423 10:00:16.509371 132148589201216 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0423 10:00:16.509386 132148589201216 pyconfig.py:471] Config param enable_nnx: False
I0423 10:00:16.509401 132148589201216 pyconfig.py:471] Config param enable_orbax_v1: False
I0423 10:00:16.509417 132148589201216 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0423 10:00:16.509431 132148589201216 pyconfig.py:471] Config param enable_pathways_goodput: False
I0423 10:00:16.509447 132148589201216 pyconfig.py:471] Config param enable_prefix_caching: False
I0423 10:00:16.509461 132148589201216 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0423 10:00:16.509476 132148589201216 pyconfig.py:471] Config param enable_single_controller: False
I0423 10:00:16.509491 132148589201216 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0423 10:00:16.509506 132148589201216 pyconfig.py:471] Config param enable_tensorboard: True
I0423 10:00:16.509521 132148589201216 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0423 10:00:16.509536 132148589201216 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0423 10:00:16.509551 132148589201216 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0423 10:00:16.509566 132148589201216 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0423 10:00:16.509581 132148589201216 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0423 10:00:16.509596 132148589201216 pyconfig.py:471] Config param engram_head_dim: 1280
I0423 10:00:16.509612 132148589201216 pyconfig.py:471] Config param engram_kernel_size: 4
I0423 10:00:16.509628 132148589201216 pyconfig.py:471] Config param engram_layers: []
I0423 10:00:16.509642 132148589201216 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0423 10:00:16.509657 132148589201216 pyconfig.py:471] Config param engram_num_heads: 8
I0423 10:00:16.509673 132148589201216 pyconfig.py:471] Config param engram_seed: 0
I0423 10:00:16.509688 132148589201216 pyconfig.py:471] Config param engram_vocab_bases: []
I0423 10:00:16.509703 132148589201216 pyconfig.py:471] Config param epsilon_high: None
I0423 10:00:16.509718 132148589201216 pyconfig.py:471] Config param eval_corr_lst: False
I0423 10:00:16.509734 132148589201216 pyconfig.py:471] Config param eval_data_columns: ['text']
I0423 10:00:16.509749 132148589201216 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0423 10:00:16.509765 132148589201216 pyconfig.py:471] Config param eval_image_column: image
I0423 10:00:16.509779 132148589201216 pyconfig.py:471] Config param eval_interval: -1
I0423 10:00:16.509795 132148589201216 pyconfig.py:471] Config param eval_make_lst: False
I0423 10:00:16.509809 132148589201216 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0423 10:00:16.509825 132148589201216 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0423 10:00:16.509839 132148589201216 pyconfig.py:471] Config param eval_split: validation
I0423 10:00:16.509855 132148589201216 pyconfig.py:471] Config param eval_steps: -1
I0423 10:00:16.509871 132148589201216 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0423 10:00:16.509886 132148589201216 pyconfig.py:471] Config param final_logits_soft_cap: None
I0423 10:00:16.509903 132148589201216 pyconfig.py:471] Config param first_num_dense_layers: 0
I0423 10:00:16.509918 132148589201216 pyconfig.py:471] Config param float32_gate_logits: False
I0423 10:00:16.509933 132148589201216 pyconfig.py:471] Config param float32_logits: False
I0423 10:00:16.509949 132148589201216 pyconfig.py:471] Config param float32_qk_product: False
I0423 10:00:16.509964 132148589201216 pyconfig.py:471] Config param float32_weight_sum: True
I0423 10:00:16.509981 132148589201216 pyconfig.py:471] Config param force_q_layout: False
I0423 10:00:16.509998 132148589201216 pyconfig.py:471] Config param force_unroll: False
I0423 10:00:16.510013 132148589201216 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0423 10:00:16.510030 132148589201216 pyconfig.py:471] Config param formatting_func_path:
I0423 10:00:16.510044 132148589201216 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0423 10:00:16.510060 132148589201216 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0423 10:00:16.510075 132148589201216 pyconfig.py:471] Config param fused_mlp: False
I0423 10:00:16.510091 132148589201216 pyconfig.py:471] Config param fused_qkv: True
I0423 10:00:16.510123 132148589201216 pyconfig.py:471] Config param gcs_metrics: False
I0423 10:00:16.510139 132148589201216 pyconfig.py:471] Config param gdn_chunk_size: 64
I0423 10:00:16.510154 132148589201216 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0423 10:00:16.510171 132148589201216 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0423 10:00:16.510187 132148589201216 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0423 10:00:16.510201 132148589201216 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0423 10:00:16.510217 132148589201216 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0423 10:00:16.510232 132148589201216 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0423 10:00:16.510248 132148589201216 pyconfig.py:471] Config param generate_padding_batch_train: False
I0423 10:00:16.510262 132148589201216 pyconfig.py:471] Config param generate_slice: v5e-16
I0423 10:00:16.510278 132148589201216 pyconfig.py:471] Config param generation_configs: {}
I0423 10:00:16.510293 132148589201216 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0423 10:00:16.510308 132148589201216 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0423 10:00:16.510323 132148589201216 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0423 10:00:16.510343 132148589201216 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0423 10:00:16.510358 132148589201216 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0423 10:00:16.510374 132148589201216 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0423 10:00:16.510389 132148589201216 pyconfig.py:471] Config param global_head_dim: 0
I0423 10:00:16.510405 132148589201216 pyconfig.py:471] Config param global_num_kv_heads: 0
I0423 10:00:16.510420 132148589201216 pyconfig.py:471] Config param global_parameter_scale: 1
I0423 10:00:16.510436 132148589201216 pyconfig.py:471] Config param global_rampup_samples: 500
I0423 10:00:16.510450 132148589201216 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0423 10:00:16.510466 132148589201216 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0423 10:00:16.510482 132148589201216 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0423 10:00:16.510498 132148589201216 pyconfig.py:471] Config param grad_dtype: float32
I0423 10:00:16.510533 132148589201216 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0423 10:00:16.510550 132148589201216 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0423 10:00:16.510566 132148589201216 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0423 10:00:16.510582 132148589201216 pyconfig.py:471] Config param grain_eval_files:
I0423 10:00:16.510597 132148589201216 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0423 10:00:16.510614 132148589201216 pyconfig.py:471] Config param grain_num_threads: 16
I0423 10:00:16.510629 132148589201216 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0423 10:00:16.510645 132148589201216 pyconfig.py:471] Config param grain_packing_type: first_fit
I0423 10:00:16.510661 132148589201216 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0423 10:00:16.510676 132148589201216 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0423 10:00:16.510692 132148589201216 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0423 10:00:16.510706 132148589201216 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0423 10:00:16.510722 132148589201216 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0423 10:00:16.510738 132148589201216 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0423 10:00:16.510754 132148589201216 pyconfig.py:471] Config param grain_train_files:
I0423 10:00:16.510769 132148589201216 pyconfig.py:471] Config param grain_train_mixture_config_path:
I0423 10:00:16.510784 132148589201216 pyconfig.py:471] Config param grain_worker_count: 1
I0423 10:00:16.510799 132148589201216 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0423 10:00:16.510815 132148589201216 pyconfig.py:471] Config param grpo_beta: 0.08
I0423 10:00:16.510830 132148589201216 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0423 10:00:16.510846 132148589201216 pyconfig.py:471] Config param hardware: tpu
I0423 10:00:16.510861 132148589201216 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0423 10:00:16.510876 132148589201216 pyconfig.py:471] Config param head_dim: 8
I0423 10:00:16.510891 132148589201216 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0423 10:00:16.510907 132148589201216 pyconfig.py:471] Config param hf_data_dir: None
I0423 10:00:16.510921 132148589201216 pyconfig.py:471] Config param hf_eval_files: None
I0423 10:00:16.510937 132148589201216 pyconfig.py:471] Config param hf_eval_split: None
I0423 10:00:16.510951 132148589201216 pyconfig.py:471] Config param hf_name: None
I0423 10:00:16.510967 132148589201216 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0423 10:00:16.510983 132148589201216 pyconfig.py:471] Config param hf_train_files: None
I0423 10:00:16.510998 132148589201216 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0423 10:00:16.511013 132148589201216 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0423 10:00:16.511029 132148589201216 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0423 10:00:16.511044 132148589201216 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0423 10:00:16.511061 132148589201216 pyconfig.py:471] Config param ici_context_parallelism: 1
I0423 10:00:16.511076 132148589201216 pyconfig.py:471] Config param ici_data_parallelism: 1
I0423 10:00:16.511100 132148589201216 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0423 10:00:16.511116 132148589201216 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0423 10:00:16.511132 132148589201216 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0423 10:00:16.511147 132148589201216 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0423 10:00:16.511163 132148589201216 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0423 10:00:16.511180 132148589201216 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0423 10:00:16.511194 132148589201216 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0423 10:00:16.511209 132148589201216 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0423 10:00:16.511224 132148589201216 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0423 10:00:16.511240 132148589201216 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0423 10:00:16.511255 132148589201216 pyconfig.py:471] Config param image_path:
I0423 10:00:16.511271 132148589201216 pyconfig.py:471] Config param image_placeholder: <|image|>
I0423 10:00:16.511287 132148589201216 pyconfig.py:471] Config param image_size_for_vit: 896
I0423 10:00:16.511301 132148589201216 pyconfig.py:471] Config param indexer_head_dim: 128
I0423 10:00:16.511316 132148589201216 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0423 10:00:16.511335 132148589201216 pyconfig.py:471] Config param indexer_n_heads: 64
I0423 10:00:16.511351 132148589201216 pyconfig.py:471] Config param indexer_sparse_training: False
I0423 10:00:16.511367 132148589201216 pyconfig.py:471] Config param indexer_topk: 2048
I0423 10:00:16.511383 132148589201216 pyconfig.py:471] Config param inference_benchmark_test: False
I0423 10:00:16.511397 132148589201216 pyconfig.py:471] Config param inference_metadata_file:
I0423 10:00:16.511413 132148589201216 pyconfig.py:471] Config param inference_microbenchmark_log_file_path:
I0423 10:00:16.511429 132148589201216 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0423 10:00:16.511443 132148589201216 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0423 10:00:16.511459 132148589201216 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0423 10:00:16.511476 132148589201216 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0423 10:00:16.511490 132148589201216 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0423 10:00:16.511505 132148589201216 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0423 10:00:16.511520 132148589201216 pyconfig.py:471] Config param init_weights_seed: 0
I0423 10:00:16.511536 132148589201216 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0423 10:00:16.511552 132148589201216 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0423 10:00:16.511567 132148589201216 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0423 10:00:16.511583 132148589201216 pyconfig.py:471] Config param internal_compile: False
I0423 10:00:16.511597 132148589201216 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0423 10:00:16.511613 132148589201216 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0423 10:00:16.511628 132148589201216 pyconfig.py:471] Config param jax_debug_log_modules:
I0423 10:00:16.511643 132148589201216 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0423 10:00:16.511658 132148589201216 pyconfig.py:471] Config param jax_profiler_port: 9999
I0423 10:00:16.511674 132148589201216 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0423 10:00:16.511690 132148589201216 pyconfig.py:471] Config param kv_cache_buffer: 256
I0423 10:00:16.511705 132148589201216 pyconfig.py:471] Config param kv_lora_rank: 512
I0423 10:00:16.511721 132148589201216 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0423 10:00:16.511739 132148589201216 pyconfig.py:471] Config param kv_quant_dtype: int8
I0423 10:00:16.511755 132148589201216 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0423 10:00:16.511770 132148589201216 pyconfig.py:471] Config param learning_rate: 0.0002
I0423 10:00:16.511787 132148589201216 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0423 10:00:16.511803 132148589201216 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0423 10:00:16.511818 132148589201216 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0423 10:00:16.511834 132148589201216 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0423 10:00:16.511849 132148589201216 pyconfig.py:471] Config param load_from_prefill_dir: False
I0423 10:00:16.511864 132148589201216 pyconfig.py:471] Config param load_full_state_path:
I0423 10:00:16.511880 132148589201216 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0423 10:00:16.511895 132148589201216 pyconfig.py:471] Config param local_checkpoint_directory:
I0423 10:00:16.511911 132148589201216 pyconfig.py:471] Config param local_checkpoint_period: 0
I0423 10:00:16.511926 132148589201216 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0423 10:00:16.511942 132148589201216 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0423 10:00:16.511956 132148589201216 pyconfig.py:471] Config param log_config: True
I0423 10:00:16.511973 132148589201216 pyconfig.py:471] Config param log_period: 10
I0423 10:00:16.511987 132148589201216 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_attn_length', ('sequence', 'context')), ('activation_attn_length', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_attn_embed', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context',
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 
'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0423 10:00:16.512063 132148589201216 pyconfig.py:471] Config param logits_dot_in_fp32: False I0423 10:00:16.512081 132148589201216 pyconfig.py:471] Config param logits_via_embedding: True I0423 10:00:16.512104 132148589201216 pyconfig.py:471] Config param lora_input_adapters_path: I0423 10:00:16.512121 132148589201216 pyconfig.py:471] Config param loss_algo: grpo I0423 10:00:16.512137 132148589201216 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0423 10:00:16.512154 132148589201216 pyconfig.py:471] Config param managed_mldiagnostics: False I0423 10:00:16.512170 132148589201216 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/managed-mldiagnostics I0423 10:00:16.512186 132148589201216 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0423 10:00:16.512201 132148589201216 pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0423 
10:00:16.512219 132148589201216 pyconfig.py:471] Config param max_checkify: False I0423 10:00:16.512234 132148589201216 pyconfig.py:471] Config param max_concurrency: 256 I0423 10:00:16.512249 132148589201216 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0423 10:00:16.512264 132148589201216 pyconfig.py:471] Config param max_num_batched_tokens: None I0423 10:00:16.512279 132148589201216 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0423 10:00:16.512295 132148589201216 pyconfig.py:471] Config param max_num_images_per_example: -1 I0423 10:00:16.512309 132148589201216 pyconfig.py:471] Config param max_num_seqs: None I0423 10:00:16.512325 132148589201216 pyconfig.py:471] Config param max_position_embeddings: 163840 I0423 10:00:16.512345 132148589201216 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0423 10:00:16.512359 132148589201216 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0423 10:00:16.512374 132148589201216 pyconfig.py:471] Config param max_segments_per_seq: -1 I0423 10:00:16.512390 132148589201216 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0423 10:00:16.512404 132148589201216 pyconfig.py:471] Config param max_target_length: 2048 I0423 10:00:16.512420 132148589201216 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0423 10:00:16.512436 132148589201216 pyconfig.py:471] Config param megablox: True I0423 10:00:16.512452 132148589201216 pyconfig.py:471] Config param merge_gating_gmm: False I0423 10:00:16.512466 132148589201216 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0423 10:00:16.512484 132148589201216 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/metrics/ I0423 10:00:16.512500 132148589201216 pyconfig.py:471] Config param metrics_file: I0423 
10:00:16.512515 132148589201216 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0423 10:00:16.512529 132148589201216 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0423 10:00:16.512545 132148589201216 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0423 10:00:16.512560 132148589201216 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0423 10:00:16.512576 132148589201216 pyconfig.py:471] Config param mla_naive_kvcache: True I0423 10:00:16.512591 132148589201216 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0423 10:00:16.512607 132148589201216 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0423 10:00:16.512622 132148589201216 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0423 10:00:16.512638 132148589201216 pyconfig.py:471] Config param mlp_bias: False I0423 10:00:16.512652 132148589201216 pyconfig.py:471] Config param mlp_dim: 64 I0423 10:00:16.512668 132148589201216 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0423 10:00:16.512682 132148589201216 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0423 10:00:16.512698 132148589201216 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0423 10:00:16.512713 132148589201216 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0423 10:00:16.512729 132148589201216 pyconfig.py:471] Config param moba: False I0423 10:00:16.512743 132148589201216 pyconfig.py:471] Config param moba_chunk_size: 1024 I0423 10:00:16.512759 132148589201216 pyconfig.py:471] Config param moba_topk: 8 I0423 10:00:16.512773 132148589201216 pyconfig.py:471] Config param model_call_mode: I0423 10:00:16.512789 132148589201216 pyconfig.py:471] Config param model_name: gpt3-52k I0423 10:00:16.512804 132148589201216 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0423 10:00:16.512820 132148589201216 pyconfig.py:471] Config param moe_fsdp_use_two_stage_all_gather: False I0423 10:00:16.512834 132148589201216 pyconfig.py:471] Config 
param moe_mlp_dim: -1 I0423 10:00:16.512849 132148589201216 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0423 10:00:16.512864 132148589201216 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0423 10:00:16.512880 132148589201216 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0423 10:00:16.512895 132148589201216 pyconfig.py:471] Config param monitor_goodput: False I0423 10:00:16.512911 132148589201216 pyconfig.py:471] Config param monitor_step_time_deviation: True I0423 10:00:16.512925 132148589201216 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0423 10:00:16.512941 132148589201216 pyconfig.py:471] Config param mscale: 1.0 I0423 10:00:16.512956 132148589201216 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0423 10:00:16.512972 132148589201216 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0423 10:00:16.512986 132148589201216 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0423 10:00:16.513003 132148589201216 pyconfig.py:471] Config param mtp_num_layers: 0 I0423 10:00:16.513017 132148589201216 pyconfig.py:471] Config param mu_dtype: float32 I0423 10:00:16.513041 132148589201216 pyconfig.py:471] Config param multi_sampling: False I0423 10:00:16.513056 132148589201216 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0423 10:00:16.513072 132148589201216 pyconfig.py:471] Config param muon_beta: 0.95 I0423 10:00:16.513087 132148589201216 pyconfig.py:471] Config param muon_consistent_rms: None I0423 10:00:16.513112 132148589201216 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0423 10:00:16.513126 132148589201216 pyconfig.py:471] Config param n_routing_groups: -1 I0423 10:00:16.513142 132148589201216 pyconfig.py:471] Config param n_window_for_audio: 50 I0423 10:00:16.513156 132148589201216 pyconfig.py:471] Config param n_window_infer_for_audio: 800 I0423 10:00:16.513172 132148589201216 pyconfig.py:471] Config param nope_layer_interval: -1 
I0423 10:00:16.513187 132148589201216 pyconfig.py:471] Config param norm_topk_prob: False I0423 10:00:16.513202 132148589201216 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0423 10:00:16.513219 132148589201216 pyconfig.py:471] Config param normalize_embedding_logits: False I0423 10:00:16.513234 132148589201216 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0423 10:00:16.513249 132148589201216 pyconfig.py:471] Config param num_batches: 4 I0423 10:00:16.513265 132148589201216 pyconfig.py:471] Config param num_channels_for_vit: 3 I0423 10:00:16.513280 132148589201216 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0423 10:00:16.513295 132148589201216 pyconfig.py:471] Config param num_decoder_layers: 1 I0423 10:00:16.513309 132148589201216 pyconfig.py:471] Config param num_diloco_replicas: 1 I0423 10:00:16.513325 132148589201216 pyconfig.py:471] Config param num_epoch: 1 I0423 10:00:16.513345 132148589201216 pyconfig.py:471] Config param num_eval_passes: 1 I0423 10:00:16.513360 132148589201216 pyconfig.py:471] Config param num_experts: 1 I0423 10:00:16.513375 132148589201216 pyconfig.py:471] Config param num_experts_per_tok: 1 I0423 10:00:16.513391 132148589201216 pyconfig.py:471] Config param num_generations: 2 I0423 10:00:16.513406 132148589201216 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0423 10:00:16.513421 132148589201216 pyconfig.py:471] Config param num_iterations: 1 I0423 10:00:16.513436 132148589201216 pyconfig.py:471] Config param num_kv_heads: 2 I0423 10:00:16.513452 132148589201216 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0423 10:00:16.513467 132148589201216 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0423 10:00:16.513483 132148589201216 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0423 10:00:16.513499 132148589201216 pyconfig.py:471] Config param num_pipeline_repeats: -1 I0423 10:00:16.513515 132148589201216 pyconfig.py:471] Config 
param num_position_embeddings_for_vit: 1024 I0423 10:00:16.513531 132148589201216 pyconfig.py:471] Config param num_query_heads: 2 I0423 10:00:16.513545 132148589201216 pyconfig.py:471] Config param num_samplers_slices: -1 I0423 10:00:16.513561 132148589201216 pyconfig.py:471] Config param num_slices: 1 I0423 10:00:16.513576 132148589201216 pyconfig.py:471] Config param num_target_devices: 32 I0423 10:00:16.513590 132148589201216 pyconfig.py:471] Config param num_test_batches: 5 I0423 10:00:16.513605 132148589201216 pyconfig.py:471] Config param num_trainer_slices: -1 I0423 10:00:16.513620 132148589201216 pyconfig.py:471] Config param num_vocab_tiling: 1 I0423 10:00:16.513636 132148589201216 pyconfig.py:471] Config param off_policy_steps: 0 I0423 10:00:16.513650 132148589201216 pyconfig.py:471] Config param offline_data_dir: None I0423 10:00:16.513666 132148589201216 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0423 10:00:16.513682 132148589201216 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0423 10:00:16.513697 132148589201216 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0423 10:00:16.513713 132148589201216 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0423 10:00:16.513727 132148589201216 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0423 10:00:16.513743 132148589201216 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0423 10:00:16.513758 132148589201216 pyconfig.py:471] Config param output_dim_for_audio: 512 I0423 10:00:16.513774 132148589201216 pyconfig.py:471] Config param override_logical_axis_rules: False I0423 10:00:16.513790 132148589201216 pyconfig.py:471] Config param override_model_config: True I0423 10:00:16.513806 132148589201216 pyconfig.py:471] Config param packing: True I0423 10:00:16.513821 132148589201216 pyconfig.py:471] Config param pagedattn_head_dim_alignment: 128 I0423 10:00:16.513836 132148589201216 pyconfig.py:471] Config param 
pagedattn_max_pages_per_group: -1 I0423 10:00:16.513851 132148589201216 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0423 10:00:16.513867 132148589201216 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0423 10:00:16.513881 132148589201216 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0423 10:00:16.513897 132148589201216 pyconfig.py:471] Config param param_scan_axis: 1 I0423 10:00:16.513911 132148589201216 pyconfig.py:471] Config param parameter_memory_host_offload: False I0423 10:00:16.513927 132148589201216 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0423 10:00:16.513942 132148589201216 pyconfig.py:471] Config param patch_size_for_vit: 14 I0423 10:00:16.513958 132148589201216 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0423 10:00:16.513972 132148589201216 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0423 10:00:16.513989 132148589201216 pyconfig.py:471] Config param per_device_batch_size: 2 I0423 10:00:16.514003 132148589201216 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0423 10:00:16.514019 132148589201216 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0423 10:00:16.514035 132148589201216 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0423 10:00:16.514051 132148589201216 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0423 10:00:16.514066 132148589201216 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0423 10:00:16.514081 132148589201216 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0423 10:00:16.514111 132148589201216 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0423 10:00:16.514127 132148589201216 pyconfig.py:471] Config param posemb_type_for_vit: learn I0423 10:00:16.514141 132148589201216 pyconfig.py:471] Config param position_id_per_seconds: 25 I0423 10:00:16.514157 132148589201216 pyconfig.py:471] Config param prefill_cache_axis_order: 
1,2,0,3 I0423 10:00:16.514171 132148589201216 pyconfig.py:471] Config param prefill_cache_dir: I0423 10:00:16.514186 132148589201216 pyconfig.py:471] Config param prefill_chunk_size: 256 I0423 10:00:16.514201 132148589201216 pyconfig.py:471] Config param prefill_slice: v5e-16 I0423 10:00:16.514216 132148589201216 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0423 10:00:16.514231 132148589201216 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0423 10:00:16.514247 132148589201216 pyconfig.py:471] Config param prefuse_moe_weights: False I0423 10:00:16.514263 132148589201216 pyconfig.py:471] Config param profile_cleanly: True I0423 10:00:16.514277 132148589201216 pyconfig.py:471] Config param profile_periodically_period: -1 I0423 10:00:16.514293 132148589201216 pyconfig.py:471] Config param profile_power_events: False I0423 10:00:16.514307 132148589201216 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0423 10:00:16.514325 132148589201216 pyconfig.py:471] Config param profiler_steps: 5 I0423 10:00:16.514345 132148589201216 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0423 10:00:16.514360 132148589201216 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0423 10:00:16.514374 132148589201216 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0423 10:00:16.514390 132148589201216 pyconfig.py:471] Config param prometheus_port: 0 I0423 10:00:16.514404 132148589201216 pyconfig.py:471] Config param prompt: I love to I0423 10:00:16.514420 132148589201216 pyconfig.py:471] Config param pure_nnx: False I0423 10:00:16.514436 132148589201216 pyconfig.py:471] Config param pure_nnx_decoder: False I0423 10:00:16.514450 132148589201216 pyconfig.py:471] Config param q_lora_rank: 0 I0423 10:00:16.514466 132148589201216 pyconfig.py:471] Config param qk_clip_threshold: 100.0 I0423 10:00:16.514481 132148589201216 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0423 10:00:16.514497 
132148589201216 pyconfig.py:471] Config param qk_norm_with_scale: True I0423 10:00:16.514512 132148589201216 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0423 10:00:16.514528 132148589201216 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0423 10:00:16.514544 132148589201216 pyconfig.py:471] Config param quant_cfg_path: I0423 10:00:16.514559 132148589201216 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0423 10:00:16.514576 132148589201216 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0423 10:00:16.514591 132148589201216 pyconfig.py:471] Config param quantize_kvcache: False I0423 10:00:16.514607 132148589201216 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0423 10:00:16.514622 132148589201216 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0423 10:00:16.514639 132148589201216 pyconfig.py:471] Config param ragged_block_size: 256 I0423 10:00:16.514653 132148589201216 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0423 10:00:16.514668 132148589201216 pyconfig.py:471] Config param rampup_end_step: 0 I0423 10:00:16.514683 132148589201216 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0423 10:00:16.514699 132148589201216 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0423 10:00:16.514715 132148589201216 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0423 10:00:16.514729 132148589201216 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0423 10:00:16.514745 132148589201216 pyconfig.py:471] Config param remat_policy: full I0423 10:00:16.514760 132148589201216 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0423 10:00:16.514775 132148589201216 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0423 10:00:16.514789 132148589201216 pyconfig.py:471] Config param replicate_quant_scale: False I0423 10:00:16.514805 132148589201216 pyconfig.py:471] Config param 
replicator_backup_interval_minutes: 0 I0423 10:00:16.514819 132148589201216 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0423 10:00:16.514835 132148589201216 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0423 10:00:16.514849 132148589201216 pyconfig.py:471] Config param reshape_q: False I0423 10:00:16.514865 132148589201216 pyconfig.py:471] Config param return_log_prob: False I0423 10:00:16.514879 132148589201216 pyconfig.py:471] Config param reuse_example_batch: 0 I0423 10:00:16.514895 132148589201216 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0423 10:00:16.514910 132148589201216 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0423 10:00:16.514927 132148589201216 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0423 10:00:16.514941 132148589201216 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0423 10:00:16.514957 132148589201216 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0423 10:00:16.514972 132148589201216 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0423 10:00:16.514988 132148589201216 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0423 10:00:16.515009 132148589201216 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0423 10:00:16.515023 132148589201216 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0423 10:00:16.515040 132148589201216 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0423 10:00:16.515056 132148589201216 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0423 10:00:16.515070 132148589201216 pyconfig.py:471] Config param rope_attention_scaling: False I0423 10:00:16.515086 
132148589201216 pyconfig.py:471] Config param rope_factor: 40 I0423 10:00:16.515114 132148589201216 pyconfig.py:471] Config param rope_interleave: True I0423 10:00:16.515131 132148589201216 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0423 10:00:16.515145 132148589201216 pyconfig.py:471] Config param rope_max_timescale: 10000 I0423 10:00:16.515161 132148589201216 pyconfig.py:471] Config param rope_min_timescale: 1 I0423 10:00:16.515175 132148589201216 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0423 10:00:16.515191 132148589201216 pyconfig.py:471] Config param rope_truncate: True I0423 10:00:16.515206 132148589201216 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0423 10:00:16.515224 132148589201216 pyconfig.py:471] Config param rope_use_scale: True I0423 10:00:16.515240 132148589201216 pyconfig.py:471] Config param routed_bias: False I0423 10:00:16.515256 132148589201216 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0423 10:00:16.515272 132148589201216 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0423 10:00:16.515286 132148589201216 pyconfig.py:471] Config param routed_score_func: I0423 10:00:16.515302 132148589201216 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-23-10-00 I0423 10:00:16.515318 132148589201216 pyconfig.py:471] Config param sa_block_kv: 512 I0423 10:00:16.515338 132148589201216 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0423 10:00:16.515353 132148589201216 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0423 10:00:16.515369 132148589201216 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0423 10:00:16.515384 132148589201216 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0423 10:00:16.515399 132148589201216 pyconfig.py:471] Config param sa_block_q: 512 I0423 10:00:16.515413 132148589201216 pyconfig.py:471] Config param sa_block_q_dkv: 512 I0423 10:00:16.515429 132148589201216 pyconfig.py:471] Config param sa_block_q_dq: 512 I0423 
10:00:16.515444 132148589201216 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0423 10:00:16.515459 132148589201216 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0423 10:00:16.515475 132148589201216 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0423 10:00:16.515490 132148589201216 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0423 10:00:16.515506 132148589201216 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0423 10:00:16.515522 132148589201216 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0423 10:00:16.515536 132148589201216 pyconfig.py:471] Config param save_config_to_gcs: False I0423 10:00:16.515552 132148589201216 pyconfig.py:471] Config param save_quantized_params_path: I0423 10:00:16.515567 132148589201216 pyconfig.py:471] Config param scale_embedding_for_audio: True I0423 10:00:16.515581 132148589201216 pyconfig.py:471] Config param scan_layers: True I0423 10:00:16.515597 132148589201216 pyconfig.py:471] Config param scan_layers_per_stage: False I0423 10:00:16.515612 132148589201216 pyconfig.py:471] Config param scan_pipeline_iterations: True I0423 10:00:16.515628 132148589201216 pyconfig.py:471] Config param scan_pipeline_repeats: False I0423 10:00:16.515643 132148589201216 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0423 10:00:16.515659 132148589201216 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0423 10:00:16.515674 132148589201216 pyconfig.py:471] Config param sft_train_on_completion_only: False I0423 10:00:16.515688 132148589201216 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0423 10:00:16.515704 132148589201216 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0423 10:00:16.515721 132148589201216 pyconfig.py:471] Config param shard_optimizer_over_data: False I0423 10:00:16.515736 132148589201216 pyconfig.py:471] Config param sharding_strategy: None I0423 10:00:16.515751 
132148589201216 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0423 10:00:16.515768 132148589201216 pyconfig.py:471] Config param shardy: True I0423 10:00:16.515782 132148589201216 pyconfig.py:471] Config param share_kv_projections: False I0423 10:00:16.515800 132148589201216 pyconfig.py:471] Config param shared_experts: 0 I0423 10:00:16.515814 132148589201216 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0423 10:00:16.515828 132148589201216 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0423 10:00:16.515843 132148589201216 pyconfig.py:471] Config param skip_jax_distributed_system: False I0423 10:00:16.515858 132148589201216 pyconfig.py:471] Config param skip_step_interval: 128 I0423 10:00:16.515874 132148589201216 pyconfig.py:471] Config param skip_step_on_spikes: False I0423 10:00:16.515888 132148589201216 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0423 10:00:16.515904 132148589201216 pyconfig.py:471] Config param sliding_window_size: 0 I0423 10:00:16.515918 132148589201216 pyconfig.py:471] Config param solution_end_token: </answer> I0423 10:00:16.515933 132148589201216 pyconfig.py:471] Config param solution_start_token: <answer> I0423 10:00:16.515949 132148589201216 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0423 10:00:16.515963 132148589201216 pyconfig.py:471] Config param sparse_matmul: True I0423 10:00:16.515979 132148589201216 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0423 10:00:16.515995 132148589201216 pyconfig.py:471] Config param stack_prefill_result_cache: False I0423 10:00:16.516010 132148589201216 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0423 10:00:16.516025 132148589201216 pyconfig.py:471] Config param stack_trace_to_cloud: False I0423 10:00:16.516040 132148589201216 pyconfig.py:471] Config param step_deviation_interval_seconds: 30 I0423 10:00:16.516055 132148589201216 pyconfig.py:471] Config param steps: 200000 I0423 10:00:16.516071 
132148589201216 pyconfig.py:471] Config param stop_strings: None I0423 10:00:16.516087 132148589201216 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0423 10:00:16.516113 132148589201216 pyconfig.py:471] Config param student_params_to_update: None I0423 10:00:16.516129 132148589201216 pyconfig.py:471] Config param subslice_shape: I0423 10:00:16.516143 132148589201216 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0423 10:00:16.516158 132148589201216 pyconfig.py:471] Config param system_prompt: I0423 10:00:16.516173 132148589201216 pyconfig.py:471] Config param target_eval_loss: 0.0 I0423 10:00:16.516189 132148589201216 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0423 10:00:16.516205 132148589201216 pyconfig.py:471] Config param temperature_tuning: False I0423 10:00:16.516220 132148589201216 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0423 10:00:16.516237 132148589201216 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-23-10-00/tensorboard/ I0423 10:00:16.516251 132148589201216 pyconfig.py:471] Config param tensors_on_device: None I0423 10:00:16.516267 132148589201216 pyconfig.py:471] Config param tensors_to_offload: None I0423 10:00:16.516282 132148589201216 pyconfig.py:471] Config param test_batch_start_index: 0 I0423 10:00:16.516297 132148589201216 pyconfig.py:471] Config param tile_size_for_vit: 336 I0423 10:00:16.516313 132148589201216 pyconfig.py:471] Config param tokenize_eval_data: True I0423 10:00:16.516331 132148589201216 pyconfig.py:471] Config param tokenize_train_data: True I0423 10:00:16.516347 132148589201216 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0423 10:00:16.516362 132148589201216 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0423 10:00:16.516380 132148589201216 pyconfig.py:471] Config param topk_routing_group: -1 I0423 10:00:16.516396 132148589201216 
pyconfig.py:471] Config param train_data_columns: ['text'] I0423 10:00:16.516413 132148589201216 pyconfig.py:471] Config param train_fraction: 1.0 I0423 10:00:16.516429 132148589201216 pyconfig.py:471] Config param train_image_column: image I0423 10:00:16.516444 132148589201216 pyconfig.py:471] Config param train_micro_batch_size: -1 I0423 10:00:16.516459 132148589201216 pyconfig.py:471] Config param train_split: train I0423 10:00:16.516474 132148589201216 pyconfig.py:471] Config param trainable_parameters_mask: [] I0423 10:00:16.516490 132148589201216 pyconfig.py:471] Config param trainable_position_size: 2048 I0423 10:00:16.516506 132148589201216 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0423 10:00:16.516521 132148589201216 pyconfig.py:471] Config param upload_all_profiler_results: False I0423 10:00:16.516536 132148589201216 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0423 10:00:16.516552 132148589201216 pyconfig.py:471] Config param use_agentic_rollout: False I0423 10:00:16.516566 132148589201216 pyconfig.py:471] Config param use_audio: False I0423 10:00:16.516581 132148589201216 pyconfig.py:471] Config param use_audio_in_video: False I0423 10:00:16.516596 132148589201216 pyconfig.py:471] Config param use_batch_split_schedule: False I0423 10:00:16.516611 132148589201216 pyconfig.py:471] Config param use_chat_template: False I0423 10:00:16.516626 132148589201216 pyconfig.py:471] Config param use_chunked_prefill: False I0423 10:00:16.516641 132148589201216 pyconfig.py:471] Config param use_custom_sort_vjp: True I0423 10:00:16.516655 132148589201216 pyconfig.py:471] Config param use_dpo: False I0423 10:00:16.516671 132148589201216 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0423 10:00:16.516685 132148589201216 pyconfig.py:471] Config param use_grpo: True I0423 10:00:16.516701 132148589201216 pyconfig.py:471] Config param use_indexer: False I0423 10:00:16.516717 132148589201216 pyconfig.py:471] Config param 
use_iota_embed: True I0423 10:00:16.516732 132148589201216 pyconfig.py:471] Config param use_jax_splash: False I0423 10:00:16.516746 132148589201216 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0423 10:00:16.516762 132148589201216 pyconfig.py:471] Config param use_mrope: False I0423 10:00:16.516776 132148589201216 pyconfig.py:471] Config param use_multimodal: False I0423 10:00:16.516791 132148589201216 pyconfig.py:471] Config param use_pathways: True I0423 10:00:16.516805 132148589201216 pyconfig.py:471] Config param use_post_attn_norm: False I0423 10:00:16.516820 132148589201216 pyconfig.py:471] Config param use_post_ffw_norm: False I0423 10:00:16.516834 132148589201216 pyconfig.py:471] Config param use_qk_clip: False I0423 10:00:16.516850 132148589201216 pyconfig.py:471] Config param use_qk_norm: False I0423 10:00:16.516867 132148589201216 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0423 10:00:16.516881 132148589201216 pyconfig.py:471] Config param use_qwix_quantization: False I0423 10:00:16.516896 132148589201216 pyconfig.py:471] Config param use_ragged_attention: False I0423 10:00:16.516912 132148589201216 pyconfig.py:471] Config param use_random_routing: False I0423 10:00:16.516927 132148589201216 pyconfig.py:471] Config param use_replicator_service: False I0423 10:00:16.516942 132148589201216 pyconfig.py:471] Config param use_ring_of_experts: False I0423 10:00:16.516957 132148589201216 pyconfig.py:471] Config param use_sft: False I0423 10:00:16.516973 132148589201216 pyconfig.py:471] Config param use_splash_scheduler: False I0423 10:00:16.516988 132148589201216 pyconfig.py:471] Config param use_tokamax_gmm: False I0423 10:00:16.517003 132148589201216 pyconfig.py:471] Config param use_tokamax_splash: False I0423 10:00:16.517018 132148589201216 pyconfig.py:471] Config param use_truncation: True I0423 10:00:16.517034 132148589201216 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0423 10:00:16.517048 132148589201216 
pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0423 10:00:16.517066 132148589201216 pyconfig.py:471] Config param use_vertex_tensorboard: False I0423 10:00:16.517082 132148589201216 pyconfig.py:471] Config param using_pipeline_parallelism: False I0423 10:00:16.517110 132148589201216 pyconfig.py:471] Config param v_head_dim: 128 I0423 10:00:16.517125 132148589201216 pyconfig.py:471] Config param v_norm_with_scale: True I0423 10:00:16.517141 132148589201216 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0423 10:00:16.517155 132148589201216 pyconfig.py:471] Config param vertex_tensorboard_project: I0423 10:00:16.517171 132148589201216 pyconfig.py:471] Config param vertex_tensorboard_region: I0423 10:00:16.517185 132148589201216 pyconfig.py:471] Config param video_path: I0423 10:00:16.517200 132148589201216 pyconfig.py:471] Config param video_placeholder: <|video|> I0423 10:00:16.517215 132148589201216 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0423 10:00:16.517230 132148589201216 pyconfig.py:471] Config param vision_output_length: -1 I0423 10:00:16.517245 132148589201216 pyconfig.py:471] Config param vllm_additional_config: {} I0423 10:00:16.517261 132148589201216 pyconfig.py:471] Config param vllm_hf_config_path: I0423 10:00:16.517276 132148589201216 pyconfig.py:471] Config param vllm_hf_overrides: {} I0423 10:00:16.517292 132148589201216 pyconfig.py:471] Config param vocab_size: 32000 I0423 10:00:16.517306 132148589201216 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0423 10:00:16.517323 132148589201216 pyconfig.py:471] Config param weight_dtype: float32 I0423 10:00:16.517350 132148589201216 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0423 10:00:16.517365 132148589201216 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0423 10:00:16.517381 132148589201216 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0423 10:00:16.517398 
132148589201216 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0423 10:00:16.517413 132148589201216 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0423 10:00:16.517427 132148589201216 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0423 10:00:16.517443 132148589201216 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0423 10:00:16.517457 132148589201216 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0423 10:00:16.517473 132148589201216 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0423 10:00:16.517487 132148589201216 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0423 10:00:16.517503 132148589201216 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0423 10:00:16.517521 132148589201216 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0423 10:00:16.517535 132148589201216 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0423 10:00:16.517551 132148589201216 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0423 10:00:16.517565 132148589201216 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0423 10:00:16.517580 132148589201216 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0423 10:00:16.517595 132148589201216 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0423 10:00:16.517609 132148589201216 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0423 10:00:16.517625 132148589201216 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0423 10:00:16.517639 132148589201216 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0423 10:00:16.517655 132148589201216 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0423 10:00:16.517671 132148589201216 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0423 10:00:16.517687 132148589201216 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False I0423 10:00:16.517701 132148589201216 pyconfig.py:471] Config 
param xprof_e2e_enable_fw_throttle_event: False I0423 10:00:16.517716 132148589201216 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0423 10:00:16.517733 132148589201216 pyconfig.py:471] Config param z_loss_multiplier: 0.0 I0423 10:00:16.518047 132148589201216 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0423 10:00:16.518081 132148589201216 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0423 10:00:20.126775 132148589201216 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0423 10:00:20.129797 132148589201216 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0423 10:00:20.129917 132148589201216 train_distill.py:580] Applying logical axis rules for model initialization and training... I0423 10:00:20.129988 132148589201216 train_distill.py:584] Loading Student from ... I0423 10:00:20.130016 132148589201216 train_distill.py:168] --- Student Configuration --- I0423 10:00:20.130038 132148589201216 train_distill.py:169] Model Name: gpt3-52k I0423 10:00:20.130059 132148589201216 train_distill.py:170] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 10:00:20.130076 132148589201216 train_distill.py:173] Attention Heads: 2 Query, 2 KV I0423 10:00:20.130106 132148589201216 train_distill.py:174] Vocab Size: 32000 I0423 10:00:20.130125 132148589201216 train_distill.py:175] Checkpoint: I0423 10:00:20.130146 132148589201216 train_distill.py:449] Initializing model: gpt3-52k... I0423 10:00:21.403322 132148589201216 train_distill.py:598] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... 
I0423 10:00:21.403429 132148589201216 train_distill.py:168] --- Teacher Configuration --- I0423 10:00:21.403458 132148589201216 train_distill.py:169] Model Name: gpt3-52k I0423 10:00:21.403484 132148589201216 train_distill.py:170] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0423 10:00:21.403505 132148589201216 train_distill.py:173] Attention Heads: 2 Query, 2 KV I0423 10:00:21.403525 132148589201216 train_distill.py:174] Vocab Size: 32000 I0423 10:00:21.403544 132148589201216 train_distill.py:175] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0423 10:00:21.403563 132148589201216 train_distill.py:449] Initializing model: gpt3-52k... I0423 10:00:22.522470 132148589201216 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 10:00:22.522903 132148589201216 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x782f861d1250>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 10:00:22.522964 132148589201216 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0423 10:00:23.080690 132148589201216 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0423 10:00:23.668250 2138 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0423 10:00:24.811826 132148589201216 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. 
W0423 10:00:27.439194 132148589201216 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. I0423 10:00:27.439568 132148589201216 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0423 10:00:28.832030 132148589201216 checkpointer.py:318] Finished restoring checkpoint in 4.40 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0423 10:00:29.524989 132148589201216 train_distill.py:624] Initializing Data Iterators via MaxText pipeline... I0423 10:00:29.589671 132148589201216 config.py:112] TensorFlow version 2.20.0 available. I0423 10:00:29.590172 132148589201216 config.py:125] JAX version 0.8.3 available. E0423 10:00:31.623401 132148589201216 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0423 10:00:31.623623 132148589201216 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. 
I0423 10:00:31.626663 132148589201216 train_distill.py:394] Input Pipeline Checkpointing: DISABLED I0423 10:00:31.626723 132148589201216 train_distill.py:398] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0423 10:00:31.626784 132148589201216 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 10:00:31.626859 132148589201216 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x782f861d1250>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 10:00:31.626900 132148589201216 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 10:00:31.626931 132148589201216 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x782f861d1250>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 10:00:31.626974 132148589201216 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7ff050>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fefc0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fef30>}, handler_registry=None I0423 
10:00:31.627180 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7ff050>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 10:00:31.627223 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fefc0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 10:00:31.627259 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fef30>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 10:00:31.627298 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7feba0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. 
I0423 10:00:31.627326 132148589201216 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7ff050>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7ff050>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fefc0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fefc0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fef30>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fef30>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7feba0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7feba0>}). 
I0423 10:00:31.627732 132148589201216 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7818ac6da2a0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 10:00:34.500703 132148589201216 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints I0423 10:00:34.930415 132148589201216 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7818ac7fef00> I0423 10:00:34.930596 132148589201216 
pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 10:00:34.930664 132148589201216 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x782f861d1250>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 10:00:34.930701 132148589201216 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0423 10:00:34.930733 132148589201216 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x782f861d1250>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0423 10:00:34.930768 132148589201216 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0423 10:00:34.930821 132148589201216 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132148589201216 count=1 at 0x7818adf9a4c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7818ac7fecf0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7818ac7fecc0>, _write_futures=[]) I0423 10:00:34.931197 132148589201216 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132148589201216 count=1 at 0x7818adf9a4c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7818ac7fecf0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7818ac7fecc0>, _write_futures=[]) I0423 10:00:34.931223 132148589201216 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132148589201216 count=1 at 0x7818adf9a4c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7818ac7fecf0>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7818ac7fecc0>, _write_futures=[]) I0423 10:00:34.931257 132148589201216 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7feed0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fe4e0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fdf40>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7818ac7fdfd0>}, handler_registry=None I0423 10:00:34.931358 132148589201216 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7feed0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 10:00:34.931391 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fe4e0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0423 10:00:34.931414 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fdf40>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 10:00:34.931441 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7818ac7fdfd0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0423 10:00:34.931463 132148589201216 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fc7a0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0423 10:00:34.931492 132148589201216 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7feed0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7feed0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fe4e0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7818ac7fe4e0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fdf40>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fdf40>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7818ac7fdfd0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7818ac7fdfd0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fc7a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7818ac7fc7a0>}). I0423 10:00:34.931562 132148589201216 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7818ac6da3e0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0423 10:00:35.310302 132148589201216 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints I0423 10:00:35.746438 132148589201216 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, 
preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260423_093815_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7818adef4230> I0423 10:00:35.747038 132148589201216 train_distill.py:675] Starting Distillation Training... I0423 10:00:35.747159 132148589201216 peft_trainer.py:590] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0423 10:00:36.563148 132148589201216 peft_trainer.py:600] Compiled train_step cache size: 0 Training: 0%| | 0/5 [00:00<?, ?step/s]I0423 10:00:36.564948 132004276336384 grain_pool.py:367] Grain pool will use 1 processes. I0423 10:00:36.591526 132004276336384 grain_pool.py:440] Grain pool will start child processes. I0423 10:00:36.596865 132004276336384 grain_pool.py:448] Grain pool started all child processes. 
2026-04-23 10:00:42.671416: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} `rope_scaling`'s factor field must be a float >= 1, got 40 `rope_scaling`'s beta_fast field must be a float, got 32 `rope_scaling`'s beta_slow field must be a float, got 1 Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'} I0423 10:00:46.900781 132148589201216 utils.py:86] Train loop finished in: 10.3371 seconds Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 749, in <module> app.run(main) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run _run_main(main, args) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main sys.exit(main(argv)) ^^^^^^^^^^ File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 745, in main train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir) File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 677, in train_distill trainer.train(train_iter, eval_iter) File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 659, in train train_example = sharding_utils.shard_input( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input return jax.tree.map( ^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 155, in map return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in tree_map return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 362, in <genexpr> return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda> lambda x: jax.make_array_from_process_local_data( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 986, in make_array_from_process_local_data out = [_array_from_process_local_data(data, s, shape) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1048, in _array_from_process_local_data return make_array_from_callback(global_shape, sharding, cb) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 845, in make_array_from_callback per_device_values = api.device_put(per_device_values, devices) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2729, in device_put out_flat = dispatch._batched_device_put_impl( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 558, in _batched_device_put_impl y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 545, in _device_put_impl return _device_put_sharding_impl(x, aval, device, copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 487, in _device_put_sharding_impl raise ValueError( 
ValueError: device_put's first argument must be a fully addressable array, but got value with devices {TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=20, 
process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0)} I0423 10:00:47.245630 132004276336384 grain_pool.py:542] Grain pool is exiting. I0423 10:00:47.245732 132004276336384 grain_pool.py:547] Shutting down multiprocessing system. I0423 10:00:48.697272 132004276336384 grain_pool.py:547] Shutting down multiprocessing system. Training: 0%| | 0/5 [00:13<?, ?step/s] /usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' XPK End: Thu Apr 23 10:00:58 UTC 2026 EXIT_CODE=1