feat/nnx-post-train-fixes
XPK Start: Sat Apr 25 12:21:57 UTC 2026
2026-04-25 12:22:15.767721: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
I0425 12:22:22.153492 135043092322112 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 12:22:31.194328 135043092322112 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 12:22:31.196619 135043092322112 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-go4wt-slice-job-0-0.mt-07-distill-smoke-go4wt:8482
I0425 12:22:32.456594 135043092322112 max_utils.py:284] Jax distributed system initialized!
I0425 12:22:38.695442 135043092322112 max_utils.py:244] Jax distributed system is already initialized.
W0425 12:22:38.826238 135043092322112 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 12:22:38.887249 135043092322112 max_utils.py:244] Jax distributed system is already initialized.
I0425 12:22:38.888467 135043092322112 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 12:22:38.888523 135043092322112 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 12:22:38.888554 135043092322112 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 12:22:38.888574 135043092322112 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 12:22:38.888593 135043092322112 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 12:22:38.888613 135043092322112 pyconfig.py:471] Config param activations_in_float32: False
I0425 12:22:38.888633 135043092322112 pyconfig.py:471] Config param adam_b1: 0.9
I0425 12:22:38.888654 135043092322112 pyconfig.py:471] Config param adam_b2: 0.95
I0425 12:22:38.888672 135043092322112 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 12:22:38.888700 135043092322112 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 12:22:38.888718 135043092322112 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 12:22:38.888736 135043092322112 pyconfig.py:471] Config param adamw_mask: []
I0425 12:22:38.888752 135043092322112 pyconfig.py:471] Config param add_bos: True
I0425 12:22:38.888769 135043092322112 pyconfig.py:471] Config param add_eos: True
I0425 12:22:38.888786 135043092322112 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 12:22:38.888802 135043092322112 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 12:22:38.888819 135043092322112 pyconfig.py:471] Config param async_checkpointing: True
I0425 12:22:38.888836 135043092322112 pyconfig.py:471] Config param async_scheduling: False
I0425 12:22:38.888853 135043092322112 pyconfig.py:471] Config param attention: dot_product
I0425 12:22:38.888869 135043092322112 pyconfig.py:471] Config param attention_bias: False
I0425 12:22:38.888887 135043092322112 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 12:22:38.888903 135043092322112 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 12:22:38.888924 135043092322112 pyconfig.py:471] Config param attention_output_dim: -1
I0425 12:22:38.888940 135043092322112 pyconfig.py:471] Config param attention_sink: False
I0425 12:22:38.888957 135043092322112 pyconfig.py:471] Config param attention_type: global
I0425 12:22:38.888973 135043092322112 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 12:22:38.888989 135043092322112 pyconfig.py:471] Config param audio_path:
I0425 12:22:38.889005 135043092322112 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 12:22:38.889021 135043092322112 pyconfig.py:471] Config param autoregressive_decode_assert:
I0425 12:22:38.889036 135043092322112 pyconfig.py:471] Config param base_config: base.yml
I0425 12:22:38.889053 135043092322112 pyconfig.py:471] Config param base_emb_dim: 16
I0425 12:22:38.889068 135043092322112 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 12:22:38.889084 135043092322112 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 12:22:38.889116 135043092322112 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 12:22:38.889131 135043092322112 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 12:22:38.889147 135043092322112 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 12:22:38.889163 135043092322112 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 12:22:38.889179 135043092322112 pyconfig.py:471] Config param batch_size: 1
I0425 12:22:38.889194 135043092322112 pyconfig.py:471] Config param batch_split_factor: 1
I0425 12:22:38.889211 135043092322112 pyconfig.py:471] Config param beta_fast: 32
I0425 12:22:38.889227 135043092322112 pyconfig.py:471] Config param beta_slow: 1
I0425 12:22:38.889244 135043092322112 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 12:22:38.889260 135043092322112 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 12:22:38.889277 135043092322112 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 12:22:38.889292 135043092322112 pyconfig.py:471] Config param chat_template:
I0425 12:22:38.889313 135043092322112 pyconfig.py:471] Config param chat_template_path:
I0425 12:22:38.889329 135043092322112 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 12:22:38.889347 135043092322112 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/checkpoints/
I0425 12:22:38.889363 135043092322112 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 12:22:38.889379 135043092322112 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 12:22:38.889393 135043092322112 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 12:22:38.889410 135043092322112 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 12:22:38.889425 135043092322112 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 12:22:38.889441 135043092322112 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 12:22:38.889457 135043092322112 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 12:22:38.889472 135043092322112 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 12:22:38.889488 135043092322112 pyconfig.py:471] Config param chips_per_vm: 4
I0425 12:22:38.889503 135043092322112 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 12:22:38.889519 135043092322112 pyconfig.py:471] Config param collect_stack_trace: False
I0425 12:22:38.889533 135043092322112 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 12:22:38.889549 135043092322112 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 12:22:38.889564 135043092322112 pyconfig.py:471] Config param compile_topology:
I0425 12:22:38.889579 135043092322112 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 12:22:38.889594 135043092322112 pyconfig.py:471] Config param compile_xla_flags:
I0425 12:22:38.889610 135043092322112 pyconfig.py:471] Config param compiled_trainstep_file:
I0425 12:22:38.889624 135043092322112 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 12:22:38.889640 135043092322112 pyconfig.py:471] Config param constant_bound_config: []
I0425 12:22:38.889655 135043092322112 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 12:22:38.889672 135043092322112 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 12:22:38.889688 135043092322112 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 12:22:38.889706 135043092322112 pyconfig.py:471] Config param context_parallel_size: 1
I0425 12:22:38.889722 135043092322112 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 12:22:38.889736 135043092322112 pyconfig.py:471] Config param context_sharding: context
I0425 12:22:38.889752 135043092322112 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 12:22:38.889767 135043092322112 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 12:22:38.889782 135043092322112 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 12:22:38.889798 135043092322112 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 12:22:38.889814 135043092322112 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 12:22:38.889829 135043092322112 pyconfig.py:471] Config param custom_mesh:
I0425 12:22:38.889844 135043092322112 pyconfig.py:471] Config param custom_mesh_and_rule:
I0425 12:22:38.889860 135043092322112 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 12:22:38.889875 135043092322112 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 12:22:38.889894 135043092322112 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 12:22:38.889909 135043092322112 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 12:22:38.889925 135043092322112 pyconfig.py:471] Config param dataset_path:
I0425 12:22:38.889940 135043092322112 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 12:22:38.889955 135043092322112 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 12:22:38.889973 135043092322112 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 12:22:38.889988 135043092322112 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 12:22:38.890004 135043092322112 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 12:22:38.890020 135043092322112 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 12:22:38.890035 135043092322112 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 12:22:38.890051 135043092322112 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 12:22:38.890065 135043092322112 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 12:22:38.890081 135043092322112 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 12:22:38.890122 135043092322112 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 12:22:38.890140 135043092322112 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 12:22:38.890155 135043092322112 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 12:22:38.890169 135043092322112 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 12:22:38.890185 135043092322112 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 12:22:38.890202 135043092322112 pyconfig.py:471] Config param debug: {'rl': False}
I0425 12:22:38.890217 135043092322112 pyconfig.py:471] Config param debug_sharding: False
I0425 12:22:38.890233 135043092322112 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 12:22:38.890248 135043092322112 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 12:22:38.890266 135043092322112 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 12:22:38.890283 135043092322112 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 12:22:38.890303 135043092322112 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 12:22:38.890321 135043092322112 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 12:22:38.890338 135043092322112 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 12:22:38.890355 135043092322112 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 12:22:38.890371 135043092322112 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 12:22:38.890387 135043092322112 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 12:22:38.890403 135043092322112 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 12:22:38.890418 135043092322112 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 12:22:38.890435 135043092322112 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 12:22:38.890452 135043092322112 pyconfig.py:471] Config param distill_alpha_end: None
I0425 12:22:38.890467 135043092322112 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 12:22:38.890482 135043092322112 pyconfig.py:471] Config param distill_beta: 0.0
I0425 12:22:38.890498 135043092322112 pyconfig.py:471] Config param distill_beta_end: None
I0425 12:22:38.890514 135043092322112 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 12:22:38.890530 135043092322112 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 12:22:38.890544 135043092322112 pyconfig.py:471] Config param distill_layer_indices: None
I0425 12:22:38.890559 135043092322112 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 12:22:38.890576 135043092322112 pyconfig.py:471] Config param distill_temperature_end: None
I0425 12:22:38.890590 135043092322112 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 12:22:38.890607 135043092322112 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 12:22:38.890622 135043092322112 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 12:22:38.890638 135043092322112 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 12:22:38.890652 135043092322112 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 12:22:38.890669 135043092322112 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 12:22:38.890683 135043092322112 pyconfig.py:471] Config param dtype: bfloat16
I0425 12:22:38.890713 135043092322112 pyconfig.py:471] Config param dtype_mm: float32
I0425 12:22:38.890729 135043092322112 pyconfig.py:471] Config param dump_hlo: False
I0425 12:22:38.890745 135043092322112 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 12:22:38.890761 135043092322112 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/xla_dump
I0425 12:22:38.890776 135043092322112 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 12:22:38.890792 135043092322112 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 12:22:38.890807 135043092322112 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 12:22:38.890822 135043092322112 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 12:22:38.890838 135043092322112 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0425 12:22:38.890853 135043092322112 pyconfig.py:471] Config param dump_jaxpr: False
I0425 12:22:38.890868 135043092322112 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 12:22:38.890884 135043092322112 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/jaxpr_dump
I0425 12:22:38.890899 135043092322112 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 12:22:38.890914 135043092322112 pyconfig.py:471] Config param dump_step: -1
I0425 12:22:38.890929 135043092322112 pyconfig.py:471] Config param elastic_enabled: False
I0425 12:22:38.890944 135043092322112 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 12:22:38.890961 135043092322112 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 12:22:38.890976 135043092322112 pyconfig.py:471] Config param emb_dim: 16
I0425 12:22:38.890992 135043092322112 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 12:22:38.891006 135043092322112 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 12:22:38.891022 135043092322112 pyconfig.py:471] Config param enable_checkpointing: True
I0425 12:22:38.891036 135043092322112 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 12:22:38.891052 135043092322112 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 12:22:38.891067 135043092322112 pyconfig.py:471] Config param enable_diloco: False
I0425 12:22:38.891083 135043092322112 pyconfig.py:471] Config param enable_dp_attention: False
I0425 12:22:38.891110 135043092322112 pyconfig.py:471] Config param enable_dropout: False
I0425 12:22:38.891127 135043092322112 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 12:22:38.891141 135043092322112 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 12:22:38.891157 135043092322112 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 12:22:38.891175 135043092322112 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 12:22:38.891191 135043092322112 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 12:22:38.891207 135043092322112 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 12:22:38.891222 135043092322112 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 12:22:38.891238 135043092322112 pyconfig.py:471] Config param enable_model_warmup: False
I0425 12:22:38.891253 135043092322112 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 12:22:38.891268 135043092322112 pyconfig.py:471] Config param enable_nnx: False
I0425 12:22:38.891286 135043092322112 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 12:22:38.891304 135043092322112 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 12:22:38.891321 135043092322112 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 12:22:38.891336 135043092322112 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 12:22:38.891352 135043092322112 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 12:22:38.891366 135043092322112 pyconfig.py:471] Config param enable_single_controller: False
I0425 12:22:38.891382 135043092322112 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 12:22:38.891398 135043092322112 pyconfig.py:471] Config param enable_tensorboard: True
I0425 12:22:38.891413 135043092322112 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 12:22:38.891429 135043092322112 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 12:22:38.891444 135043092322112 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 12:22:38.891460 135043092322112 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 12:22:38.891475 135043092322112 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 12:22:38.891491 135043092322112 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 12:22:38.891506 135043092322112 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 12:22:38.891522 135043092322112 pyconfig.py:471] Config param engram_layers: []
I0425 12:22:38.891539 135043092322112 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 12:22:38.891553 135043092322112 pyconfig.py:471] Config param engram_num_heads: 8
I0425 12:22:38.891569 135043092322112 pyconfig.py:471] Config param engram_seed: 0
I0425 12:22:38.891585 135043092322112 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 12:22:38.891599 135043092322112 pyconfig.py:471] Config param epsilon_high: None
I0425 12:22:38.891615 135043092322112 pyconfig.py:471] Config param eval_corr_lst: False
I0425 12:22:38.891630 135043092322112 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 12:22:38.891645 135043092322112 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 12:22:38.891661 135043092322112 pyconfig.py:471] Config param eval_image_column: image
I0425 12:22:38.891675 135043092322112 pyconfig.py:471] Config param eval_interval: -1
I0425 12:22:38.891691 135043092322112 pyconfig.py:471] Config param eval_make_lst: False
I0425 12:22:38.891708 135043092322112 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 12:22:38.891722 135043092322112 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 12:22:38.891738 135043092322112 pyconfig.py:471] Config param eval_split: validation
I0425 12:22:38.891753 135043092322112 pyconfig.py:471] Config param eval_steps: -1
I0425 12:22:38.891769 135043092322112 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 12:22:38.891785 135043092322112 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 12:22:38.891800 135043092322112 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 12:22:38.891816 135043092322112 pyconfig.py:471] Config param float32_gate_logits: False
I0425 12:22:38.891833 135043092322112 pyconfig.py:471] Config param float32_logits: False
I0425 12:22:38.891849 135043092322112 pyconfig.py:471] Config param float32_qk_product: False
I0425 12:22:38.891863 135043092322112 pyconfig.py:471] Config param float32_weight_sum: True
I0425 12:22:38.891878 135043092322112 pyconfig.py:471] Config param force_q_layout: False
I0425 12:22:38.891894 135043092322112 pyconfig.py:471] Config param force_unroll: False
I0425 12:22:38.891910 135043092322112 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0425 12:22:38.891925 135043092322112 pyconfig.py:471] Config param formatting_func_path:
I0425 12:22:38.891941 135043092322112 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 12:22:38.891955 135043092322112 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 12:22:38.891971 135043092322112 pyconfig.py:471] Config param fused_mlp: False
I0425 12:22:38.891987 135043092322112 pyconfig.py:471] Config param fused_qkv: True
I0425 12:22:38.892002 135043092322112 pyconfig.py:471] Config param gcs_metrics: False
I0425 12:22:38.892017 135043092322112 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 12:22:38.892033 135043092322112 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 12:22:38.892048 135043092322112 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 12:22:38.892064 135043092322112 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 12:22:38.892079 135043092322112 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 12:22:38.892107 135043092322112 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 12:22:38.892124 135043092322112 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 12:22:38.892139 135043092322112 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 12:22:38.892155 135043092322112 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 12:22:38.892169 135043092322112 pyconfig.py:471] Config param generation_configs: {}
I0425 12:22:38.892185 135043092322112 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 12:22:38.892200 135043092322112 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 12:22:38.892215 135043092322112 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 12:22:38.892231 135043092322112 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 12:22:38.892246 135043092322112 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 12:22:38.892262 135043092322112 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 12:22:38.892277 135043092322112 pyconfig.py:471] Config param global_head_dim: 0
I0425 12:22:38.892292 135043092322112 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 12:22:38.892313 135043092322112 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 12:22:38.892329 135043092322112 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 12:22:38.892345 135043092322112 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 12:22:38.892360 135043092322112 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 12:22:38.892375 135043092322112 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 12:22:38.892392 135043092322112 pyconfig.py:471] Config param grad_dtype: float32
I0425 12:22:38.892427 135043092322112 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 12:22:38.892444 135043092322112 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 12:22:38.892460 135043092322112 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 12:22:38.892475 135043092322112 pyconfig.py:471] Config param grain_eval_files:
I0425 12:22:38.892491 135043092322112 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 12:22:38.892506 135043092322112 pyconfig.py:471] Config param grain_num_threads: 16
I0425 12:22:38.892526 135043092322112 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 12:22:38.892550 135043092322112 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 12:22:38.892576 135043092322112 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 12:22:38.892603 135043092322112 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 12:22:38.892630 135043092322112 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 12:22:38.892654 135043092322112 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 12:22:38.892671 135043092322112 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 12:22:38.892688 135043092322112 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 12:22:38.892704 135043092322112 pyconfig.py:471] Config param grain_train_files:
I0425 12:22:38.892719 135043092322112 pyconfig.py:471] Config param grain_train_mixture_config_path:
I0425 12:22:38.892735 135043092322112 pyconfig.py:471] Config param grain_worker_count: 1
I0425 12:22:38.892753 135043092322112 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 12:22:38.892769 135043092322112 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 12:22:38.892786 135043092322112 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 12:22:38.892802 135043092322112 pyconfig.py:471] Config param hardware: tpu
I0425 12:22:38.892817 135043092322112 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 12:22:38.892833 135043092322112 pyconfig.py:471] Config param head_dim: 8
I0425 12:22:38.892848 135043092322112 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 12:22:38.892863 135043092322112 pyconfig.py:471] Config param hf_data_dir: None
I0425 12:22:38.892879 135043092322112 pyconfig.py:471] Config param hf_eval_files: None
I0425 12:22:38.892894 135043092322112 pyconfig.py:471] Config param hf_eval_split: None
I0425 12:22:38.892910 135043092322112 pyconfig.py:471] Config param hf_name: None
I0425 12:22:38.892925 135043092322112 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 12:22:38.892941 135043092322112 pyconfig.py:471] Config param hf_train_files: None
I0425 12:22:38.892955 135043092322112 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 12:22:38.892971 135043092322112 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 12:22:38.892986 135043092322112 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 12:22:38.893001 135043092322112 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 12:22:38.893016 135043092322112 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 12:22:38.893032 135043092322112 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 12:22:38.893048 135043092322112 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 12:22:38.893064 135043092322112 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 12:22:38.893080 135043092322112 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 12:22:38.893105 135043092322112 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 12:22:38.893129 135043092322112 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 12:22:38.893156 135043092322112 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 12:22:38.893183 135043092322112 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 12:22:38.893211 135043092322112 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 12:22:38.893235 135043092322112 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 12:22:38.893252 135043092322112 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 12:22:38.893268 135043092322112 pyconfig.py:471] Config param image_path:
I0425 12:22:38.893284 135043092322112 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 12:22:38.893304 135043092322112 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 12:22:38.893331 135043092322112 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 12:22:38.893352 135043092322112 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 12:22:38.893371 135043092322112 pyconfig.py:471] Config param indexer_n_heads: 64
I0425 12:22:38.893386 135043092322112 pyconfig.py:471] Config param indexer_sparse_training: False
I0425 12:22:38.893402 135043092322112 pyconfig.py:471] Config param indexer_topk: 2048
I0425 12:22:38.893417 135043092322112 pyconfig.py:471] Config param inference_benchmark_test: False
I0425 12:22:38.893432 135043092322112 pyconfig.py:471] Config param inference_metadata_file:
I0425 12:22:38.893448 135043092322112 pyconfig.py:471] Config param inference_microbenchmark_log_file_path:
I0425 12:22:38.893464 135043092322112 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10
I0425 12:22:38.893478 135043092322112 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5]
I0425 12:22:38.893496 135043092322112 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024
I0425 12:22:38.893511 135043092322112 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate
I0425 12:22:38.893527 135043092322112 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer
I0425 12:22:38.893542 135043092322112 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1
I0425 12:22:38.893561 135043092322112 pyconfig.py:471] Config param init_weights_seed: 0
I0425 12:22:38.893576 135043092322112 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length']
I0425 12:22:38.893594 135043092322112 pyconfig.py:471] Config param interleave_moe_layer_step: 1
I0425 12:22:38.893609 135043092322112 pyconfig.py:471] Config param intermediate_size_for_vit: 5632
I0425 12:22:38.893625 135043092322112 pyconfig.py:471] Config param internal_compile: False
I0425 12:22:38.893642 135043092322112 pyconfig.py:471] Config param internal_compile_num_devices: -1
I0425 12:22:38.893659 135043092322112 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache
I0425 12:22:38.893675 135043092322112 pyconfig.py:471] Config param jax_debug_log_modules:
I0425 12:22:38.893691 135043092322112 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300
I0425 12:22:38.893706 135043092322112 pyconfig.py:471] Config param jax_profiler_port: 9999
I0425 12:22:38.893722 135043092322112 pyconfig.py:471] Config param key_proj: RematLocation.REMAT
I0425 12:22:38.893738 135043092322112 pyconfig.py:471] Config param kv_cache_buffer: 256
I0425 12:22:38.893754 135043092322112 pyconfig.py:471] Config param kv_lora_rank: 512
I0425 12:22:38.893768 135043092322112 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV
I0425 12:22:38.893787 135043092322112 pyconfig.py:471] Config param kv_quant_dtype: int8
I0425 12:22:38.893804 135043092322112 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT
I0425 12:22:38.893820 135043092322112 pyconfig.py:471] Config param learning_rate: 0.0002
I0425 12:22:38.893836 135043092322112 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1
I0425 12:22:38.893853 135043092322112 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000
I0425 12:22:38.893867 135043092322112 pyconfig.py:471] Config param load_balance_loss_weight: 0.0
I0425 12:22:38.893883 135043092322112 pyconfig.py:471] Config param load_checkpoint_only_once: False
I0425 12:22:38.893899 135043092322112 pyconfig.py:471] Config param load_from_prefill_dir: False
I0425 12:22:38.893914 135043092322112 pyconfig.py:471] Config param load_full_state_path:
I0425 12:22:38.893929 135043092322112 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 12:22:38.893944 135043092322112 pyconfig.py:471] Config param local_checkpoint_directory:
I0425 12:22:38.893960 135043092322112 pyconfig.py:471] Config param local_checkpoint_period: 0
I0425 12:22:38.893975 135043092322112 pyconfig.py:471] Config param local_rope_max_timescale: -1
I0425 12:22:38.893991 135043092322112 pyconfig.py:471] Config param local_rope_proportion: 1.0
I0425 12:22:38.894006 135043092322112 pyconfig.py:471] Config param log_config: True
I0425 12:22:38.894022 135043092322112 pyconfig.py:471] Config param log_period: 10
I0425 12:22:38.894036 135043092322112 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 
'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0425 12:22:38.894124 135043092322112 pyconfig.py:471] Config param logits_dot_in_fp32: False I0425 12:22:38.894141 135043092322112 pyconfig.py:471] Config param logits_via_embedding: True I0425 12:22:38.894157 135043092322112 pyconfig.py:471] Config param lora_input_adapters_path: I0425 12:22:38.894172 135043092322112 pyconfig.py:471] Config param loss_algo: grpo I0425 12:22:38.894191 135043092322112 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0425 12:22:38.894219 135043092322112 pyconfig.py:471] Config param managed_mldiagnostics: False I0425 12:22:38.894244 135043092322112 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/managed-mldiagnostics I0425 12:22:38.894266 135043092322112 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0425 12:22:38.894291 135043092322112 
pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0425 12:22:38.894326 135043092322112 pyconfig.py:471] Config param max_checkify: False I0425 12:22:38.894354 135043092322112 pyconfig.py:471] Config param max_concurrency: 256 I0425 12:22:38.894379 135043092322112 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0425 12:22:38.894405 135043092322112 pyconfig.py:471] Config param max_num_batched_tokens: None I0425 12:22:38.894430 135043092322112 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0425 12:22:38.894446 135043092322112 pyconfig.py:471] Config param max_num_images_per_example: -1 I0425 12:22:38.894462 135043092322112 pyconfig.py:471] Config param max_num_seqs: None I0425 12:22:38.894486 135043092322112 pyconfig.py:471] Config param max_position_embeddings: 163840 I0425 12:22:38.894512 135043092322112 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0425 12:22:38.894539 135043092322112 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0425 12:22:38.894565 135043092322112 pyconfig.py:471] Config param max_segments_per_seq: -1 I0425 12:22:38.894592 135043092322112 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0425 12:22:38.894619 135043092322112 pyconfig.py:471] Config param max_target_length: 2048 I0425 12:22:38.894645 135043092322112 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0425 12:22:38.894672 135043092322112 pyconfig.py:471] Config param megablox: True I0425 12:22:38.894698 135043092322112 pyconfig.py:471] Config param merge_gating_gmm: False I0425 12:22:38.894724 135043092322112 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0425 12:22:38.894752 135043092322112 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/metrics/ I0425 
12:22:38.894779 135043092322112 pyconfig.py:471] Config param metrics_file: I0425 12:22:38.894803 135043092322112 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0425 12:22:38.894827 135043092322112 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0425 12:22:38.894847 135043092322112 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0425 12:22:38.894863 135043092322112 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0425 12:22:38.894889 135043092322112 pyconfig.py:471] Config param mla_naive_kvcache: True I0425 12:22:38.894915 135043092322112 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0425 12:22:38.894940 135043092322112 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0425 12:22:38.894967 135043092322112 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0425 12:22:38.894994 135043092322112 pyconfig.py:471] Config param mlp_bias: False I0425 12:22:38.895021 135043092322112 pyconfig.py:471] Config param mlp_dim: 64 I0425 12:22:38.895046 135043092322112 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0425 12:22:38.895073 135043092322112 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0425 12:22:38.895112 135043092322112 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0425 12:22:38.895144 135043092322112 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0425 12:22:38.895168 135043092322112 pyconfig.py:471] Config param moba: False I0425 12:22:38.895191 135043092322112 pyconfig.py:471] Config param moba_chunk_size: 1024 I0425 12:22:38.895216 135043092322112 pyconfig.py:471] Config param moba_topk: 8 I0425 12:22:38.895241 135043092322112 pyconfig.py:471] Config param model_call_mode: I0425 12:22:38.895265 135043092322112 pyconfig.py:471] Config param model_name: gpt3-52k I0425 12:22:38.895289 135043092322112 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0425 12:22:38.895318 135043092322112 pyconfig.py:471] Config param 
moe_fsdp_use_two_stage_all_gather: False I0425 12:22:38.895343 135043092322112 pyconfig.py:471] Config param moe_mlp_dim: -1 I0425 12:22:38.895369 135043092322112 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0425 12:22:38.895395 135043092322112 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0425 12:22:38.895421 135043092322112 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0425 12:22:38.895447 135043092322112 pyconfig.py:471] Config param monitor_goodput: False I0425 12:22:38.895470 135043092322112 pyconfig.py:471] Config param monitor_step_time_deviation: True I0425 12:22:38.895492 135043092322112 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0425 12:22:38.895511 135043092322112 pyconfig.py:471] Config param mscale: 1.0 I0425 12:22:38.895527 135043092322112 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0425 12:22:38.895542 135043092322112 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0425 12:22:38.895560 135043092322112 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0425 12:22:38.895586 135043092322112 pyconfig.py:471] Config param mtp_num_layers: 0 I0425 12:22:38.895611 135043092322112 pyconfig.py:471] Config param mu_dtype: float32 I0425 12:22:38.895651 135043092322112 pyconfig.py:471] Config param multi_sampling: False I0425 12:22:38.895678 135043092322112 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0425 12:22:38.895703 135043092322112 pyconfig.py:471] Config param muon_beta: 0.95 I0425 12:22:38.895729 135043092322112 pyconfig.py:471] Config param muon_consistent_rms: None I0425 12:22:38.895755 135043092322112 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0425 12:22:38.895780 135043092322112 pyconfig.py:471] Config param n_routing_groups: -1 I0425 12:22:38.895805 135043092322112 pyconfig.py:471] Config param n_window_for_audio: 50 I0425 12:22:38.895830 135043092322112 pyconfig.py:471] Config param 
n_window_infer_for_audio: 800 I0425 12:22:38.895855 135043092322112 pyconfig.py:471] Config param nope_layer_interval: -1 I0425 12:22:38.895880 135043092322112 pyconfig.py:471] Config param norm_topk_prob: False I0425 12:22:38.895905 135043092322112 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0425 12:22:38.895933 135043092322112 pyconfig.py:471] Config param normalize_embedding_logits: False I0425 12:22:38.895959 135043092322112 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0425 12:22:38.895984 135043092322112 pyconfig.py:471] Config param num_batches: 4 I0425 12:22:38.896009 135043092322112 pyconfig.py:471] Config param num_channels_for_vit: 3 I0425 12:22:38.896034 135043092322112 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0425 12:22:38.896059 135043092322112 pyconfig.py:471] Config param num_decoder_layers: 1 I0425 12:22:38.896086 135043092322112 pyconfig.py:471] Config param num_diloco_replicas: 1 I0425 12:22:38.896128 135043092322112 pyconfig.py:471] Config param num_epoch: 1 I0425 12:22:38.896155 135043092322112 pyconfig.py:471] Config param num_eval_passes: 1 I0425 12:22:38.896182 135043092322112 pyconfig.py:471] Config param num_experts: 1 I0425 12:22:38.896206 135043092322112 pyconfig.py:471] Config param num_experts_per_tok: 1 I0425 12:22:38.896228 135043092322112 pyconfig.py:471] Config param num_generations: 2 I0425 12:22:38.896249 135043092322112 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0425 12:22:38.896266 135043092322112 pyconfig.py:471] Config param num_iterations: 1 I0425 12:22:38.896282 135043092322112 pyconfig.py:471] Config param num_kv_heads: 2 I0425 12:22:38.896300 135043092322112 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0425 12:22:38.896324 135043092322112 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0425 12:22:38.896349 135043092322112 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0425 12:22:38.896374 135043092322112 
pyconfig.py:471] Config param num_pipeline_repeats: -1 I0425 12:22:38.896397 135043092322112 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0425 12:22:38.896420 135043092322112 pyconfig.py:471] Config param num_query_heads: 2 I0425 12:22:38.896445 135043092322112 pyconfig.py:471] Config param num_samplers_slices: -1 I0425 12:22:38.896471 135043092322112 pyconfig.py:471] Config param num_slices: 1 I0425 12:22:38.896496 135043092322112 pyconfig.py:471] Config param num_target_devices: 32 I0425 12:22:38.896521 135043092322112 pyconfig.py:471] Config param num_test_batches: 5 I0425 12:22:38.896545 135043092322112 pyconfig.py:471] Config param num_trainer_slices: -1 I0425 12:22:38.896570 135043092322112 pyconfig.py:471] Config param num_vocab_tiling: 1 I0425 12:22:38.896595 135043092322112 pyconfig.py:471] Config param off_policy_steps: 0 I0425 12:22:38.896620 135043092322112 pyconfig.py:471] Config param offline_data_dir: None I0425 12:22:38.896645 135043092322112 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0425 12:22:38.896672 135043092322112 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0425 12:22:38.896697 135043092322112 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0425 12:22:38.896721 135043092322112 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0425 12:22:38.896745 135043092322112 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0425 12:22:38.896769 135043092322112 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0425 12:22:38.896795 135043092322112 pyconfig.py:471] Config param output_dim_for_audio: 512 I0425 12:22:38.896820 135043092322112 pyconfig.py:471] Config param override_logical_axis_rules: False I0425 12:22:38.896844 135043092322112 pyconfig.py:471] Config param override_model_config: True I0425 12:22:38.896868 135043092322112 pyconfig.py:471] Config param packing: True I0425 12:22:38.896893 135043092322112 pyconfig.py:471] 
Config param pagedattn_head_dim_alignment: 128 I0425 12:22:38.896919 135043092322112 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0425 12:22:38.896944 135043092322112 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0425 12:22:38.896969 135043092322112 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0425 12:22:38.896997 135043092322112 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0425 12:22:38.897022 135043092322112 pyconfig.py:471] Config param param_scan_axis: 1 I0425 12:22:38.897049 135043092322112 pyconfig.py:471] Config param parameter_memory_host_offload: False I0425 12:22:38.897072 135043092322112 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0425 12:22:38.897113 135043092322112 pyconfig.py:471] Config param patch_size_for_vit: 14 I0425 12:22:38.897131 135043092322112 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0425 12:22:38.897148 135043092322112 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0425 12:22:38.897163 135043092322112 pyconfig.py:471] Config param per_device_batch_size: 2 I0425 12:22:38.897179 135043092322112 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0425 12:22:38.897196 135043092322112 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0425 12:22:38.897211 135043092322112 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0425 12:22:38.897230 135043092322112 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0425 12:22:38.897255 135043092322112 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0425 12:22:38.897280 135043092322112 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0425 12:22:38.897309 135043092322112 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0425 12:22:38.897335 135043092322112 pyconfig.py:471] Config param posemb_type_for_vit: learn I0425 12:22:38.897360 135043092322112 pyconfig.py:471] Config param 
position_id_per_seconds: 25 I0425 12:22:38.897385 135043092322112 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0425 12:22:38.897410 135043092322112 pyconfig.py:471] Config param prefill_cache_dir: I0425 12:22:38.897435 135043092322112 pyconfig.py:471] Config param prefill_chunk_size: 256 I0425 12:22:38.897459 135043092322112 pyconfig.py:471] Config param prefill_slice: v5e-16 I0425 12:22:38.897484 135043092322112 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0425 12:22:38.897511 135043092322112 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0425 12:22:38.897534 135043092322112 pyconfig.py:471] Config param prefuse_moe_weights: False I0425 12:22:38.897557 135043092322112 pyconfig.py:471] Config param profile_cleanly: True I0425 12:22:38.897581 135043092322112 pyconfig.py:471] Config param profile_periodically_period: -1 I0425 12:22:38.897603 135043092322112 pyconfig.py:471] Config param profile_power_events: False I0425 12:22:38.897627 135043092322112 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0425 12:22:38.897652 135043092322112 pyconfig.py:471] Config param profiler_steps: 5 I0425 12:22:38.897672 135043092322112 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0425 12:22:38.897696 135043092322112 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0425 12:22:38.897721 135043092322112 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0425 12:22:38.897747 135043092322112 pyconfig.py:471] Config param prometheus_port: 0 I0425 12:22:38.897773 135043092322112 pyconfig.py:471] Config param prompt: I love to I0425 12:22:38.897799 135043092322112 pyconfig.py:471] Config param pure_nnx: False I0425 12:22:38.897823 135043092322112 pyconfig.py:471] Config param pure_nnx_decoder: False I0425 12:22:38.897848 135043092322112 pyconfig.py:471] Config param q_lora_rank: 0 I0425 12:22:38.897873 135043092322112 pyconfig.py:471] Config param qk_clip_threshold: 
100.0 I0425 12:22:38.897899 135043092322112 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0425 12:22:38.897923 135043092322112 pyconfig.py:471] Config param qk_norm_with_scale: True I0425 12:22:38.897947 135043092322112 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0425 12:22:38.897971 135043092322112 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0425 12:22:38.897997 135043092322112 pyconfig.py:471] Config param quant_cfg_path: I0425 12:22:38.898020 135043092322112 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0425 12:22:38.898045 135043092322112 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0425 12:22:38.898063 135043092322112 pyconfig.py:471] Config param quantize_kvcache: False I0425 12:22:38.898079 135043092322112 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0425 12:22:38.898105 135043092322112 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0425 12:22:38.898123 135043092322112 pyconfig.py:471] Config param ragged_block_size: 256 I0425 12:22:38.898137 135043092322112 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0425 12:22:38.898153 135043092322112 pyconfig.py:471] Config param rampup_end_step: 0 I0425 12:22:38.898168 135043092322112 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0425 12:22:38.898183 135043092322112 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0425 12:22:38.898198 135043092322112 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0425 12:22:38.898214 135043092322112 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0425 12:22:38.898228 135043092322112 pyconfig.py:471] Config param remat_policy: full I0425 12:22:38.898244 135043092322112 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0425 12:22:38.898258 135043092322112 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0425 12:22:38.898274 135043092322112 pyconfig.py:471] 
Config param replicate_quant_scale: False I0425 12:22:38.898289 135043092322112 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0425 12:22:38.898309 135043092322112 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0425 12:22:38.898323 135043092322112 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0425 12:22:38.898339 135043092322112 pyconfig.py:471] Config param reshape_q: False I0425 12:22:38.898353 135043092322112 pyconfig.py:471] Config param return_log_prob: False I0425 12:22:38.898369 135043092322112 pyconfig.py:471] Config param reuse_example_batch: 0 I0425 12:22:38.898384 135043092322112 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0425 12:22:38.898400 135043092322112 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0425 12:22:38.898415 135043092322112 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0425 12:22:38.898431 135043092322112 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0425 12:22:38.898446 135043092322112 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0425 12:22:38.898472 135043092322112 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0425 12:22:38.898495 135043092322112 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0425 12:22:38.898526 135043092322112 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0425 12:22:38.898552 135043092322112 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0425 12:22:38.898575 135043092322112 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0425 12:22:38.898591 135043092322112 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0425 
12:22:38.898607 135043092322112 pyconfig.py:471] Config param rope_attention_scaling: False I0425 12:22:38.898630 135043092322112 pyconfig.py:471] Config param rope_factor: 40 I0425 12:22:38.898657 135043092322112 pyconfig.py:471] Config param rope_interleave: True I0425 12:22:38.898681 135043092322112 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0425 12:22:38.898704 135043092322112 pyconfig.py:471] Config param rope_max_timescale: 10000 I0425 12:22:38.898728 135043092322112 pyconfig.py:471] Config param rope_min_timescale: 1 I0425 12:22:38.898753 135043092322112 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0425 12:22:38.898779 135043092322112 pyconfig.py:471] Config param rope_truncate: True I0425 12:22:38.898803 135043092322112 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0425 12:22:38.898833 135043092322112 pyconfig.py:471] Config param rope_use_scale: True I0425 12:22:38.898859 135043092322112 pyconfig.py:471] Config param routed_bias: False I0425 12:22:38.898885 135043092322112 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0425 12:22:38.898908 135043092322112 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0425 12:22:38.898933 135043092322112 pyconfig.py:471] Config param routed_score_func: I0425 12:22:38.898959 135043092322112 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-12-22 I0425 12:22:38.898981 135043092322112 pyconfig.py:471] Config param sa_block_kv: 512 I0425 12:22:38.899003 135043092322112 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0425 12:22:38.899026 135043092322112 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0425 12:22:38.899047 135043092322112 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0425 12:22:38.899070 135043092322112 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0425 12:22:38.899091 135043092322112 pyconfig.py:471] Config param sa_block_q: 512 I0425 12:22:38.899127 135043092322112 pyconfig.py:471] Config param 
sa_block_q_dkv: 512 I0425 12:22:38.899148 135043092322112 pyconfig.py:471] Config param sa_block_q_dq: 512 I0425 12:22:38.899170 135043092322112 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0425 12:22:38.899192 135043092322112 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0425 12:22:38.899214 135043092322112 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0425 12:22:38.899237 135043092322112 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0425 12:22:38.899260 135043092322112 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0425 12:22:38.899285 135043092322112 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0425 12:22:38.899318 135043092322112 pyconfig.py:471] Config param save_config_to_gcs: False I0425 12:22:38.899342 135043092322112 pyconfig.py:471] Config param save_quantized_params_path: I0425 12:22:38.899366 135043092322112 pyconfig.py:471] Config param scale_embedding_for_audio: True I0425 12:22:38.899389 135043092322112 pyconfig.py:471] Config param scan_layers: True I0425 12:22:38.899413 135043092322112 pyconfig.py:471] Config param scan_layers_per_stage: False I0425 12:22:38.899436 135043092322112 pyconfig.py:471] Config param scan_pipeline_iterations: True I0425 12:22:38.899460 135043092322112 pyconfig.py:471] Config param scan_pipeline_repeats: False I0425 12:22:38.899483 135043092322112 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0425 12:22:38.899506 135043092322112 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0425 12:22:38.899530 135043092322112 pyconfig.py:471] Config param sft_train_on_completion_only: False I0425 12:22:38.899552 135043092322112 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0425 12:22:38.899576 135043092322112 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0425 12:22:38.899602 135043092322112 pyconfig.py:471] Config param shard_optimizer_over_data: False I0425 
12:22:38.899625 135043092322112 pyconfig.py:471] Config param sharding_strategy: None I0425 12:22:38.899648 135043092322112 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0425 12:22:38.899672 135043092322112 pyconfig.py:471] Config param shardy: True I0425 12:22:38.899695 135043092322112 pyconfig.py:471] Config param share_kv_projections: False I0425 12:22:38.899718 135043092322112 pyconfig.py:471] Config param shared_experts: 0 I0425 12:22:38.899740 135043092322112 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0425 12:22:38.899763 135043092322112 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0425 12:22:38.899786 135043092322112 pyconfig.py:471] Config param skip_jax_distributed_system: False I0425 12:22:38.899809 135043092322112 pyconfig.py:471] Config param skip_step_interval: 128 I0425 12:22:38.899832 135043092322112 pyconfig.py:471] Config param skip_step_on_spikes: False I0425 12:22:38.899854 135043092322112 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0425 12:22:38.899878 135043092322112 pyconfig.py:471] Config param sliding_window_size: 0 I0425 12:22:38.899900 135043092322112 pyconfig.py:471] Config param solution_end_token: </answer> I0425 12:22:38.899922 135043092322112 pyconfig.py:471] Config param solution_start_token: <answer> I0425 12:22:38.899945 135043092322112 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0425 12:22:38.899969 135043092322112 pyconfig.py:471] Config param sparse_matmul: True I0425 12:22:38.899991 135043092322112 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0425 12:22:38.900015 135043092322112 pyconfig.py:471] Config param stack_prefill_result_cache: False I0425 12:22:38.900038 135043092322112 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0425 12:22:38.900062 135043092322112 pyconfig.py:471] Config param stack_trace_to_cloud: False I0425 12:22:38.900087 135043092322112 pyconfig.py:471] Config param step_deviation_interval_seconds: 
30 I0425 12:22:38.900125 135043092322112 pyconfig.py:471] Config param steps: 200000 I0425 12:22:38.900150 135043092322112 pyconfig.py:471] Config param stop_strings: None I0425 12:22:38.900174 135043092322112 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0425 12:22:38.900199 135043092322112 pyconfig.py:471] Config param student_params_to_update: None I0425 12:22:38.900222 135043092322112 pyconfig.py:471] Config param subslice_shape: I0425 12:22:38.900248 135043092322112 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0425 12:22:38.900274 135043092322112 pyconfig.py:471] Config param system_prompt: I0425 12:22:38.900308 135043092322112 pyconfig.py:471] Config param target_eval_loss: 0.0 I0425 12:22:38.900334 135043092322112 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0425 12:22:38.900355 135043092322112 pyconfig.py:471] Config param temperature_tuning: False I0425 12:22:38.900370 135043092322112 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0425 12:22:38.900385 135043092322112 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-22/tensorboard/ I0425 12:22:38.900401 135043092322112 pyconfig.py:471] Config param tensors_on_device: None I0425 12:22:38.900418 135043092322112 pyconfig.py:471] Config param tensors_to_offload: None I0425 12:22:38.900432 135043092322112 pyconfig.py:471] Config param test_batch_start_index: 0 I0425 12:22:38.900448 135043092322112 pyconfig.py:471] Config param tile_size_for_vit: 336 I0425 12:22:38.900463 135043092322112 pyconfig.py:471] Config param tokenize_eval_data: True I0425 12:22:38.900478 135043092322112 pyconfig.py:471] Config param tokenize_train_data: True I0425 12:22:38.900494 135043092322112 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0425 12:22:38.900508 135043092322112 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0425 12:22:38.900527 
135043092322112 pyconfig.py:471] Config param topk_routing_group: -1
I0425 12:22:38.900543 135043092322112 pyconfig.py:471] Config param train_data_columns: ['text']
I0425 12:22:38.900562 135043092322112 pyconfig.py:471] Config param train_fraction: 1.0
I0425 12:22:38.900579 135043092322112 pyconfig.py:471] Config param train_image_column: image
I0425 12:22:38.900594 135043092322112 pyconfig.py:471] Config param train_micro_batch_size: -1
I0425 12:22:38.900609 135043092322112 pyconfig.py:471] Config param train_split: train
I0425 12:22:38.900625 135043092322112 pyconfig.py:471] Config param trainable_parameters_mask: []
I0425 12:22:38.900640 135043092322112 pyconfig.py:471] Config param trainable_position_size: 2048
I0425 12:22:38.900656 135043092322112 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0425 12:22:38.900673 135043092322112 pyconfig.py:471] Config param upload_all_profiler_results: False
I0425 12:22:38.900688 135043092322112 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0425 12:22:38.900704 135043092322112 pyconfig.py:471] Config param use_agentic_rollout: False
I0425 12:22:38.900720 135043092322112 pyconfig.py:471] Config param use_audio: False
I0425 12:22:38.900734 135043092322112 pyconfig.py:471] Config param use_audio_in_video: False
I0425 12:22:38.900750 135043092322112 pyconfig.py:471] Config param use_batch_split_schedule: False
I0425 12:22:38.900765 135043092322112 pyconfig.py:471] Config param use_chat_template: False
I0425 12:22:38.900781 135043092322112 pyconfig.py:471] Config param use_chunked_prefill: False
I0425 12:22:38.900797 135043092322112 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0425 12:22:38.900812 135043092322112 pyconfig.py:471] Config param use_dpo: False
I0425 12:22:38.900827 135043092322112 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0425 12:22:38.900843 135043092322112 pyconfig.py:471] Config param use_grpo: True
I0425 12:22:38.900858 135043092322112 pyconfig.py:471] Config param use_indexer: False
I0425 12:22:38.900873 135043092322112 pyconfig.py:471] Config param use_iota_embed: True
I0425 12:22:38.900888 135043092322112 pyconfig.py:471] Config param use_jax_splash: False
I0425 12:22:38.900904 135043092322112 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0425 12:22:38.900919 135043092322112 pyconfig.py:471] Config param use_mrope: False
I0425 12:22:38.900935 135043092322112 pyconfig.py:471] Config param use_multimodal: False
I0425 12:22:38.900949 135043092322112 pyconfig.py:471] Config param use_pathways: True
I0425 12:22:38.900965 135043092322112 pyconfig.py:471] Config param use_post_attn_norm: False
I0425 12:22:38.900979 135043092322112 pyconfig.py:471] Config param use_post_ffw_norm: False
I0425 12:22:38.900995 135043092322112 pyconfig.py:471] Config param use_qk_clip: False
I0425 12:22:38.901009 135043092322112 pyconfig.py:471] Config param use_qk_norm: False
I0425 12:22:38.901025 135043092322112 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0425 12:22:38.901039 135043092322112 pyconfig.py:471] Config param use_qwix_quantization: False
I0425 12:22:38.901055 135043092322112 pyconfig.py:471] Config param use_ragged_attention: False
I0425 12:22:38.901069 135043092322112 pyconfig.py:471] Config param use_random_routing: False
I0425 12:22:38.901085 135043092322112 pyconfig.py:471] Config param use_replicator_service: False
I0425 12:22:38.901121 135043092322112 pyconfig.py:471] Config param use_ring_of_experts: False
I0425 12:22:38.901136 135043092322112 pyconfig.py:471] Config param use_sft: False
I0425 12:22:38.901152 135043092322112 pyconfig.py:471] Config param use_splash_scheduler: False
I0425 12:22:38.901166 135043092322112 pyconfig.py:471] Config param use_tokamax_gmm: False
I0425 12:22:38.901183 135043092322112 pyconfig.py:471] Config param use_tokamax_splash: False
I0425 12:22:38.901197 135043092322112 pyconfig.py:471] Config param use_truncation: True
I0425 12:22:38.901212 135043092322112 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0425 12:22:38.901227 135043092322112 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0425 12:22:38.901243 135043092322112 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0425 12:22:38.901257 135043092322112 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0425 12:22:38.901274 135043092322112 pyconfig.py:471] Config param v_head_dim: 128
I0425 12:22:38.901288 135043092322112 pyconfig.py:471] Config param v_norm_with_scale: True
I0425 12:22:38.901307 135043092322112 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0425 12:22:38.901323 135043092322112 pyconfig.py:471] Config param vertex_tensorboard_project: 
I0425 12:22:38.901339 135043092322112 pyconfig.py:471] Config param vertex_tensorboard_region: 
I0425 12:22:38.901353 135043092322112 pyconfig.py:471] Config param video_path: 
I0425 12:22:38.901369 135043092322112 pyconfig.py:471] Config param video_placeholder: <|video|>
I0425 12:22:38.901383 135043092322112 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0425 12:22:38.901399 135043092322112 pyconfig.py:471] Config param vision_output_length: -1
I0425 12:22:38.901413 135043092322112 pyconfig.py:471] Config param vllm_additional_config: {}
I0425 12:22:38.901430 135043092322112 pyconfig.py:471] Config param vllm_hf_config_path: 
I0425 12:22:38.901444 135043092322112 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0425 12:22:38.901460 135043092322112 pyconfig.py:471] Config param vocab_size: 32000
I0425 12:22:38.901474 135043092322112 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0425 12:22:38.901491 135043092322112 pyconfig.py:471] Config param weight_dtype: float32
I0425 12:22:38.901520 135043092322112 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0425 12:22:38.901535 135043092322112 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0425 12:22:38.901551 135043092322112 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0425 12:22:38.901566 135043092322112 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0425 12:22:38.901582 135043092322112 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0425 12:22:38.901598 135043092322112 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0425 12:22:38.901615 135043092322112 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0425 12:22:38.901629 135043092322112 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0425 12:22:38.901645 135043092322112 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0425 12:22:38.901659 135043092322112 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0425 12:22:38.901675 135043092322112 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0425 12:22:38.901689 135043092322112 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0425 12:22:38.901705 135043092322112 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0425 12:22:38.901720 135043092322112 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0425 12:22:38.901736 135043092322112 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0425 12:22:38.901750 135043092322112 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0425 12:22:38.901766 135043092322112 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0425 12:22:38.901780 135043092322112 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0425 12:22:38.901796 135043092322112 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0425 12:22:38.901810 135043092322112 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0425 12:22:38.901827 135043092322112 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0425 12:22:38.901843 135043092322112 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0425 12:22:38.901859 135043092322112 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0425 12:22:38.901874 135043092322112 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0425 12:22:38.901889 135043092322112 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0425 12:22:38.901906 135043092322112 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0425 12:22:38.902288 135043092322112 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0425 12:22:38.902334 135043092322112 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0425 12:22:39.080019 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:22:39.188948 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:22:39.446728 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:22:39.557550 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:22:39.677907 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 12:22:39.795863 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0425 12:22:39.903144 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0425 12:22:40.027707 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0425 12:22:40.662429 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 12:22:40.771512 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 12:22:41.168534 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0425 12:22:41.276163 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 12:22:41.386924 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 12:22:41.505843 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0425 12:22:41.598463 135043092322112 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0425 12:22:41.605594 135043092322112 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0425 12:22:41.605738 135043092322112 train_distill.py:594] Applying logical axis rules for model initialization and training...
I0425 12:22:41.605813 135043092322112 train_distill.py:598] Loading Student from ...
I0425 12:22:41.605843 135043092322112 train_distill.py:170] --- Student Configuration ---
I0425 12:22:41.605863 135043092322112 train_distill.py:171] Model Name: gpt3-52k
I0425 12:22:41.605883 135043092322112 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0425 12:22:41.605899 135043092322112 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0425 12:22:41.605916 135043092322112 train_distill.py:176] Vocab Size: 32000
I0425 12:22:41.605934 135043092322112 train_distill.py:177] Checkpoint: 
I0425 12:22:41.605952 135043092322112 train_distill.py:463] Initializing model: gpt3-52k...
I0425 12:22:43.353833 135043092322112 train_distill.py:612] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0425 12:22:43.353945 135043092322112 train_distill.py:170] --- Teacher Configuration ---
I0425 12:22:43.353975 135043092322112 train_distill.py:171] Model Name: gpt3-52k
I0425 12:22:43.354000 135043092322112 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0425 12:22:43.354021 135043092322112 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0425 12:22:43.354042 135043092322112 train_distill.py:176] Vocab Size: 32000
I0425 12:22:43.354061 135043092322112 train_distill.py:177] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 12:22:43.354082 135043092322112 train_distill.py:463] Initializing model: gpt3-52k...
I0425 12:22:44.343260 135043092322112 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:22:44.343417 135043092322112 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ad173554410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:22:44.343477 135043092322112 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0425 12:22:44.844952 135043092322112 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0425 12:22:45.375808 1943 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0425 12:22:46.645027 135043092322112 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0425 12:22:48.777385 135043092322112 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0425 12:22:48.777755 135043092322112 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0425 12:22:50.676408 135043092322112 checkpointer.py:318] Finished restoring checkpoint in 4.41 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0425 12:22:51.401952 135043092322112 train_distill.py:638] Initializing Data Iterators via MaxText pipeline...
I0425 12:22:51.464447 135043092322112 config.py:112] TensorFlow version 2.20.0 available.
I0425 12:22:51.464941 135043092322112 config.py:125] JAX version 0.9.2 available.
I0425 12:22:51.859261 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0425 12:22:51.867042 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 12:22:51.876678 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 12:22:51.984472 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 12:22:52.289181 135043092322112 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 12:22:52.401256 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0425 12:22:52.520518 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0425 12:22:52.702140 135043092322112 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0425 12:22:52.810872 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0425 12:22:52.922900 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0425 12:22:53.061565 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0425 12:22:53.226361 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:22:53.343992 135043092322112 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:22:53.454885 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 12:22:53.558966 135043092322112 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0425 12:22:53.655039 135043092322112 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0425 12:22:53.655261 135043092322112 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0425 12:22:53.658295 135043092322112 train_distill.py:408] Input Pipeline Checkpointing: DISABLED
I0425 12:22:53.658356 135043092322112 train_distill.py:412] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0425 12:22:53.658424 135043092322112 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:22:53.658501 135043092322112 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ad173554410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:22:53.658542 135043092322112 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:22:53.658574 135043092322112 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ad173554410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:22:53.658616 135043092322112 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9d8c3200>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9dbd42f0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab5ec1374a0>}, handler_registry=None
I0425 12:22:53.658813 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9d8c3200>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:22:53.658856 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9dbd42f0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:22:53.658883 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab5ec1374a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:22:53.658908 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab8f870e990>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:22:53.658938 135043092322112 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9d8c3200>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9d8c3200>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9dbd42f0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7acb9dbd42f0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab5ec1374a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab5ec1374a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab8f870e990>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab8f870e990>}).
I0425 12:22:53.659343 135043092322112 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ab50c4cb740> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 12:22:55.343192 135043092322112 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints
I0425 12:22:55.358112 135043092322112 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7ab5ec137470>
I0425 12:22:55.358223 135043092322112 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:22:55.358294 135043092322112 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ad173554410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:22:55.358332 135043092322112 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:22:55.358363 135043092322112 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7ad173554410>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:22:55.358398 135043092322112 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0425 12:22:55.358449 135043092322112 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135043092322112 count=1 at 0x7ab50c4e2900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ab5ec137260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ab5ec137230>, _write_futures=[])
I0425 12:22:55.358804 135043092322112 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135043092322112 count=1 at 0x7ab50c4e2900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ab5ec137260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ab5ec137230>, _write_futures=[])
I0425 12:22:55.358832 135043092322112 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135043092322112 count=1 at 0x7ab50c4e2900>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7ab5ec137260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7ab5ec137230>, _write_futures=[])
I0425 12:22:55.358863 135043092322112 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec137440>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec136ea0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380cf740>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ab9380cf8c0>}, handler_registry=None
I0425 12:22:55.358961 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec137440>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:22:55.358994 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec136ea0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:22:55.359019 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380cf740>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:22:55.359051 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ab9380cf8c0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0425 12:22:55.359073 135043092322112 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380ce6c0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:22:55.359116 135043092322112 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec137440>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec137440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec136ea0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7ab5ec136ea0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380cf740>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380cf740>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ab9380cf8c0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7ab9380cf8c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380ce6c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7ab9380ce6c0>}).
I0425 12:22:55.359190 135043092322112 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7ab50c4cb880> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 12:22:55.740898 135043092322112 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints
I0425 12:22:56.179495 135043092322112 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_linen_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7ab5ec136870>
I0425 12:22:56.180088 135043092322112 train_distill.py:689] Starting Distillation Training...
I0425 12:22:56.180223 135043092322112 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0425 12:22:56.641991 135043092322112 peft_trainer.py:594] Compiled train_step cache size: 0
I0425 12:22:56.643623 134889974241024 grain_pool.py:367] Grain pool will use 1 processes.
I0425 12:22:56.702237 134889974241024 grain_pool.py:440] Grain pool will start child processes.
I0425 12:22:56.707898 134889974241024 grain_pool.py:448] Grain pool started all child processes.
2026-04-25 12:23:03.241574: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 793, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 789, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 691, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding.
Got value with devices {TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0)}
I0425 12:23:07.360229 134889974241024 grain_pool.py:542] Grain pool is exiting.
I0425 12:23:07.360344 134889974241024 grain_pool.py:547] Shutting down multiprocessing system.
I0425 12:23:09.066293 134889974241024 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 12:23:18 UTC 2026 EXIT_CODE=1
XPK Start: Sat Apr 25 12:36:18 UTC 2026
2026-04-25 12:36:36.433330: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
I0425 12:36:42.824079 132429263365952 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 12:36:51.864138 132429263365952 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 12:36:51.866536 132429263365952 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-8eijd-slice-job-0-0.mt-07-distill-smoke-8eijd:8482
I0425 12:36:53.027882 132429263365952 max_utils.py:284] Jax distributed system initialized!
I0425 12:36:59.240904 132429263365952 max_utils.py:244] Jax distributed system is already initialized.
W0425 12:36:59.373199 132429263365952 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 12:36:59.434777 132429263365952 max_utils.py:244] Jax distributed system is already initialized.
I0425 12:36:59.436007 132429263365952 pyconfig.py:471] Config param abort_on_inf_loss: True
I0425 12:36:59.436057 132429263365952 pyconfig.py:471] Config param abort_on_nan_loss: True
I0425 12:36:59.436082 132429263365952 pyconfig.py:471] Config param act_quantization_calibration_method: absmax
I0425 12:36:59.436117 132429263365952 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0
I0425 12:36:59.436139 132429263365952 pyconfig.py:471] Config param activation_function_for_audio: gelu
I0425 12:36:59.436159 132429263365952 pyconfig.py:471] Config param activations_in_float32: False
I0425 12:36:59.436177 132429263365952 pyconfig.py:471] Config param adam_b1: 0.9
I0425 12:36:59.436198 132429263365952 pyconfig.py:471] Config param adam_b2: 0.95
I0425 12:36:59.436217 132429263365952 pyconfig.py:471] Config param adam_eps: 1e-08
I0425 12:36:59.436238 132429263365952 pyconfig.py:471] Config param adam_eps_root: 0.0
I0425 12:36:59.436254 132429263365952 pyconfig.py:471] Config param adam_weight_decay: 0.1
I0425 12:36:59.436272 132429263365952 pyconfig.py:471] Config param adamw_mask: []
I0425 12:36:59.436287 132429263365952 pyconfig.py:471] Config param add_bos: True
I0425 12:36:59.436304 132429263365952 pyconfig.py:471] Config param add_eos: True
I0425 12:36:59.436319 132429263365952 pyconfig.py:471] Config param allow_split_physical_axes: False
I0425 12:36:59.436335 132429263365952 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3
I0425 12:36:59.436352 132429263365952 pyconfig.py:471] Config param async_checkpointing: True
I0425 12:36:59.436368 132429263365952 pyconfig.py:471] Config param async_scheduling: False
I0425 12:36:59.436383 132429263365952 pyconfig.py:471] Config param attention: dot_product
I0425 12:36:59.436400 132429263365952 pyconfig.py:471] Config param attention_bias: False
I0425 12:36:59.436418 132429263365952 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0
I0425 12:36:59.436435 132429263365952 pyconfig.py:471] Config param attention_out: RematLocation.REMAT
I0425 12:36:59.436455 132429263365952 pyconfig.py:471] Config param attention_output_dim: -1
I0425 12:36:59.436470 132429263365952 pyconfig.py:471] Config param attention_sink: False
I0425 12:36:59.436492 132429263365952 pyconfig.py:471] Config param attention_type: global
I0425 12:36:59.436509 132429263365952 pyconfig.py:471] Config param attn_logits_soft_cap: None
I0425 12:36:59.436525 132429263365952 pyconfig.py:471] Config param audio_path:
I0425 12:36:59.436540 132429263365952 pyconfig.py:471] Config param audio_placeholder: <|audio|>
I0425 12:36:59.436556 132429263365952 pyconfig.py:471] Config param autoregressive_decode_assert:
I0425 12:36:59.436573 132429263365952 pyconfig.py:471] Config param base_config: base.yml
I0425 12:36:59.436589 132429263365952 pyconfig.py:471] Config param base_emb_dim: 16
I0425 12:36:59.436604 132429263365952 pyconfig.py:471] Config param base_mlp_dim: 64
I0425 12:36:59.436619 132429263365952 pyconfig.py:471] Config param base_moe_mlp_dim: -1
I0425 12:36:59.436634 132429263365952 pyconfig.py:471] Config param base_num_decoder_layers: 1
I0425 12:36:59.436650 132429263365952 pyconfig.py:471] Config param base_num_kv_heads: 2
I0425 12:36:59.436667 132429263365952 pyconfig.py:471] Config param base_num_query_heads: 2
I0425 12:36:59.436681 132429263365952 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output
I0425 12:36:59.436697 132429263365952 pyconfig.py:471] Config param batch_size: 1
I0425 12:36:59.436713 132429263365952 pyconfig.py:471] Config param batch_split_factor: 1
I0425 12:36:59.436728 132429263365952 pyconfig.py:471] Config param beta_fast: 32
I0425 12:36:59.436744 132429263365952 pyconfig.py:471] Config param beta_slow: 1
I0425 12:36:59.436761 132429263365952 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax
I0425 12:36:59.436778 132429263365952 pyconfig.py:471] Config param capacity_factor: -1.0
I0425 12:36:59.436795 132429263365952 pyconfig.py:471] Config param cast_logits_to_fp32: True
I0425 12:36:59.436810 132429263365952 pyconfig.py:471] Config param chat_template:
I0425 12:36:59.436828 132429263365952 pyconfig.py:471] Config param chat_template_path:
I0425 12:36:59.436846 132429263365952 pyconfig.py:471] Config param checkpoint_conversion_fn: None
I0425 12:36:59.436865 132429263365952 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/checkpoints/
I0425 12:36:59.436882 132429263365952 pyconfig.py:471] Config param checkpoint_is_quantized: False
I0425 12:36:59.436897 132429263365952 pyconfig.py:471] Config param checkpoint_period: 2000
I0425 12:36:59.436913 132429263365952 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96
I0425 12:36:59.436928 132429263365952 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648
I0425 12:36:59.436945 132429263365952 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True
I0425 12:36:59.436960 132429263365952 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True
I0425 12:36:59.436975 132429263365952 pyconfig.py:471] Config param checkpoint_todelete_full_path: None
I0425 12:36:59.436990 132429263365952 pyconfig.py:471] Config param checkpoint_todelete_subdir: None
I0425 12:36:59.437004 132429263365952 pyconfig.py:471] Config param chips_per_vm: 4
I0425 12:36:59.437020 132429263365952 pyconfig.py:471] Config param chunk_attn_window_size: 0
I0425 12:36:59.437035 132429263365952 pyconfig.py:471] Config param collect_stack_trace: False
I0425 12:36:59.437050 132429263365952 pyconfig.py:471] Config param colocated_python_checkpointing: False
I0425 12:36:59.437066 132429263365952 pyconfig.py:471] Config param colocated_python_data_input: False
I0425 12:36:59.437081 132429263365952 pyconfig.py:471] Config param compile_topology:
I0425 12:36:59.437105 132429263365952 pyconfig.py:471] Config param compile_topology_num_slices: -1
I0425 12:36:59.437120 132429263365952 pyconfig.py:471] Config param compile_xla_flags:
I0425 12:36:59.437136 132429263365952 pyconfig.py:471] Config param compiled_trainstep_file:
I0425 12:36:59.437151 132429263365952 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3
I0425 12:36:59.437167 132429263365952 pyconfig.py:471] Config param constant_bound_config: []
I0425 12:36:59.437182 132429263365952 pyconfig.py:471] Config param context: RematLocation.REMAT
I0425 12:36:59.437198 132429263365952 pyconfig.py:471] Config param context_parallel_load_balance: True
I0425 12:36:59.437214 132429263365952 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO
I0425 12:36:59.437232 132429263365952 pyconfig.py:471] Config param context_parallel_size: 1
I0425 12:36:59.437246 132429263365952 pyconfig.py:471] Config param context_parallel_strategy: all_gather
I0425 12:36:59.437262 132429263365952 pyconfig.py:471] Config param context_sharding: context
I0425 12:36:59.437278 132429263365952 pyconfig.py:471] Config param conv_chunksize_for_audio: 500
I0425 12:36:59.437293 132429263365952 pyconfig.py:471] Config param conv_stride_for_vit: 14
I0425 12:36:59.437308 132429263365952 pyconfig.py:471] Config param convert_checkpoint_if_possible: False
I0425 12:36:59.437323 132429263365952 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1
I0425 12:36:59.437340 132429263365952 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1
I0425 12:36:59.437354 132429263365952 pyconfig.py:471] Config param custom_mesh:
I0425 12:36:59.437370 132429263365952 pyconfig.py:471] Config param custom_mesh_and_rule:
I0425 12:36:59.437386 132429263365952 pyconfig.py:471] Config param d_model_for_audio: 256
I0425 12:36:59.437400 132429263365952 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),)
I0425 12:36:59.437420 132429263365952 pyconfig.py:471] Config param data_shuffle_seed: 0
I0425 12:36:59.437435 132429263365952 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1
I0425 12:36:59.437452 132429263365952 pyconfig.py:471] Config param dataset_path:
I0425 12:36:59.437467 132429263365952 pyconfig.py:471] Config param dataset_type: DatasetType.HF
I0425 12:36:59.437484 132429263365952 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1
I0425 12:36:59.437505 132429263365952 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1
I0425 12:36:59.437521 132429263365952 pyconfig.py:471] Config param dcn_context_parallelism: 1
I0425 12:36:59.437536 132429263365952 pyconfig.py:471] Config param dcn_data_parallelism: -1
I0425 12:36:59.437551 132429263365952 pyconfig.py:471] Config param dcn_diloco_parallelism: 1
I0425 12:36:59.437565 132429263365952 pyconfig.py:471] Config param dcn_expert_parallelism: 1
I0425 12:36:59.437581 132429263365952 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1
I0425 12:36:59.437595 132429263365952 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1
I0425 12:36:59.437611 132429263365952 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 12:36:59.437628 132429263365952 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1
I0425 12:36:59.437643 132429263365952 pyconfig.py:471] Config param dcn_sequence_parallelism: 1
I0425 12:36:59.437658 132429263365952 pyconfig.py:471] Config param dcn_tensor_parallelism: 1
I0425 12:36:59.437672 132429263365952 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1
I0425 12:36:59.437688 132429263365952 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1
I0425 12:36:59.437702 132429263365952 pyconfig.py:471] Config param debug: {'rl': False}
I0425 12:36:59.437719 132429263365952 pyconfig.py:471] Config param debug_sharding: False
I0425 12:36:59.437735 132429263365952 pyconfig.py:471] Config param decode_sampling_nucleus_p: -1
I0425 12:36:59.437750 132429263365952 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY
I0425 12:36:59.437767 132429263365952 pyconfig.py:471] Config param decode_sampling_temperature: 1.0
I0425 12:36:59.437783 132429263365952 pyconfig.py:471] Config param decode_sampling_top_k: 0
I0425 12:36:59.437799 132429263365952 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3
I0425 12:36:59.437815 132429263365952 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE
I0425 12:36:59.437830 132429263365952 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: []
I0425 12:36:59.437845 132429263365952 pyconfig.py:471] Config param degenerate_group_masking: True
I0425 12:36:59.437861 132429263365952 pyconfig.py:471] Config param dense_init_scale: 1.0
I0425 12:36:59.437878 132429263365952 pyconfig.py:471] Config param diloco_outer_lr: 0.3
I0425 12:36:59.437894 132429263365952 pyconfig.py:471] Config param diloco_outer_momentum: 0.9
I0425 12:36:59.437910 132429263365952 pyconfig.py:471] Config param diloco_sync_period: 36
I0425 12:36:59.437925 132429263365952 pyconfig.py:471] Config param distill_alpha: 0.5
I0425 12:36:59.437944 132429263365952 pyconfig.py:471] Config param distill_alpha_end: None
I0425 12:36:59.437960 132429263365952 pyconfig.py:471] Config param distill_alpha_schedule: constant
I0425 12:36:59.437976 132429263365952 pyconfig.py:471] Config param distill_beta: 0.0
I0425 12:36:59.437993 132429263365952 pyconfig.py:471] Config param distill_beta_end: None
I0425 12:36:59.438008 132429263365952 pyconfig.py:471] Config param distill_beta_schedule: constant
I0425 12:36:59.438024 132429263365952 pyconfig.py:471] Config param distill_feature_loss_type: cosine
I0425 12:36:59.438040 132429263365952 pyconfig.py:471] Config param distill_layer_indices: None
I0425 12:36:59.438054 132429263365952 pyconfig.py:471] Config param distill_temperature: 1.0
I0425 12:36:59.438070 132429263365952 pyconfig.py:471] Config param distill_temperature_end: None
I0425 12:36:59.438085 132429263365952 pyconfig.py:471] Config param distill_temperature_schedule: constant
I0425 12:36:59.438109 132429263365952 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256
I0425 12:36:59.438124 132429263365952 pyconfig.py:471] Config param dpo_beta: 0.1
I0425 12:36:59.438140 132429263365952 pyconfig.py:471] Config param dpo_label_smoothing: 0.0
I0425 12:36:59.438156 132429263365952 pyconfig.py:471] Config param dq_reduction_steps: 0
I0425 12:36:59.438171 132429263365952 pyconfig.py:471] Config param dropout_rate: 0.0
I0425 12:36:59.438186 132429263365952 pyconfig.py:471] Config param dtype: bfloat16
I0425 12:36:59.438217 132429263365952 pyconfig.py:471] Config param dtype_mm: float32
I0425 12:36:59.438233 132429263365952 pyconfig.py:471] Config param dump_hlo: False
I0425 12:36:59.438249 132429263365952 pyconfig.py:471] Config param dump_hlo_delete_local_after: True
I0425 12:36:59.438265 132429263365952 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/xla_dump
I0425 12:36:59.438281 132429263365952 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/
I0425 12:36:59.438297 132429263365952 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step
I0425 12:36:59.438313 132429263365952 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step
I0425 12:36:59.438329 132429263365952 pyconfig.py:471] Config param dump_hlo_upload_all: False
I0425 12:36:59.438346 132429263365952 pyconfig.py:471] Config param dump_hlo_xla_flags:
I0425 12:36:59.438360 132429263365952 pyconfig.py:471] Config param dump_jaxpr: False
I0425 12:36:59.438376 132429263365952 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True
I0425 12:36:59.438391 132429263365952 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/jaxpr_dump
I0425 12:36:59.438407 132429263365952 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/
I0425 12:36:59.438423 132429263365952 pyconfig.py:471] Config param dump_step: -1
I0425 12:36:59.438439 132429263365952 pyconfig.py:471] Config param elastic_enabled: False
I0425 12:36:59.438455 132429263365952 pyconfig.py:471] Config param elastic_max_retries: 10
I0425 12:36:59.438471 132429263365952 pyconfig.py:471] Config param elastic_timeout_seconds: 300
I0425 12:36:59.438486 132429263365952 pyconfig.py:471] Config param emb_dim: 16
I0425 12:36:59.438506 132429263365952 pyconfig.py:471] Config param enable_autocheckpoint: False
I0425 12:36:59.438522 132429263365952 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False
I0425 12:36:59.438537 132429263365952 pyconfig.py:471] Config param enable_checkpointing: True
I0425 12:36:59.438552 132429263365952 pyconfig.py:471] Config param enable_continuous_checkpointing: False
I0425 12:36:59.438567 132429263365952 pyconfig.py:471] Config param enable_data_shuffling: True
I0425 12:36:59.438583 132429263365952 pyconfig.py:471] Config param enable_diloco: False
I0425 12:36:59.438599 132429263365952 pyconfig.py:471] Config param enable_dp_attention: False
I0425 12:36:59.438615 132429263365952 pyconfig.py:471] Config param enable_dropout: False
I0425 12:36:59.438630 132429263365952 pyconfig.py:471] Config param enable_emergency_checkpoint: False
I0425 12:36:59.438645 132429263365952 pyconfig.py:471] Config param enable_expert_parallel: False
I0425 12:36:59.438661 132429263365952 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True
I0425 12:36:59.438675 132429263365952 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True
I0425 12:36:59.438691 132429263365952 pyconfig.py:471] Config param enable_goodput_recording: False
I0425 12:36:59.438706 132429263365952 pyconfig.py:471] Config param enable_jax_profiler: False
I0425 12:36:59.438722 132429263365952 pyconfig.py:471] Config param enable_llm_inference_pool: False
I0425 12:36:59.438737 132429263365952 pyconfig.py:471] Config param enable_model_warmup: False
I0425 12:36:59.438753 132429263365952 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False
I0425 12:36:59.438769 132429263365952 pyconfig.py:471] Config param enable_nnx: False
I0425 12:36:59.438783 132429263365952 pyconfig.py:471] Config param enable_orbax_v1: False
I0425 12:36:59.438799 132429263365952 pyconfig.py:471] Config param enable_padding_causal_mask: True
I0425 12:36:59.438813 132429263365952 pyconfig.py:471] Config param enable_pathways_goodput: False
I0425 12:36:59.438829 132429263365952 pyconfig.py:471] Config param enable_prefix_caching: False
I0425 12:36:59.438844 132429263365952 pyconfig.py:471] Config param enable_rampup_batch_size: False
I0425 12:36:59.438859 132429263365952 pyconfig.py:471] Config param enable_single_controller: False
I0425 12:36:59.438874 132429263365952 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False
I0425 12:36:59.438889 132429263365952 pyconfig.py:471] Config param enable_tensorboard: True
I0425 12:36:59.438903 132429263365952 pyconfig.py:471] Config param enable_tunix_perf_metrics: False
I0425 12:36:59.438919 132429263365952 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4
I0425 12:36:59.438934 132429263365952 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512
I0425 12:36:59.438949 132429263365952 pyconfig.py:471] Config param encoder_layers_for_audio: 2
I0425 12:36:59.438964 132429263365952 pyconfig.py:471] Config param engram: RematLocation.REMAT
I0425 12:36:59.438979 132429263365952 pyconfig.py:471] Config param engram_head_dim: 1280
I0425 12:36:59.438995 132429263365952 pyconfig.py:471] Config param engram_kernel_size: 4
I0425 12:36:59.439010 132429263365952 pyconfig.py:471] Config param engram_layers: []
I0425 12:36:59.439025 132429263365952 pyconfig.py:471] Config param engram_max_ngram_size: 3
I0425 12:36:59.439040 132429263365952 pyconfig.py:471] Config param engram_num_heads: 8
I0425 12:36:59.439055 132429263365952 pyconfig.py:471] Config param engram_seed: 0
I0425 12:36:59.439071 132429263365952 pyconfig.py:471] Config param engram_vocab_bases: []
I0425 12:36:59.439087 132429263365952 pyconfig.py:471] Config param epsilon_high: None
I0425 12:36:59.439110 132429263365952 pyconfig.py:471] Config param eval_corr_lst: False
I0425 12:36:59.439127 132429263365952 pyconfig.py:471] Config param eval_data_columns: ['text']
I0425 12:36:59.439143 132429263365952 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1
I0425 12:36:59.439159 132429263365952 pyconfig.py:471] Config param eval_image_column: image
I0425 12:36:59.439175 132429263365952 pyconfig.py:471] Config param eval_interval: -1
I0425 12:36:59.439189 132429263365952 pyconfig.py:471] Config param eval_make_lst: False
I0425 12:36:59.439205 132429263365952 pyconfig.py:471] Config param eval_per_device_batch_size: 2
I0425 12:36:59.439221 132429263365952 pyconfig.py:471] Config param eval_sampling_strategy: greedy
I0425 12:36:59.439238 132429263365952 pyconfig.py:471] Config param eval_split: validation
I0425 12:36:59.439254 132429263365952 pyconfig.py:471] Config param eval_steps: -1
I0425 12:36:59.439268 132429263365952 pyconfig.py:471] Config param expansion_factor_real_data: -1.0
I0425 12:36:59.439284 132429263365952 pyconfig.py:471] Config param final_logits_soft_cap: None
I0425 12:36:59.439300 132429263365952 pyconfig.py:471] Config param first_num_dense_layers: 0
I0425 12:36:59.439314 132429263365952 pyconfig.py:471] Config param float32_gate_logits: False
I0425 12:36:59.439330 132429263365952 pyconfig.py:471] Config param float32_logits: False
I0425 12:36:59.439345 132429263365952 pyconfig.py:471] Config param float32_qk_product: False
I0425 12:36:59.439360 132429263365952 pyconfig.py:471] Config param float32_weight_sum: True
I0425 12:36:59.439375 132429263365952 pyconfig.py:471] Config param force_q_layout: False
I0425 12:36:59.439390 132429263365952 pyconfig.py:471] Config param force_unroll: False
I0425 12:36:59.439406 132429263365952 pyconfig.py:471] Config param formatting_func_kwargs: {}
I0425 12:36:59.439422 132429263365952 pyconfig.py:471] Config param formatting_func_path:
I0425 12:36:59.439437 132429263365952 pyconfig.py:471] Config param freeze_audio_encoder_params: True
I0425 12:36:59.439452 132429263365952 pyconfig.py:471] Config param freeze_vision_encoder_params: True
I0425 12:36:59.439467 132429263365952 pyconfig.py:471] Config param fused_mlp: False
I0425 12:36:59.439482 132429263365952 pyconfig.py:471] Config param fused_qkv: True
I0425 12:36:59.439501 132429263365952 pyconfig.py:471] Config param gcs_metrics: False
I0425 12:36:59.439516 132429263365952 pyconfig.py:471] Config param gdn_chunk_size: 64
I0425 12:36:59.439531 132429263365952 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4
I0425 12:36:59.439547 132429263365952 pyconfig.py:471] Config param gdn_key_head_dim: 128
I0425 12:36:59.439562 132429263365952 pyconfig.py:471] Config param gdn_num_key_heads: 16
I0425 12:36:59.439577 132429263365952 pyconfig.py:471] Config param gdn_num_value_heads: 32
I0425 12:36:59.439593 132429263365952 pyconfig.py:471] Config param gdn_value_head_dim: 128
I0425 12:36:59.439610 132429263365952 pyconfig.py:471] Config param generate_padding_batch_eval: False
I0425 12:36:59.439624 132429263365952 pyconfig.py:471] Config param generate_padding_batch_train: False
I0425 12:36:59.439639 132429263365952 pyconfig.py:471] Config param generate_slice: v5e-16
I0425 12:36:59.439653 132429263365952 pyconfig.py:471] Config param generation_configs: {}
I0425 12:36:59.439669 132429263365952 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64
I0425 12:36:59.439686 132429263365952 pyconfig.py:471] Config param global_batch_size_to_load: 512
I0425 12:36:59.439700 132429263365952 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64
I0425 12:36:59.439716 132429263365952 pyconfig.py:471] Config param global_batch_size_to_load_increment: None
I0425 12:36:59.439730 132429263365952 pyconfig.py:471] Config param global_batch_size_to_load_start: None
I0425 12:36:59.439745 132429263365952 pyconfig.py:471] Config param global_batch_size_to_train_on: 512
I0425 12:36:59.439760 132429263365952 pyconfig.py:471] Config param global_head_dim: 0
I0425 12:36:59.439776 132429263365952 pyconfig.py:471] Config param global_num_kv_heads: 0
I0425 12:36:59.439791 132429263365952 pyconfig.py:471] Config param global_parameter_scale: 1
I0425 12:36:59.439807 132429263365952 pyconfig.py:471] Config param global_rampup_samples: 500
I0425 12:36:59.439822 132429263365952 pyconfig.py:471] Config param global_rope_max_timescale: -1
I0425 12:36:59.439837 132429263365952 pyconfig.py:471] Config param global_rope_proportion: 0.25
I0425 12:36:59.439852 132429263365952 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30
I0425 12:36:59.439868 132429263365952 pyconfig.py:471] Config param grad_dtype: float32
I0425 12:36:59.439903 132429263365952 pyconfig.py:471] Config param gradient_accumulation_steps: 8
I0425 12:36:59.439918 132429263365952 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0
I0425 12:36:59.439934 132429263365952 pyconfig.py:471] Config param grain_data_source_max_workers: 16
I0425 12:36:59.439949 132429263365952 pyconfig.py:471] Config param grain_eval_files:
I0425 12:36:59.439965 132429263365952 pyconfig.py:471] Config param grain_file_type: arrayrecord
I0425 12:36:59.439979 132429263365952 pyconfig.py:471] Config param grain_num_threads: 16
I0425 12:36:59.439994 132429263365952 pyconfig.py:471] Config param grain_num_threads_eval: 16
I0425 12:36:59.440008 132429263365952 pyconfig.py:471] Config param grain_packing_type: first_fit
I0425 12:36:59.440024 132429263365952 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1
I0425 12:36:59.440038 132429263365952 pyconfig.py:471] Config param grain_per_worker_buffer_size_eval: 1
I0425 12:36:59.440053 132429263365952 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500
I0425 12:36:59.440067 132429263365952 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500
I0425 12:36:59.440083 132429263365952 pyconfig.py:471] Config param grain_ram_budget_mb: 1024
I0425 12:36:59.440107 132429263365952 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100
I0425 12:36:59.440123 132429263365952 pyconfig.py:471] Config param grain_train_files:
I0425 12:36:59.440139 132429263365952 pyconfig.py:471] Config param grain_train_mixture_config_path:
I0425 12:36:59.440153 132429263365952 pyconfig.py:471] Config param grain_worker_count: 1
I0425 12:36:59.440169 132429263365952 pyconfig.py:471] Config param grain_worker_count_eval: 1
I0425 12:36:59.440183 132429263365952 pyconfig.py:471] Config param grpo_beta: 0.08
I0425 12:36:59.440199 132429263365952 pyconfig.py:471] Config param grpo_epsilon: 0.2
I0425 12:36:59.440215 132429263365952 pyconfig.py:471] Config param hardware: tpu
I0425 12:36:59.440230 132429263365952 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72
I0425 12:36:59.440245 132429263365952 pyconfig.py:471] Config param head_dim: 8
I0425 12:36:59.440264 132429263365952 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5
I0425 12:36:59.440280 132429263365952 pyconfig.py:471] Config param hf_data_dir: None
I0425 12:36:59.440295 132429263365952 pyconfig.py:471] Config param hf_eval_files: None
I0425 12:36:59.440309 132429263365952 pyconfig.py:471] Config param hf_eval_split: None
I0425 12:36:59.440325 132429263365952 pyconfig.py:471] Config param hf_name: None
I0425 12:36:59.440339 132429263365952 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix
I0425 12:36:59.440355 132429263365952 pyconfig.py:471] Config param hf_train_files: None
I0425 12:36:59.440369 132429263365952 pyconfig.py:471] Config param hidden_size_for_vit: 1408
I0425 12:36:59.440385 132429263365952 pyconfig.py:471] Config param hide_profiler_step_metric: False
I0425 12:36:59.440399 132429263365952 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1
I0425 12:36:59.440415 132429263365952 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1
I0425 12:36:59.440429 132429263365952 pyconfig.py:471] Config param ici_context_parallelism: 1
I0425 12:36:59.440445 132429263365952 pyconfig.py:471] Config param ici_data_parallelism: 1
I0425 12:36:59.440459 132429263365952 pyconfig.py:471] Config param ici_diloco_parallelism: 1
I0425 12:36:59.440475 132429263365952 pyconfig.py:471] Config param ici_expert_parallelism: 1
I0425 12:36:59.440496 132429263365952 pyconfig.py:471] Config param ici_fsdp_parallelism: -1
I0425 12:36:59.440510 132429263365952 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1
I0425 12:36:59.440526 132429263365952 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
I0425 12:36:59.440541 132429263365952 pyconfig.py:471] Config param ici_pipeline_parallelism: 1
I0425 12:36:59.440557 132429263365952 pyconfig.py:471] Config param ici_sequence_parallelism: 1
I0425 12:36:59.440571 132429263365952 pyconfig.py:471] Config param ici_tensor_parallelism: 1
I0425 12:36:59.440586 132429263365952 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1
I0425 12:36:59.440600 132429263365952 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1
I0425 12:36:59.440616 132429263365952 pyconfig.py:471] Config param image_path:
I0425 12:36:59.440630 132429263365952 pyconfig.py:471] Config param image_placeholder: <|image|>
I0425 12:36:59.440645 132429263365952 pyconfig.py:471] Config param image_size_for_vit: 896
I0425 12:36:59.440659 132429263365952 pyconfig.py:471] Config param indexer_head_dim: 128
I0425 12:36:59.440675 132429263365952 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0
I0425 12:36:59.440689 132429263365952 pyconfig.py:471]
Config param indexer_n_heads: 64 I0425 12:36:59.440705 132429263365952 pyconfig.py:471] Config param indexer_sparse_training: False I0425 12:36:59.440719 132429263365952 pyconfig.py:471] Config param indexer_topk: 2048 I0425 12:36:59.440735 132429263365952 pyconfig.py:471] Config param inference_benchmark_test: False I0425 12:36:59.440749 132429263365952 pyconfig.py:471] Config param inference_metadata_file: I0425 12:36:59.440764 132429263365952 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0425 12:36:59.440778 132429263365952 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0425 12:36:59.440794 132429263365952 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0425 12:36:59.440811 132429263365952 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0425 12:36:59.440827 132429263365952 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0425 12:36:59.440843 132429263365952 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0425 12:36:59.440859 132429263365952 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0425 12:36:59.440873 132429263365952 pyconfig.py:471] Config param init_weights_seed: 0 I0425 12:36:59.440889 132429263365952 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0425 12:36:59.440904 132429263365952 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0425 12:36:59.440920 132429263365952 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0425 12:36:59.440934 132429263365952 pyconfig.py:471] Config param internal_compile: False I0425 12:36:59.440950 132429263365952 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0425 12:36:59.440967 132429263365952 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0425 12:36:59.440983 132429263365952 
pyconfig.py:471] Config param jax_debug_log_modules: I0425 12:36:59.440998 132429263365952 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0425 12:36:59.441013 132429263365952 pyconfig.py:471] Config param jax_profiler_port: 9999 I0425 12:36:59.441027 132429263365952 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0425 12:36:59.441044 132429263365952 pyconfig.py:471] Config param kv_cache_buffer: 256 I0425 12:36:59.441058 132429263365952 pyconfig.py:471] Config param kv_lora_rank: 512 I0425 12:36:59.441074 132429263365952 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0425 12:36:59.441091 132429263365952 pyconfig.py:471] Config param kv_quant_dtype: int8 I0425 12:36:59.441116 132429263365952 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0425 12:36:59.441133 132429263365952 pyconfig.py:471] Config param learning_rate: 0.0002 I0425 12:36:59.441147 132429263365952 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0425 12:36:59.441164 132429263365952 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0425 12:36:59.441178 132429263365952 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0425 12:36:59.441194 132429263365952 pyconfig.py:471] Config param load_checkpoint_only_once: False I0425 12:36:59.441208 132429263365952 pyconfig.py:471] Config param load_from_prefill_dir: False I0425 12:36:59.441224 132429263365952 pyconfig.py:471] Config param load_full_state_path: I0425 12:36:59.441238 132429263365952 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 12:36:59.441254 132429263365952 pyconfig.py:471] Config param local_checkpoint_directory: I0425 12:36:59.441268 132429263365952 pyconfig.py:471] Config param local_checkpoint_period: 0 I0425 12:36:59.441283 132429263365952 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0425 
12:36:59.441298 132429263365952 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0425 12:36:59.441313 132429263365952 pyconfig.py:471] Config param log_config: True I0425 12:36:59.441329 132429263365952 pyconfig.py:471] Config param log_period: 10 I0425 12:36:59.441344 132429263365952 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 
'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0425 12:36:59.441419 132429263365952 pyconfig.py:471] Config param logits_dot_in_fp32: False I0425 12:36:59.441435 132429263365952 pyconfig.py:471] Config param logits_via_embedding: True I0425 12:36:59.441451 132429263365952 pyconfig.py:471] Config param lora_input_adapters_path: I0425 12:36:59.441466 132429263365952 pyconfig.py:471] Config param loss_algo: grpo I0425 12:36:59.441482 132429263365952 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0425 12:36:59.441503 132429263365952 pyconfig.py:471] Config param managed_mldiagnostics: False I0425 12:36:59.441519 132429263365952 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/managed-mldiagnostics I0425 12:36:59.441533 132429263365952 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0425 12:36:59.441549 132429263365952 
pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0425 12:36:59.441567 132429263365952 pyconfig.py:471] Config param max_checkify: False I0425 12:36:59.441583 132429263365952 pyconfig.py:471] Config param max_concurrency: 256 I0425 12:36:59.441598 132429263365952 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0425 12:36:59.441614 132429263365952 pyconfig.py:471] Config param max_num_batched_tokens: None I0425 12:36:59.441628 132429263365952 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0425 12:36:59.441643 132429263365952 pyconfig.py:471] Config param max_num_images_per_example: -1 I0425 12:36:59.441658 132429263365952 pyconfig.py:471] Config param max_num_seqs: None I0425 12:36:59.441673 132429263365952 pyconfig.py:471] Config param max_position_embeddings: 163840 I0425 12:36:59.441689 132429263365952 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0425 12:36:59.441703 132429263365952 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0425 12:36:59.441719 132429263365952 pyconfig.py:471] Config param max_segments_per_seq: -1 I0425 12:36:59.441733 132429263365952 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0425 12:36:59.441749 132429263365952 pyconfig.py:471] Config param max_target_length: 2048 I0425 12:36:59.441764 132429263365952 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0425 12:36:59.441780 132429263365952 pyconfig.py:471] Config param megablox: True I0425 12:36:59.441795 132429263365952 pyconfig.py:471] Config param merge_gating_gmm: False I0425 12:36:59.441811 132429263365952 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0425 12:36:59.441828 132429263365952 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/metrics/ I0425 
12:36:59.441844 132429263365952 pyconfig.py:471] Config param metrics_file: I0425 12:36:59.441859 132429263365952 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0425 12:36:59.441875 132429263365952 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0425 12:36:59.441889 132429263365952 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0425 12:36:59.441905 132429263365952 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0425 12:36:59.441919 132429263365952 pyconfig.py:471] Config param mla_naive_kvcache: True I0425 12:36:59.441935 132429263365952 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0425 12:36:59.441950 132429263365952 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0425 12:36:59.441965 132429263365952 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0425 12:36:59.441981 132429263365952 pyconfig.py:471] Config param mlp_bias: False I0425 12:36:59.441996 132429263365952 pyconfig.py:471] Config param mlp_dim: 64 I0425 12:36:59.442011 132429263365952 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0425 12:36:59.442026 132429263365952 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0425 12:36:59.442042 132429263365952 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0425 12:36:59.442057 132429263365952 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0425 12:36:59.442073 132429263365952 pyconfig.py:471] Config param moba: False I0425 12:36:59.442089 132429263365952 pyconfig.py:471] Config param moba_chunk_size: 1024 I0425 12:36:59.442114 132429263365952 pyconfig.py:471] Config param moba_topk: 8 I0425 12:36:59.442130 132429263365952 pyconfig.py:471] Config param model_call_mode: I0425 12:36:59.442144 132429263365952 pyconfig.py:471] Config param model_name: gpt3-52k I0425 12:36:59.442160 132429263365952 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0425 12:36:59.442174 132429263365952 pyconfig.py:471] Config param 
moe_fsdp_use_two_stage_all_gather: False I0425 12:36:59.442190 132429263365952 pyconfig.py:471] Config param moe_mlp_dim: -1 I0425 12:36:59.442206 132429263365952 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0425 12:36:59.442221 132429263365952 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0425 12:36:59.442236 132429263365952 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0425 12:36:59.442252 132429263365952 pyconfig.py:471] Config param monitor_goodput: False I0425 12:36:59.442266 132429263365952 pyconfig.py:471] Config param monitor_step_time_deviation: True I0425 12:36:59.442282 132429263365952 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0425 12:36:59.442297 132429263365952 pyconfig.py:471] Config param mscale: 1.0 I0425 12:36:59.442313 132429263365952 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0425 12:36:59.442328 132429263365952 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0425 12:36:59.442343 132429263365952 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0425 12:36:59.442359 132429263365952 pyconfig.py:471] Config param mtp_num_layers: 0 I0425 12:36:59.442375 132429263365952 pyconfig.py:471] Config param mu_dtype: float32 I0425 12:36:59.442399 132429263365952 pyconfig.py:471] Config param multi_sampling: False I0425 12:36:59.442414 132429263365952 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0425 12:36:59.442430 132429263365952 pyconfig.py:471] Config param muon_beta: 0.95 I0425 12:36:59.442447 132429263365952 pyconfig.py:471] Config param muon_consistent_rms: None I0425 12:36:59.442463 132429263365952 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0425 12:36:59.442478 132429263365952 pyconfig.py:471] Config param n_routing_groups: -1 I0425 12:36:59.442499 132429263365952 pyconfig.py:471] Config param n_window_for_audio: 50 I0425 12:36:59.442514 132429263365952 pyconfig.py:471] Config param 
n_window_infer_for_audio: 800 I0425 12:36:59.442530 132429263365952 pyconfig.py:471] Config param nope_layer_interval: -1 I0425 12:36:59.442544 132429263365952 pyconfig.py:471] Config param norm_topk_prob: False I0425 12:36:59.442560 132429263365952 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0425 12:36:59.442578 132429263365952 pyconfig.py:471] Config param normalize_embedding_logits: False I0425 12:36:59.442592 132429263365952 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0425 12:36:59.442608 132429263365952 pyconfig.py:471] Config param num_batches: 4 I0425 12:36:59.442622 132429263365952 pyconfig.py:471] Config param num_channels_for_vit: 3 I0425 12:36:59.442639 132429263365952 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0425 12:36:59.442655 132429263365952 pyconfig.py:471] Config param num_decoder_layers: 1 I0425 12:36:59.442669 132429263365952 pyconfig.py:471] Config param num_diloco_replicas: 1 I0425 12:36:59.442685 132429263365952 pyconfig.py:471] Config param num_epoch: 1 I0425 12:36:59.442700 132429263365952 pyconfig.py:471] Config param num_eval_passes: 1 I0425 12:36:59.442714 132429263365952 pyconfig.py:471] Config param num_experts: 1 I0425 12:36:59.442730 132429263365952 pyconfig.py:471] Config param num_experts_per_tok: 1 I0425 12:36:59.442744 132429263365952 pyconfig.py:471] Config param num_generations: 2 I0425 12:36:59.442760 132429263365952 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0425 12:36:59.442775 132429263365952 pyconfig.py:471] Config param num_iterations: 1 I0425 12:36:59.442790 132429263365952 pyconfig.py:471] Config param num_kv_heads: 2 I0425 12:36:59.442805 132429263365952 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0425 12:36:59.442819 132429263365952 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0425 12:36:59.442835 132429263365952 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0425 12:36:59.442851 132429263365952 
pyconfig.py:471] Config param num_pipeline_repeats: -1 I0425 12:36:59.442867 132429263365952 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0425 12:36:59.442882 132429263365952 pyconfig.py:471] Config param num_query_heads: 2 I0425 12:36:59.442898 132429263365952 pyconfig.py:471] Config param num_samplers_slices: -1 I0425 12:36:59.442912 132429263365952 pyconfig.py:471] Config param num_slices: 1 I0425 12:36:59.442927 132429263365952 pyconfig.py:471] Config param num_target_devices: 32 I0425 12:36:59.442943 132429263365952 pyconfig.py:471] Config param num_test_batches: 5 I0425 12:36:59.442957 132429263365952 pyconfig.py:471] Config param num_trainer_slices: -1 I0425 12:36:59.442973 132429263365952 pyconfig.py:471] Config param num_vocab_tiling: 1 I0425 12:36:59.442989 132429263365952 pyconfig.py:471] Config param off_policy_steps: 0 I0425 12:36:59.443004 132429263365952 pyconfig.py:471] Config param offline_data_dir: None I0425 12:36:59.443020 132429263365952 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0425 12:36:59.443038 132429263365952 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0425 12:36:59.443054 132429263365952 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0425 12:36:59.443070 132429263365952 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0425 12:36:59.443086 132429263365952 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0425 12:36:59.443110 132429263365952 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0425 12:36:59.443126 132429263365952 pyconfig.py:471] Config param output_dim_for_audio: 512 I0425 12:36:59.443141 132429263365952 pyconfig.py:471] Config param override_logical_axis_rules: False I0425 12:36:59.443156 132429263365952 pyconfig.py:471] Config param override_model_config: True I0425 12:36:59.443172 132429263365952 pyconfig.py:471] Config param packing: True I0425 12:36:59.443188 132429263365952 pyconfig.py:471] 
Config param pagedattn_head_dim_alignment: 128 I0425 12:36:59.443202 132429263365952 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0425 12:36:59.443217 132429263365952 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0425 12:36:59.443233 132429263365952 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0425 12:36:59.443247 132429263365952 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0425 12:36:59.443263 132429263365952 pyconfig.py:471] Config param param_scan_axis: 1 I0425 12:36:59.443277 132429263365952 pyconfig.py:471] Config param parameter_memory_host_offload: False I0425 12:36:59.443292 132429263365952 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0425 12:36:59.443307 132429263365952 pyconfig.py:471] Config param patch_size_for_vit: 14 I0425 12:36:59.443323 132429263365952 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0425 12:36:59.443339 132429263365952 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0425 12:36:59.443354 132429263365952 pyconfig.py:471] Config param per_device_batch_size: 2 I0425 12:36:59.443369 132429263365952 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0425 12:36:59.443386 132429263365952 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0425 12:36:59.443401 132429263365952 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0425 12:36:59.443417 132429263365952 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0425 12:36:59.443431 132429263365952 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0425 12:36:59.443447 132429263365952 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0425 12:36:59.443463 132429263365952 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0425 12:36:59.443478 132429263365952 pyconfig.py:471] Config param posemb_type_for_vit: learn I0425 12:36:59.443496 132429263365952 pyconfig.py:471] Config param 
position_id_per_seconds: 25 I0425 12:36:59.443511 132429263365952 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0425 12:36:59.443527 132429263365952 pyconfig.py:471] Config param prefill_cache_dir: I0425 12:36:59.443542 132429263365952 pyconfig.py:471] Config param prefill_chunk_size: 256 I0425 12:36:59.443557 132429263365952 pyconfig.py:471] Config param prefill_slice: v5e-16 I0425 12:36:59.443573 132429263365952 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0425 12:36:59.443587 132429263365952 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0425 12:36:59.443602 132429263365952 pyconfig.py:471] Config param prefuse_moe_weights: False I0425 12:36:59.443617 132429263365952 pyconfig.py:471] Config param profile_cleanly: True I0425 12:36:59.443632 132429263365952 pyconfig.py:471] Config param profile_periodically_period: -1 I0425 12:36:59.443648 132429263365952 pyconfig.py:471] Config param profile_power_events: False I0425 12:36:59.443662 132429263365952 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0425 12:36:59.443680 132429263365952 pyconfig.py:471] Config param profiler_steps: 5 I0425 12:36:59.443695 132429263365952 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0425 12:36:59.443711 132429263365952 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0425 12:36:59.443725 132429263365952 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0425 12:36:59.443741 132429263365952 pyconfig.py:471] Config param prometheus_port: 0 I0425 12:36:59.443756 132429263365952 pyconfig.py:471] Config param prompt: I love to I0425 12:36:59.443771 132429263365952 pyconfig.py:471] Config param pure_nnx: False I0425 12:36:59.443787 132429263365952 pyconfig.py:471] Config param pure_nnx_decoder: False I0425 12:36:59.443803 132429263365952 pyconfig.py:471] Config param q_lora_rank: 0 I0425 12:36:59.443818 132429263365952 pyconfig.py:471] Config param qk_clip_threshold: 
100.0 I0425 12:36:59.443833 132429263365952 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0425 12:36:59.443850 132429263365952 pyconfig.py:471] Config param qk_norm_with_scale: True I0425 12:36:59.443864 132429263365952 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0425 12:36:59.443879 132429263365952 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0425 12:36:59.443895 132429263365952 pyconfig.py:471] Config param quant_cfg_path: I0425 12:36:59.443911 132429263365952 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0425 12:36:59.443928 132429263365952 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0425 12:36:59.443943 132429263365952 pyconfig.py:471] Config param quantize_kvcache: False I0425 12:36:59.443959 132429263365952 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0425 12:36:59.443974 132429263365952 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0425 12:36:59.443990 132429263365952 pyconfig.py:471] Config param ragged_block_size: 256 I0425 12:36:59.444006 132429263365952 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0425 12:36:59.444023 132429263365952 pyconfig.py:471] Config param rampup_end_step: 0 I0425 12:36:59.444038 132429263365952 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0425 12:36:59.444054 132429263365952 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0425 12:36:59.444069 132429263365952 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0425 12:36:59.444085 132429263365952 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0425 12:36:59.444109 132429263365952 pyconfig.py:471] Config param remat_policy: full I0425 12:36:59.444125 132429263365952 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0425 12:36:59.444141 132429263365952 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0425 12:36:59.444155 132429263365952 pyconfig.py:471] 
Config param replicate_quant_scale: False
I0425 12:36:59.444170 132429263365952 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0
I0425 12:36:59.444186 132429263365952 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False
I0425 12:36:59.444201 132429263365952 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False
I0425 12:36:59.444216 132429263365952 pyconfig.py:471] Config param reshape_q: False
I0425 12:36:59.444232 132429263365952 pyconfig.py:471] Config param return_log_prob: False
I0425 12:36:59.444247 132429263365952 pyconfig.py:471] Config param reuse_example_batch: 0
I0425 12:36:59.444263 132429263365952 pyconfig.py:471] Config param reward_exact_answer: 5.0
I0425 12:36:59.444278 132429263365952 pyconfig.py:471] Config param reward_exact_format_match: 3.0
I0425 12:36:59.444294 132429263365952 pyconfig.py:471] Config param reward_partial_format_match: 0.5
I0425 12:36:59.444308 132429263365952 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5
I0425 12:36:59.444324 132429263365952 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25
I0425 12:36:59.444339 132429263365952 pyconfig.py:471] Config param reward_white_space_format_match: 1.5
I0425 12:36:59.444355 132429263365952 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None}
I0425 12:36:59.444374 132429263365952 pyconfig.py:471] Config param rollout_data_parallelism: -1
I0425 12:36:59.444389 132429263365952 pyconfig.py:471] Config param rollout_expert_parallelism: 1
I0425 12:36:59.444405 132429263365952 pyconfig.py:471] Config param rollout_micro_batch_size: -1
I0425 12:36:59.444419 132429263365952 pyconfig.py:471] Config param rollout_tensor_parallelism: -1
I0425 12:36:59.444434 132429263365952 pyconfig.py:471] Config param rope_attention_scaling: False
I0425 12:36:59.444451 132429263365952 pyconfig.py:471] Config param rope_factor: 40
I0425 12:36:59.444467 132429263365952 pyconfig.py:471] Config param rope_interleave: True
I0425 12:36:59.444481 132429263365952 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0
I0425 12:36:59.444501 132429263365952 pyconfig.py:471] Config param rope_max_timescale: 10000
I0425 12:36:59.444516 132429263365952 pyconfig.py:471] Config param rope_min_timescale: 1
I0425 12:36:59.444531 132429263365952 pyconfig.py:471] Config param rope_theta_for_vit: 10000
I0425 12:36:59.444545 132429263365952 pyconfig.py:471] Config param rope_truncate: True
I0425 12:36:59.444561 132429263365952 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT
I0425 12:36:59.444577 132429263365952 pyconfig.py:471] Config param rope_use_scale: True
I0425 12:36:59.444593 132429263365952 pyconfig.py:471] Config param routed_bias: False
I0425 12:36:59.444607 132429263365952 pyconfig.py:471] Config param routed_bias_update_rate: 0.0
I0425 12:36:59.444623 132429263365952 pyconfig.py:471] Config param routed_scaling_factor: 1.0
I0425 12:36:59.444638 132429263365952 pyconfig.py:471] Config param routed_score_func:
I0425 12:36:59.444655 132429263365952 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-12-36
I0425 12:36:59.444669 132429263365952 pyconfig.py:471] Config param sa_block_kv: 512
I0425 12:36:59.444684 132429263365952 pyconfig.py:471] Config param sa_block_kv_compute: 512
I0425 12:36:59.444699 132429263365952 pyconfig.py:471] Config param sa_block_kv_dkv: 512
I0425 12:36:59.444714 132429263365952 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512
I0425 12:36:59.444728 132429263365952 pyconfig.py:471] Config param sa_block_kv_dq: 512
I0425 12:36:59.444744 132429263365952 pyconfig.py:471] Config param sa_block_q: 512
I0425 12:36:59.444758 132429263365952 pyconfig.py:471] Config param sa_block_q_dkv: 512
I0425 12:36:59.444774 132429263365952 pyconfig.py:471] Config param sa_block_q_dq: 512
I0425 12:36:59.444788 132429263365952 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR
I0425 12:36:59.444804 132429263365952 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR
I0425 12:36:59.444818 132429263365952 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False
I0425 12:36:59.444833 132429263365952 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR
I0425 12:36:59.444847 132429263365952 pyconfig.py:471] Config param sampler_devices_fraction: 0.5
I0425 12:36:59.444864 132429263365952 pyconfig.py:471] Config param save_checkpoint_on_completion: True
I0425 12:36:59.444878 132429263365952 pyconfig.py:471] Config param save_config_to_gcs: False
I0425 12:36:59.444893 132429263365952 pyconfig.py:471] Config param save_quantized_params_path:
I0425 12:36:59.444908 132429263365952 pyconfig.py:471] Config param scale_embedding_for_audio: True
I0425 12:36:59.444923 132429263365952 pyconfig.py:471] Config param scan_layers: True
I0425 12:36:59.444937 132429263365952 pyconfig.py:471] Config param scan_layers_per_stage: False
I0425 12:36:59.444953 132429263365952 pyconfig.py:471] Config param scan_pipeline_iterations: True
I0425 12:36:59.444967 132429263365952 pyconfig.py:471] Config param scan_pipeline_repeats: False
I0425 12:36:59.444983 132429263365952 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False
I0425 12:36:59.444997 132429263365952 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True
I0425 12:36:59.445012 132429263365952 pyconfig.py:471] Config param sft_train_on_completion_only: False
I0425 12:36:59.445026 132429263365952 pyconfig.py:471] Config param shard_exp_on_fsdp: False
I0425 12:36:59.445041 132429263365952 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO
I0425 12:36:59.445059 132429263365952 pyconfig.py:471] Config param shard_optimizer_over_data: False
I0425 12:36:59.445073 132429263365952 pyconfig.py:471] Config param sharding_strategy: None
I0425 12:36:59.445089 132429263365952 pyconfig.py:471] Config param sharding_tolerance: 0.02
I0425 12:36:59.445207 132429263365952 pyconfig.py:471] Config param shardy: True
I0425 12:36:59.445233 132429263365952 pyconfig.py:471] Config param share_kv_projections: False
I0425 12:36:59.445255 132429263365952 pyconfig.py:471] Config param shared_experts: 0
I0425 12:36:59.445273 132429263365952 pyconfig.py:471] Config param sinkhorn_iterations: 20
I0425 12:36:59.445289 132429263365952 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1
I0425 12:36:59.445305 132429263365952 pyconfig.py:471] Config param skip_jax_distributed_system: False
I0425 12:36:59.445320 132429263365952 pyconfig.py:471] Config param skip_step_interval: 128
I0425 12:36:59.445336 132429263365952 pyconfig.py:471] Config param skip_step_on_spikes: False
I0425 12:36:59.445352 132429263365952 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0
I0425 12:36:59.445368 132429263365952 pyconfig.py:471] Config param sliding_window_size: 0
I0425 12:36:59.445383 132429263365952 pyconfig.py:471] Config param solution_end_token: </answer>
I0425 12:36:59.445398 132429263365952 pyconfig.py:471] Config param solution_start_token: <answer>
I0425 12:36:59.445413 132429263365952 pyconfig.py:471] Config param source_checkpoint_layout: orbax
I0425 12:36:59.445427 132429263365952 pyconfig.py:471] Config param sparse_matmul: True
I0425 12:36:59.445443 132429263365952 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2
I0425 12:36:59.445459 132429263365952 pyconfig.py:471] Config param stack_prefill_result_cache: False
I0425 12:36:59.445473 132429263365952 pyconfig.py:471] Config param stack_trace_interval_seconds: 600
I0425 12:36:59.445492 132429263365952 pyconfig.py:471] Config param stack_trace_to_cloud: False
I0425 12:36:59.445508 132429263365952 pyconfig.py:471] Config param step_deviation_interval_seconds: 30
I0425 12:36:59.445523 132429263365952 pyconfig.py:471] Config param steps: 200000
I0425 12:36:59.445539 132429263365952 pyconfig.py:471] Config param stop_strings: None
I0425 12:36:59.445554 132429263365952 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'}
I0425 12:36:59.445570 132429263365952 pyconfig.py:471] Config param student_params_to_update: None
I0425 12:36:59.445586 132429263365952 pyconfig.py:471] Config param subslice_shape:
I0425 12:36:59.445602 132429263365952 pyconfig.py:471] Config param swap_space_vllm_gb: 2
I0425 12:36:59.445620 132429263365952 pyconfig.py:471] Config param system_prompt:
I0425 12:36:59.445635 132429263365952 pyconfig.py:471] Config param target_eval_loss: 0.0
I0425 12:36:59.445650 132429263365952 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'}
I0425 12:36:59.445667 132429263365952 pyconfig.py:471] Config param temperature_tuning: False
I0425 12:36:59.445680 132429263365952 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2
I0425 12:36:59.445696 132429263365952 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-12-36/tensorboard/
I0425 12:36:59.445711 132429263365952 pyconfig.py:471] Config param tensors_on_device: None
I0425 12:36:59.445726 132429263365952 pyconfig.py:471] Config param tensors_to_offload: None
I0425 12:36:59.445740 132429263365952 pyconfig.py:471] Config param test_batch_start_index: 0
I0425 12:36:59.445755 132429263365952 pyconfig.py:471] Config param tile_size_for_vit: 336
I0425 12:36:59.445769 132429263365952 pyconfig.py:471] Config param tokenize_eval_data: True
I0425 12:36:59.445785 132429263365952 pyconfig.py:471] Config param tokenize_train_data: True
I0425 12:36:59.445799 132429263365952 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B
I0425 12:36:59.445814 132429263365952 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE
I0425 12:36:59.445832 132429263365952 pyconfig.py:471] Config param topk_routing_group: -1
I0425 12:36:59.445849 132429263365952 pyconfig.py:471] Config param train_data_columns: ['text']
I0425 12:36:59.445863 132429263365952 pyconfig.py:471] Config param train_fraction: 1.0
I0425 12:36:59.445879 132429263365952 pyconfig.py:471] Config param train_image_column: image
I0425 12:36:59.445893 132429263365952 pyconfig.py:471] Config param train_micro_batch_size: -1
I0425 12:36:59.445909 132429263365952 pyconfig.py:471] Config param train_split: train
I0425 12:36:59.445925 132429263365952 pyconfig.py:471] Config param trainable_parameters_mask: []
I0425 12:36:59.445941 132429263365952 pyconfig.py:471] Config param trainable_position_size: 2048
I0425 12:36:59.445956 132429263365952 pyconfig.py:471] Config param trainer_devices_fraction: 0.5
I0425 12:36:59.445971 132429263365952 pyconfig.py:471] Config param upload_all_profiler_results: False
I0425 12:36:59.445986 132429263365952 pyconfig.py:471] Config param use_2d_fsdp_sharding: False
I0425 12:36:59.446001 132429263365952 pyconfig.py:471] Config param use_agentic_rollout: False
I0425 12:36:59.446017 132429263365952 pyconfig.py:471] Config param use_audio: False
I0425 12:36:59.446031 132429263365952 pyconfig.py:471] Config param use_audio_in_video: False
I0425 12:36:59.446047 132429263365952 pyconfig.py:471] Config param use_batch_split_schedule: False
I0425 12:36:59.446061 132429263365952 pyconfig.py:471] Config param use_chat_template: False
I0425 12:36:59.446076 132429263365952 pyconfig.py:471] Config param use_chunked_prefill: False
I0425 12:36:59.446091 132429263365952 pyconfig.py:471] Config param use_custom_sort_vjp: True
I0425 12:36:59.446118 132429263365952 pyconfig.py:471] Config param use_dpo: False
I0425 12:36:59.446133 132429263365952 pyconfig.py:471] Config param use_gather_mosaic_kernel: False
I0425 12:36:59.446148 132429263365952 pyconfig.py:471] Config param use_grpo: True
I0425 12:36:59.446162 132429263365952 pyconfig.py:471] Config param use_indexer: False
I0425 12:36:59.446177 132429263365952 pyconfig.py:471] Config param use_iota_embed: True
I0425 12:36:59.446192 132429263365952 pyconfig.py:471] Config param use_jax_splash: False
I0425 12:36:59.446207 132429263365952 pyconfig.py:471] Config param use_max_logit_estimate: -1
I0425 12:36:59.446222 132429263365952 pyconfig.py:471] Config param use_mrope: False
I0425 12:36:59.446237 132429263365952 pyconfig.py:471] Config param use_multimodal: False
I0425 12:36:59.446254 132429263365952 pyconfig.py:471] Config param use_pathways: True
I0425 12:36:59.446268 132429263365952 pyconfig.py:471] Config param use_post_attn_norm: False
I0425 12:36:59.446284 132429263365952 pyconfig.py:471] Config param use_post_ffw_norm: False
I0425 12:36:59.446298 132429263365952 pyconfig.py:471] Config param use_qk_clip: False
I0425 12:36:59.446313 132429263365952 pyconfig.py:471] Config param use_qk_norm: False
I0425 12:36:59.446328 132429263365952 pyconfig.py:471] Config param use_qk_norm_in_gdn: True
I0425 12:36:59.446343 132429263365952 pyconfig.py:471] Config param use_qwix_quantization: False
I0425 12:36:59.446357 132429263365952 pyconfig.py:471] Config param use_ragged_attention: False
I0425 12:36:59.446373 132429263365952 pyconfig.py:471] Config param use_random_routing: False
I0425 12:36:59.446387 132429263365952 pyconfig.py:471] Config param use_replicator_service: False
I0425 12:36:59.446402 132429263365952 pyconfig.py:471] Config param use_ring_of_experts: False
I0425 12:36:59.446416 132429263365952 pyconfig.py:471] Config param use_sft: False
I0425 12:36:59.446432 132429263365952 pyconfig.py:471] Config param use_splash_scheduler: False
I0425 12:36:59.446446 132429263365952 pyconfig.py:471] Config param use_tokamax_gmm: False
I0425 12:36:59.446461 132429263365952 pyconfig.py:471] Config param use_tokamax_splash: False
I0425 12:36:59.446475 132429263365952 pyconfig.py:471] Config param use_truncation: True
I0425 12:36:59.446494 132429263365952 pyconfig.py:471] Config param use_tunix_gradient_accumulation: False
I0425 12:36:59.446507 132429263365952 pyconfig.py:471] Config param use_untrainable_positional_embedding: False
I0425 12:36:59.446521 132429263365952 pyconfig.py:471] Config param use_vertex_tensorboard: False
I0425 12:36:59.446537 132429263365952 pyconfig.py:471] Config param using_pipeline_parallelism: False
I0425 12:36:59.446551 132429263365952 pyconfig.py:471] Config param v_head_dim: 128
I0425 12:36:59.446567 132429263365952 pyconfig.py:471] Config param v_norm_with_scale: True
I0425 12:36:59.446582 132429263365952 pyconfig.py:471] Config param value_proj: RematLocation.REMAT
I0425 12:36:59.446597 132429263365952 pyconfig.py:471] Config param vertex_tensorboard_project:
I0425 12:36:59.446611 132429263365952 pyconfig.py:471] Config param vertex_tensorboard_region:
I0425 12:36:59.446625 132429263365952 pyconfig.py:471] Config param video_path:
I0425 12:36:59.446641 132429263365952 pyconfig.py:471] Config param video_placeholder: <|video|>
I0425 12:36:59.446655 132429263365952 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096
I0425 12:36:59.446670 132429263365952 pyconfig.py:471] Config param vision_output_length: -1
I0425 12:36:59.446684 132429263365952 pyconfig.py:471] Config param vllm_additional_config: {}
I0425 12:36:59.446699 132429263365952 pyconfig.py:471] Config param vllm_hf_config_path:
I0425 12:36:59.446713 132429263365952 pyconfig.py:471] Config param vllm_hf_overrides: {}
I0425 12:36:59.446729 132429263365952 pyconfig.py:471] Config param vocab_size: 32000
I0425 12:36:59.446745 132429263365952 pyconfig.py:471] Config param warmup_steps_fraction: 0.1
I0425 12:36:59.446759 132429263365952 pyconfig.py:471] Config param weight_dtype: float32
I0425 12:36:59.446787 132429263365952 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax
I0425 12:36:59.446802 132429263365952 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512
I0425 12:36:59.446819 132429263365952 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024
I0425 12:36:59.446833 132429263365952 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024
I0425 12:36:59.446849 132429263365952 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512
I0425 12:36:59.446863 132429263365952 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024
I0425 12:36:59.446879 132429263365952 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024
I0425 12:36:59.446893 132429263365952 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512
I0425 12:36:59.446908 132429263365952 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024
I0425 12:36:59.446923 132429263365952 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024
I0425 12:36:59.446939 132429263365952 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512
I0425 12:36:59.446953 132429263365952 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024
I0425 12:36:59.446968 132429263365952 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024
I0425 12:36:59.446982 132429263365952 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512
I0425 12:36:59.446998 132429263365952 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024
I0425 12:36:59.447011 132429263365952 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024
I0425 12:36:59.447026 132429263365952 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512
I0425 12:36:59.447043 132429263365952 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024
I0425 12:36:59.447057 132429263365952 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024
I0425 12:36:59.447071 132429263365952 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1
I0425 12:36:59.447087 132429263365952 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR
I0425 12:36:59.447112 132429263365952 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False
I0425 12:36:59.447128 132429263365952 pyconfig.py:471] Config param xprof_e2e_enable_fw_thermal_event: False
I0425 12:36:59.447142 132429263365952 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False
I0425 12:36:59.447158 132429263365952 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0
I0425 12:36:59.447173 132429263365952 pyconfig.py:471] Config param z_loss_multiplier: 0.0
I0425 12:36:59.447505 132429263365952 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf
I0425 12:36:59.447544 132429263365952 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf
I0425 12:36:59.629454 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:36:59.735421 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:36:59.846511 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:36:59.958696 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:37:00.069355 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 12:37:00.170532 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
I0425 12:37:00.280682 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found"
I0425 12:37:00.390251 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK"
I0425 12:37:01.006219 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 12:37:01.123492 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK"
I0425 12:37:01.322163 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found"
I0425 12:37:01.432627 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 12:37:01.541991 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK"
I0425 12:37:01.655000 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found"
I0425 12:37:01.746901 132429263365952 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`.
I0425 12:37:01.753747 132429263365952 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0425 12:37:01.753870 132429263365952 train_distill.py:594] Applying logical axis rules for model initialization and training...
I0425 12:37:01.753940 132429263365952 train_distill.py:598] Loading Student from ...
I0425 12:37:01.753969 132429263365952 train_distill.py:170] --- Student Configuration ---
I0425 12:37:01.753990 132429263365952 train_distill.py:171] Model Name: gpt3-52k
I0425 12:37:01.754011 132429263365952 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0425 12:37:01.754030 132429263365952 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0425 12:37:01.754048 132429263365952 train_distill.py:176] Vocab Size: 32000
I0425 12:37:01.754065 132429263365952 train_distill.py:177] Checkpoint:
I0425 12:37:01.754082 132429263365952 train_distill.py:463] Initializing model: gpt3-52k...
I0425 12:37:03.415879 132429263365952 train_distill.py:612] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items...
I0425 12:37:03.415987 132429263365952 train_distill.py:170] --- Teacher Configuration ---
I0425 12:37:03.416016 132429263365952 train_distill.py:171] Model Name: gpt3-52k
I0425 12:37:03.416046 132429263365952 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim
I0425 12:37:03.416075 132429263365952 train_distill.py:175] Attention Heads: 2 Query, 2 KV
I0425 12:37:03.416116 132429263365952 train_distill.py:176] Vocab Size: 32000
I0425 12:37:03.416144 132429263365952 train_distill.py:177] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items
I0425 12:37:03.416171 132429263365952 train_distill.py:463] Initializing model: gpt3-52k...
I0425 12:37:04.491088 132429263365952 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:37:04.491264 132429263365952 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7870df187fe0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:37:04.491325 132429263365952 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28
W0425 12:37:04.997848 132429263365952 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA
I0425 12:37:05.523376 1966 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0425 12:37:06.353659 132429263365952 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
W0425 12:37:08.250764 132429263365952 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on.
I0425 12:37:08.251160 132429263365952 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key
I0425 12:37:10.681402 132429263365952 checkpointer.py:318] Finished restoring checkpoint in 4.70 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items.
I0425 12:37:11.413488 132429263365952 train_distill.py:638] Initializing Data Iterators via MaxText pipeline...
I0425 12:37:11.476791 132429263365952 config.py:112] TensorFlow version 2.20.0 available.
I0425 12:37:11.477285 132429263365952 config.py:125] JAX version 0.9.2 available.
I0425 12:37:11.875832 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect"
I0425 12:37:11.884615 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 12:37:11.893135 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK"
I0425 12:37:11.998768 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 12:37:12.303333 132429263365952 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found"
I0425 12:37:12.417210 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK"
I0425 12:37:12.528142 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found"
I0425 12:37:12.699657 132429263365952 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK"
I0425 12:37:12.853958 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found"
I0425 12:37:12.967163 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK"
I0425 12:37:13.103075 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found"
I0425 12:37:13.271920 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK"
I0425 12:37:13.380566 132429263365952 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK"
I0425 12:37:13.492754 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found"
I0425 12:37:13.602460 132429263365952 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK"
E0425 12:37:13.695597 132429263365952 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead.
I0425 12:37:13.695807 132429263365952 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform.
I0425 12:37:13.699006 132429263365952 train_distill.py:408] Input Pipeline Checkpointing: DISABLED
I0425 12:37:13.699078 132429263365952 train_distill.py:412] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False)
I0425 12:37:13.699162 132429263365952 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:37:13.699253 132429263365952 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7870df187fe0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:37:13.699314 132429263365952 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:37:13.699355 132429263365952 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7870df187fe0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:37:13.699415 132429263365952 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf5c0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf530>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1dcf4a0>}, handler_registry=None
I0425 12:37:13.699656 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf5c0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:37:13.699704 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf530>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:37:13.699733 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1dcf4a0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:37:13.699758 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x786b28333c50>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:37:13.699786 132429263365952 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf5c0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf5c0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf530>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf530>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1dcf4a0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1dcf4a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x786b28333c50>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x786b28333c50>}).
I0425 12:37:13.700198 132429263365952 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x786b15cfc220> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 12:37:15.190019 132429263365952 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints
I0425 12:37:15.421787 132429263365952 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7859e1dcf470>
I0425 12:37:15.421948 132429263365952 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:37:15.422025 132429263365952 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7870df187fe0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:37:15.422063 132429263365952 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None
I0425 12:37:15.422122 132429263365952 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7870df187fe0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0425 12:37:15.422174 132429263365952 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0425 12:37:15.422239 132429263365952 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132429263365952 count=1 at 0x7859e1db98c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7859e1dcf260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7859e1dcf230>, _write_futures=[])
I0425 12:37:15.422711 132429263365952 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132429263365952 count=1 at 0x7859e1db98c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7859e1dcf260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7859e1dcf230>, _write_futures=[])
I0425 12:37:15.422751 132429263365952 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=132429263365952 count=1 at 0x7859e1db98c0>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7859e1dcf260>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7859e1dcf230>, _write_futures=[])
I0425 12:37:15.422795 132429263365952 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf440>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dce7b0>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db31d0>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7859e1db3fe0>}, handler_registry=None
I0425 12:37:15.422933 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf440>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:37:15.422983 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dce7b0>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0425 12:37:15.423017 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db31d0>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:37:15.423053 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7859e1db3fe0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`.
I0425 12:37:15.423086 132429263365952 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db2f00>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0425 12:37:15.423130 132429263365952 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf440>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dcf440>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dce7b0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7859e1dce7b0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db31d0>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db31d0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7859e1db3fe0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7859e1db3fe0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db2f00>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7859e1db2f00>}).
I0425 12:37:15.423229 132429263365952 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x786b15cfc360> timeout: 600 secs and primary_host=0 for async checkpoint writes
I0425 12:37:15.979603 132429263365952 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints
I0425 12:37:15.988213 132429263365952 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260425_121405/pt_distill_nnx_xpk_feat_nnx_post_train_fixes_20260425_121405_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x786b28333380>
I0425 12:37:15.988669 132429263365952 train_distill.py:689] Starting Distillation Training...
I0425 12:37:15.988781 132429263365952 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto))
I0425 12:37:16.452909 132429263365952 peft_trainer.py:594] Compiled train_step cache size: 0
I0425 12:37:16.454680 132273475135232 grain_pool.py:367] Grain pool will use 1 processes.
I0425 12:37:16.512087 132273475135232 grain_pool.py:440] Grain pool will start child processes.
I0425 12:37:16.517871 132273475135232 grain_pool.py:448] Grain pool started all child processes.
2026-04-25 12:37:23.068488: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 793, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 789, in main
    train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir)
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 691, in train_distill
    trainer.train(train_iter, eval_iter)
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train
    train_example = sharding_utils.shard_input(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input
    return jax.tree.map(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map
    return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
                             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda>
    lambda x: jax.make_array_from_process_local_data(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data
    out = [_array_from_process_local_data(data, s, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data
    return make_array_from_callback(global_shape, sharding, cb)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback
    per_device_values = api.device_put(per_device_values, devices)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put
    out_flat = dispatch._batched_device_put_impl(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl
    y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl
    return _device_put_sharding_impl(x, aval, device, copy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl
    raise ValueError(
ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. Got value with devices {TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=4, process_index=0, coords=(0,1,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0)}
I0425 12:37:27.205173 132273475135232 grain_pool.py:542] Grain pool is exiting.
I0425 12:37:27.205277 132273475135232 grain_pool.py:547] Shutting down multiprocessing system.
I0425 12:37:28.923931 132273475135232 grain_pool.py:547] Shutting down multiprocessing system.
/usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
XPK End: Sat Apr 25 12:37:39 UTC 2026
EXIT_CODE=1