XPK Start: Sat Apr 25 09:39:29 UTC 2026
2026-04-25 09:39:47.749278: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
I0425 09:39:54.115956 135841449502528 max_utils.py:273] Attempting to initialize the jax distributed system...
I0425 09:40:03.154597 135841449502528 distributed.py:149] Starting JAX distributed service on [::]:8482
I0425 09:40:03.156864 135841449502528 distributed.py:172] Connecting to JAX distributed service on mt-07-distill-smoke-pnaut-slice-job-0-0.mt-07-distill-smoke-pnaut:8482
I0425 09:40:04.087690 135841449502528 max_utils.py:284] Jax distributed system initialized!
I0425 09:40:10.549182 135841449502528 max_utils.py:244] Jax distributed system is already initialized.
W0425 09:40:10.681233 135841449502528 pyconfig.py:111] base_output_directory is not provided; Using local directory called maxtext_output
I0425 09:40:10.741974 135841449502528 max_utils.py:244] Jax distributed system is already initialized.
I0425 09:40:10.743273 135841449502528 pyconfig.py:471] Config param abort_on_inf_loss: True I0425 09:40:10.743323 135841449502528 pyconfig.py:471] Config param abort_on_nan_loss: True I0425 09:40:10.743347 135841449502528 pyconfig.py:471] Config param act_quantization_calibration_method: absmax I0425 09:40:10.743368 135841449502528 pyconfig.py:471] Config param activation_dropout_for_audio: 0.0 I0425 09:40:10.743389 135841449502528 pyconfig.py:471] Config param activation_function_for_audio: gelu I0425 09:40:10.743409 135841449502528 pyconfig.py:471] Config param activations_in_float32: False I0425 09:40:10.743427 135841449502528 pyconfig.py:471] Config param adam_b1: 0.9 I0425 09:40:10.743445 135841449502528 pyconfig.py:471] Config param adam_b2: 0.95 I0425 09:40:10.743464 135841449502528 pyconfig.py:471] Config param adam_eps: 1e-08 I0425 09:40:10.743488 135841449502528 pyconfig.py:471] Config param adam_eps_root: 0.0 I0425 09:40:10.743506 135841449502528 pyconfig.py:471] Config param adam_weight_decay: 0.1 I0425 09:40:10.743523 135841449502528 pyconfig.py:471] Config param adamw_mask: [] I0425 09:40:10.743545 135841449502528 pyconfig.py:471] Config param add_bos: True I0425 09:40:10.743562 135841449502528 pyconfig.py:471] Config param add_eos: True I0425 09:40:10.743577 135841449502528 pyconfig.py:471] Config param allow_split_physical_axes: False I0425 09:40:10.743594 135841449502528 pyconfig.py:471] Config param ar_cache_axis_order: 1,2,0,3 I0425 09:40:10.743611 135841449502528 pyconfig.py:471] Config param async_checkpointing: True I0425 09:40:10.743626 135841449502528 pyconfig.py:471] Config param async_scheduling: False I0425 09:40:10.743643 135841449502528 pyconfig.py:471] Config param attention: dot_product I0425 09:40:10.743659 135841449502528 pyconfig.py:471] Config param attention_bias: False I0425 09:40:10.743677 135841449502528 pyconfig.py:471] Config param attention_dropout_for_audio: 0.0 I0425 09:40:10.743693 135841449502528 pyconfig.py:471] Config 
param attention_out: RematLocation.REMAT I0425 09:40:10.743714 135841449502528 pyconfig.py:471] Config param attention_output_dim: -1 I0425 09:40:10.743729 135841449502528 pyconfig.py:471] Config param attention_sink: False I0425 09:40:10.743746 135841449502528 pyconfig.py:471] Config param attention_type: global I0425 09:40:10.743763 135841449502528 pyconfig.py:471] Config param attn_logits_soft_cap: None I0425 09:40:10.743778 135841449502528 pyconfig.py:471] Config param audio_path: I0425 09:40:10.743795 135841449502528 pyconfig.py:471] Config param audio_placeholder: <|audio|> I0425 09:40:10.743811 135841449502528 pyconfig.py:471] Config param autoregressive_decode_assert: I0425 09:40:10.743826 135841449502528 pyconfig.py:471] Config param base_config: base.yml I0425 09:40:10.743843 135841449502528 pyconfig.py:471] Config param base_emb_dim: 16 I0425 09:40:10.743859 135841449502528 pyconfig.py:471] Config param base_mlp_dim: 64 I0425 09:40:10.743874 135841449502528 pyconfig.py:471] Config param base_moe_mlp_dim: -1 I0425 09:40:10.743891 135841449502528 pyconfig.py:471] Config param base_num_decoder_layers: 1 I0425 09:40:10.743906 135841449502528 pyconfig.py:471] Config param base_num_kv_heads: 2 I0425 09:40:10.743922 135841449502528 pyconfig.py:471] Config param base_num_query_heads: 2 I0425 09:40:10.743937 135841449502528 pyconfig.py:471] Config param base_output_directory: /deps/maxtext_output I0425 09:40:10.743953 135841449502528 pyconfig.py:471] Config param batch_size: 1 I0425 09:40:10.743968 135841449502528 pyconfig.py:471] Config param batch_split_factor: 1 I0425 09:40:10.743983 135841449502528 pyconfig.py:471] Config param beta_fast: 32 I0425 09:40:10.743999 135841449502528 pyconfig.py:471] Config param beta_slow: 1 I0425 09:40:10.744016 135841449502528 pyconfig.py:471] Config param bwd_quantization_calibration_method: absmax I0425 09:40:10.744034 135841449502528 pyconfig.py:471] Config param capacity_factor: -1.0 I0425 09:40:10.744051 135841449502528 
pyconfig.py:471] Config param cast_logits_to_fp32: True I0425 09:40:10.744066 135841449502528 pyconfig.py:471] Config param chat_template: I0425 09:40:10.744081 135841449502528 pyconfig.py:471] Config param chat_template_path: I0425 09:40:10.744110 135841449502528 pyconfig.py:471] Config param checkpoint_conversion_fn: None I0425 09:40:10.744128 135841449502528 pyconfig.py:471] Config param checkpoint_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/checkpoints/ I0425 09:40:10.744144 135841449502528 pyconfig.py:471] Config param checkpoint_is_quantized: False I0425 09:40:10.744160 135841449502528 pyconfig.py:471] Config param checkpoint_period: 2000 I0425 09:40:10.744176 135841449502528 pyconfig.py:471] Config param checkpoint_storage_concurrent_gb: 96 I0425 09:40:10.744193 135841449502528 pyconfig.py:471] Config param checkpoint_storage_target_data_file_size_bytes: 2147483648 I0425 09:40:10.744210 135841449502528 pyconfig.py:471] Config param checkpoint_storage_use_ocdbt: True I0425 09:40:10.744224 135841449502528 pyconfig.py:471] Config param checkpoint_storage_use_zarr3: True I0425 09:40:10.744241 135841449502528 pyconfig.py:471] Config param checkpoint_todelete_full_path: None I0425 09:40:10.744256 135841449502528 pyconfig.py:471] Config param checkpoint_todelete_subdir: None I0425 09:40:10.744272 135841449502528 pyconfig.py:471] Config param chips_per_vm: 4 I0425 09:40:10.744287 135841449502528 pyconfig.py:471] Config param chunk_attn_window_size: 0 I0425 09:40:10.744302 135841449502528 pyconfig.py:471] Config param collect_stack_trace: False I0425 09:40:10.744318 135841449502528 pyconfig.py:471] Config param colocated_python_checkpointing: False I0425 09:40:10.744334 135841449502528 pyconfig.py:471] Config param colocated_python_data_input: False I0425 09:40:10.744349 135841449502528 pyconfig.py:471] Config param compile_topology: I0425 09:40:10.744364 135841449502528 pyconfig.py:471] Config param compile_topology_num_slices: -1 I0425 09:40:10.744380 
135841449502528 pyconfig.py:471] Config param compile_xla_flags: I0425 09:40:10.744396 135841449502528 pyconfig.py:471] Config param compiled_trainstep_file: I0425 09:40:10.744413 135841449502528 pyconfig.py:471] Config param compute_axis_order: 0,1,2,3 I0425 09:40:10.744428 135841449502528 pyconfig.py:471] Config param constant_bound_config: [] I0425 09:40:10.744446 135841449502528 pyconfig.py:471] Config param context: RematLocation.REMAT I0425 09:40:10.744463 135841449502528 pyconfig.py:471] Config param context_parallel_load_balance: True I0425 09:40:10.744479 135841449502528 pyconfig.py:471] Config param context_parallel_reorder_strategy: ReorderStrategy.AUTO I0425 09:40:10.744499 135841449502528 pyconfig.py:471] Config param context_parallel_size: 1 I0425 09:40:10.744515 135841449502528 pyconfig.py:471] Config param context_parallel_strategy: all_gather I0425 09:40:10.744534 135841449502528 pyconfig.py:471] Config param context_sharding: context I0425 09:40:10.744550 135841449502528 pyconfig.py:471] Config param conv_chunksize_for_audio: 500 I0425 09:40:10.744565 135841449502528 pyconfig.py:471] Config param conv_stride_for_vit: 14 I0425 09:40:10.744581 135841449502528 pyconfig.py:471] Config param convert_checkpoint_if_possible: False I0425 09:40:10.744595 135841449502528 pyconfig.py:471] Config param cost_estimate_flops_bwd: -1 I0425 09:40:10.744611 135841449502528 pyconfig.py:471] Config param cost_estimate_flops_fwd: -1 I0425 09:40:10.744627 135841449502528 pyconfig.py:471] Config param custom_mesh: I0425 09:40:10.744643 135841449502528 pyconfig.py:471] Config param custom_mesh_and_rule: I0425 09:40:10.744657 135841449502528 pyconfig.py:471] Config param d_model_for_audio: 256 I0425 09:40:10.744673 135841449502528 pyconfig.py:471] Config param data_sharding: (('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'),) I0425 09:40:10.744693 
135841449502528 pyconfig.py:471] Config param data_shuffle_seed: 0 I0425 09:40:10.744711 135841449502528 pyconfig.py:471] Config param dataset_name: c4/en:3.0.1 I0425 09:40:10.744726 135841449502528 pyconfig.py:471] Config param dataset_path: I0425 09:40:10.744742 135841449502528 pyconfig.py:471] Config param dataset_type: DatasetType.HF I0425 09:40:10.744760 135841449502528 pyconfig.py:471] Config param dcn_autoregressive_parallelism: 1 I0425 09:40:10.744776 135841449502528 pyconfig.py:471] Config param dcn_context_autoregressive_parallelism: 1 I0425 09:40:10.744791 135841449502528 pyconfig.py:471] Config param dcn_context_parallelism: 1 I0425 09:40:10.744807 135841449502528 pyconfig.py:471] Config param dcn_data_parallelism: -1 I0425 09:40:10.744823 135841449502528 pyconfig.py:471] Config param dcn_diloco_parallelism: 1 I0425 09:40:10.744839 135841449502528 pyconfig.py:471] Config param dcn_expert_parallelism: 1 I0425 09:40:10.744854 135841449502528 pyconfig.py:471] Config param dcn_fsdp_parallelism: 1 I0425 09:40:10.744870 135841449502528 pyconfig.py:471] Config param dcn_fsdp_transpose_parallelism: 1 I0425 09:40:10.744885 135841449502528 pyconfig.py:471] Config param dcn_parallelism: [1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0425 09:40:10.744902 135841449502528 pyconfig.py:471] Config param dcn_pipeline_parallelism: 1 I0425 09:40:10.744918 135841449502528 pyconfig.py:471] Config param dcn_sequence_parallelism: 1 I0425 09:40:10.744933 135841449502528 pyconfig.py:471] Config param dcn_tensor_parallelism: 1 I0425 09:40:10.744949 135841449502528 pyconfig.py:471] Config param dcn_tensor_sequence_parallelism: 1 I0425 09:40:10.744965 135841449502528 pyconfig.py:471] Config param dcn_tensor_transpose_parallelism: 1 I0425 09:40:10.744979 135841449502528 pyconfig.py:471] Config param debug: {'rl': False} I0425 09:40:10.744996 135841449502528 pyconfig.py:471] Config param debug_sharding: False I0425 09:40:10.745012 135841449502528 pyconfig.py:471] Config param 
decode_sampling_nucleus_p: -1 I0425 09:40:10.745026 135841449502528 pyconfig.py:471] Config param decode_sampling_strategy: SamplingStrategy.GREEDY I0425 09:40:10.745044 135841449502528 pyconfig.py:471] Config param decode_sampling_temperature: 1.0 I0425 09:40:10.745060 135841449502528 pyconfig.py:471] Config param decode_sampling_top_k: 0 I0425 09:40:10.745075 135841449502528 pyconfig.py:471] Config param decoder_block: DecoderBlockType.GPT3 I0425 09:40:10.745092 135841449502528 pyconfig.py:471] Config param decoder_layer_input: RematLocation.DEVICE I0425 09:40:10.745133 135841449502528 pyconfig.py:471] Config param deepstack_visual_indexes_for_vit: [] I0425 09:40:10.745147 135841449502528 pyconfig.py:471] Config param degenerate_group_masking: True I0425 09:40:10.745165 135841449502528 pyconfig.py:471] Config param dense_init_scale: 1.0 I0425 09:40:10.745180 135841449502528 pyconfig.py:471] Config param diloco_outer_lr: 0.3 I0425 09:40:10.745196 135841449502528 pyconfig.py:471] Config param diloco_outer_momentum: 0.9 I0425 09:40:10.745211 135841449502528 pyconfig.py:471] Config param diloco_sync_period: 36 I0425 09:40:10.745228 135841449502528 pyconfig.py:471] Config param distill_alpha: 0.5 I0425 09:40:10.745244 135841449502528 pyconfig.py:471] Config param distill_alpha_end: None I0425 09:40:10.745260 135841449502528 pyconfig.py:471] Config param distill_alpha_schedule: constant I0425 09:40:10.745275 135841449502528 pyconfig.py:471] Config param distill_beta: 0.0 I0425 09:40:10.745291 135841449502528 pyconfig.py:471] Config param distill_beta_end: None I0425 09:40:10.745306 135841449502528 pyconfig.py:471] Config param distill_beta_schedule: constant I0425 09:40:10.745322 135841449502528 pyconfig.py:471] Config param distill_feature_loss_type: cosine I0425 09:40:10.745337 135841449502528 pyconfig.py:471] Config param distill_layer_indices: None I0425 09:40:10.745352 135841449502528 pyconfig.py:471] Config param distill_temperature: 1.0 I0425 09:40:10.745368 
135841449502528 pyconfig.py:471] Config param distill_temperature_end: None I0425 09:40:10.745383 135841449502528 pyconfig.py:471] Config param distill_temperature_schedule: constant I0425 09:40:10.745400 135841449502528 pyconfig.py:471] Config param downsample_hidden_size_for_audio: 256 I0425 09:40:10.745415 135841449502528 pyconfig.py:471] Config param dpo_beta: 0.1 I0425 09:40:10.745432 135841449502528 pyconfig.py:471] Config param dpo_label_smoothing: 0.0 I0425 09:40:10.745448 135841449502528 pyconfig.py:471] Config param dq_reduction_steps: 0 I0425 09:40:10.745464 135841449502528 pyconfig.py:471] Config param dropout_rate: 0.0 I0425 09:40:10.745481 135841449502528 pyconfig.py:471] Config param dtype: bfloat16 I0425 09:40:10.745511 135841449502528 pyconfig.py:471] Config param dtype_mm: float32 I0425 09:40:10.745527 135841449502528 pyconfig.py:471] Config param dump_hlo: False I0425 09:40:10.745548 135841449502528 pyconfig.py:471] Config param dump_hlo_delete_local_after: True I0425 09:40:10.745563 135841449502528 pyconfig.py:471] Config param dump_hlo_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/xla_dump I0425 09:40:10.745580 135841449502528 pyconfig.py:471] Config param dump_hlo_local_dir: /tmp/xla_dump/ I0425 09:40:10.745596 135841449502528 pyconfig.py:471] Config param dump_hlo_local_module_name: jit_train_step I0425 09:40:10.745612 135841449502528 pyconfig.py:471] Config param dump_hlo_module_name: jit_train_step I0425 09:40:10.745626 135841449502528 pyconfig.py:471] Config param dump_hlo_upload_all: False I0425 09:40:10.745644 135841449502528 pyconfig.py:471] Config param dump_hlo_xla_flags: I0425 09:40:10.745658 135841449502528 pyconfig.py:471] Config param dump_jaxpr: False I0425 09:40:10.745675 135841449502528 pyconfig.py:471] Config param dump_jaxpr_delete_local_after: True I0425 09:40:10.745692 135841449502528 pyconfig.py:471] Config param dump_jaxpr_gcs_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/jaxpr_dump I0425 09:40:10.745708 
135841449502528 pyconfig.py:471] Config param dump_jaxpr_local_dir: /tmp/jaxpr_dump/ I0425 09:40:10.745723 135841449502528 pyconfig.py:471] Config param dump_step: -1 I0425 09:40:10.745739 135841449502528 pyconfig.py:471] Config param elastic_enabled: False I0425 09:40:10.745755 135841449502528 pyconfig.py:471] Config param elastic_max_retries: 10 I0425 09:40:10.745771 135841449502528 pyconfig.py:471] Config param elastic_timeout_seconds: 300 I0425 09:40:10.745787 135841449502528 pyconfig.py:471] Config param emb_dim: 16 I0425 09:40:10.745803 135841449502528 pyconfig.py:471] Config param enable_autocheckpoint: False I0425 09:40:10.745817 135841449502528 pyconfig.py:471] Config param enable_checkpoint_cloud_logger: False I0425 09:40:10.745833 135841449502528 pyconfig.py:471] Config param enable_checkpointing: True I0425 09:40:10.745849 135841449502528 pyconfig.py:471] Config param enable_continuous_checkpointing: False I0425 09:40:10.745865 135841449502528 pyconfig.py:471] Config param enable_data_shuffling: True I0425 09:40:10.745882 135841449502528 pyconfig.py:471] Config param enable_diloco: False I0425 09:40:10.745899 135841449502528 pyconfig.py:471] Config param enable_dp_attention: False I0425 09:40:10.745914 135841449502528 pyconfig.py:471] Config param enable_dropout: False I0425 09:40:10.745929 135841449502528 pyconfig.py:471] Config param enable_emergency_checkpoint: False I0425 09:40:10.745944 135841449502528 pyconfig.py:471] Config param enable_expert_parallel: False I0425 09:40:10.745960 135841449502528 pyconfig.py:471] Config param enable_gcp_goodput_metrics: True I0425 09:40:10.745976 135841449502528 pyconfig.py:471] Config param enable_gcp_step_deviation_metrics: True I0425 09:40:10.745990 135841449502528 pyconfig.py:471] Config param enable_goodput_recording: False I0425 09:40:10.746006 135841449502528 pyconfig.py:471] Config param enable_jax_profiler: False I0425 09:40:10.746021 135841449502528 pyconfig.py:471] Config param 
enable_llm_inference_pool: False I0425 09:40:10.746037 135841449502528 pyconfig.py:471] Config param enable_model_warmup: False I0425 09:40:10.746051 135841449502528 pyconfig.py:471] Config param enable_multi_tier_checkpointing: False I0425 09:40:10.746068 135841449502528 pyconfig.py:471] Config param enable_nnx: False I0425 09:40:10.746082 135841449502528 pyconfig.py:471] Config param enable_orbax_v1: False I0425 09:40:10.746107 135841449502528 pyconfig.py:471] Config param enable_padding_causal_mask: True I0425 09:40:10.746123 135841449502528 pyconfig.py:471] Config param enable_pathways_goodput: False I0425 09:40:10.746139 135841449502528 pyconfig.py:471] Config param enable_prefix_caching: False I0425 09:40:10.746153 135841449502528 pyconfig.py:471] Config param enable_rampup_batch_size: False I0425 09:40:10.746169 135841449502528 pyconfig.py:471] Config param enable_single_controller: False I0425 09:40:10.746185 135841449502528 pyconfig.py:471] Config param enable_single_replica_ckpt_restoring: False I0425 09:40:10.746200 135841449502528 pyconfig.py:471] Config param enable_tensorboard: True I0425 09:40:10.746216 135841449502528 pyconfig.py:471] Config param enable_tunix_perf_metrics: False I0425 09:40:10.746232 135841449502528 pyconfig.py:471] Config param encoder_attention_heads_for_audio: 4 I0425 09:40:10.746248 135841449502528 pyconfig.py:471] Config param encoder_ffn_dim_for_audio: 512 I0425 09:40:10.746264 135841449502528 pyconfig.py:471] Config param encoder_layers_for_audio: 2 I0425 09:40:10.746280 135841449502528 pyconfig.py:471] Config param engram: RematLocation.REMAT I0425 09:40:10.746294 135841449502528 pyconfig.py:471] Config param engram_head_dim: 1280 I0425 09:40:10.746311 135841449502528 pyconfig.py:471] Config param engram_kernel_size: 4 I0425 09:40:10.746326 135841449502528 pyconfig.py:471] Config param engram_layers: [] I0425 09:40:10.746342 135841449502528 pyconfig.py:471] Config param engram_max_ngram_size: 3 I0425 09:40:10.746359 
135841449502528 pyconfig.py:471] Config param engram_num_heads: 8 I0425 09:40:10.746375 135841449502528 pyconfig.py:471] Config param engram_seed: 0 I0425 09:40:10.746390 135841449502528 pyconfig.py:471] Config param engram_vocab_bases: [] I0425 09:40:10.746406 135841449502528 pyconfig.py:471] Config param epsilon_high: None I0425 09:40:10.746422 135841449502528 pyconfig.py:471] Config param eval_corr_lst: False I0425 09:40:10.746438 135841449502528 pyconfig.py:471] Config param eval_data_columns: ['text'] I0425 09:40:10.746453 135841449502528 pyconfig.py:471] Config param eval_dataset_name: c4/en:3.0.1 I0425 09:40:10.746469 135841449502528 pyconfig.py:471] Config param eval_image_column: image I0425 09:40:10.746486 135841449502528 pyconfig.py:471] Config param eval_interval: -1 I0425 09:40:10.746501 135841449502528 pyconfig.py:471] Config param eval_make_lst: False I0425 09:40:10.746516 135841449502528 pyconfig.py:471] Config param eval_per_device_batch_size: 2 I0425 09:40:10.746536 135841449502528 pyconfig.py:471] Config param eval_sampling_strategy: greedy I0425 09:40:10.746552 135841449502528 pyconfig.py:471] Config param eval_split: validation I0425 09:40:10.746567 135841449502528 pyconfig.py:471] Config param eval_steps: -1 I0425 09:40:10.746582 135841449502528 pyconfig.py:471] Config param expansion_factor_real_data: -1.0 I0425 09:40:10.746598 135841449502528 pyconfig.py:471] Config param final_logits_soft_cap: None I0425 09:40:10.746613 135841449502528 pyconfig.py:471] Config param first_num_dense_layers: 0 I0425 09:40:10.746629 135841449502528 pyconfig.py:471] Config param float32_gate_logits: False I0425 09:40:10.746644 135841449502528 pyconfig.py:471] Config param float32_logits: False I0425 09:40:10.746660 135841449502528 pyconfig.py:471] Config param float32_qk_product: False I0425 09:40:10.746675 135841449502528 pyconfig.py:471] Config param float32_weight_sum: True I0425 09:40:10.746690 135841449502528 pyconfig.py:471] Config param force_q_layout: 
False I0425 09:40:10.746705 135841449502528 pyconfig.py:471] Config param force_unroll: False I0425 09:40:10.746721 135841449502528 pyconfig.py:471] Config param formatting_func_kwargs: {} I0425 09:40:10.746736 135841449502528 pyconfig.py:471] Config param formatting_func_path: I0425 09:40:10.746752 135841449502528 pyconfig.py:471] Config param freeze_audio_encoder_params: True I0425 09:40:10.746766 135841449502528 pyconfig.py:471] Config param freeze_vision_encoder_params: True I0425 09:40:10.746783 135841449502528 pyconfig.py:471] Config param fused_mlp: False I0425 09:40:10.746797 135841449502528 pyconfig.py:471] Config param fused_qkv: True I0425 09:40:10.746813 135841449502528 pyconfig.py:471] Config param gcs_metrics: False I0425 09:40:10.746828 135841449502528 pyconfig.py:471] Config param gdn_chunk_size: 64 I0425 09:40:10.746843 135841449502528 pyconfig.py:471] Config param gdn_conv_kernel_dim: 4 I0425 09:40:10.746858 135841449502528 pyconfig.py:471] Config param gdn_key_head_dim: 128 I0425 09:40:10.746874 135841449502528 pyconfig.py:471] Config param gdn_num_key_heads: 16 I0425 09:40:10.746890 135841449502528 pyconfig.py:471] Config param gdn_num_value_heads: 32 I0425 09:40:10.746904 135841449502528 pyconfig.py:471] Config param gdn_value_head_dim: 128 I0425 09:40:10.746920 135841449502528 pyconfig.py:471] Config param generate_padding_batch_eval: False I0425 09:40:10.746935 135841449502528 pyconfig.py:471] Config param generate_padding_batch_train: False I0425 09:40:10.746951 135841449502528 pyconfig.py:471] Config param generate_slice: v5e-16 I0425 09:40:10.746966 135841449502528 pyconfig.py:471] Config param generation_configs: {} I0425 09:40:10.746983 135841449502528 pyconfig.py:471] Config param global_batch_size_to_eval_on: 64 I0425 09:40:10.746998 135841449502528 pyconfig.py:471] Config param global_batch_size_to_load: 512 I0425 09:40:10.747014 135841449502528 pyconfig.py:471] Config param global_batch_size_to_load_eval: 64 I0425 09:40:10.747028 
135841449502528 pyconfig.py:471] Config param global_batch_size_to_load_increment: None I0425 09:40:10.747044 135841449502528 pyconfig.py:471] Config param global_batch_size_to_load_start: None I0425 09:40:10.747060 135841449502528 pyconfig.py:471] Config param global_batch_size_to_train_on: 512 I0425 09:40:10.747075 135841449502528 pyconfig.py:471] Config param global_head_dim: 0 I0425 09:40:10.747102 135841449502528 pyconfig.py:471] Config param global_num_kv_heads: 0 I0425 09:40:10.747120 135841449502528 pyconfig.py:471] Config param global_parameter_scale: 1 I0425 09:40:10.747135 135841449502528 pyconfig.py:471] Config param global_rampup_samples: 500 I0425 09:40:10.747150 135841449502528 pyconfig.py:471] Config param global_rope_max_timescale: -1 I0425 09:40:10.747168 135841449502528 pyconfig.py:471] Config param global_rope_proportion: 0.25 I0425 09:40:10.747184 135841449502528 pyconfig.py:471] Config param goodput_upload_interval_seconds: 30 I0425 09:40:10.747200 135841449502528 pyconfig.py:471] Config param grad_dtype: float32 I0425 09:40:10.747236 135841449502528 pyconfig.py:471] Config param gradient_accumulation_steps: 8 I0425 09:40:10.747251 135841449502528 pyconfig.py:471] Config param gradient_clipping_threshold: 1.0 I0425 09:40:10.747268 135841449502528 pyconfig.py:471] Config param grain_data_source_max_workers: 16 I0425 09:40:10.747284 135841449502528 pyconfig.py:471] Config param grain_eval_files: I0425 09:40:10.747301 135841449502528 pyconfig.py:471] Config param grain_file_type: arrayrecord I0425 09:40:10.747315 135841449502528 pyconfig.py:471] Config param grain_num_threads: 16 I0425 09:40:10.747332 135841449502528 pyconfig.py:471] Config param grain_num_threads_eval: 16 I0425 09:40:10.747348 135841449502528 pyconfig.py:471] Config param grain_packing_type: first_fit I0425 09:40:10.747364 135841449502528 pyconfig.py:471] Config param grain_per_worker_buffer_size: 1 I0425 09:40:10.747379 135841449502528 pyconfig.py:471] Config param 
grain_per_worker_buffer_size_eval: 1 I0425 09:40:10.747395 135841449502528 pyconfig.py:471] Config param grain_prefetch_buffer_size: 500 I0425 09:40:10.747411 135841449502528 pyconfig.py:471] Config param grain_prefetch_buffer_size_eval: 500 I0425 09:40:10.747425 135841449502528 pyconfig.py:471] Config param grain_ram_budget_mb: 1024 I0425 09:40:10.747441 135841449502528 pyconfig.py:471] Config param grain_shuffle_buffer_size: 100 I0425 09:40:10.747456 135841449502528 pyconfig.py:471] Config param grain_train_files: I0425 09:40:10.747471 135841449502528 pyconfig.py:471] Config param grain_train_mixture_config_path: I0425 09:40:10.747486 135841449502528 pyconfig.py:471] Config param grain_worker_count: 1 I0425 09:40:10.747502 135841449502528 pyconfig.py:471] Config param grain_worker_count_eval: 1 I0425 09:40:10.747516 135841449502528 pyconfig.py:471] Config param grpo_beta: 0.08 I0425 09:40:10.747535 135841449502528 pyconfig.py:471] Config param grpo_epsilon: 0.2 I0425 09:40:10.747550 135841449502528 pyconfig.py:471] Config param hardware: tpu I0425 09:40:10.747566 135841449502528 pyconfig.py:471] Config param hbm_utilization_vllm: 0.72 I0425 09:40:10.747581 135841449502528 pyconfig.py:471] Config param head_dim: 8 I0425 09:40:10.747596 135841449502528 pyconfig.py:471] Config param heartbeat_reporting_interval_in_seconds: 5 I0425 09:40:10.747611 135841449502528 pyconfig.py:471] Config param hf_data_dir: None I0425 09:40:10.747627 135841449502528 pyconfig.py:471] Config param hf_eval_files: None I0425 09:40:10.747642 135841449502528 pyconfig.py:471] Config param hf_eval_split: None I0425 09:40:10.747658 135841449502528 pyconfig.py:471] Config param hf_name: None I0425 09:40:10.747673 135841449502528 pyconfig.py:471] Config param hf_path: OptimalScale/ClimbMix I0425 09:40:10.747689 135841449502528 pyconfig.py:471] Config param hf_train_files: None I0425 09:40:10.747703 135841449502528 pyconfig.py:471] Config param hidden_size_for_vit: 1408 I0425 09:40:10.747719 
135841449502528 pyconfig.py:471] Config param hide_profiler_step_metric: False I0425 09:40:10.747734 135841449502528 pyconfig.py:471] Config param ici_autoregressive_parallelism: 1 I0425 09:40:10.747748 135841449502528 pyconfig.py:471] Config param ici_context_autoregressive_parallelism: 1 I0425 09:40:10.747764 135841449502528 pyconfig.py:471] Config param ici_context_parallelism: 1 I0425 09:40:10.747779 135841449502528 pyconfig.py:471] Config param ici_data_parallelism: 1 I0425 09:40:10.747795 135841449502528 pyconfig.py:471] Config param ici_diloco_parallelism: 1 I0425 09:40:10.747810 135841449502528 pyconfig.py:471] Config param ici_expert_parallelism: 1 I0425 09:40:10.747824 135841449502528 pyconfig.py:471] Config param ici_fsdp_parallelism: -1 I0425 09:40:10.747838 135841449502528 pyconfig.py:471] Config param ici_fsdp_transpose_parallelism: 1 I0425 09:40:10.747854 135841449502528 pyconfig.py:471] Config param ici_parallelism: [1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1] I0425 09:40:10.747872 135841449502528 pyconfig.py:471] Config param ici_pipeline_parallelism: 1 I0425 09:40:10.747887 135841449502528 pyconfig.py:471] Config param ici_sequence_parallelism: 1 I0425 09:40:10.747902 135841449502528 pyconfig.py:471] Config param ici_tensor_parallelism: 1 I0425 09:40:10.747916 135841449502528 pyconfig.py:471] Config param ici_tensor_sequence_parallelism: 1 I0425 09:40:10.747933 135841449502528 pyconfig.py:471] Config param ici_tensor_transpose_parallelism: 1 I0425 09:40:10.747948 135841449502528 pyconfig.py:471] Config param image_path: I0425 09:40:10.747964 135841449502528 pyconfig.py:471] Config param image_placeholder: <|image|> I0425 09:40:10.747980 135841449502528 pyconfig.py:471] Config param image_size_for_vit: 896 I0425 09:40:10.747994 135841449502528 pyconfig.py:471] Config param indexer_head_dim: 128 I0425 09:40:10.748010 135841449502528 pyconfig.py:471] Config param indexer_loss_scaling_factor: 0.0 I0425 09:40:10.748024 135841449502528 pyconfig.py:471] 
Config param indexer_n_heads: 64 I0425 09:40:10.748040 135841449502528 pyconfig.py:471] Config param indexer_sparse_training: False I0425 09:40:10.748056 135841449502528 pyconfig.py:471] Config param indexer_topk: 2048 I0425 09:40:10.748072 135841449502528 pyconfig.py:471] Config param inference_benchmark_test: False I0425 09:40:10.748088 135841449502528 pyconfig.py:471] Config param inference_metadata_file: I0425 09:40:10.748115 135841449502528 pyconfig.py:471] Config param inference_microbenchmark_log_file_path: I0425 09:40:10.748131 135841449502528 pyconfig.py:471] Config param inference_microbenchmark_loop_iters: 10 I0425 09:40:10.748146 135841449502528 pyconfig.py:471] Config param inference_microbenchmark_num_samples: [1, 2, 3, 4, 5] I0425 09:40:10.748162 135841449502528 pyconfig.py:471] Config param inference_microbenchmark_prefill_lengths: 64,128,256,512,1024 I0425 09:40:10.748178 135841449502528 pyconfig.py:471] Config param inference_microbenchmark_stages: prefill,generate I0425 09:40:10.748194 135841449502528 pyconfig.py:471] Config param inference_server: MaxtextInterleavedServer I0425 09:40:10.748209 135841449502528 pyconfig.py:471] Config param inhomogeneous_layer_cycle_interval: 1 I0425 09:40:10.748225 135841449502528 pyconfig.py:471] Config param init_weights_seed: 0 I0425 09:40:10.748240 135841449502528 pyconfig.py:471] Config param input_data_sharding_logical_axes: ['activation_embed_and_logits_batch', 'activation_norm_length'] I0425 09:40:10.748257 135841449502528 pyconfig.py:471] Config param interleave_moe_layer_step: 1 I0425 09:40:10.748272 135841449502528 pyconfig.py:471] Config param intermediate_size_for_vit: 5632 I0425 09:40:10.748288 135841449502528 pyconfig.py:471] Config param internal_compile: False I0425 09:40:10.748302 135841449502528 pyconfig.py:471] Config param internal_compile_num_devices: -1 I0425 09:40:10.748318 135841449502528 pyconfig.py:471] Config param jax_cache_dir: ~/jax_cache I0425 09:40:10.748332 135841449502528 
pyconfig.py:471] Config param jax_debug_log_modules: I0425 09:40:10.748349 135841449502528 pyconfig.py:471] Config param jax_distributed_initialization_timeout: 300 I0425 09:40:10.748363 135841449502528 pyconfig.py:471] Config param jax_profiler_port: 9999 I0425 09:40:10.748378 135841449502528 pyconfig.py:471] Config param key_proj: RematLocation.REMAT I0425 09:40:10.748394 135841449502528 pyconfig.py:471] Config param kv_cache_buffer: 256 I0425 09:40:10.748410 135841449502528 pyconfig.py:471] Config param kv_lora_rank: 512 I0425 09:40:10.748427 135841449502528 pyconfig.py:471] Config param kv_quant_axis: KvQuantAxis.HEADS_AND_DKV I0425 09:40:10.748445 135841449502528 pyconfig.py:471] Config param kv_quant_dtype: int8 I0425 09:40:10.748460 135841449502528 pyconfig.py:471] Config param kv_wa_proj: RematLocation.REMAT I0425 09:40:10.748476 135841449502528 pyconfig.py:471] Config param learning_rate: 0.0002 I0425 09:40:10.748492 135841449502528 pyconfig.py:471] Config param learning_rate_final_fraction: 0.1 I0425 09:40:10.748508 135841449502528 pyconfig.py:471] Config param learning_rate_schedule_steps: 200000 I0425 09:40:10.748523 135841449502528 pyconfig.py:471] Config param load_balance_loss_weight: 0.0 I0425 09:40:10.748542 135841449502528 pyconfig.py:471] Config param load_checkpoint_only_once: False I0425 09:40:10.748556 135841449502528 pyconfig.py:471] Config param load_from_prefill_dir: False I0425 09:40:10.748572 135841449502528 pyconfig.py:471] Config param load_full_state_path: I0425 09:40:10.748587 135841449502528 pyconfig.py:471] Config param load_parameters_path: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 09:40:10.748604 135841449502528 pyconfig.py:471] Config param local_checkpoint_directory: I0425 09:40:10.748619 135841449502528 pyconfig.py:471] Config param local_checkpoint_period: 0 I0425 09:40:10.748634 135841449502528 pyconfig.py:471] Config param local_rope_max_timescale: -1 I0425 
09:40:10.748650 135841449502528 pyconfig.py:471] Config param local_rope_proportion: 1.0 I0425 09:40:10.748665 135841449502528 pyconfig.py:471] Config param log_config: True I0425 09:40:10.748681 135841449502528 pyconfig.py:471] Config param log_period: 10 I0425 09:40:10.748696 135841449502528 pyconfig.py:471] Config param logical_axis_rules: (('activation_embed_and_logits_batch', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_embed_and_logits_batch_sequence', ('data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_vocab', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_vocab', ('tensor', 'tensor_transpose')), ('activation_vocab', 'tensor_sequence'), ('activation_vocab', ('sequence', 'context')), ('vocab', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('embed_vocab', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('activation_batch_attn', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence', 'autoregressive')), ('activation_kv_heads', ('tensor', 'tensor_transpose', 'sequence', 'tensor_sequence')), ('activation_length_attn', ('sequence', 'context')), ('activation_length_attn', ('context',)), ('activation_q_length', ('context',)), ('activation_kv_length', ()), ('activation_embed_attn', ('tensor', 'tensor_transpose')), ('activation_kv', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_kv_head_dim', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('q_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('kv_heads', ('tensor', 'tensor_transpose', 'tensor_sequence', 'autoregressive')), ('qkv', ()), ('kv', ()), ('kv_head_dim', ()), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 
'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('q_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('q_lora', ('fsdp', 'sequence', 'context', 'expert')), ('q_lora_up_proj', ()), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'tensor_transpose', 'expert')), ('kv_lora', ('fsdp', 'fsdp_transpose', 'sequence', 'context', 'expert')), ('kv_lora', ('fsdp', 'sequence', 'context', 'expert')), ('kv_lora_up_proj', ()), ('activation_batch_moe', ('data', 'fsdp', 'fsdp_transpose')), ('activation_length_moe', ('sequence', 'context')), ('activation_length_moe', ('context',)), ('activation_norm_length_moe', ('tensor_sequence', 'context', 'sequence')), ('activation_embed_moe', ('tensor', 'tensor_transpose')), ('activation_mlp_moe', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_exp', ('expert',)), ('exp', 'expert'), ('mlp_moe', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'sequence', 'tensor_transpose', 'context')), ('embed_moe', ('fsdp', 'fsdp_transpose', 'sequence', 'context')), ('embed_moe', ('fsdp', 'sequence', 'context')), ('activation_mlp', ('tensor', 'tensor_transpose', 'tensor_sequence')), ('activation_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('activation_length', ('sequence', 'context')), ('activation_length', ('context',)), ('activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_embed', ('tensor', 'tensor_transpose')), ('activation_stage', 'stage'), ('mlp', ('fsdp_transpose', 'tensor', 'tensor_sequence', 'autoregressive')), ('embed', ('fsdp', 'fsdp_transpose', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'tensor_transpose', 'context', 'expert')), ('embed', ('fsdp', 
'fsdp_transpose', 'sequence', 'context', 'expert')), ('embed', ('fsdp', 'sequence', 'context', 'expert')), ('norm', ('tensor', 'tensor_transpose')), ('layers', 'stage'), ('diloco', 'diloco'), ('engram_dim', ('tensor',)), ('dense_layers', ()), ('moe_layers', ()), ('mhc', ()), ('prefill_activation_length', ('sequence', 'context')), ('prefill_activation_norm_length', ('tensor_sequence', 'context', 'sequence')), ('activation_prefill_kv_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_batch', ('data', 'fsdp', 'fsdp_transpose', 'expert')), ('decode_length', ('sequence',)), ('cache_heads', ('autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence')), ('cache_heads', ('autoregressive', 'tensor', 'tensor_sequence')), ('paged_kv_heads', ('tensor',)), ('cache_batch_prefill', ()), ('cache_batch', ()), ('cache_heads_none', ()), ('cache_kv', ()), ('cache_sequence', ()), ('num_pages', ()), ('tokens_per_page', ()), ('paged_kv_head_dim_size', ()), ('mlp_no_fsdp', ('tensor', 'tensor_sequence', 'autoregressive')), ('embed_tensor_transpose', ('tensor_transpose',)), ('exp_with_fsdp', 'fsdp')) I0425 09:40:10.748772 135841449502528 pyconfig.py:471] Config param logits_dot_in_fp32: False I0425 09:40:10.748791 135841449502528 pyconfig.py:471] Config param logits_via_embedding: True I0425 09:40:10.748808 135841449502528 pyconfig.py:471] Config param lora_input_adapters_path: I0425 09:40:10.748824 135841449502528 pyconfig.py:471] Config param loss_algo: grpo I0425 09:40:10.748841 135841449502528 pyconfig.py:471] Config param lr_schedule_type: LearningRateScheduleType.COSINE I0425 09:40:10.748859 135841449502528 pyconfig.py:471] Config param managed_mldiagnostics: False I0425 09:40:10.748874 135841449502528 pyconfig.py:471] Config param managed_mldiagnostics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/managed-mldiagnostics I0425 09:40:10.748889 135841449502528 pyconfig.py:471] Config param managed_mldiagnostics_run_group: I0425 09:40:10.748905 135841449502528 
pyconfig.py:471] Config param matmul_precision: MatmulPrecision.DEFAULT I0425 09:40:10.748923 135841449502528 pyconfig.py:471] Config param max_checkify: False I0425 09:40:10.748939 135841449502528 pyconfig.py:471] Config param max_concurrency: 256 I0425 09:40:10.748955 135841449502528 pyconfig.py:471] Config param max_corpus_chars: 10000000 I0425 09:40:10.748970 135841449502528 pyconfig.py:471] Config param max_num_batched_tokens: None I0425 09:40:10.748987 135841449502528 pyconfig.py:471] Config param max_num_checkpoints_to_keep: None I0425 09:40:10.749001 135841449502528 pyconfig.py:471] Config param max_num_images_per_example: -1 I0425 09:40:10.749017 135841449502528 pyconfig.py:471] Config param max_num_seqs: None I0425 09:40:10.749032 135841449502528 pyconfig.py:471] Config param max_position_embeddings: 163840 I0425 09:40:10.749048 135841449502528 pyconfig.py:471] Config param max_prefill_predict_length: 64 I0425 09:40:10.749064 135841449502528 pyconfig.py:471] Config param max_sample_len_for_audio: 10000 I0425 09:40:10.749078 135841449502528 pyconfig.py:471] Config param max_segments_per_seq: -1 I0425 09:40:10.749111 135841449502528 pyconfig.py:471] Config param max_source_positions_for_audio: 1500 I0425 09:40:10.749128 135841449502528 pyconfig.py:471] Config param max_target_length: 2048 I0425 09:40:10.749144 135841449502528 pyconfig.py:471] Config param max_timescale_for_audio: 10000.0 I0425 09:40:10.749161 135841449502528 pyconfig.py:471] Config param megablox: True I0425 09:40:10.749176 135841449502528 pyconfig.py:471] Config param merge_gating_gmm: False I0425 09:40:10.749193 135841449502528 pyconfig.py:471] Config param mesh_axes: ['diloco', 'data', 'stage', 'fsdp', 'fsdp_transpose', 'sequence', 'context', 'context_autoregressive', 'tensor', 'tensor_transpose', 'tensor_sequence', 'expert', 'autoregressive'] I0425 09:40:10.749211 135841449502528 pyconfig.py:471] Config param metrics_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/metrics/ I0425 
09:40:10.749226 135841449502528 pyconfig.py:471] Config param metrics_file: I0425 09:40:10.749242 135841449502528 pyconfig.py:471] Config param mhc_expansion_rate: 1 I0425 09:40:10.749257 135841449502528 pyconfig.py:471] Config param micro_batch_size_to_eval_on: 64 I0425 09:40:10.749272 135841449502528 pyconfig.py:471] Config param micro_batch_size_to_train_on: 64 I0425 09:40:10.749287 135841449502528 pyconfig.py:471] Config param mla_kv: RematLocation.REMAT I0425 09:40:10.749303 135841449502528 pyconfig.py:471] Config param mla_naive_kvcache: True I0425 09:40:10.749318 135841449502528 pyconfig.py:471] Config param mla_q: RematLocation.REMAT I0425 09:40:10.749334 135841449502528 pyconfig.py:471] Config param mlp_activations: ['gelu'] I0425 09:40:10.749351 135841449502528 pyconfig.py:471] Config param mlp_activations_limit: -1.0 I0425 09:40:10.749367 135841449502528 pyconfig.py:471] Config param mlp_bias: False I0425 09:40:10.749383 135841449502528 pyconfig.py:471] Config param mlp_dim: 64 I0425 09:40:10.749399 135841449502528 pyconfig.py:471] Config param mlpwi: RematLocation.REMAT I0425 09:40:10.749415 135841449502528 pyconfig.py:471] Config param mlpwi_0: RematLocation.REMAT I0425 09:40:10.749430 135841449502528 pyconfig.py:471] Config param mlpwi_1: RematLocation.REMAT I0425 09:40:10.749446 135841449502528 pyconfig.py:471] Config param mlpwo: RematLocation.REMAT I0425 09:40:10.749463 135841449502528 pyconfig.py:471] Config param moba: False I0425 09:40:10.749477 135841449502528 pyconfig.py:471] Config param moba_chunk_size: 1024 I0425 09:40:10.749494 135841449502528 pyconfig.py:471] Config param moba_topk: 8 I0425 09:40:10.749509 135841449502528 pyconfig.py:471] Config param model_call_mode: I0425 09:40:10.749525 135841449502528 pyconfig.py:471] Config param model_name: gpt3-52k I0425 09:40:10.749543 135841449502528 pyconfig.py:471] Config param moe_expert_input_dim: -1 I0425 09:40:10.749559 135841449502528 pyconfig.py:471] Config param 
moe_fsdp_use_two_stage_all_gather: False I0425 09:40:10.749574 135841449502528 pyconfig.py:471] Config param moe_mlp_dim: -1 I0425 09:40:10.749590 135841449502528 pyconfig.py:471] Config param moe_mlpwi_0: RematLocation.REMAT I0425 09:40:10.749606 135841449502528 pyconfig.py:471] Config param moe_mlpwi_1: RematLocation.REMAT I0425 09:40:10.749624 135841449502528 pyconfig.py:471] Config param moe_mlpwo: RematLocation.REMAT I0425 09:40:10.749639 135841449502528 pyconfig.py:471] Config param monitor_goodput: False I0425 09:40:10.749656 135841449502528 pyconfig.py:471] Config param monitor_step_time_deviation: True I0425 09:40:10.749671 135841449502528 pyconfig.py:471] Config param mrope_section: [24, 20, 20] I0425 09:40:10.749688 135841449502528 pyconfig.py:471] Config param mscale: 1.0 I0425 09:40:10.749703 135841449502528 pyconfig.py:471] Config param mtc_data_parallelism: 0 I0425 09:40:10.749719 135841449502528 pyconfig.py:471] Config param mtp_eval_target_module: 0 I0425 09:40:10.749734 135841449502528 pyconfig.py:471] Config param mtp_loss_scaling_factor: 0.1 I0425 09:40:10.749750 135841449502528 pyconfig.py:471] Config param mtp_num_layers: 0 I0425 09:40:10.749766 135841449502528 pyconfig.py:471] Config param mu_dtype: float32 I0425 09:40:10.749791 135841449502528 pyconfig.py:471] Config param multi_sampling: False I0425 09:40:10.749806 135841449502528 pyconfig.py:471] Config param multi_tier_checkpointing_backup_interval_minutes: 0 I0425 09:40:10.749822 135841449502528 pyconfig.py:471] Config param muon_beta: 0.95 I0425 09:40:10.749837 135841449502528 pyconfig.py:471] Config param muon_consistent_rms: None I0425 09:40:10.749853 135841449502528 pyconfig.py:471] Config param muon_weight_decay: 0.0 I0425 09:40:10.749868 135841449502528 pyconfig.py:471] Config param n_routing_groups: -1 I0425 09:40:10.749884 135841449502528 pyconfig.py:471] Config param n_window_for_audio: 50 I0425 09:40:10.749899 135841449502528 pyconfig.py:471] Config param 
n_window_infer_for_audio: 800 I0425 09:40:10.749915 135841449502528 pyconfig.py:471] Config param nope_layer_interval: -1 I0425 09:40:10.749929 135841449502528 pyconfig.py:471] Config param norm_topk_prob: False I0425 09:40:10.749946 135841449502528 pyconfig.py:471] Config param normalization_layer_epsilon: 1e-05 I0425 09:40:10.749963 135841449502528 pyconfig.py:471] Config param normalize_embedding_logits: False I0425 09:40:10.749979 135841449502528 pyconfig.py:471] Config param num_attention_heads_for_vit: 16 I0425 09:40:10.749995 135841449502528 pyconfig.py:471] Config param num_batches: 4 I0425 09:40:10.750010 135841449502528 pyconfig.py:471] Config param num_channels_for_vit: 3 I0425 09:40:10.750025 135841449502528 pyconfig.py:471] Config param num_conv_layers_for_audio: 3 I0425 09:40:10.750040 135841449502528 pyconfig.py:471] Config param num_decoder_layers: 1 I0425 09:40:10.750056 135841449502528 pyconfig.py:471] Config param num_diloco_replicas: 1 I0425 09:40:10.750072 135841449502528 pyconfig.py:471] Config param num_epoch: 1 I0425 09:40:10.750087 135841449502528 pyconfig.py:471] Config param num_eval_passes: 1 I0425 09:40:10.750114 135841449502528 pyconfig.py:471] Config param num_experts: 1 I0425 09:40:10.750130 135841449502528 pyconfig.py:471] Config param num_experts_per_tok: 1 I0425 09:40:10.750146 135841449502528 pyconfig.py:471] Config param num_generations: 2 I0425 09:40:10.750161 135841449502528 pyconfig.py:471] Config param num_hidden_layers_for_vit: 34 I0425 09:40:10.750177 135841449502528 pyconfig.py:471] Config param num_iterations: 1 I0425 09:40:10.750192 135841449502528 pyconfig.py:471] Config param num_kv_heads: 2 I0425 09:40:10.750207 135841449502528 pyconfig.py:471] Config param num_layers_per_pipeline_stage: 1 I0425 09:40:10.750224 135841449502528 pyconfig.py:471] Config param num_mel_bins_for_audio: 128 I0425 09:40:10.750238 135841449502528 pyconfig.py:471] Config param num_pipeline_microbatches: -1 I0425 09:40:10.750254 135841449502528 
pyconfig.py:471] Config param num_pipeline_repeats: -1 I0425 09:40:10.750269 135841449502528 pyconfig.py:471] Config param num_position_embeddings_for_vit: 1024 I0425 09:40:10.750285 135841449502528 pyconfig.py:471] Config param num_query_heads: 2 I0425 09:40:10.750300 135841449502528 pyconfig.py:471] Config param num_samplers_slices: -1 I0425 09:40:10.750315 135841449502528 pyconfig.py:471] Config param num_slices: 1 I0425 09:40:10.750332 135841449502528 pyconfig.py:471] Config param num_target_devices: 32 I0425 09:40:10.750348 135841449502528 pyconfig.py:471] Config param num_test_batches: 5 I0425 09:40:10.750362 135841449502528 pyconfig.py:471] Config param num_trainer_slices: -1 I0425 09:40:10.750378 135841449502528 pyconfig.py:471] Config param num_vocab_tiling: 1 I0425 09:40:10.750393 135841449502528 pyconfig.py:471] Config param off_policy_steps: 0 I0425 09:40:10.750409 135841449502528 pyconfig.py:471] Config param offline_data_dir: None I0425 09:40:10.750424 135841449502528 pyconfig.py:471] Config param opt_type: OptimizerType.ADAM_PAX I0425 09:40:10.750443 135841449502528 pyconfig.py:471] Config param optimize_mesh_for_tpu_v6e: False I0425 09:40:10.750459 135841449502528 pyconfig.py:471] Config param optimizer_memory_host_offload: False I0425 09:40:10.750473 135841449502528 pyconfig.py:471] Config param original_max_position_embeddings: 4096 I0425 09:40:10.750490 135841449502528 pyconfig.py:471] Config param out_hidden_size_for_vit: 512 I0425 09:40:10.750504 135841449502528 pyconfig.py:471] Config param out_proj: RematLocation.REMAT I0425 09:40:10.750520 135841449502528 pyconfig.py:471] Config param output_dim_for_audio: 512 I0425 09:40:10.750540 135841449502528 pyconfig.py:471] Config param override_logical_axis_rules: False I0425 09:40:10.750555 135841449502528 pyconfig.py:471] Config param override_model_config: True I0425 09:40:10.750571 135841449502528 pyconfig.py:471] Config param packing: True I0425 09:40:10.750586 135841449502528 pyconfig.py:471] 
Config param pagedattn_head_dim_alignment: 128 I0425 09:40:10.750602 135841449502528 pyconfig.py:471] Config param pagedattn_max_pages_per_group: -1 I0425 09:40:10.750617 135841449502528 pyconfig.py:471] Config param pagedattn_num_pages: 64 I0425 09:40:10.750633 135841449502528 pyconfig.py:471] Config param pagedattn_pages_per_compute_block: 4 I0425 09:40:10.750648 135841449502528 pyconfig.py:471] Config param pagedattn_tokens_per_page: 32 I0425 09:40:10.750664 135841449502528 pyconfig.py:471] Config param param_scan_axis: 1 I0425 09:40:10.750679 135841449502528 pyconfig.py:471] Config param parameter_memory_host_offload: False I0425 09:40:10.750694 135841449502528 pyconfig.py:471] Config param partial_rotary_factor: 1.0 I0425 09:40:10.750709 135841449502528 pyconfig.py:471] Config param patch_size_for_vit: 14 I0425 09:40:10.750725 135841449502528 pyconfig.py:471] Config param penalty_incorrect_answer: -1.0 I0425 09:40:10.750740 135841449502528 pyconfig.py:471] Config param penalty_incorrect_format: -0.5 I0425 09:40:10.750757 135841449502528 pyconfig.py:471] Config param per_device_batch_size: 2 I0425 09:40:10.750773 135841449502528 pyconfig.py:471] Config param per_device_batch_size_increment: 2.0 I0425 09:40:10.750787 135841449502528 pyconfig.py:471] Config param per_device_batch_size_start: 4.0 I0425 09:40:10.750803 135841449502528 pyconfig.py:471] Config param pipeline_delay_activation_forwarding: False I0425 09:40:10.750818 135841449502528 pyconfig.py:471] Config param pipeline_fsdp_ag_once: False I0425 09:40:10.750836 135841449502528 pyconfig.py:471] Config param pipeline_fsdp_ag_per_repeat: False I0425 09:40:10.750852 135841449502528 pyconfig.py:471] Config param pipeline_parallel_layers: 1 I0425 09:40:10.750868 135841449502528 pyconfig.py:471] Config param pixel_shuffle_ratio_for_vit: 0.5 I0425 09:40:10.750884 135841449502528 pyconfig.py:471] Config param posemb_type_for_vit: learn I0425 09:40:10.750900 135841449502528 pyconfig.py:471] Config param 
position_id_per_seconds: 25 I0425 09:40:10.750916 135841449502528 pyconfig.py:471] Config param prefill_cache_axis_order: 1,2,0,3 I0425 09:40:10.750931 135841449502528 pyconfig.py:471] Config param prefill_cache_dir: I0425 09:40:10.750947 135841449502528 pyconfig.py:471] Config param prefill_chunk_size: 256 I0425 09:40:10.750962 135841449502528 pyconfig.py:471] Config param prefill_slice: v5e-16 I0425 09:40:10.750978 135841449502528 pyconfig.py:471] Config param prefix_caching_dram_byte: 100000000000 I0425 09:40:10.750993 135841449502528 pyconfig.py:471] Config param prefix_caching_hbm_byte: 10000000000 I0425 09:40:10.751009 135841449502528 pyconfig.py:471] Config param prefuse_moe_weights: False I0425 09:40:10.751024 135841449502528 pyconfig.py:471] Config param profile_cleanly: True I0425 09:40:10.751039 135841449502528 pyconfig.py:471] Config param profile_periodically_period: -1 I0425 09:40:10.751056 135841449502528 pyconfig.py:471] Config param profile_power_events: False I0425 09:40:10.751070 135841449502528 pyconfig.py:471] Config param profiler: ProfilerType.NONE I0425 09:40:10.751088 135841449502528 pyconfig.py:471] Config param profiler_steps: 5 I0425 09:40:10.751112 135841449502528 pyconfig.py:471] Config param projector_dropout_for_vit: 0.0 I0425 09:40:10.751128 135841449502528 pyconfig.py:471] Config param projector_input_dim_for_vit: 4096 I0425 09:40:10.751144 135841449502528 pyconfig.py:471] Config param projector_output_dim_for_vit: 4096 I0425 09:40:10.751158 135841449502528 pyconfig.py:471] Config param prometheus_port: 0 I0425 09:40:10.751175 135841449502528 pyconfig.py:471] Config param prompt: I love to I0425 09:40:10.751191 135841449502528 pyconfig.py:471] Config param pure_nnx: False I0425 09:40:10.751207 135841449502528 pyconfig.py:471] Config param pure_nnx_decoder: False I0425 09:40:10.751222 135841449502528 pyconfig.py:471] Config param q_lora_rank: 0 I0425 09:40:10.751238 135841449502528 pyconfig.py:471] Config param qk_clip_threshold: 
100.0 I0425 09:40:10.751253 135841449502528 pyconfig.py:471] Config param qk_nope_head_dim: 128 I0425 09:40:10.751269 135841449502528 pyconfig.py:471] Config param qk_norm_with_scale: True I0425 09:40:10.751285 135841449502528 pyconfig.py:471] Config param qk_rope_head_dim: 64 I0425 09:40:10.751300 135841449502528 pyconfig.py:471] Config param qkv_proj: RematLocation.REMAT I0425 09:40:10.751316 135841449502528 pyconfig.py:471] Config param quant_cfg_path: I0425 09:40:10.751331 135841449502528 pyconfig.py:471] Config param quantization: QuantizationType.NONE I0425 09:40:10.751350 135841449502528 pyconfig.py:471] Config param quantization_local_shard_count: 4 I0425 09:40:10.751366 135841449502528 pyconfig.py:471] Config param quantize_kvcache: False I0425 09:40:10.751382 135841449502528 pyconfig.py:471] Config param query_proj: RematLocation.REMAT I0425 09:40:10.751399 135841449502528 pyconfig.py:471] Config param query_wa_proj: RematLocation.REMAT I0425 09:40:10.751416 135841449502528 pyconfig.py:471] Config param ragged_block_size: 256 I0425 09:40:10.751431 135841449502528 pyconfig.py:471] Config param ragged_buffer_factor: -1.0 I0425 09:40:10.751447 135841449502528 pyconfig.py:471] Config param rampup_end_step: 0 I0425 09:40:10.751461 135841449502528 pyconfig.py:471] Config param rampup_samples_per_increment_to_load: None I0425 09:40:10.751477 135841449502528 pyconfig.py:471] Config param reasoning_end_token: </reasoning> I0425 09:40:10.751494 135841449502528 pyconfig.py:471] Config param reasoning_start_token: <reasoning> I0425 09:40:10.751509 135841449502528 pyconfig.py:471] Config param record_internal_nn_metrics: 0 I0425 09:40:10.751525 135841449502528 pyconfig.py:471] Config param remat_policy: full I0425 09:40:10.751543 135841449502528 pyconfig.py:471] Config param remat_policy_for_vit: minimal I0425 09:40:10.751560 135841449502528 pyconfig.py:471] Config param remove_size_one_mesh_axis_from_type: True I0425 09:40:10.751574 135841449502528 pyconfig.py:471] 
Config param replicate_quant_scale: False I0425 09:40:10.751589 135841449502528 pyconfig.py:471] Config param replicator_backup_interval_minutes: 0 I0425 09:40:10.751605 135841449502528 pyconfig.py:471] Config param report_heartbeat_metric_for_gcp_monitoring: False I0425 09:40:10.751622 135841449502528 pyconfig.py:471] Config param report_performance_metric_for_gcp_monitoring: False I0425 09:40:10.751639 135841449502528 pyconfig.py:471] Config param reshape_q: False I0425 09:40:10.751654 135841449502528 pyconfig.py:471] Config param return_log_prob: False I0425 09:40:10.751670 135841449502528 pyconfig.py:471] Config param reuse_example_batch: 0 I0425 09:40:10.751684 135841449502528 pyconfig.py:471] Config param reward_exact_answer: 5.0 I0425 09:40:10.751700 135841449502528 pyconfig.py:471] Config param reward_exact_format_match: 3.0 I0425 09:40:10.751717 135841449502528 pyconfig.py:471] Config param reward_partial_format_match: 0.5 I0425 09:40:10.751733 135841449502528 pyconfig.py:471] Config param reward_ratio_guess_to_answer_high: 0.5 I0425 09:40:10.751749 135841449502528 pyconfig.py:471] Config param reward_ratio_guess_to_answer_low: 0.25 I0425 09:40:10.751764 135841449502528 pyconfig.py:471] Config param reward_white_space_format_match: 1.5 I0425 09:40:10.751779 135841449502528 pyconfig.py:471] Config param rl: {'num_generations': 2, 'num_iterations': 1, 'grpo_beta': 0.08, 'grpo_epsilon': 0.2, 'loss_algo': 'grpo', 'use_agentic_rollout': False, 'max_concurrency': 256, 'off_policy_steps': 0, 'system_prompt': '', 'degenerate_group_masking': True, 'epsilon_high': None} I0425 09:40:10.751800 135841449502528 pyconfig.py:471] Config param rollout_data_parallelism: -1 I0425 09:40:10.751817 135841449502528 pyconfig.py:471] Config param rollout_expert_parallelism: 1 I0425 09:40:10.751831 135841449502528 pyconfig.py:471] Config param rollout_micro_batch_size: -1 I0425 09:40:10.751847 135841449502528 pyconfig.py:471] Config param rollout_tensor_parallelism: -1 I0425 
09:40:10.751862 135841449502528 pyconfig.py:471] Config param rope_attention_scaling: False I0425 09:40:10.751878 135841449502528 pyconfig.py:471] Config param rope_factor: 40 I0425 09:40:10.751894 135841449502528 pyconfig.py:471] Config param rope_interleave: True I0425 09:40:10.751909 135841449502528 pyconfig.py:471] Config param rope_linear_scaling_factor: 1.0 I0425 09:40:10.751925 135841449502528 pyconfig.py:471] Config param rope_max_timescale: 10000 I0425 09:40:10.751941 135841449502528 pyconfig.py:471] Config param rope_min_timescale: 1 I0425 09:40:10.751956 135841449502528 pyconfig.py:471] Config param rope_theta_for_vit: 10000 I0425 09:40:10.751972 135841449502528 pyconfig.py:471] Config param rope_truncate: True I0425 09:40:10.751986 135841449502528 pyconfig.py:471] Config param rope_type: RopeType.DEFAULT I0425 09:40:10.752004 135841449502528 pyconfig.py:471] Config param rope_use_scale: True I0425 09:40:10.752021 135841449502528 pyconfig.py:471] Config param routed_bias: False I0425 09:40:10.752035 135841449502528 pyconfig.py:471] Config param routed_bias_update_rate: 0.0 I0425 09:40:10.752051 135841449502528 pyconfig.py:471] Config param routed_scaling_factor: 1.0 I0425 09:40:10.752066 135841449502528 pyconfig.py:471] Config param routed_score_func: I0425 09:40:10.752082 135841449502528 pyconfig.py:471] Config param run_name: gpt3-52k_2026-04-25-09-40 I0425 09:40:10.752116 135841449502528 pyconfig.py:471] Config param sa_block_kv: 512 I0425 09:40:10.752134 135841449502528 pyconfig.py:471] Config param sa_block_kv_compute: 512 I0425 09:40:10.752150 135841449502528 pyconfig.py:471] Config param sa_block_kv_dkv: 512 I0425 09:40:10.752164 135841449502528 pyconfig.py:471] Config param sa_block_kv_dkv_compute: 512 I0425 09:40:10.752180 135841449502528 pyconfig.py:471] Config param sa_block_kv_dq: 512 I0425 09:40:10.752196 135841449502528 pyconfig.py:471] Config param sa_block_q: 512 I0425 09:40:10.752211 135841449502528 pyconfig.py:471] Config param 
sa_block_q_dkv: 512 I0425 09:40:10.752227 135841449502528 pyconfig.py:471] Config param sa_block_q_dq: 512 I0425 09:40:10.752243 135841449502528 pyconfig.py:471] Config param sa_k_layout: HEAD_DIM_MINOR I0425 09:40:10.752260 135841449502528 pyconfig.py:471] Config param sa_q_layout: HEAD_DIM_MINOR I0425 09:40:10.752276 135841449502528 pyconfig.py:471] Config param sa_use_fused_bwd_kernel: False I0425 09:40:10.752291 135841449502528 pyconfig.py:471] Config param sa_v_layout: HEAD_DIM_MINOR I0425 09:40:10.752307 135841449502528 pyconfig.py:471] Config param sampler_devices_fraction: 0.5 I0425 09:40:10.752324 135841449502528 pyconfig.py:471] Config param save_checkpoint_on_completion: True I0425 09:40:10.752341 135841449502528 pyconfig.py:471] Config param save_config_to_gcs: False I0425 09:40:10.752357 135841449502528 pyconfig.py:471] Config param save_quantized_params_path: I0425 09:40:10.752372 135841449502528 pyconfig.py:471] Config param scale_embedding_for_audio: True I0425 09:40:10.752387 135841449502528 pyconfig.py:471] Config param scan_layers: True I0425 09:40:10.752403 135841449502528 pyconfig.py:471] Config param scan_layers_per_stage: False I0425 09:40:10.752419 135841449502528 pyconfig.py:471] Config param scan_pipeline_iterations: True I0425 09:40:10.752435 135841449502528 pyconfig.py:471] Config param scan_pipeline_repeats: False I0425 09:40:10.752452 135841449502528 pyconfig.py:471] Config param set_remat_policy_on_layers_per_stage: False I0425 09:40:10.752466 135841449502528 pyconfig.py:471] Config param set_remat_policy_on_pipeline_iterations: True I0425 09:40:10.752481 135841449502528 pyconfig.py:471] Config param sft_train_on_completion_only: False I0425 09:40:10.752498 135841449502528 pyconfig.py:471] Config param shard_exp_on_fsdp: False I0425 09:40:10.752514 135841449502528 pyconfig.py:471] Config param shard_mode: ShardMode.AUTO I0425 09:40:10.752536 135841449502528 pyconfig.py:471] Config param shard_optimizer_over_data: False I0425 
09:40:10.752554 135841449502528 pyconfig.py:471] Config param sharding_strategy: None I0425 09:40:10.752571 135841449502528 pyconfig.py:471] Config param sharding_tolerance: 0.02 I0425 09:40:10.752587 135841449502528 pyconfig.py:471] Config param shardy: True I0425 09:40:10.752603 135841449502528 pyconfig.py:471] Config param share_kv_projections: False I0425 09:40:10.752619 135841449502528 pyconfig.py:471] Config param shared_experts: 0 I0425 09:40:10.752635 135841449502528 pyconfig.py:471] Config param sinkhorn_iterations: 20 I0425 09:40:10.752650 135841449502528 pyconfig.py:471] Config param skip_first_n_steps_for_profiler: 1 I0425 09:40:10.752666 135841449502528 pyconfig.py:471] Config param skip_jax_distributed_system: False I0425 09:40:10.752682 135841449502528 pyconfig.py:471] Config param skip_step_interval: 128 I0425 09:40:10.752696 135841449502528 pyconfig.py:471] Config param skip_step_on_spikes: False I0425 09:40:10.752712 135841449502528 pyconfig.py:471] Config param skip_step_scaling_factor: 6.0 I0425 09:40:10.752729 135841449502528 pyconfig.py:471] Config param sliding_window_size: 0 I0425 09:40:10.752743 135841449502528 pyconfig.py:471] Config param solution_end_token: </answer> I0425 09:40:10.752759 135841449502528 pyconfig.py:471] Config param solution_start_token: <answer> I0425 09:40:10.752774 135841449502528 pyconfig.py:471] Config param source_checkpoint_layout: orbax I0425 09:40:10.752790 135841449502528 pyconfig.py:471] Config param sparse_matmul: True I0425 09:40:10.752805 135841449502528 pyconfig.py:471] Config param spatial_merge_size_for_vit: 2 I0425 09:40:10.752821 135841449502528 pyconfig.py:471] Config param stack_prefill_result_cache: False I0425 09:40:10.752835 135841449502528 pyconfig.py:471] Config param stack_trace_interval_seconds: 600 I0425 09:40:10.752852 135841449502528 pyconfig.py:471] Config param stack_trace_to_cloud: False I0425 09:40:10.752868 135841449502528 pyconfig.py:471] Config param step_deviation_interval_seconds: 
30 I0425 09:40:10.752883 135841449502528 pyconfig.py:471] Config param steps: 200000 I0425 09:40:10.752899 135841449502528 pyconfig.py:471] Config param stop_strings: None I0425 09:40:10.752914 135841449502528 pyconfig.py:471] Config param student_overrides: {'model_name': 'llama3.1-8b'} I0425 09:40:10.752931 135841449502528 pyconfig.py:471] Config param student_params_to_update: None I0425 09:40:10.752948 135841449502528 pyconfig.py:471] Config param subslice_shape: I0425 09:40:10.752964 135841449502528 pyconfig.py:471] Config param swap_space_vllm_gb: 2 I0425 09:40:10.752980 135841449502528 pyconfig.py:471] Config param system_prompt: I0425 09:40:10.752995 135841449502528 pyconfig.py:471] Config param target_eval_loss: 0.0 I0425 09:40:10.753010 135841449502528 pyconfig.py:471] Config param teacher_overrides: {'model_name': 'llama3.1-8b'} I0425 09:40:10.753026 135841449502528 pyconfig.py:471] Config param temperature_tuning: False I0425 09:40:10.753041 135841449502528 pyconfig.py:471] Config param temporal_patch_size_for_vit: 2 I0425 09:40:10.753057 135841449502528 pyconfig.py:471] Config param tensorboard_dir: /deps/maxtext_output/gpt3-52k_2026-04-25-09-40/tensorboard/ I0425 09:40:10.753073 135841449502528 pyconfig.py:471] Config param tensors_on_device: None I0425 09:40:10.753089 135841449502528 pyconfig.py:471] Config param tensors_to_offload: None I0425 09:40:10.753116 135841449502528 pyconfig.py:471] Config param test_batch_start_index: 0 I0425 09:40:10.753132 135841449502528 pyconfig.py:471] Config param tile_size_for_vit: 336 I0425 09:40:10.753146 135841449502528 pyconfig.py:471] Config param tokenize_eval_data: True I0425 09:40:10.753163 135841449502528 pyconfig.py:471] Config param tokenize_train_data: True I0425 09:40:10.753178 135841449502528 pyconfig.py:471] Config param tokenizer_path: meta-llama/Llama-3.1-8B I0425 09:40:10.753194 135841449502528 pyconfig.py:471] Config param tokenizer_type: TokenizerType.HUGGINGFACE I0425 09:40:10.753213 
135841449502528 pyconfig.py:471] Config param topk_routing_group: -1 I0425 09:40:10.753227 135841449502528 pyconfig.py:471] Config param train_data_columns: ['text'] I0425 09:40:10.753244 135841449502528 pyconfig.py:471] Config param train_fraction: 1.0 I0425 09:40:10.753259 135841449502528 pyconfig.py:471] Config param train_image_column: image I0425 09:40:10.753275 135841449502528 pyconfig.py:471] Config param train_micro_batch_size: -1 I0425 09:40:10.753291 135841449502528 pyconfig.py:471] Config param train_split: train I0425 09:40:10.753308 135841449502528 pyconfig.py:471] Config param trainable_parameters_mask: [] I0425 09:40:10.753324 135841449502528 pyconfig.py:471] Config param trainable_position_size: 2048 I0425 09:40:10.753339 135841449502528 pyconfig.py:471] Config param trainer_devices_fraction: 0.5 I0425 09:40:10.753356 135841449502528 pyconfig.py:471] Config param upload_all_profiler_results: False I0425 09:40:10.753372 135841449502528 pyconfig.py:471] Config param use_2d_fsdp_sharding: False I0425 09:40:10.753388 135841449502528 pyconfig.py:471] Config param use_agentic_rollout: False I0425 09:40:10.753403 135841449502528 pyconfig.py:471] Config param use_audio: False I0425 09:40:10.753419 135841449502528 pyconfig.py:471] Config param use_audio_in_video: False I0425 09:40:10.753434 135841449502528 pyconfig.py:471] Config param use_batch_split_schedule: False I0425 09:40:10.753450 135841449502528 pyconfig.py:471] Config param use_chat_template: False I0425 09:40:10.753464 135841449502528 pyconfig.py:471] Config param use_chunked_prefill: False I0425 09:40:10.753482 135841449502528 pyconfig.py:471] Config param use_custom_sort_vjp: True I0425 09:40:10.753499 135841449502528 pyconfig.py:471] Config param use_dpo: False I0425 09:40:10.753515 135841449502528 pyconfig.py:471] Config param use_gather_mosaic_kernel: False I0425 09:40:10.753536 135841449502528 pyconfig.py:471] Config param use_grpo: True I0425 09:40:10.753551 135841449502528 pyconfig.py:471] 
Config param use_indexer: False I0425 09:40:10.753568 135841449502528 pyconfig.py:471] Config param use_iota_embed: True I0425 09:40:10.753583 135841449502528 pyconfig.py:471] Config param use_jax_splash: False I0425 09:40:10.753599 135841449502528 pyconfig.py:471] Config param use_max_logit_estimate: -1 I0425 09:40:10.753613 135841449502528 pyconfig.py:471] Config param use_mrope: False I0425 09:40:10.753630 135841449502528 pyconfig.py:471] Config param use_multimodal: False I0425 09:40:10.753644 135841449502528 pyconfig.py:471] Config param use_pathways: True I0425 09:40:10.753659 135841449502528 pyconfig.py:471] Config param use_post_attn_norm: False I0425 09:40:10.753675 135841449502528 pyconfig.py:471] Config param use_post_ffw_norm: False I0425 09:40:10.753690 135841449502528 pyconfig.py:471] Config param use_qk_clip: False I0425 09:40:10.753706 135841449502528 pyconfig.py:471] Config param use_qk_norm: False I0425 09:40:10.753720 135841449502528 pyconfig.py:471] Config param use_qk_norm_in_gdn: True I0425 09:40:10.753736 135841449502528 pyconfig.py:471] Config param use_qwix_quantization: False I0425 09:40:10.753751 135841449502528 pyconfig.py:471] Config param use_ragged_attention: False I0425 09:40:10.753767 135841449502528 pyconfig.py:471] Config param use_random_routing: False I0425 09:40:10.753783 135841449502528 pyconfig.py:471] Config param use_replicator_service: False I0425 09:40:10.753800 135841449502528 pyconfig.py:471] Config param use_ring_of_experts: False I0425 09:40:10.753814 135841449502528 pyconfig.py:471] Config param use_sft: False I0425 09:40:10.753830 135841449502528 pyconfig.py:471] Config param use_splash_scheduler: False I0425 09:40:10.753846 135841449502528 pyconfig.py:471] Config param use_tokamax_gmm: False I0425 09:40:10.753862 135841449502528 pyconfig.py:471] Config param use_tokamax_splash: False I0425 09:40:10.753876 135841449502528 pyconfig.py:471] Config param use_truncation: True I0425 09:40:10.753892 135841449502528 
pyconfig.py:471] Config param use_tunix_gradient_accumulation: False I0425 09:40:10.753907 135841449502528 pyconfig.py:471] Config param use_untrainable_positional_embedding: False I0425 09:40:10.753924 135841449502528 pyconfig.py:471] Config param use_vertex_tensorboard: False I0425 09:40:10.753938 135841449502528 pyconfig.py:471] Config param using_pipeline_parallelism: False I0425 09:40:10.753954 135841449502528 pyconfig.py:471] Config param v_head_dim: 128 I0425 09:40:10.753973 135841449502528 pyconfig.py:471] Config param v_norm_with_scale: True I0425 09:40:10.753997 135841449502528 pyconfig.py:471] Config param value_proj: RematLocation.REMAT I0425 09:40:10.754022 135841449502528 pyconfig.py:471] Config param vertex_tensorboard_project: I0425 09:40:10.754048 135841449502528 pyconfig.py:471] Config param vertex_tensorboard_region: I0425 09:40:10.754075 135841449502528 pyconfig.py:471] Config param video_path: I0425 09:40:10.754132 135841449502528 pyconfig.py:471] Config param video_placeholder: <|video|> I0425 09:40:10.754163 135841449502528 pyconfig.py:471] Config param vision_output_dim_for_vit: 4096 I0425 09:40:10.754193 135841449502528 pyconfig.py:471] Config param vision_output_length: -1 I0425 09:40:10.754220 135841449502528 pyconfig.py:471] Config param vllm_additional_config: {} I0425 09:40:10.754248 135841449502528 pyconfig.py:471] Config param vllm_hf_config_path: I0425 09:40:10.754275 135841449502528 pyconfig.py:471] Config param vllm_hf_overrides: {} I0425 09:40:10.754302 135841449502528 pyconfig.py:471] Config param vocab_size: 32000 I0425 09:40:10.754329 135841449502528 pyconfig.py:471] Config param warmup_steps_fraction: 0.1 I0425 09:40:10.754356 135841449502528 pyconfig.py:471] Config param weight_dtype: float32 I0425 09:40:10.754397 135841449502528 pyconfig.py:471] Config param weight_quantization_calibration_method: absmax I0425 09:40:10.754425 135841449502528 pyconfig.py:471] Config param wi_tile_dlhs_batch_seq: 512 I0425 09:40:10.754450 
135841449502528 pyconfig.py:471] Config param wi_tile_dlhs_embed_dim: 1024 I0425 09:40:10.754475 135841449502528 pyconfig.py:471] Config param wi_tile_dlhs_mlp_dim: 1024 I0425 09:40:10.754501 135841449502528 pyconfig.py:471] Config param wi_tile_drhs_batch_seq: 512 I0425 09:40:10.754525 135841449502528 pyconfig.py:471] Config param wi_tile_drhs_embed_dim: 1024 I0425 09:40:10.754557 135841449502528 pyconfig.py:471] Config param wi_tile_drhs_mlp_dim: 1024 I0425 09:40:10.754578 135841449502528 pyconfig.py:471] Config param wi_tile_fwd_batch_seq: 512 I0425 09:40:10.754593 135841449502528 pyconfig.py:471] Config param wi_tile_fwd_embed_dim: 1024 I0425 09:40:10.754608 135841449502528 pyconfig.py:471] Config param wi_tile_fwd_mlp_dim: 1024 I0425 09:40:10.754623 135841449502528 pyconfig.py:471] Config param wo_tile_dlhs_batch_seq: 512 I0425 09:40:10.754639 135841449502528 pyconfig.py:471] Config param wo_tile_dlhs_embed_dim: 1024 I0425 09:40:10.754655 135841449502528 pyconfig.py:471] Config param wo_tile_dlhs_mlp_dim: 1024 I0425 09:40:10.754671 135841449502528 pyconfig.py:471] Config param wo_tile_drhs_batch_seq: 512 I0425 09:40:10.754685 135841449502528 pyconfig.py:471] Config param wo_tile_drhs_embed_dim: 1024 I0425 09:40:10.754701 135841449502528 pyconfig.py:471] Config param wo_tile_drhs_mlp_dim: 1024 I0425 09:40:10.754718 135841449502528 pyconfig.py:471] Config param wo_tile_fwd_batch_seq: 512 I0425 09:40:10.754733 135841449502528 pyconfig.py:471] Config param wo_tile_fwd_embed_dim: 1024 I0425 09:40:10.754747 135841449502528 pyconfig.py:471] Config param wo_tile_fwd_mlp_dim: 1024 I0425 09:40:10.754764 135841449502528 pyconfig.py:471] Config param wsd_decay_steps_fraction: 0.1 I0425 09:40:10.754781 135841449502528 pyconfig.py:471] Config param wsd_decay_style: WsdDecayStyle.LINEAR I0425 09:40:10.754800 135841449502528 pyconfig.py:471] Config param xprof_e2e_enable_fw_power_level_event: False I0425 09:40:10.754814 135841449502528 pyconfig.py:471] Config param 
xprof_e2e_enable_fw_thermal_event: False I0425 09:40:10.754830 135841449502528 pyconfig.py:471] Config param xprof_e2e_enable_fw_throttle_event: False I0425 09:40:10.754844 135841449502528 pyconfig.py:471] Config param xprof_tpu_power_trace_level: 0 I0425 09:40:10.754863 135841449502528 pyconfig.py:471] Config param z_loss_multiplier: 0.0 I0425 09:40:10.755196 135841449502528 tokenizer.py:245] Tokenizer path: meta-llama/Llama-2-7b-chat-hf I0425 09:40:10.755246 135841449502528 tokenizer.py:224] Loading HF tokenizer: meta-llama/Llama-2-7b-chat-hf I0425 09:40:10.929849 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK" I0425 09:40:11.038690 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK" I0425 09:40:11.145458 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK" I0425 09:40:11.256408 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK" I0425 09:40:11.366306 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found" I0425 09:40:11.470265 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false "HTTP/1.1 200 OK" I0425 09:40:11.577383 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.model "HTTP/1.1 302 Found" I0425 09:40:11.687134 135841449502528 _client.py:1025] HTTP Request: GET 
https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/xet-read-token/f5db02db724555f92da89c216ac04704f23d4590 "HTTP/1.1 200 OK" I0425 09:40:12.368352 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK" I0425 09:40:12.491213 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer.json "HTTP/1.1 200 OK" I0425 09:40:12.888143 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/added_tokens.json "HTTP/1.1 404 Not Found" I0425 09:40:12.997843 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK" I0425 09:40:13.108508 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/special_tokens_map.json "HTTP/1.1 200 OK" I0425 09:40:13.217005 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/chat_template.jinja "HTTP/1.1 404 Not Found" I0425 09:40:13.308080 135841449502528 _schedule.py:129] A polynomial schedule was set with a non-positive `transition_steps` value; this results in a constant schedule with value `init_value`. I0425 09:40:13.315258 135841449502528 maxtext_utils.py:1771] Num_devices: 32, shape (1, 4, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0425 09:40:13.315392 135841449502528 train_distill.py:582] Applying logical axis rules for model initialization and training... I0425 09:40:13.315463 135841449502528 train_distill.py:586] Loading Student from ... 
I0425 09:40:13.315491 135841449502528 train_distill.py:170] --- Student Configuration --- I0425 09:40:13.315524 135841449502528 train_distill.py:171] Model Name: gpt3-52k I0425 09:40:13.315546 135841449502528 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 09:40:13.315571 135841449502528 train_distill.py:175] Attention Heads: 2 Query, 2 KV I0425 09:40:13.315591 135841449502528 train_distill.py:176] Vocab Size: 32000 I0425 09:40:13.315609 135841449502528 train_distill.py:177] Checkpoint: I0425 09:40:13.315628 135841449502528 train_distill.py:451] Initializing model: gpt3-52k... I0425 09:40:14.967737 135841449502528 train_distill.py:600] Loading Teacher from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items... I0425 09:40:14.967845 135841449502528 train_distill.py:170] --- Teacher Configuration --- I0425 09:40:14.967874 135841449502528 train_distill.py:171] Model Name: gpt3-52k I0425 09:40:14.967900 135841449502528 train_distill.py:172] Dimensions: 1 Layers, 16 Emb Dim, 8 Head Dim I0425 09:40:14.967921 135841449502528 train_distill.py:175] Attention Heads: 2 Query, 2 KV I0425 09:40:14.967941 135841449502528 train_distill.py:176] Vocab Size: 32000 I0425 09:40:14.967960 135841449502528 train_distill.py:177] Checkpoint: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items I0425 09:40:14.967979 135841449502528 train_distill.py:451] Initializing model: gpt3-52k... 
I0425 09:40:16.059128 135841449502528 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 09:40:16.059284 135841449502528 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8b552340b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 09:40:16.059343 135841449502528 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.28 W0425 09:40:16.553310 135841449502528 checkpoint.py:202] Metadata file does not exist: gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items/_CHECKPOINT_METADATA I0425 09:40:17.083914 1974 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0425 09:40:18.084918 135841449502528 checkpointer.py:304] Restoring checkpoint from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. W0425 09:40:20.123017 135841449502528 transform_utils.py:230] The transformations API will eventually be replaced by an upgraded design. The current API will not be removed until this point, but it will no longer be actively worked on. 
I0425 09:40:20.123401 135841449502528 transform_utils.py:288] The following keys are not loaded from the original tree after applying specified transforms: params/params/decoder/to_nnx__rngs/aqt/count, params/params/decoder/to_nnx__rngs/aqt/key, params/params/decoder/to_nnx__rngs/dropout/count, params/params/decoder/to_nnx__rngs/dropout/key, params/params/decoder/to_nnx__rngs/params/count, params/params/decoder/to_nnx__rngs/params/key I0425 09:40:22.174038 135841449502528 checkpointer.py:318] Finished restoring checkpoint in 4.46 seconds from gs://lance-maxtext/pt_seed_ckpts/pt_seed_ckpts/pt_seed_ckpt_gpt352k_v32k_linen/checkpoints/4/items. I0425 09:40:22.916039 135841449502528 train_distill.py:626] Initializing Data Iterators via MaxText pipeline... I0425 09:40:22.979585 135841449502528 config.py:112] TensorFlow version 2.20.0 available. I0425 09:40:22.980074 135841449502528 config.py:125] JAX version 0.9.2 available. I0425 09:40:23.379253 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/main/README.md "HTTP/1.1 307 Temporary Redirect" I0425 09:40:23.387244 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK" I0425 09:40:23.395687 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/resolve-cache/datasets/OptimalScale/ClimbMix/6d467b96d8f26cbe7465e2d70e36191aa75867ac/README.md "HTTP/1.1 200 OK" I0425 09:40:23.503421 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/ClimbMix.py "HTTP/1.1 404 Not Found" I0425 09:40:23.804675 135841449502528 _client.py:1025] HTTP Request: HEAD https://s3.amazonaws.com/datasets.huggingface.co/datasets/datasets/OptimalScale/ClimbMix/OptimalScale/ClimbMix.py "HTTP/1.1 404 Not Found" I0425 09:40:23.916430 
135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/revision/6d467b96d8f26cbe7465e2d70e36191aa75867ac "HTTP/1.1 200 OK" I0425 09:40:24.022580 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/.huggingface.yaml "HTTP/1.1 404 Not Found" I0425 09:40:24.192391 135841449502528 _client.py:1025] HTTP Request: GET https://datasets-server.huggingface.co/info?dataset=OptimalScale/ClimbMix "HTTP/1.1 200 OK" I0425 09:40:24.304739 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac/data?recursive=true&expand=false "HTTP/1.1 404 Not Found" I0425 09:40:24.415592 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/datasets/OptimalScale/ClimbMix/tree/6d467b96d8f26cbe7465e2d70e36191aa75867ac?recursive=false&expand=false "HTTP/1.1 200 OK" I0425 09:40:24.550088 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/datasets/OptimalScale/ClimbMix/resolve/6d467b96d8f26cbe7465e2d70e36191aa75867ac/dataset_infos.json "HTTP/1.1 404 Not Found" I0425 09:40:24.706859 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json "HTTP/1.1 200 OK" I0425 09:40:24.816681 135841449502528 _client.py:1025] HTTP Request: HEAD https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tokenizer_config.json "HTTP/1.1 200 OK" I0425 09:40:24.924253 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main/additional_chat_templates?recursive=false&expand=false "HTTP/1.1 404 Not Found" I0425 09:40:25.027009 135841449502528 _client.py:1025] HTTP Request: GET https://huggingface.co/api/models/meta-llama/Llama-2-7b-chat-hf/tree/main?recursive=true&expand=false 
"HTTP/1.1 200 OK" E0425 09:40:25.130743 135841449502528 packing.py:209] PackAndBatchOperation is deprecated. Please use lazy_dataset.FirstFitPackIterDataset instead. I0425 09:40:25.130956 135841449502528 data_loader.py:408] Adding CopyNumPyArrayToSharedMemory MapTransform. I0425 09:40:25.133968 135841449502528 train_distill.py:396] Input Pipeline Checkpointing: DISABLED I0425 09:40:25.134034 135841449502528 train_distill.py:400] Reason: Iterator 'MultiHostDataLoadIterator' is not recognized as Grain (dataset_type='DatasetType.HF', has_save=False) I0425 09:40:25.134111 135841449502528 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 09:40:25.134192 135841449502528 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8b552340b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 09:40:25.134233 135841449502528 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 09:40:25.134265 135841449502528 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8b552340b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 09:40:25.134305 135841449502528 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b85a6db0410>, 
'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b859e3f2f00>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b727c0db860>}, handler_registry=None I0425 09:40:25.134498 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b85a6db0410>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 09:40:25.134548 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b859e3f2f00>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 09:40:25.134576 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b727c0db860>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 09:40:25.134600 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b85a6db2780>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 09:40:25.134627 135841449502528 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b85a6db0410>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b85a6db0410>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b859e3f2f00>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b859e3f2f00>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b727c0db860>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b727c0db860>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b85a6db2780>, ('metrics', <class 
'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b85a6db2780>}). I0425 09:40:25.135001 135841449502528 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b737ae84220> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 09:40:26.741923 135841449502528 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints I0425 09:40:27.189987 135841449502528 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), 
root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b727c0db830> I0425 09:40:27.190200 135841449502528 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 09:40:27.190279 135841449502528 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8b552340b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 09:40:27.190320 135841449502528 pytree_checkpoint_handler.py:577] save_device_host_concurrent_bytes=None I0425 09:40:27.190355 135841449502528 base_pytree_checkpoint_handler.py:411] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=False, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7b8b552340b0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 09:40:27.190393 135841449502528 checkpoint_manager.py:1983] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. 
I0425 09:40:27.190451 135841449502528 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135841449502528 count=1 at 0x7b85af709b40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b727c0db620>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b727c0db5f0>, _write_futures=[]) I0425 09:40:27.190829 135841449502528 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135841449502528 count=1 at 0x7b85af709b40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b727c0db620>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b727c0db5f0>, _write_futures=[]) I0425 09:40:27.190857 135841449502528 checkpoint.py:459] Closing _NonBlockingMetadataStore(enable_write=True, _write_lock=<locked _thread.RLock object owner=135841449502528 count=1 at 0x7b85af709b40>, _store_impl=<orbax.checkpoint._src.metadata.checkpoint._MetadataStoreImpl object at 0x7b727c0db620>, _single_thread_executor=<concurrent.futures.thread.ThreadPoolExecutor object at 0x7b727c0db5f0>, _write_futures=[]) I0425 09:40:27.190888 135841449502528 checkpoint_manager.py:702] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=None, item_handlers={'model_params': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0db7d0>, 'optimizer_state': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0dbf50>, 'custom_metadata': <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b857ed85190>, 'iter': <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b737a0fe4b0>}, handler_registry=None I0425 09:40:27.190995 135841449502528 composite_checkpoint_handler.py:237] Deferred 
registration for item: "model_params". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0db7d0>` for item "model_params" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 09:40:27.191031 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "optimizer_state". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0dbf50>` for item "optimizer_state" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 09:40:27.191055 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "custom_metadata". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b857ed85190>` for item "custom_metadata" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 09:40:27.191083 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "iter". Adding handler `<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b737a0fe4b0>` for item "iter" and save args `<class 'maxtext.common.checkpointing.GrainCheckpointSave'>` and restore args `<class 'maxtext.common.checkpointing.GrainCheckpointRestore'>` to `_handler_registry`. I0425 09:40:27.191119 135841449502528 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". 
Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b737a0fe720>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 09:40:27.191146 135841449502528 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0db7d0>, ('model_params', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0db7d0>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0dbf50>, ('optimizer_state', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7b727c0dbf50>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b857ed85190>, ('custom_metadata', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b857ed85190>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointSave'>): <maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b737a0fe4b0>, ('iter', <class 'maxtext.common.checkpointing.GrainCheckpointRestore'>): 
<maxtext.common.checkpointing.GrainCheckpointHandler object at 0x7b737a0fe4b0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b737a0fe720>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7b737a0fe720>}). I0425 09:40:27.191221 135841449502528 async_checkpointer.py:177] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7b737ae84360> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 09:40:27.900541 135841449502528 checkpoint_manager.py:1788] Found 0 checkpoint steps in gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints I0425 09:40:27.902654 135841449502528 checkpoint_manager.py:921] [process=6][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=2000, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_hns=False, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=None, 
preservation_policy=None, prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False), root_directory=gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123/pt_distill_nnx_xpk_feat_nnx_trainstate_and_training_loop_20260425_092123_07_distill_smoke/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7b857edc03e0> I0425 09:40:27.903056 135841449502528 train_distill.py:677] Starting Distillation Training... I0425 09:40:27.903176 135841449502528 peft_trainer.py:584] Training with mesh: Mesh('diloco': 1, 'data': 4, 'stage': 1, 'fsdp': 8, 'fsdp_transpose': 1, 'sequence': 1, 'context': 1, 'context_autoregressive': 1, 'tensor': 1, 'tensor_transpose': 1, 'tensor_sequence': 1, 'expert': 1, 'autoregressive': 1, axis_types=(Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto, Auto)) I0425 09:40:28.389654 135841449502528 peft_trainer.py:594] Compiled train_step cache size: 0 I0425 09:40:28.391305 135686623565568 grain_pool.py:367] Grain pool will use 1 processes. I0425 09:40:28.448391 135686623565568 grain_pool.py:440] Grain pool will start child processes. I0425 09:40:28.454276 135686623565568 grain_pool.py:448] Grain pool started all child processes. 2026-04-25 09:40:34.943890: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) `rope_parameters`'s factor field must be a float >= 1, got 40 `rope_parameters`'s beta_fast field must be a float, got 32 `rope_parameters`'s beta_slow field must be a float, got 1 DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 781, in <module> app.run(main) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run _run_main(main, args) File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main sys.exit(main(argv)) ^^^^^^^^^^ File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 777, in main train_distill(student_config, teacher_config, is_offline, global_config.offline_data_dir) File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 679, in train_distill trainer.train(train_iter, eval_iter) File "/usr/local/lib/python3.12/site-packages/tunix/sft/peft_trainer.py", line 652, in train train_example = sharding_utils.shard_input( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 58, in shard_input return jax.tree.map( ^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree.py", line 156, in map return tree_util.tree_map(f, tree, *rest, is_leaf=is_leaf) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in tree_map return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/tree_util.py", line 373, in <genexpr> return treedef.unflatten(f(*xs) for xs in zip(*all_leaves)) ^^^^^^ File "/usr/local/lib/python3.12/site-packages/tunix/sft/sharding_utils.py", line 59, in <lambda> lambda x: jax.make_array_from_process_local_data( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 985, in make_array_from_process_local_data out = [_array_from_process_local_data(data, s, shape) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 1047, in _array_from_process_local_data return make_array_from_callback(global_shape, sharding, cb) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/array.py", line 844, in make_array_from_callback per_device_values = api.device_put(per_device_values, devices) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/api.py", line 2732, in device_put out_flat = dispatch._batched_device_put_impl( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 602, in _batched_device_put_impl y = _device_put_impl(x, device=device, src=src, copy=cp, aval=aval) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 582, in _device_put_impl return _device_put_sharding_impl(x, aval, device, copy) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/site-packages/jax/_src/dispatch.py", line 512, in _device_put_sharding_impl raise ValueError( ValueError: When the second argument to `device_put` is a Device, the first argument must be a fully addressable array or a non-addressable array with a single device sharding. 
Got value with devices {TpuDevice(id=25, process_index=6, coords=(1,6,0), core_on_chip=0), TpuDevice(id=27, process_index=7, coords=(3,6,0), core_on_chip=0), TpuDevice(id=16, process_index=4, coords=(0,4,0), core_on_chip=0), TpuDevice(id=6, process_index=1, coords=(2,1,0), core_on_chip=0), TpuDevice(id=21, process_index=4, coords=(1,5,0), core_on_chip=0), TpuDevice(id=17, process_index=4, coords=(1,4,0), core_on_chip=0), TpuDevice(id=14, process_index=3, coords=(2,3,0), core_on_chip=0), TpuDevice(id=9, process_index=2, coords=(1,2,0), core_on_chip=0), TpuDevice(id=7, process_index=1, coords=(3,1,0), core_on_chip=0), TpuDevice(id=11, process_index=3, coords=(3,2,0), core_on_chip=0), TpuDevice(id=28, process_index=6, coords=(0,7,0), core_on_chip=0), TpuDevice(id=22, process_index=5, coords=(2,5,0), core_on_chip=0), TpuDevice(id=30, process_index=7, coords=(2,7,0), core_on_chip=0), TpuDevice(id=3, process_index=1, coords=(3,0,0), core_on_chip=0), TpuDevice(id=18, process_index=5, coords=(2,4,0), core_on_chip=0), TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(1,0,0), core_on_chip=0), TpuDevice(id=8, process_index=2, coords=(0,2,0), core_on_chip=0), TpuDevice(id=12, process_index=2, coords=(0,3,0), core_on_chip=0), TpuDevice(id=29, process_index=6, coords=(1,7,0), core_on_chip=0), TpuDevice(id=2, process_index=1, coords=(2,0,0), core_on_chip=0), TpuDevice(id=23, process_index=5, coords=(3,5,0), core_on_chip=0), TpuDevice(id=31, process_index=7, coords=(3,7,0), core_on_chip=0), TpuDevice(id=13, process_index=2, coords=(1,3,0), core_on_chip=0), TpuDevice(id=20, process_index=4, coords=(0,5,0), core_on_chip=0), TpuDevice(id=19, process_index=5, coords=(3,4,0), core_on_chip=0), TpuDevice(id=5, process_index=0, coords=(1,1,0), core_on_chip=0), TpuDevice(id=24, process_index=6, coords=(0,6,0), core_on_chip=0), TpuDevice(id=26, process_index=7, coords=(2,6,0), core_on_chip=0), TpuDevice(id=4, process_index=0, 
coords=(0,1,0), core_on_chip=0), TpuDevice(id=15, process_index=3, coords=(3,3,0), core_on_chip=0), TpuDevice(id=10, process_index=3, coords=(2,2,0), core_on_chip=0)} I0425 09:40:39.103536 135686623565568 grain_pool.py:542] Grain pool is exiting. I0425 09:40:39.103641 135686623565568 grain_pool.py:547] Shutting down multiprocessing system. I0425 09:40:40.796302 135686623565568 grain_pool.py:547] Shutting down multiprocessing system. /usr/local/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 15 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' XPK End: Sat Apr 25 09:40:50 UTC 2026 EXIT_CODE=1