XPK Start: Sat Apr 25 17:08:19 UTC 2026 2026-04-25 17:08:23.270536: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered WARNING: All log messages before absl::InitializeLog() is called are written to STDERR E0000 00:00:1777136903.283385 10 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered E0000 00:00:1777136903.287059 10 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered W0000 00:00:1777136903.298300 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once. W0000 00:00:1777136903.298320 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once. W0000 00:00:1777136903.298322 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once. W0000 00:00:1777136903.298324 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once. 2026-04-25 17:08:42.451619: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303) I0425 17:08:42.977398 134500511639360 max_utils.py:273] Attempting to initialize the jax distributed system... INFO:2026-04-25 17:08:52,019:jax._src.distributed:140: Starting JAX distributed service on [::]:8482 I0425 17:08:52.019111 134500511639360 distributed.py:140] Starting JAX distributed service on [::]:8482 INFO:2026-04-25 17:08:52,021:jax._src.distributed:157: Connecting to JAX distributed service on mt-13-scan-layers-false-aq2be-slice-job-0-0.mt-13-scan-layers-false-aq2be:8482 I0425 17:08:52.021416 134500511639360 distributed.py:157] Connecting to JAX distributed service on mt-13-scan-layers-false-aq2be-slice-job-0-0.mt-13-scan-layers-false-aq2be:8482 I0425 17:08:53.151230 134500511639360 max_utils.py:284] Jax distributed system initialized! I0425 17:09:00.127494 134500511639360 max_utils.py:800] System Information: Jax Version: 0.8.1 I0425 17:09:00.127603 134500511639360 max_utils.py:801] System Information: Jaxlib Version: 0.8.1 I0425 17:09:00.127645 134500511639360 max_utils.py:802] System Information: Jax Backend: PJRT C API TFRT TPU v6 lite Built on Nov 12 2025 14:16:36 (1762985796) cl/831091709 I0425 17:09:00.127703 134500511639360 train_utils.py:377] WARNING: Sequence packing is essentially ignored for synthetic data. Please use a real dataset to use sequence packing. I0425 17:09:00.770312 134500511639360 maxtext_utils.py:1631] Num_devices: 32, shape (1, 1, 1, 32, 1, 1, 1, 1, 1, 1, 1, 1, 1) I0425 17:09:01.947969 134500511639360 checkpointing.py:688] Setting up checkpoint logger... I0425 17:09:01.948099 134500511639360 checkpointing.py:234] Creating checkpoint manager with ocdbt=True and zarr3=True I0425 17:09:01.948145 134500511639360 pytree_checkpoint_handler.py:589] save_device_host_concurrent_bytes=None I0425 17:09:01.948392 134500511639360 base_pytree_checkpoint_handler.py:415] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7a5355d71100>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB) I0425 17:09:04.773167 134500511639360 checkpointing.py:266] Enabling policy for fixed interval checkpointing. I0425 17:09:04.773410 134500511639360 checkpoint_manager.py:709] [process=5][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=('items',), item_handlers={'items': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a4074db4b30>}, handler_registry=None I0425 17:09:04.773662 134500511639360 composite_checkpoint_handler.py:237] Deferred registration for item: "items". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a4074db4b30>` for item "items" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`. I0425 17:09:04.773712 134500511639360 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3ef0438170>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`. I0425 17:09:04.773750 134500511639360 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a4074db4b30>, ('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7a4074db4b30>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3ef0438170>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7a3ef0438170>}). I0425 17:09:04.774086 134500511639360 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.33 I0425 17:09:04.774166 134500511639360 async_checkpointer.py:177] [process=5][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7a3ef048c0e0> timeout: 600 secs and primary_host=0 for async checkpoint writes I0425 17:09:05.917082 134500511639360 checkpoint_manager.py:1818] Found 0 checkpoint steps in gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints I0425 17:09:06.316069 134500511639360 checkpoint_manager.py:929] [process=5][thread=MainThread] CheckpointManager created, primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=1, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=FixedIntervalPolicy(interval=10), preservation_policy=LatestN(n=None), prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False, lightweight_initialize=False), root_directory=gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7a3ef0174f80> I0425 17:09:06.316246 134500511639360 checkpointing.py:302] Checkpoint manager created! I0425 17:09:08.766388 134500511639360 checkpointing.py:578] checkpoint manager exists so trying to load this run's existing checkpoint I0425 17:09:08.766525 134500511639360 checkpointing.py:676] No existing checkpoints found, not restoring checkpoint. fsdp: 32 I0425 17:09:18.901700 134500511639360 maxtext_utils.py:1749] decoder/decoder_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.901826 134500511639360 maxtext_utils.py:1749] decoder/layers_0/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.901880 134500511639360 maxtext_utils.py:1749] decoder/layers_0/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.901924 134500511639360 maxtext_utils.py:1749] decoder/layers_0/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.901962 134500511639360 maxtext_utils.py:1749] decoder/layers_0/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902004 134500511639360 maxtext_utils.py:1749] decoder/layers_0/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902044 134500511639360 maxtext_utils.py:1749] decoder/layers_0/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902081 134500511639360 maxtext_utils.py:1749] decoder/layers_0/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.902117 134500511639360 maxtext_utils.py:1749] decoder/layers_0/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902150 134500511639360 maxtext_utils.py:1749] decoder/layers_0/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902184 134500511639360 maxtext_utils.py:1749] decoder/layers_1/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902233 134500511639360 maxtext_utils.py:1749] decoder/layers_1/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902274 134500511639360 maxtext_utils.py:1749] decoder/layers_1/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.902306 134500511639360 maxtext_utils.py:1749] decoder/layers_1/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902337 134500511639360 maxtext_utils.py:1749] decoder/layers_1/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902370 134500511639360 maxtext_utils.py:1749] decoder/layers_1/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902418 134500511639360 maxtext_utils.py:1749] decoder/layers_1/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.902472 134500511639360 maxtext_utils.py:1749] decoder/layers_1/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902508 134500511639360 maxtext_utils.py:1749] decoder/layers_1/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902541 134500511639360 maxtext_utils.py:1749] decoder/layers_10/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902572 134500511639360 maxtext_utils.py:1749] decoder/layers_10/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902604 134500511639360 maxtext_utils.py:1749] decoder/layers_10/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.902634 134500511639360 maxtext_utils.py:1749] decoder/layers_10/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902675 134500511639360 maxtext_utils.py:1749] decoder/layers_10/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902708 134500511639360 maxtext_utils.py:1749] decoder/layers_10/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902740 134500511639360 maxtext_utils.py:1749] decoder/layers_10/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.902785 134500511639360 maxtext_utils.py:1749] decoder/layers_10/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902822 134500511639360 maxtext_utils.py:1749] decoder/layers_10/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.902855 134500511639360 maxtext_utils.py:1749] decoder/layers_11/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902889 134500511639360 maxtext_utils.py:1749] decoder/layers_11/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.902921 134500511639360 maxtext_utils.py:1749] decoder/layers_11/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.902953 134500511639360 maxtext_utils.py:1749] decoder/layers_11/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.902988 134500511639360 maxtext_utils.py:1749] decoder/layers_11/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903022 134500511639360 maxtext_utils.py:1749] decoder/layers_11/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903056 134500511639360 maxtext_utils.py:1749] decoder/layers_11/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.903089 134500511639360 maxtext_utils.py:1749] decoder/layers_11/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903120 134500511639360 maxtext_utils.py:1749] decoder/layers_11/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903151 134500511639360 maxtext_utils.py:1749] decoder/layers_12/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903182 134500511639360 maxtext_utils.py:1749] decoder/layers_12/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903212 134500511639360 maxtext_utils.py:1749] decoder/layers_12/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.903244 134500511639360 maxtext_utils.py:1749] decoder/layers_12/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903273 134500511639360 maxtext_utils.py:1749] decoder/layers_12/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903303 134500511639360 maxtext_utils.py:1749] decoder/layers_12/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903334 134500511639360 maxtext_utils.py:1749] decoder/layers_12/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.903365 134500511639360 maxtext_utils.py:1749] decoder/layers_12/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903395 134500511639360 maxtext_utils.py:1749] decoder/layers_12/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903424 134500511639360 maxtext_utils.py:1749] decoder/layers_13/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903455 134500511639360 maxtext_utils.py:1749] decoder/layers_13/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903485 134500511639360 maxtext_utils.py:1749] decoder/layers_13/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.903513 134500511639360 maxtext_utils.py:1749] decoder/layers_13/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903540 134500511639360 maxtext_utils.py:1749] decoder/layers_13/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903570 134500511639360 maxtext_utils.py:1749] decoder/layers_13/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903600 134500511639360 maxtext_utils.py:1749] decoder/layers_13/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.903630 134500511639360 maxtext_utils.py:1749] decoder/layers_13/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903670 134500511639360 maxtext_utils.py:1749] decoder/layers_13/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903700 134500511639360 maxtext_utils.py:1749] decoder/layers_14/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903732 134500511639360 maxtext_utils.py:1749] decoder/layers_14/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.903762 134500511639360 maxtext_utils.py:1749] decoder/layers_14/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.903790 134500511639360 maxtext_utils.py:1749] decoder/layers_14/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903817 134500511639360 maxtext_utils.py:1749] decoder/layers_14/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.903847 134500511639360 maxtext_utils.py:1749] decoder/layers_14/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903879 134500511639360 maxtext_utils.py:1749] decoder/layers_14/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.903910 134500511639360 maxtext_utils.py:1749] decoder/layers_14/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903940 134500511639360 maxtext_utils.py:1749] decoder/layers_14/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.903969 134500511639360 maxtext_utils.py:1749] decoder/layers_15/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904003 134500511639360 maxtext_utils.py:1749] decoder/layers_15/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904032 134500511639360 maxtext_utils.py:1749] decoder/layers_15/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.904060 134500511639360 maxtext_utils.py:1749] decoder/layers_15/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904087 134500511639360 maxtext_utils.py:1749] decoder/layers_15/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904116 134500511639360 maxtext_utils.py:1749] decoder/layers_15/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904145 134500511639360 maxtext_utils.py:1749] decoder/layers_15/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.904175 134500511639360 maxtext_utils.py:1749] decoder/layers_15/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904205 134500511639360 maxtext_utils.py:1749] decoder/layers_15/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904235 134500511639360 maxtext_utils.py:1749] decoder/layers_2/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904265 134500511639360 maxtext_utils.py:1749] decoder/layers_2/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904295 134500511639360 maxtext_utils.py:1749] decoder/layers_2/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.904324 134500511639360 maxtext_utils.py:1749] decoder/layers_2/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904352 134500511639360 maxtext_utils.py:1749] decoder/layers_2/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904382 134500511639360 maxtext_utils.py:1749] decoder/layers_2/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904414 134500511639360 maxtext_utils.py:1749] decoder/layers_2/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.904446 134500511639360 maxtext_utils.py:1749] decoder/layers_2/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904476 134500511639360 maxtext_utils.py:1749] decoder/layers_2/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904506 134500511639360 maxtext_utils.py:1749] decoder/layers_3/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904536 134500511639360 maxtext_utils.py:1749] decoder/layers_3/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904568 134500511639360 maxtext_utils.py:1749] decoder/layers_3/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.904595 134500511639360 maxtext_utils.py:1749] decoder/layers_3/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904623 134500511639360 maxtext_utils.py:1749] decoder/layers_3/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904662 134500511639360 maxtext_utils.py:1749] decoder/layers_3/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904694 134500511639360 maxtext_utils.py:1749] decoder/layers_3/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.904725 134500511639360 maxtext_utils.py:1749] decoder/layers_3/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904754 134500511639360 maxtext_utils.py:1749] decoder/layers_3/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904783 134500511639360 maxtext_utils.py:1749] decoder/layers_4/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904813 134500511639360 maxtext_utils.py:1749] decoder/layers_4/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.904843 134500511639360 maxtext_utils.py:1749] decoder/layers_4/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.904872 134500511639360 maxtext_utils.py:1749] decoder/layers_4/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904900 134500511639360 maxtext_utils.py:1749] decoder/layers_4/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.904932 134500511639360 maxtext_utils.py:1749] decoder/layers_4/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.904963 134500511639360 maxtext_utils.py:1749] decoder/layers_4/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.904996 134500511639360 maxtext_utils.py:1749] decoder/layers_4/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905026 134500511639360 maxtext_utils.py:1749] decoder/layers_4/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905056 134500511639360 maxtext_utils.py:1749] decoder/layers_5/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905086 134500511639360 maxtext_utils.py:1749] decoder/layers_5/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905115 134500511639360 maxtext_utils.py:1749] decoder/layers_5/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.905143 134500511639360 maxtext_utils.py:1749] decoder/layers_5/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905170 134500511639360 maxtext_utils.py:1749] decoder/layers_5/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905199 134500511639360 maxtext_utils.py:1749] decoder/layers_5/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905230 134500511639360 maxtext_utils.py:1749] decoder/layers_5/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.905260 134500511639360 maxtext_utils.py:1749] decoder/layers_5/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905289 134500511639360 maxtext_utils.py:1749] decoder/layers_5/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905319 134500511639360 maxtext_utils.py:1749] decoder/layers_6/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905348 134500511639360 maxtext_utils.py:1749] decoder/layers_6/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905378 134500511639360 maxtext_utils.py:1749] decoder/layers_6/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.905405 134500511639360 maxtext_utils.py:1749] decoder/layers_6/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905433 134500511639360 maxtext_utils.py:1749] decoder/layers_6/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905463 134500511639360 maxtext_utils.py:1749] decoder/layers_6/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905493 134500511639360 maxtext_utils.py:1749] decoder/layers_6/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.905523 134500511639360 maxtext_utils.py:1749] decoder/layers_6/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905552 134500511639360 maxtext_utils.py:1749] decoder/layers_6/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905582 134500511639360 maxtext_utils.py:1749] decoder/layers_7/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905611 134500511639360 maxtext_utils.py:1749] decoder/layers_7/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905642 134500511639360 maxtext_utils.py:1749] decoder/layers_7/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.905680 134500511639360 maxtext_utils.py:1749] decoder/layers_7/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905709 134500511639360 maxtext_utils.py:1749] decoder/layers_7/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905739 134500511639360 maxtext_utils.py:1749] decoder/layers_7/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905770 134500511639360 maxtext_utils.py:1749] decoder/layers_7/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.905799 134500511639360 maxtext_utils.py:1749] decoder/layers_7/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905829 134500511639360 maxtext_utils.py:1749] decoder/layers_7/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.905858 134500511639360 maxtext_utils.py:1749] decoder/layers_8/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905888 134500511639360 maxtext_utils.py:1749] decoder/layers_8/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.905917 134500511639360 maxtext_utils.py:1749] decoder/layers_8/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.905945 134500511639360 maxtext_utils.py:1749] decoder/layers_8/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.905975 134500511639360 maxtext_utils.py:1749] decoder/layers_8/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.906007 134500511639360 maxtext_utils.py:1749] decoder/layers_8/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906037 134500511639360 maxtext_utils.py:1749] decoder/layers_8/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.906067 134500511639360 maxtext_utils.py:1749] decoder/layers_8/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906097 134500511639360 maxtext_utils.py:1749] decoder/layers_8/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906127 134500511639360 maxtext_utils.py:1749] decoder/layers_9/mlp/wi_0/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.906157 134500511639360 maxtext_utils.py:1749] decoder/layers_9/mlp/wi_1/kernel/value Shape: float32[2048,7168] Physical: ('fsdp', None) I0425 17:09:18.906187 134500511639360 maxtext_utils.py:1749] decoder/layers_9/mlp/wo/kernel/value Shape: float32[7168,2048] Physical: (None, 'fsdp') I0425 17:09:18.906215 134500511639360 maxtext_utils.py:1749] decoder/layers_9/post_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.906242 134500511639360 maxtext_utils.py:1749] decoder/layers_9/pre_self_attention_layer_norm/scale/value Shape: float32[2048] Physical: (None,) I0425 17:09:18.906272 134500511639360 maxtext_utils.py:1749] decoder/layers_9/self_attention/key/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906301 134500511639360 maxtext_utils.py:1749] decoder/layers_9/self_attention/out/kernel/value Shape: float32[16,128,2048] Physical: (None, None, 'fsdp') I0425 17:09:18.906332 134500511639360 maxtext_utils.py:1749] decoder/layers_9/self_attention/query/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906362 134500511639360 maxtext_utils.py:1749] decoder/layers_9/self_attention/value/kernel/value Shape: float32[2048,16,128] Physical: ('fsdp', None, None) I0425 17:09:18.906392 134500511639360 maxtext_utils.py:1749] decoder/logits_dense/kernel/value Shape: float32[2048,32000] Physical: ('fsdp', None) I0425 17:09:18.906421 134500511639360 maxtext_utils.py:1749] token_embedder/embedding/value Shape: float32[32000,2048] Physical: (None, 'fsdp') I0425 17:09:19.788718 134500511639360 nnx_decoders.py:770] nnx_decoders/y_in Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed'). I0425 17:09:19.788813 134500511639360 nnx_decoders.py:770] nnx_decoders/y_in Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None). I0425 17:09:19.793716 134500511639360 nnx_decoders.py:770] Unknown Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed'). I0425 17:09:19.793768 134500511639360 nnx_decoders.py:770] Unknown Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None). I0425 17:09:19.809457 134500511639360 attentions.py:1084] attentions/inputs_q Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed'). I0425 17:09:19.809520 134500511639360 attentions.py:1084] attentions/inputs_q Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None). I0425 17:09:19.824632 134500511639360 attentions.py:1085] attentions/inputs_kv Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed'). I0425 17:09:19.824705 134500511639360 attentions.py:1085] attentions/inputs_kv Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None). I0425 17:09:19.849978 134500511639360 attentions.py:1150] attentions/query Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim'). I0425 17:09:19.850044 134500511639360 attentions.py:1150] attentions/query Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None). I0425 17:09:19.865514 134500511639360 attentions.py:1151] attentions/key Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim'). I0425 17:09:19.865575 134500511639360 attentions.py:1151] attentions/key Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None). I0425 17:09:19.880788 134500511639360 attentions.py:1152] attentions/value Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim'). I0425 17:09:19.880855 134500511639360 attentions.py:1152] attentions/value Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None). I0425 17:09:19.910905 134500511639360 attentions.py:1193] attentions/out Logical: bfloat16[32,2048,16,128].................................... ('activation_batch', 'activation_attn_length', 'activation_heads', 'activation_kv'). I0425 17:09:19.910978 134500511639360 attentions.py:1193] attentions/out Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None). I0425 17:09:19.930453 134500511639360 linears.py:541] linears/x Logical: bfloat16[32,2048,7168]...................................... ('activation_batch', 'activation_length', 'activation_mlp'). I0425 17:09:19.930515 134500511639360 linears.py:541] linears/x Physical: bfloat16[32,2048,7168]...................................... ('fsdp', None, None). I0425 17:10:20.995496 134500511639360 max_utils.py:791] Total memory size: 1.8 GB, Output size: 0.4 GB, Temp size: 1.4 GB, Argument size: 0.4 GB, Host temp size: 0.0 GB. I0425 17:10:21.007893 134500511639360 metric_logger.py:289] number parameters: 1.104 billion I0425 17:11:24.222742 134500511639360 checkpointing.py:794] Waiting for step 0 to finish before checkpoint... I0425 17:11:24.497321 134500511639360 checkpointing.py:798] Waited 0.274564266204834 seconds for step 0 to finish before starting checkpointing. I0425 17:11:24.500702 134500511639360 checkpoint_manager.py:2013] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0425 17:11:24.502392 134500511639360 checkpoint_manager.py:1518] [process=5] Saving checkpoint at step 0 I0425 17:11:24.505234 134500511639360 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/0. I0425 17:11:25.596255 134500511639360 signaling_client.py:364] Using JaxDistributedSignalingClient I0425 17:11:25.597463 134500511639360 jax_array_handlers.py:358] Scheduling D2H of 648 prioritized jax.Array. I0425 17:11:25.597574 134500511639360 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0425 17:11:25.918831 134500511639360 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.329508s I0425 17:11:25.919000 134500511639360 base_pytree_checkpoint_handler.py:129] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 2.270 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.6796448230743408 s) (per-host) I0425 17:11:25.919052 134500511639360 base_pytree_checkpoint_handler.py:737] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.679704s (batch_requests_ready=0.314570s, total_serialization_initiated=0.365067s, others=0.000067s) I0425 17:11:25.919431 134500511639360 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.684189s (all_items=0.000024s, per_item={'items': '0.00002384'}, temp_paths=0.684165) I0425 17:11:25.920419 134375031174912 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started. I0425 17:11:25.920585 134500511639360 async_checkpointer.py:561] Finished blocking save. Time taken: 1.418122s. Continuing background save to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/0. I0425 17:11:25.937064 134500511639360 checkpoint_manager.py:1566] [process=5][thread=MainThread][step=0] Starting CheckpointManager Save Finalize thread=save_finalize I0425 17:11:25.937381 134373949040384 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save. I0425 17:11:25.937537 134500511639360 standard_logger.py:34] {'step': 0, 'event_type': 'save', 'directory': 'gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1777137084.5006847, 'wait_for_prev_duration_secs': 6.151199340820312e-05, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1777137084.5024314, 'checkpointer_blocking_duration_secs': 1.4182744026184082, 'get_old_steps_start_time': 1777137085.920727, 'get_old_steps_duration_secs': 2.5510787963867188e-05, 'checkpoint_manager_blocking_start_time': 1777137084.498825, 'checkpoint_manager_blocking_duration_secs': 1.4386727809906006} I0425 17:11:25.937873 134500511639360 checkpointing.py:409] Started an asynchronous checkpoint save for step 0 I0425 17:11:25.938112 134500511639360 max_utils.py:750] Memstats: After params initialized: I0425 17:11:25.938169 134500511639360 max_utils.py:756] Using (GB) 0.82 / 31.25 (2.624000%) on TPU_18(process=5,(2,4,0,0)) I0425 17:11:25.938209 134500511639360 max_utils.py:756] Using (GB) 0.82 / 31.25 (2.624000%) on TPU_19(process=5,(3,4,0,0)) I0425 17:11:25.938237 134500511639360 max_utils.py:756] Using (GB) 0.82 / 31.25 (2.624000%) on TPU_22(process=5,(2,5,0,0)) I0425 17:11:25.938261 134500511639360 max_utils.py:756] Using (GB) 0.82 / 31.25 (2.624000%) on TPU_23(process=5,(3,5,0,0)) I0425 17:11:26.278721 134500511639360 metric_logger.py:185] completed step: 0, seconds: 63.208, TFLOP/s/device: 0.215, Tokens/s/device: 32.401, total_weights: 65536, loss: 10.874 I0425 17:11:26.391413 134500511639360 metric_logger.py:185] completed step: 1, seconds: 2.040, TFLOP/s/device: 6.659, Tokens/s/device: 1003.724, total_weights: 65536, loss: 10.874 I0425 17:11:26.921403 134500511639360 metric_logger.py:185] completed step: 2, seconds: 0.060, TFLOP/s/device: 227.232, Tokens/s/device: 34250.928, total_weights: 65536, loss: 10.265 I0425 17:11:27.058056 134500511639360 metric_logger.py:185] completed step: 3, seconds: 0.398, TFLOP/s/device: 34.120, Tokens/s/device: 5142.886, total_weights: 65536, loss: 9.736 I0425 17:11:27.335574 134500511639360 metric_logger.py:185] completed step: 4, seconds: 0.219, TFLOP/s/device: 61.907, Tokens/s/device: 9331.316, total_weights: 65536, loss: 9.279 I0425 17:11:27.359514 134500511639360 metric_logger.py:185] completed step: 5, seconds: 0.136, TFLOP/s/device: 99.908, Tokens/s/device: 15059.156, total_weights: 65536, loss: 8.894 I0425 17:11:49.339257 134500511639360 metric_logger.py:185] completed step: 6, seconds: 0.277, TFLOP/s/device: 49.094, Tokens/s/device: 7399.967, total_weights: 65536, loss: 8.595 I0425 17:11:49.475724 134500511639360 metric_logger.py:185] completed step: 7, seconds: 21.858, TFLOP/s/device: 0.622, Tokens/s/device: 93.694, total_weights: 65536, loss: 8.387 I0425 17:11:49.612013 134500511639360 metric_logger.py:185] completed step: 8, seconds: 0.146, TFLOP/s/device: 92.819, Tokens/s/device: 13990.696, total_weights: 65536, loss: 8.258 I0425 17:11:49.756039 134500511639360 checkpointing.py:794] Waiting for step 9 to finish before checkpoint... I0425 17:11:49.760529 134500511639360 checkpointing.py:798] Waited 0.004498481750488281 seconds for step 9 to finish before starting checkpointing. I0425 17:11:49.763728 134500511639360 checkpoint_manager.py:2024] [process=5][thread=MainThread][step=0][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. I0425 17:11:50.310906 2770 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com I0425 17:11:52.891487 134374443947776 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 648 array_metadata.ArrayMetadata to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/0/items/array_metadatas/process_5 I0425 17:12:21.075039 134375031174912 base_pytree_checkpoint_handler.py:129] [process=5] /jax/checkpoint/write/gbytes_per_sec: 28.290 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 55.83561849594116 s) (per-host) I0425 17:12:21.075155 134375031174912 async_checkpointer.py:90] [process=5][thread=async_save] 3 Handler Commit operations completed. Time taken: 55.154627s. I0425 17:12:29.567993 134375031174912 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 63.647442s. I0425 17:12:29.568280 134373949040384 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save. I0425 17:12:29.568418 134373949040384 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save. I0425 17:12:29.568471 134373949040384 checkpoint_manager.py:2133] [process=5][thread=save_finalize][step=0] CheckpointManager Save Finalize is syncing with other hosts... I0425 17:12:29.570704 134373949040384 checkpoint_manager.py:2142] [process=5][thread=save_finalize][step=0] CheckpointManager Save Finalize is done on all hosts. I0425 17:12:29.570893 134500511639360 checkpoint_manager.py:2036] [process=5][thread=MainThread][step=0][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=0. W0425 17:12:29.571044 134500511639360 checkpoint_manager.py:1458] Waiting for previous save to complete took 39.807321 seconds. If this number is high, consider checkpointing less frequently. I0425 17:12:29.573225 134500511639360 checkpoint_manager.py:1518] [process=5] Saving checkpoint at step 9 I0425 17:12:29.576896 134500511639360 async_checkpointer.py:452] [process=5] Started async saving checkpoint to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/9. I0425 17:12:30.178827 134500511639360 jax_array_handlers.py:358] Scheduling D2H of 648 prioritized jax.Array. I0425 17:12:30.178960 134500511639360 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False I0425 17:12:30.353849 134500511639360 base_pytree_checkpoint_handler.py:153] [process=5][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.183115s I0425 17:12:30.354018 134500511639360 base_pytree_checkpoint_handler.py:129] [process=5] /jax/checkpoint/write/blocking_gbytes_per_sec: 3.217 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.4795680046081543 s) (per-host) I0425 17:12:30.354073 134500511639360 base_pytree_checkpoint_handler.py:737] [process=5][thread=MainThread] Initiated Pytree async_save. Time taken: 0.479630s (batch_requests_ready=0.262457s, total_serialization_initiated=0.217103s, others=0.000070s) I0425 17:12:30.354486 134500511639360 composite_checkpoint_handler.py:715] [process=5][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.484307s (all_items=0.000016s, per_item={'items': '0.00001597'}, temp_paths=0.484291) I0425 17:12:30.355305 134374997604096 async_checkpointer.py:79] [process=5][thread=async_save] Background save thread started. I0425 17:12:30.355456 134500511639360 async_checkpointer.py:561] Finished blocking save. Time taken: 0.782166s. Continuing background save to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/9. I0425 17:12:30.387723 134500511639360 checkpoint_manager.py:1566] [process=5][thread=MainThread][step=9] Starting CheckpointManager Save Finalize thread=save_finalize I0425 17:12:30.387991 134373949040384 async_checkpointer.py:265] [process=5][thread=save_finalize] Waiting for background save thread=async_save. I0425 17:12:30.388163 134500511639360 standard_logger.py:34] {'step': 9, 'event_type': 'save', 'directory': 'gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1777137109.7637, 'wait_for_prev_duration_secs': 39.8073205947876, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1777137149.573266, 'checkpointer_blocking_duration_secs': 0.7822649478912354, 'get_old_steps_start_time': 1777137150.355551, 'get_old_steps_duration_secs': 2.193450927734375e-05, 'checkpoint_manager_blocking_start_time': 1777137109.7616744, 'checkpoint_manager_blocking_duration_secs': 40.626452922821045} I0425 17:12:30.388412 134500511639360 checkpointing.py:409] Started an asynchronous checkpoint save for step 9 I0425 17:12:30.388466 134500511639360 checkpoint_manager.py:2024] [process=5][thread=MainThread][step=9][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete. I0425 17:12:35.891796 134375005996800 array_metadata_store.py:203] [process=5][thread=array_type_handler] Wrote 648 array_metadata.ArrayMetadata to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_set_defaults_true_20260425_150412/nnx_xpk_feat_nnx_set_defaults_true_20260425_150412_13_scan_layers_false/checkpoints/9/items/array_metadatas/process_5 I0425 17:13:12.331993 134374997604096 base_pytree_checkpoint_handler.py:129] [process=5] /jax/checkpoint/write/gbytes_per_sec: 37.204 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 42.4574990272522 s) (per-host) I0425 17:13:12.332114 134374997604096 async_checkpointer.py:90] [process=5][thread=async_save] 3 Handler Commit operations completed. Time taken: 41.976696s. I0425 17:13:24.107081 134374997604096 async_checkpointer.py:144] [process=5][thread=async_save] Background save thread done. Time taken: 53.751648s. I0425 17:13:24.107396 134373949040384 async_checkpointer.py:273] [process=5][thread=save_finalize] Done with waiting for background save thread=async_save. I0425 17:13:24.107517 134373949040384 async_checkpointer.py:283] [process=5][thread=save_finalize] No errors found in background save thread=async_save. I0425 17:13:24.107564 134373949040384 checkpoint_manager.py:2133] [process=5][thread=save_finalize][step=9] CheckpointManager Save Finalize is syncing with other hosts... I0425 17:13:24.110127 134373949040384 checkpoint_manager.py:2142] [process=5][thread=save_finalize][step=9] CheckpointManager Save Finalize is done on all hosts. I0425 17:13:24.110315 134500511639360 checkpoint_manager.py:2036] [process=5][thread=MainThread][step=9][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=9. I0425 17:13:24.110671 134500511639360 checkpoint_manager.py:2013] [process=5][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning. I0425 17:13:24.111392 134500511639360 metric_logger.py:185] completed step: 9, seconds: 0.136, TFLOP/s/device: 100.015, Tokens/s/device: 15075.340, total_weights: 65536, loss: 8.182 Per train step: Total TFLOPs: 13.59 split as 93.93% learnable weight flops and 6.07% attention flops XPK End: Sat Apr 25 17:13:36 UTC 2026 EXIT_CODE=0