Log Summary

XPK Start: Thu Apr 16 23:02:23 UTC 2026
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2026-04-16 23:02:47.377170: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0416 23:02:47.909306 135235630511936 max_utils.py:273] Attempting to initialize the jax distributed system...
I0416 23:02:56.949992 135235630511936 distributed.py:149] Starting JAX distributed service on [::]:8482
I0416 23:02:56.952187 135235630511936 distributed.py:172] Connecting to JAX distributed service on mt-13-scan-layers-false-48621-slice-job-0-0.mt-13-scan-layers-false-48621:8482
I0416 23:02:58.123687 135235630511936 max_utils.py:284] Jax distributed system initialized!
I0416 23:03:04.541820 135235630511936 max_utils.py:800] System Information: Jax Version: 0.9.2
I0416 23:03:04.541944 135235630511936 max_utils.py:801] System Information: Jaxlib Version: 0.9.2
I0416 23:03:04.541983 135235630511936 max_utils.py:802] System Information: Jax Backend: PJRT C API
TFRT TPU v6 lite
Built on Mar 4 2026 11:32:08 (1772652728) cl/878335365
I0416 23:03:04.542014 135235630511936 train_utils.py:377] WARNING: Sequence packing is essentially ignored for synthetic data. Please use a real dataset to use sequence packing.
I0416 23:03:05.208220 135235630511936 maxtext_utils.py:1687] Num_devices: 32, shape (1, 1, 1, 32, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0416 23:03:06.019344 135235630511936 checkpointing.py:688] Setting up checkpoint logger...
I0416 23:03:06.019461 135235630511936 checkpointing.py:234] Creating checkpoint manager with ocdbt=True and zarr3=True
I0416 23:03:06.019506 135235630511936 pytree_checkpoint_handler.py:592] save_device_host_concurrent_bytes=None
I0416 23:03:06.019798 135235630511936 base_pytree_checkpoint_handler.py:441] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7afe48150080>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0416 23:03:09.355129 135235630511936 checkpointing.py:266] Enabling policy for fixed interval checkpointing.
I0416 23:03:09.355351 135235630511936 checkpoint_manager.py:708] [process=0][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=('items',), item_handlers={'items': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7aea181605c0>}, handler_registry=None
I0416 23:03:09.355693 135235630511936 composite_checkpoint_handler.py:237] Deferred registration for item: "items". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7aea181605c0>` for item "items" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0416 23:03:09.355760 135235630511936 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7aea383820c0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0416 23:03:09.355798 135235630511936 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7aea181605c0>, ('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7aea181605c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7aea383820c0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7aea383820c0>}).
I0416 23:03:09.356224 135235630511936 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.34
I0416 23:03:09.356313 135235630511936 async_checkpointer.py:192] [process=0][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7aea38539800> timeout: 1200 secs and primary_host=0 for async checkpoint writes
I0416 23:03:10.033010 135235630511936 checkpoint_manager.py:1812] Found 0 checkpoint steps in gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints
I0416 23:03:10.471116 135235630511936 checkpoint_manager.py:929] [process=0][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=1, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=FixedIntervalPolicy(interval=10), preservation_policy=LatestN(n=None), prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False, lightweight_initialize=False), root_directory=gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7aea18113e60>
I0416 23:03:10.471287 135235630511936 checkpointing.py:302] Checkpoint manager created!
I0416 23:03:12.598423 135235630511936 checkpointing.py:578] checkpoint manager exists so trying to load this run's existing checkpoint
I0416 23:03:12.598558 135235630511936 checkpointing.py:676] No existing checkpoints found, not restoring checkpoint.
fsdp: 32
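The `Physical:` annotations that follow map each weight dimension either to the `fsdp` mesh axis (size 32 in this run, per the mesh shape logged above) or to `None` (replicated). A dimension mapped to `fsdp` is split 32 ways across devices. A minimal Python sketch of that arithmetic (the helper name and the `MESH_AXIS_SIZES` mapping are illustrative, not MaxText code):

```python
# Sketch (assumed helper, not part of MaxText): derive the per-device shard
# shape from a global array shape and its physical sharding spec, given the
# mesh axis sizes logged above (fsdp: 32).
MESH_AXIS_SIZES = {"fsdp": 32}  # every other mesh axis has size 1 in this run

def shard_shape(global_shape, physical_spec):
    """Divide each dimension by the size of the mesh axis it is mapped to.

    `physical_spec` mirrors the 'Physical:' lines below, e.g. ('fsdp', None).
    """
    out = []
    for dim, axis in zip(global_shape, physical_spec):
        size = MESH_AXIS_SIZES.get(axis, 1)  # None -> replicated, divisor 1
        assert dim % size == 0, f"dim {dim} not divisible by {axis}={size}"
        out.append(dim // size)
    return tuple(out)

# An MLP wi_0 kernel float32[2048,7168] sharded ('fsdp', None) becomes a
# [64, 7168] shard on each of the 32 devices:
print(shard_shape((2048, 7168), ("fsdp", None)))  # (64, 7168)
print(shard_shape((7168, 2048), (None, "fsdp")))  # (7168, 64)
```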
I0416 23:03:20.509187 135235630511936 maxtext_utils.py:1805]  decoder/decoder_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.509305 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509357 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509400 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.509437 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.509473 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.509508 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509543 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.509577 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509609 135235630511936 maxtext_utils.py:1805]  decoder/layers/0/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509641 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509671 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509701 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.509730 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.509758 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.509788 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509818 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.509865 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509899 135235630511936 maxtext_utils.py:1805]  decoder/layers/1/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.509932 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509962 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.509993 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.510021 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510050 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510080 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510112 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.510142 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510171 135235630511936 maxtext_utils.py:1805]  decoder/layers/2/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510208 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510240 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510269 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.510298 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510328 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510358 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510390 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.510420 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510451 135235630511936 maxtext_utils.py:1805]  decoder/layers/3/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510480 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510509 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510537 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.510565 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510592 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510621 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510651 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.510680 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510710 135235630511936 maxtext_utils.py:1805]  decoder/layers/4/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510742 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510772 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.510801 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.510828 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510894 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.510928 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.510959 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.510989 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511018 135235630511936 maxtext_utils.py:1805]  decoder/layers/5/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511047 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511075 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511104 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.511131 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511158 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511187 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511215 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.511245 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511276 135235630511936 maxtext_utils.py:1805]  decoder/layers/6/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511305 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511334 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511363 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.511391 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511420 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511449 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511479 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.511508 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511538 135235630511936 maxtext_utils.py:1805]  decoder/layers/7/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511567 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511596 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511625 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.511652 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511679 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511709 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511738 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.511767 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511796 135235630511936 maxtext_utils.py:1805]  decoder/layers/8/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.511824 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511869 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.511899 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.511928 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511955 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.511983 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512012 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.512040 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512069 135235630511936 maxtext_utils.py:1805]  decoder/layers/9/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512097 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512125 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512153 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.512181 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512209 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512238 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512267 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.512295 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512323 135235630511936 maxtext_utils.py:1805]  decoder/layers/10/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512351 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512380 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512408 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.512434 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512460 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512489 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512517 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.512546 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512574 135235630511936 maxtext_utils.py:1805]  decoder/layers/11/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512601 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512630 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512658 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.512685 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512712 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512740 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512768 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.512796 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512824 135235630511936 maxtext_utils.py:1805]  decoder/layers/12/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.512865 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512895 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.512924 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.512951 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.512977 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.513007 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513035 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.513064 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513092 135235630511936 maxtext_utils.py:1805]  decoder/layers/13/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513121 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.513150 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.513179 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.513206 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.513232 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.513261 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513289 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.513318 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513346 135235630511936 maxtext_utils.py:1805]  decoder/layers/14/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513374 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/mlp/wi_0/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.513402 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/mlp/wi_1/kernel/value
    Shape:     float32[2048,7168]
    Physical:  ('fsdp', None)
I0416 23:03:20.513430 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/mlp/wo/kernel/value
    Shape:     float32[7168,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:20.513457 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/post_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.513484 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/pre_self_attention_layer_norm/scale/value
    Shape:     float32[2048]
    Physical:  (None,)
I0416 23:03:20.513512 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/self_attention/key/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513540 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/self_attention/out/kernel/value
    Shape:     float32[16,128,2048]
    Physical:  (None, None, 'fsdp')
I0416 23:03:20.513567 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/self_attention/query/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513596 135235630511936 maxtext_utils.py:1805]  decoder/layers/15/self_attention/value/kernel/value
    Shape:     float32[2048,16,128]
    Physical:  ('fsdp', None, None)
I0416 23:03:20.513624 135235630511936 maxtext_utils.py:1805]  decoder/logits_dense/kernel/value
    Shape:     float32[2048,32000]
    Physical:  ('fsdp', None)
I0416 23:03:20.513652 135235630511936 maxtext_utils.py:1805]  token_embedder/embedding/value
    Shape:     float32[32000,2048]
    Physical:  (None, 'fsdp')
I0416 23:03:21.004078 135235630511936 nnx_decoders.py:1023] nnx_decoders/y_in Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed').
I0416 23:03:21.004169 135235630511936 nnx_decoders.py:1023] nnx_decoders/y_in Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0416 23:03:21.008731 135235630511936 nnx_decoders.py:1023] Unknown Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed').
I0416 23:03:21.008786 135235630511936 nnx_decoders.py:1023] Unknown Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0416 23:03:21.024582 135235630511936 attentions.py:1088] attentions/inputs_q Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed').
I0416 23:03:21.024639 135235630511936 attentions.py:1088] attentions/inputs_q Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0416 23:03:21.039936 135235630511936 attentions.py:1089] attentions/inputs_kv Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed').
I0416 23:03:21.039988 135235630511936 attentions.py:1089] attentions/inputs_kv Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0416 23:03:21.062225 135235630511936 attentions.py:1154] attentions/query Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0416 23:03:21.062290 135235630511936 attentions.py:1154] attentions/query Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0416 23:03:21.077561 135235630511936 attentions.py:1155] attentions/key Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0416 23:03:21.077622 135235630511936 attentions.py:1155] attentions/key Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0416 23:03:21.092879 135235630511936 attentions.py:1156] attentions/value Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0416 23:03:21.092936 135235630511936 attentions.py:1156] attentions/value Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0416 23:03:21.119728 135235630511936 attentions.py:1197] attentions/out Logical: bfloat16[32,2048,16,128].................................... ('activation_batch', 'activation_attn_length', 'activation_heads', 'activation_kv').
I0416 23:03:21.119793 135235630511936 attentions.py:1197] attentions/out Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0416 23:03:21.138918 135235630511936 linears.py:525] linears/x Logical: bfloat16[32,2048,7168]...................................... ('activation_batch', 'activation_length', 'activation_mlp').
I0416 23:03:21.138987 135235630511936 linears.py:525] linears/x Physical: bfloat16[32,2048,7168]...................................... ('fsdp', None, None).
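Each Logical/Physical pair above is produced by looking every logical axis name up in a sharding-rules table and substituting the matching mesh axis, with unmatched names left replicated (`None`). A minimal sketch of that lookup, with a hypothetical rules table inferred from the pairs in this log (the authoritative mapping is MaxText's `logical_axis_rules` config, not reproduced here):

```python
# Hypothetical rules table inferred from the Logical/Physical pairs above.
# With a (1,1,1,32,...) mesh, only the 'fsdp' axis has more than one device,
# so only batch-like logical axes end up sharded.
SHARDING_RULES = {
    "activation_batch": "fsdp",
    "activation_kv_batch": "fsdp",
}

def to_physical(logical_axes):
    """Map logical axis names to mesh axes; unmatched axes stay replicated (None)."""
    return tuple(SHARDING_RULES.get(name) for name in logical_axes)

print(to_physical(("activation_batch", "activation_norm_length", "activation_embed")))
# -> ('fsdp', None, None), matching the nnx_decoders/y_in lines above
```

This is why every activation in the listing is sharded only on its leading (batch) dimension: the mesh shape `(1, 1, 1, 32, ...)` logged earlier gives all non-`fsdp` mesh axes size 1.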
I0416 23:04:20.677405 135235630511936 max_utils.py:791] Total memory size: 1.8 GB, Output size: 0.4 GB, Temp size: 1.4 GB, Argument size: 0.4 GB, Host temp size: 0.0 GB.
I0416 23:04:20.695289 135235630511936 metric_logger.py:289] number parameters: 1.104 billion
I0416 23:05:22.148575 135235630511936 checkpointing.py:794] Waiting for step 0 to finish before checkpoint...
I0416 23:05:22.370134 135235630511936 checkpointing.py:798] Waited 0.22154498100280762 seconds for step 0 to finish before starting checkpointing.
I0416 23:05:22.373887 135235630511936 checkpoint_manager.py:2009] [process=0][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0416 23:05:22.375787 135235630511936 checkpoint_manager.py:1512] [process=0] Saving checkpoint at step 0
I0416 23:05:22.377279 135235630511936 event_tracking.py:70] [process=0] [async] Started save checkpoint @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0.
I0416 23:05:22.719941 135235630511936 signaling_client.py:364] Using JaxDistributedSignalingClient
I0416 23:05:22.801032 135235630511936 jax_array_handlers.py:360] Scheduling D2H of 648 prioritized jax.Array.
I0416 23:05:22.801198 135235630511936 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0416 23:05:23.002231 135106584901376 atomicity.py:140] Creating tmp directory gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0
I0416 23:05:23.132739 135235630511936 base_pytree_checkpoint_handler.py:154] [process=0][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.337678s
I0416 23:05:23.133570 135235630511936 base_pytree_checkpoint_handler.py:130] [process=0] /jax/orbax/write/blocking_gbytes_per_sec: 3.802 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.40572214126586914 s) (per-host)
I0416 23:05:23.133643 135235630511936 base_pytree_checkpoint_handler.py:768] [process=0][thread=MainThread] Initiated Pytree async_save. Time taken: 0.405815s (batch_requests_ready=0.026392s, total_serialization_initiated=0.378687s, others=0.000737s)
I0416 23:05:23.133805 135235630511936 composite_checkpoint_handler.py:715] [process=0][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.410364s (all_items=0.000017s, per_item={'items': '0.00001717'}, temp_paths=0.410347)
I0416 23:05:23.134621 135235630511936 event_tracking.py:125] [process=0] [async] Finished blocking save in 0.76 seconds. Continuing save @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0.
I0416 23:05:23.134939 135106534545152 async_checkpointer.py:76] [process=0][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-16 23:25:23.134905
I0416 23:05:23.142973 135235630511936 checkpoint_manager.py:1560] [process=0][thread=MainThread][step=0] Starting CheckpointManager Save Finalize thread=save_finalize
I0416 23:05:23.143364 135107066214144 async_checkpointer.py:280] [process=0][thread=save_finalize] Waiting for background save thread=async_save.
I0416 23:05:23.143535 135235630511936 standard_logger.py:34] {'step': 0, 'event_type': 'save', 'directory': 'gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776380722.373868, 'wait_for_prev_duration_secs': 6.198883056640625e-05, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776380722.3758276, 'checkpointer_blocking_duration_secs': 0.7592573165893555, 'get_old_steps_start_time': 1776380723.1351101, 'get_old_steps_duration_secs': 2.8133392333984375e-05, 'checkpoint_manager_blocking_start_time': 1776380722.3714352, 'checkpoint_manager_blocking_duration_secs': 0.772045373916626}
I0416 23:05:23.143767 135235630511936 checkpointing.py:409] Started an asynchronous checkpoint save for step 0
I0416 23:05:23.143999 135235630511936 max_utils.py:750] 
Memstats: After params initialized:
I0416 23:05:23.144055 135235630511936 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_0(process=0,(0,0,0,0))
I0416 23:05:23.144088 135235630511936 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_1(process=0,(1,0,0,0))
I0416 23:05:23.144116 135235630511936 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_4(process=0,(0,1,0,0))
I0416 23:05:23.144140 135235630511936 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_5(process=0,(1,1,0,0))
I0416 23:05:23.551164 135235630511936 metric_logger.py:185] completed step: 0, seconds: 60.910, TFLOP/s/device: 0.223, Tokens/s/device: 33.623, total_weights: 65536, loss: 10.874
I0416 23:05:23.554070 135235630511936 metric_logger.py:269] To see full metrics 'tensorboard --logdir=gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/tensorboard/'
I0416 23:05:24.090750 135235630511936 metric_logger.py:185] completed step: 1, seconds: 1.380, TFLOP/s/device: 9.846, Tokens/s/device: 1484.161, total_weights: 65536, loss: 10.874
I0416 23:05:24.188531 135235630511936 metric_logger.py:185] completed step: 2, seconds: 0.547, TFLOP/s/device: 24.842, Tokens/s/device: 3744.517, total_weights: 65536, loss: 10.265
I0416 23:05:24.324078 135235630511936 metric_logger.py:185] completed step: 3, seconds: 0.039, TFLOP/s/device: 345.940, Tokens/s/device: 52143.803, total_weights: 65536, loss: 9.736
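The per-step metrics above are simple ratios of a fixed amount of work to the measured step time, which is why the slow compile-inclusive step 0 and the fast steady-state steps report such different rates. A rough check against step 1 (1.380 s), taking `total_weights = 65536` tokens per global step over 32 devices and the 13.59 TFLOPs/step/device figure from the summary at the end of this log:

```python
# Back-of-the-envelope check of the step-1 metrics.
tokens_per_step = 65536              # total_weights in the log
num_devices = 32
step_seconds = 1.380                 # step 1
tflops_per_step_per_device = 13.59   # "Per train step: Total TFLOPs" (end of log)

tokens_per_sec_per_device = tokens_per_step / num_devices / step_seconds
tflops_per_sec_per_device = tflops_per_step_per_device / step_seconds
print(f"{tokens_per_sec_per_device:.1f} tokens/s/device")  # ~1484, as logged
print(f"{tflops_per_sec_per_device:.2f} TFLOP/s/device")   # ~9.85, vs. logged 9.846
```

The small residuals come from the rounded step time and TFLOPs figures; the unrounded internal values reproduce the logged 1484.161 and 9.846 exactly.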
I0416 23:05:25.357600 135106568115968 atomicity.py:140] Creating tmp directory gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/items
I0416 23:05:25.490752 135106029696768 checkpoint.py:188] Wrote Metadata={'item_handlers': None, 'metrics': {}, 'performance_metrics': {}, 'init_timestamp_nsecs': 1776380725080346923, 'commit_timestamp_nsecs': None, 'custom_metadata': {}}, json={"item_handlers": null, "metrics": {}, "performance_metrics": {}, "init_timestamp_nsecs": 1776380725080346923, "commit_timestamp_nsecs": null, "custom_metadata": {}} to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/_CHECKPOINT_METADATA
I0416 23:05:26.963480    2915 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0416 23:05:29.468484 135106551330560 array_metadata_store.py:203] [process=0][thread=array_type_handler] Wrote 648 array_metadata.ArrayMetadata to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/items/array_metadatas/process_0
I0416 23:05:48.060690 135235630511936 metric_logger.py:185] completed step: 4, seconds: 0.099, TFLOP/s/device: 137.743, Tokens/s/device: 20762.158, total_weights: 65536, loss: 9.279
I0416 23:05:48.087941 135235630511936 metric_logger.py:185] completed step: 5, seconds: 0.136, TFLOP/s/device: 99.977, Tokens/s/device: 15069.683, total_weights: 65536, loss: 8.894
I0416 23:05:48.207321 135235630511936 metric_logger.py:185] completed step: 6, seconds: 23.734, TFLOP/s/device: 0.572, Tokens/s/device: 86.291, total_weights: 65536, loss: 8.595
I0416 23:05:48.343514 135235630511936 metric_logger.py:185] completed step: 7, seconds: 0.022, TFLOP/s/device: 604.598, Tokens/s/device: 91131.580, total_weights: 65536, loss: 8.387
I0416 23:05:48.478937 135235630511936 metric_logger.py:185] completed step: 8, seconds: 0.123, TFLOP/s/device: 110.836, Tokens/s/device: 16706.366, total_weights: 65536, loss: 8.258
I0416 23:05:48.622285 135235630511936 checkpointing.py:794] Waiting for step 9 to finish before checkpoint...
I0416 23:05:48.626723 135235630511936 checkpointing.py:798] Waited 0.00444483757019043 seconds for step 9 to finish before starting checkpointing.
I0416 23:05:48.629773 135235630511936 checkpoint_manager.py:2020] [process=0][thread=MainThread][step=0][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0416 23:06:00.963037 135106542937856 base_pytree_checkpoint_handler.py:1282] [process=0][thread=write_metadata_after_commits] Commit + Array metadata written. Time taken: 34.043807s (commit=31.672902s, array_metadata_write=2.370905s)
I0416 23:06:00.965241 135106534545152 base_pytree_checkpoint_handler.py:130] [process=0] /jax/orbax/write/gbytes_per_sec: 41.310 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 38.23733949661255 s) (per-host)
I0416 23:06:00.965371 135106534545152 async_checkpointer.py:90] [process=0][thread=async_save] 3 Handler Commit operations completed. Time taken: 37.830317s.
I0416 23:06:01.740465 135106534545152 checkpoint.py:228] Read Metadata={'item_handlers': None, 'metrics': {}, 'performance_metrics': {}, 'init_timestamp_nsecs': 1776380725080346923, 'commit_timestamp_nsecs': None, 'custom_metadata': {}} from gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/_CHECKPOINT_METADATA
I0416 23:06:02.503442 135106029696768 checkpoint.py:247] Updated Metadata={'item_handlers': {'items': 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler'}, 'metrics': {}, 'performance_metrics': {}, 'init_timestamp_nsecs': 1776380725080346923, 'commit_timestamp_nsecs': None, 'custom_metadata': {}} to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/_CHECKPOINT_METADATA
I0416 23:06:02.742032 135106534545152 array_metadata_store.py:411] [process=0][thread=async_save] Validated ArrayMetadata from all 8 hosts. Time taken: 0.003967s.
I0416 23:06:03.387554 135106534545152 ocdbt_utils.py:49] Param validation support for Zarr3 will be added later (b/362328389).
I0416 23:06:04.754381 135106534545152 base_pytree_checkpoint_handler.py:1406] [process=0][thread=async_save] Pytree save finalize (merge_ocdbt + ArrayMetadata validation) completed. Time taken: 2.891613s. use_zarr3=True, enable_post_merge_validation=True, directory=gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/items
I0416 23:06:04.756309 135106534545152 atomicity.py:666] Finalizing gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0/items
I0416 23:06:05.270058 135106534545152 atomicity.py:666] Finalizing gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0
I0416 23:06:06.680951 135106534545152 atomicity.py:847] [process=0][thread=async_save] Finished saving checkpoint (finalized tmp dir) to `gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0`.
I0416 23:06:06.681799 135106534545152 event_tracking.py:138] [process=0] [async] Finished save (blocking + background) in 44.31 seconds @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/0
I0416 23:06:06.683232 135106534545152 async_checkpointer.py:160] [process=0][thread=async_save] Background save thread done. Time taken: 43.548175s.
I0416 23:06:06.683448 135107066214144 async_checkpointer.py:288] [process=0][thread=save_finalize] Done with waiting for background save thread=async_save.
I0416 23:06:06.683577 135107066214144 async_checkpointer.py:298] [process=0][thread=save_finalize] No errors found in background save thread=async_save.
I0416 23:06:06.683644 135107066214144 checkpoint_manager.py:2137] [process=0][thread=save_finalize][step=0] CheckpointManager Save Finalize is syncing with other hosts...
I0416 23:06:06.686547 135107066214144 checkpoint_manager.py:2146] [process=0][thread=save_finalize][step=0] CheckpointManager Save Finalize is done on all hosts.
I0416 23:06:06.686698 135235630511936 checkpoint_manager.py:2032] [process=0][thread=MainThread][step=0][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=0.
W0416 23:06:06.686791 135235630511936 checkpoint_manager.py:1452] Waiting for previous save to complete took 18.057021 seconds. If this number is high, consider checkpointing less frequently.
I0416 23:06:06.689258 135235630511936 checkpoint_manager.py:1512] [process=0] Saving checkpoint at step 9
I0416 23:06:06.691164 135235630511936 event_tracking.py:70] [process=0] [async] Started save checkpoint @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9.
I0416 23:06:07.057705 135235630511936 jax_array_handlers.py:360] Scheduling D2H of 648 prioritized jax.Array.
I0416 23:06:07.057862 135235630511936 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0416 23:06:07.204968 135235630511936 base_pytree_checkpoint_handler.py:154] [process=0][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.152871s
I0416 23:06:07.205789 135235630511936 base_pytree_checkpoint_handler.py:130] [process=0] /jax/orbax/write/blocking_gbytes_per_sec: 7.050 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.21879315376281738 s) (per-host)
I0416 23:06:07.205875 135235630511936 base_pytree_checkpoint_handler.py:768] [process=0][thread=MainThread] Initiated Pytree async_save. Time taken: 0.218895s (batch_requests_ready=0.025933s, total_serialization_initiated=0.192207s, others=0.000755s)
I0416 23:06:07.206063 135235630511936 composite_checkpoint_handler.py:715] [process=0][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.223482s (all_items=0.000017s, per_item={'items': '0.00001717'}, temp_paths=0.223465)
I0416 23:06:07.206871 135235630511936 event_tracking.py:125] [process=0] [async] Finished blocking save in 0.52 seconds. Continuing save @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9.
I0416 23:06:07.207198 135106542937856 async_checkpointer.py:76] [process=0][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-16 23:26:07.207160
I0416 23:06:07.209253 135235630511936 checkpoint_manager.py:1560] [process=0][thread=MainThread][step=9] Starting CheckpointManager Save Finalize thread=save_finalize
I0416 23:06:07.209569 135107066214144 async_checkpointer.py:280] [process=0][thread=save_finalize] Waiting for background save thread=async_save.
I0416 23:06:07.209708 135235630511936 standard_logger.py:34] {'step': 9, 'event_type': 'save', 'directory': 'gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776380748.62974, 'wait_for_prev_duration_secs': 18.057021141052246, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776380766.6892984, 'checkpointer_blocking_duration_secs': 0.5180368423461914, 'get_old_steps_start_time': 1776380767.2073584, 'get_old_steps_duration_secs': 2.9087066650390625e-05, 'checkpoint_manager_blocking_start_time': 1776380748.6277776, 'checkpoint_manager_blocking_duration_secs': 18.58189630508423}
I0416 23:06:07.209977 135235630511936 checkpointing.py:409] Started an asynchronous checkpoint save for step 9
I0416 23:06:07.210029 135235630511936 checkpoint_manager.py:2020] [process=0][thread=MainThread][step=9][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0416 23:06:07.225346 135106534545152 atomicity.py:140] Creating tmp directory gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9
I0416 23:06:09.364570 135106551330560 atomicity.py:140] Creating tmp directory gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9/items
I0416 23:06:12.476576 135108195514112 array_metadata_store.py:203] [process=0][thread=array_type_handler] Wrote 648 array_metadata.ArrayMetadata to gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9/items/array_metadatas/process_0
I0416 23:06:53.006535 135106584901376 base_pytree_checkpoint_handler.py:1282] [process=0][thread=write_metadata_after_commits] Commit + Array metadata written. Time taken: 42.540309s (commit=40.641784s, array_metadata_write=1.898525s)
I0416 23:06:53.008699 135106542937856 base_pytree_checkpoint_handler.py:130] [process=0] /jax/orbax/write/gbytes_per_sec: 34.323 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 46.021665811538696 s) (per-host)
I0416 23:06:53.008809 135106542937856 async_checkpointer.py:90] [process=0][thread=async_save] 3 Handler Commit operations completed. Time taken: 45.801498s.
I0416 23:06:54.985026 135106542937856 ocdbt_utils.py:49] Param validation support for Zarr3 will be added later (b/362328389).
I0416 23:06:55.179312 135106542937856 array_metadata_store.py:411] [process=0][thread=async_save] Validated ArrayMetadata from all 8 hosts. Time taken: 0.004098s.
I0416 23:06:56.204738 135106542937856 base_pytree_checkpoint_handler.py:1406] [process=0][thread=async_save] Pytree save finalize (merge_ocdbt + ArrayMetadata validation) completed. Time taken: 2.295642s. use_zarr3=True, enable_post_merge_validation=True, directory=gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9/items
I0416 23:06:56.206444 135106542937856 atomicity.py:666] Finalizing gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9/items
I0416 23:06:56.728343 135106542937856 atomicity.py:666] Finalizing gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9
I0416 23:06:58.114995 135106542937856 atomicity.py:847] [process=0][thread=async_save] Finished saving checkpoint (finalized tmp dir) to `gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9`.
I0416 23:06:58.115797 135106542937856 event_tracking.py:138] [process=0] [async] Finished save (blocking + background) in 51.43 seconds @ gs://lance-maxtext/nnx_ckpt_xpk_feat_nnx_post_train_fixes_20260416_221011/nnx_xpk_feat_nnx_post_train_fixes_20260416_221011_13_scan_layers_false/checkpoints/9
I0416 23:06:58.117287 135106542937856 async_checkpointer.py:160] [process=0][thread=async_save] Background save thread done. Time taken: 50.909973s.
I0416 23:06:58.117491 135107066214144 async_checkpointer.py:288] [process=0][thread=save_finalize] Done with waiting for background save thread=async_save.
I0416 23:06:58.117605 135107066214144 async_checkpointer.py:298] [process=0][thread=save_finalize] No errors found in background save thread=async_save.
I0416 23:06:58.117668 135107066214144 checkpoint_manager.py:2137] [process=0][thread=save_finalize][step=9] CheckpointManager Save Finalize is syncing with other hosts...
I0416 23:06:58.119395 135107066214144 checkpoint_manager.py:2146] [process=0][thread=save_finalize][step=9] CheckpointManager Save Finalize is done on all hosts.
I0416 23:06:58.119572 135235630511936 checkpoint_manager.py:2032] [process=0][thread=MainThread][step=9][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=9.
I0416 23:06:58.119867 135235630511936 checkpoint_manager.py:2009] [process=0][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0416 23:06:58.120544 135235630511936 metric_logger.py:185] completed step: 9, seconds: 0.135, TFLOP/s/device: 100.473, Tokens/s/device: 15144.345, total_weights: 65536, loss: 8.182
Per train step:
 Total TFLOPs: 13.59 
 split as 93.93% learnable weight flops and 6.07% attention flops
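The 93.93% / 6.07% split is consistent with the standard 6ND estimate for learnable-weight flops (over non-embedding parameters) plus causal self-attention flops. A sketch under the same assumptions as above (16 decoder layers, one 2048-token sequence per device per step; the embedding lookup is excluded from the 6ND term while `logits_dense` is included, which is an inference from the numbers, not something the log states):

```python
# Rough reconstruction of the per-device, per-step flops breakdown.
embed, mlp, heads, head_dim, vocab, layers, seq = 2048, 7168, 16, 128, 32000, 16, 2048

per_layer = 4 * embed * heads * head_dim + 3 * embed * mlp + 2 * embed
matmul_params = layers * per_layer + embed * vocab  # logits_dense counts, embedder doesn't
weight_flops = 6 * matmul_params * seq              # 6ND: fwd + bwd matmul flops

# QK^T and attn@V: 4*S^2*d fwd per layer, x3 for fwd+bwd, x0.5 for causal masking
attn_flops = layers * 6 * seq * seq * (heads * head_dim)

total = weight_flops + attn_flops
print(f"total: {total / 1e12:.2f} TFLOPs/step/device")
print(f"weights: {100 * weight_flops / total:.2f}%  attention: {100 * attn_flops / total:.2f}%")
```

This reproduces the summary's 13.59 TFLOPs/step and the 93.93% / 6.07% split; at this short 2048-token sequence length, attention is a minor fraction of the total flops.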
XPK End: Thu Apr 16 23:07:09 UTC 2026
EXIT_CODE=0