MaxView

Case: 13_scan_layers_false

Metrics: main (586e69205) vs feat/nnx-post-train-fixes (7f06c99ac)

| Metric | main (586e69205) | feat/nnx-post-train-fixes (7f06c99ac) | Diff (feat/nnx-post-train-fixes − main) |
| --- | --- | --- | --- |
| Parameters | 1.104 billion | 1.104 billion | |
| Final loss | 8.1880 | 8.1880 | 0 |
| TFLOP/s | 99.374 | 99.259 | -0.115 |
| Tok/s | 14978.8 | 14961.5 | -17.289 |
| Avg s/step | 3.012 | 2.982 | -0.03 |
| Memory % | 2.62 | 2.62 | 0 |
| JAX | 0.9.2 | 0.9.2 | |

Diff = branch value − main value. Green = branch improved. Red = branch regressed.
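For example, the TFLOP/s diff is 99.259 − 99.374 = -0.115. A minimal sketch of how the Diff column is derived (values copied from the table above; this is illustrative, not MaxView's code, and small discrepancies such as Tok/s -17.300 vs the displayed -17.289 come from rounding in the shown values):

```python
# Illustrative recomputation of the Diff column from the table above.
metrics = {
    "TFLOP/s":    (99.374, 99.259),
    "Tok/s":      (14978.8, 14961.5),
    "Avg s/step": (3.012, 2.982),
}
for name, (main, branch) in metrics.items():
    print(f"{name}: {branch - main:+.3f}")  # e.g. TFLOP/s: -0.115
```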

main  ·  586e69205  ·  main_20260423_071538  ·  full log
XPK Start: Thu Apr 23 08:02:23 UTC 2026
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
2026-04-23 08:02:47.938952: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0423 08:02:48.151821 138057675323200 max_utils.py:273] Attempting to initialize the jax distributed system...
I0423 08:02:57.193799 138057675323200 distributed.py:149] Starting JAX distributed service on [::]:8482
I0423 08:02:57.196135 138057675323200 distributed.py:172] Connecting to JAX distributed service on mt-13-scan-layers-false-oep7m-slice-job-0-0.mt-13-scan-layers-false-oep7m:8482
I0423 08:02:58.080470 138057675323200 max_utils.py:284] Jax distributed system initialized!
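The service/connect lines above are emitted while `max_utils.py` brings up multi-host JAX; a minimal sketch of the underlying call, with every argument a placeholder reconstructed from the log rather than MaxText's actual wiring:

```python
import jax

# Placeholders: the coordinator address/port appear in the log above;
# 8 processes is inferred from 32 chips at 4 chips per host (see the
# per-process Memstats later in this log).
jax.distributed.initialize(
    coordinator_address="mt-13-scan-layers-false-oep7m-slice-job-0-0:8482",
    num_processes=8,
    process_id=0,  # this host's rank; the log below is from process 6
)
print(jax.device_count(), jax.process_index())
```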
I0423 08:03:04.622394 138057675323200 max_utils.py:800] System Information: Jax Version: 0.9.2
I0423 08:03:04.622500 138057675323200 max_utils.py:801] System Information: Jaxlib Version: 0.9.2
I0423 08:03:04.622540 138057675323200 max_utils.py:802] System Information: Jax Backend: PJRT C API
TFRT TPU v6 lite
Built on Mar 4 2026 11:32:08 (1772652728) cl/878335365
I0423 08:03:04.622575 138057675323200 train_utils.py:361] WARNING: Sequence packing is essentially ignored for synthetic data. Please use a real dataset to use sequence packing.
I0423 08:03:05.315634 138057675323200 maxtext_utils.py:1604] Num_devices: 32, shape (1, 1, 1, 32, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 08:03:05.315913 138057675323200 checkpointing.py:677] Setting up checkpoint logger...
I0423 08:03:05.315968 138057675323200 checkpointing.py:233] Creating checkpoint manager with ocdbt=True and zarr3=True
I0423 08:03:05.316011 138057675323200 pytree_checkpoint_handler.py:592] save_device_host_concurrent_bytes=None
I0423 08:03:05.316368 138057675323200 base_pytree_checkpoint_handler.py:441] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x7d8f78314350>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 08:03:08.471778 138057675323200 checkpointing.py:265] Enabling policy for fixed interval checkpointing.
I0423 08:03:08.472016 138057675323200 checkpoint_manager.py:708] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=('items',), item_handlers={'items': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7af47170e0>}, handler_registry=None
I0423 08:03:08.472270 138057675323200 composite_checkpoint_handler.py:237] Deferred registration for item: "items". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7af47170e0>` for item "items" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 08:03:08.472321 138057675323200 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7af471b200>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 08:03:08.472363 138057675323200 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7af47170e0>, ('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x7d7af47170e0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7af471b200>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7d7af471b200>}).
I0423 08:03:08.472688 138057675323200 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.34
I0423 08:03:08.472758 138057675323200 async_checkpointer.py:192] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7d7af44c5a80> timeout: 1200 secs and primary_host=0 for async checkpoint writes
I0423 08:03:09.965656 138057675323200 checkpoint_manager.py:1812] Found 0 checkpoint steps in gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints
I0423 08:03:10.429204 138057675323200 checkpoint_manager.py:929] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=1, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=FixedIntervalPolicy(interval=10), preservation_policy=LatestN(n=None), prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False, lightweight_initialize=False), root_directory=gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7d8e4bafd280>
I0423 08:03:10.429389 138057675323200 checkpointing.py:301] Checkpoint manager created!
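The options dump above (async checkpointing, `FixedIntervalPolicy(interval=10)`, a GCS root directory) corresponds to an Orbax setup roughly like the following; a hedged sketch, not MaxText's exact `checkpointing.py`:

```python
import orbax.checkpoint as ocp

options = ocp.CheckpointManagerOptions(
    save_interval_steps=10,           # plays the role of FixedIntervalPolicy(interval=10)
    enable_async_checkpointing=True,  # writes continue on a background thread
)
mngr = ocp.CheckpointManager(
    "gs://lance-maxtext/.../checkpoints",  # path elided; the full one is in the log
    options=options,
)
# In the train loop:
#   if mngr.should_save(step):
#       mngr.save(step, args=ocp.args.StandardSave(train_state))
# Before exit, block until background writes finish:
#   mngr.wait_until_finished()
```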
I0423 08:03:11.372931 138057675323200 nnx_wrappers.py:437] Unknown Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed').
I0423 08:03:11.373037 138057675323200 nnx_wrappers.py:437] Unknown Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 08:03:11.762753 138057675323200 attentions.py:1088] attentions/inputs_q Logical: bfloat16[32,2048,2048]...................................... ('activation_batch_attn', 'activation_length_attn', 'activation_embed_attn').
I0423 08:03:11.762846 138057675323200 attentions.py:1088] attentions/inputs_q Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 08:03:11.779165 138057675323200 attentions.py:1089] attentions/inputs_kv Logical: bfloat16[32,2048,2048]...................................... ('activation_batch_attn', 'activation_length_attn', 'activation_embed_attn').
I0423 08:03:11.779227 138057675323200 attentions.py:1089] attentions/inputs_kv Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 08:03:11.809839 138057675323200 attentions.py:1154] attentions/query Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_length_attn', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 08:03:11.809911 138057675323200 attentions.py:1154] attentions/query Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 08:03:11.826325 138057675323200 attentions.py:1155] attentions/key Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_length_attn', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 08:03:11.826387 138057675323200 attentions.py:1155] attentions/key Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 08:03:11.842606 138057675323200 attentions.py:1156] attentions/value Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_length_attn', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 08:03:11.842666 138057675323200 attentions.py:1156] attentions/value Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 08:03:11.872589 138057675323200 attentions.py:1198] attentions/out Logical: bfloat16[32,2048,16,128].................................... ('activation_batch_attn', 'activation_length_attn', 'activation_heads', 'activation_kv').
I0423 08:03:11.872664 138057675323200 attentions.py:1198] attentions/out Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 08:03:11.894843 138057675323200 linears.py:525] linears/x Logical: bfloat16[32,2048,7168]...................................... ('activation_batch', 'activation_length', 'activation_mlp').
I0423 08:03:11.894914 138057675323200 linears.py:525] linears/x Physical: bfloat16[32,2048,7168]...................................... ('fsdp', None, None).
I0423 08:03:14.842813 138057675323200 checkpointing.py:577] checkpoint manager exists so trying to load this run's existing checkpoint
I0423 08:03:14.842941 138057675323200 checkpointing.py:665] No existing checkpoints found, not restoring checkpoint.
fsdp: 32
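Each entry in the dump below pairs a logical `PartitionSpec` with its physical placement on the 32-way `fsdp` mesh. A minimal JAX sketch of that logical-to-physical mapping (the shape and the ('embed', 'mlp') → ('fsdp', None) rule are taken from the entries below; this illustrates the mechanism, not MaxText's sharding machinery):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D device mesh with a single 'fsdp' axis (32 devices in this run).
mesh = Mesh(np.array(jax.devices()), axis_names=("fsdp",))

# mlp/wi_0 kernel: logical ('embed', 'mlp') maps to physical ('fsdp', None),
# i.e. rows split 32 ways across the mesh, columns replicated.
kernel = jax.device_put(
    np.zeros((2048, 7168), np.float32),
    NamedSharding(mesh, P("fsdp", None)),
)
print(kernel.sharding.spec)  # PartitionSpec('fsdp', None)
```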
I0423 08:03:22.161544 138057675323200 maxtext_utils.py:1707]  params/params/decoder/decoder_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.161674 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.161726 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.161784 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.161824 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.161859 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.161910 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.161961 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.161999 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162034 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_0/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162067 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162111 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162151 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.162184 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162214 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162251 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162292 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.162327 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162359 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_1/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162391 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162423 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162454 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.162482 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162511 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162544 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162576 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.162608 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162640 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_10/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162671 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162702 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162732 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.162760 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162788 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.162817 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162847 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.162876 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162908 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_11/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.162940 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.162973 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163004 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.163034 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163062 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163117 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163159 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.163191 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163221 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_12/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163256 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163285 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163318 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.163346 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163373 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163402 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163431 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.163461 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163489 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_13/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163518 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163548 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163577 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.163605 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163633 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163662 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163691 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.163721 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163750 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_14/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163779 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163808 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.163836 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.163864 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163890 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.163920 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.163949 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.163977 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164006 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_15/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164035 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164066 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164107 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.164137 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164165 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164195 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164225 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.164262 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164292 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_2/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164329 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164376 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164425 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.164458 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164487 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164518 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164549 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.164580 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164610 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_3/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164639 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164668 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164697 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.164724 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164751 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.164781 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164810 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.164839 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164869 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_4/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.164899 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164927 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.164956 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.164983 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165010 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165039 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165067 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.165118 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165176 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_5/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165213 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165248 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165279 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.165307 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165335 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165365 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165395 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.165424 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165453 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_6/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165482 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165510 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165539 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.165566 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165594 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165623 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165653 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.165683 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165711 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_7/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165740 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165768 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.165796 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.165823 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165850 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.165879 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165907 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.165951 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.165995 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_8/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.166027 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.166056 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 08:03:22.166084 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 08:03:22.166124 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.166152 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 08:03:22.166182 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.166212 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 08:03:22.166245 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.166274 138057675323200 maxtext_utils.py:1707]  params/params/decoder/layers_9/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 08:03:22.166323 138057675323200 maxtext_utils.py:1707]  params/params/decoder/logits_dense/kernel
    Shape:     float32[2048,32000]
    Logical:   P('embed_vocab', 'vocab')
    Physical:  ('fsdp', None)

I0423 08:03:22.166368 138057675323200 maxtext_utils.py:1707]  params/params/token_embedder/embedding
    Shape:     float32[32000,2048]
    Logical:   P('vocab', 'embed_vocab')
    Physical:  (None, 'fsdp')
I0423 08:03:26.794346 138057675323200 train.py:155] train/xent Logical: float32[32,2048]............................................ ('activation_embed_and_logits_batch', 'activation_length').
I0423 08:03:26.794440 138057675323200 train.py:155] train/xent Physical: float32[32,2048]............................................ ('fsdp', None).
I0423 08:03:26.810064 138057675323200 train.py:162] train/z_loss Logical: float32[32,2048]............................................ ('activation_embed_and_logits_batch', 'activation_length').
I0423 08:03:26.810148 138057675323200 train.py:162] train/z_loss Physical: float32[32,2048]............................................ ('fsdp', None).
I0423 08:04:23.778403 138057675323200 max_utils.py:791] Total memory size: 1.8 GB, Output size: 0.4 GB, Temp size: 1.5 GB, Argument size: 0.4 GB, Host temp size: 0.0 GB.
I0423 08:04:23.779625 138057675323200 metric_logger.py:301] number parameters: 1.104 billion
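The 1.104 billion figure is consistent with the shapes in the sharding dump above (16 decoder layers, layers_0 through layers_15); a quick back-of-envelope count:

```python
# Shapes copied from the sharding dump above.
embed, mlp, vocab = 2048, 7168, 32000
heads, head_dim, layers = 16, 128, 16

per_layer = (
    3 * embed * mlp                 # mlp wi_0, wi_1, wo kernels
    + 4 * embed * heads * head_dim  # attention query/key/value/out kernels
    + 2 * embed                     # pre/post attention norm scales
)
total = (
    layers * per_layer
    + 2 * vocab * embed  # token_embedder + logits_dense
    + embed              # decoder_norm scale
)
print(total / 1e9)  # 1.104218112 -> "1.104 billion", matching the log
```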
I0423 08:05:25.121773 138057675323200 checkpointing.py:772] Waiting for step 0 to finish before checkpoint...
I0423 08:05:25.345391 138057675323200 checkpointing.py:776] Waited 0.2236003875732422 seconds for step 0 to finish before starting checkpointing.
I0423 08:05:25.349152 138057675323200 checkpoint_manager.py:2009] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 08:05:25.350980 138057675323200 checkpoint_manager.py:1512] [process=6] Saving checkpoint at step 0
I0423 08:05:25.352353 138057675323200 event_tracking.py:70] [process=6] [async] Started save checkpoint @ gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/0.
I0423 08:05:26.157580 138057675323200 signaling_client.py:364] Using JaxDistributedSignalingClient
I0423 08:05:26.158644 138057675323200 jax_array_handlers.py:360] Scheduling D2H of 444 prioritized jax.Array.
I0423 08:05:26.158771 138057675323200 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0423 08:05:26.457227 138057675323200 base_pytree_checkpoint_handler.py:154] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.300180s
I0423 08:05:26.457392 138057675323200 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/blocking_gbytes_per_sec: 4.588 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.33621788024902344 s) (per-host)
I0423 08:05:26.457441 138057675323200 base_pytree_checkpoint_handler.py:768] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.336275s (batch_requests_ready=0.018661s, total_serialization_initiated=0.317550s, others=0.000064s)
I0423 08:05:26.457700 138057675323200 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.340485s (all_items=0.000017s, per_item={'items': '0.00001717'}, temp_paths=0.340468)
I0423 08:05:26.458539 138057675323200 event_tracking.py:125] [process=6] [async] Finished blocking save in 1.11 seconds. Continuing save @ gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/0.
I0423 08:05:26.458904 137928369248000 async_checkpointer.py:76] [process=6][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-23 08:25:26.458866
I0423 08:05:26.507604 138057675323200 checkpoint_manager.py:1560] [process=6][thread=MainThread][step=0] Starting CheckpointManager Save Finalize thread=save_finalize
I0423 08:05:26.507977 137932084205312 async_checkpointer.py:280] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0423 08:05:26.508149 138057675323200 standard_logger.py:34] {'step': 0, 'event_type': 'save', 'directory': 'gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776931525.349133, 'wait_for_prev_duration_secs': 6.151199340820312e-05, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776931525.351018, 'checkpointer_blocking_duration_secs': 1.1080374717712402, 'get_old_steps_start_time': 1776931526.4590802, 'get_old_steps_duration_secs': 4.410743713378906e-05, 'checkpoint_manager_blocking_start_time': 1776931525.347404, 'checkpoint_manager_blocking_duration_secs': 1.160705804824829}
I0423 08:05:26.508342 138057675323200 checkpointing.py:408] Started an asynchronous checkpoint save for step 0
I0423 08:05:26.508395 138057675323200 max_utils.py:750] 
Memstats: After params initialized:
I0423 08:05:26.508447 138057675323200 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_24(process=6,(0,6,0,0))
I0423 08:05:26.508481 138057675323200 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_25(process=6,(1,6,0,0))
I0423 08:05:26.508508 138057675323200 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_28(process=6,(0,7,0,0))
I0423 08:05:26.508532 138057675323200 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_29(process=6,(1,7,0,0))
I0423 08:05:26.866797 138057675323200 metric_logger.py:196] completed step: 0, seconds: 61.342, TFLOP/s/device: 0.221, Tokens/s/device: 33.387, total_weights: 65536, loss: 10.877, lm_loss: 10.877, perplexity: 52938.617
I0423 08:05:27.025209 138057675323200 metric_logger.py:196] completed step: 1, seconds: 1.736, TFLOP/s/device: 7.826, Tokens/s/device: 1179.594, total_weights: 65536, loss: 10.877, lm_loss: 10.877, perplexity: 52938.617
I0423 08:05:27.469827 138057675323200 metric_logger.py:196] completed step: 2, seconds: 0.031, TFLOP/s/device: 444.765, Tokens/s/device: 67039.838, total_weights: 65536, loss: 10.268, lm_loss: 10.268, perplexity: 28794.986
I0423 08:05:27.606175 138057675323200 metric_logger.py:196] completed step: 3, seconds: 0.445, TFLOP/s/device: 30.505, Tokens/s/device: 4598.073, total_weights: 65536, loss: 9.741, lm_loss: 9.741, perplexity: 16999.971
I0423 08:05:27.883059 138057675323200 metric_logger.py:196] completed step: 4, seconds: 0.143, TFLOP/s/device: 95.035, Tokens/s/device: 14324.784, total_weights: 65536, loss: 9.285, lm_loss: 9.285, perplexity: 10778.489
I0423 08:05:27.894690 138057675323200 metric_logger.py:196] completed step: 5, seconds: 0.136, TFLOP/s/device: 99.927, Tokens/s/device: 15062.035, total_weights: 65536, loss: 8.901, lm_loss: 8.901, perplexity: 7336.347
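Two sanity checks on these metric lines: perplexity is exp(loss), and Tokens/s/device follows from the 65536 total_weights spread over 32 devices (small deviations come from the rounded loss and step times printed in the log):

```python
import math

print(math.exp(8.901))     # ~7.34e3, vs step 5's logged perplexity 7336.347
print(65536 / 32 / 0.136)  # ~15059 tokens/s/device, vs step 5's 15062.035
```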
I0423 08:05:30.375991    2828 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0423 08:05:32.511241 137932092598016 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 444 array_metadata.ArrayMetadata to gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/0/items/array_metadatas/process_6
I0423 08:05:52.083383 138057675323200 metric_logger.py:196] completed step: 6, seconds: 0.278, TFLOP/s/device: 48.877, Tokens/s/device: 7367.251, total_weights: 65536, loss: 8.602, lm_loss: 8.602, perplexity: 5440.818
I0423 08:05:52.219369 138057675323200 metric_logger.py:196] completed step: 7, seconds: 24.057, TFLOP/s/device: 0.565, Tokens/s/device: 85.130, total_weights: 65536, loss: 8.393, lm_loss: 8.393, perplexity: 4418.163
I0423 08:05:52.355632 138057675323200 metric_logger.py:196] completed step: 8, seconds: 0.142, TFLOP/s/device: 95.625, Tokens/s/device: 14413.704, total_weights: 65536, loss: 8.264, lm_loss: 8.264, perplexity: 3881.985
I0423 08:05:52.491068 138057675323200 checkpointing.py:772] Waiting for step 9 to finish before checkpoint...
I0423 08:05:52.494589 138057675323200 checkpointing.py:776] Waited 0.0035316944122314453 seconds for step 9 to finish before starting checkpointing.
I0423 08:05:52.497320 138057675323200 checkpoint_manager.py:2020] [process=6][thread=MainThread][step=0][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0423 08:06:00.930972 137928369248000 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/gbytes_per_sec: 45.378 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 34.80973935127258 s) (per-host)
I0423 08:06:00.931118 137928369248000 async_checkpointer.py:90] [process=6][thread=async_save] 3 Handler Commit operations completed. Time taken: 34.472100s.
I0423 08:06:10.174330 137928369248000 async_checkpointer.py:160] [process=6][thread=async_save] Background save thread done. Time taken: 43.715296s.
I0423 08:06:10.174617 137932084205312 async_checkpointer.py:288] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0423 08:06:10.174734 137932084205312 async_checkpointer.py:298] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0423 08:06:10.174785 137932084205312 checkpoint_manager.py:2137] [process=6][thread=save_finalize][step=0] CheckpointManager Save Finalize is syncing with other hosts...
I0423 08:06:10.176432 137932084205312 checkpoint_manager.py:2146] [process=6][thread=save_finalize][step=0] CheckpointManager Save Finalize is done on all hosts.
I0423 08:06:10.176603 138057675323200 checkpoint_manager.py:2032] [process=6][thread=MainThread][step=0][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=0.
W0423 08:06:10.176736 138057675323200 checkpoint_manager.py:1452] Waiting for previous save to complete took 17.679417 seconds. If this number is high, consider checkpointing less frequently.
I0423 08:06:10.179191 138057675323200 checkpoint_manager.py:1512] [process=6] Saving checkpoint at step 9
I0423 08:06:10.181172 138057675323200 event_tracking.py:70] [process=6] [async] Started save checkpoint @ gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/9.
I0423 08:06:10.923310 138057675323200 jax_array_handlers.py:360] Scheduling D2H of 444 prioritized jax.Array.
I0423 08:06:10.923481 138057675323200 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0423 08:06:11.046299 138057675323200 base_pytree_checkpoint_handler.py:154] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.124442s
I0423 08:06:11.046462 138057675323200 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/blocking_gbytes_per_sec: 9.706 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.15893006324768066 s) (per-host)
I0423 08:06:11.046511 138057675323200 base_pytree_checkpoint_handler.py:768] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.158988s (batch_requests_ready=0.018515s, total_serialization_initiated=0.140408s, others=0.000065s)
I0423 08:06:11.046797 138057675323200 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.163442s (all_items=0.000015s, per_item={'items': '0.00001454'}, temp_paths=0.163427)
I0423 08:06:11.047544 138057675323200 event_tracking.py:125] [process=6] [async] Finished blocking save in 0.87 seconds. Continuing save @ gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/9.
I0423 08:06:11.047920 137928889521920 async_checkpointer.py:76] [process=6][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-23 08:26:11.047879
I0423 08:06:11.059668 138057675323200 checkpoint_manager.py:1560] [process=6][thread=MainThread][step=9] Starting CheckpointManager Save Finalize thread=save_finalize
I0423 08:06:11.059909 137932084205312 async_checkpointer.py:280] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0423 08:06:11.060037 138057675323200 standard_logger.py:34] {'step': 9, 'event_type': 'save', 'directory': 'gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776931552.497289, 'wait_for_prev_duration_secs': 17.67941689491272, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776931570.1792295, 'checkpointer_blocking_duration_secs': 0.8688409328460693, 'get_old_steps_start_time': 1776931571.0481064, 'get_old_steps_duration_secs': 3.0279159545898438e-05, 'checkpoint_manager_blocking_start_time': 1776931552.495557, 'checkpoint_manager_blocking_duration_secs': 18.564445972442627}
I0423 08:06:11.060258 138057675323200 checkpointing.py:408] Started an asynchronous checkpoint save for step 9
I0423 08:06:11.060306 138057675323200 checkpoint_manager.py:2020] [process=6][thread=MainThread][step=9][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0423 08:06:16.482231 137932092598016 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 444 array_metadata.ArrayMetadata to gs://lance-maxtext/linen_ckpt_xpk_main_20260423_071538/linen_xpk_main_20260423_071538_13_scan_layers_false/checkpoints/9/items/array_metadatas/process_6
I0423 08:06:53.002733 137928889521920 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/gbytes_per_sec: 37.507 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 42.115159034729004 s) (per-host)
I0423 08:06:53.002871 137928889521920 async_checkpointer.py:90] [process=6][thread=async_save] 3 Handler Commit operations completed. Time taken: 41.954834s.
I0423 08:07:02.600718 137928889521920 async_checkpointer.py:160] [process=6][thread=async_save] Background save thread done. Time taken: 51.552666s.
I0423 08:07:02.601030 137932084205312 async_checkpointer.py:288] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0423 08:07:02.601175 137932084205312 async_checkpointer.py:298] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0423 08:07:02.601224 137932084205312 checkpoint_manager.py:2137] [process=6][thread=save_finalize][step=9] CheckpointManager Save Finalize is syncing with other hosts...
I0423 08:07:02.602705 137932084205312 checkpoint_manager.py:2146] [process=6][thread=save_finalize][step=9] CheckpointManager Save Finalize is done on all hosts.
I0423 08:07:02.602881 138057675323200 checkpoint_manager.py:2032] [process=6][thread=MainThread][step=9][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=9.
I0423 08:07:02.603035 138057675323200 checkpoint_manager.py:2009] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
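A quick note on the committed-write rate in the /jax/orbax/write line above: it is simply bytes written over elapsed time, and the small mismatch against the logged 37.507 MiB/s is only because the 1.5 GiB figure is printed rounded. A minimal check (variable names are ours, not Orbax identifiers):

gib_written = 1.5           # "total gbytes" from the /jax/orbax/write line (rounded)
elapsed_s = 42.115          # "time elapsed" from the same line
print(1024 * gib_written / elapsed_s)   # ~36.5 MiB/s vs the logged 37.507 MiB/s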
I0423 08:07:02.604192 138057675323200 metric_logger.py:196] completed step: 9, seconds: 0.137, TFLOP/s/device: 99.374, Tokens/s/device: 14978.753, total_weights: 65536, loss: 8.188, lm_loss: 8.188, perplexity: 3599.097
Per train step:
 Total TFLOPs: 13.59 
 split as 93.93% learnable weight flops and 6.07% attention flops
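The throughput numbers in the summary table follow directly from these per-step figures. A back-of-envelope check using the values logged above (all names below are illustrative, not MaxText identifiers):

per_step_tflops = 13.59     # "Total TFLOPs" per device per train step
step_seconds = 0.137        # "seconds" from the step-9 metric line
tokens_per_step = 65536     # total_weights: global tokens per step
num_devices = 32

print(per_step_tflops / step_seconds)                # ~99.2 TFLOP/s/device (logged: 99.374)
print(tokens_per_step / num_devices / step_seconds)  # ~14950 tok/s/device (logged: 14978.753)

Both come out slightly low only because the step time is printed rounded to 0.137 s.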
XPK End: Thu Apr 23 08:07:13 UTC 2026
EXIT_CODE=0
XPK Start: Thu Apr 23 13:28:45 UTC 2026
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
`rope_parameters`'s factor field must be a float >= 1, got 40
`rope_parameters`'s beta_fast field must be a float, got 32
`rope_parameters`'s beta_slow field must be a float, got 1
DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config. 
2026-04-23 13:29:10.743537: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0423 13:29:10.956107 132680881104704 max_utils.py:273] Attempting to initialize the jax distributed system...
I0423 13:29:19.998201 132680881104704 distributed.py:149] Starting JAX distributed service on [::]:8482
I0423 13:29:20.000469 132680881104704 distributed.py:172] Connecting to JAX distributed service on mt-13-scan-layers-false-4q9cy-slice-job-0-0.mt-13-scan-layers-false-4q9cy:8482
I0423 13:29:21.059775 132680881104704 max_utils.py:284] Jax distributed system initialized!
I0423 13:29:27.226684 132680881104704 max_utils.py:800] System Information: Jax Version: 0.9.2
I0423 13:29:27.226793 132680881104704 max_utils.py:801] System Information: Jaxlib Version: 0.9.2
I0423 13:29:27.226833 132680881104704 max_utils.py:802] System Information: Jax Backend: PJRT C API
TFRT TPU v6 lite
Built on Mar 4 2026 11:32:08 (1772652728) cl/878335365
I0423 13:29:27.226870 132680881104704 train_utils.py:391] WARNING: Sequence packing is essentially ignored for synthetic data. Please use a real dataset to use sequence packing.
I0423 13:29:27.922353 132680881104704 maxtext_utils.py:1771] Num_devices: 32, shape (1, 1, 1, 32, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 13:29:27.922941 132680881104704 maxtext_utils.py:1771] Num_devices: 32, shape (1, 1, 1, 32, 1, 1, 1, 1, 1, 1, 1, 1, 1)
I0423 13:29:27.923132 132680881104704 checkpointing.py:688] Setting up checkpoint logger...
I0423 13:29:27.923184 132680881104704 checkpointing.py:234] Creating checkpoint manager with ocdbt=True and zarr3=True
I0423 13:29:27.923226 132680881104704 pytree_checkpoint_handler.py:592] save_device_host_concurrent_bytes=None
I0423 13:29:27.923559 132680881104704 base_pytree_checkpoint_handler.py:441] Created BasePyTreeCheckpointHandler: use_ocdbt=True, use_zarr3=True, pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=<orbax.checkpoint._src.metadata.array_metadata_store.Store object at 0x78ab752482c0>, enable_pinned_host_transfer=False, save_concurrent_bytes: 96000000000 (89.4 GiB), restore_concurrent_bytes: 96000000000 (89.4 GiB)
I0423 13:29:30.908893 132680881104704 checkpointing.py:266] Enabling policy for fixed interval checkpointing.
I0423 13:29:30.909168 132680881104704 checkpoint_manager.py:708] [process=6][thread=MainThread] CheckpointManager init: checkpointers=None, item_names=('items',), item_handlers={'items': <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78972c1a6e10>}, handler_registry=None
I0423 13:29:30.909451 132680881104704 composite_checkpoint_handler.py:237] Deferred registration for item: "items". Adding handler `<orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78972c1a6e10>` for item "items" and save args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>` to `_handler_registry`.
I0423 13:29:30.909512 132680881104704 composite_checkpoint_handler.py:237] Deferred registration for item: "metrics". Adding handler `<orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7896646999a0>` for item "metrics" and save args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>` and restore args `<class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>` to `_handler_registry`.
I0423 13:29:30.909549 132680881104704 composite_checkpoint_handler.py:505] Initialized registry DefaultCheckpointHandlerRegistry({('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeSaveArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78972c1a6e10>, ('items', <class 'orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeRestoreArgs'>): <orbax.checkpoint._src.handlers.pytree_checkpoint_handler.PyTreeCheckpointHandler object at 0x78972c1a6e10>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonSaveArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7896646999a0>, ('metrics', <class 'orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonRestoreArgs'>): <orbax.checkpoint._src.handlers.json_checkpoint_handler.JsonCheckpointHandler object at 0x7896646999a0>}).
I0423 13:29:30.909869 132680881104704 abstract_checkpointer.py:35] orbax-checkpoint version: 0.11.34
I0423 13:29:30.909937 132680881104704 async_checkpointer.py:192] [process=6][thread=MainThread] Using barrier_sync_fn: <function get_barrier_sync_fn.<locals>._fn at 0x7896642c9d00> timeout: 1200 secs and primary_host=0 for async checkpoint writes
I0423 13:29:31.938016 132680881104704 checkpoint_manager.py:1812] Found 0 checkpoint steps in gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints
I0423 13:29:31.945634 132680881104704 checkpoint_manager.py:929] [process=6][thread=MainThread] CheckpointManager created,  primary_host=0, CheckpointManagerOptions=CheckpointManagerOptions(save_interval_steps=1, max_to_keep=None, keep_time_interval=None, keep_period=None, should_keep_fn=None, best_fn=None, best_mode='max', keep_checkpoints_without_metrics=True, step_prefix=None, step_format_fixed_length=None, step_name_format=None, create=True, cleanup_tmp_directories=False, save_on_steps=frozenset(), single_host_load_and_broadcast=False, todelete_subdir=None, todelete_full_path=None, enable_background_delete=False, read_only=False, enable_async_checkpointing=True, async_options=None, multiprocessing_options=MultiprocessingOptions(primary_host=0, active_processes=None, barrier_sync_key_prefix=None), should_save_fn=None, file_options=FileOptions(path_permission_mode=None), save_root_metadata=True, temporary_path_class=None, save_decision_policy=FixedIntervalPolicy(interval=10), preservation_policy=LatestN(n=None), prevent_write_metrics=False, enable_should_save_is_saving_in_progress_check=True, enable_per_process_directory_creation=False, lightweight_initialize=False), root_directory=gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints: <orbax.checkpoint.checkpoint_manager.CheckpointManager object at 0x7896645fcd10>
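The manager created above is the stock orbax.checkpoint CheckpointManager (version 0.11.34 per the log), configured for asynchronous saves on a fixed 10-step interval. A minimal sketch of an equivalent setup through the public API, assuming a local directory and a toy state pytree in place of the GCS bucket and the real train state:

import jax.numpy as jnp
import orbax.checkpoint as ocp

# save_interval_steps=10 stands in for the FixedIntervalPolicy(interval=10)
# shown in the log; enable_async_checkpointing=True gives the background
# async_save thread whose lifecycle the surrounding log lines trace.
options = ocp.CheckpointManagerOptions(
    save_interval_steps=10,
    enable_async_checkpointing=True,
)
mngr = ocp.CheckpointManager('/tmp/ckpts', options=options)

state = {'step': jnp.int32(0), 'w': jnp.zeros((4, 4))}
for step in range(20):
    mngr.save(step, args=ocp.args.StandardSave(state))  # saves only at steps 0 and 10
mngr.wait_until_finished()  # block until the background commit finishes, as at the end of each save above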
I0423 13:29:31.945751 132680881104704 checkpointing.py:302] Checkpoint manager created!
I0423 13:29:32.898742 132680881104704 nnx_wrappers.py:437] Unknown Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_norm_length', 'activation_embed').
I0423 13:29:32.898848 132680881104704 nnx_wrappers.py:437] Unknown Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 13:29:33.279887 132680881104704 attentions.py:1088] attentions/inputs_q Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed').
I0423 13:29:33.279981 132680881104704 attentions.py:1088] attentions/inputs_q Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 13:29:33.296268 132680881104704 attentions.py:1089] attentions/inputs_kv Logical: bfloat16[32,2048,2048]...................................... ('activation_batch', 'activation_attn_length', 'activation_attn_embed').
I0423 13:29:33.296329 132680881104704 attentions.py:1089] attentions/inputs_kv Physical: bfloat16[32,2048,2048]...................................... ('fsdp', None, None).
I0423 13:29:33.326244 132680881104704 attentions.py:1154] attentions/query Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 13:29:33.326313 132680881104704 attentions.py:1154] attentions/query Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 13:29:33.342850 132680881104704 attentions.py:1155] attentions/key Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 13:29:33.342913 132680881104704 attentions.py:1155] attentions/key Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 13:29:33.359388 132680881104704 attentions.py:1156] attentions/value Logical: bfloat16[32,2048,16,128].................................... ('activation_kv_batch', 'activation_attn_length', 'activation_kv_heads', 'activation_kv_head_dim').
I0423 13:29:33.359451 132680881104704 attentions.py:1156] attentions/value Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 13:29:33.388468 132680881104704 attentions.py:1198] attentions/out Logical: bfloat16[32,2048,16,128].................................... ('activation_batch', 'activation_attn_length', 'activation_heads', 'activation_kv').
I0423 13:29:33.388546 132680881104704 attentions.py:1198] attentions/out Physical: bfloat16[32,2048,16,128].................................... ('fsdp', None, None, None).
I0423 13:29:33.410843 132680881104704 linears.py:525] linears/x Logical: bfloat16[32,2048,7168]...................................... ('activation_batch', 'activation_length', 'activation_mlp').
I0423 13:29:33.410907 132680881104704 linears.py:525] linears/x Physical: bfloat16[32,2048,7168]...................................... ('fsdp', None, None).
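These Logical/Physical pairs come from flax's logical axis annotations: activations are constrained with logical axis names, and a rule table maps 'activation_batch' to the 'fsdp' mesh axis while unlisted names resolve to None. A minimal sketch of the mechanism, with small shapes, a one-axis mesh, and a stand-in rule set rather than MaxText's full table:

import jax
import jax.numpy as jnp
import flax.linen as nn
from jax.sharding import Mesh

mesh = Mesh(jax.devices(), ('fsdp',))    # 1-D mesh, as in the "fsdp: 32" line below
rules = (('activation_batch', 'fsdp'),)  # unlisted logical axes fall through to None

@jax.jit
def constrain(x):
  # Reproduces Physical: ('fsdp', None, None) for Logical:
  # ('activation_batch', 'activation_attn_length', 'activation_attn_embed').
  return nn.with_logical_constraint(
      x, ('activation_batch', 'activation_attn_length', 'activation_attn_embed'))

with mesh, nn.logical_axis_rules(rules):
  y = constrain(jnp.zeros((8, 16, 16), dtype=jnp.bfloat16))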
I0423 13:29:36.331605 132680881104704 checkpointing.py:578] checkpoint manager exists so trying to load this run's existing checkpoint
I0423 13:29:36.331734 132680881104704 checkpointing.py:676] No existing checkpoints found, not restoring checkpoint.
fsdp: 32
I0423 13:29:43.659507 132680881104704 maxtext_utils.py:1880]  params/params/decoder/decoder_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.659651 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.659706 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.659763 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.659804 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.659840 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.659891 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.659944 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.659981 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660018 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_0/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660053 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660088 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660138 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.660172 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660204 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660237 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660271 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.660303 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660335 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_1/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660380 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660416 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660448 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.660478 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660507 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660537 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660574 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.660605 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660635 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_10/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660666 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660695 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.660726 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.660754 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660782 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.660832 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660887 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.660945 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.660998 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_11/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661057 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.661125 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.661181 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.661240 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.661298 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.661360 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661414 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.661465 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661515 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_12/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661570 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.661617 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.661669 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.661724 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.661772 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.661821 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661870 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.661918 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.661971 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_13/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662031 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.662084 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.662157 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.662206 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.662254 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.662305 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662358 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.662409 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662457 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_14/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662503 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.662554 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.662605 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.662653 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.662690 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.662733 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662787 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.662835 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662886 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_15/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.662934 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.662984 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.663033 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.663079 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.663137 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.663185 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663237 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.663278 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663313 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_2/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663365 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.663414 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.663460 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.663508 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.663568 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.663622 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663672 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.663720 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663768 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_3/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.663815 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.663864 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.663909 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.663954 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.663996 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664045 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664088 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.664153 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664201 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_4/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664245 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664280 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664311 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.664340 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664367 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664397 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664427 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.664457 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664487 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_5/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664517 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664550 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664580 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.664608 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664636 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664666 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664695 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.664725 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664754 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_6/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664783 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664812 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.664842 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.664869 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664896 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.664926 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.664955 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.664984 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665013 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_7/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665042 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.665071 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.665112 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.665141 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.665169 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.665198 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665227 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.665256 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665285 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_8/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665314 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/mlp/wi_0/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.665343 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/mlp/wi_1/kernel
    Shape:     float32[2048,7168]
    Logical:   P('embed', 'mlp')
    Physical:  ('fsdp', None)
I0423 13:29:43.665372 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/mlp/wo/kernel
    Shape:     float32[7168,2048]
    Logical:   P('mlp', 'embed')
    Physical:  (None, 'fsdp')
I0423 13:29:43.665400 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/post_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.665429 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/pre_self_attention_layer_norm/scale
    Shape:     float32[2048]
    Logical:   P('norm',)
    Physical:  (None,)
I0423 13:29:43.665461 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/self_attention/key/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665491 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/self_attention/out/kernel
    Shape:     float32[16,128,2048]
    Logical:   P('heads', 'kv', 'embed')
    Physical:  (None, None, 'fsdp')
I0423 13:29:43.665521 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/self_attention/query/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'q_heads', 'kv')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665554 132680881104704 maxtext_utils.py:1880]  params/params/decoder/layers_9/self_attention/value/kernel
    Shape:     float32[2048,16,128]
    Logical:   P('embed', 'kv_heads', 'kv_head_dim')
    Physical:  ('fsdp', None, None)
I0423 13:29:43.665605 132680881104704 maxtext_utils.py:1880]  params/params/decoder/logits_dense/kernel
    Shape:     float32[2048,32000]
    Logical:   P('embed_vocab', 'vocab')
    Physical:  ('fsdp', None)
I0423 13:29:43.665650 132680881104704 maxtext_utils.py:1880]  params/params/token_embedder/embedding
    Shape:     float32[32000,2048]
    Logical:   P('vocab', 'embed_vocab')
    Physical:  (None, 'fsdp')
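Each entry in the dump above pairs a parameter's logical PartitionSpec with the physical spec it resolves to: 'embed' maps to the 'fsdp' mesh axis and the remaining parameter axes stay unsharded. The translation is mechanical; a sketch using flax's helper with an illustrative rule set (not MaxText's full table):

import flax.linen as nn
from jax.sharding import PartitionSpec as P

rules = (('embed', 'fsdp'), ('mlp', None), ('norm', None))

print(nn.logical_to_mesh_axes(P('embed', 'mlp'), rules))  # PartitionSpec('fsdp', None)
print(nn.logical_to_mesh_axes(P('mlp', 'embed'), rules))  # PartitionSpec(None, 'fsdp')
print(nn.logical_to_mesh_axes(P('norm'), rules))          # PartitionSpec(None,)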
I0423 13:29:48.258944 132680881104704 train.py:157] train/xent Logical: float32[32,2048]............................................ ('activation_embed_and_logits_batch', 'activation_length').
I0423 13:29:48.259039 132680881104704 train.py:157] train/xent Physical: float32[32,2048]............................................ ('fsdp', None).
I0423 13:29:48.274798 132680881104704 train.py:164] train/z_loss Logical: float32[32,2048]............................................ ('activation_embed_and_logits_batch', 'activation_length').
I0423 13:29:48.274859 132680881104704 train.py:164] train/z_loss Physical: float32[32,2048]............................................ ('fsdp', None).
I0423 13:30:45.554044 132680881104704 max_utils.py:791] Total memory size: 1.8 GB, Output size: 0.4 GB, Temp size: 1.5 GB, Argument size: 0.4 GB, Host temp size: 0.0 GB.
I0423 13:30:45.555301 132680881104704 metric_logger.py:301] number parameters: 1.104 billion
I0423 13:31:46.744894 132680881104704 checkpointing.py:794] Waiting for step 0 to finish before checkpoint...
I0423 13:31:46.979646 132680881104704 checkpointing.py:798] Waited 0.23473334312438965 seconds for step 0 to finish before starting checkpointing.
I0423 13:31:46.983136 132680881104704 checkpoint_manager.py:2009] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 13:31:46.984985 132680881104704 checkpoint_manager.py:1512] [process=6] Saving checkpoint at step 0
I0423 13:31:46.986616 132680881104704 event_tracking.py:70] [process=6] [async] Started save checkpoint @ gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/0.
I0423 13:31:47.767395 132680881104704 signaling_client.py:364] Using JaxDistributedSignalingClient
I0423 13:31:47.768345 132680881104704 jax_array_handlers.py:360] Scheduling D2H of 444 prioritized jax.Array.
I0423 13:31:47.768466 132680881104704 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0423 13:31:48.072597 132680881104704 base_pytree_checkpoint_handler.py:154] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.305745s
I0423 13:31:48.072768 132680881104704 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/blocking_gbytes_per_sec: 4.523 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.3410639762878418 s) (per-host)
I0423 13:31:48.072818 132680881104704 base_pytree_checkpoint_handler.py:768] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.341125s (batch_requests_ready=0.018365s, total_serialization_initiated=0.322692s, others=0.000068s)
I0423 13:31:48.073071 132680881104704 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.345305s (all_items=0.000017s, per_item={'items': '0.00001669'}, temp_paths=0.345289)
I0423 13:31:48.073949 132680881104704 event_tracking.py:125] [process=6] [async] Finished blocking save in 1.09 seconds. Continuing save @ gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/0.
I0423 13:31:48.074476 132552085993216 async_checkpointer.py:76] [process=6][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-23 13:51:48.074430
I0423 13:31:48.115918 132680881104704 checkpoint_manager.py:1560] [process=6][thread=MainThread][step=0] Starting CheckpointManager Save Finalize thread=save_finalize
I0423 13:31:48.116288 132555289929472 async_checkpointer.py:280] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0423 13:31:48.116453 132680881104704 standard_logger.py:34] {'step': 0, 'event_type': 'save', 'directory': 'gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776951106.9831176, 'wait_for_prev_duration_secs': 6.198883056640625e-05, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776951106.9850235, 'checkpointer_blocking_duration_secs': 1.0896341800689697, 'get_old_steps_start_time': 1776951108.0746863, 'get_old_steps_duration_secs': 3.0279159545898438e-05, 'checkpoint_manager_blocking_start_time': 1776951106.9812577, 'checkpoint_manager_blocking_duration_secs': 1.1351535320281982}
I0423 13:31:48.116663 132680881104704 checkpointing.py:409] Started an asynchronous checkpoint save for step 0
I0423 13:31:48.116714 132680881104704 max_utils.py:750] 
Memstats: After params initialized:
I0423 13:31:48.116765 132680881104704 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_24(process=6,(0,6,0,0))
I0423 13:31:48.116796 132680881104704 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_25(process=6,(1,6,0,0))
I0423 13:31:48.116822 132680881104704 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_28(process=6,(0,7,0,0))
I0423 13:31:48.116845 132680881104704 max_utils.py:756] 	Using (GB) 0.82 / 31.25 (2.624000%) on TPU_29(process=6,(1,7,0,0))
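The per-chip percentage here is just used-over-total HBM, and it matches the "Memory %" row in the summary table:

print(100 * 0.82 / 31.25)   # 2.624, i.e. the 2.624000% printed for each TPU core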
I0423 13:31:48.471698 132680881104704 metric_logger.py:196] completed step: 0, seconds: 61.189, TFLOP/s/device: 0.222, Tokens/s/device: 33.470, total_weights: 65536, loss: 10.877, lm_loss: 10.877, perplexity: 52938.617
I0423 13:31:48.634039 132680881104704 metric_logger.py:196] completed step: 1, seconds: 1.718, TFLOP/s/device: 7.909, Tokens/s/device: 1192.155, total_weights: 65536, loss: 10.877, lm_loss: 10.877, perplexity: 52938.617
I0423 13:31:49.075584 132680881104704 metric_logger.py:196] completed step: 2, seconds: 0.035, TFLOP/s/device: 390.884, Tokens/s/device: 58918.297, total_weights: 65536, loss: 10.268, lm_loss: 10.268, perplexity: 28795.291
I0423 13:31:49.211834 132680881104704 metric_logger.py:196] completed step: 3, seconds: 0.443, TFLOP/s/device: 30.652, Tokens/s/device: 4620.198, total_weights: 65536, loss: 9.741, lm_loss: 9.741, perplexity: 17000.555
I0423 13:31:49.491779 132680881104704 metric_logger.py:196] completed step: 4, seconds: 0.142, TFLOP/s/device: 95.702, Tokens/s/device: 14425.176, total_weights: 65536, loss: 9.285, lm_loss: 9.285, perplexity: 10779.814
I0423 13:31:49.503314 132680881104704 metric_logger.py:196] completed step: 5, seconds: 0.136, TFLOP/s/device: 99.730, Tokens/s/device: 15032.406, total_weights: 65536, loss: 8.901, lm_loss: 8.901, perplexity: 7335.930
I0423 13:31:51.132657    2887 google_auth_provider.cc:181] Running on GCE, using service account 562977990677-compute@developer.gserviceaccount.com
I0423 13:31:53.333228 132555298322176 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 444 array_metadata.ArrayMetadata to gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/0/items/array_metadatas/process_6
I0423 13:32:13.441152 132680881104704 metric_logger.py:196] completed step: 6, seconds: 0.281, TFLOP/s/device: 48.414, Tokens/s/device: 7297.527, total_weights: 65536, loss: 8.602, lm_loss: 8.602, perplexity: 5441.304
I0423 13:32:13.578078 132680881104704 metric_logger.py:196] completed step: 7, seconds: 23.806, TFLOP/s/device: 0.571, Tokens/s/device: 86.028, total_weights: 65536, loss: 8.393, lm_loss: 8.393, perplexity: 4418.082
I0423 13:32:13.714015 132680881104704 metric_logger.py:196] completed step: 8, seconds: 0.142, TFLOP/s/device: 95.438, Tokens/s/device: 14385.457, total_weights: 65536, loss: 8.264, lm_loss: 8.264, perplexity: 3882.411
I0423 13:32:13.849500 132680881104704 checkpointing.py:794] Waiting for step 9 to finish before checkpoint...
I0423 13:32:13.852994 132680881104704 checkpointing.py:798] Waited 0.0035114288330078125 seconds for step 9 to finish before starting checkpointing.
I0423 13:32:13.855866 132680881104704 checkpoint_manager.py:2020] [process=6][thread=MainThread][step=0][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
I0423 13:32:21.580919 132552085993216 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/gbytes_per_sec: 46.666 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 33.84915518760681 s) (per-host)
I0423 13:32:21.581052 132552085993216 async_checkpointer.py:90] [process=6][thread=async_save] 3 Handler Commit operations completed. Time taken: 33.506442s.
I0423 13:32:31.004749 132552085993216 async_checkpointer.py:160] [process=6][thread=async_save] Background save thread done. Time taken: 42.930121s.
I0423 13:32:31.005070 132555289929472 async_checkpointer.py:288] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0423 13:32:31.005221 132555289929472 async_checkpointer.py:298] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0423 13:32:31.005279 132555289929472 checkpoint_manager.py:2137] [process=6][thread=save_finalize][step=0] CheckpointManager Save Finalize is syncing with other hosts...
I0423 13:32:31.016167 132555289929472 checkpoint_manager.py:2146] [process=6][thread=save_finalize][step=0] CheckpointManager Save Finalize is done on all hosts.
I0423 13:32:31.016391 132680881104704 checkpoint_manager.py:2032] [process=6][thread=MainThread][step=0][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=0.
W0423 13:32:31.016568 132680881104704 checkpoint_manager.py:1452] Waiting for previous save to complete took 17.160701 seconds. If this number is high, consider checkpointing less frequently.
I0423 13:32:31.019344 132680881104704 checkpoint_manager.py:1512] [process=6] Saving checkpoint at step 9
I0423 13:32:31.021449 132680881104704 event_tracking.py:70] [process=6] [async] Started save checkpoint @ gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/9.
I0423 13:32:31.790304 132680881104704 jax_array_handlers.py:360] Scheduling D2H of 444 prioritized jax.Array.
I0423 13:32:31.790479 132680881104704 replica_slices.py:424] Transferring arrays to host memory with options: use_replica_parallel=True, min_slice_bytes_for_replica_parallel=None, max_replicas_for_replica_parallel=None, enable_pinned_host_transfer=False
I0423 13:32:31.911249 132680881104704 base_pytree_checkpoint_handler.py:154] [process=6][thread=MainThread] Initiated "orbax.checkpoint._src.serialization.jax_array_handlers.ArrayHandler".serialize. Time taken: 0.122414s
I0423 13:32:31.911430 132680881104704 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/blocking_gbytes_per_sec: 9.872 GiB/s (total gbytes: 1.5 GiB) (time elapsed: 0.15626263618469238 s) (per-host)
I0423 13:32:31.911491 132680881104704 base_pytree_checkpoint_handler.py:768] [process=6][thread=MainThread] Initiated Pytree async_save. Time taken: 0.156335s (batch_requests_ready=0.017814s, total_serialization_initiated=0.138439s, others=0.000082s)
I0423 13:32:31.911786 132680881104704 composite_checkpoint_handler.py:715] [process=6][thread=MainThread] Initiated CompositeCheckpointHandler.async_save. Time taken: 0.160580s (all_items=0.000015s, per_item={'items': '0.00001454'}, temp_paths=0.160566)
I0423 13:32:31.912564 132680881104704 event_tracking.py:125] [process=6] [async] Finished blocking save in 0.89 seconds. Continuing save @ gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/9.
I0423 13:32:31.912935 132553107621632 async_checkpointer.py:76] [process=6][thread=async_save] Background save thread started. Deadline for this save operation is 2026-04-23 13:52:31.912892
I0423 13:32:31.923471 132680881104704 checkpoint_manager.py:1560] [process=6][thread=MainThread][step=9] Starting CheckpointManager Save Finalize thread=save_finalize
I0423 13:32:31.923721 132555289929472 async_checkpointer.py:280] [process=6][thread=save_finalize] Waiting for background save thread=async_save.
I0423 13:32:31.923848 132680881104704 standard_logger.py:34] {'step': 9, 'event_type': 'save', 'directory': 'gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints', 'reached_preemption': False, 'preemption_received_at': None, 'synchronous': False, 'wait_for_prev_start_time': 1776951133.8558354, 'wait_for_prev_duration_secs': 17.160701036453247, 'time_between_consecutive_saves_sec': None, 'checkpointer_blocking_start_time': 1776951151.019393, 'checkpointer_blocking_duration_secs': 0.8936889171600342, 'get_old_steps_start_time': 1776951151.9131207, 'get_old_steps_duration_secs': 3.24249267578125e-05, 'checkpoint_manager_blocking_start_time': 1776951133.8538792, 'checkpoint_manager_blocking_duration_secs': 18.069932222366333}
I0423 13:32:31.924052 132680881104704 checkpointing.py:409] Started an asynchronous checkpoint save for step 9
I0423 13:32:31.924113 132680881104704 checkpoint_manager.py:2020] [process=6][thread=MainThread][step=9][wait_until_finished] Waiting for Save Finalize thread (save_finalize) to complete.
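The structured save event two lines up decomposes where the blocked time went. Plugging in its own numbers shows that nearly all of the 18.07 s the manager blocked was spent waiting on step 0's save, not on initiating step 9's:

```python
# Values copied from the standard_logger event above.
wait_for_prev = 17.160701036453247   # waiting for step 0's save to finalize
blocking_save = 0.8936889171600342   # D2H copy + save initiation for step 9
total_blocked = 18.069932222366333   # checkpoint_manager_blocking_duration_secs

overhead = total_blocked - (wait_for_prev + blocking_save)
print(f"unaccounted overhead: {overhead:.3f}s")  # ~0.016 s
```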
I0423 13:32:37.174020 132555300566784 array_metadata_store.py:203] [process=6][thread=array_type_handler] Wrote 444 array_metadata.ArrayMetadata to gs://lance-maxtext/linen_ckpt_xpk_feat_nnx_post_train_fixes_20260423_124541/linen_xpk_feat_nnx_post_train_fixes_20260423_124541_13_scan_layers_false/checkpoints/9/items/array_metadatas/process_6
I0423 13:33:13.234683 132553107621632 base_pytree_checkpoint_handler.py:130] [process=6] /jax/orbax/write/gbytes_per_sec: 38.081 MiB/s (total gbytes: 1.5 GiB) (time elapsed: 41.479480504989624 s) (per-host)
I0423 13:33:13.234817 132553107621632 async_checkpointer.py:90] [process=6][thread=async_save] 3 Handler Commit operations completed. Time taken: 41.321764s.
I0423 13:33:24.223285 132553107621632 async_checkpointer.py:160] [process=6][thread=async_save] Background save thread done. Time taken: 52.310216s.
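The background write for step 9 sustained roughly 38 MiB/s per host over 41.5 s. The log rounds the payload to 1.5 GiB; inverting the reported rate recovers the unrounded size:

```python
elapsed_s = 41.479480504989624
reported_mib_per_s = 38.081

payload_gib = reported_mib_per_s * elapsed_s / 1024
print(f"{payload_gib:.3f} GiB")  # ~1.543 GiB written per host for this save
```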
I0423 13:33:24.223565 132555289929472 async_checkpointer.py:288] [process=6][thread=save_finalize] Done with waiting for background save thread=async_save.
I0423 13:33:24.223687 132555289929472 async_checkpointer.py:298] [process=6][thread=save_finalize] No errors found in background save thread=async_save.
I0423 13:33:24.223736 132555289929472 checkpoint_manager.py:2137] [process=6][thread=save_finalize][step=9] CheckpointManager Save Finalize is syncing with other hosts...
I0423 13:33:24.225425 132555289929472 checkpoint_manager.py:2146] [process=6][thread=save_finalize][step=9] CheckpointManager Save Finalize is done on all hosts.
I0423 13:33:24.225592 132680881104704 checkpoint_manager.py:2032] [process=6][thread=MainThread][step=9][wait_until_finished] Done waiting for Save Finalize thread (save_finalize) running at step=9.
I0423 13:33:24.225754 132680881104704 checkpoint_manager.py:2009] [process=6][thread=MainThread][wait_until_finished] No Save Finalize thread to wait for. Returning.
I0423 13:33:24.226679 132680881104704 metric_logger.py:196] completed step: 9, seconds: 0.137, TFLOP/s/device: 99.259, Tokens/s/device: 14961.464, total_weights: 65536, loss: 8.188, lm_loss: 8.188, perplexity: 3598.726
Per train step:
 Total TFLOPs: 13.59
 split as 93.93% learnable-weight FLOPs and 6.07% attention FLOPs
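These per-step totals are consistent with the step-9 metric line: 13.59 TFLOPs per device at 0.137 s/step reproduces the reported 99.259 TFLOP/s/device, and 65536 tokens split across the 32 devices reproduces the tokens/s figure to within rounding:

```python
tflops_per_step = 13.59   # "Total TFLOPs" per device per train step
step_seconds = 0.137      # step-9 wall time from the metric_logger line
print(tflops_per_step / step_seconds)   # ~99.2 TFLOP/s/device (reported: 99.259)

tokens_per_step = 65536   # total_weights
num_devices = 32
print(tokens_per_step / num_devices / step_seconds)  # ~14949 tok/s/device (reported: 14961.464)
```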
XPK End: Thu Apr 23 13:33:36 UTC 2026
EXIT_CODE=0