MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  feat/nnx-post-train-fixes

Metric | Linen (574ad3fb9) | NNX (574ad3fb9) | Diff (NNX − Linen)

Diff = NNX value − Linen value. Green = NNX improved. Red = NNX regressed.

Linen  ·  574ad3fb9  ·  feat_nnx_post_train_fixes_20260418_235042
XPK Start: Sat Apr 18 23:56:33 UTC 2026
2026-04-18 23:56:50.051298: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0418 23:56:53.587070 137653564442432 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-18 23:57:02,626:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0418 23:57:02.626484 137653564442432 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-18 23:57:02,628:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-8l92i-slice-job-0-0.mt-07-distill-smoke-8l92i:8482
I0418 23:57:02.628710 137653564442432 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-8l92i-slice-job-0-0.mt-07-distill-smoke-8l92i:8482
I0418 23:57:04.266946 137653564442432 max_utils.py:284] Jax distributed system initialized!
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 736, in main
    student_config = pyconfig.initialize(argv, **student_overrides)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/configs/pyconfig.py", line 294, in initialize
    pydantic_config = initialize_pydantic(argv, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/configs/pyconfig.py", line 343, in initialize_pydantic
    validate_no_keys_overridden_twice(model_loaded_cfg.keys(), overrides_cfg.keys())
  File "/deps/src/maxtext/configs/pyconfig.py", line 99, in validate_no_keys_overridden_twice
    raise ValueError(
ValueError: Keys ['vocab_size'] are overridden by both model config and CLI/kwargs.This is not allowed, unless setting `override_model_config=True`.
XPK End: Sat Apr 18 23:57:18 UTC 2026
EXIT_CODE=1
NNX  ·  574ad3fb9  ·  feat_nnx_post_train_fixes_20260418_235042
XPK Start: Sun Apr 19 00:05:24 UTC 2026
2026-04-19 00:05:41.385047: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 00:05:44.938157 138124305766208 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-19 00:05:53,975:jax._src.distributed:149: Starting JAX distributed service on [::]:8482
I0419 00:05:53.975873 138124305766208 distributed.py:149] Starting JAX distributed service on [::]:8482
INFO:2026-04-19 00:05:53,978:jax._src.distributed:166: Connecting to JAX distributed service on mt-07-distill-smoke-cbcam-slice-job-0-0.mt-07-distill-smoke-cbcam:8482
I0419 00:05:53.978110 138124305766208 distributed.py:166] Connecting to JAX distributed service on mt-07-distill-smoke-cbcam-slice-job-0-0.mt-07-distill-smoke-cbcam:8482
I0419 00:05:56.166451 138124305766208 max_utils.py:284] Jax distributed system initialized!
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/distillation/train_distill.py", line 736, in main
    student_config = pyconfig.initialize(argv, **student_overrides)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/configs/pyconfig.py", line 294, in initialize
    pydantic_config = initialize_pydantic(argv, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/deps/src/maxtext/configs/pyconfig.py", line 343, in initialize_pydantic
    validate_no_keys_overridden_twice(model_loaded_cfg.keys(), overrides_cfg.keys())
  File "/deps/src/maxtext/configs/pyconfig.py", line 99, in validate_no_keys_overridden_twice
    raise ValueError(
ValueError: Keys ['vocab_size'] are overridden by both model config and CLI/kwargs.This is not allowed, unless setting `override_model_config=True`.
XPK End: Sun Apr 19 00:06:09 UTC 2026
EXIT_CODE=1
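
Note: both runs above exit with the same ValueError, raised from validate_no_keys_overridden_twice during pyconfig.initialize, so the failure is in config setup rather than in anything specific to Linen or NNX. The sketch below illustrates the kind of check that produces such an error. It is not the MaxText implementation: the function names, the sample config values, and the exact comparison logic are assumptions made for illustration. Only the key name (vocab_size), the wording of the error, and the override_model_config escape hatch come from the log.

```python
def find_double_overrides(model_cfg_keys, override_keys):
    """Return, sorted, the keys that appear in both key collections."""
    return sorted(set(model_cfg_keys) & set(override_keys))


def check_overrides(model_cfg, overrides, override_model_config=False):
    """Hypothetical check: reject keys set by both the model config and CLI/kwargs.

    Returns the list of clashing keys when the clash is explicitly allowed
    (or when there is none); raises ValueError otherwise.
    """
    clashes = find_double_overrides(model_cfg.keys(), overrides.keys())
    if clashes and not override_model_config:
        raise ValueError(
            f"Keys {clashes} are overridden by both model config and CLI/kwargs."
        )
    return clashes


# Hypothetical values; the log does not show the real configs.
model_cfg = {"vocab_size": 32000, "emb_dim": 512}
student_overrides = {"vocab_size": 32000}

try:
    check_overrides(model_cfg, student_overrides)
except ValueError as err:
    print(err)

# With the escape hatch named in the error message, the same call passes.
allowed = check_overrides(model_cfg, student_overrides, override_model_config=True)
```

Under these assumptions, a smoke test could avoid the error in two ways: drop vocab_size from the student overrides when the model config already sets it, or pass override_model_config=True as the error message suggests. Which of the two is appropriate for 07_distill_smoke is not determinable from the log alone.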