MaxView

Case: 07_distill_smoke

Metrics: Linen vs NNX  ·  feat/nnx-post-train-fixes

Metric  |  Linen (d8cde296b)  |  NNX (d8cde296b)  |  Diff (NNX − Linen)

Diff = NNX value − Linen value. Green = NNX improved. Red = NNX regressed.
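The page does not say how it decides whether a change counts as an improvement. The sketch below shows one way to compute a Diff cell and its color. It assumes each metric carries a hypothetical `higher_is_better` flag: a loss would set it to False, a throughput would set it to True. The helper name `diff_cell` and the "neutral" result for a zero diff are also illustrative assumptions, not part of MaxView.

```python
def diff_cell(linen: float, nnx: float, higher_is_better: bool) -> tuple[float, str]:
    """Return (NNX - Linen, color) for one metric row.

    `higher_is_better` is a hypothetical per-metric flag (assumption):
    False for metrics like loss, True for metrics like throughput.
    """
    diff = nnx - linen
    if diff == 0:
        # Assumption: an unchanged metric gets neither green nor red.
        return diff, "neutral"
    # Sign alone is not enough; the metric's direction decides "improved".
    improved = diff > 0 if higher_is_better else diff < 0
    return diff, "green" if improved else "red"


# Illustrative values only, not taken from these runs.
print(diff_cell(1200.0, 1250.0, higher_is_better=True))  # (50.0, 'green')
```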

Linen  ·  d8cde296b  ·  feat_nnx_post_train_fixes_20260416_210550
2026-04-16 21:09:46.698891: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0416 21:09:47.690807 137267685428352 max_utils.py:238] Skipping jax distributed system due to skip_jax_distributed_system=True flag.
~/maxtext/maxtext_pt_venv/lib/python3.12/site-packages/jax/_src/xla_bridge.py:202: UserWarning: TPU backend initialization is taking more than 60.0 seconds. Did you run your code on all TPU hosts? See https://docs.jax.dev/en/latest/multi_process.html for more information.
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "~/maxtext/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "~/maxtext/maxtext_pt_venv/lib/python3.12/site-packages/absl/app.py", line 367, in run
    _run_main(main, args)
  File "~/maxtext/maxtext_pt_venv/lib/python3.12/site-packages/absl/app.py", line 312, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "~/maxtext/src/maxtext/trainers/post_train/distillation/train_distill.py", line 736, in main
    student_config = pyconfig.initialize(argv, **student_overrides)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 294, in initialize
    pydantic_config = initialize_pydantic(argv, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 343, in initialize_pydantic
    validate_no_keys_overridden_twice(model_loaded_cfg.keys(), overrides_cfg.keys())
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 99, in validate_no_keys_overridden_twice
    raise ValueError(
ValueError: Keys ['vocab_size'] are overridden by both model config and CLI/kwargs.This is not allowed, unless setting `override_model_config=True`.
[DECOUPLED NO-OP] gcs_storage: using stubs.
NNX  ·  d8cde296b  ·  feat_nnx_post_train_fixes_20260416_210550
2026-04-16 21:16:58.621322: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0416 21:16:59.809267 130503540145280 max_utils.py:238] Skipping jax distributed system due to skip_jax_distributed_system=True flag.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "~/maxtext/src/maxtext/trainers/post_train/distillation/train_distill.py", line 761, in <module>
    app.run(main)
  File "~/maxtext/maxtext_pt_venv/lib/python3.12/site-packages/absl/app.py", line 367, in run
    _run_main(main, args)
  File "~/maxtext/maxtext_pt_venv/lib/python3.12/site-packages/absl/app.py", line 312, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "~/maxtext/src/maxtext/trainers/post_train/distillation/train_distill.py", line 736, in main
    student_config = pyconfig.initialize(argv, **student_overrides)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 294, in initialize
    pydantic_config = initialize_pydantic(argv, **kwargs)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 343, in initialize_pydantic
    validate_no_keys_overridden_twice(model_loaded_cfg.keys(), overrides_cfg.keys())
  File "~/maxtext/src/maxtext/configs/pyconfig.py", line 99, in validate_no_keys_overridden_twice
    raise ValueError(
ValueError: Keys ['vocab_size'] are overridden by both model config and CLI/kwargs.This is not allowed, unless setting `override_model_config=True`.
[DECOUPLED NO-OP] gcs_storage: using stubs.
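Both runs fail at the same point, before any metric is produced. The error is raised inside `pyconfig.initialize`, which is called with `argv` plus `**student_overrides`. There, `vocab_size` is set by the model config and also by the CLI/kwargs. The sketch below is a hypothetical, simplified re-creation of that check for illustration only. It is not MaxText's actual `validate_no_keys_overridden_twice`. The helper name `check_no_double_override`, the sample keys and values, and how `override_model_config` is passed in are all assumptions.

```python
from collections.abc import Iterable


def check_no_double_override(
    model_keys: Iterable[str],
    override_keys: Iterable[str],
    override_model_config: bool = False,
) -> None:
    """Hypothetical re-creation of the check that raised in both runs.

    A key may be set by the model config or by CLI/kwargs, but not both,
    unless the caller explicitly opts in with override_model_config=True.
    """
    if override_model_config:
        return  # Caller accepts that overrides win over the model config.
    duplicated = sorted(set(model_keys) & set(override_keys))
    if duplicated:
        raise ValueError(
            f"Keys {duplicated} are overridden by both model config and CLI/kwargs. "
            "This is not allowed, unless setting `override_model_config=True`."
        )


# Illustrative inputs only; real key sets come from the model config and overrides.
model_cfg = {"vocab_size": 32000, "base_emb_dim": 2048}
overrides = {"vocab_size": 32000, "steps": 10}

try:
    check_no_double_override(model_cfg.keys(), overrides.keys())
except ValueError as err:
    print(err)  # Reports ['vocab_size'], matching the traces above.

# Opting in, or dropping vocab_size from one side, lets the check pass.
check_no_double_override(model_cfg.keys(), overrides.keys(), override_model_config=True)
```

Because the failure happens during config initialization rather than model construction, the identical Linen and NNX tracebacks are consistent with a shared config issue and not with an NNX-specific regression.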