MaxView

Log Summary

XPK Start: Sun Apr 19 09:20:42 UTC 2026
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
2026-04-19 09:21:05.247055: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 09:21:05.247176 137950148179776 max_utils.py:800] System Information: Jax Version: 0.8.3
I0419 09:21:05.247277 137950148179776 max_utils.py:801] System Information: Jaxlib Version: 0.8.3
I0419 09:21:12.935755 137950148179776 max_utils.py:802] System Information: Jax Backend: PJRT C API
TFRT TPU v6 lite
Built on Dec 15 2025 14:03:46 (1765836226) cl/844590465
I0419 09:21:13.114730 137950148179776 max_utils.py:238] Skipping jax distributed system due to skip_jax_distributed_system=True flag.
I0419 09:21:13.116350 137950148179776 train_rl.py:158] Running RL on a single slice
I0419 09:21:13.116409 137950148179776 train_rl.py:671] Starting RL Training
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 777, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 773, in main
    rl_train(trainer_config, sampler_config, trainer_devices, sampler_devices)
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 676, in rl_train
    epath.Path(trainer_config.checkpoint_dir).mkdir(parents=True)
  File "/usr/local/lib/python3.12/site-packages/etils/epath/gpath.py", line 207, in mkdir
    self._backend.makedirs(self._path_str, exist_ok=exist_ok, mode=mode)
  File "/usr/local/lib/python3.12/site-packages/etils/epath/backend.py", line 314, in makedirs
    raise FileExistsError(f'{path} already exists.')
FileExistsError: gs://lance-maxtext/pt_ckpt_xpk_test_pipeline_scan_nnx_20260419_085528/pt_rl_nnx_xpk_test_pipeline_scan_nnx_20260419_085528_05_rl_gspo_smoke/checkpoints already exists.
XPK End: Sun Apr 19 09:26:23 UTC 2026
EXIT_CODE=1