XPK Start: Sun Apr 19 00:12:24 UTC 2026
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
`rope_scaling`'s factor field must be a float >= 1, got 40
`rope_scaling`'s beta_fast field must be a float, got 32
`rope_scaling`'s beta_slow field must be a float, got 1
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
Unrecognized keys in `rope_scaling` for 'rope_type'='yarn': {'rope_theta'}
2026-04-19 00:12:46.523693: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0419 00:12:46.523813 134584422963008 max_utils.py:800] System Information: Jax Version: 0.8.3
I0419 00:12:46.523905 134584422963008 max_utils.py:801] System Information: Jaxlib Version: 0.8.3
I0419 00:12:53.579893 134584422963008 max_utils.py:802] System Information: Jax Backend: PJRT C API
TFRT TPU v6 lite
Built on Dec 15 2025 14:03:46 (1765836226) cl/844590465
I0419 00:12:53.760594 134584422963008 max_utils.py:238] Skipping jax distributed system due to skip_jax_distributed_system=True flag.
I0419 00:12:53.762229 134584422963008 train_rl.py:194] Running RL on a single slice
I0419 00:12:53.762306 134584422963008 train_rl.py:709] Starting RL Training
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 815, in <module>
    app.run(main)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 316, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.12/site-packages/absl/app.py", line 261, in _run_main
    sys.exit(main(argv))
             ^^^^^^^^^^
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 811, in main
    rl_train(trainer_config, sampler_config, trainer_devices, sampler_devices)
  File "/deps/src/maxtext/trainers/post_train/rl/train_rl.py", line 714, in rl_train
    epath.Path(trainer_config.checkpoint_dir).mkdir(parents=True)
  File "/usr/local/lib/python3.12/site-packages/etils/epath/gpath.py", line 207, in mkdir
    self._backend.makedirs(self._path_str, exist_ok=exist_ok, mode=mode)
  File "/usr/local/lib/python3.12/site-packages/etils/epath/backend.py", line 314, in makedirs
    raise FileExistsError(f'{path} already exists.')
FileExistsError: gs://lance-maxtext/pt_ckpt_xpk_feat_nnx_post_train_fixes_20260418_235042/pt_rl_nnx_xpk_feat_nnx_post_train_fixes_20260418_235042_06_rl_grpo_functional/checkpoints already exists.
XPK End: Sun Apr 19 00:13:04 UTC 2026
EXIT_CODE=1