XPK Start: Wed Apr 22 16:24:30 UTC 2026
2026-04-22 16:24:34.413970: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1776875074.426823 10 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1776875074.430535 10 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1776875074.441872 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441890 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441892 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441894 10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-04-22 16:24:53.904473: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0422 16:24:54.426673 135720599283520 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-22 16:25:03,466:jax._src.distributed:140: Starting JAX distributed service on [::]:8482
I0422 16:25:03.466144 135720599283520 distributed.py:140] Starting JAX distributed service on [::]:8482
INFO:2026-04-22 16:25:03,468:jax._src.distributed:157: Connecting to JAX distributed service on mt-10-shardy-true-180t6-slice-job-0-0.mt-10-shardy-true-180t6:8482
I0422 16:25:03.468529 135720599283520 distributed.py:157] Connecting to JAX distributed service on mt-10-shardy-true-180t6-slice-job-0-0.mt-10-shardy-true-180t6:8482
F0422 16:30:08.471020 10 client.h:77] Terminating process because the JAX distributed service detected fatal errors. This most likely indicates that another task died; see the other task logs for more details. Disable Python buffering, i.e. `python -u`, to be sure to see all the previous output. absl::Status: DEADLINE_EXCEEDED: Deadline Exceeded
RPC: /tensorflow.CoordinationService/RegisterTask
XPK End: Wed Apr 22 16:30:10 UTC 2026
EXIT_CODE=1