MaxView

Log Summary

XPK Start: Wed Apr 22 16:24:30 UTC 2026
2026-04-22 16:24:34.413970: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1776875074.426823      10 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1776875074.430535      10 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1776875074.441872      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441890      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441892      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776875074.441894      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-04-22 16:24:53.904473: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0422 16:24:54.426673 135720599283520 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-22 16:25:03,466:jax._src.distributed:140: Starting JAX distributed service on [::]:8482
I0422 16:25:03.466144 135720599283520 distributed.py:140] Starting JAX distributed service on [::]:8482
INFO:2026-04-22 16:25:03,468:jax._src.distributed:157: Connecting to JAX distributed service on mt-10-shardy-true-180t6-slice-job-0-0.mt-10-shardy-true-180t6:8482
I0422 16:25:03.468529 135720599283520 distributed.py:157] Connecting to JAX distributed service on mt-10-shardy-true-180t6-slice-job-0-0.mt-10-shardy-true-180t6:8482
F0422 16:30:08.471020      10 client.h:77] Terminating process because the JAX distributed service detected fatal errors. This most likely indicates that another task died; see the other task logs for more details. Disable Python buffering, i.e. `python -u`, to be sure to see all the previous output. absl::Status: DEADLINE_EXCEEDED: Deadline Exceeded

RPC: /tensorflow.CoordinationService/RegisterTask
XPK End: Wed Apr 22 16:30:10 UTC 2026
EXIT_CODE=1