MaxView

Log Summary

XPK Start: Thu Apr 23 16:26:14 UTC 2026
2026-04-23 16:26:18.677667: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1776961578.690932      10 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1776961578.694723      10 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1776961578.706516      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776961578.706543      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776961578.706545      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1776961578.706551      10 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-04-23 16:26:37.874516: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
I0423 16:26:38.401066 134025324922688 max_utils.py:273] Attempting to initialize the jax distributed system...
INFO:2026-04-23 16:26:47,443:jax._src.distributed:140: Starting JAX distributed service on [::]:8482
I0423 16:26:47.443027 134025324922688 distributed.py:140] Starting JAX distributed service on [::]:8482
INFO:2026-04-23 16:26:47,445:jax._src.distributed:157: Connecting to JAX distributed service on mt-10-shardy-true-lmyxg-slice-job-0-0.mt-10-shardy-true-lmyxg:8482
I0423 16:26:47.445386 134025324922688 distributed.py:157] Connecting to JAX distributed service on mt-10-shardy-true-lmyxg-slice-job-0-0.mt-10-shardy-true-lmyxg:8482
F0423 16:31:52.447047      10 client.h:77] Terminating process because the JAX distributed service detected fatal errors. This most likely indicates that another task died; see the other task logs for more details. Disable Python buffering, i.e. `python -u`, to be sure to see all the previous output. absl::Status: DEADLINE_EXCEEDED: Deadline Exceeded

RPC: /tensorflow.CoordinationService/RegisterTask
XPK End: Thu Apr 23 16:31:54 UTC 2026
EXIT_CODE=1