The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from `os.environ('LOCAL_RANK')` instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint : pretrain_bert.py
  min_nodes : 2
  max_nodes : 2
  nproc_per_node : 8
  run_id : none
  rdzv_backend : static
  rdzv_endpoint : 198.18.8.30:6000
  rdzv_configs : {'rank': 1, 'timeout': 900}
  max_restarts : 3
  monitor_interval : 5
  log_dir : None
  metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=198.18.8.30
  master_port=6000
  group_rank=1
  group_world_size=2
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[8, 9, 10, 11, 12, 13, 14, 15]
  global_ranks=[8, 9, 10, 11, 12, 13, 14, 15]
  role_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
  global_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_kmy5evt4/none_nkf7yoxp/attempt_0/7/error.json
[W ProcessGroupNCCL.cpp:1671] Rank 14 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 15 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 11 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 12 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 9 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 8 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 13 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 10 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. 
iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3385:3385 [5] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3387:3387 [7] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3381:3381 [1] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3386:3386 [6] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3380:3380 [0] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3384:3384 [4] NCCL INFO Using 
network IBext iv-2udaavw4l02thdv8lcrl:3382:3382 [2] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO P2P plugin IBext iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0> iv-2udaavw4l02thdv8lcrl:3383:3383 [3] NCCL INFO Using network IBext iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. 
iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Trees [0] 10/-1/-1->8->15 [1] 10/-1/-1->8->15 iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Trees [0] 14/-1/-1->9->11 [1] 14/-1/-1->9->11 iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Trees [0] 11/-1/-1->10->8 [1] 11/-1/-1->10->8 iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Trees [0] 15/-1/-1->13->12 [1] 15/-1/-1->13->12 iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Trees [0] 9/-1/-1->11->10 [1] 9/-1/-1->11->10 iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Trees [0] -1/-1/-1->14->9 [1] -1/-1/-1->14->9 iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Trees [0] 8/-1/-1->15->13 [1] 8/-1/-1->15->13 iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO 
Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Trees [0] 13/-1/-1->12->4 [1] 13/4/-1->12->-1 iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 00 : 14[6b010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 12[69010] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 00 : 10[67010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 00 : 8[65010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 01 : 14[6b010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 12[69010] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 01 : 10[67010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 01 : 8[65010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 00 : 15[6b020] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 00 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 00 : 9[65020] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 01 : 15[6b020] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 01 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 01 : 9[65020] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA 
iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 00 : 9[65020] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 00 : 11[67020] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 00 : 8[65010] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 01 : 11[67020] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 01 : 9[65020] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 01 : 8[65010] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 00 : 14[6b010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 00 : 10[67010] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 01 : 14[6b010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 01 : 10[67010] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 12[69010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 12[69010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 00 : 
11[67020] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 01 : 11[67020] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 00 : 13[69020] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 01 : 13[69020] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 12[69010] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 00 : 10[67010] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 00 : 8[65010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 01 : 10[67010] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 01 : 8[65010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 01 : 9[65020] -> 12[69010] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 12[69010] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 00 : 11[67020] -> 13[69020] via P2P/indirect/12[69010] iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 00 : 15[6b020] -> 13[69020] via 
P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 01 : 15[6b020] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 00 : 8[65010] -> 12[69010] via P2P/indirect/11[67020] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 00 : 13[69020] -> 12[69010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 00 : 10[67010] -> 12[69010] via P2P/indirect/13[69020] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 01 : 13[69020] -> 12[69010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 01 : 11[67020] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL 
INFO Channel 00 : 10[67010] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO Channel 00 : 11[67020] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 00 : 9[65020] -> 13[69020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 12[69010] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO Channel 01 : 10[67010] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 01 : 8[65010] -> 13[69020] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO Channel 00 : 9[65020] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO Channel 00 : 8[65010] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 01 : 13[69020] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 00 : 15[6b020] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 00 : 14[6b010] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 01 : 15[6b020] -> 10[67010] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 00 : 14[6b010] -> 10[67010] via P2P/indirect/13[69020] iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO Channel 00 : 15[6b020] -> 11[67020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO Channel 01 : 14[6b010] -> 11[67020] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 00 : 13[69020] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO Channel 00 : 13[69020] -> 11[67020] via P2P/indirect/10[67010] iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 01 : 12[69010] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO Channel 00 : 
12[69010] -> 10[67010] via P2P/indirect/11[67020] iv-2udaavw4l02thdv8lcrl:3385:3592 [5] NCCL INFO comm 0x7f9de8008fb0 rank 13 nranks 16 cudaDev 5 busId 69020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3387:3593 [7] NCCL INFO comm 0x7fa780008fb0 rank 15 nranks 16 cudaDev 7 busId 6b020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3383:3600 [3] NCCL INFO comm 0x7fcea4008fb0 rank 11 nranks 16 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3381:3594 [1] NCCL INFO comm 0x7f8d14008fb0 rank 9 nranks 16 cudaDev 1 busId 65020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:3596 [6] NCCL INFO comm 0x7f01a4008fb0 rank 14 nranks 16 cudaDev 6 busId 6b010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3382:3598 [2] NCCL INFO comm 0x7fb784008fb0 rank 10 nranks 16 cudaDev 2 busId 67010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3384:3597 [4] NCCL INFO comm 0x7fb388008fb0 rank 12 nranks 16 cudaDev 4 busId 69010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3380:3595 [0] NCCL INFO comm 0x7f9a08008fb0 rank 8 nranks 16 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Trees [0] 15/-1/-1->13->12 [1] 15/-1/-1->13->12 iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Trees [0] 14/-1/-1->9->11 [1] 14/-1/-1->9->11 iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Trees [0] 10/-1/-1->8->15 [1] 10/-1/-1->8->15 iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Trees [0] 9/-1/-1->11->10 [1] 9/-1/-1->11->10 iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Trees [0] 11/-1/-1->10->8 [1] 11/-1/-1->10->8 iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Trees [0] -1/-1/-1->14->9 [1] -1/-1/-1->14->9 iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Trees [0] 13/-1/-1->12->4 [1] 13/4/-1->12->-1 
iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Trees [0] 8/-1/-1->15->13 [1] 8/-1/-1->15->13 iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 00 : 8[65010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 12[69010] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 00 : 10[67010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 00 : 14[6b010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 01 : 8[65010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 12[69010] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 01 : 10[67010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 01 : 14[6b010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 00 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 00 : 9[65020] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 00 : 15[6b020] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 01 : 9[65020] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 01 : 15[6b020] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 01 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL 
INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 00 : 9[65020] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 00 : 11[67020] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 00 : 8[65010] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 01 : 9[65020] -> 14[6b010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 01 : 11[67020] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 01 : 8[65010] -> 10[67010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 00 : 10[67010] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 00 : 14[6b010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 01 : 10[67010] -> 11[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 01 : 14[6b010] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 12[69010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 12[69010] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO 
threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 00 : 11[67020] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 01 : 11[67020] -> 9[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 00 : 13[69020] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 01 : 13[69020] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 12[69010] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 00 : 10[67010] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 00 : 8[65010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 01 : 10[67010] -> 8[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 01 : 8[65010] -> 15[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 01 : 9[65020] -> 12[69010] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 12[69010] -> 4[69010] [send] via 
NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 00 : 11[67020] -> 13[69020] via P2P/indirect/12[69010] iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 00 : 15[6b020] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 01 : 15[6b020] -> 13[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 00 : 13[69020] -> 12[69010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 00 : 10[67010] -> 12[69010] via P2P/indirect/13[69020] iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 00 : 8[65010] -> 12[69010] via P2P/indirect/11[67020] iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 01 : 13[69020] -> 12[69010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:3384:3650 [4] 
NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 01 : 11[67020] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 00 : 10[67010] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO Channel 00 : 11[67020] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 00 : 9[65020] -> 13[69020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 12[69010] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO Channel 01 : 10[67010] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO Channel 00 : 9[65020] -> 15[6b020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 01 : 8[65010] -> 13[69020] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO Channel 00 : 8[65010] -> 14[6b010] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 01 : 13[69020] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 00 : 15[6b020] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 00 : 14[6b010] -> 8[65010] via P2P/indirect/15[6b020] iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 01 : 15[6b020] -> 10[67010] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 00 : 14[6b010] -> 10[67010] via P2P/indirect/13[69020] iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 00 : 13[69020] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO Channel 00 : 15[6b020] -> 11[67020] via P2P/indirect/8[65010] iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO Channel 01 : 14[6b010] -> 11[67020] via P2P/indirect/9[65020] iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO Channel 00 : 
13[69020] -> 11[67020] via P2P/indirect/10[67010] iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 01 : 12[69010] -> 9[65020] via P2P/indirect/14[6b010] iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO Channel 00 : 12[69010] -> 10[67010] via P2P/indirect/11[67020] iv-2udaavw4l02thdv8lcrl:3385:3653 [5] NCCL INFO comm 0x7f9d10008fb0 rank 13 nranks 16 cudaDev 5 busId 69020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3380:3651 [0] NCCL INFO comm 0x7f9928008fb0 rank 8 nranks 16 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3381:3655 [1] NCCL INFO comm 0x7f8c3c008fb0 rank 9 nranks 16 cudaDev 1 busId 65020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3383:3654 [3] NCCL INFO comm 0x7fcdcc008fb0 rank 11 nranks 16 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3387:3652 [7] NCCL INFO comm 0x7fa6a0008fb0 rank 15 nranks 16 cudaDev 7 busId 6b020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:3657 [6] NCCL INFO comm 0x7f00dc008fb0 rank 14 nranks 16 cudaDev 6 busId 6b010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3382:3656 [2] NCCL INFO comm 0x7fb6b8008fb0 rank 10 nranks 16 cudaDev 2 busId 67010 - Init COMPLETE NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 iv-2udaavw4l02thdv8lcrl:3384:3650 [4] NCCL INFO comm 0x7fb2b4008fb0 rank 12 nranks 16 cudaDev 4 busId 69010 - Init COMPLETE NCCL version 2.10.3+cuda11.4 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL 
INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] 
-1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 
23/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 
09/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] 
-1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 25/32 : 0 
iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 
11/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] 
-1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 27/32 : 0 
iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 
13/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Setting affinity for GPU 7 to 
0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 
29/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:3679 [3] NCCL INFO comm 0x7fcdc8008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3382:3680 [2] NCCL INFO comm 0x7fb6b4008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Connected all 
rings iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3385:3677 [5] NCCL INFO comm 0x7f9d0c008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3380:3675 [0] NCCL INFO comm 0x7f9924008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3381:3683 [1] NCCL INFO comm 0x7f8c38008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3386:3687 [6] NCCL INFO comm 0x7f00d8008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3387:3688 [7] NCCL INFO comm 0x7fa69c008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3384:3696 [4] NCCL INFO comm 0x7fb2b0008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL 
INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Channel 31/32 
: 0 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 17/32 : 0 
iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 
03/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] 
-1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 19/32 : 0 
iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 
05/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] 
-1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO 
Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO 
Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] 
-1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 
23/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3382:3707 [2] NCCL INFO comm 0x7fb6a8008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer 
iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3385:3709 [5] NCCL INFO comm 0x7f9d00008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3383:3718 [3] NCCL INFO comm 0x7fcdbc008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3387:3712 [7] NCCL INFO comm 0x7fa690008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3381:3715 [1] NCCL INFO comm 0x7f8c2c008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:3725 [0] NCCL INFO comm 0x7f9918008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3384:3717 [4] NCCL INFO comm 0x7fb2a4008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO 
Connected all rings iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3386:3728 [6] NCCL INFO comm 0x7f00cc008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. 
(function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) time (ms) | model-and-optimizer-setup: 338.55 | train/valid/test-data-iterators-setup: 2006.77 /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 15/32 : 0 
iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 
01/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Channel 31/32 : 0 
iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 17/32 : 0 
iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 
03/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] 
-1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 19/32 : 
0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 05/32 : 0 
iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] 
-1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 07/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 21/32 : 0 
iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 00/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 01/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 02/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 03/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 04/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 05/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 06/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 07/32 : 0 
iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 08/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 09/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 10/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 11/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 12/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 13/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 14/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 15/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 16/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 17/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 18/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 19/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 20/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 21/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 22/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 23/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 24/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 25/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 26/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 27/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 28/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 29/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 30/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Channel 31/32 : 0 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] 
-1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3383:6038 [3] NCCL INFO comm 0x7fc984008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3387:6025 [7] NCCL INFO comm 0x7fa254008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3382:6033 [2] NCCL INFO comm 0x7fb268008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer 
iv-2udaavw4l02thdv8lcrl:3385:6028 [5] NCCL INFO comm 0x7f98d0008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3386:6032 [6] NCCL INFO comm 0x7efc94008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3384:6027 [4] NCCL INFO comm 0x7fae6c008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3380:6034 [0] NCCL INFO comm 0x7f94d8008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-2udaavw4l02thdv8lcrl:3381:6037 [1] NCCL INFO comm 0x7f87f4008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iteration 100/ 220 | consumed samples: 25600 | elapsed time per iteration (ms): 509.2 | learning rate: 8.384E-07 | tpt: 502.8 samples/s | global batch size: 256 | lm loss: 9.597939E+00 | sop loss: 7.002005E-01 | loss scale: 65536.0 | grad norm: 4.153 | number of skipped iterations: 17 | number of nan iterations: 0 | time (ms) | forward-compute: 158.45 | backward-compute: 230.59 | backward-params-all-reduce: 81.27 | backward-embedding-all-reduce: 0.03 | optimizer-copy-to-main-grad: 4.10 | optimizer-unscale-and-check-inf: 11.16 | optimizer-clip-main-grad: 5.42 | optimizer-copy-main-to-model-params: 3.75 | optimizer: 35.91 | batch-generator: 3.97 
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.141, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.141, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.145, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.145, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.147, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.147, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.147, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.148, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.149, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB 2022/07/05 19:32:11.150, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.150, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.151, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 
%, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.151, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.152, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.152, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.154, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.156, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.157, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.158, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.158, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.158, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.160, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 19:32:11.161, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.162, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB 2022/07/05 19:32:11.163, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.163, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.163, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB 2022/07/05 19:32:11.163, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.165, Tesla V100-SXM2-32GB, 470.57.02, 15 
%, 16 %, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.165, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 16 %, 32510 MiB, 10080 MiB, 22430 MiB 2022/07/05 19:32:11.166, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB 2022/07/05 19:32:11.167, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB 2022/07/05 19:32:11.168, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB 2022/07/05 19:32:11.169, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB 2022/07/05 19:32:11.169, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB 2022/07/05 19:32:11.169, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.171, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.172, Tesla V100-SXM2-32GB, 470.57.02, 67 %, 23 %, 32510 MiB, 10070 MiB, 22440 MiB 2022/07/05 19:32:11.173, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.175, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.175, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB 2022/07/05 19:32:11.176, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.178, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB 2022/07/05 19:32:11.179, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 22 %, 32510 MiB, 10176 MiB, 22334 MiB 2022/07/05 19:32:11.179, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.181, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.181, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 36 %, 32510 MiB, 10078 MiB, 22432 MiB 2022/07/05 19:32:11.182, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB 2022/07/05 
19:32:11.184, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
2022/07/05 19:32:11.185, Tesla V100-SXM2-32GB, 470.57.02, 13 %, 13 %, 32510 MiB, 10160 MiB, 22350 MiB
2022/07/05 19:32:11.185, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB
2022/07/05 19:32:11.187, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB
2022/07/05 19:32:11.187, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB
2022/07/05 19:32:11.188, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB
2022/07/05 19:32:11.191, Tesla V100-SXM2-32GB, 470.57.02, 15 %, 16 %, 32510 MiB, 10102 MiB, 22408 MiB
2022/07/05 19:32:11.191, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
2022/07/05 19:32:11.193, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
2022/07/05 19:32:11.193, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
2022/07/05 19:32:11.194, Tesla V100-SXM2-32GB, 470.57.02, 0 %, 0 %, 32510 MiB, 10078 MiB, 22432 MiB
2022/07/05 19:32:11.199, Tesla V100-SXM2-32GB, 470.57.02, 0 %, 0 %, 32510 MiB, 10078 MiB, 22432 MiB
2022/07/05 19:32:11.203, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB
2022/07/05 19:32:11.205, Tesla V100-SXM2-32GB, 470.57.02, 33 %, 18 %, 32510 MiB, 10088 MiB, 22422 MiB
2022/07/05 19:32:11.206, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
2022/07/05 19:32:11.208, Tesla V100-SXM2-32GB, 470.57.02, 52 %, 21 %, 32510 MiB, 10072 MiB, 22438 MiB
 iteration 200/ 220 | consumed samples: 51200 | elapsed time per iteration (ms): 478.8 | learning rate: 1.848E-06 | tpt: 534.7 samples/s | global batch size: 256 | lm loss: 8.994741E+00 | sop loss: 6.946176E-01 | loss scale: 65536.0 | grad norm: 2.540 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms) | forward-compute: 135.23 | backward-compute: 226.08 | backward-params-all-reduce: 79.92 | backward-embedding-all-reduce: 0.03 | optimizer-copy-to-main-grad: 4.10 | optimizer-unscale-and-check-inf: 4.27 | optimizer-clip-main-grad: 6.50 | optimizer-copy-main-to-model-params: 4.50 | optimizer: 32.66 | batch-generator: 1.20
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 validation loss at the end of training for val data | lm loss value: 8.741688E+00 | lm loss PPL: 6.258450E+03 | sop loss value: 6.938093E-01 | sop loss PPL: 2.001325E+00 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 validation loss at the end of training for test data | lm loss value: 8.703194E+00 | lm loss PPL: 6.022114E+03 | sop loss value: 6.931536E-01 | sop loss PPL: 2.000013E+00 |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0019137859344482422 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 8, "group_rank": 1, "worker_id": "3380", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [8], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 9, "group_rank": 1, "worker_id": "3381", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [9], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 10, "group_rank": 1, "worker_id": "3382", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [2], \"role_rank\": [10], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 11, "group_rank": 1, "worker_id": "3383", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [3], \"role_rank\": [11], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 12, "group_rank": 1, "worker_id": "3384", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [4], \"role_rank\": [12], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 13, "group_rank": 1, "worker_id": "3385", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [5], \"role_rank\": [13], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 14, "group_rank": 1, "worker_id": "3386", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [6], \"role_rank\": [14], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 15, "group_rank": 1, "worker_id": "3387", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [7], \"role_rank\": [15], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 1, "worker_id": null, "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 125, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\"}", "agent_restarts": 0}}
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************