The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from `os.environ('LOCAL_RANK')` instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint : pretrain_gpt.py
  min_nodes : 4
  max_nodes : 4
  nproc_per_node : 8
  run_id : none
  rdzv_backend : static
  rdzv_endpoint : 198.18.8.22:6000
  rdzv_configs : {'rank': 3, 'timeout': 900}
  max_restarts : 3
  monitor_interval : 5
  log_dir : None
  metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_n79hzavy/none_0fm6n83w
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=198.18.8.22
  master_port=6000
  group_rank=3
  group_world_size=4
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[24, 25, 26, 27, 28, 29, 30, 31]
  global_ranks=[24, 25, 26, 27, 28, 29, 30, 31]
  role_world_sizes=[32, 32, 32, 32, 32, 32, 32, 32]
  global_world_sizes=[32, 32, 32, 32, 32, 32, 32, 32]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_n79hzavy/none_0fm6n83w/attempt_0/7/error.json
[W ProcessGroupNCCL.cpp:1671] Rank 26 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 30 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 27 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 28 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 29 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 24 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 31 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 25 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
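Note on the warnings above: the `--use_env` deprecation and the ProcessGroupNCCL "best-guess GPU" barrier warnings both concern how each worker determines its local GPU. A minimal sketch, assuming the launcher exports `LOCAL_RANK` as the first warning states. The helper name `local_rank_from_env` is hypothetical and is not taken from pretrain_gpt.py, and the torch calls are left as comments so the snippet runs without PyTorch or a GPU:

```python
import os


def local_rank_from_env(env=None, default=0):
    """Return the per-node rank that the launcher exports as LOCAL_RANK.

    `default` is used only when the variable is absent, for example in a
    single-process run started without the launcher.
    """
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", default))


if __name__ == "__main__":
    local_rank = local_rank_from_env()
    print(f"local_rank={local_rank}")
    # In a real worker (assumption: PyTorch with the NCCL backend), the value
    # would typically be applied before the first collective, e.g.:
    #   torch.cuda.set_device(local_rank)
    #   torch.distributed.barrier(device_ids=[local_rank])
```

Pinning the device before the first barrier is what the warning text itself suggests ("Specify device_ids in barrier()"). Whether it is needed here depends on how the training script already sets its device, which this log does not show.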
iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. 
iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO Bootstrap : Using eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO P2P plugin IBext iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. 
iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3375:3375 [2] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3378:3378 [5] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3373:3373 [0] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3377:3377 [4] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3380:3380 [7] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3376:3376 [3] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.142<0> iv-ybpu7pvmis5m57pm6ny1:3379:3379 [6] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3374:3374 [1] NCCL INFO Using network IBext iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. 
iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. 
iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Trees [0] 25/-1/-1->27->26 [1] 25/-1/-1->27->26 iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Trees [0] 31/-1/-1->29->28 [1] 31/-1/-1->29->28 iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Trees [0] 24/-1/-1->31->29 [1] 24/-1/-1->31->29 iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Trees [0] 29/-1/-1->28->20 [1] 29/12/-1->28->-1 iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Trees [0] -1/-1/-1->30->25 [1] -1/-1/-1->30->25 iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Trees [0] 27/-1/-1->26->24 [1] 27/-1/-1->26->24 iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Trees [0] 26/-1/-1->24->31 [1] 26/-1/-1->24->31 iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Trees [0] 30/-1/-1->25->27 [1] 30/-1/-1->25->27 iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 00 : 26[67010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 00 : 30[6b010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 00 : 24[65010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 28[69010] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 01 : 
26[67010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 01 : 24[65010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 01 : 30[6b010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 28[69010] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 00 : 29[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 00 : 31[6b020] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 00 : 25[65020] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 01 : 29[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 01 : 31[6b020] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 01 : 25[65020] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 21[69020] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 00 : 25[65020] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 00 : 24[65010] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 00 : 27[67020] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 21[69020] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 01 : 25[65020] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 01 : 24[65010] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 01 : 27[67020] -> 
26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 00 : 30[6b010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 00 : 26[67010] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 01 : 26[67010] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 01 : 30[6b010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 28[69010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 00 : 27[67020] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 28[69010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 01 : 27[67020] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 20[69010] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 00 : 29[69020] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 00 : 26[67010] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 00 : 24[65010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer 
iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 01 : 29[69020] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 01 : 26[67010] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 01 : 24[65010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 01 : 25[65020] -> 28[69010] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 00 : 27[67020] -> 29[69020] via P2P/indirect/28[69010] iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 00 : 31[6b020] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 01 : 31[6b020] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 12[69010] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 28[69010] -> 12[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer 
iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 00 : 24[65010] -> 28[69010] via P2P/indirect/27[67020] iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 00 : 26[67010] -> 28[69010] via P2P/indirect/29[69020] iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 00 : 29[69020] -> 28[69010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 01 : 29[69020] -> 28[69010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 28[69010] -> 20[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 01 : 27[67020] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 00 : 26[67010] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO Channel 00 : 27[67020] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 00 : 25[65020] -> 29[69020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO Channel 01 : 26[67010] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 28[69010] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO Channel 00 : 25[65020] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO Channel 01 : 24[65010] -> 29[69020] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO 
Channel 00 : 24[65010] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 01 : 29[69020] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 00 : 31[6b020] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 00 : 30[6b010] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 01 : 31[6b020] -> 26[67010] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 00 : 30[6b010] -> 26[67010] via P2P/indirect/29[69020] iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO Channel 00 : 31[6b020] -> 27[67020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO Channel 01 : 30[6b010] -> 27[67020] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 00 : 29[69020] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO Channel 00 : 29[69020] -> 27[67020] via P2P/indirect/26[67010] iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 01 : 28[69010] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO Channel 00 : 28[69010] -> 26[67010] via P2P/indirect/27[67020] iv-ybpu7pvmis5m57pm6ny1:3374:3592 [1] NCCL INFO comm 0x7fb6d0008fb0 rank 25 nranks 32 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3378:3587 [5] NCCL INFO comm 0x7f6bd8008fb0 rank 29 nranks 32 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3375:3586 [2] NCCL INFO comm 0x7f776c008fb0 rank 26 nranks 32 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3379:3593 [6] NCCL INFO comm 0x7f8888008fb0 rank 30 nranks 32 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:3591 [3] NCCL INFO comm 0x7ff748008fb0 rank 27 nranks 32 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:3588 [0] NCCL INFO comm 0x7f2d1c008fb0 rank 24 nranks 32 cudaDev 0 
busId 65010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3377:3589 [4] NCCL INFO comm 0x7faae0008fb0 rank 28 nranks 32 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3380:3590 [7] NCCL INFO comm 0x7fbd04008fb0 rank 31 nranks 32 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Trees [0] 26/-1/-1->24->31 [1] 26/-1/-1->24->31 iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Trees [0] 27/-1/-1->26->24 [1] 27/-1/-1->26->24 iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Trees [0] 25/-1/-1->27->26 [1] 25/-1/-1->27->26 iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Trees [0] 30/-1/-1->25->27 [1] 30/-1/-1->25->27 iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Trees [0] 24/-1/-1->31->29 [1] 24/-1/-1->31->29 iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Trees [0] 29/-1/-1->28->20 [1] 29/12/-1->28->-1 iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Trees [0] -1/-1/-1->30->25 [1] -1/-1/-1->30->25 iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Trees [0] 31/-1/-1->29->28 [1] 31/-1/-1->29->28 iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 00 : 30[6b010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 00 : 26[67010] -> 
29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 28[69010] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 01 : 26[67010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 01 : 30[6b010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 28[69010] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 00 : 24[65010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 01 : 24[65010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 00 : 29[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 00 : 31[6b020] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 01 : 31[6b020] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 00 : 25[65020] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 01 : 29[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 01 : 25[65020] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 21[69020] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 00 : 24[65010] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 00 : 25[65020] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 00 : 27[67020] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 21[69020] -> 28[69010] [receive] via 
NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 01 : 24[65010] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 01 : 25[65020] -> 30[6b010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 01 : 27[67020] -> 26[67010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 00 : 30[6b010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 00 : 26[67010] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 01 : 30[6b010] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 01 : 26[67010] -> 27[67020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 28[69010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 00 : 27[67020] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 28[69010] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 01 : 27[67020] -> 25[65020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO 
threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 00 : 29[69020] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 00 : 26[67010] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 00 : 24[65010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 01 : 29[69020] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 01 : 25[65020] -> 28[69010] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 01 : 26[67010] -> 24[65010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 01 : 24[65010] -> 31[6b020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 20[69010] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 00 : 27[67020] -> 29[69020] via P2P/indirect/28[69010] iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 00 : 31[6b020] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 01 : 31[6b020] -> 29[69020] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO 2 
coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 00 : 24[65010] -> 28[69010] via P2P/indirect/27[67020] iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 00 : 26[67010] -> 28[69010] via P2P/indirect/29[69020] iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 00 : 29[69020] -> 28[69010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 01 : 29[69020] -> 28[69010] via P2P/IPC iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 12[69010] -> 28[69010] [receive] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 28[69010] -> 12[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 28[69010] -> 20[69010] [send] via NET/IBext/0/GDRDMA iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO threadThresholds 8/8/64 | 256/8/64 | 8/8/512 iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 00 : 26[67010] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 01 : 27[67020] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 00 : 25[65020] -> 29[69020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 28[69010] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO Channel 00 : 27[67020] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 01 : 24[65010] -> 29[69020] via 
P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO Channel 00 : 25[65020] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO Channel 01 : 26[67010] -> 31[6b020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 01 : 29[69020] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO Channel 00 : 24[65010] -> 30[6b010] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 00 : 30[6b010] -> 24[65010] via P2P/indirect/31[6b020] iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 00 : 30[6b010] -> 26[67010] via P2P/indirect/29[69020] iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 00 : 31[6b020] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 00 : 29[69020] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 01 : 31[6b020] -> 26[67010] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 01 : 28[69010] -> 25[65020] via P2P/indirect/30[6b010] iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO Channel 00 : 29[69020] -> 27[67020] via P2P/indirect/26[67010] iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO Channel 00 : 31[6b020] -> 27[67020] via P2P/indirect/24[65010] iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO Channel 00 : 28[69010] -> 26[67010] via P2P/indirect/27[67020] iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO Channel 01 : 30[6b010] -> 27[67020] via P2P/indirect/25[65020] iv-ybpu7pvmis5m57pm6ny1:3380:3641 [7] NCCL INFO comm 0x7fbc18008fb0 rank 31 nranks 32 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3374:3640 [1] NCCL INFO comm 0x7fb5ec008fb0 rank 25 nranks 32 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:3639 [3] NCCL INFO comm 0x7ff660008fb0 rank 27 nranks 32 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:3635 [0] NCCL INFO 
comm 0x7f2c38008fb0 rank 24 nranks 32 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3377:3642 [4] NCCL INFO comm 0x7faa04008fb0 rank 28 nranks 32 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3378:3636 [5] NCCL INFO comm 0x7f6af4008fb0 rank 29 nranks 32 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3375:3638 [2] NCCL INFO comm 0x7f7684008fb0 rank 26 nranks 32 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3379:3637 [6] NCCL INFO comm 0x7f87a0008fb0 rank 30 nranks 32 cudaDev 6 busId 6b010 - Init COMPLETE NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 17/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 03/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] 
-1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 19/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 05/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] 
-1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 21/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 
07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] 
-1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 23/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 09/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 
[24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3375:3665 [2] NCCL INFO comm 0x7f7680008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3376:3663 [3] NCCL INFO comm 0x7ff65c008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3373:3667 [0] NCCL INFO comm 0x7f2c34008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3380:3668 [7] NCCL INFO comm 0x7fbc14008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer 
iv-ybpu7pvmis5m57pm6ny1:3378:3664 [5] NCCL INFO comm 0x7f6af0008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3379:3671 [6] NCCL INFO comm 0x7f879c008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3374:3672 [1] NCCL INFO comm 0x7fb5e8008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:3681 [4] NCCL INFO comm 0x7faa00008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO 
Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] 
-1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 
27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 
13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Setting affinity for GPU 3 to 
03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 29/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 15/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 
01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Channel 31/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 17/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 
03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] 
-1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3379:3699 [6] NCCL INFO comm 0x7f8794008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3373:3700 [0] NCCL INFO comm 0x7f2c28008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3376:3697 [3] NCCL INFO comm 0x7ff650008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO 32 coll channels, 32 p2p channels, 
32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3375:3696 [2] NCCL INFO comm 0x7f7674008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:3706 [4] NCCL INFO comm 0x7fa9f4008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3374:3708 [1] NCCL INFO comm 0x7fb5dc008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3380:3712 [7] NCCL INFO comm 0x7fbc0c008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:3692 [5] NCCL INFO comm 0x7f6ae4008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. 
(function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. 
(function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) time (ms) | model-and-optimizer-setup: 311.21 | train/valid/test-data-iterators-setup: 2147.45 /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. 
Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO 
Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO 
Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] 
-1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 
27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 
13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Setting affinity for GPU 7 to 
0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 
29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO 
Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO 
Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Channel 31/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 17/32 : 0 
iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer 
iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3378:6021 [5] NCCL INFO comm 0x7f6784008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3375:6016 [2] NCCL INFO comm 0x7f7314008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3379:6013 [6] NCCL INFO comm 0x7f8430008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3373:6010 [0] NCCL INFO comm 0x7f28c8008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3374:6020 [1] NCCL INFO comm 0x7fb278008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3380:6019 [7] NCCL INFO comm 0x7fb8a8008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO 
Connected all rings iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Connected all rings iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO Connected all trees iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmis5m57pm6ny1:3377:6023 [4] NCCL INFO comm 0x7fa690008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmis5m57pm6ny1:3376:6022 [3] NCCL INFO comm 0x7ff2f0008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iteration 100/ 220 | consumed samples: 12800 | elapsed time per iteration (ms): 454.3 | learning rate: 3.937E-06 | tpt: 281.7 samples/s | global batch size: 128 | lm loss: 1.001760E+01 | loss scale: 131072.0 | grad norm: 2.161 | number of skipped iterations: 16 | number of nan iterations: 0 | time (ms) | forward-compute: 113.50 | backward-compute: 150.61 | backward-params-all-reduce: 150.50 | backward-embedding-all-reduce: 0.03 | optimizer-copy-to-main-grad: 4.30 | optimizer-unscale-and-check-inf: 10.66 | optimizer-clip-main-grad: 5.81 | optimizer-copy-main-to-model-params: 3.96 | optimizer: 36.82 | batch-generator: 1.06 timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.024, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.026, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.027, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.030, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB 
2022/07/06 12:35:38.031, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.033, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.034, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.035, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.035, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.036, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 22 %, 32510 MiB, 13528 MiB, 18982 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.037, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.037, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 22 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.038, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.038, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.039, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.041, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.042, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.042, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.043, Tesla 
V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.043, Tesla V100-SXM2-32GB, 470.57.02, 87 %, 21 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.044, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.047, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.048, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.048, Tesla V100-SXM2-32GB, 470.57.02, 87 %, 21 %, 32510 MiB, 13514 MiB, 18996 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/06 12:35:38.048, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.049, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 20 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.050, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.052, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.052, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.052, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.053, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.053, Tesla V100-SXM2-32GB, 470.57.02, 98 %, 22 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.053, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 19 %, 32510 MiB, 13506 MiB, 19004 MiB 2022/07/06 12:35:38.055, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.058, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.058, 
Tesla V100-SXM2-32GB, 470.57.02, 2 %, 1 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.059, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 4 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.060, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.060, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 4 %, 32510 MiB, 13496 MiB, 19014 MiB 2022/07/06 12:35:38.061, Tesla V100-SXM2-32GB, 470.57.02, 2 %, 1 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.064, Tesla V100-SXM2-32GB, 470.57.02, 2 %, 1 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.064, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.064, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.065, Tesla V100-SXM2-32GB, 470.57.02, 87 %, 21 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.066, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 21 %, 32510 MiB, 13602 MiB, 18908 MiB 2022/07/06 12:35:38.067, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.070, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.070, Tesla V100-SXM2-32GB, 470.57.02, 87 %, 21 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.070, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.071, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.072, Tesla V100-SXM2-32GB, 470.57.02, 40 %, 20 %, 32510 MiB, 13586 MiB, 18924 MiB 2022/07/06 12:35:38.073, Tesla V100-SXM2-32GB, 470.57.02, 87 %, 21 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.076, Tesla V100-SXM2-32GB, 470.57.02, 0 %, 0 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.076, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.076, Tesla V100-SXM2-32GB, 470.57.02, 2 %, 1 %, 32510 MiB, 13528 MiB, 18982 MiB 
2022/07/06 12:35:38.080, Tesla V100-SXM2-32GB, 470.57.02, 2 %, 1 %, 32510 MiB, 13528 MiB, 18982 MiB 2022/07/06 12:35:38.081, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.085, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.086, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.087, Tesla V100-SXM2-32GB, 470.57.02, 74 %, 24 %, 32510 MiB, 13504 MiB, 19006 MiB 2022/07/06 12:35:38.089, Tesla V100-SXM2-32GB, 470.57.02, 0 %, 0 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.090, Tesla V100-SXM2-32GB, 470.57.02, 0 %, 0 %, 32510 MiB, 13514 MiB, 18996 MiB 2022/07/06 12:35:38.093, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB 2022/07/06 12:35:38.093, Tesla V100-SXM2-32GB, 470.57.02, 3 %, 2 %, 32510 MiB, 13498 MiB, 19012 MiB iteration 200/ 220 | consumed samples: 25600 | elapsed time per iteration (ms): 431.5 | learning rate: 8.578E-06 | tpt: 296.6 samples/s | global batch size: 128 | lm loss: 8.772468E+00 | loss scale: 65536.0 | grad norm: 3.504 | number of skipped iterations: 1 | number of nan iterations: 0 | time (ms) | forward-compute: 95.01 | backward-compute: 145.47 | backward-params-all-reduce: 151.64 | backward-embedding-all-reduce: 0.03 | optimizer-copy-to-main-grad: 4.29 | optimizer-unscale-and-check-inf: 4.53 | optimizer-clip-main-grad: 6.84 | optimizer-copy-main-to-model-params: 4.66 | optimizer: 34.49 | batch-generator: 0.97 ------------------------------------------------------------------------------------------------------------------ validation loss at the end of training for val data | lm loss value: 7.836002E+00 | lm loss PPL: 2.530069E+03 | ------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------------- validation 
loss at the end of training for test data | lm loss value: 7.669576E+00 | lm loss PPL: 2.142173E+03 |
-------------------------------------------------------------------------------------------------------------------
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.005271434783935547 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 24, "group_rank": 3, "worker_id": "3373", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [24], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 25, "group_rank": 3, "worker_id": "3374", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [25], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 26, "group_rank": 3, "worker_id": "3375", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [2], \"role_rank\": [26], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 27, "group_rank": 3, "worker_id": "3376", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [3], \"role_rank\": [27], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 28, "group_rank": 3, "worker_id": "3377", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [4], \"role_rank\": [28], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 29, "group_rank": 3, "worker_id": "3378", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [5], \"role_rank\": [29], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 30, "group_rank": 3, "worker_id": "3379", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [6], \"role_rank\": [30], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 31, "group_rank": 3, "worker_id": "3380", "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\", \"local_rank\": [7], \"role_rank\": [31], \"role_world_size\": [32]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 3, "worker_id": null, "role": "default", "hostname": "iv-ybpu7pvmis5m57pm6ny1", "state": "SUCCEEDED", "total_run_time": 115, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 4, \"entry_point\": \"python\"}", "agent_restarts": 0}}
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************