The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from `os.environ('LOCAL_RANK')` instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : pretrain_gpt.py
  min_nodes        : 2
  max_nodes        : 2
  nproc_per_node   : 8
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 198.18.8.30:6000
  rdzv_configs     : {'rank': 1, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_hraku2to/none_5ir3bn8b
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
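The deprecation warning above asks callers to stop passing `--local_rank` (the old `--use_env` flow) and read the rank from the environment instead; note the warning's own typo — `os.environ` is subscripted, not called. A minimal sketch of the recommended pattern (the `"0"` fallback for non-launcher runs is our assumption, not from the log):

```python
import os

# torchrun / torch.distributed.run exports LOCAL_RANK for every worker.
# Falling back to "0" is an assumption for plain single-process runs.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
```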
Result:
  restart_count=0
  master_addr=198.18.8.30
  master_port=6000
  group_rank=1
  group_world_size=2
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[8, 9, 10, 11, 12, 13, 14, 15]
  global_ranks=[8, 9, 10, 11, 12, 13, 14, 15]
  role_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
  global_world_sizes=[16, 16, 16, 16, 16, 16, 16, 16]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_hraku2to/none_5ir3bn8b/attempt_0/7/error.json
[W ProcessGroupNCCL.cpp:1671] Rank 13 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 9 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown.
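The Result line can be sanity-checked against the launch config: with group_rank=1 and nproc_per_node=8, the eight local ranks 0-7 map onto global ranks 8-15. A small sketch of that arithmetic (the function name is ours, not part of torch):

```python
def to_global_rank(group_rank: int, nproc_per_node: int, local_rank: int) -> int:
    # global rank = node index * workers per node + worker index on that node
    return group_rank * nproc_per_node + local_rank

# Node 1 of 2, 8 workers per node -> global_ranks=[8, ..., 15] as logged.
global_ranks = [to_global_rank(1, 8, lr) for lr in range(8)]
```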
This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 12 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 8 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 14 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 11 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 10 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 15 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
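The repeated ProcessGroupNCCL warnings appear harmless in this run (the best-guess GPU matches each rank's local rank), but the fix they suggest is to pin each process to its device and pass `device_ids` to the barrier. A hedged sketch, guarded so it is a no-op outside an initialized distributed run:

```python
import os

local_rank = int(os.environ.get("LOCAL_RANK", "0"))

try:
    import torch
    import torch.distributed as dist
    if dist.is_available() and dist.is_initialized():
        # Pin this process to its GPU, then tell barrier() which device to
        # use so NCCL does not have to guess (the fix the warning suggests).
        torch.cuda.set_device(local_rank)
        dist.barrier(device_ids=[local_rank])
except ImportError:
    pass  # torch absent: the sketch only illustrates the call shape
```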
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Bootstrap : Using eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO P2P plugin IBext
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.42<0>
iv-2udaavw4l02thdv8lcrl:34208:34208 [7] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO Using network IBext
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
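The NCCL_IB_* values NCCL reports as "set by environment" are normally exported before launch. Collected in one place for reference (values copied from the log; whether they suit a different fabric is outside this log's scope):

```python
import os

# RoCE/IB tuning the log shows NCCL picking up from the environment.
nccl_env = {
    "NCCL_IB_PCI_RELAXED_ORDERING": "1",  # allow PCIe relaxed ordering
    "NCCL_IB_GID_INDEX": "3",             # GID index (commonly RoCEv2)
    "NCCL_IB_TIMEOUT": "23",              # IB QP timeout exponent
    "NCCL_IB_RETRY_CNT": "7",             # IB transport retry count
}
os.environ.update(nccl_env)
```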
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Trees [0] 10/-1/-1->8->15 [1] 10/-1/-1->8->15
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Trees [0] 14/-1/-1->9->11 [1] 14/-1/-1->9->11
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Trees [0] 11/-1/-1->10->8 [1] 11/-1/-1->10->8
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Trees [0] 9/-1/-1->11->10 [1] 9/-1/-1->11->10
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Trees [0] 13/-1/-1->12->4 [1] 13/4/-1->12->-1
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Trees [0] 8/-1/-1->15->13 [1] 8/-1/-1->15->13
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Trees [0] -1/-1/-1->14->9 [1] -1/-1/-1->14->9
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Trees [0] 15/-1/-1->13->12 [1] 15/-1/-1->13->12
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 00 : 14[6b010] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 12[69010] -> 14[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 01 : 14[6b010] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 00 : 10[67010] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 00 : 8[65010] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 12[69010] -> 14[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 01 : 10[67010] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 01 : 8[65010] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 00 : 15[6b020] -> 8[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 00 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 00 : 9[65020] -> 11[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 01 : 15[6b020] -> 8[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 01 : 13[69020] -> 4[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 01 : 9[65020] -> 11[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 00 : 9[65020] -> 14[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 00 : 11[67020] -> 10[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 00 : 8[65010] -> 10[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 5[69020] -> 12[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 01 : 9[65020] -> 14[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 01 : 11[67020] -> 10[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 01 : 8[65010] -> 10[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 00 : 10[67010] -> 11[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 00 : 14[6b010] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 01 : 10[67010] -> 11[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 01 : 14[6b010] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 12[69010] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 12[69010] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 00 : 11[67020] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 01 : 11[67020] -> 9[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 00 : 13[69020] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 4[69010] -> 12[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 01 : 13[69020] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 00 : 10[67010] -> 8[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 00 : 8[65010] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 12[69010] -> 4[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 01 : 10[67010] -> 8[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 01 : 8[65010] -> 15[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 01 : 9[65020] -> 12[69010] via P2P/indirect/14[6b010]
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 12[69010] -> 4[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 00 : 11[67020] -> 13[69020] via P2P/indirect/12[69010]
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 00 : 15[6b020] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 01 : 15[6b020] -> 13[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 00 : 8[65010] -> 12[69010] via P2P/indirect/11[67020]
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 00 : 13[69020] -> 12[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 00 : 10[67010] -> 12[69010] via P2P/indirect/13[69020]
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 01 : 13[69020] -> 12[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 01 : 11[67020] -> 14[6b010] via P2P/indirect/9[65020]
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 00 : 10[67010] -> 14[6b010] via P2P/indirect/9[65020]
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO Channel 00 : 11[67020] -> 15[6b020] via P2P/indirect/8[65010]
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 00 : 9[65020] -> 13[69020] via P2P/indirect/14[6b010]
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO Channel 01 : 10[67010] -> 15[6b020] via P2P/indirect/8[65010]
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 12[69010] -> 8[65010] via P2P/indirect/15[6b020]
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO Channel 00 : 9[65020] -> 15[6b020] via P2P/indirect/8[65010]
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 01 : 8[65010] -> 13[69020] via P2P/indirect/15[6b020]
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO Channel 00 : 8[65010] -> 14[6b010] via P2P/indirect/9[65020]
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 01 : 13[69020] -> 8[65010] via P2P/indirect/15[6b020]
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 00 : 15[6b020] -> 9[65020] via P2P/indirect/14[6b010]
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 00 : 14[6b010] -> 8[65010] via P2P/indirect/15[6b020]
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 01 : 15[6b020] -> 10[67010] via P2P/indirect/8[65010]
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 00 : 14[6b010] -> 10[67010] via P2P/indirect/13[69020]
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO Channel 00 : 15[6b020] -> 11[67020] via P2P/indirect/8[65010]
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO Channel 01 : 14[6b010] -> 11[67020] via P2P/indirect/9[65020]
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 00 : 13[69020] -> 9[65020] via P2P/indirect/14[6b010]
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO Channel 00 : 13[69020] -> 11[67020] via P2P/indirect/10[67010]
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 01 : 12[69010] -> 9[65020] via P2P/indirect/14[6b010]
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO Channel 00 : 12[69010] -> 10[67010] via P2P/indirect/11[67020]
iv-2udaavw4l02thdv8lcrl:34202:34391 [1] NCCL INFO comm 0x7f0938008fb0 rank 9 nranks 16 cudaDev 1 busId 65020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:34393 [0] NCCL INFO comm 0x7f9b20008fb0 rank 8 nranks 16 cudaDev 0 busId 65010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34204:34396 [3] NCCL INFO comm 0x7ff308008fb0 rank 11 nranks 16 cudaDev 3 busId 67020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34205:34392 [4] NCCL INFO comm 0x7f1180008fb0 rank 12 nranks 16 cudaDev 4 busId 69010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34208:34397 [7] NCCL INFO comm 0x7f13d0008fb0 rank 15 nranks 16 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34207:34394 [6] NCCL INFO comm 0x7feb40008fb0 rank 14 nranks 16 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34395 [2] NCCL INFO comm 0x7fcf5c008fb0 rank 10 nranks 16 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:34390 [5] NCCL INFO comm 0x7f9d4c008fb0 rank 13 nranks 16 cudaDev 5 busId 69020 - Init COMPLETE
> number of parameters on (tensor, pipeline) model parallel rank (1, 2): 37807104
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 37807104
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Channel 00 : 0[65010] -> 1[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Channel 01 : 0[65010] -> 1[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Channel 00 : 0[65020] -> 1[69020] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Channel 00 : 1[69010] -> 0[65010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Channel 01 : 0[65020] -> 1[69020] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Channel 00 : 1[69020] -> 0[65020] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Channel 00 : 0[67010] -> 1[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Channel 01 : 1[69010] -> 0[65010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Channel 00 : 0[67020] -> 1[6b020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Channel 01 : 1[69020] -> 0[65020] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Channel 01 : 0[67010] -> 1[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Channel 01 : 0[67020] -> 1[6b020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Channel 00 : 1[6b010] -> 0[67010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[67020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Channel 01 : 1[6b010] -> 0[67010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[67020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34208:34436 [7] NCCL INFO comm 0x7f13ac008fb0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34207:34435 [6] NCCL INFO comm 0x7feb1c008fb0 rank 1 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:34437 [5] NCCL INFO comm 0x7f9d34008fb0 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE
> number of parameters on (tensor, pipeline) model parallel rank (1, 3): 63630336
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34205:34434 [4] NCCL INFO comm 0x7f115c008fb0 rank 1 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 63630336
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 00/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 01/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Trees [0] 0/-1/-1->1->-1 [1] 0/-1/-1->1->-1 [2] 0/-1/-1->1->-1 [3] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 02/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 03/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Trees [0] -1/-1/-1->0->1 [1] -1/-1/-1->0->1 [2] -1/-1/-1->0->1 [3] -1/-1/-1->0->1
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 00/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 01/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 02/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 03/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Channel 00 : 1[6b010] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 00 : 0[69010] -> 1[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Channel 01 : 1[6b010] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 01 : 0[69010] -> 1[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Channel 02 : 1[6b010] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 02 : 0[69010] -> 1[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Channel 03 : 1[6b010] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Channel 03 : 0[69010] -> 1[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 00 : 0[65010] -> 1[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Channel 00 : 1[67010] -> 0[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 01 : 0[65010] -> 1[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Channel 01 : 1[67010] -> 0[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 02 : 0[65010] -> 1[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Channel 03 : 0[65010] -> 1[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Channel 02 : 1[67010] -> 0[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34461 [4] NCCL INFO comm 0x7f1124008fb0 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34207:34463 [6] NCCL INFO comm 0x7feb1c0d3010 rank 1 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Channel 03 : 1[67010] -> 0[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:34462 [0] NCCL INFO comm 0x7f9ad8008fb0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34466 [2] NCCL INFO comm 0x7fcf38008fb0 rank 1 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Trees [0] -1/-1/-1->2->3 [1] -1/-1/-1->2->3
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Trees [0] 2/-1/-1->3->1 [1] 2/1/-1->3->-1
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Trees [0] -1/-1/-1->2->3 [1] -1/-1/-1->2->3
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Trees [0] 2/-1/-1->3->1 [1] 2/1/-1->3->-1
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Channel 00 : 1[69010] -> 2[65010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Channel 00 : 1[6b010] -> 2[67010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Channel 01 : 1[69010] -> 2[65010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Channel 00 : 2[65010] -> 3[69010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Channel 01 : 2[65010] -> 3[69010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Channel 01 : 1[6b010] -> 2[67010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Channel 00 : 2[67010] -> 3[6b010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Channel 01 : 2[67010] -> 3[6b010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 00 : 3[69010] -> 0[65010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 01 : 3[69010] -> 0[65010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 00 : 3[6b010] -> 0[67010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 01 : 3[6b010] -> 0[67010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 00 : 1[69010] -> 3[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 01 : 1[69010] -> 3[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 00 : 3[69010] -> 1[69010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 00 : 1[6b010] -> 3[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 01 : 3[69010] -> 1[69010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 01 : 1[6b010] -> 3[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 00 : 3[6b010] -> 1[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 01 : 3[6b010] -> 1[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 00 : 3[6b010] -> 2[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Channel 01 : 3[6b010] -> 2[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 |
8/8/512 iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34203:34476 [2] NCCL INFO comm 0x7fcf380df6f0 rank 2 nranks 4 cudaDev 2 busId 67010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:34207:34475 [6] NCCL INFO comm 0x7feadc008fb0 rank 3 nranks 4 cudaDev 6 busId 6b010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 00 : 3[69010] -> 2[65010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Channel 01 : 3[69010] -> 2[65010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34201:34474 [0] NCCL INFO comm 0x7f9ad80de4b0 rank 2 nranks 4 cudaDev 0 busId 65010 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:34205:34473 [4] NCCL INFO comm 0x7f111c008fb0 rank 3 nranks 4 cudaDev 4 busId 69010 - Init COMPLETE NCCL version 2.10.3+cuda11.4 NCCL version 2.10.3+cuda11.4 iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Channel 00/02 : 0 1 iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Channel 01/02 : 0 1 iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Setting 
affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Channel 00/02 : 0 1 iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Channel 01/02 : 0 1 iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 00/04 : 0 1 iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 01/04 : 0 1 iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 02/04 : 0 1 iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 03/04 : 0 1 iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Channel 00 : 0[65010] -> 1[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 00/04 : 0 1 iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 01/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 02/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 03/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Channel 01 : 0[65010] -> 1[65020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Channel 01 : 1[65020] -> 0[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Channel 00 : 0[6b010] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Channel 01 : 0[6b010] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[6b010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:34486 [0] NCCL INFO comm 0x7f9ad0008fb0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34202:34489 [1] NCCL INFO comm 0x7f0904008fb0 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 00 : 0[67010] -> 1[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34207:34488 [6] NCCL INFO comm 0x7fead4008fb0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34207:34207 [6] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Channel 00 : 1[67020] -> 0[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:34490 [7] NCCL INFO comm 0x7f137c008fb0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 01 : 0[67010] -> 1[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Channel 01 : 1[67020] -> 0[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 02 : 0[67010] -> 1[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Channel 02 : 1[67020] -> 0[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Channel 03 : 0[67010] -> 1[67020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Channel 03 : 1[67020] -> 0[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:34492 [2] NCCL INFO comm 0x7fcf24008fb0 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34204:34493 [3] NCCL INFO comm 0x7ff2cc008fb0 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Connected all rings
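Each `comm ... - Init COMPLETE` entry above records one finished NCCL communicator: its handle, rank, communicator size, CUDA device, and PCI bus ID. When auditing a long log it can help to tabulate these; below is a minimal stdlib-only sketch (the helper name and field names are ours, chosen to mirror the log format, not any NCCL or Megatron-LM API):

```python
import re

# Hypothetical log-scraping helper: pull the fields out of an
# "Init COMPLETE" line like the ones in the log above.
COMM_RE = re.compile(
    r"comm 0x(?P<comm>[0-9a-f]+) rank (?P<rank>\d+) nranks (?P<nranks>\d+)"
    r" cudaDev (?P<dev>\d+) busId (?P<bus>[0-9a-f]+) - Init COMPLETE"
)

def parse_init_complete(line):
    """Return comm/rank/nranks/dev/bus as ints, or None for other lines."""
    m = COMM_RE.search(line)
    if not m:
        return None
    return {k: int(v, 16) if k in ("comm", "bus") else int(v)
            for k, v in m.groupdict().items()}
```

Running this over every line and grouping by `nranks` makes it easy to see, for example, that both 2-rank (tensor-parallel) and 4-rank communicators were created during this startup.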
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34205:34501 [4] NCCL INFO comm 0x7f1114008fb0 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:34502 [5] NCCL INFO comm 0x7f9d04008fb0 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34205:34205 [4] NCCL INFO Launch mode Parallel
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
time (ms) | model-and-optimizer-setup: 408.91 | train/valid/test-data-iterators-setup: 1274.62
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Channel 00 : 0[69010] -> 1[65010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Channel 00 : 0[69020] -> 1[65020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Channel 00 : 0[6b010] -> 1[67010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Channel 00 : 0[6b020] -> 1[67020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Channel 01 : 0[69010] -> 1[65010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Channel 01 : 0[69020] -> 1[65020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Channel 01 : 0[6b010] -> 1[67010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Channel 01 : 0[6b020] -> 1[67020] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Channel 00 : 1[65010] -> 0[69010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Channel 00 : 1[65020] -> 0[69020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Channel 00 : 1[67010] -> 0[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Channel 00 : 1[67020] -> 0[6b020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Channel 01 : 1[65010] -> 0[69010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Channel 01 : 1[65020] -> 0[69020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Channel 01 : 1[67010] -> 0[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Channel 01 : 1[67020] -> 0[6b020] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34204:35621 [3] NCCL INFO comm 0x7ff2c4008fb0 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:35622 [2] NCCL INFO comm 0x7fced8008fb0 rank 1 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
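The `time (ms) | ...` entry above is Megatron-LM's timer report for the setup phase (model/optimizer construction took ~0.4 s, data-iterator setup ~1.3 s). When comparing runs it is convenient to pull these numbers out programmatically; a small sketch of a parser for this pipe-delimited format (the function is ours, written for this log layout, not a Megatron-LM API):

```python
def parse_timing(line):
    # Hypothetical parser for Megatron-style timer lines of the form
    # "time (ms) | name: value | name: value"; not a Megatron-LM API.
    body = line.split("time (ms) |", 1)[1]
    return {name.strip(): float(value)
            for name, value in (item.split(":") for item in body.split("|"))}
```

Applied to the line above, this yields a dict mapping each timer name to its duration in milliseconds, which can then be diffed across runs or nodes.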
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:35618 [0] NCCL INFO comm 0x7f9a90008fb0 rank 1 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:35617 [1] NCCL INFO comm 0x7f08f8008fb0 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
NCCL version 2.10.3+cuda11.4
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
NCCL version 2.10.3+cuda11.4
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Channel 00/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Channel 01/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Channel 00/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Channel 01/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Channel 00/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Channel 01/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Channel 00/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Channel 01/02 : 0 1
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Channel 00 : 1[6b010] -> 0[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Channel 00 : 0[67010] -> 1[6b010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Channel 01 : 1[6b010] -> 0[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Channel 01 : 0[67010] -> 1[6b010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[67020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Channel 00 : 0[67020] -> 1[6b020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[67020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Channel 01 : 0[67020] -> 1[6b020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Channel 00 : 0[65010] -> 1[69010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Channel 01 : 0[65010] -> 1[69010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:35645 [2] NCCL INFO comm 0x7fce14008fb0 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34207:35647 [6] NCCL INFO comm 0x7fea90008fb0 rank 1 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:34203 [2] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Channel 00 : 1[69020] -> 0[65020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Channel 00 : 1[69010] -> 0[65010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Channel 00 : 0[65020] -> 1[69020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Channel 01 : 1[69020] -> 0[65020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Channel 01 : 0[65020] -> 1[69020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Channel 01 : 1[69010] -> 0[65010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34208:35646 [7] NCCL INFO comm 0x7f1378008fb0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34204:35644 [3] NCCL INFO comm 0x7ff1f8008fb0 rank 0 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34204:34204 [3] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:35656 [5] NCCL INFO comm 0x7f9d00008fb0 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34202:35653 [1] NCCL INFO comm 0x7f082c008fb0 rank 0 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:35655 [0] NCCL INFO comm 0x7f99c4008fb0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34205:35657 [4] NCCL INFO comm 0x7f10d4008fb0 rank 1 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:34201 [0] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO Launch mode Parallel
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
  output = bias_dropout_add_func(
NCCL version 2.10.3+cuda11.4
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Trees [0] 0/-1/-1->1->-1 [1] 0/-1/-1->1->-1 [2] 0/-1/-1->1->-1 [3] 0/-1/-1->1->-1
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 00/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 01/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 02/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 03/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Trees [0] -1/-1/-1->0->1 [1] -1/-1/-1->0->1 [2] -1/-1/-1->0->1 [3] -1/-1/-1->0->1
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 00 : 0[69020] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 01 : 0[69020] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Channel 02 : 1[6b020] -> 0[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 02 : 0[69020] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Channel 03 : 1[6b020] -> 0[69020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Channel 03 : 0[69020] -> 1[6b020] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34208:35672 [7] NCCL INFO comm 0x7f13780db490 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:35671 [5] NCCL INFO comm 0x7f9b08008fb0 rank 0 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:34206 [5] NCCL INFO Launch mode Parallel
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 00/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 01/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 02/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 03/04 : 0 1
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff
iv-2udaavw4l02thdv8lcrl:34202:35768
[1] NCCL INFO Channel 00 : 0[65020] -> 1[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Channel 00 : 1[67020] -> 0[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 01 : 0[65020] -> 1[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Channel 01 : 1[67020] -> 0[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 02 : 0[65020] -> 1[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Channel 02 : 1[67020] -> 0[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Channel 03 : 0[65020] -> 1[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Channel 03 : 1[67020] -> 0[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO Connected all trees iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-2udaavw4l02thdv8lcrl:34202:35768 [1] NCCL INFO comm 0x7f0758008fb0 rank 0 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:34204:35769 [3] NCCL INFO comm 0x7ff120008fb0 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE iv-2udaavw4l02thdv8lcrl:34202:34202 [1] NCCL INFO Launch mode Parallel iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Trees [0] 5/-1/-1->4->6 [1] 5/-1/-1->4->6 iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Trees [0] 7/-1/-1->5->4 [1] 7/-1/-1->5->4 iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Setting affinity for GPU 2 to 
03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Trees [0] 4/-1/-1->6->2 [1] 4/2/-1->6->-1 iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Trees [0] -1/-1/-1->7->5 [1] -1/-1/-1->7->5 iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 00 : 5[67020] -> 6[6b010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6 iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Trees [0] 7/-1/-1->6->2 [1] 7/2/-1->6->-1 iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Trees [0] -1/-1/-1->5->4 [1] -1/-1/-1->5->4 iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Trees [0] 5/-1/-1->4->7 [1] 5/-1/-1->4->7 iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 01 : 5[67020] -> 6[6b010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 00 : 3[6b020] -> 4[67010] [receive] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Channel 00 : 7[6b020] -> 0[67010] [send] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Channel 00 : 5[65020] -> 6[69010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 00 : 7[69020] -> 0[65010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Channel 01 : 
5[65020] -> 6[69010] via direct shared memory iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 01 : 7[69020] -> 0[65010] [send] via NET/IBext/0/GDRDMA iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 01 : 3[6b020] -> 4[67010] [receive] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 00 : 3[69020] -> 4[65010] [receive] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 00 : 4[67010] -> 5[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Channel 01 : 7[6b020] -> 0[67010] [send] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 01 : 4[67010] -> 5[67020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 00 : 6[6b010] -> 7[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 01 : 6[6b010] -> 7[6b020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 01 : 3[69020] -> 4[65010] [receive] via NET/IBext/0 iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 00 : 4[65010] -> 5[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 01 : 4[65010] -> 5[65020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 00 : 5[67020] -> 7[6b020] via direct shared memory iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 01 : 5[67020] -> 7[6b020] via direct shared memory iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 00 : 6[69010] -> 7[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 01 : 6[69010] -> 7[69020] via P2P/IPC iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Connected all rings iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 00 : 4[67010] -> 6[6b010] via direct shared memory 
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Channel 01 : 4[67010] -> 6[6b010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 00 : 4[65010] -> 7[69020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 00 : 2[69010] -> 6[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Channel 00 : 5[65020] -> 4[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 01 : 2[69010] -> 6[69010] [receive] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Channel 01 : 4[65010] -> 7[69020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Channel 01 : 5[65020] -> 4[65010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Connected all rings
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Channel 00 : 7[6b020] -> 5[67020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Channel 01 : 7[6b020] -> 5[67020] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 00 : 6[69010] -> 2[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Channel 01 : 6[69010] -> 2[69010] [send] via NET/IBext/0/GDRDMA
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 00 : 2[6b010] -> 6[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 01 : 2[6b010] -> 6[6b010] [receive] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 00 : 6[6b010] -> 2[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 01 : 6[6b010] -> 2[6b010] [send] via NET/IBext/0
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 00 : 5[67020] -> 4[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 00 : 7[69020] -> 4[65010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Channel 01 : 5[67020] -> 4[67010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 01 : 7[69020] -> 4[65010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 00 : 6[6b010] -> 4[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Channel 01 : 6[6b010] -> 4[67010] via direct shared memory
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34207:35778 [6] NCCL INFO comm 0x7fe888008fb0 rank 6 nranks 8 cudaDev 6 busId 6b010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34208:35781 [7] NCCL INFO comm 0x7f1094008fb0 rank 7 nranks 8 cudaDev 7 busId 6b020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34203:35779 [2] NCCL INFO comm 0x7fcd40008fb0 rank 4 nranks 8 cudaDev 2 busId 67010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34204:35777 [3] NCCL INFO comm 0x7ff114008fb0 rank 5 nranks 8 cudaDev 3 busId 67020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 00 : 7[69020] -> 6[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Channel 01 : 7[69020] -> 6[69010] via P2P/IPC
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO Connected all trees
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
iv-2udaavw4l02thdv8lcrl:34202:35774 [1] NCCL INFO comm 0x7f0750008fb0 rank 5 nranks 8 cudaDev 1 busId 65020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34206:35780 [5] NCCL INFO comm 0x7f9a18008fb0 rank 7 nranks 8 cudaDev 5 busId 69020 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34205:35776 [4] NCCL INFO comm 0x7f0ec8008fb0 rank 6 nranks 8 cudaDev 4 busId 69010 - Init COMPLETE
iv-2udaavw4l02thdv8lcrl:34201:35775 [0] NCCL INFO comm 0x7f98e4008fb0 rank 4 nranks 8 cudaDev 0 busId 65010 - Init COMPLETE
[Rank 8] (after 100 iterations) memory (MB) | allocated: 851.1171875 | max allocated: 3492.9501953125 | reserved: 6646.0 | max reserved: 6646.0
[Rank 9] (after 100 iterations) memory (MB) | allocated: 851.1171875 | max allocated: 3492.9501953125 | reserved: 6390.0 | max reserved: 6390.0
 iteration      100/     220 | consumed samples: 102400 | elapsed time per iteration (ms): 9715.8 | learning rate: 3.984E-06 | tpt: 105.4 samples/s | global batch size: 1024 | lm loss: 1.000876E+01 | loss scale: 262144.0 | grad norm: 1.343 | number of skipped iterations: 15 | number of nan iterations: 0 |
[Rank 12] (after 100 iterations) memory (MB) | allocated: 1345.0419921875 | max allocated: 6154.5771484375 | reserved: 11332.0 | max reserved: 11332.0
[Rank 13] (after 100 iterations) memory (MB) | allocated: 1345.0419921875 | max allocated: 6154.5771484375 | reserved: 11334.0 | max reserved: 11334.0
time (ms) | forward-compute: 2886.52 | forward-recv: 397.82 | backward-compute: 5442.11 | backward-send: 5.43 | backward-send-forward-recv: 157.63 | backward-params-all-reduce: 3.61 | backward-embedding-all-reduce: 803.37 | optimizer-copy-to-main-grad: 1.19 | optimizer-unscale-and-check-inf: 8.02 | optimizer-clip-main-grad: 1.55 | optimizer-copy-main-to-model-params: 0.88 | optimizer: 14.20 | batch-generator: 13.38
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 21:50:17.552, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.553, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.553, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.554, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.557, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.557, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.558, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.559, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.561, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.561, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.561, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.562, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.563, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.564, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.565, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.565, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.565, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.566, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.567, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.568, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.570, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.570, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.570, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.571, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.572, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.573, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.574, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.576, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.576, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.576, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.577, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.578, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.579, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.579, Tesla V100-SXM2-32GB, 470.57.02, 84 %, 2 %, 32510 MiB, 24528 MiB, 7982 MiB
2022/07/05 21:50:17.580, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.581, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.582, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.582, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.583, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.584, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.585, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.586, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 2 %, 32510 MiB, 24778 MiB, 7732 MiB
2022/07/05 21:50:17.586, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.588, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.588, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 0 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.590, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.591, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.592, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.593, Tesla V100-SXM2-32GB, 470.57.02, 65 %, 2 %, 32510 MiB, 25192 MiB, 7318 MiB
2022/07/05 21:50:17.593, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.595, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.596, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.598, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.599, Tesla V100-SXM2-32GB, 470.57.02, 72 %, 2 %, 32510 MiB, 24768 MiB, 7742 MiB
2022/07/05 21:50:17.599, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 3 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.602, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 3 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.604, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 3 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.604, Tesla V100-SXM2-32GB, 470.57.02, 77 %, 4 %, 32510 MiB, 19728 MiB, 12782 MiB
2022/07/05 21:50:17.604, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.607, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.609, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
2022/07/05 21:50:17.609, Tesla V100-SXM2-32GB, 470.57.02, 16 %, 4 %, 32510 MiB, 19754 MiB, 12756 MiB
2022/07/05 21:50:17.616, Tesla V100-SXM2-32GB, 470.57.02, 4 %, 3 %, 32510 MiB, 19864 MiB, 12646 MiB
2022/07/05 21:50:17.621, Tesla V100-SXM2-32GB, 470.57.02, 41 %, 3 %, 32510 MiB, 19840 MiB, 12670 MiB
 iteration      200/     220 | consumed samples: 204800 | elapsed time per iteration (ms): 9630.5 | learning rate: 8.672E-06 | tpt: 106.3 samples/s | global batch size: 1024 | lm loss: 8.633007E+00 | loss scale: 262144.0 | grad norm: 2.279 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms) | forward-compute: 2872.72 | forward-recv: 347.88 | backward-compute: 5437.37 | backward-send: 5.45 | backward-send-forward-recv: 145.76 | backward-params-all-reduce: 3.63 | backward-embedding-all-reduce: 802.67 | optimizer-copy-to-main-grad: 1.18 | optimizer-unscale-and-check-inf: 1.15 | optimizer-clip-main-grad: 1.80 | optimizer-copy-main-to-model-params: 1.03 | optimizer: 8.10 | batch-generator: 13.23
------------------------------------------------------------------------------------------------------------------
 validation loss at the end of training for val data | lm loss value: 7.602548E+00 | lm loss PPL: 2.003294E+03 |
------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
 validation loss at the end of training for test data | lm loss value:
7.429202E+00 | lm loss PPL: 1.684462E+03 |
-------------------------------------------------------------------------------------------------------------------
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0006103515625 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 8, "group_rank": 1, "worker_id": "34201", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [8], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 9, "group_rank": 1, "worker_id": "34202", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [9], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 10, "group_rank": 1, "worker_id": "34203", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [2], \"role_rank\": [10], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 11, "group_rank": 1, "worker_id": "34204", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [3], \"role_rank\": [11], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 12, "group_rank": 1, "worker_id": "34205", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [4], \"role_rank\": [12], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 13, "group_rank": 1, "worker_id": "34206", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [5], \"role_rank\": [13], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 14, "group_rank": 1, "worker_id": "34207", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [6], \"role_rank\": [14], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 15, "group_rank": 1, "worker_id": "34208", "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\", \"local_rank\": [7], \"role_rank\": [15], \"role_world_size\": [16]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 1, "worker_id": null, "role": "default", "hostname": "iv-2udaavw4l02thdv8lcrl", "state": "SUCCEEDED", "total_run_time": 2197, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 2, \"entry_point\": \"python\"}", "agent_restarts": 0}}
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************