loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
W20220705 22:31:58.093453 62978 rpc_client.cpp:190] LoadServer 198.18.8.34 Failed at 0 times error_code 14 error_message failed to connect to all addresses
W20220705 22:31:58.093397 62979 rpc_client.cpp:190] LoadServer 198.18.8.34 Failed at 0 times error_code 14 error_message failed to connect to all addresses
[07/05 22:32:09 libai]: Rank of current process: 0. World size: 16
[07/05 22:32:09 libai]: Command line arguments: Namespace(config_file='configs/bert_nl24_nah16_hs1024.py', eval_only=False, fast_dev_run=False, opts=['model.cfg.hidden_layers=24', 'train.dist.pipeline_num_layers=24', 'train.train_micro_batch_size=128', 'train.global_batch_size=2048', 'train.dist.tensor_parallel_size=2', 'train.dist.pipeline_parallel_size=4', 'train.amp.enabled=true', 'train.activation_checkpoint.enabled=true', 'train.train_iter=220', 'train.log_period=100', 'train.output_dir=test_logs/01b1d32/2n8g/LibAI_bert_nl24_nah16_hs1024_FP16_actrue_mp2_pp4_mb128_gb2048_2n8g_20220705_223156628574994'], resume=False)
[07/05 22:32:09 libai]: Contents of args.config_file=configs/bert_nl24_nah16_hs1024.py:
from libai.config import LazyCall
from libai.evaluation import PPLEvaluator
from .common.models.bert import pretrain_model as model
from .common.models.graph import graph
from .common.train import train
from .common.optim import optim
from .common.data.bert_dataset import dataloader, tokenization
#vocab_file = "/workspace/dataset/bert-base-chinese-vocab.txt"
#data_prefix = "/workspace/dataset/loss_compara_content_sentence"
vocab_file = "/dataset/source/dataset/bert-base-chinese-vocab.txt"
data_prefix = "/dataset/source/dataset/loss_compara_content_sentence"
tokenization.tokenizer.vocab_file = vocab_file
dataloader.train.dataset[0].data_prefix = data_prefix
dataloader.train.dataset[0].indexed_dataset.data_prefix = data_prefix
# dataloader.train.num_workers = 4
# Bert-large model config
#model.cfg.hidden_layers = 24
model.cfg.num_attention_heads = 16
model.cfg.hidden_size = 1024
#train.dist.pipeline_num_layers = model.cfg.hidden_layers
train.test_micro_batch_size = 4
train.evaluation.evaluator = LazyCall(PPLEvaluator)()
train.input_placement_device = "cpu"
train.evaluation.enabled = False
train.evaluation.eval_iter = 30
[07/05 22:32:09 libai]: Full config saved to test_logs/01b1d32/2n8g/LibAI_bert_nl24_nah16_hs1024_FP16_actrue_mp2_pp4_mb128_gb2048_2n8g_20220705_223156628574994/config.yaml
[07/05 22:32:09 lb.engine.default]: > compiling dataset index builder ...
make: Entering directory '/dataset/xyn/libai_bench/libai/libai/data/data_utils'
make: Nothing to be done for 'default'.
make: Leaving directory '/dataset/xyn/libai_bench/libai/libai/data/data_utils'
[07/05 22:32:09 lb.engine.default]: >>> done with dataset index builder. Compilation time: 0.043 seconds
[07/05 22:32:09 lb.engine.default]: >>> done with compiling. Compilation time: 0.044 seconds
[07/05 22:32:09 lb.engine.default]: Prepare training, validating, testing set
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: building dataset index ...
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: warming up index mmap file...
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: reading sizes...
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: reading pointers...
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: reading document index...
[07/05 22:32:09 lb.data.data_utils.indexed_dataset]: warming up data mmap file...
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: creating numpy buffer of mmap...
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: creating memory view of numpy buffer...
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: Finished creating indexed dataset in 0.100731 seconds
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: indexed dataset stats:
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: number of documents: 50000
[07/05 22:32:10 lb.data.data_utils.indexed_dataset]: number of sentences: 1249934
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  > loading indexed mapping from /dataset/source/dataset/loss_compara_content_sentence_bert_indexmap_450560mns_509msl_0.10ssp_1234s.npy
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  loaded indexed file in 0.004 seconds
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  total number of samples: 452417
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  > loading indexed mapping from /dataset/source/dataset/loss_compara_content_sentence_bert_indexmap_8mns_509msl_0.10ssp_1234s.npy
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  loaded indexed file in 0.001 seconds
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  total number of samples: 5884
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  > loading indexed mapping from /dataset/source/dataset/loss_compara_content_sentence_bert_indexmap_8mns_509msl_0.10ssp_1234s.npy
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  loaded indexed file in 0.001 seconds
[07/05 22:32:10 lb.data.data_utils.dataset_utils]:  total number of samples: 5884
[07/05 22:32:12 lb.engine.default]: Auto-scaling the config to train.train_iter=220, train.warmup_iter=0
iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol.
iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol.
iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Using network IBext NCCL version 2.12.10+cuda11.2 iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. 
iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. 
iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 00/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 01/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 02/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 2/-1/-1->3->1 [2] 1/-1/-1->3->2 [3] 2/-1/-1->3->1 [4] 1/-1/-1->3->2 [5] 2/-1/-1->3->1 [6] 1/-1/-1->3->2 [7] 2/-1/-1->3->1 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 03/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 04/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 0/-1/-1->2->3 [2] 3/-1/-1->2->0 [3] 0/-1/-1->2->3 [4] 3/-1/-1->2->0 [5] 0/-1/-1->2->3 [6] 3/-1/-1->2->0 [7] 0/-1/-1->2->3 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 05/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 06/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 07/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Trees [0] -1/-1/-1->1->3 [1] 3/-1/-1->1->-1 [2] -1/-1/-1->1->3 [3] 3/-1/-1->1->-1 [4] -1/-1/-1->1->3 [5] 3/-1/-1->1->-1 [6] -1/-1/-1->1->3 [7] 3/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] -1/-1/-1->0->2 [2] 
2/-1/-1->0->-1 [3] -1/-1/-1->0->2 [4] 2/-1/-1->0->-1 [5] -1/-1/-1->0->2 [6] 2/-1/-1->0->-1 [7] -1/-1/-1->0->2 iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 01 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 02 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 03 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 06 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 04 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 07 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 04 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 05 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 04 : 
3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 05 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 03 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 01 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 07 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 05 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 02 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO 
Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 03 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 04 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 04 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 05 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Channel 05 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO 
Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 04 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 05 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62973:64165 [1] NCCL INFO comm 0x7f2020535e90 rank 1 nranks 4 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO comm 0x7f0f2443bab0 rank 0 nranks 4 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:64171 [3] NCCL INFO comm 0x7f5ffc082e20 rank 3 nranks 4 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:64173 [2] NCCL INFO comm 0x7fc950bedfe0 rank 2 nranks 4 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64172 [0] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 
symbol. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Using network IBext NCCL version 2.12.10+cuda11.2 iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v5 symbol. 
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v5 symbol. iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO P2P plugin IBext iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0> iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Using network IBext iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23. 
iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7. iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO PXN Disabled as plugin is v4 iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Trees [0] -1/-1/-1->3->1 [1] 1/-1/-1->3->-1 [2] -1/-1/-1->3->1 [3] 1/-1/-1->3->-1 [4] -1/-1/-1->3->1 [5] 1/-1/-1->3->-1 [6] -1/-1/-1->3->1 [7] 1/-1/-1->3->-1 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 00/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Trees [0] 0/-1/-1->2->-1 [1] -1/-1/-1->2->0 [2] 0/-1/-1->2->-1 [3] -1/-1/-1->2->0 [4] 0/-1/-1->2->-1 [5] -1/-1/-1->2->0 [6] 0/-1/-1->2->-1 [7] -1/-1/-1->2->0 iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Trees [0] 3/-1/-1->1->0 [1] 0/-1/-1->1->3 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] 3/-1/-1->1->0 [5] 0/-1/-1->1->3 [6] 3/-1/-1->1->0 [7] 0/-1/-1->1->3 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 01/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 02/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 03/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 04/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 05/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 06/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 
[4] NCCL INFO Channel 07/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Trees [0] 1/-1/-1->0->2 [1] 2/-1/-1->0->1 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 1/-1/-1->0->2 [5] 2/-1/-1->0->1 [6] 1/-1/-1->0->2 [7] 2/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 02 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 01 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 06 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 03 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 05 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 04 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 07 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 00 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 02 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 00 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 01 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 03 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 03 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 01 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 02 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 04 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 06 : 0[69010] -> 2[6b010] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 04 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 05 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 07 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 07 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 05 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 06 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 01 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 03 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 00 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 05 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 07 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 06 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 04 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 05 : 0[69010] -> 1[69020] via 
P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 02 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 06 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 00 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 03 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 07 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 03 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 06 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 04 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Channel 07 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Channel 07 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 01 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 00 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 02 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 01 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 05 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 04 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 06 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Channel 05 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 
iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 04 : 1[69020] -> 0[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 05 : 1[69020] -> 0[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Channel 07 : 1[69020] -> 0[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:62978:64169 [6] NCCL INFO comm 0x7fd714697ac0 rank 2 nranks 4 cudaDev 6 busId 6b010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:62979:64167 [7] NCCL INFO comm 0x7f79b6da1bb0 rank 3 nranks 4 cudaDev 7 busId 6b020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO comm 0x7f03c42a5d30 rank 0 nranks 4 cudaDev 4 busId 69010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:62977:64164 [5] NCCL INFO comm 0x7fb72caae050 rank 1 nranks 4 cudaDev 5 busId 69020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:62976:64168 [4] NCCL INFO Launch mode Parallel
[07/05 22:32:28 lb.engine.default]: Model:
BertForPreTraining(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (vocab_embeddings): VocabEmbedding(num_embeddings=21248, embedding_dim=1024)
      (position_embeddings): Embedding(num_embeddings=512, embedding_dim=1024)
      (tokentype_embeddings): Embedding(num_embeddings=2, embedding_dim=1024)
      (embedding_dropout): Dropout(p=0.1, inplace=False)
    )
    (extended_attn_mask): BertExtendedAttnMask()
    (encoders): ModuleList(
      (0): TransformerLayer(
        (drop_path): Identity()
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          hidden_size=1024, num_heads=16, is_cross_attention=False
          (dropout): Dropout(p=0.1, inplace=False)
          (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col)
          (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1
          (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col)
          (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row)
        )
      )
      (1): TransformerLayer(
        (drop_path): Identity()
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          hidden_size=1024, num_heads=16, is_cross_attention=False
          (dropout): Dropout(p=0.1, inplace=False)
          (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col)
          (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1
          (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col)
          (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row)
        )
      )
      (2): TransformerLayer(
        (drop_path): Identity()
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          hidden_size=1024, num_heads=16, is_cross_attention=False
          (dropout): Dropout(p=0.1, inplace=False)
          (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col)
          (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1
          (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col)
          (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row)
        )
      )
      (3): TransformerLayer(
        (drop_path): Identity()
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          hidden_size=1024, num_heads=16, is_cross_attention=False
          (dropout): Dropout(p=0.1, inplace=False)
          (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col)
          (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1
          (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col)
          (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row)
        )
      )
      (4): TransformerLayer(
        (drop_path): Identity()
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): MultiheadAttention(
          hidden_size=1024, num_heads=16, is_cross_attention=False
          (dropout): Dropout(p=0.1, inplace=False)
          (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col)
          (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, 
elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (5): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (6): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (7): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, 
is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (8): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (9): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, 
parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (10): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (11): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (12): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): 
Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (13): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (14): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (15): TransformerLayer( (drop_path): Identity() 
(input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (16): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (17): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) 
(mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (18): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (19): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (20): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, 
is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (21): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (22): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, 
parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) (23): TransformerLayer( (drop_path): Identity() (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (self_attention): MultiheadAttention( hidden_size=1024, num_heads=16, is_cross_attention=False (dropout): Dropout(p=0.1, inplace=False) (query_key_value): Linear1D(in_features=1024, out_features=3072, bias=True, parallel=col) (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=row) ) (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.1 (dense_h_to_4h): Linear1D(in_features=1024, out_features=4096, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=4096, out_features=1024, bias=True, parallel=row) ) ) ) (final_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (pooler): BertPooler( (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=col) (activation_func): Tanh() ) ) (cls_head): BertPreTrainingHeads( (predictions): BertLMPredictionHead( (dense): Linear1D(in_features=1024, out_features=1024, bias=True, parallel=data) (activation_func): GELU() (layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) (seq_relationship): Linear1D(in_features=1024, out_features=2, bias=True, parallel=data) (lm_logits): LMLogits() (loss_func): BertLoss( (lm_loss): ParallelCrossEntropyLoss() ) ) ) WARNING [07/05 22:32:28 lb.scheduler.lr_scheduler]: warmup iters equals to zero, return CosineLR [07/05 22:32:29 lb.engine.trainer]: Starting training from iteration 0 [07/05 22:33:24 lb.models.utils.graph_base]: Start compling the train graph which may take some time. Please wait for a moment ... 
iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 0/-1/-1->2->3 [2] 3/-1/-1->2->0 [3] 0/-1/-1->2->3 [4] 3/-1/-1->2->0 [5] 0/-1/-1->2->3 [6] 3/-1/-1->2->0 [7] 0/-1/-1->2->3 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 2/-1/-1->3->1 [2] 1/-1/-1->3->2 [3] 2/-1/-1->3->1 [4] 1/-1/-1->3->2 [5] 2/-1/-1->3->1 [6] 1/-1/-1->3->2 [7] 2/-1/-1->3->1 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Trees [0] -1/-1/-1->1->3 [1] 3/-1/-1->1->-1 [2] -1/-1/-1->1->3 [3] 3/-1/-1->1->-1 [4] -1/-1/-1->1->3 [5] 3/-1/-1->1->-1 [6] -1/-1/-1->1->3 [7] 3/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] -1/-1/-1->0->2 [2] 2/-1/-1->0->-1 [3] -1/-1/-1->0->2 [4] 2/-1/-1->0->-1 [5] -1/-1/-1->0->2 [6] 2/-1/-1->0->-1 [7] -1/-1/-1->0->2 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 05 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 
1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 
2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Trees [0] -1/-1/-1->3->1 [1] 1/-1/-1->3->-1 [2] -1/-1/-1->3->1 [3] 1/-1/-1->3->-1 [4] -1/-1/-1->3->1 [5] 1/-1/-1->3->-1 [6] -1/-1/-1->3->1 [7] 1/-1/-1->3->-1 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Trees [0] 0/-1/-1->2->-1 [1] 
-1/-1/-1->2->0 [2] 0/-1/-1->2->-1 [3] -1/-1/-1->2->0 [4] 0/-1/-1->2->-1 [5] -1/-1/-1->2->0 [6] 0/-1/-1->2->-1 [7] -1/-1/-1->2->0 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Trees [0] 3/-1/-1->1->0 [1] 0/-1/-1->1->3 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] 3/-1/-1->1->0 [5] 0/-1/-1->1->3 [6] 3/-1/-1->1->0 [7] 0/-1/-1->1->3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Trees [0] 1/-1/-1->0->2 [1] 2/-1/-1->0->1 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 1/-1/-1->0->2 [5] 2/-1/-1->0->1 [6] 1/-1/-1->0->2 [7] 2/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 02 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL 
INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 00 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 01 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 00 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 01 : 
2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 02 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 04 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 05 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 05 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 06 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all rings 
iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 02 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af488d3d0 rank 1 nranks 4 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f222c230 rank 0 nranks 4 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a885d7fb0 rank 3 nranks 4 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef13cfa560 rank 2 nranks 4 cudaDev 2 busId 67010 - Init COMPLETE 
iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] 
NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f6bd300 rank 3 nranks 4 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f98fe530 rank 2 nranks 4 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf64896c0 rank 0 nranks 4 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x5588b665b910 rank 1 nranks 4 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Trees [0] -1/-1/-1->1->3 [1] 3/-1/-1->1->-1 [2] -1/-1/-1->1->3 [3] 3/-1/-1->1->-1 [4] -1/-1/-1->1->3 [5] 3/-1/-1->1->-1 [6] -1/-1/-1->1->3 [7] 3/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 2/-1/-1->3->1 [2] 1/-1/-1->3->2 [3] 2/-1/-1->3->1 [4] 1/-1/-1->3->2 [5] 2/-1/-1->3->1 [6] 1/-1/-1->3->2 [7] 2/-1/-1->3->1 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 0/-1/-1->2->3 [2] 3/-1/-1->2->0 [3] 0/-1/-1->2->3 [4] 3/-1/-1->2->0 [5] 0/-1/-1->2->3 [6] 3/-1/-1->2->0 [7] 0/-1/-1->2->3 
iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] -1/-1/-1->0->2 [2] 2/-1/-1->0->-1 [3] -1/-1/-1->0->2 [4] 2/-1/-1->0->-1 [5] -1/-1/-1->0->2 [6] 2/-1/-1->0->-1 [7] -1/-1/-1->0->2 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 05 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:62973 
[1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 
[2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Trees [0] -1/-1/-1->3->1 [1] 1/-1/-1->3->-1 [2] -1/-1/-1->3->1 [3] 1/-1/-1->3->-1 [4] -1/-1/-1->3->1 [5] 1/-1/-1->3->-1 [6] -1/-1/-1->3->1 [7] 1/-1/-1->3->-1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Trees [0] 0/-1/-1->2->-1 [1] -1/-1/-1->2->0 [2] 0/-1/-1->2->-1 [3] -1/-1/-1->2->0 [4] 0/-1/-1->2->-1 [5] -1/-1/-1->2->0 [6] 0/-1/-1->2->-1 [7] -1/-1/-1->2->0 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Trees [0] 3/-1/-1->1->0 [1] 0/-1/-1->1->3 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] 3/-1/-1->1->0 [5] 0/-1/-1->1->3 [6] 3/-1/-1->1->0 [7] 0/-1/-1->1->3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Trees [0] 1/-1/-1->0->2 [1] 2/-1/-1->0->1 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 1/-1/-1->0->2 [5] 
2/-1/-1->0->1 [6] 1/-1/-1->0->2 [7] 2/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 02 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all 
trees iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 00 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 00 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 01 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 01 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 02 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 04 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 05 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 05 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 06 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 
2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 02 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 
2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f236cce0 rank 0 nranks 4 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af49cb670 rank 1 nranks 4 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a8871d890 rank 3 nranks 4 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef13e3f960 rank 2 nranks 4 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL 
INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 2/-1/-1->3->1 [2] 1/-1/-1->3->2 [3] 2/-1/-1->3->1 [4] 
1/-1/-1->3->2 [5] 2/-1/-1->3->1 [6] 1/-1/-1->3->2 [7] 2/-1/-1->3->1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Trees [0] -1/-1/-1->1->3 [1] 3/-1/-1->1->-1 [2] -1/-1/-1->1->3 [3] 3/-1/-1->1->-1 [4] -1/-1/-1->1->3 [5] 3/-1/-1->1->-1 [6] -1/-1/-1->1->3 [7] 3/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 0/-1/-1->2->3 [2] 3/-1/-1->2->0 [3] 0/-1/-1->2->3 [4] 3/-1/-1->2->0 [5] 0/-1/-1->2->3 [6] 3/-1/-1->2->0 [7] 0/-1/-1->2->3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] -1/-1/-1->0->2 [2] 2/-1/-1->0->-1 [3] -1/-1/-1->0->2 [4] 2/-1/-1->0->-1 [5] -1/-1/-1->0->2 [6] 2/-1/-1->0->-1 [7] -1/-1/-1->0->2 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f7b4290 rank 3 nranks 4 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f99f5660 rank 2 nranks 4 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x5588b675a220 rank 1 nranks 4 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf6588d20 rank 0 nranks 4 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via 
P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 05 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 04 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 05 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] 
-> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 02 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 03 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 06 : 
0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 04 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 04 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Channel 05 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Trees [0] 3/-1/-1->1->0 [1] 0/-1/-1->1->3 [2] 3/-1/-1->1->0 [3] 0/-1/-1->1->3 [4] 3/-1/-1->1->0 [5] 0/-1/-1->1->3 [6] 3/-1/-1->1->0 [7] 0/-1/-1->1->3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Trees [0] 0/-1/-1->2->-1 [1] -1/-1/-1->2->0 [2] 0/-1/-1->2->-1 [3] -1/-1/-1->2->0 [4] 0/-1/-1->2->-1 [5] -1/-1/-1->2->0 [6] 0/-1/-1->2->-1 [7] -1/-1/-1->2->0 
iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Trees [0] -1/-1/-1->3->1 [1] 1/-1/-1->3->-1 [2] -1/-1/-1->3->1 [3] 1/-1/-1->3->-1 [4] -1/-1/-1->3->1 [5] 1/-1/-1->3->-1 [6] -1/-1/-1->3->1 [7] 1/-1/-1->3->-1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04/08 : 0 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05/08 : 0 3 1 2 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06/08 : 0 2 3 1 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07/08 : 0 2 1 3 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Trees [0] 1/-1/-1->0->2 [1] 2/-1/-1->0->1 [2] 1/-1/-1->0->2 [3] 2/-1/-1->0->1 [4] 1/-1/-1->0->2 [5] 2/-1/-1->0->1 [6] 1/-1/-1->0->2 [7] 2/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 04 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO 
Channel 02 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 05 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 00 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 01 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 00 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 02 : 
3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 01 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 05 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 04 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 06 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 05 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all rings 
iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 00 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 06 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 02 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 03 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 07 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 03 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 04 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f25f5370 rank 0 nranks 4 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af4c51740 rank 1 nranks 4 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 06 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef34eb64c0 rank 2 nranks 4 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a889ac720 rank 3 nranks 4 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Channel 07 : 3[6b020] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Channel 07 : 2[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 00 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 3[6b020] via 
P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 01 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 02 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 04 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Channel 05 : 0[69010] -> 2[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 06 : 1[69020] -> 3[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 04 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 05 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Channel 07 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer 
iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO 8 coll channels, 8 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f9c368b0 rank 2 nranks 4 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f9f53b0 rank 3 nranks 4 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf67d1830 rank 0 nranks 4 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x5588b69a4c20 rank 1 nranks 4 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Launch mode Parallel NCCL version 2.12.10+cuda11.2 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Channel 00 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 
[2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Channel 01 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Channel 01 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Channel 00 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 00 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Channel 01 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 01 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Channel 02 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 02 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Channel 03 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Channel 03 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL 
INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO comm 0x7f0c3cc5f740 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:65000 [0] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62973:65134 [1] NCCL INFO comm 0x7f2050aa2d20 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO comm 0x7fc560aa6460 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:65070 [2] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62975:65042 [3] NCCL INFO comm 0x7f5c04aac650 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE NCCL version 2.12.10+cuda11.2 iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO 
Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Channel 00 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Channel 01 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL 
INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO comm 0x7fd2dcafb1a0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:65112 [7] NCCL INFO comm 0x7f7584af6600 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:65200 [6] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62977:65109 [5] NCCL INFO comm 0x7fb300af8d60 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO comm 0x7eff98af7650 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65196 [4] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 
iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Channel 00 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Channel 01 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO 
threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62979:65093 [7] NCCL INFO comm 0x7f79c597dfa0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO comm 0x7fd2f997efc0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:65175 [6] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62977:65104 [5] NCCL INFO comm 0x7fb309974ea0 rank 1 
nranks 2 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO comm 0x7effa997a440 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65184 [4] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Channel 00 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Channel 01 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Channel 01 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO 
Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 00 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Channel 00 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 01 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Channel 01 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 02 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Channel 02 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Channel 03 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Channel 03 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p 
channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO comm 0x7f0c397631c0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64993 [0] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62973:65150 [1] NCCL INFO comm 0x7f1c316430c0 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO comm 0x7fc56963e920 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:65066 [2] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62975:65033 [3] NCCL INFO comm 0x7f5c0d63d580 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE NCCL version 2.12.10+cuda11.2 NCCL version 2.12.10+cuda11.2 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Trees [0] 0/-1/-1->1->-1 [1] 0/-1/-1->1->-1 [2] 0/-1/-1->1->-1 [3] 0/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Trees [0] -1/-1/-1->0->1 [1] -1/-1/-1->0->1 [2] -1/-1/-1->0->1 [3] -1/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Trees [0] 0/-1/-1->1->-1 [1] 0/-1/-1->1->-1 [2] 0/-1/-1->1->-1 [3] 0/-1/-1->1->-1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL 
INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Trees [0] -1/-1/-1->0->1 [1] -1/-1/-1->0->1 [2] -1/-1/-1->0->1 [3] -1/-1/-1->0->1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 00 : 0[69010] -> 1[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 01 : 0[69010] -> 1[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 00 : 1[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 02 : 0[69010] -> 1[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 01 : 1[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 03 : 0[69010] -> 1[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 02 : 1[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 00 : 0[69020] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 03 : 1[6b010] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 01 : 0[69020] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 02 : 1[6b020] -> 0[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 02 : 0[69020] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 03 : 1[6b020] -> 0[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 03 : 0[69020] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 
iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] 
NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 00 : 1[67020] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 00 : 0[65020] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 01 : 1[67020] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 01 : 0[65020] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO comm 0x7fd2e685d740 rank 1 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 02 : 1[67020] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 00 : 1[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO comm 0x7effa2834660 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 02 : 0[65020] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 03 : 1[67020] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 00 : 0[65010] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 01 : 1[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 03 : 0[65020] -> 1[67020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 01 : 0[65010] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 02 : 1[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 02 : 0[65010] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 03 : 1[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO comm 0x7f7592844880 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 03 : 0[65010] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO comm 0x7fb2fa855630 rank 0 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO comm 0x7f5bfe471510 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO comm 0x7f1c42479c10 rank 0 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO 
threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO comm 0x7fc55a48bc00 rank 1 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO comm 0x7f0c4263d480 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 03/04 : 0 1 
iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 00 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Channel 01 : 0[6b010] -> 1[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] 
NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 00 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 00 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Channel 01 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Channel 01 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 00 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 01 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 01 : 1[67020] -> 0[67010] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 02 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 02 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Channel 03 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Channel 03 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 
iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:62979:65101 [7] NCCL INFO comm 0x7f7592909160 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO comm 0x7fd2e6912670 rank 0 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:65194 [6] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO comm 0x7f0c4272bbd0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:64998 [0] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62973:65139 [1] NCCL INFO comm 0x7f1c42564e90 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:65116 [5] NCCL INFO comm 0x7fb2fa90c110 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO comm 0x7effa28f7fa0 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:65191 [4] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO comm 0x7fc55a57a100 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:65075 [2] NCCL INFO Launch mode Parallel iv-ybpu7pvmiu5m57lh5kdd:62975:65045 [3] NCCL INFO comm 0x7f5bfe566130 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 22:48:19.856, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.856, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.858, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.859, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.859, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 22:48:19.861, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.861, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.861, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 22:48:19.862, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.863, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.863, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.864, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.864, Tesla V100-SXM2-32GB, 470.57.02, 47 %, 25 %, 32510 MiB, 15849 MiB, 16661 
MiB 2022/07/05 22:48:19.865, Tesla V100-SXM2-32GB, 470.57.02, 47 %, 25 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.865, Tesla V100-SXM2-32GB, 470.57.02, 94 %, 42 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.866, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.868, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.869, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.869, Tesla V100-SXM2-32GB, 470.57.02, 75 %, 35 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.869, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.870, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.871, Tesla V100-SXM2-32GB, 470.57.02, 68 %, 26 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:19.871, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.877, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.878, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.878, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.878, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.879, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.880, Tesla V100-SXM2-32GB, 470.57.02, 53 %, 23 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.880, Tesla V100-SXM2-32GB, 470.57.02, 75 %, 35 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.883, Tesla V100-SXM2-32GB, 470.57.02, 75 %, 35 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.884, Tesla V100-SXM2-32GB, 470.57.02, 75 %, 35 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.884, Tesla V100-SXM2-32GB, 470.57.02, 
82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.884, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.885, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.886, Tesla V100-SXM2-32GB, 470.57.02, 75 %, 35 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:19.886, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.889, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.890, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.890, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.890, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.891, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.892, Tesla V100-SXM2-32GB, 470.57.02, 86 %, 41 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.893, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.896, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.897, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.897, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.899, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 37 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:19.900, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.902, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.903, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.905, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 
2022/07/05 22:48:19.905, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.907, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.908, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:19.909, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 48 %, 32510 MiB, 15403 MiB, 17107 MiB timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB] 2022/07/05 22:48:27.433, Tesla V100-SXM2-32GB, 470.57.02, 82 %, 38 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:27.434, Tesla V100-SXM2-32GB, 470.57.02, 78 %, 37 %, 32510 MiB, 16221 MiB, 16289 MiB 2022/07/05 22:48:27.436, Tesla V100-SXM2-32GB, 470.57.02, 35 %, 20 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:27.437, Tesla V100-SXM2-32GB, 470.57.02, 27 %, 10 %, 32510 MiB, 15849 MiB, 16661 MiB 2022/07/05 22:48:27.438, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 55 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:27.439, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 58 %, 32510 MiB, 15031 MiB, 17479 MiB 2022/07/05 22:48:27.439, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 46 %, 32510 MiB, 15403 MiB, 17107 MiB 2022/07/05 22:48:27.440, Tesla V100-SXM2-32GB, 470.57.02, 100 %, 49 %, 32510 MiB, 15403 MiB, 17107 MiB [07/05 22:48:35 lb.utils.events]:  eta: 0:15:18 iteration: 99/220 consumed_samples: 204800 total_loss: 7.99 lm_loss: 7.294 sop_loss: 0.6967 time: 7.6544 s/iter data_time: 0.0155 s/iter total_throughput: 267.56 samples/s lr: 5.82e-05 [07/05 23:01:21 lb.utils.events]:  eta: 0:02:33 iteration: 199/220 consumed_samples: 409600 total_loss: 7.895 lm_loss: 7.201 sop_loss: 0.6945 time: 7.6591 s/iter data_time: 0.0156 s/iter total_throughput: 267.39 samples/s lr: 3.21e-06 [07/05 23:03:54 lb.utils.events]:  eta: 0:00:00 iteration: 219/220 consumed_samples: 450560 total_loss: 7.892 lm_loss: 7.198 sop_loss: 0.6939 time: 7.6593 s/iter 
data_time: 0.1308 s/iter total_throughput: 267.39 samples/s lr: 1.01e-06 [07/05 23:03:54 lb.engine.hooks]: Overall training speed: 218 iterations in 0:27:49 (7.6593 s / it) [07/05 23:03:54 lb.engine.hooks]: Total training time: 0:27:49 (0:00:00 on hooks) iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x7f0f2443bab0 rank 0 nranks 4 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x7f2020535e90 rank 1 nranks 4 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x7f79b6da1bb0 rank 3 nranks 4 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x7fd714697ac0 rank 2 nranks 4 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x7f5ffc082e20 rank 3 nranks 4 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x7f03c42a5d30 rank 0 nranks 4 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x7fc950bedfe0 rank 2 nranks 4 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x7fb72caae050 rank 1 nranks 4 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af488d3d0 rank 1 nranks 4 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f6bd300 rank 3 nranks 4 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f222c230 rank 0 nranks 4 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f98fe530 rank 2 nranks 4 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a885d7fb0 rank 3 nranks 4 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf64896c0 rank 0 nranks 4 cudaDev 
4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef13cfa560 rank 2 nranks 4 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x5588b665b910 rank 1 nranks 4 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af49cb670 rank 1 nranks 4 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f7b4290 rank 3 nranks 4 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f99f5660 rank 2 nranks 4 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f236cce0 rank 0 nranks 4 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x55ad5f9f53b0 rank 3 nranks 4 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x560af4c51740 rank 1 nranks 4 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x55b1f9c368b0 rank 2 nranks 4 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x5650f25f5370 rank 0 nranks 4 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x7f1c42564e90 rank 1 nranks 2 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x7f0c4272bbd0 rank 0 nranks 2 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a8871d890 rank 3 nranks 4 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf6588d20 rank 0 nranks 4 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef13e3f960 rank 2 nranks 4 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 
0x5588b675a220 rank 1 nranks 4 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x7f7592844880 rank 1 nranks 2 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x7fd2e685d740 rank 1 nranks 2 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x7f2050aa2d20 rank 1 nranks 2 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x7f0c3cc5f740 rank 0 nranks 2 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x7f7592909160 rank 1 nranks 2 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x7fd2e6912670 rank 0 nranks 2 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x7f1c316430c0 rank 1 nranks 2 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x7f0c397631c0 rank 0 nranks 2 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x7f7584af6600 rank 1 nranks 2 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x7fd2dcafb1a0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62979:62979 [7] NCCL INFO comm 0x7f79c597dfa0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62973:62973 [1] NCCL INFO comm 0x7f1c42479c10 rank 0 nranks 2 cudaDev 1 busId 65020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62978:62978 [6] NCCL INFO comm 0x7fd2f997efc0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62972:62972 [0] NCCL INFO comm 0x7f0c4263d480 rank 0 nranks 2 cudaDev 0 busId 65010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x561a889ac720 rank 3 nranks 4 cudaDev 3 busId 67020 - Destroy COMPLETE 
iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x55bdf67d1830 rank 0 nranks 4 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x55ef34eb64c0 rank 2 nranks 4 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x5588b69a4c20 rank 1 nranks 4 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x7f5bfe471510 rank 1 nranks 2 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x7fc55a48bc00 rank 1 nranks 2 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x7fb2fa90c110 rank 1 nranks 2 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x7effa28f7fa0 rank 0 nranks 2 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x7f5bfe566130 rank 1 nranks 2 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x7fc55a57a100 rank 0 nranks 2 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x7fb300af8d60 rank 1 nranks 2 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x7eff98af7650 rank 0 nranks 2 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x7f5c04aac650 rank 1 nranks 2 cudaDev 3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x7fc560aa6460 rank 0 nranks 2 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x7fb309974ea0 rank 1 nranks 2 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x7effa997a440 rank 0 nranks 2 cudaDev 4 busId 69010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62975:62975 [3] NCCL INFO comm 0x7f5c0d63d580 rank 1 nranks 2 cudaDev 
3 busId 67020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62977:62977 [5] NCCL INFO comm 0x7fb2fa855630 rank 0 nranks 2 cudaDev 5 busId 69020 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62974:62974 [2] NCCL INFO comm 0x7fc56963e920 rank 0 nranks 2 cudaDev 2 busId 67010 - Destroy COMPLETE iv-ybpu7pvmiu5m57lh5kdd:62976:62976 [4] NCCL INFO comm 0x7effa2834660 rank 0 nranks 2 cudaDev 4 busId 69010 - Destroy COMPLETE ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************