The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases. Please read local_rank from `os.environ('LOCAL_RANK')` instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint : pretrain_gpt.py
  min_nodes : 1
  max_nodes : 1
  nproc_per_node : 8
  run_id : none
  rdzv_backend : static
  rdzv_endpoint : 127.0.0.1:6000
  rdzv_configs : {'rank': 0, 'timeout': 900}
  max_restarts : 3
  monitor_interval : 5
  log_dir : None
  metrics_cfg : {}
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result:
  restart_count=0
  master_addr=127.0.0.1
  master_port=6000
  group_rank=0
  group_world_size=1
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
  global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_b7ash4mr/none_a5nwqye3/attempt_0/7/error.json
using world size: 8, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 8
using torch.float16 for parameters ...
Persistent fused layer norm kernel is supported from pytorch v1.11 (nvidia pytorch container paired with v1.11). Defaulting to no_persist_layer_norm=True
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  activations_checkpoint_method ................... uniform
  activations_checkpoint_num_layers ............... 1
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  classes_fraction ................................ 1.0
  clip_grad ....................................... 1.0
  consumed_train_samples .......................... 0
  consumed_valid_samples .......................... 0
  data_impl ....................................... mmap
  data_parallel_random_init ....................... False
  data_parallel_size .............................. 1
  data_path ....................................... ['/dataset/source/dataset/loss_compara_content_sentence']
  data_per_class_fraction ......................... 1.0
  data_sharding ................................... True
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  empty_unused_memory_level ....................... 0
  encoder_seq_length .............................. 1024
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 10
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... None
  exit_interval ................................... None
  exit_signal_handler ............................. False
  ffn_hidden_size ................................. 4096
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  global_batch_size ............................... 384
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 1024
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_h ........................................... 224
  img_w ........................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  inference_batch_times_seqlen_threshold .......... 512
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 64
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ None
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... False
  log_interval .................................... 100
  log_learning_rate_to_tensorboard ................ True
  log_loss_scale_to_tensorboard ................... True
  log_memory_to_tensorboard ....................... False
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... False
  log_validation_ppl_to_tensorboard ............... False
  log_world_size_to_tensorboard ................... False
  loss_scale ...................................... None
  loss_scale_window ............................... 1000
  lr .............................................. 0.00015
  lr_decay_iters .................................. 320000
  lr_decay_samples ................................ None
  lr_decay_style .................................. cosine
  lr_warmup_fraction .............................. 0.01
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 0
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 1024
  merge_file ...................................... /dataset/source/dataset/gpt2-merges.txt
  micro_batch_size ................................ 24
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_async_tensor_model_parallel_allreduce ........ False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_persist_layer_norm ........................... True
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 48
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 8
  pipeline_model_parallel_split_rank .............. None
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... None
  rank ............................................ 0
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ None
  save_interval ................................... 10000
  scatter_gather_tensors_in_pipeline .............. True
  seed ............................................ 1234
  seq_length ...................................... 1024
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  tensor_model_parallel_size ...................... 1
  tensorboard_dir ................................. None
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 1000
  titles_data_path ................................ None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... 220
  train_samples ................................... None
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_local_ddp ............. True
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /dataset/source/dataset/gpt2-vocab.json
  weight_decay .................................... 0.01
  world_size ...................................... 8
-------------------- end of arguments ---------------------
setting number of micro-batches to constant 16
> building GPT2BPETokenizer tokenizer ...
 > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 8
[W ProcessGroupNCCL.cpp:1671] Rank 7 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
> setting random seeds to 1234 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
[W ProcessGroupNCCL.cpp:1671] Rank 5 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
> compiling dataset index builder ...
[W ProcessGroupNCCL.cpp:1671] Rank 3 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 2 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 4 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1671] Rank 6 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
make: Entering directory '/dataset/xyn/Megatron-LM/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/dataset/xyn/Megatron-LM/megatron/data'
>>> done with dataset index builder. Compilation time: 0.041 seconds
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /dataset/xyn/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /dataset/xyn/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /dataset/xyn/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /dataset/xyn/Megatron-LM/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
[W ProcessGroupNCCL.cpp:1671] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Using network IBext
NCCL version 2.10.3+cuda11.4
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO Bootstrap : Using eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO P2P plugin IBext
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58398:58398 [7] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE ; OOB eth0:192.168.11.230<0>
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO Using network IBext
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO NCCL_IB_TIMEOUT set by environment to 23.
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 7.
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 1/-1/-1->3->2 [2] 2/-1/-1->3->1 [3] 2/-1/-1->3->1 [4] -1/-1/-1->3->4 [5] 4/-1/-1->3->0 [6] 1/-1/-1->3->2 [7] 1/-1/-1->3->2 [8] 2/-1/-1->3->1 [9] 2/-1/-1->3->1 [10] -1/-1/-1->3->4 [11] 4/-1/-1->3->0
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Trees [0] 4/-1/-1->6->1 [1] 4/-1/-1->6->1 [2] 1/-1/-1->6->4 [3] 1/-1/-1->6->4 [4] 7/-1/-1->6->5 [5] 5/-1/-1->6->7 [6] 4/-1/-1->6->1 [7] 4/-1/-1->6->1 [8] 1/-1/-1->6->4 [9] 1/-1/-1->6->4 [10] 7/-1/-1->6->5 [11] 5/-1/-1->6->7
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Trees [0] 7/-1/-1->5->4 [1] 7/-1/-1->5->4 [2] 4/-1/-1->5->7 [3] 4/-1/-1->5->7 [4] 6/-1/-1->5->2 [5] 2/-1/-1->5->6 [6] 7/-1/-1->5->4 [7] 7/-1/-1->5->4 [8] 4/-1/-1->5->7 [9] 4/-1/-1->5->7 [10] 6/-1/-1->5->2 [11] 2/-1/-1->5->6
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Trees [0] -1/-1/-1->7->5 [1] -1/-1/-1->7->5 [2] 5/-1/-1->7->0 [3] 5/-1/-1->7->0 [4] 4/-1/-1->7->6 [5] 6/-1/-1->7->4 [6] -1/-1/-1->7->5 [7] -1/-1/-1->7->5 [8] 5/-1/-1->7->0 [9] 5/-1/-1->7->0 [10] 4/-1/-1->7->6 [11] 6/-1/-1->7->4
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 00/12 : 0 2 3 1 6 4 5 7
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Trees [0] 5/-1/-1->4->6 [1] 5/-1/-1->4->6 [2] 6/-1/-1->4->5 [3] 6/-1/-1->4->5 [4] 3/-1/-1->4->7 [5] 7/-1/-1->4->3 [6] 5/-1/-1->4->6 [7] 5/-1/-1->4->6 [8] 6/-1/-1->4->5 [9] 6/-1/-1->4->5 [10] 3/-1/-1->4->7 [11] 7/-1/-1->4->3
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 01/12 : 0 2 3 1 6 4 5 7
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 02/12 : 0 7 5 4 6 1 3 2
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 03/12 : 0 7 5 4 6 1 3 2
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 04/12 : 0 1 2 5 6 7 4 3
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 05/12 : 0 3 4 7 6 5 2 1
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 06/12 : 0 2 3 1 6 4 5 7
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 07/12 : 0 2 3 1 6 4 5 7
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 08/12 : 0 7 5 4 6 1 3 2
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 09/12 : 0 7 5 4 6 1 3 2
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Trees [0] 6/-1/-1->1->3 [1] 6/-1/-1->1->3 [2] 3/-1/-1->1->6 [3] 3/-1/-1->1->6 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->2 [6] 6/-1/-1->1->3 [7] 6/-1/-1->1->3 [8] 3/-1/-1->1->6 [9] 3/-1/-1->1->6 [10] 2/-1/-1->1->0 [11] -1/-1/-1->1->2
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 10/12 : 0 1 2 5 6 7 4 3
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 3/-1/-1->2->0 [2] -1/-1/-1->2->3 [3] -1/-1/-1->2->3 [4] 5/-1/-1->2->1 [5] 1/-1/-1->2->5 [6] 3/-1/-1->2->0 [7] 3/-1/-1->2->0 [8] -1/-1/-1->2->3 [9] -1/-1/-1->2->3 [10] 5/-1/-1->2->1 [11] 1/-1/-1->2->5
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 11/12 : 0 3 4 7 6 5 2 1
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] 2/-1/-1->0->-1 [2] 7/-1/-1->0->-1 [3] 7/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 3/-1/-1->0->-1 [6] 2/-1/-1->0->-1 [7] 2/-1/-1->0->-1 [8] 7/-1/-1->0->-1 [9] 7/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 3/-1/-1->0->-1
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 04 : 6[6b010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 00 : 4[69010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 04 : 1[65020] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 00 : 7[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 10 : 6[6b010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 01 : 4[69010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 10 : 1[65020] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 04 : 0[65010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 01 : 7[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 06 : 4[69010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 05 : 3[67020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 04 : 5[69020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 10 : 0[65010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 06 : 7[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 07 : 4[69010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 11 : 3[67020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 10 : 5[69020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 07 : 7[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 02 : 4[69010] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 00 : 5[69020] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 03 : 4[69010] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 08 : 1[65020] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 01 : 5[69020] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 09 : 1[65020] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 08 : 4[69010] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 06 : 5[69020] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 09 : 4[69010] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 07 : 5[69020] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 02 : 6[6b010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 05 : 4[69010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 05 : 0[65010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 04 : 2[67010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 11 : 4[69010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 03 : 6[6b010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 11 : 0[65010] -> 3[67020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 10 : 2[67010] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 08 : 6[6b010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 09 : 6[6b010] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 05 : 5[69020] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 04 : 7[6b020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 04 : 3[67020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 11 : 5[69020] -> 2[67010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 10 : 3[67020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 10 : 7[6b020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 00 : 1[65020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 02 : 7[6b020] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 01 : 1[65020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 03 : 7[6b020] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 06 : 1[65020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 08 : 7[6b020] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 08 : 2[67010] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 07 : 1[65020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 09 : 7[6b020] -> 5[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 09 : 2[67010] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 00 : 6[6b010] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 01 : 6[6b010] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 06 : 6[6b010] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 02 : 5[69020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 02 : 0[65010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 07 : 6[6b010] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 03 : 5[69020] -> 4[69010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 03 : 0[65010] -> 7[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 05 : 7[6b020] -> 6[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 08 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 08 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 11 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 09 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 09 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 11 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 05 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 05 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 11 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 11 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 04 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 08 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 10 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 09 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 05 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Connected 
all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 11 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 05 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 11 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 02 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 03 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 02 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 08 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 03 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 09 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 05 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 04 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 08 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 08 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 11 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 09 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 10 : 3[67020] -> 
4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 09 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 05 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 02 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 00 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 11 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 01 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 03 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 06 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 08 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 07 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 09 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 00 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 04 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 11 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 01 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 
10 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 06 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 07 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 05 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 04 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 11 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 10 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 08 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 09 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 02 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 00 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 03 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 01 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 08 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 06 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 09 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] 
NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 07 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 02 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 04 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 03 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 10 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 08 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 00 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 09 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 01 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 04 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 06 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 10 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 07 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 04 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 10 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 10 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 05 : 4[69010] -> 3[67020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 11 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 02 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer 
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 10 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 02 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 10 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 02 : 2[67010] -> 4[69010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 02 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 03 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 03 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 10 : 2[67010] -> 4[69010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 11 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 11 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 10 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 03 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 03 : 3[67020] -> 
6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 11 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 11 : 3[67020] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 04 : 2[67010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 04 : 4[69010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 04 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 12 : 4[69010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 04 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 12 : 2[67010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 04 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 04 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 04 : 1[65020] -> 5[69020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 04 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 12 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO Channel 12 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 12 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 12 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO Channel 12 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 12 : 1[65020] -> 5[69020] via 
P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 05 : 2[67010] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 05 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 05 : 0[65010] -> 5[69020] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 05 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 13 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO Channel 13 : 2[67010] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 13 : 0[65010] -> 5[69020] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO Channel 13 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 06 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 06 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 06 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO Channel 14 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 06 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO Channel 14 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO Channel 14 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO Channel 14 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58397:58635 [6] NCCL INFO comm 0x7fd4dc008fb0 rank 6 nranks 8 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58396:58631 [5] NCCL INFO comm 0x7f6fdc008fb0 rank 5 nranks 8 cudaDev 
5 busId 69020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58395:58634 [4] NCCL INFO comm 0x7f97f4008fb0 rank 4 nranks 8 cudaDev 4 busId 69010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58392:58633 [1] NCCL INFO comm 0x7f37b4008fb0 rank 1 nranks 8 cudaDev 1 busId 65020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58393:58636 [2] NCCL INFO comm 0x7f9510008fb0 rank 2 nranks 8 cudaDev 2 busId 67010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58394:58630 [3] NCCL INFO comm 0x7fb2b0008fb0 rank 3 nranks 8 cudaDev 3 busId 67020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58391:58622 [0] NCCL INFO comm 0x7fe3b0008fb0 rank 0 nranks 8 cudaDev 0 busId 65010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58398:58632 [7] NCCL INFO comm 0x7fe0fc008fb0 rank 7 nranks 8 cudaDev 7 busId 6b020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Launch mode Parallel
>>> done with compiling and loading fused kernels. Compilation time: 6.770 seconds
time to initialize megatron (seconds): 7.348
[after megatron is initialized] datetime: 2022-07-05 15:41:54
building GPT model ...
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75577344
> number of parameters on (tensor, pipeline) model parallel rank (0, 6): 75577344
> number of parameters on (tensor, pipeline) model parallel rank (0, 5): 75577344
> number of parameters on (tensor, pipeline) model parallel rank (0, 4): 75577344
> number of parameters on (tensor, pipeline) model parallel rank (0, 3): 75577344
> number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75577344
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 00/04 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 01/04 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 02/04 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 03/04 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 00 : 0[65010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 01 : 0[65010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 02 : 0[65010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Channel 02 : 1[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Channel 03 : 0[65010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Channel 03 : 1[6b020] -> 0[65010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58391:58712 [0] NCCL INFO comm 0x7fe360008fb0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58398:58713 [7] NCCL INFO comm 0x7fe0b8008fb0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Launch mode Parallel
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 128137216
> number of parameters on (tensor, pipeline) model parallel rank (0, 7): 127090688
> learning rate decay style: cosine
[after model, optimizer, and learning rate scheduler are built] datetime: 2022-07-05 15:41:54
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train: 84480
    validation: 3840
    test: 3840
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.005477 seconds
    number of documents: 1249934
> dataset split:
    train:
     document indices in [0, 1186187) total of 1186187 documents
    validation:
     document indices in [1186187, 1248684) total of 62497 documents
    test:
     document indices in [1248684, 1249934) total of 1250 documents
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
NCCL version 2.10.3+cuda11.4
> WARNING: could not find index map files, building the indices on rank 0 ...
> last epoch number of samples (30057) is smaller than 80% of number of samples per epoch (54423), setting separate_last_epoch to True
> elasped time to build and save doc-idx mapping (seconds): 0.098484
    using:
     number of documents: 1186187
     number of epochs: 2
     sequence length: 1024
     total number of samples: 108846
> elasped time to build and save sample-idx mapping (seconds): 0.010924
> building shuffle index with split [0, 54423) and [54423, 108846) ...
> elasped time to build and save shuffle-idx mapping (seconds): 0.003898 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 27/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 12/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] 
-1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] 
NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 
12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 
[31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL 
INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 12/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] 
-1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO 
Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 12/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] 
-1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:58721 [6] NCCL INFO comm 0x7fd46c008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58395:58722 [4] NCCL INFO comm 0x7f9774008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58396:58723 [5] NCCL INFO comm 0x7f6f6c008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58392:58725 [1] NCCL INFO comm 0x7f373c008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:58733 [3] NCCL INFO comm 0x7fb244008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE 
iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58731 [2] NCCL INFO comm 0x7f9498008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:58735 [7] NCCL INFO comm 0x7fe06c008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58391:58740 [0] NCCL INFO comm 0x7fe320008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Trees [0] -1/-1/-1->7->5 [1] -1/-1/-1->7->5 [2] 5/-1/-1->7->0 [3] 5/-1/-1->7->0 [4] 4/-1/-1->7->6 [5] 6/-1/-1->7->4 [6] -1/-1/-1->7->5 [7] -1/-1/-1->7->5 [8] 5/-1/-1->7->0 [9] 5/-1/-1->7->0 [10] 4/-1/-1->7->6 [11] 6/-1/-1->7->4 iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 00/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 01/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 02/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 03/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 04/12 : 0 1 2 5 6 7 4 3 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 05/12 : 0 3 4 7 6 5 2 1 
iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 06/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Trees [0] 6/-1/-1->1->3 [1] 6/-1/-1->1->3 [2] 3/-1/-1->1->6 [3] 3/-1/-1->1->6 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->2 [6] 6/-1/-1->1->3 [7] 6/-1/-1->1->3 [8] 3/-1/-1->1->6 [9] 3/-1/-1->1->6 [10] 2/-1/-1->1->0 [11] -1/-1/-1->1->2 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 07/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 3/-1/-1->2->0 [2] -1/-1/-1->2->3 [3] -1/-1/-1->2->3 [4] 5/-1/-1->2->1 [5] 1/-1/-1->2->5 [6] 3/-1/-1->2->0 [7] 3/-1/-1->2->0 [8] -1/-1/-1->2->3 [9] -1/-1/-1->2->3 [10] 5/-1/-1->2->1 [11] 1/-1/-1->2->5 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 08/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 09/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 10/12 : 0 1 2 5 6 7 4 3 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 11/12 : 0 3 4 7 6 5 2 1 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] 2/-1/-1->0->-1 [2] 7/-1/-1->0->-1 [3] 7/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 3/-1/-1->0->-1 [6] 2/-1/-1->0->-1 [7] 2/-1/-1->0->-1 [8] 7/-1/-1->0->-1 [9] 7/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 3/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 1/-1/-1->3->2 [2] 2/-1/-1->3->1 [3] 2/-1/-1->3->1 [4] -1/-1/-1->3->4 [5] 4/-1/-1->3->0 [6] 1/-1/-1->3->2 [7] 1/-1/-1->3->2 [8] 2/-1/-1->3->1 [9] 2/-1/-1->3->1 [10] -1/-1/-1->3->4 [11] 4/-1/-1->3->0 iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff 
iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Trees [0] 5/-1/-1->4->6 [1] 5/-1/-1->4->6 [2] 6/-1/-1->4->5 [3] 6/-1/-1->4->5 [4] 3/-1/-1->4->7 [5] 7/-1/-1->4->3 [6] 5/-1/-1->4->6 [7] 5/-1/-1->4->6 [8] 6/-1/-1->4->5 [9] 6/-1/-1->4->5 [10] 3/-1/-1->4->7 [11] 7/-1/-1->4->3 iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Trees [0] 7/-1/-1->5->4 [1] 7/-1/-1->5->4 [2] 4/-1/-1->5->7 [3] 4/-1/-1->5->7 [4] 6/-1/-1->5->2 [5] 2/-1/-1->5->6 [6] 7/-1/-1->5->4 [7] 7/-1/-1->5->4 [8] 4/-1/-1->5->7 [9] 4/-1/-1->5->7 [10] 6/-1/-1->5->2 [11] 2/-1/-1->5->6 iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Trees [0] 4/-1/-1->6->1 [1] 4/-1/-1->6->1 [2] 1/-1/-1->6->4 [3] 1/-1/-1->6->4 [4] 7/-1/-1->6->5 [5] 5/-1/-1->6->7 [6] 4/-1/-1->6->1 [7] 4/-1/-1->6->1 [8] 1/-1/-1->6->4 [9] 1/-1/-1->6->4 [10] 7/-1/-1->6->5 [11] 5/-1/-1->6->7 iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 04 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 00 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 04 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 10 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 00 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 01 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via 
P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 10 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 04 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 05 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 01 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 06 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 10 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 04 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 11 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 07 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 06 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 10 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 07 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 08 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 02 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 00 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 09 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 03 : 4[69010] 
-> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 01 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 08 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 06 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 09 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 07 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 02 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 05 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 04 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 05 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 03 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 11 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 10 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 11 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 08 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 09 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 04 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 05 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO 
Channel 04 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 10 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 11 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 10 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 00 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 02 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 01 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 03 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 01 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 06 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 08 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 08 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 07 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 09 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 09 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 00 : 6[6b010] -> 4[69010] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 01 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 06 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 02 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 02 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 07 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 03 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 03 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 08 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 08 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 05 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 09 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 11 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 09 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 11 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 05 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 05 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 11 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 11 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 03 : 3[67020] -> 
2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 04 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 08 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 10 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 09 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 05 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 11 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 05 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 11 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 02 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 03 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 02 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 08 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 03 : 4[69010] -> 
5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 09 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 05 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 04 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 08 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 08 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 11 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 09 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 10 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 09 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 05 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 02 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 00 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 11 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 03 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 01 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 06 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 
08 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 07 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 09 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 00 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 04 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 11 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 01 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 10 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 06 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 07 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 05 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 04 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 11 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 10 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 08 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 09 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 02 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 00 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] 
NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 03 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 01 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 08 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 06 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 09 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 07 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 02 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 04 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 03 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 08 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 10 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 00 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 09 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 01 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 04 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 06 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 10 : 2[67010] -> 1[65020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 07 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 04 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 10 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 10 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 05 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 11 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO 12 coll channels, 
16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 02 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 10 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 02 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 02 : 2[67010] -> 4[69010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 02 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 03 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 10 : 2[67010] -> 4[69010] via 
P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 10 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 03 : 3[67020] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 11 : 3[67020] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 10 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 03 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 11 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 03 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 11 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 11 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 04 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO Channel 12 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 04 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 04 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 12 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 12 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 04 : 4[69010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 04 : 2[67010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 12 : 4[69010] -> 0[65010] via 
P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 12 : 2[67010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 04 : 1[65020] -> 5[69020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 12 : 1[65020] -> 5[69020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 05 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 05 : 0[65010] -> 5[69020] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 13 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 04 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 13 : 0[65010] -> 5[69020] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 04 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO Channel 12 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 12 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 06 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 06 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO Channel 14 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 05 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 05 : 2[67010] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO Channel 13 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO Channel 13 : 2[67010] -> 7[6b020] via 
P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO Channel 14 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 06 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 06 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO Channel 14 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO Channel 14 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58396:58755 [5] NCCL INFO comm 0x7f6f6c0a5d20 rank 5 nranks 8 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58751 [0] NCCL INFO comm 0x7fe318008fb0 rank 0 nranks 8 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58397:58752 [6] NCCL INFO comm 0x7fd46c0acd60 rank 6 nranks 8 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58393:58756 [2] NCCL INFO comm 0x7f94980c8de0 rank 2 nranks 8 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:58758 [1] NCCL INFO comm 0x7f373c0c8e80 rank 1 nranks 8 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58398:58757 [7] NCCL INFO comm 0x7fe06c0ad1e0 rank 7 nranks 8 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58395:58754 [4] NCCL INFO comm 0x7f97740a5d20 rank 4 nranks 8 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:58753 [3] NCCL INFO comm 0x7fb2440c8e80 rank 3 nranks 8 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Launch mode Parallel > loading doc-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_train_indexmap_84480ns_1024sl_1234s_doc_idx.npy > loading sample-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_train_indexmap_84480ns_1024sl_1234s_sample_idx.npy > loading shuffle-idx mapping from 
/dataset/source/dataset/loss_compara_content_sentence_train_indexmap_84480ns_1024sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.004 seconds
    total number of samples: 108847
    total number of epochs: 2
 > WARNING: could not find index map files, building the indices on rank 0 ...
 > last epoch number of samples (982) is smaller than 80% of number of samples per epoch (2858), setting separate_last_epoch to True
 > elasped time to build and save doc-idx mapping (seconds): 0.005736
    using:
     number of documents: 62497
     number of epochs: 2
     sequence length: 1024
     total number of samples: 5717
 > elasped time to build and save sample-idx mapping (seconds): 0.001225
 > building shuffle index with split [0, 2858) and [2858, 5717) ...
 > elasped time to build and save shuffle-idx mapping (seconds): 0.001086
 > loading doc-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_valid_indexmap_3840ns_1024sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_valid_indexmap_3840ns_1024sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_valid_indexmap_3840ns_1024sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.002 seconds
    total number of samples: 5718
    total number of epochs: 2
 > WARNING: could not find index map files, building the indices on rank 0 ...
 > last epoch number of samples (20) is smaller than 80% of number of samples per epoch (50), setting separate_last_epoch to True
 > elasped time to build and save doc-idx mapping (seconds): 0.004167
    using:
     number of documents: 1250
     number of epochs: 77
     sequence length: 1024
     total number of samples: 3870
 > elasped time to build and save sample-idx mapping (seconds): 0.001061
 > building shuffle index with split [0, 3820) and [3820, 3870) ...
> elasped time to build and save shuffle-idx mapping (seconds): 0.001023 > loading doc-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_test_indexmap_3840ns_1024sl_1234s_doc_idx.npy > loading sample-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_test_indexmap_3840ns_1024sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /dataset/source/dataset/loss_compara_content_sentence_test_indexmap_3840ns_1024sl_1234s_shuffle_idx.npy loaded indexed file in 0.002 seconds total number of samples: 3871 total number of epochs: 77 > finished creating GPT datasets ... iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 19/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 04/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] 
-1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL 
INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 04/32 : 
0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] 
-1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO 
Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 04/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] 
-1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 04/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO 
Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 00/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 01/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 02/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 03/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 04/32 : 0 
iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 05/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 06/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 07/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 08/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 09/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 10/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 11/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 12/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 13/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 14/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 15/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 16/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 17/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 18/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 19/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 20/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 21/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 22/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 23/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 24/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 25/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 26/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 27/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 28/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 29/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 30/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Channel 31/32 : 0 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] 
-1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:58777 [6] NCCL INFO comm 0x7fd42c008fb0 rank 0 nranks 1 cudaDev 6 busId 6b010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58395:58778 [4] NCCL INFO comm 0x7f9738008fb0 rank 0 nranks 1 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58396:58782 [5] NCCL INFO comm 0x7f6f30008fb0 rank 0 nranks 1 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO 32 coll channels, 32 p2p channels, 
32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:58788 [2] NCCL INFO comm 0x7f9464008fb0 rank 0 nranks 1 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58781 [0] NCCL INFO comm 0x7fe2e0008fb0 rank 0 nranks 1 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:58791 [3] NCCL INFO comm 0x7fb208008fb0 rank 0 nranks 1 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:58792 [7] NCCL INFO comm 0x7fe024008fb0 rank 0 nranks 1 cudaDev 7 busId 6b020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58392:58795 [1] NCCL INFO comm 0x7f3700008fb0 rank 0 nranks 1 cudaDev 1 busId 65020 - Init COMPLETE [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. 
(function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. 
(function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [W pthreadpool-cpp.cc:99] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool) [after dataloaders are built] datetime: 2022-07-05 15:41:57 done with setup ... training ... time (ms) | model-and-optimizer-setup: 203.22 | train/valid/test-data-iterators-setup: 2328.70 [before the start of training step] datetime: 2022-07-05 15:41:57 /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) 
output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Channel 00 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Channel 00 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Channel 01 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Channel 01 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58391:61029 [0] NCCL INFO comm 0x7fe1c4008fb0 rank 0 nranks 2 cudaDev 0 busId 65010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:61030 [1] NCCL INFO comm 0x7f36cc008fb0 rank 1 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Launch mode Parallel /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: 
UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Channel 00 : 0[65020] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Channel 00 : 1[67010] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Channel 01 : 0[65020] -> 1[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Channel 01 : 1[67010] -> 0[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO Connected all trees 
iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:61037 [2] NCCL INFO comm 0x7f9430008fb0 rank 1 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:61036 [1] NCCL INFO comm 0x7f35e4008fb0 rank 0 nranks 2 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:58392 [1] NCCL INFO Launch mode Parallel /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) 
output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Channel 00 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Channel 01 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Channel 02 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Channel 03 : 1[67020] -> 0[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 00 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 01 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 02 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Channel 03 : 0[67010] -> 1[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO 
Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58393:61043 [2] NCCL INFO comm 0x7f9348008fb0 rank 0 nranks 2 cudaDev 2 busId 67010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:61044 [3] NCCL INFO comm 0x7fb1d4008fb0 rank 1 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58393:58393 [2] NCCL INFO Launch mode Parallel /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) 
output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Channel 00/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Channel 01/02 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Channel 00 : 1[69010] -> 0[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Channel 00 : 0[67020] -> 1[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Channel 01 : 1[69010] -> 0[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Channel 01 : 0[67020] -> 1[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58395:61064 [4] NCCL INFO comm 0x7f9700008fb0 rank 1 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:61063 [3] NCCL INFO comm 0x7fb0ec008fb0 rank 0 nranks 2 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:58394 [3] NCCL INFO Launch mode Parallel 
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 00/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 01/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 02/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 03/04 : 0 1 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Channel 00 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 00 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Channel 01 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 01 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Channel 02 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 02 : 0[69010] -> 1[69020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Channel 03 : 1[69020] -> 0[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Channel 03 : 0[69010] -> 1[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO 4 coll channels, 4 p2p channels, 4 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58396:61071 [5] NCCL INFO comm 0x7f6efc008fb0 rank 1 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58395:61070 [4] NCCL INFO comm 0x7f961c008fb0 rank 0 nranks 2 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58395:58395 [4] NCCL INFO Launch mode Parallel /dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.) 
output = bias_dropout_add_func(
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Channel 00/02 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Trees [0] 0/-1/-1->1->-1 [1] 0/-1/-1->1->-1
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Channel 01/02 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Trees [0] -1/-1/-1->0->1 [1] -1/-1/-1->0->1
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Channel 00 : 0[69020] -> 1[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Channel 00 : 1[6b010] -> 0[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Channel 01 : 0[69020] -> 1[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Channel 01 : 1[6b010] -> 0[69020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58397:61078 [6] NCCL INFO comm 0x7fd3f8008fb0 rank 1 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58396:61077 [5] NCCL INFO comm 0x7f6e10008fb0 rank 0 nranks 2 cudaDev 5 busId 69020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58396:58396 [5] NCCL INFO Launch mode Parallel
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
output = bias_dropout_add_func(
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Channel 00/02 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Channel 01/02 : 0 1
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Channel 00 : 0[6b010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Channel 01 : 0[6b010] -> 1[6b020] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Channel 00 : 1[6b020] -> 0[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Channel 01 : 1[6b020] -> 0[6b010] via P2P/IPC
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Connected all rings
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO Connected all trees
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 8/8/512
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
iv-ybpu7pvmiu5m57lh5kdd:58397:61097 [6] NCCL INFO comm 0x7fd310008fb0 rank 0 nranks 2 cudaDev 6 busId 6b010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58398:61098 [7] NCCL INFO comm 0x7fdff4008fb0 rank 1 nranks 2 cudaDev 7 busId 6b020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58397:58397 [6] NCCL INFO Launch mode Parallel
/dataset/xyn/Megatron-LM/megatron/model/transformer.py:536: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/core/LegacyTypeDispatch.h:74.)
output = bias_dropout_add_func( iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Trees [0] 1/-1/-1->3->2 [1] 1/-1/-1->3->2 [2] 2/-1/-1->3->1 [3] 2/-1/-1->3->1 [4] -1/-1/-1->3->4 [5] 4/-1/-1->3->0 [6] 1/-1/-1->3->2 [7] 1/-1/-1->3->2 [8] 2/-1/-1->3->1 [9] 2/-1/-1->3->1 [10] -1/-1/-1->3->4 [11] 4/-1/-1->3->0 iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Setting affinity for GPU 3 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Trees [0] 5/-1/-1->4->6 [1] 5/-1/-1->4->6 [2] 6/-1/-1->4->5 [3] 6/-1/-1->4->5 [4] 3/-1/-1->4->7 [5] 7/-1/-1->4->3 [6] 5/-1/-1->4->6 [7] 5/-1/-1->4->6 [8] 6/-1/-1->4->5 [9] 6/-1/-1->4->5 [10] 3/-1/-1->4->7 [11] 7/-1/-1->4->3 iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Trees [0] 7/-1/-1->5->4 [1] 7/-1/-1->5->4 [2] 4/-1/-1->5->7 [3] 4/-1/-1->5->7 [4] 6/-1/-1->5->2 [5] 2/-1/-1->5->6 [6] 7/-1/-1->5->4 [7] 7/-1/-1->5->4 [8] 4/-1/-1->5->7 [9] 4/-1/-1->5->7 [10] 6/-1/-1->5->2 [11] 2/-1/-1->5->6 iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Trees [0] 4/-1/-1->6->1 [1] 4/-1/-1->6->1 [2] 1/-1/-1->6->4 [3] 1/-1/-1->6->4 [4] 7/-1/-1->6->5 [5] 5/-1/-1->6->7 [6] 4/-1/-1->6->1 [7] 4/-1/-1->6->1 [8] 1/-1/-1->6->4 [9] 1/-1/-1->6->4 [10] 7/-1/-1->6->5 [11] 5/-1/-1->6->7 iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Trees [0] -1/-1/-1->7->5 [1] -1/-1/-1->7->5 [2] 5/-1/-1->7->0 [3] 5/-1/-1->7->0 [4] 4/-1/-1->7->6 [5] 6/-1/-1->7->4 [6] -1/-1/-1->7->5 [7] -1/-1/-1->7->5 [8] 5/-1/-1->7->0 [9] 5/-1/-1->7->0 [10] 4/-1/-1->7->6 [11] 6/-1/-1->7->4 iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,fffffc00,00000000 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 00/12 : 0 2 3 1 6 4 
5 7 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 01/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 02/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 03/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 04/12 : 0 1 2 5 6 7 4 3 iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Trees [0] 6/-1/-1->1->3 [1] 6/-1/-1->1->3 [2] 3/-1/-1->1->6 [3] 3/-1/-1->1->6 [4] 2/-1/-1->1->0 [5] -1/-1/-1->1->2 [6] 6/-1/-1->1->3 [7] 6/-1/-1->1->3 [8] 3/-1/-1->1->6 [9] 3/-1/-1->1->6 [10] 2/-1/-1->1->0 [11] -1/-1/-1->1->2 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 05/12 : 0 3 4 7 6 5 2 1 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 06/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Setting affinity for GPU 1 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 07/12 : 0 2 3 1 6 4 5 7 iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Trees [0] 3/-1/-1->2->0 [1] 3/-1/-1->2->0 [2] -1/-1/-1->2->3 [3] -1/-1/-1->2->3 [4] 5/-1/-1->2->1 [5] 1/-1/-1->2->5 [6] 3/-1/-1->2->0 [7] 3/-1/-1->2->0 [8] -1/-1/-1->2->3 [9] -1/-1/-1->2->3 [10] 5/-1/-1->2->1 [11] 1/-1/-1->2->5 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 08/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 09/12 : 0 7 5 4 6 1 3 2 iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Setting affinity for GPU 2 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 10/12 : 0 1 2 5 6 7 4 3 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 11/12 : 0 3 4 7 6 5 2 1 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Trees [0] 2/-1/-1->0->-1 [1] 2/-1/-1->0->-1 [2] 7/-1/-1->0->-1 [3] 7/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 3/-1/-1->0->-1 [6] 2/-1/-1->0->-1 [7] 2/-1/-1->0->-1 [8] 7/-1/-1->0->-1 [9] 7/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 3/-1/-1->0->-1 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Setting 
affinity for GPU 0 to 03ff,ffffffff iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 00 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 04 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 04 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 00 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 01 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 10 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 00 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 10 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 06 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 01 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 04 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 05 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 01 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 07 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 06 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 04 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 10 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 11 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 06 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 07 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] 
NCCL INFO Channel 10 : 0[65010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 07 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 02 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 00 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 02 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 00 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 03 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 01 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 03 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 01 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 08 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 06 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 08 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 06 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 09 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 07 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 09 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 07 : 0[65010] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 02 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 05 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 03 : 6[6b010] -> 1[65020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 11 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 04 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 05 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 08 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 10 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 11 : 0[65010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 09 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 04 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 05 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 04 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 10 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 11 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 10 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 02 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 00 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 02 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 00 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 03 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 01 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 03 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 01 : 3[67020] -> 
1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 08 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 06 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 08 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 06 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 09 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 07 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 09 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 07 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 00 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 01 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 06 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 02 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 02 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 07 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 03 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 03 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 05 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 05 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 08 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 08 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 
11 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 11 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 09 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 09 : 0[65010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 05 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 05 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 02 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 11 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 11 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 03 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 04 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 08 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 10 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 09 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 05 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 11 : 1[65020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 05 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Connected all rings 
iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Connected all rings iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 11 : 6[6b010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 02 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 03 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 08 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 02 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 02 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 09 : 7[6b020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 03 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 03 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 05 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 08 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 08 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 04 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 11 : 5[69020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 09 : 2[67010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 09 : 4[69010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 10 : 3[67020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 00 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 01 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 
[1] NCCL INFO Channel 06 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 02 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 05 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 00 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 07 : 1[65020] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 03 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 11 : 2[67010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 01 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 08 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 06 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 09 : 5[69020] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 07 : 4[69010] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 05 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 00 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 04 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 11 : 3[67020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 01 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 10 : 4[69010] -> 7[6b020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 06 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 07 : 6[6b010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 02 : 3[67020] -> 1[65020] via P2P/IPC 
iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 04 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 05 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 03 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 11 : 7[6b020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 10 : 5[69020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 08 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 09 : 3[67020] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 02 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 00 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 00 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 03 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 01 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 01 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 08 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 06 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 06 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 09 : 1[65020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 07 : 2[67010] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 07 : 7[6b020] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 02 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 04 : 7[6b020] -> 
6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 03 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 10 : 7[6b020] -> 6[6b010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 00 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 08 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 01 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 09 : 6[6b010] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 06 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 04 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 07 : 5[69020] -> 4[69010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 10 : 2[67010] -> 1[65020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 04 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 00 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 04 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 10 : 1[65020] -> 0[65010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 01 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 10 : 6[6b010] -> 5[69020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 05 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 06 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Connected all trees 
iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 11 : 4[69010] -> 3[67020] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 07 : 3[67020] -> 2[67010] via P2P/IPC iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 02 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Connected all trees iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512 iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 10 : 7[6b020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO 12 coll channels, 16 
p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 02 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 10 : 6[6b010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 02 : 2[67010] -> 4[69010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 02 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 03 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 03 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 10 : 2[67010] -> 4[69010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 11 : 7[6b020] -> 2[67010] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 11 : 1[65020] -> 4[69010] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 10 : 3[67020] -> 5[69020] via P2P/indirect/4[69010] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 03 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 03 : 3[67020] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 11 : 5[69020] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 11 : 3[67020] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 04 : 2[67010] -> 6[6b010] via 
P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 04 : 4[69010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 04 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 12 : 2[67010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 04 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 04 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 12 : 4[69010] -> 0[65010] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 04 : 1[65020] -> 5[69020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 04 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 04 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO Channel 12 : 7[6b020] -> 3[67020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 12 : 5[69020] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 12 : 0[65010] -> 4[69010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 12 : 1[65020] -> 5[69020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 12 : 6[6b010] -> 2[67010] via P2P/indirect/5[69020] iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO Channel 12 : 3[67020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 05 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 05 : 2[67010] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 05 : 0[65010] -> 5[69020] via 
P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 05 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 13 : 4[69010] -> 1[65020] via P2P/indirect/6[6b010] iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO Channel 13 : 2[67010] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 13 : 0[65010] -> 5[69020] via P2P/indirect/7[6b020] iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO Channel 13 : 6[6b010] -> 3[67020] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 06 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 06 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 06 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 06 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO Channel 14 : 1[65020] -> 7[6b020] via P2P/indirect/0[65010] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO Channel 14 : 4[69010] -> 2[67010] via P2P/indirect/3[67020] iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO Channel 14 : 0[65010] -> 6[6b010] via P2P/indirect/1[65020] iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO Channel 14 : 5[69020] -> 3[67020] via P2P/indirect/2[67010] iv-ybpu7pvmiu5m57lh5kdd:58395:61210 [4] NCCL INFO comm 0x7f95b0008fb0 rank 4 nranks 8 cudaDev 4 busId 69010 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58396:61211 [5] NCCL INFO comm 0x7f6d88008fb0 rank 5 nranks 8 cudaDev 5 busId 69020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58392:61212 [1] NCCL INFO comm 0x7f3514008fb0 rank 1 nranks 8 cudaDev 1 busId 65020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58394:61209 [3] NCCL INFO comm 0x7fb084008fb0 rank 3 nranks 8 cudaDev 3 busId 67020 - Init COMPLETE iv-ybpu7pvmiu5m57lh5kdd:58391:61207 [0] NCCL INFO 
comm 0x7fe0d4008fb0 rank 0 nranks 8 cudaDev 0 busId 65010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58398:61214 [7] NCCL INFO comm 0x7fdd3c008fb0 rank 7 nranks 8 cudaDev 7 busId 6b020 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58397:61208 [6] NCCL INFO comm 0x7fd264008fb0 rank 6 nranks 8 cudaDev 6 busId 6b010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58393:61213 [2] NCCL INFO comm 0x7f9250008fb0 rank 2 nranks 8 cudaDev 2 busId 67010 - Init COMPLETE
iv-ybpu7pvmiu5m57lh5kdd:58391:58391 [0] NCCL INFO Launch mode Parallel
[Rank 5] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 5355.98974609375 | reserved: 6264.0 | max reserved: 6264.0
[Rank 4] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 5644.990234375 | reserved: 6648.0 | max reserved: 6648.0
[Rank 2] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 6222.9912109375 | reserved: 7226.0 | max reserved: 7226.0
[Rank 3] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 5933.99072265625 | reserved: 6842.0 | max reserved: 6842.0
[Rank 1] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 6511.99169921875 | reserved: 7420.0 | max reserved: 7420.0
[Rank 6] (after 100 iterations) memory (MB) | allocated: 1537.52734375 | max allocated: 5066.9892578125 | reserved: 6070.0 | max reserved: 6070.0
iteration 100/ 220 | consumed samples: 38400 | elapsed time per iteration (ms): 12225.0 | learning rate: 3.984E-06 | tpt: 31.4 samples/s | global batch size: 384 | lm loss: 9.881203E+00 | loss scale: 262144.0 | grad norm: 2.721 | number of skipped iterations: 15 | number of nan iterations: 0 |
[Rank 0] (after 100 iterations) memory (MB) | allocated: 2444.02734375 | max allocated: 7580.9765625 | reserved: 8664.0 | max reserved: 8664.0
[Rank 7] (after 100 iterations) memory (MB) | allocated: 2520.0673828125 | max allocated: 9496.7080078125 | reserved: 11432.0 | max reserved: 11432.0
time (ms) | forward-compute: 3248.53 | forward-recv: 884.45 | backward-compute: 5955.42 | backward-send: 2.40 | backward-send-forward-recv: 68.01 | backward-params-all-reduce: 0.85 | backward-embedding-all-reduce: 2036.80 | optimizer-copy-to-main-grad: 1.50 | optimizer-unscale-and-check-inf: 13.83 | optimizer-clip-main-grad: 2.24 | optimizer-copy-main-to-model-params: 1.33 | optimizer: 23.41 | batch-generator: 15.74
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.764, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.768, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
2022/07/05 16:02:31.771, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB
2022/07/05 16:02:31.773, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB
2022/07/05 16:02:31.774, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB
2022/07/05 16:02:31.775, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.777, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.778, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.781, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB
2022/07/05 16:02:31.781, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.782, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.784, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
2022/07/05 16:02:31.785, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB
2022/07/05 16:02:31.786, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.787, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.788, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB
2022/07/05 16:02:31.788, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB
2022/07/05 16:02:31.788, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.790, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
2022/07/05 16:02:31.790, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB
2022/07/05 16:02:31.793, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
2022/07/05 16:02:31.793, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB
2022/07/05 16:02:31.794, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB
2022/07/05 16:02:31.794, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/07/05 16:02:31.795, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 
9242 MiB 2022/07/05 16:02:31.796, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB 2022/07/05 16:02:31.798, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB 2022/07/05 16:02:31.798, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.798, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.799, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB 2022/07/05 16:02:31.800, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB 2022/07/05 16:02:31.799, Tesla V100-SXM2-32GB, 470.57.02, 99 %, 66 %, 32510 MiB, 21494 MiB, 11016 MiB 2022/07/05 16:02:31.801, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB 2022/07/05 16:02:31.806, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB 2022/07/05 16:02:31.807, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.807, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.808, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB 2022/07/05 16:02:31.809, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.809, Tesla V100-SXM2-32GB, 470.57.02, 56 %, 3 %, 32510 MiB, 22822 MiB, 9688 MiB 2022/07/05 16:02:31.810, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB 2022/07/05 16:02:31.812, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.813, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.813, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.813, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.814, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 
MiB, 8472 MiB 2022/07/05 16:02:31.815, Tesla V100-SXM2-32GB, 470.57.02, 57 %, 3 %, 32510 MiB, 23268 MiB, 9242 MiB 2022/07/05 16:02:31.815, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.817, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.818, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.819, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.820, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.820, Tesla V100-SXM2-32GB, 470.57.02, 90 %, 3 %, 32510 MiB, 23520 MiB, 8990 MiB 2022/07/05 16:02:31.821, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.823, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.824, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.825, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.826, Tesla V100-SXM2-32GB, 470.57.02, 38 %, 3 %, 32510 MiB, 23726 MiB, 8784 MiB 2022/07/05 16:02:31.826, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.828, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.829, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.831, Tesla V100-SXM2-32GB, 470.57.02, 70 %, 3 %, 32510 MiB, 24038 MiB, 8472 MiB 2022/07/05 16:02:31.831, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB 2022/07/05 16:02:31.835, Tesla V100-SXM2-32GB, 470.57.02, 69 %, 3 %, 32510 MiB, 24040 MiB, 8470 MiB 2022/07/05 16:02:31.840, Tesla V100-SXM2-32GB, 470.57.02, 37 %, 7 %, 32510 MiB, 18846 MiB, 13664 MiB iteration 200/ 220 | consumed samples: 76800 | elapsed time per iteration (ms): 
12075.8 | learning rate: 8.672E-06 | tpt: 31.8 samples/s | global batch size: 384 | lm loss: 8.575095E+00 | loss scale: 262144.0 | grad norm: 2.544 | number of skipped iterations: 0 | number of nan iterations: 0 |
time (ms) | forward-compute: 3236.74 | forward-recv: 788.21 | backward-compute: 5949.34 | backward-send: 2.40 | backward-send-forward-recv: 41.82 | backward-params-all-reduce: 0.85 | backward-embedding-all-reduce: 2037.16 | optimizer-copy-to-main-grad: 1.49 | optimizer-unscale-and-check-inf: 1.65 | optimizer-clip-main-grad: 2.57 | optimizer-copy-main-to-model-params: 1.59 | optimizer: 12.53 | batch-generator: 15.03
[after training is done] datetime: 2022-07-05 16:26:28
------------------------------------------------------------------------------------------------------------------
validation loss at the end of training for val data | lm loss value: 7.624660E+00 | lm loss PPL: 2.048084E+03 |
------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
validation loss at the end of training for test data | lm loss value: 7.441236E+00 | lm loss PPL: 1.704855E+03 |
-------------------------------------------------------------------------------------------------------------------
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. 
Elapsed: 0.00041103363037109375 seconds
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "58391", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [0], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 1, "group_rank": 0, "worker_id": "58392", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [1], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 2, "group_rank": 0, "worker_id": "58393", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [2], \"role_rank\": [2], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 3, "group_rank": 0, "worker_id": "58394", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [3], \"role_rank\": [3], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 4, "group_rank": 0, "worker_id": "58395", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [4], \"role_rank\": [4], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 5, "group_rank": 0, "worker_id": "58396", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [5], \"role_rank\": [5], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 6, "group_rank": 0, "worker_id": "58397", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [6], \"role_rank\": [6], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 7, "group_rank": 0, "worker_id": "58398", "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [7], \"role_rank\": [7], \"role_world_size\": [8]}", "agent_restarts": 0}}
{"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "iv-ybpu7pvmiu5m57lh5kdd", "state": "SUCCEEDED", "total_run_time": 2747, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\"}", "agent_restarts": 0}}
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************