[03/05 19:52:36 libai]: Rank of current process: 0. World size: 8
[03/05 19:52:36 libai]: Command line arguments: Namespace(
    config_file='configs/swin_imagenet.py',
    eval_only=False,
    fast_dev_run=False,
    opts=[
        'model.cfg.hidden_dropout_prob=0.1',
        'model.cfg.attention_probs_dropout_prob=0.1',
        'model.cfg.bias_dropout_fusion=true',
        'model.cfg.hidden_layers=12',
        'model.cfg.hidden_size=768',
        'model.cfg.num_attention_heads=12',
        'model.cfg.intermediate_size=3072',
        'model.cfg.ffn_hidden_size=3072',
        'model.cfg.head_size=64',
        'graph.enabled=true',
        'train.dist.pipeline_num_layers=12',
        'train.train_micro_batch_size=64',
        'train.global_batch_size=2048',
        'train.dist.tensor_parallel_size=1',
        'train.dist.pipeline_parallel_size=2',
        'train.amp.enabled=true',
        'train.activation_checkpoint.enabled=true',
        'train.num_accumulation_steps=8',
        'train.evaluation.enabled=false',
        'train.train_iter=220',
        'train.train_epoch=0',
        'train.log_period=100',
        'train.zero_optimization.enabled=true',
        'train.zero_optimization.stage=2',
        'train.load_weight=',
        'train.output_dir=test_logs/oneflow-28/NVIDIA_GeForce_RTX_3080_Ti/7d07caf/LibAI_swin_imagenet_graph_nl12_nah12_hs768_FP16_actrue_DP4_MP1_PP2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g'
    ],
    resume=False)
[03/05 19:52:36 libai]: Contents of args.config_file=configs/swin_imagenet.py:
from libai.config import LazyCall
from .common.models.swin.swin_tiny_patch4_window7_224 import model
from .common.models.graph import graph
from .common.train import train
from .common.optim import optim
from .common.data.imagenet import dataloader

from flowvision.data import Mixup
from flowvision.loss.cross_entropy import SoftTargetCrossEntropy

# Refine data path to imagenet
dataloader.train.dataset[0].root = "/ssd/dataset/ImageNet/extract"
dataloader.test[0].dataset.root = "/ssd/dataset/ImageNet/extract"

# Add Mixup Func
dataloader.train.mixup_func = LazyCall(Mixup)(
    mixup_alpha=0.8,
    cutmix_alpha=1.0,
    prob=1.0,
    switch_prob=0.5,
    mode="batch",
    num_classes=1000,
)

# Refine model cfg for vit training on imagenet
model.cfg.num_classes = 1000
model.cfg.loss_func = SoftTargetCrossEntropy()

# Refine optimizer cfg for vit model
optim.lr = 1e-3
optim.eps = 1e-8
optim.weight_decay = 0.05
optim.params.clip_grad_max_norm = None
optim.params.clip_grad_norm_type = None

# Refine train cfg for vit model
train.train_micro_batch_size = 128
train.test_micro_batch_size = 128
train.train_epoch = 300
train.warmup_ratio = 20 / 300
train.eval_period = 1562
train.log_period = 100

# Scheduler
train.scheduler.warmup_factor = 0.001
train.scheduler.alpha = 0.01
train.scheduler.warmup_method = "linear"

# Set fp16 ON
train.amp.enabled = True

[03/05 19:52:36 libai]: Full config saved to test_logs/oneflow-28/NVIDIA_GeForce_RTX_3080_Ti/7d07caf/LibAI_swin_imagenet_graph_nl12_nah12_hs768_FP16_actrue_DP4_MP1_PP2_zerotrue_stage2_mbs64_gbs2048_acc8_1n8g/config.yaml
[03/05 19:52:36 lb.engine.default]: > compiling dataset index builder ...
make: Entering directory '/ssd/home/ouyangyu/libai_week_test/libai/libai/data/data_utils'
make: Nothing to be done for 'default'.
make: Leaving directory '/ssd/home/ouyangyu/libai_week_test/libai/libai/data/data_utils'
[03/05 19:52:36 lb.engine.default]: >>> done with dataset index builder. Compilation time: 0.052 seconds
[03/05 19:52:36 lb.engine.default]: >>> done with compiling. Compilation time: 0.054 seconds
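The command-line overrides above fully determine the parallel layout and effective batch size for this run. A minimal sketch of the arithmetic in plain Python (not LibAI API; variable names are ours, and the rounding rule for the warmup iterations is an assumption) that reproduces the DP4/MP1/PP2, gbs2048 and warmup_iter=15 figures seen in this log:

# Sketch: reproduce the parallel layout and batch-size figures implied by the overrides above.
world_size = 8
tensor_parallel_size = 1       # train.dist.tensor_parallel_size=1
pipeline_parallel_size = 2     # train.dist.pipeline_parallel_size=2
data_parallel_size = world_size // (tensor_parallel_size * pipeline_parallel_size)  # -> 4 ("DP4" in the output dir)

micro_batch_size = 64          # train.train_micro_batch_size=64
accumulation_steps = 8         # train.num_accumulation_steps=8
global_batch_size = micro_batch_size * data_parallel_size * accumulation_steps      # -> 2048 (train.global_batch_size)

train_iter = 220               # train.train_iter=220
warmup_iter = round(train_iter * 20 / 300)  # -> 15; rounding mode is an assumption, matches "train.warmup_iter=15"

print(data_parallel_size, global_batch_size, warmup_iter)  # 4 2048 15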
[03/05 19:52:36 lb.engine.default]: Prepare training, validating, testing set
[03/05 19:52:40 lb.engine.default]: Prepare testing set
[03/05 19:52:49 lb.engine.default]: Auto-scaling the config to train.train_iter=220, train.warmup_iter=15
[03/05 19:52:49 libai]: > Start building model...
W20230305 19:52:52.323129 1970150 eager_local_op_interpreter.cpp:256] Casting a local tensor to a global tensor with Broadcast sbp will modify the data of input! If you want to keep the input local tensor unchanged, please set the arg copy to True.
[03/05 19:52:54 lb.engine.default]: Model: SwinTransformer(
  (patch_embed): PatchEmbed(
    (proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
    (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
  )
  (pos_drop): Dropout(p=0.0, inplace=False)
  (layers): ModuleList(
    (0): BasicLayer(
      (blocks): ModuleList(
        (0): SwinTransformerBlock(
          (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=384, out_features=96, bias=True, parallel=row)
          )
        )
        (1): SwinTransformerBlock(
          (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=384, out_features=96, bias=True, parallel=row)
          )
        )
      )
      (downsample): PatchMerging(
        (reduction): Linear1D(in_features=384, out_features=192, bias=False, parallel=data)
        (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
      )
    )
    (1): BasicLayer(
      (blocks): ModuleList(
        (0): SwinTransformerBlock(
          (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row)
          )
        )
        (1): SwinTransformerBlock(
          (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row)
          )
        )
      )
      (downsample): PatchMerging(
        (reduction): Linear1D(in_features=768, out_features=384, bias=False, parallel=data)
        (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
    )
    (2): BasicLayer(
      (blocks): ModuleList(
        (0): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
        (1): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
        (2): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
        (3): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
        (4): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
        (5): SwinTransformerBlock(
          (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row)
          )
        )
      )
      (downsample): PatchMerging(
        (reduction): Linear1D(in_features=1536, out_features=768, bias=False, parallel=data)
        (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
      )
    )
    (3): BasicLayer(
      (blocks): ModuleList(
        (0): SwinTransformerBlock(
          (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row)
          )
        )
        (1): SwinTransformerBlock(
          (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): WindowAttention(
            (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (softmax): Softmax(dim=-1)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): MLP(
            bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0
            (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col)
            (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row)
          )
        )
      )
    )
  )
  (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  (avgpool): AdaptiveAvgPool1d()
  (head): Linear1D(in_features=768, out_features=1000, bias=True, parallel=data)
  (loss_func): SoftTargetCrossEntropy()
)
[03/05 19:52:54 libai]: >>> done with building model. Building time: 4.340 seconds
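The shapes printed in the repr above follow the usual Swin-Tiny stage geometry (depths 2/2/6/2, embedding dim doubling per stage). A small sketch, assuming the standard Swin-Tiny hyper-parameters (embed_dim=96, mlp_ratio=4; these values are taken from the repr, not read programmatically), that reproduces the Linear1D sizes seen in each stage:

# Sketch: derive the per-stage feature sizes that appear in the module repr above.
embed_dim = 96
depths = [2, 2, 6, 2]
mlp_ratio = 4

for stage, depth in enumerate(depths):
    dim = embed_dim * (2 ** stage)   # 96, 192, 384, 768
    qkv_out = 3 * dim                # 288, 576, 1152, 2304 -> qkv Linear1D out_features
    mlp_hidden = mlp_ratio * dim     # 384, 768, 1536, 3072 -> dense_h_to_4h out_features
    merge_in = 4 * dim               # PatchMerging reduction in_features (the last stage has no PatchMerging)
    print(stage, depth, dim, qkv_out, mlp_hidden, merge_in)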
[03/05 19:52:54 lb.engine.trainer]: Starting training from iteration 0
[03/05 19:52:56 lb.models.utils.graph_base]: Start compiling the train graph which may take some time. Please wait for a moment ...
W20230305 19:53:09.016754 1970152 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:09.090948 1970155 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:09.702463 1970154 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:09.854228 1970158 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:09.950340 1970150 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:09.957793 1970153 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:10.007398 1970151 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
W20230305 19:53:10.057950 1970160 insert_nccl_logical_op_pass.cpp:1150] In Graph: GraphBase_0 Placement: cuda-@0:0-@1:1-@2:2-@3:3 the total_op_num = 1125 and has 2 different nccl stream which is possible to trigger cuda stream kernel launch upper limit. So the nccl logical kernel will from async to sync exec, which may affect performance.
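The CSV block that follows is periodic GPU monitoring output recorded alongside the run; its columns match an nvidia-smi --query-gpu request. A sketch of collecting the same fields from Python (assuming nvidia-smi is on PATH; this snippet is illustrative and is not part of the LibAI run itself):

# Sketch: query the same GPU fields as the CSV monitoring output below.
import subprocess

fields = ("timestamp,name,driver_version,utilization.gpu,utilization.memory,"
          "memory.total,memory.free,memory.used")
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv"],
    capture_output=True, text=True, check=True,
).stdout
print(out)  # one header line plus one CSV row per visible GPU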
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2023/03/05 20:00:26.319, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.322, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.324, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.325, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.326, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.328, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.328, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.329, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.332, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.333, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.334, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.337, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.337, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.339, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.341, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.341, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.342, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.344, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.345, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.345, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.351, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.351, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.354, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.357, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.358, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.358, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.363, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.363, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.364, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.367, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 62 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.368, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.371, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.372, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.371, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.376, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.377, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.378, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.380, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.379, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.380, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 62 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.381, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:26.382, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.385, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.385, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 23 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.388, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 62 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.390, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.390, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 1 %, 1 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:26.392, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.393, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.394, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 3 %, 1 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:26.395, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.398, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 62 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:26.401, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.403, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 5 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.407, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 44 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:26.409, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:29.051, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 77 %, 36 %, 12288 MiB, 6342 MiB, 5711 MiB
2023/03/05 20:00:29.053, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 82 %, 43 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:29.055, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 29 %, 15 %, 12288 MiB, 6941 MiB, 5112 MiB
2023/03/05 20:00:29.056, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 21 %, 13 %, 12288 MiB, 6943 MiB, 5110 MiB
2023/03/05 20:00:29.057, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 19 %, 0 %, 12288 MiB, 9595 MiB, 2458 MiB
2023/03/05 20:00:29.059, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:29.060, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 30 %, 4 %, 12288 MiB, 9591 MiB, 2462 MiB
2023/03/05 20:00:29.060, NVIDIA GeForce RTX 3080 Ti, 515.65.01, 0 %, 0 %, 12288 MiB, 9591 MiB, 2462 MiB
[03/05 20:00:32 lb.utils.events]: eta: 0:07:33 iteration: 99/220 consumed_samples: 204800 total_loss: 6.89 time: 4.3703 s/iter data_time: 0.6721 s/iter total_throughput: 468.62 samples/s lr: 5.82e-04
[03/05 20:06:57 lb.utils.events]: eta: 0:01:15 iteration: 199/220 consumed_samples: 409600 total_loss: 6.855 time: 4.1098 s/iter data_time: 1.0404 s/iter total_throughput: 498.32 samples/s lr: 3.21e-05
[03/05 20:08:09 lb.utils.events]: eta: 0:00:00 iteration: 219/220 consumed_samples: 450560 total_loss: 6.848 time: 4.0628 s/iter data_time: 1.0844 s/iter total_throughput: 504.08 samples/s lr: 1.01e-05
[03/05 20:08:09 lb.engine.hooks]: Overall training speed: 218 iterations in 0:14:45 (4.0629 s / it)
[03/05 20:08:09 lb.engine.hooks]: Total training time: 0:14:45 (0:00:00 on hooks)
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
oneflow-version(git_commit)=0.9.1.dev20230304+cu117
oneflow-commit(git_commit)=7d07caf
oneflow-libai(git_commit)=50a973dc5de635b8613ad7666c073c763e238850
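As a consistency check on the figures above, the reported throughput and wall-clock time follow directly from the global batch size and the per-iteration time. A quick sketch in plain Python (all values are copied from the log lines above; nothing here is measured):

# Sketch: sanity-check the logged throughput and total training time.
global_batch_size = 2048
sec_per_iter = 4.0628          # "time: 4.0628 s/iter" at iteration 219
iters_timed = 218              # "Overall training speed: 218 iterations"

throughput = global_batch_size / sec_per_iter
print(f"{throughput:.2f} samples/s")      # ~504.08, matching "total_throughput: 504.08 samples/s"

total_seconds = iters_timed * 4.0629       # "(4.0629 s / it)"
print(f"{total_seconds:.0f} s")            # ~886 s, i.e. roughly the reported 0:14:45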