loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
------------------------ arguments ------------------------
batch_size ...................................... 131072
batch_size_per_proc ............................. 131072
data_dir ........................................ /dataset/f9f659c5/wdl_ofrecord
data_part_name_suffix_length .................... 5
data_part_num ................................... 256
dataset_format .................................. ofrecord
ddp ............................................. True
deep_dropout_rate ............................... 0.5
deep_embedding_vec_size ......................... 16
deep_vocab_size ................................. 2322444
eval_after_training ............................. False
eval_batchs ..................................... 20
eval_interval ................................... 0
execution_mode .................................. eager
hidden_size ..................................... 1024
hidden_units_num ................................ 2
learning_rate ................................... 0.001
loss_print_every_n_iter ......................... 100
max_iter ........................................ 1100
model_load_dir ..................................
model_save_dir .................................. ./checkpoint
num_deep_sparse_fields .......................... 26
num_dense_fields ................................ 13
num_wide_sparse_fields .......................... 2
save_initial_model .............................. False
save_model_after_each_eval ...................... False
test_name ....................................... noname_test
wide_vocab_size ................................. 2322444
-------------------- end of arguments ---------------------
[rank:0] iter: 100/1100, loss: 0.5025979876518250, latency(ms): 364.7471006214618683 | 2022-04-05 01:49:58.739
timestamp, name, driver_version, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2022/04/05 01:49:58.759, Tesla V100-SXM2-32GB, 470.57.02, 51 %, 21 %, 32510 MiB, 27118 MiB, 5392 MiB
[rank:0] iter: 200/1100, loss: 0.4670323431491852, latency(ms): 312.1553770452737808 | 2022-04-05 01:50:29.955
[rank:0] iter: 300/1100, loss: 0.4597111046314240, latency(ms): 306.2256181240081787 | 2022-04-05 01:51:00.577
[rank:0] iter: 400/1100, loss: 0.4519645571708679, latency(ms): 311.7322466522455215 | 2022-04-05 01:51:31.751
[rank:0] iter: 500/1100, loss: 0.4508184492588043, latency(ms): 300.1015184074640274 | 2022-04-05 01:52:01.761
[rank:0] iter: 600/1100, loss: 0.4457454681396484, latency(ms): 302.7083187922835350 | 2022-04-05 01:52:32.032
[rank:0] iter: 700/1100, loss: 0.4400880932807922, latency(ms): 301.3959471881389618 | 2022-04-05 01:53:02.171
[rank:0] iter: 800/1100, loss: 0.4416058063507080, latency(ms): 300.1133351027965546 | 2022-04-05 01:53:32.183
[rank:0] iter: 900/1100, loss: 0.4336046874523163, latency(ms): 300.0575577095150948 | 2022-04-05 01:54:02.188
[rank:0] iter: 1000/1100, loss: 0.4307317733764648, latency(ms): 302.0426918193697929 | 2022-04-05 01:54:32.393
[rank:0] iter: 1100/1100, loss: 0.4328159093856812, latency(ms): 305.7435703277587891 | 2022-04-05 01:55:02.967