Script started on Thursday 20 October 2016 07:19:06 PM IST
hans@hans-Lenovo-IdeaPad-Y500: ~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model
using CUDA on GPU 1...
loading data...
done!
Source vocab size: 50004, Target vocab size: 150004
Source max sent len: 50, Target max sent len: 52
Number of additional features on source side: 0
Switching on memory preallocation
Number of parameters: 84236504 (active: 84236504)
Epoch: 1, Batch: 50/11961, Batch size: 16, LR: 0.1000, PPL: 9760756027.98, |Param|: 1492.43, |GParam|: 52.79, Training: 91/17/73 total/source/target tokens/sec
Epoch: 1, Batch: 100/11961, Batch size: 16, LR: 0.1000, PPL: 781739548.98, |Param|: 1525.25, |GParam|: 36.10, Training: 96/23/73 total/source/target tokens/sec
Epoch: 1, Batch: 150/11961, Batch size: 16, LR: 0.1000, PPL: 198282910.29, |Param|: 1539.39, |GParam|: 29.43, Training: 99/27/72 total/source/target tokens/sec
Epoch: 1, Batch: 200/11961, Batch size: 16, LR: 0.1000, PPL: 69439114.08, |Param|: 1574.14, |GParam|: 50.12, Training: 101/29/72 total/source/target tokens/sec
Epoch: 1, Batch: 250/11961, Batch size: 16, LR: 0.1000, PPL: 32040293.20, |Param|: 1599.76, |GParam|: 30.52, Training: 103/30/72 total/source/target tokens/sec
Epoch: 1, Batch: 300/11961, Batch size: 16, LR: 0.1000, PPL: 16857440.28, |Param|: 1633.76, |GParam|: 31.80, Training: 104/32/71 total/source/target tokens/sec
Epoch: 1, Batch: 350/11961, Batch size: 16, LR: 0.1000, PPL: 8440380.57, |Param|: 1677.64, |GParam|: 31.49, Training: 105/33/71 total/source/target tokens/sec
Epoch: 1, Batch: 400/11961, Batch size: 16, LR: 0.1000, PPL: 5225945.55, |Param|: 1717.95, |GParam|: 155.25, Training: 106/34/71 total/source/target tokens/sec
Epoch: 1, Batch: 450/11961, Batch size: 16, LR: 0.1000, PPL: 3305570.73, |Param|: 1780.78, |GParam|: 57.80, Training: 107/35/71 total/source/target tokens/sec
Epoch: 1, Batch: 500/11961,
Batch size: 16, LR: 0.1000, PPL: 2351523.43, |Param|: 1819.88, |GParam|: 33.01, Training: 107/36/71 total/source/target tokens/sec Epoch: 1, Batch: 550/11961, Batch size: 16, LR: 0.1000, PPL: 1786160.88, |Param|: 1855.07, |GParam|: 199.94, Training: 108/37/71 total/source/target tokens/sec Epoch: 1, Batch: 600/11961, Batch size: 16, LR: 0.1000, PPL: 1425461.03, |Param|: 1895.04, |GParam|: 26.97, Training: 109/37/71 total/source/target tokens/sec Epoch: 1, Batch: 650/11961, Batch size: 16, LR: 0.1000, PPL: 1207773.65, |Param|: 1925.25, |GParam|: 24.32, Training: 109/38/71 total/source/target tokens/sec Epoch: 1, Batch: 700/11961, Batch size: 16, LR: 0.1000, PPL: 1026115.91, |Param|: 1951.99, |GParam|: 33.10, Training: 110/38/71 total/source/target tokens/sec Epoch: 1, Batch: 750/11961, Batch size: 16, LR: 0.1000, PPL: 888780.29, |Param|: 1970.96, |GParam|: 25.26, Training: 110/39/71 total/source/target tokens/sec Epoch: 1, Batch: 800/11961, Batch size: 16, LR: 0.1000, PPL: 784672.86, |Param|: 1987.17, |GParam|: 29.38, Training: 111/39/71 total/source/target tokens/sec Epoch: 1, Batch: 850/11961, Batch size: 16, LR: 0.1000, PPL: 692913.19, |Param|: 2014.80, |GParam|: 51.22, Training: 111/40/71 total/source/target tokens/sec Epoch: 1, Batch: 900/11961, Batch size: 16, LR: 0.1000, PPL: 611825.25, |Param|: 2033.51, |GParam|: 28.75, Training: 112/41/71 total/source/target tokens/sec Epoch: 1, Batch: 950/11961, Batch size: 16, LR: 0.1000, PPL: 541493.88, |Param|: 2054.81, |GParam|: 24.51, Training: 112/41/71 total/source/target tokens/sec Epoch: 1, Batch: 1000/11961, Batch size: 16, LR: 0.1000, PPL: 489838.11, |Param|: 2069.47, |GParam|: 31.09, Training: 112/41/71 total/source/target tokens/sec Epoch: 1, Batch: 1050/11961, Batch size: 16, LR: 0.1000, PPL: 442902.63, |Param|: 2094.37, |GParam|: 26.77, Training: 113/42/71 total/source/target tokens/sec Epoch: 1, Batch: 1100/11961, Batch size: 16, LR: 0.1000, PPL: 406865.32, |Param|: 2119.09, |GParam|: 40.04, Training: 
113/42/71 total/source/target tokens/sec Epoch: 1, Batch: 1150/11961, Batch size: 16, LR: 0.1000, PPL: 373234.70, |Param|: 2134.49, |GParam|: 31.03, Training: 114/43/71 total/source/target tokens/sec Epoch: 1, Batch: 1200/11961, Batch size: 16, LR: 0.1000, PPL: 344561.53, |Param|: 2157.66, |GParam|: 35.83, Training: 114/43/71 total/source/target tokens/sec Epoch: 1, Batch: 1250/11961, Batch size: 16, LR: 0.1000, PPL: 317501.06, |Param|: 2187.22, |GParam|: 26.91, Training: 114/43/70 total/source/target tokens/sec Epoch: 1, Batch: 1300/11961, Batch size: 16, LR: 0.1000, PPL: 293374.70, |Param|: 2211.05, |GParam|: 20.50, Training: 114/43/70 total/source/target tokens/sec Epoch: 1, Batch: 1350/11961, Batch size: 16, LR: 0.1000, PPL: 272437.01, |Param|: 2273.10, |GParam|: 20.24, Training: 115/44/70 total/source/target tokens/sec Epoch: 1, Batch: 1400/11961, Batch size: 16, LR: 0.1000, PPL: 252708.00, |Param|: 2317.82, |GParam|: 20.77, Training: 115/44/70 total/source/target tokens/sec Epoch: 1, Batch: 1450/11961, Batch size: 16, LR: 0.1000, PPL: 234315.28, |Param|: 2402.63, |GParam|: 17.40, Training: 115/45/70 total/source/target tokens/sec Epoch: 1, Batch: 1500/11961, Batch size: 16, LR: 0.1000, PPL: 218744.68, |Param|: 2500.08, |GParam|: 15.95, Training: 116/45/70 total/source/target tokens/sec Epoch: 1, Batch: 1550/11961, Batch size: 16, LR: 0.1000, PPL: 202407.88, |Param|: 2603.47, |GParam|: 29.93, Training: 116/45/70 total/source/target tokens/sec Epoch: 1, Batch: 1600/11961, Batch size: 16, LR: 0.1000, PPL: 186070.18, |Param|: 2703.16, |GParam|: 17.41, Training: 116/45/70 total/source/target tokens/sec Epoch: 1, Batch: 1650/11961, Batch size: 16, LR: 0.1000, PPL: 170317.83, |Param|: 2768.73, |GParam|: 33.90, Training: 116/45/70 total/source/target tokens/sec Epoch: 1, Batch: 1700/11961, Batch size: 16, LR: 0.1000, PPL: 155934.95, |Param|: 2813.45, |GParam|: 22.21, Training: 117/46/70 total/source/target tokens/sec Epoch: 1, Batch: 1750/11961, Batch size: 16, LR: 
0.1000, PPL: 143552.46, |Param|: 2842.71, |GParam|: 17.09, Training: 117/46/70 total/source/target tokens/sec Epoch: 1, Batch: 1800/11961, Batch size: 16, LR: 0.1000, PPL: 131534.06, |Param|: 2884.29, |GParam|: 18.21, Training: 117/46/70 total/source/target tokens/sec Epoch: 1, Batch: 1850/11961, Batch size: 16, LR: 0.1000, PPL: 121425.11, |Param|: 2914.86, |GParam|: 27.51, Training: 117/46/70 total/source/target tokens/sec Epoch: 1, Batch: 1900/11961, Batch size: 16, LR: 0.1000, PPL: 112412.24, |Param|: 2957.91, |GParam|: 22.95, Training: 117/47/70 total/source/target tokens/sec Epoch: 1, Batch: 1950/11961, Batch size: 16, LR: 0.1000, PPL: 104347.67, |Param|: 2996.45, |GParam|: 26.89, Training: 118/47/70 total/source/target tokens/sec Epoch: 1, Batch: 2000/11961, Batch size: 16, LR: 0.1000, PPL: 97074.00, |Param|: 3030.32, |GParam|: 17.52, Training: 118/47/70 total/source/target tokens/sec Epoch: 1, Batch: 2050/11961, Batch size: 16, LR: 0.1000, PPL: 90625.13, |Param|: 3055.64, |GParam|: 20.97, Training: 118/47/70 total/source/target tokens/sec Epoch: 1, Batch: 2100/11961, Batch size: 16, LR: 0.1000, PPL: 85192.15, |Param|: 3075.55, |GParam|: 36.70, Training: 118/47/70 total/source/target tokens/sec Epoch: 1, Batch: 2150/11961, Batch size: 16, LR: 0.1000, PPL: 79908.68, |Param|: 3109.03, |GParam|: 29.87, Training: 118/48/70 total/source/target tokens/sec Epoch: 1, Batch: 2200/11961, Batch size: 16, LR: 0.1000, PPL: 75464.99, |Param|: 3133.45, |GParam|: 31.74, Training: 119/48/70 total/source/target tokens/sec Epoch: 1, Batch: 2250/11961, Batch size: 16, LR: 0.1000, PPL: 71002.95, |Param|: 3157.71, |GParam|: 20.48, Training: 119/48/70 total/source/target tokens/sec Epoch: 1, Batch: 2300/11961, Batch size: 16, LR: 0.1000, PPL: 67253.25, |Param|: 3181.75, |GParam|: 18.52, Training: 119/48/70 total/source/target tokens/sec Epoch: 1, Batch: 2350/11961, Batch size: 16, LR: 0.1000, PPL: 63818.31, |Param|: 3199.38, |GParam|: 38.23, Training: 119/48/70 total/source/target 
tokens/sec Epoch: 1, Batch: 2400/11961, Batch size: 16, LR: 0.1000, PPL: 60743.26, |Param|: 3221.21, |GParam|: 22.90, Training: 119/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2450/11961, Batch size: 16, LR: 0.1000, PPL: 58005.56, |Param|: 3240.70, |GParam|: 13.11, Training: 119/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2500/11961, Batch size: 16, LR: 0.1000, PPL: 55473.70, |Param|: 3262.39, |GParam|: 34.70, Training: 120/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2550/11961, Batch size: 16, LR: 0.1000, PPL: 53071.07, |Param|: 3277.64, |GParam|: 34.18, Training: 120/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2600/11961, Batch size: 16, LR: 0.1000, PPL: 50917.91, |Param|: 3294.66, |GParam|: 18.02, Training: 120/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2650/11961, Batch size: 16, LR: 0.1000, PPL: 48871.50, |Param|: 3302.97, |GParam|: 30.21, Training: 120/49/70 total/source/target tokens/sec Epoch: 1, Batch: 2700/11961, Batch size: 16, LR: 0.1000, PPL: 47185.55, |Param|: 3318.66, |GParam|: 27.48, Training: 120/50/70 total/source/target tokens/sec Epoch: 1, Batch: 2750/11961, Batch size: 16, LR: 0.1000, PPL: 45409.50, |Param|: 3331.03, |GParam|: 31.71, Training: 120/50/70 total/source/target tokens/sec Epoch: 1, Batch: 2800/11961, Batch size: 16, LR: 0.1000, PPL: 43785.92, |Param|: 3344.61, |GParam|: 15.76, Training: 120/50/70 total/source/target tokens/sec Epoch: 1, Batch: 2850/11961, Batch size: 16, LR: 0.1000, PPL: 42266.89, |Param|: 3360.23, |GParam|: 23.17, Training: 121/50/70 total/source/target tokens/sec Epoch: 1, Batch: 2900/11961, Batch size: 16, LR: 0.1000, PPL: 40869.19, |Param|: 3367.25, |GParam|: 34.42, Training: 121/50/70 total/source/target tokens/sec Epoch: 1, Batch: 2950/11961, Batch size: 16, LR: 0.1000, PPL: 39512.09, |Param|: 3378.93, |GParam|: 20.65, Training: 121/50/70 total/source/target tokens/sec Epoch: 1, Batch: 3000/11961, Batch size: 16, LR: 0.1000, PPL: 38179.40, |Param|: 3389.34, 
|GParam|: 28.28, Training: 121/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3050/11961, Batch size: 16, LR: 0.1000, PPL: 37003.45, |Param|: 3402.76, |GParam|: 44.36, Training: 121/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3100/11961, Batch size: 16, LR: 0.1000, PPL: 35884.09, |Param|: 3412.07, |GParam|: 20.04, Training: 121/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3150/11961, Batch size: 16, LR: 0.1000, PPL: 34824.76, |Param|: 3416.96, |GParam|: 30.95, Training: 121/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3200/11961, Batch size: 16, LR: 0.1000, PPL: 33833.11, |Param|: 3422.76, |GParam|: 34.37, Training: 122/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3250/11961, Batch size: 16, LR: 0.1000, PPL: 32887.52, |Param|: 3431.40, |GParam|: 14.76, Training: 122/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3300/11961, Batch size: 16, LR: 0.1000, PPL: 31987.74, |Param|: 3441.65, |GParam|: 32.04, Training: 122/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3350/11961, Batch size: 16, LR: 0.1000, PPL: 31124.43, |Param|: 3448.91, |GParam|: 14.75, Training: 122/51/70 total/source/target tokens/sec Epoch: 1, Batch: 3400/11961, Batch size: 16, LR: 0.1000, PPL: 30343.92, |Param|: 3456.80, |GParam|: 31.17, Training: 122/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3450/11961, Batch size: 16, LR: 0.1000, PPL: 29586.80, |Param|: 3464.15, |GParam|: 27.44, Training: 122/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3500/11961, Batch size: 16, LR: 0.1000, PPL: 28872.67, |Param|: 3470.11, |GParam|: 22.15, Training: 122/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3550/11961, Batch size: 16, LR: 0.1000, PPL: 28197.09, |Param|: 3477.93, |GParam|: 25.38, Training: 122/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3600/11961, Batch size: 16, LR: 0.1000, PPL: 27546.57, |Param|: 3480.89, |GParam|: 48.66, Training: 123/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3650/11961, Batch 
size: 16, LR: 0.1000, PPL: 26960.08, |Param|: 3483.98, |GParam|: 29.90, Training: 123/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3700/11961, Batch size: 16, LR: 0.1000, PPL: 26359.14, |Param|: 3489.95, |GParam|: 19.47, Training: 123/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3750/11961, Batch size: 16, LR: 0.1000, PPL: 25809.87, |Param|: 3495.46, |GParam|: 19.84, Training: 123/52/70 total/source/target tokens/sec Epoch: 1, Batch: 3800/11961, Batch size: 16, LR: 0.1000, PPL: 25271.42, |Param|: 3500.36, |GParam|: 26.38, Training: 123/53/70 total/source/target tokens/sec Epoch: 1, Batch: 3850/11961, Batch size: 16, LR: 0.1000, PPL: 24752.44, |Param|: 3504.30, |GParam|: 23.34, Training: 123/53/70 total/source/target tokens/sec Epoch: 1, Batch: 3900/11961, Batch size: 16, LR: 0.1000, PPL: 24284.14, |Param|: 3508.03, |GParam|: 33.71, Training: 123/53/70 total/source/target tokens/sec Epoch: 1, Batch: 3950/11961, Batch size: 16, LR: 0.1000, PPL: 23822.97, |Param|: 3512.59, |GParam|: 25.32, Training: 123/53/70 total/source/target tokens/sec Epoch: 1, Batch: 4000/11961, Batch size: 16, LR: 0.1000, PPL: 23378.85, |Param|: 3519.20, |GParam|: 35.94, Training: 123/53/70 total/source/target tokens/sec Epoch: 1, Batch: 4050/11961, Batch size: 16, LR: 0.1000, PPL: 22970.72, |Param|: 3524.10, |GParam|: 62.00, Training: 124/53/70 total/source/target tokens/sec Epoch: 1, Batch: 4100/11961, Batch size: 16, LR: 0.1000, PPL: 22535.91, |Param|: 3529.89, |GParam|: 34.96, Training: 124/53/70 total/source/target tokens/sec Epoch: 1, Batch: 4150/11961, Batch size: 16, LR: 0.1000, PPL: 22133.70, |Param|: 3534.69, |GParam|: 15.45, Training: 124/53/70 total/source/target tokens/sec Epoch: 1, Batch: 4200/11961, Batch size: 16, LR: 0.1000, PPL: 21758.41, |Param|: 3539.70, |GParam|: 14.76, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4250/11961, Batch size: 16, LR: 0.1000, PPL: 21392.12, |Param|: 3544.87, |GParam|: 48.46, Training: 124/54/70 
total/source/target tokens/sec Epoch: 1, Batch: 4300/11961, Batch size: 16, LR: 0.1000, PPL: 21049.78, |Param|: 3550.19, |GParam|: 37.55, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4350/11961, Batch size: 16, LR: 0.1000, PPL: 20687.22, |Param|: 3553.12, |GParam|: 23.37, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4400/11961, Batch size: 16, LR: 0.1000, PPL: 20343.59, |Param|: 3556.38, |GParam|: 56.69, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4450/11961, Batch size: 16, LR: 0.1000, PPL: 20028.91, |Param|: 3560.25, |GParam|: 24.36, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4500/11961, Batch size: 16, LR: 0.1000, PPL: 19738.44, |Param|: 3562.31, |GParam|: 25.20, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4550/11961, Batch size: 16, LR: 0.1000, PPL: 19432.68, |Param|: 3565.94, |GParam|: 44.91, Training: 124/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4600/11961, Batch size: 16, LR: 0.1000, PPL: 19155.34, |Param|: 3571.01, |GParam|: 16.50, Training: 125/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4650/11961, Batch size: 16, LR: 0.1000, PPL: 18907.28, |Param|: 3573.81, |GParam|: 21.07, Training: 125/54/70 total/source/target tokens/sec Epoch: 1, Batch: 4700/11961, Batch size: 16, LR: 0.1000, PPL: 18652.95, |Param|: 3577.25, |GParam|: 20.42, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 4750/11961, Batch size: 16, LR: 0.1000, PPL: 18380.85, |Param|: 3582.19, |GParam|: 19.96, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 4800/11961, Batch size: 16, LR: 0.1000, PPL: 18117.08, |Param|: 3585.52, |GParam|: 24.84, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 4850/11961, Batch size: 16, LR: 0.1000, PPL: 17877.19, |Param|: 3588.34, |GParam|: 55.27, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 4900/11961, Batch size: 16, LR: 0.1000, PPL: 17647.91, 
|Param|: 3592.06, |GParam|: 17.45, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 4950/11961, Batch size: 16, LR: 0.1000, PPL: 17419.65, |Param|: 3594.24, |GParam|: 23.18, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5000/11961, Batch size: 16, LR: 0.1000, PPL: 17201.65, |Param|: 3597.98, |GParam|: 22.29, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5050/11961, Batch size: 16, LR: 0.1000, PPL: 16997.01, |Param|: 3601.37, |GParam|: 18.02, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5100/11961, Batch size: 16, LR: 0.1000, PPL: 16793.41, |Param|: 3604.49, |GParam|: 19.99, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5150/11961, Batch size: 16, LR: 0.1000, PPL: 16599.94, |Param|: 3607.53, |GParam|: 22.04, Training: 125/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5200/11961, Batch size: 16, LR: 0.1000, PPL: 16422.52, |Param|: 3610.76, |GParam|: 14.84, Training: 126/55/70 total/source/target tokens/sec Epoch: 1, Batch: 5250/11961, Batch size: 16, LR: 0.1000, PPL: 16250.54, |Param|: 3613.79, |GParam|: 17.59, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5300/11961, Batch size: 16, LR: 0.1000, PPL: 16060.75, |Param|: 3617.39, |GParam|: 14.33, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5350/11961, Batch size: 16, LR: 0.1000, PPL: 15904.40, |Param|: 3619.68, |GParam|: 19.11, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5400/11961, Batch size: 16, LR: 0.1000, PPL: 15729.42, |Param|: 3622.23, |GParam|: 21.42, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5450/11961, Batch size: 16, LR: 0.1000, PPL: 15562.52, |Param|: 3625.43, |GParam|: 23.71, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5500/11961, Batch size: 16, LR: 0.1000, PPL: 15404.20, |Param|: 3627.15, |GParam|: 23.64, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 
5550/11961, Batch size: 16, LR: 0.1000, PPL: 15242.70, |Param|: 3629.80, |GParam|: 42.00, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5600/11961, Batch size: 16, LR: 0.1000, PPL: 15090.21, |Param|: 3633.52, |GParam|: 43.70, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5650/11961, Batch size: 16, LR: 0.1000, PPL: 14941.49, |Param|: 3636.20, |GParam|: 36.06, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5700/11961, Batch size: 16, LR: 0.1000, PPL: 14796.37, |Param|: 3638.97, |GParam|: 45.44, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5750/11961, Batch size: 16, LR: 0.1000, PPL: 14647.09, |Param|: 3641.55, |GParam|: 15.11, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5800/11961, Batch size: 16, LR: 0.1000, PPL: 14503.58, |Param|: 3643.45, |GParam|: 41.54, Training: 126/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5850/11961, Batch size: 16, LR: 0.1000, PPL: 14367.76, |Param|: 3646.51, |GParam|: 29.30, Training: 127/56/70 total/source/target tokens/sec Epoch: 1, Batch: 5900/11961, Batch size: 16, LR: 0.1000, PPL: 14229.58, |Param|: 3648.90, |GParam|: 20.55, Training: 127/57/70 total/source/target tokens/sec Epoch: 1, Batch: 5950/11961, Batch size: 16, LR: 0.1000, PPL: 14092.06, |Param|: 3651.28, |GParam|: 36.70, Training: 127/57/70 total/source/target tokens/sec Epoch: 1, Batch: 6000/11961, Batch size: 16, LR: 0.1000, PPL: 13966.04, |Param|: 3654.00, |GParam|: 18.21, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6050/11961, Batch size: 16, LR: 0.1000, PPL: 13845.40, |Param|: 3656.71, |GParam|: 31.67, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6100/11961, Batch size: 16, LR: 0.1000, PPL: 13719.37, |Param|: 3659.01, |GParam|: 25.13, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6150/11961, Batch size: 16, LR: 0.1000, PPL: 13597.99, |Param|: 3660.93, |GParam|: 49.26, Training: 
127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6200/11961, Batch size: 16, LR: 0.1000, PPL: 13484.60, |Param|: 3662.66, |GParam|: 40.45, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6250/11961, Batch size: 16, LR: 0.1000, PPL: 13358.43, |Param|: 3665.36, |GParam|: 20.47, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6300/11961, Batch size: 16, LR: 0.1000, PPL: 13257.99, |Param|: 3668.09, |GParam|: 26.95, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6350/11961, Batch size: 16, LR: 0.1000, PPL: 13155.02, |Param|: 3670.72, |GParam|: 17.61, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6400/11961, Batch size: 16, LR: 0.1000, PPL: 13048.59, |Param|: 3673.44, |GParam|: 29.09, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6450/11961, Batch size: 16, LR: 0.1000, PPL: 12946.44, |Param|: 3675.74, |GParam|: 18.19, Training: 127/57/69 total/source/target tokens/sec Epoch: 1, Batch: 6500/11961, Batch size: 16, LR: 0.1000, PPL: 12840.66, |Param|: 3678.17, |GParam|: 36.61, Training: 127/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6550/11961, Batch size: 16, LR: 0.1000, PPL: 12731.78, |Param|: 3680.62, |GParam|: 27.39, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6600/11961, Batch size: 16, LR: 0.1000, PPL: 12631.24, |Param|: 3682.63, |GParam|: 26.35, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6650/11961, Batch size: 16, LR: 0.1000, PPL: 12531.37, |Param|: 3684.90, |GParam|: 23.49, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6700/11961, Batch size: 16, LR: 0.1000, PPL: 12444.95, |Param|: 3687.35, |GParam|: 30.80, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6750/11961, Batch size: 16, LR: 0.1000, PPL: 12351.70, |Param|: 3689.62, |GParam|: 36.92, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6800/11961, Batch size: 16, LR: 0.1000, PPL: 
12272.15, |Param|: 3692.23, |GParam|: 24.67, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6850/11961, Batch size: 16, LR: 0.1000, PPL: 12184.13, |Param|: 3694.83, |GParam|: 40.30, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6900/11961, Batch size: 16, LR: 0.1000, PPL: 12101.90, |Param|: 3697.14, |GParam|: 48.90, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 6950/11961, Batch size: 16, LR: 0.1000, PPL: 12016.17, |Param|: 3700.16, |GParam|: 24.05, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7000/11961, Batch size: 16, LR: 0.1000, PPL: 11935.10, |Param|: 3703.07, |GParam|: 20.58, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7050/11961, Batch size: 16, LR: 0.1000, PPL: 11844.25, |Param|: 3705.55, |GParam|: 18.77, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7100/11961, Batch size: 16, LR: 0.1000, PPL: 11760.77, |Param|: 3708.38, |GParam|: 41.75, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7150/11961, Batch size: 16, LR: 0.1000, PPL: 11680.19, |Param|: 3711.39, |GParam|: 21.77, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7200/11961, Batch size: 16, LR: 0.1000, PPL: 11601.90, |Param|: 3714.15, |GParam|: 39.18, Training: 128/58/69 total/source/target tokens/sec Epoch: 1, Batch: 7250/11961, Batch size: 16, LR: 0.1000, PPL: 11528.23, |Param|: 3716.82, |GParam|: 24.16, Training: 128/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7300/11961, Batch size: 16, LR: 0.1000, PPL: 11455.13, |Param|: 3720.03, |GParam|: 29.81, Training: 128/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7350/11961, Batch size: 16, LR: 0.1000, PPL: 11384.01, |Param|: 3723.54, |GParam|: 21.86, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7400/11961, Batch size: 16, LR: 0.1000, PPL: 11311.74, |Param|: 3726.14, |GParam|: 42.43, Training: 129/59/69 total/source/target tokens/sec Epoch: 
1, Batch: 7450/11961, Batch size: 16, LR: 0.1000, PPL: 11237.50, |Param|: 3728.37, |GParam|: 44.22, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7500/11961, Batch size: 16, LR: 0.1000, PPL: 11162.26, |Param|: 3730.83, |GParam|: 25.33, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7550/11961, Batch size: 16, LR: 0.1000, PPL: 11088.36, |Param|: 3733.30, |GParam|: 20.41, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7600/11961, Batch size: 16, LR: 0.1000, PPL: 11016.07, |Param|: 3736.79, |GParam|: 21.46, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7650/11961, Batch size: 16, LR: 0.1000, PPL: 10957.65, |Param|: 3739.76, |GParam|: 24.25, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7700/11961, Batch size: 16, LR: 0.1000, PPL: 10895.03, |Param|: 3742.51, |GParam|: 17.49, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7750/11961, Batch size: 16, LR: 0.1000, PPL: 10827.87, |Param|: 3745.73, |GParam|: 30.26, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7800/11961, Batch size: 16, LR: 0.1000, PPL: 10761.69, |Param|: 3748.53, |GParam|: 20.88, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7850/11961, Batch size: 16, LR: 0.1000, PPL: 10703.59, |Param|: 3751.31, |GParam|: 53.11, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7900/11961, Batch size: 16, LR: 0.1000, PPL: 10635.39, |Param|: 3754.23, |GParam|: 19.18, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 7950/11961, Batch size: 16, LR: 0.1000, PPL: 10567.87, |Param|: 3757.63, |GParam|: 37.55, Training: 129/59/69 total/source/target tokens/sec Epoch: 1, Batch: 8000/11961, Batch size: 16, LR: 0.1000, PPL: 10510.83, |Param|: 3759.96, |GParam|: 28.51, Training: 129/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8050/11961, Batch size: 16, LR: 0.1000, PPL: 10449.28, |Param|: 3762.79, |GParam|: 30.15, 
Training: 129/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8100/11961, Batch size: 16, LR: 0.1000, PPL: 10396.29, |Param|: 3765.43, |GParam|: 25.39, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8150/11961, Batch size: 16, LR: 0.1000, PPL: 10342.16, |Param|: 3768.53, |GParam|: 25.58, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8200/11961, Batch size: 16, LR: 0.1000, PPL: 10288.99, |Param|: 3771.33, |GParam|: 23.84, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8250/11961, Batch size: 9, LR: 0.1000, PPL: 10240.39, |Param|: 3774.53, |GParam|: 45.47, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8300/11961, Batch size: 16, LR: 0.1000, PPL: 10185.90, |Param|: 3777.77, |GParam|: 24.82, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8350/11961, Batch size: 16, LR: 0.1000, PPL: 10126.21, |Param|: 3780.56, |GParam|: 18.68, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8400/11961, Batch size: 16, LR: 0.1000, PPL: 10072.22, |Param|: 3783.61, |GParam|: 19.97, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8450/11961, Batch size: 16, LR: 0.1000, PPL: 10021.71, |Param|: 3786.48, |GParam|: 27.36, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8500/11961, Batch size: 16, LR: 0.1000, PPL: 9964.55, |Param|: 3788.92, |GParam|: 51.03, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8550/11961, Batch size: 16, LR: 0.1000, PPL: 9910.56, |Param|: 3792.13, |GParam|: 25.12, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8600/11961, Batch size: 16, LR: 0.1000, PPL: 9863.18, |Param|: 3794.67, |GParam|: 49.11, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8650/11961, Batch size: 16, LR: 0.1000, PPL: 9814.14, |Param|: 3797.00, |GParam|: 40.87, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8700/11961, Batch size: 16, LR: 0.1000, 
PPL: 9763.63, |Param|: 3800.13, |GParam|: 19.99, Training: 130/60/69 total/source/target tokens/sec Epoch: 1, Batch: 8750/11961, Batch size: 16, LR: 0.1000, PPL: 9714.63, |Param|: 3802.98, |GParam|: 40.94, Training: 130/61/69 total/source/target tokens/sec Epoch: 1, Batch: 8800/11961, Batch size: 16, LR: 0.1000, PPL: 9665.37, |Param|: 3805.65, |GParam|: 46.28, Training: 130/61/69 total/source/target tokens/sec Epoch: 1, Batch: 8850/11961, Batch size: 16, LR: 0.1000, PPL: 9615.19, |Param|: 3808.23, |GParam|: 24.19, Training: 130/61/69 total/source/target tokens/sec Epoch: 1, Batch: 8900/11961, Batch size: 16, LR: 0.1000, PPL: 9568.03, |Param|: 3810.74, |GParam|: 37.79, Training: 130/61/69 total/source/target tokens/sec Epoch: 1, Batch: 8950/11961, Batch size: 16, LR: 0.1000, PPL: 9522.97, |Param|: 3813.77, |GParam|: 29.07, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9000/11961, Batch size: 16, LR: 0.1000, PPL: 9479.63, |Param|: 3816.52, |GParam|: 32.71, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9050/11961, Batch size: 16, LR: 0.1000, PPL: 9432.36, |Param|: 3819.26, |GParam|: 47.86, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9100/11961, Batch size: 16, LR: 0.1000, PPL: 9395.33, |Param|: 3822.11, |GParam|: 20.54, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9150/11961, Batch size: 16, LR: 0.1000, PPL: 9351.51, |Param|: 3824.94, |GParam|: 36.28, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9200/11961, Batch size: 16, LR: 0.1000, PPL: 9305.61, |Param|: 3827.37, |GParam|: 36.66, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9250/11961, Batch size: 16, LR: 0.1000, PPL: 9259.24, |Param|: 3829.84, |GParam|: 19.73, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9300/11961, Batch size: 16, LR: 0.1000, PPL: 9214.15, |Param|: 3832.57, |GParam|: 34.51, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, 
Batch: 9350/11961, Batch size: 16, LR: 0.1000, PPL: 9172.37, |Param|: 3835.31, |GParam|: 43.18, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9400/11961, Batch size: 16, LR: 0.1000, PPL: 9125.86, |Param|: 3837.57, |GParam|: 33.48, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9450/11961, Batch size: 16, LR: 0.1000, PPL: 9088.40, |Param|: 3840.34, |GParam|: 37.71, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9500/11961, Batch size: 16, LR: 0.1000, PPL: 9052.25, |Param|: 3842.95, |GParam|: 19.13, Training: 131/61/69 total/source/target tokens/sec Epoch: 1, Batch: 9550/11961, Batch size: 16, LR: 0.1000, PPL: 9010.76, |Param|: 3845.39, |GParam|: 22.22, Training: 131/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9600/11961, Batch size: 16, LR: 0.1000, PPL: 8972.55, |Param|: 3847.86, |GParam|: 20.23, Training: 131/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9650/11961, Batch size: 16, LR: 0.1000, PPL: 8930.62, |Param|: 3850.33, |GParam|: 21.23, Training: 131/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9700/11961, Batch size: 16, LR: 0.1000, PPL: 8887.35, |Param|: 3852.76, |GParam|: 65.29, Training: 131/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9750/11961, Batch size: 16, LR: 0.1000, PPL: 8853.43, |Param|: 3855.38, |GParam|: 24.30, Training: 131/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9800/11961, Batch size: 16, LR: 0.1000, PPL: 8816.38, |Param|: 3858.15, |GParam|: 23.90, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9850/11961, Batch size: 16, LR: 0.1000, PPL: 8779.77, |Param|: 3860.81, |GParam|: 24.60, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9900/11961, Batch size: 16, LR: 0.1000, PPL: 8739.60, |Param|: 3863.01, |GParam|: 34.16, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 9950/11961, Batch size: 16, LR: 0.1000, PPL: 8701.43, |Param|: 3865.69, |GParam|: 38.05, Training: 132/62/69 
total/source/target tokens/sec Epoch: 1, Batch: 10000/11961, Batch size: 16, LR: 0.1000, PPL: 8663.82, |Param|: 3868.14, |GParam|: 33.51, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10050/11961, Batch size: 16, LR: 0.1000, PPL: 8627.17, |Param|: 3870.47, |GParam|: 18.95, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10100/11961, Batch size: 16, LR: 0.1000, PPL: 8593.76, |Param|: 3873.17, |GParam|: 28.40, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10150/11961, Batch size: 16, LR: 0.1000, PPL: 8556.48, |Param|: 3875.61, |GParam|: 21.78, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10200/11961, Batch size: 16, LR: 0.1000, PPL: 8519.93, |Param|: 3878.10, |GParam|: 38.16, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10250/11961, Batch size: 16, LR: 0.1000, PPL: 8486.54, |Param|: 3880.63, |GParam|: 42.85, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10300/11961, Batch size: 16, LR: 0.1000, PPL: 8451.90, |Param|: 3883.04, |GParam|: 24.74, Training: 132/62/69 total/source/target tokens/sec Epoch: 1, Batch: 10350/11961, Batch size: 16, LR: 0.1000, PPL: 8417.07, |Param|: 3885.40, |GParam|: 26.92, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10400/11961, Batch size: 16, LR: 0.1000, PPL: 8381.48, |Param|: 3887.72, |GParam|: 24.95, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10450/11961, Batch size: 16, LR: 0.1000, PPL: 8345.20, |Param|: 3890.07, |GParam|: 30.50, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10500/11961, Batch size: 16, LR: 0.1000, PPL: 8314.76, |Param|: 3892.60, |GParam|: 27.48, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10550/11961, Batch size: 16, LR: 0.1000, PPL: 8279.58, |Param|: 3895.04, |GParam|: 52.15, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10600/11961, Batch size: 16, LR: 0.1000, PPL: 8248.08, 
|Param|: 3897.27, |GParam|: 22.93, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10650/11961, Batch size: 16, LR: 0.1000, PPL: 8208.68, |Param|: 3899.59, |GParam|: 26.27, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10700/11961, Batch size: 16, LR: 0.1000, PPL: 8174.44, |Param|: 3901.87, |GParam|: 22.66, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10750/11961, Batch size: 16, LR: 0.1000, PPL: 8141.27, |Param|: 3904.25, |GParam|: 30.30, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10800/11961, Batch size: 16, LR: 0.1000, PPL: 8106.26, |Param|: 3906.33, |GParam|: 25.40, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10850/11961, Batch size: 16, LR: 0.1000, PPL: 8071.57, |Param|: 3908.82, |GParam|: 43.22, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10900/11961, Batch size: 16, LR: 0.1000, PPL: 8040.07, |Param|: 3911.29, |GParam|: 36.10, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 10950/11961, Batch size: 16, LR: 0.1000, PPL: 8005.31, |Param|: 3913.71, |GParam|: 26.50, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11000/11961, Batch size: 16, LR: 0.1000, PPL: 7972.50, |Param|: 3915.92, |GParam|: 35.14, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11050/11961, Batch size: 16, LR: 0.1000, PPL: 7939.09, |Param|: 3918.03, |GParam|: 39.80, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11100/11961, Batch size: 16, LR: 0.1000, PPL: 7906.87, |Param|: 3920.19, |GParam|: 32.39, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11150/11961, Batch size: 16, LR: 0.1000, PPL: 7878.65, |Param|: 3922.56, |GParam|: 39.54, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11200/11961, Batch size: 16, LR: 0.1000, PPL: 7844.98, |Param|: 3925.09, |GParam|: 23.82, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 
11250/11961, Batch size: 16, LR: 0.1000, PPL: 7812.56, |Param|: 3927.73, |GParam|: 23.69, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11300/11961, Batch size: 16, LR: 0.1000, PPL: 7775.94, |Param|: 3929.74, |GParam|: 42.56, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11350/11961, Batch size: 16, LR: 0.1000, PPL: 7743.51, |Param|: 3931.90, |GParam|: 29.73, Training: 132/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11400/11961, Batch size: 16, LR: 0.1000, PPL: 7705.48, |Param|: 3934.44, |GParam|: 29.01, Training: 133/63/69 total/source/target tokens/sec Epoch: 1, Batch: 11450/11961, Batch size: 16, LR: 0.1000, PPL: 7672.92, |Param|: 3936.75, |GParam|: 29.20, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11500/11961, Batch size: 16, LR: 0.1000, PPL: 7641.61, |Param|: 3938.89, |GParam|: 26.40, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11550/11961, Batch size: 16, LR: 0.1000, PPL: 7609.38, |Param|: 3941.31, |GParam|: 44.57, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11600/11961, Batch size: 16, LR: 0.1000, PPL: 7574.65, |Param|: 3943.78, |GParam|: 36.12, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11650/11961, Batch size: 16, LR: 0.1000, PPL: 7544.46, |Param|: 3946.17, |GParam|: 58.80, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11700/11961, Batch size: 16, LR: 0.1000, PPL: 7508.51, |Param|: 3948.58, |GParam|: 37.85, Training: 133/64/69 total/source/target tokens/sec Epoch: 1, Batch: 11750/11961, Batch size: 16, LR: 0.1000, PPL: 7473.52, |Param|: 3950.76, |GParam|: 37.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 1, Batch: 11800/11961, Batch size: 16, LR: 0.1000, PPL: 7434.59, |Param|: 3953.21, |GParam|: 74.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 1, Batch: 11850/11961, Batch size: 16, LR: 0.1000, PPL: 7390.42, |Param|: 3955.73, |GParam|: 28.50, Training: 
133/64/68 total/source/target tokens/sec Epoch: 1, Batch: 11900/11961, Batch size: 16, LR: 0.1000, PPL: 7352.60, |Param|: 3958.12, |GParam|: 29.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 1, Batch: 11950/11961, Batch size: 16, LR: 0.1000, PPL: 7311.31, |Param|: 3960.31, |GParam|: 40.45, Training: 133/64/68 total/source/target tokens/sec Train 7303.3308711641 Valid 2217.3230611991 saving checkpoint to demo-model_epoch1.00_2217.32.t7 hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master$ th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model using CUDA on GPU 1... loading data... done! Source vocab size: 50004, Target vocab size: 150004 Source max sent len: 50, Target max sent len: 52 Number of additional features on source side: 0 Switching on memory preallocation loading demo-model_epoch1.00_2217.32.t7... 
Number of parameters: 84236504 (active: 84236504) Epoch: 2, Batch: 50/11961, Batch size: 16, LR: 0.1000, PPL: 4838768824.50, |Param|: 3975.92, |GParam|: 38.16, Training: 91/17/73 total/source/target tokens/sec Epoch: 2, Batch: 100/11961, Batch size: 16, LR: 0.1000, PPL: 880249925.61, |Param|: 4015.28, |GParam|: 27.68, Training: 96/23/72 total/source/target tokens/sec Epoch: 2, Batch: 150/11961, Batch size: 16, LR: 0.1000, PPL: 264879553.82, |Param|: 4046.21, |GParam|: 49.78, Training: 99/27/72 total/source/target tokens/sec Epoch: 2, Batch: 200/11961, Batch size: 16, LR: 0.1000, PPL: 109227793.09, |Param|: 4068.61, |GParam|: 44.03, Training: 101/29/71 total/source/target tokens/sec Epoch: 2, Batch: 250/11961, Batch size: 16, LR: 0.1000, PPL: 66526226.47, |Param|: 4087.16, |GParam|: 28.99, Training: 102/30/71 total/source/target tokens/sec Epoch: 2, Batch: 300/11961, Batch size: 16, LR: 0.1000, PPL: 47088944.37, |Param|: 4108.59, |GParam|: 31.74, Training: 104/32/71 total/source/target tokens/sec Epoch: 2, Batch: 350/11961, Batch size: 16, LR: 0.1000, PPL: 29983354.77, |Param|: 4128.45, |GParam|: 52.24, Training: 105/33/71 total/source/target tokens/sec Epoch: 2, Batch: 400/11961, Batch size: 16, LR: 0.1000, PPL: 21500571.98, |Param|: 4146.79, |GParam|: 32.91, Training: 105/34/71 total/source/target tokens/sec Epoch: 2, Batch: 450/11961, Batch size: 16, LR: 0.1000, PPL: 16583410.74, |Param|: 4161.84, |GParam|: 34.75, Training: 106/35/71 total/source/target tokens/sec Epoch: 2, Batch: 500/11961, Batch size: 16, LR: 0.1000, PPL: 13306696.23, |Param|: 4177.40, |GParam|: 44.95, Training: 107/36/71 total/source/target tokens/sec Epoch: 2, Batch: 550/11961, Batch size: 16, LR: 0.1000, PPL: 11231166.80, |Param|: 4189.96, |GParam|: 41.09, Training: 108/37/71 total/source/target tokens/sec Epoch: 2, Batch: 600/11961, Batch size: 16, LR: 0.1000, PPL: 10461788.54, |Param|: 4203.96, |GParam|: 40.55, Training: 108/37/71 total/source/target tokens/sec Epoch: 2, Batch: 650/11961, 
Batch size: 16, LR: 0.1000, PPL: 9416502.18, |Param|: 4216.53, |GParam|: 42.16, Training: 109/38/71 total/source/target tokens/sec Epoch: 2, Batch: 700/11961, Batch size: 16, LR: 0.1000, PPL: 8327026.98, |Param|: 4229.09, |GParam|: 32.59, Training: 110/38/71 total/source/target tokens/sec Epoch: 2, Batch: 750/11961, Batch size: 16, LR: 0.1000, PPL: 7494849.67, |Param|: 4238.42, |GParam|: 55.44, Training: 110/39/71 total/source/target tokens/sec Epoch: 2, Batch: 800/11961, Batch size: 16, LR: 0.1000, PPL: 6723791.55, |Param|: 4248.66, |GParam|: 32.55, Training: 110/39/71 total/source/target tokens/sec Epoch: 2, Batch: 850/11961, Batch size: 16, LR: 0.1000, PPL: 6197310.44, |Param|: 4258.54, |GParam|: 35.54, Training: 111/40/71 total/source/target tokens/sec Epoch: 2, Batch: 900/11961, Batch size: 16, LR: 0.1000, PPL: 5814556.86, |Param|: 4268.71, |GParam|: 34.67, Training: 112/40/71 total/source/target tokens/sec Epoch: 2, Batch: 950/11961, Batch size: 16, LR: 0.1000, PPL: 5442788.76, |Param|: 4276.83, |GParam|: 31.44, Training: 112/41/71 total/source/target tokens/sec Epoch: 2, Batch: 1000/11961, Batch size: 16, LR: 0.1000, PPL: 5103346.08, |Param|: 4287.71, |GParam|: 26.77, Training: 112/41/71 total/source/target tokens/sec Epoch: 2, Batch: 1050/11961, Batch size: 16, LR: 0.1000, PPL: 4762990.21, |Param|: 4295.76, |GParam|: 29.72, Training: 113/42/71 total/source/target tokens/sec Epoch: 2, Batch: 1100/11961, Batch size: 16, LR: 0.1000, PPL: 4483828.82, |Param|: 4304.71, |GParam|: 73.94, Training: 113/42/70 total/source/target tokens/sec Epoch: 2, Batch: 1150/11961, Batch size: 16, LR: 0.1000, PPL: 4174935.69, |Param|: 4314.17, |GParam|: 42.85, Training: 113/43/70 total/source/target tokens/sec Epoch: 2, Batch: 1200/11961, Batch size: 16, LR: 0.1000, PPL: 3881684.11, |Param|: 4326.39, |GParam|: 28.52, Training: 114/43/70 total/source/target tokens/sec Epoch: 2, Batch: 1250/11961, Batch size: 16, LR: 0.1000, PPL: 3586027.50, |Param|: 4336.86, |GParam|: 24.53, 
Training: 114/43/70 total/source/target tokens/sec Epoch: 2, Batch: 1300/11961, Batch size: 16, LR: 0.1000, PPL: 3272803.69, |Param|: 4346.67, |GParam|: 21.81, Training: 114/43/70 total/source/target tokens/sec Epoch: 2, Batch: 1350/11961, Batch size: 16, LR: 0.1000, PPL: 2983300.42, |Param|: 4357.57, |GParam|: 30.68, Training: 115/44/70 total/source/target tokens/sec Epoch: 2, Batch: 1400/11961, Batch size: 16, LR: 0.1000, PPL: 2713163.65, |Param|: 4367.44, |GParam|: 38.85, Training: 115/44/70 total/source/target tokens/sec Epoch: 2, Batch: 1450/11961, Batch size: 16, LR: 0.1000, PPL: 2452952.84, |Param|: 4378.22, |GParam|: 48.15, Training: 115/44/70 total/source/target tokens/sec Epoch: 2, Batch: 1500/11961, Batch size: 16, LR: 0.1000, PPL: 2218312.16, |Param|: 4388.11, |GParam|: 29.60, Training: 116/45/70 total/source/target tokens/sec Epoch: 2, Batch: 1550/11961, Batch size: 16, LR: 0.1000, PPL: 1985136.83, |Param|: 4412.88, |GParam|: 32.66, Training: 116/45/70 total/source/target tokens/sec Epoch: 2, Batch: 1600/11961, Batch size: 16, LR: 0.1000, PPL: 1772242.43, |Param|: 4435.18, |GParam|: 41.94, Training: 116/45/70 total/source/target tokens/sec Epoch: 2, Batch: 1650/11961, Batch size: 16, LR: 0.1000, PPL: 1571927.39, |Param|: 4447.56, |GParam|: 75.26, Training: 116/45/70 total/source/target tokens/sec Epoch: 2, Batch: 1700/11961, Batch size: 16, LR: 0.1000, PPL: 1398845.96, |Param|: 4454.25, |GParam|: 58.90, Training: 116/46/70 total/source/target tokens/sec Epoch: 2, Batch: 1750/11961, Batch size: 16, LR: 0.1000, PPL: 1251162.43, |Param|: 4458.35, |GParam|: 78.37, Training: 116/46/70 total/source/target tokens/sec Epoch: 2, Batch: 1800/11961, Batch size: 16, LR: 0.1000, PPL: 1107787.84, |Param|: 4461.76, |GParam|: 48.72, Training: 117/46/70 total/source/target tokens/sec Epoch: 2, Batch: 1850/11961, Batch size: 16, LR: 0.1000, PPL: 991776.06, |Param|: 4466.26, |GParam|: 71.71, Training: 117/46/70 total/source/target tokens/sec Epoch: 2, Batch: 1900/11961, 
Batch size: 16, LR: 0.1000, PPL: 890671.49, |Param|: 4469.18, |GParam|: 39.07, Training: 117/46/70 total/source/target tokens/sec Epoch: 2, Batch: 1950/11961, Batch size: 16, LR: 0.1000, PPL: 798715.00, |Param|: 4476.79, |GParam|: 60.88, Training: 117/47/70 total/source/target tokens/sec Epoch: 2, Batch: 2000/11961, Batch size: 16, LR: 0.1000, PPL: 720789.20, |Param|: 4484.97, |GParam|: 42.08, Training: 118/47/70 total/source/target tokens/sec Epoch: 2, Batch: 2050/11961, Batch size: 16, LR: 0.1000, PPL: 652023.08, |Param|: 4491.22, |GParam|: 53.16, Training: 118/47/70 total/source/target tokens/sec Epoch: 2, Batch: 2100/11961, Batch size: 16, LR: 0.1000, PPL: 593071.28, |Param|: 4498.90, |GParam|: 83.75, Training: 118/47/70 total/source/target tokens/sec Epoch: 2, Batch: 2150/11961, Batch size: 16, LR: 0.1000, PPL: 537014.02, |Param|: 4506.03, |GParam|: 60.12, Training: 118/48/70 total/source/target tokens/sec Epoch: 2, Batch: 2200/11961, Batch size: 16, LR: 0.1000, PPL: 491575.20, |Param|: 4510.13, |GParam|: 62.57, Training: 118/48/70 total/source/target tokens/sec Epoch: 2, Batch: 2250/11961, Batch size: 16, LR: 0.1000, PPL: 448285.70, |Param|: 4516.09, |GParam|: 54.65, Training: 119/48/70 total/source/target tokens/sec Epoch: 2, Batch: 2300/11961, Batch size: 16, LR: 0.1000, PPL: 413488.51, |Param|: 4522.04, |GParam|: 30.18, Training: 119/48/70 total/source/target tokens/sec Epoch: 2, Batch: 2350/11961, Batch size: 16, LR: 0.1000, PPL: 382360.55, |Param|: 4526.25, |GParam|: 64.29, Training: 119/48/70 total/source/target tokens/sec Epoch: 2, Batch: 2400/11961, Batch size: 16, LR: 0.1000, PPL: 354825.39, |Param|: 4531.27, |GParam|: 60.91, Training: 119/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2450/11961, Batch size: 16, LR: 0.1000, PPL: 330916.14, |Param|: 4532.42, |GParam|: 79.08, Training: 119/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2500/11961, Batch size: 16, LR: 0.1000, PPL: 309086.51, |Param|: 4534.82, |GParam|: 42.34, Training: 
120/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2550/11961, Batch size: 16, LR: 0.1000, PPL: 289114.08, |Param|: 4536.67, |GParam|: 57.71, Training: 120/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2600/11961, Batch size: 16, LR: 0.1000, PPL: 271803.92, |Param|: 4539.46, |GParam|: 69.82, Training: 120/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2650/11961, Batch size: 16, LR: 0.1000, PPL: 255417.23, |Param|: 4541.10, |GParam|: 66.36, Training: 120/49/70 total/source/target tokens/sec Epoch: 2, Batch: 2700/11961, Batch size: 16, LR: 0.1000, PPL: 242183.79, |Param|: 4543.51, |GParam|: 72.45, Training: 120/50/70 total/source/target tokens/sec Epoch: 2, Batch: 2750/11961, Batch size: 16, LR: 0.1000, PPL: 228255.90, |Param|: 4546.00, |GParam|: 57.84, Training: 120/50/70 total/source/target tokens/sec Epoch: 2, Batch: 2800/11961, Batch size: 16, LR: 0.1000, PPL: 215864.52, |Param|: 4549.50, |GParam|: 92.83, Training: 120/50/70 total/source/target tokens/sec Epoch: 2, Batch: 2850/11961, Batch size: 16, LR: 0.1000, PPL: 204444.45, |Param|: 4552.57, |GParam|: 57.72, Training: 121/50/70 total/source/target tokens/sec Epoch: 2, Batch: 2900/11961, Batch size: 16, LR: 0.1000, PPL: 193873.00, |Param|: 4555.13, |GParam|: 60.72, Training: 121/50/70 total/source/target tokens/sec Epoch: 2, Batch: 2950/11961, Batch size: 16, LR: 0.1000, PPL: 183512.55, |Param|: 4558.61, |GParam|: 40.00, Training: 121/50/70 total/source/target tokens/sec Epoch: 2, Batch: 3000/11961, Batch size: 16, LR: 0.1000, PPL: 174029.36, |Param|: 4561.89, |GParam|: 71.26, Training: 121/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3050/11961, Batch size: 16, LR: 0.1000, PPL: 165503.35, |Param|: 4565.30, |GParam|: 55.75, Training: 121/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3100/11961, Batch size: 16, LR: 0.1000, PPL: 157597.62, |Param|: 4568.80, |GParam|: 91.64, Training: 121/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3150/11961, Batch size: 16, LR: 
0.1000, PPL: 150352.78, |Param|: 4575.07, |GParam|: 53.95, Training: 122/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3200/11961, Batch size: 16, LR: 0.1000, PPL: 143606.75, |Param|: 4583.33, |GParam|: 81.70, Training: 122/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3250/11961, Batch size: 16, LR: 0.1000, PPL: 137101.36, |Param|: 4591.28, |GParam|: 35.14, Training: 122/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3300/11961, Batch size: 16, LR: 0.1000, PPL: 130935.34, |Param|: 4599.11, |GParam|: 48.47, Training: 122/51/70 total/source/target tokens/sec Epoch: 2, Batch: 3350/11961, Batch size: 16, LR: 0.1000, PPL: 125220.62, |Param|: 4606.31, |GParam|: 60.13, Training: 122/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3400/11961, Batch size: 16, LR: 0.1000, PPL: 120191.68, |Param|: 4612.96, |GParam|: 63.84, Training: 122/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3450/11961, Batch size: 16, LR: 0.1000, PPL: 115456.83, |Param|: 4618.41, |GParam|: 71.54, Training: 122/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3500/11961, Batch size: 16, LR: 0.1000, PPL: 110880.94, |Param|: 4624.19, |GParam|: 66.28, Training: 122/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3550/11961, Batch size: 16, LR: 0.1000, PPL: 106722.83, |Param|: 4630.02, |GParam|: 44.79, Training: 123/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3600/11961, Batch size: 16, LR: 0.1000, PPL: 102851.19, |Param|: 4633.93, |GParam|: 70.15, Training: 123/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3650/11961, Batch size: 16, LR: 0.1000, PPL: 99326.00, |Param|: 4639.15, |GParam|: 56.55, Training: 123/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3700/11961, Batch size: 16, LR: 0.1000, PPL: 95893.31, |Param|: 4644.25, |GParam|: 62.47, Training: 123/52/70 total/source/target tokens/sec Epoch: 2, Batch: 3750/11961, Batch size: 16, LR: 0.1000, PPL: 92806.63, |Param|: 4649.98, |GParam|: 36.18, Training: 123/53/70 
total/source/target tokens/sec Epoch: 2, Batch: 3800/11961, Batch size: 16, LR: 0.1000, PPL: 89853.00, |Param|: 4655.44, |GParam|: 83.24, Training: 123/53/70 total/source/target tokens/sec Epoch: 2, Batch: 3850/11961, Batch size: 16, LR: 0.1000, PPL: 87042.90, |Param|: 4660.40, |GParam|: 53.86, Training: 123/53/70 total/source/target tokens/sec Epoch: 2, Batch: 3900/11961, Batch size: 16, LR: 0.1000, PPL: 84475.50, |Param|: 4670.31, |GParam|: 60.47, Training: 123/53/70 total/source/target tokens/sec Epoch: 2, Batch: 3950/11961, Batch size: 16, LR: 0.1000, PPL: 81746.54, |Param|: 4679.95, |GParam|: 78.79, Training: 123/53/70 total/source/target tokens/sec Epoch: 2, Batch: 4000/11961, Batch size: 16, LR: 0.1000, PPL: 79171.30, |Param|: 4690.64, |GParam|: 55.03, Training: 124/53/70 total/source/target tokens/sec Epoch: 2, Batch: 4050/11961, Batch size: 16, LR: 0.1000, PPL: 76736.21, |Param|: 4701.66, |GParam|: 61.61, Training: 124/53/70 total/source/target tokens/sec Epoch: 2, Batch: 4100/11961, Batch size: 16, LR: 0.1000, PPL: 74360.22, |Param|: 4711.78, |GParam|: 34.55, Training: 124/53/70 total/source/target tokens/sec Epoch: 2, Batch: 4150/11961, Batch size: 16, LR: 0.1000, PPL: 72171.81, |Param|: 4721.15, |GParam|: 79.11, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4200/11961, Batch size: 16, LR: 0.1000, PPL: 70132.05, |Param|: 4729.42, |GParam|: 38.04, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4250/11961, Batch size: 16, LR: 0.1000, PPL: 68194.65, |Param|: 4736.29, |GParam|: 85.70, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4300/11961, Batch size: 16, LR: 0.1000, PPL: 66376.37, |Param|: 4743.68, |GParam|: 94.17, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4350/11961, Batch size: 16, LR: 0.1000, PPL: 64536.07, |Param|: 4749.50, |GParam|: 39.35, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4400/11961, Batch size: 16, LR: 0.1000, PPL: 62838.08, 
|Param|: 4757.05, |GParam|: 84.73, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4450/11961, Batch size: 16, LR: 0.1000, PPL: 61248.21, |Param|: 4764.74, |GParam|: 41.82, Training: 124/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4500/11961, Batch size: 16, LR: 0.1000, PPL: 59803.14, |Param|: 4770.98, |GParam|: 48.30, Training: 125/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4550/11961, Batch size: 16, LR: 0.1000, PPL: 58281.10, |Param|: 4777.17, |GParam|: 58.99, Training: 125/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4600/11961, Batch size: 16, LR: 0.1000, PPL: 56925.82, |Param|: 4782.65, |GParam|: 71.62, Training: 125/54/70 total/source/target tokens/sec Epoch: 2, Batch: 4650/11961, Batch size: 16, LR: 0.1000, PPL: 55734.21, |Param|: 4787.68, |GParam|: 66.63, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4700/11961, Batch size: 16, LR: 0.1000, PPL: 54534.37, |Param|: 4793.48, |GParam|: 68.33, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4750/11961, Batch size: 16, LR: 0.1000, PPL: 53308.62, |Param|: 4798.69, |GParam|: 61.73, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4800/11961, Batch size: 16, LR: 0.1000, PPL: 52117.22, |Param|: 4804.74, |GParam|: 57.42, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4850/11961, Batch size: 16, LR: 0.1000, PPL: 51040.11, |Param|: 4810.35, |GParam|: 59.12, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4900/11961, Batch size: 16, LR: 0.1000, PPL: 50013.01, |Param|: 4815.57, |GParam|: 50.32, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 4950/11961, Batch size: 16, LR: 0.1000, PPL: 49029.85, |Param|: 4820.47, |GParam|: 55.38, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 5000/11961, Batch size: 16, LR: 0.1000, PPL: 48099.11, |Param|: 4825.18, |GParam|: 40.05, Training: 125/55/70 total/source/target tokens/sec Epoch: 2, Batch: 
5050/11961, Batch size: 16, LR: 0.1000, PPL: 47242.27, |Param|: 4830.25, |GParam|: 45.21, Training: 126/55/70 total/source/target tokens/sec Epoch: 2, Batch: 5100/11961, Batch size: 16, LR: 0.1000, PPL: 46398.25, |Param|: 4834.62, |GParam|: 46.77, Training: 126/55/70 total/source/target tokens/sec Epoch: 2, Batch: 5150/11961, Batch size: 16, LR: 0.1000, PPL: 45599.85, |Param|: 4838.83, |GParam|: 68.07, Training: 126/55/70 total/source/target tokens/sec Epoch: 2, Batch: 5200/11961, Batch size: 16, LR: 0.1000, PPL: 44850.91, |Param|: 4842.55, |GParam|: 27.45, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5250/11961, Batch size: 16, LR: 0.1000, PPL: 44158.14, |Param|: 4846.92, |GParam|: 57.45, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5300/11961, Batch size: 16, LR: 0.1000, PPL: 43425.90, |Param|: 4851.88, |GParam|: 66.08, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5350/11961, Batch size: 16, LR: 0.1000, PPL: 42793.17, |Param|: 4856.54, |GParam|: 33.74, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5400/11961, Batch size: 16, LR: 0.1000, PPL: 42102.62, |Param|: 4860.15, |GParam|: 45.78, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5450/11961, Batch size: 16, LR: 0.1000, PPL: 41472.30, |Param|: 4864.07, |GParam|: 37.98, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5500/11961, Batch size: 16, LR: 0.1000, PPL: 40896.75, |Param|: 4867.64, |GParam|: 35.46, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5550/11961, Batch size: 16, LR: 0.1000, PPL: 40282.02, |Param|: 4871.65, |GParam|: 32.85, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5600/11961, Batch size: 16, LR: 0.1000, PPL: 39713.81, |Param|: 4876.55, |GParam|: 60.17, Training: 126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5650/11961, Batch size: 16, LR: 0.1000, PPL: 39155.68, |Param|: 4880.51, |GParam|: 43.14, Training: 
126/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5700/11961, Batch size: 16, LR: 0.1000, PPL: 38611.95, |Param|: 4884.46, |GParam|: 64.22, Training: 127/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5750/11961, Batch size: 16, LR: 0.1000, PPL: 38068.92, |Param|: 4887.82, |GParam|: 69.25, Training: 127/56/70 total/source/target tokens/sec Epoch: 2, Batch: 5800/11961, Batch size: 16, LR: 0.1000, PPL: 37556.95, |Param|: 4891.30, |GParam|: 70.46, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 5850/11961, Batch size: 16, LR: 0.1000, PPL: 37072.37, |Param|: 4894.95, |GParam|: 93.40, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 5900/11961, Batch size: 16, LR: 0.1000, PPL: 36576.59, |Param|: 4898.47, |GParam|: 63.53, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 5950/11961, Batch size: 16, LR: 0.1000, PPL: 36106.14, |Param|: 4901.84, |GParam|: 47.24, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6000/11961, Batch size: 16, LR: 0.1000, PPL: 35677.41, |Param|: 4904.88, |GParam|: 79.95, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6050/11961, Batch size: 16, LR: 0.1000, PPL: 35256.57, |Param|: 4908.26, |GParam|: 43.05, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6100/11961, Batch size: 16, LR: 0.1000, PPL: 34841.61, |Param|: 4912.22, |GParam|: 73.69, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6150/11961, Batch size: 16, LR: 0.1000, PPL: 34432.11, |Param|: 4915.59, |GParam|: 73.26, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6200/11961, Batch size: 16, LR: 0.1000, PPL: 34043.96, |Param|: 4918.29, |GParam|: 43.09, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6250/11961, Batch size: 16, LR: 0.1000, PPL: 33618.28, |Param|: 4921.04, |GParam|: 59.57, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6300/11961, Batch size: 16, LR: 0.1000, PPL: 
33292.95, |Param|: 4924.20, |GParam|: 83.45, Training: 127/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6350/11961, Batch size: 16, LR: 0.1000, PPL: 32947.69, |Param|: 4927.36, |GParam|: 41.24, Training: 128/57/70 total/source/target tokens/sec Epoch: 2, Batch: 6400/11961, Batch size: 16, LR: 0.1000, PPL: 32596.78, |Param|: 4930.64, |GParam|: 50.88, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6450/11961, Batch size: 16, LR: 0.1000, PPL: 32243.79, |Param|: 4933.21, |GParam|: 59.41, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6500/11961, Batch size: 16, LR: 0.1000, PPL: 31883.21, |Param|: 4936.18, |GParam|: 44.22, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6550/11961, Batch size: 16, LR: 0.1000, PPL: 31522.26, |Param|: 4938.75, |GParam|: 66.93, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6600/11961, Batch size: 16, LR: 0.1000, PPL: 31207.45, |Param|: 4941.83, |GParam|: 52.22, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6650/11961, Batch size: 16, LR: 0.1000, PPL: 30885.96, |Param|: 4944.91, |GParam|: 42.47, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6700/11961, Batch size: 16, LR: 0.1000, PPL: 30618.97, |Param|: 4947.64, |GParam|: 47.09, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6750/11961, Batch size: 16, LR: 0.1000, PPL: 30320.12, |Param|: 4950.58, |GParam|: 49.87, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6800/11961, Batch size: 16, LR: 0.1000, PPL: 30073.98, |Param|: 4953.50, |GParam|: 75.12, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6850/11961, Batch size: 16, LR: 0.1000, PPL: 29801.02, |Param|: 4956.21, |GParam|: 49.49, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 6900/11961, Batch size: 16, LR: 0.1000, PPL: 29549.58, |Param|: 4959.13, |GParam|: 40.28, Training: 128/58/70 total/source/target tokens/sec Epoch: 
2, Batch: 6950/11961, Batch size: 16, LR: 0.1000, PPL: 29286.41, |Param|: 4961.95, |GParam|: 55.98, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 7000/11961, Batch size: 16, LR: 0.1000, PPL: 29038.71, |Param|: 4964.59, |GParam|: 47.89, Training: 128/58/70 total/source/target tokens/sec Epoch: 2, Batch: 7050/11961, Batch size: 16, LR: 0.1000, PPL: 28763.61, |Param|: 4967.37, |GParam|: 87.38, Training: 129/58/70 total/source/target tokens/sec Epoch: 2, Batch: 7100/11961, Batch size: 16, LR: 0.1000, PPL: 28508.18, |Param|: 4970.12, |GParam|: 49.27, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7150/11961, Batch size: 16, LR: 0.1000, PPL: 28252.98, |Param|: 4972.96, |GParam|: 63.07, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7200/11961, Batch size: 16, LR: 0.1000, PPL: 28013.93, |Param|: 4975.75, |GParam|: 69.26, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7250/11961, Batch size: 16, LR: 0.1000, PPL: 27793.24, |Param|: 4978.24, |GParam|: 67.80, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7300/11961, Batch size: 16, LR: 0.1000, PPL: 27567.75, |Param|: 4981.44, |GParam|: 66.93, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7350/11961, Batch size: 16, LR: 0.1000, PPL: 27359.78, |Param|: 4984.28, |GParam|: 46.48, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7400/11961, Batch size: 16, LR: 0.1000, PPL: 27155.42, |Param|: 4986.11, |GParam|: 35.39, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7450/11961, Batch size: 16, LR: 0.1000, PPL: 26940.47, |Param|: 4988.53, |GParam|: 56.41, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7500/11961, Batch size: 16, LR: 0.1000, PPL: 26724.05, |Param|: 4991.38, |GParam|: 58.83, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7550/11961, Batch size: 16, LR: 0.1000, PPL: 26508.81, |Param|: 4993.74, |GParam|: 55.33, 
Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7600/11961, Batch size: 16, LR: 0.1000, PPL: 26305.14, |Param|: 4996.38, |GParam|: 47.03, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7650/11961, Batch size: 16, LR: 0.1000, PPL: 26140.96, |Param|: 4999.20, |GParam|: 50.43, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7700/11961, Batch size: 16, LR: 0.1000, PPL: 25967.17, |Param|: 5001.94, |GParam|: 42.99, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7750/11961, Batch size: 16, LR: 0.1000, PPL: 25782.90, |Param|: 5004.08, |GParam|: 66.59, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7800/11961, Batch size: 16, LR: 0.1000, PPL: 25605.25, |Param|: 5006.67, |GParam|: 80.02, Training: 129/59/70 total/source/target tokens/sec Epoch: 2, Batch: 7850/11961, Batch size: 16, LR: 0.1000, PPL: 25451.81, |Param|: 5008.94, |GParam|: 86.00, Training: 129/59/69 total/source/target tokens/sec Epoch: 2, Batch: 7900/11961, Batch size: 16, LR: 0.1000, PPL: 25269.35, |Param|: 5011.71, |GParam|: 50.35, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 7950/11961, Batch size: 16, LR: 0.1000, PPL: 25087.02, |Param|: 5014.57, |GParam|: 39.97, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8000/11961, Batch size: 16, LR: 0.1000, PPL: 24939.11, |Param|: 5016.78, |GParam|: 52.76, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8050/11961, Batch size: 16, LR: 0.1000, PPL: 24782.16, |Param|: 5019.13, |GParam|: 69.58, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8100/11961, Batch size: 16, LR: 0.1000, PPL: 24643.93, |Param|: 5021.62, |GParam|: 49.55, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8150/11961, Batch size: 16, LR: 0.1000, PPL: 24502.30, |Param|: 5024.10, |GParam|: 55.73, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8200/11961, Batch size: 16, LR: 
0.1000, PPL: 24372.47, |Param|: 5026.71, |GParam|: 61.79, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8250/11961, Batch size: 9, LR: 0.1000, PPL: 24253.84, |Param|: 5029.27, |GParam|: 82.39, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8300/11961, Batch size: 16, LR: 0.1000, PPL: 24117.88, |Param|: 5032.21, |GParam|: 47.48, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8350/11961, Batch size: 16, LR: 0.1000, PPL: 23968.79, |Param|: 5034.84, |GParam|: 82.46, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8400/11961, Batch size: 16, LR: 0.1000, PPL: 23833.82, |Param|: 5037.69, |GParam|: 72.51, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8450/11961, Batch size: 16, LR: 0.1000, PPL: 23707.87, |Param|: 5040.11, |GParam|: 49.06, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8500/11961, Batch size: 16, LR: 0.1000, PPL: 23561.26, |Param|: 5042.75, |GParam|: 72.56, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8550/11961, Batch size: 16, LR: 0.1000, PPL: 23418.46, |Param|: 5045.32, |GParam|: 51.77, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8600/11961, Batch size: 16, LR: 0.1000, PPL: 23305.81, |Param|: 5048.11, |GParam|: 42.91, Training: 130/60/69 total/source/target tokens/sec Epoch: 2, Batch: 8650/11961, Batch size: 16, LR: 0.1000, PPL: 23188.42, |Param|: 5050.31, |GParam|: 47.69, Training: 130/61/69 total/source/target tokens/sec Epoch: 2, Batch: 8700/11961, Batch size: 16, LR: 0.1000, PPL: 23060.04, |Param|: 5052.98, |GParam|: 37.03, Training: 130/61/69 total/source/target tokens/sec Epoch: 2, Batch: 8750/11961, Batch size: 16, LR: 0.1000, PPL: 22939.95, |Param|: 5055.79, |GParam|: 75.19, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 8800/11961, Batch size: 16, LR: 0.1000, PPL: 22817.68, |Param|: 5058.25, |GParam|: 70.94, Training: 131/61/69 total/source/target 
tokens/sec Epoch: 2, Batch: 8850/11961, Batch size: 16, LR: 0.1000, PPL: 22697.68, |Param|: 5061.16, |GParam|: 50.03, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 8900/11961, Batch size: 16, LR: 0.1000, PPL: 22581.68, |Param|: 5063.43, |GParam|: 65.45, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 8950/11961, Batch size: 16, LR: 0.1000, PPL: 22474.20, |Param|: 5065.99, |GParam|: 61.63, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9000/11961, Batch size: 16, LR: 0.1000, PPL: 22367.62, |Param|: 5068.59, |GParam|: 50.97, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9050/11961, Batch size: 16, LR: 0.1000, PPL: 22262.82, |Param|: 5071.47, |GParam|: 66.29, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9100/11961, Batch size: 16, LR: 0.1000, PPL: 22183.09, |Param|: 5074.18, |GParam|: 37.80, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9150/11961, Batch size: 16, LR: 0.1000, PPL: 22080.90, |Param|: 5076.51, |GParam|: 61.59, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9200/11961, Batch size: 16, LR: 0.1000, PPL: 21972.27, |Param|: 5078.99, |GParam|: 72.77, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9250/11961, Batch size: 16, LR: 0.1000, PPL: 21859.51, |Param|: 5081.39, |GParam|: 53.70, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9300/11961, Batch size: 16, LR: 0.1000, PPL: 21756.46, |Param|: 5084.12, |GParam|: 53.38, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9350/11961, Batch size: 16, LR: 0.1000, PPL: 21664.63, |Param|: 5086.75, |GParam|: 49.47, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9400/11961, Batch size: 16, LR: 0.1000, PPL: 21556.99, |Param|: 5089.28, |GParam|: 60.07, Training: 131/61/69 total/source/target tokens/sec Epoch: 2, Batch: 9450/11961, Batch size: 16, LR: 0.1000, PPL: 21483.34, |Param|: 5092.13, 
|GParam|: 34.83, Training: 131/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9500/11961, Batch size: 16, LR: 0.1000, PPL: 21414.74, |Param|: 5094.96, |GParam|: 55.62, Training: 131/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9550/11961, Batch size: 16, LR: 0.1000, PPL: 21320.02, |Param|: 5097.70, |GParam|: 43.60, Training: 131/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9600/11961, Batch size: 16, LR: 0.1000, PPL: 21237.51, |Param|: 5100.07, |GParam|: 69.29, Training: 131/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9650/11961, Batch size: 16, LR: 0.1000, PPL: 21140.96, |Param|: 5102.82, |GParam|: 42.73, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9700/11961, Batch size: 16, LR: 0.1000, PPL: 21039.27, |Param|: 5105.19, |GParam|: 78.35, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9750/11961, Batch size: 16, LR: 0.1000, PPL: 20975.47, |Param|: 5108.22, |GParam|: 56.66, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9800/11961, Batch size: 16, LR: 0.1000, PPL: 20907.21, |Param|: 5111.42, |GParam|: 54.05, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9850/11961, Batch size: 16, LR: 0.1000, PPL: 20838.20, |Param|: 5114.13, |GParam|: 64.47, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9900/11961, Batch size: 16, LR: 0.1000, PPL: 20753.59, |Param|: 5117.17, |GParam|: 66.94, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 9950/11961, Batch size: 16, LR: 0.1000, PPL: 20674.91, |Param|: 5119.97, |GParam|: 38.80, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10000/11961, Batch size: 16, LR: 0.1000, PPL: 20589.44, |Param|: 5122.95, |GParam|: 68.71, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10050/11961, Batch size: 16, LR: 0.1000, PPL: 20517.58, |Param|: 5126.28, |GParam|: 36.84, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10100/11961, 
Batch size: 16, LR: 0.1000, PPL: 20457.00, |Param|: 5129.52, |GParam|: 38.19, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10150/11961, Batch size: 16, LR: 0.1000, PPL: 20389.53, |Param|: 5132.58, |GParam|: 54.60, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10200/11961, Batch size: 16, LR: 0.1000, PPL: 20313.05, |Param|: 5134.93, |GParam|: 53.22, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10250/11961, Batch size: 16, LR: 0.1000, PPL: 20254.21, |Param|: 5138.42, |GParam|: 84.39, Training: 132/62/69 total/source/target tokens/sec Epoch: 2, Batch: 10300/11961, Batch size: 16, LR: 0.1000, PPL: 20184.41, |Param|: 5141.48, |GParam|: 78.20, Training: 132/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10350/11961, Batch size: 16, LR: 0.1000, PPL: 20109.24, |Param|: 5144.41, |GParam|: 56.91, Training: 132/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10400/11961, Batch size: 16, LR: 0.1000, PPL: 20036.96, |Param|: 5147.49, |GParam|: 78.03, Training: 132/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10450/11961, Batch size: 16, LR: 0.1000, PPL: 19964.55, |Param|: 5150.62, |GParam|: 46.23, Training: 132/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10500/11961, Batch size: 16, LR: 0.1000, PPL: 19915.03, |Param|: 5154.17, |GParam|: 49.36, Training: 132/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10550/11961, Batch size: 16, LR: 0.1000, PPL: 19849.51, |Param|: 5157.34, |GParam|: 39.83, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10600/11961, Batch size: 16, LR: 0.1000, PPL: 19799.79, |Param|: 5160.59, |GParam|: 47.83, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10650/11961, Batch size: 16, LR: 0.1000, PPL: 19725.72, |Param|: 5163.59, |GParam|: 55.55, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10700/11961, Batch size: 16, LR: 0.1000, PPL: 19661.20, |Param|: 5166.76, |GParam|: 61.17, Training: 
133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10750/11961, Batch size: 16, LR: 0.1000, PPL: 19601.56, |Param|: 5169.72, |GParam|: 65.12, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10800/11961, Batch size: 16, LR: 0.1000, PPL: 19533.65, |Param|: 5172.90, |GParam|: 45.69, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10850/11961, Batch size: 16, LR: 0.1000, PPL: 19472.44, |Param|: 5176.31, |GParam|: 77.23, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10900/11961, Batch size: 16, LR: 0.1000, PPL: 19425.02, |Param|: 5179.98, |GParam|: 51.47, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 10950/11961, Batch size: 16, LR: 0.1000, PPL: 19366.68, |Param|: 5183.29, |GParam|: 65.51, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 11000/11961, Batch size: 16, LR: 0.1000, PPL: 19319.20, |Param|: 5186.64, |GParam|: 77.65, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 11050/11961, Batch size: 16, LR: 0.1000, PPL: 19268.06, |Param|: 5189.93, |GParam|: 44.94, Training: 133/63/69 total/source/target tokens/sec Epoch: 2, Batch: 11100/11961, Batch size: 16, LR: 0.1000, PPL: 19213.65, |Param|: 5193.22, |GParam|: 64.79, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11150/11961, Batch size: 16, LR: 0.1000, PPL: 19171.36, |Param|: 5197.59, |GParam|: 52.82, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11200/11961, Batch size: 16, LR: 0.1000, PPL: 19112.84, |Param|: 5200.75, |GParam|: 69.53, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11250/11961, Batch size: 16, LR: 0.1000, PPL: 19061.93, |Param|: 5204.48, |GParam|: 63.86, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11300/11961, Batch size: 16, LR: 0.1000, PPL: 18992.17, |Param|: 5207.26, |GParam|: 64.87, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11350/11961, Batch size: 16, LR: 
0.1000, PPL: 18938.04, |Param|: 5210.48, |GParam|: 61.26, Training: 133/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11400/11961, Batch size: 16, LR: 0.1000, PPL: 18871.02, |Param|: 5213.46, |GParam|: 76.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11450/11961, Batch size: 16, LR: 0.1000, PPL: 18816.07, |Param|: 5216.21, |GParam|: 59.43, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11500/11961, Batch size: 16, LR: 0.1000, PPL: 18766.57, |Param|: 5219.56, |GParam|: 53.50, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11550/11961, Batch size: 16, LR: 0.1000, PPL: 18713.27, |Param|: 5223.34, |GParam|: 43.85, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11600/11961, Batch size: 16, LR: 0.1000, PPL: 18653.62, |Param|: 5226.80, |GParam|: 67.46, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11650/11961, Batch size: 16, LR: 0.1000, PPL: 18607.05, |Param|: 5230.41, |GParam|: 63.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11700/11961, Batch size: 16, LR: 0.1000, PPL: 18542.25, |Param|: 5234.24, |GParam|: 71.47, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11750/11961, Batch size: 16, LR: 0.1000, PPL: 18472.80, |Param|: 5237.00, |GParam|: 64.77, Training: 134/64/69 total/source/target tokens/sec Epoch: 2, Batch: 11800/11961, Batch size: 16, LR: 0.1000, PPL: 18396.98, |Param|: 5240.09, |GParam|: 89.69, Training: 134/65/69 total/source/target tokens/sec Epoch: 2, Batch: 11850/11961, Batch size: 16, LR: 0.1000, PPL: 18305.59, |Param|: 5243.22, |GParam|: 72.71, Training: 134/65/69 total/source/target tokens/sec Epoch: 2, Batch: 11900/11961, Batch size: 16, LR: 0.1000, PPL: 18234.17, |Param|: 5246.76, |GParam|: 75.32, Training: 134/65/69 total/source/target tokens/sec Epoch: 2, Batch: 11950/11961, Batch size: 16, LR: 0.1000, PPL: 18148.08, |Param|: 5249.85, |GParam|: 63.89, Training: 134/65/69 
total/source/target tokens/sec Train 18132.992960986 Valid 3306.1873313652 saving checkpoint to demo-model_epoch2.00_3306.19.t7 Epoch: 3, Batch: 50/11961, Batch size: 16, LR: 0.1000, PPL: 9752.00, |Param|: 5246.88, |GParam|: 17.69, Training: 91/17/73 total/source/target tokens/sec Epoch: 3, Batch: 100/11961, Batch size: 16, LR: 0.1000, PPL: 1907.13, |Param|: 5245.87, |GParam|: 13.62, Training: 96/23/72 total/source/target tokens/sec Epoch: 3, Batch: 150/11961, Batch size: 16, LR: 0.1000, PPL: 1229.61, |Param|: 5244.80, |GParam|: 12.62, Training: 99/27/72 total/source/target tokens/sec Epoch: 3, Batch: 200/11961, Batch size: 16, LR: 0.1000, PPL: 1030.62, |Param|: 5243.78, |GParam|: 11.16, Training: 101/29/72 total/source/target tokens/sec Epoch: 3, Batch: 250/11961, Batch size: 16, LR: 0.1000, PPL: 988.77, |Param|: 5243.06, |GParam|: 12.91, Training: 102/30/71 total/source/target tokens/sec Epoch: 3, Batch: 300/11961, Batch size: 16, LR: 0.1000, PPL: 963.77, |Param|: 5242.24, |GParam|: 13.05, Training: 104/32/71 total/source/target tokens/sec Epoch: 3, Batch: 350/11961, Batch size: 16, LR: 0.1000, PPL: 918.78, |Param|: 5241.37, |GParam|: 16.28, Training: 105/33/71 total/source/target tokens/sec Epoch: 3, Batch: 400/11961, Batch size: 16, LR: 0.1000, PPL: 920.31, |Param|: 5240.39, |GParam|: 19.44, Training: 105/34/71 total/source/target tokens/sec Epoch: 3, Batch: 450/11961, Batch size: 16, LR: 0.1000, PPL: 916.10, |Param|: 5239.44, |GParam|: 13.07, Training: 106/35/71 total/source/target tokens/sec Epoch: 3, Batch: 500/11961, Batch size: 16, LR: 0.1000, PPL: 924.37, |Param|: 5238.49, |GParam|: 16.70, Training: 107/36/71 total/source/target tokens/sec Epoch: 3, Batch: 550/11961, Batch size: 16, LR: 0.1000, PPL: 933.99, |Param|: 5237.60, |GParam|: 19.20, Training: 108/36/71 total/source/target tokens/sec Epoch: 3, Batch: 600/11961, Batch size: 16, LR: 0.1000, PPL: 966.33, |Param|: 5236.54, |GParam|: 15.26, Training: 108/37/71 total/source/target tokens/sec 
Epoch: 3, Batch: 650/11961, Batch size: 16, LR: 0.1000, PPL: 996.83, |Param|: 5235.57, |GParam|: 18.76, Training: 109/38/71 total/source/target tokens/sec Epoch: 3, Batch: 700/11961, Batch size: 16, LR: 0.1000, PPL: 1014.93, |Param|: 5234.71, |GParam|: 16.58, Training: 109/38/71 total/source/target tokens/sec Epoch: 3, Batch: 750/11961, Batch size: 16, LR: 0.1000, PPL: 1034.71, |Param|: 5234.04, |GParam|: 19.89, Training: 110/39/71 total/source/target tokens/sec Epoch: 3, Batch: 800/11961, Batch size: 16, LR: 0.1000, PPL: 1053.88, |Param|: 5233.42, |GParam|: 20.19, Training: 110/39/71 total/source/target tokens/sec Epoch: 3, Batch: 850/11961, Batch size: 16, LR: 0.1000, PPL: 1074.00, |Param|: 5232.78, |GParam|: 17.22, Training: 111/40/71 total/source/target tokens/sec Epoch: 3, Batch: 900/11961, Batch size: 16, LR: 0.1000, PPL: 1090.58, |Param|: 5232.14, |GParam|: 19.18, Training: 111/40/71 total/source/target tokens/sec Epoch: 3, Batch: 950/11961, Batch size: 16, LR: 0.1000, PPL: 1112.40, |Param|: 5231.35, |GParam|: 16.27, Training: 112/41/71 total/source/target tokens/sec Epoch: 3, Batch: 1000/11961, Batch size: 16, LR: 0.1000, PPL: 1132.73, |Param|: 5230.62, |GParam|: 18.24, Training: 112/41/71 total/source/target tokens/sec Epoch: 3, Batch: 1050/11961, Batch size: 16, LR: 0.1000, PPL: 1155.34, |Param|: 5230.05, |GParam|: 17.96, Training: 113/42/70 total/source/target tokens/sec Epoch: 3, Batch: 1100/11961, Batch size: 16, LR: 0.1000, PPL: 1176.10, |Param|: 5229.51, |GParam|: 17.43, Training: 113/42/70 total/source/target tokens/sec Epoch: 3, Batch: 1150/11961, Batch size: 16, LR: 0.1000, PPL: 1195.78, |Param|: 5228.90, |GParam|: 24.69, Training: 113/43/70 total/source/target tokens/sec Epoch: 3, Batch: 1200/11961, Batch size: 16, LR: 0.1000, PPL: 1216.31, |Param|: 5228.29, |GParam|: 20.23, Training: 114/43/70 total/source/target tokens/sec Epoch: 3, Batch: 1250/11961, Batch size: 16, LR: 0.1000, PPL: 1234.19, |Param|: 5227.62, |GParam|: 23.88, Training: 
114/43/70 total/source/target tokens/sec Epoch: 3, Batch: 1300/11961, Batch size: 16, LR: 0.1000, PPL: 1245.27, |Param|: 5226.98, |GParam|: 21.52, Training: 114/43/70 total/source/target tokens/sec Epoch: 3, Batch: 1350/11961, Batch size: 16, LR: 0.1000, PPL: 1260.74, |Param|: 5226.12, |GParam|: 20.97, Training: 115/44/70 total/source/target tokens/sec Epoch: 3, Batch: 1400/11961, Batch size: 16, LR: 0.1000, PPL: 1260.48, |Param|: 5225.33, |GParam|: 21.17, Training: 115/44/70 total/source/target tokens/sec Epoch: 3, Batch: 1450/11961, Batch size: 16, LR: 0.1000, PPL: 1279.64, |Param|: 5224.68, |GParam|: 23.61, Training: 115/44/70 total/source/target tokens/sec Epoch: 3, Batch: 1500/11961, Batch size: 16, LR: 0.1000, PPL: 1301.46, |Param|: 5224.03, |GParam|: 20.36, Training: 116/45/70 total/source/target tokens/sec Epoch: 3, Batch: 1550/11961, Batch size: 16, LR: 0.1000, PPL: 1322.47, |Param|: 5223.46, |GParam|: 25.86, Training: 116/45/70 total/source/target tokens/sec Epoch: 3, Batch: 1600/11961, Batch size: 16, LR: 0.1000, PPL: 1343.10, |Param|: 5222.99, |GParam|: 28.11, Training: 116/45/70 total/source/target tokens/sec Epoch: 3, Batch: 1650/11961, Batch size: 16, LR: 0.1000, PPL: 1367.21, |Param|: 5222.57, |GParam|: 33.06, Training: 116/45/70 total/source/target tokens/sec Epoch: 3, Batch: 1700/11961, Batch size: 16, LR: 0.1000, PPL: 1388.47, |Param|: 5222.09, |GParam|: 22.92, Training: 116/46/70 total/source/target tokens/sec Epoch: 3, Batch: 1750/11961, Batch size: 16, LR: 0.1000, PPL: 1413.54, |Param|: 5221.67, |GParam|: 34.85, Training: 117/46/70 total/source/target tokens/sec Epoch: 3, Batch: 1800/11961, Batch size: 16, LR: 0.1000, PPL: 1435.96, |Param|: 5221.27, |GParam|: 25.36, Training: 117/46/70 total/source/target tokens/sec Epoch: 3, Batch: 1850/11961, Batch size: 16, LR: 0.1000, PPL: 1456.38, |Param|: 5220.97, |GParam|: 25.61, Training: 117/46/70 total/source/target tokens/sec Epoch: 3, Batch: 1900/11961, Batch size: 16, LR: 0.1000, PPL: 1475.84, 
|Param|: 5220.61, |GParam|: 26.41, Training: 117/46/70 total/source/target tokens/sec Epoch: 3, Batch: 1950/11961, Batch size: 16, LR: 0.1000, PPL: 1494.94, |Param|: 5220.29, |GParam|: 26.27, Training: 117/47/70 total/source/target tokens/sec Epoch: 3, Batch: 2000/11961, Batch size: 16, LR: 0.1000, PPL: 1512.26, |Param|: 5220.04, |GParam|: 54.61, Training: 118/47/70 total/source/target tokens/sec Epoch: 3, Batch: 2050/11961, Batch size: 16, LR: 0.1000, PPL: 1531.13, |Param|: 5219.85, |GParam|: 35.41, Training: 118/47/70 total/source/target tokens/sec Epoch: 3, Batch: 2100/11961, Batch size: 16, LR: 0.1000, PPL: 1551.17, |Param|: 5219.63, |GParam|: 39.43, Training: 118/47/70 total/source/target tokens/sec Epoch: 3, Batch: 2150/11961, Batch size: 16, LR: 0.1000, PPL: 1569.16, |Param|: 5219.40, |GParam|: 29.32, Training: 118/48/70 total/source/target tokens/sec Epoch: 3, Batch: 2200/11961, Batch size: 16, LR: 0.1000, PPL: 1588.75, |Param|: 5219.17, |GParam|: 25.45, Training: 118/48/70 total/source/target tokens/sec Epoch: 3, Batch: 2250/11961, Batch size: 16, LR: 0.1000, PPL: 1599.14, |Param|: 5218.96, |GParam|: 32.83, Training: 119/48/70 total/source/target tokens/sec Epoch: 3, Batch: 2300/11961, Batch size: 16, LR: 0.1000, PPL: 1616.76, |Param|: 5218.77, |GParam|: 33.01, Training: 119/48/70 total/source/target tokens/sec Epoch: 3, Batch: 2350/11961, Batch size: 16, LR: 0.1000, PPL: 1633.12, |Param|: 5218.54, |GParam|: 29.69, Training: 119/48/70 total/source/target tokens/sec Epoch: 3, Batch: 2400/11961, Batch size: 16, LR: 0.1000, PPL: 1647.53, |Param|: 5218.31, |GParam|: 30.92, Training: 119/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2450/11961, Batch size: 16, LR: 0.1000, PPL: 1663.63, |Param|: 5218.10, |GParam|: 23.06, Training: 119/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2500/11961, Batch size: 16, LR: 0.1000, PPL: 1679.41, |Param|: 5217.85, |GParam|: 28.26, Training: 119/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2550/11961, 
Batch size: 16, LR: 0.1000, PPL: 1698.54, |Param|: 5217.68, |GParam|: 30.46, Training: 120/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2600/11961, Batch size: 16, LR: 0.1000, PPL: 1717.96, |Param|: 5217.52, |GParam|: 23.58, Training: 120/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2650/11961, Batch size: 16, LR: 0.1000, PPL: 1733.08, |Param|: 5217.39, |GParam|: 23.84, Training: 120/49/70 total/source/target tokens/sec Epoch: 3, Batch: 2700/11961, Batch size: 16, LR: 0.1000, PPL: 1750.90, |Param|: 5217.20, |GParam|: 35.48, Training: 120/50/70 total/source/target tokens/sec Epoch: 3, Batch: 2750/11961, Batch size: 16, LR: 0.1000, PPL: 1765.69, |Param|: 5217.11, |GParam|: 26.37, Training: 120/50/70 total/source/target tokens/sec Epoch: 3, Batch: 2800/11961, Batch size: 16, LR: 0.1000, PPL: 1778.30, |Param|: 5217.00, |GParam|: 32.26, Training: 120/50/70 total/source/target tokens/sec Epoch: 3, Batch: 2850/11961, Batch size: 16, LR: 0.1000, PPL: 1791.88, |Param|: 5216.94, |GParam|: 28.22, Training: 120/50/70 total/source/target tokens/sec Epoch: 3, Batch: 2900/11961, Batch size: 16, LR: 0.1000, PPL: 1802.10, |Param|: 5216.80, |GParam|: 28.31, Training: 121/50/70 total/source/target tokens/sec Epoch: 3, Batch: 2950/11961, Batch size: 16, LR: 0.1000, PPL: 1816.06, |Param|: 5216.72, |GParam|: 33.31, Training: 121/50/70 total/source/target tokens/sec Epoch: 3, Batch: 3000/11961, Batch size: 16, LR: 0.1000, PPL: 1828.92, |Param|: 5216.68, |GParam|: 34.39, Training: 121/50/70 total/source/target tokens/sec Epoch: 3, Batch: 3050/11961, Batch size: 16, LR: 0.1000, PPL: 1843.78, |Param|: 5216.62, |GParam|: 28.33, Training: 121/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3100/11961, Batch size: 16, LR: 0.1000, PPL: 1855.67, |Param|: 5216.61, |GParam|: 32.63, Training: 121/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3150/11961, Batch size: 16, LR: 0.1000, PPL: 1871.05, |Param|: 5216.63, |GParam|: 32.69, Training: 121/51/70 
total/source/target tokens/sec Epoch: 3, Batch: 3200/11961, Batch size: 16, LR: 0.1000, PPL: 1885.82, |Param|: 5216.64, |GParam|: 27.33, Training: 122/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3250/11961, Batch size: 16, LR: 0.1000, PPL: 1899.66, |Param|: 5216.64, |GParam|: 32.18, Training: 122/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3300/11961, Batch size: 16, LR: 0.1000, PPL: 1913.79, |Param|: 5216.63, |GParam|: 39.89, Training: 122/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3350/11961, Batch size: 16, LR: 0.1000, PPL: 1925.93, |Param|: 5216.63, |GParam|: 32.65, Training: 122/51/70 total/source/target tokens/sec Epoch: 3, Batch: 3400/11961, Batch size: 16, LR: 0.1000, PPL: 1941.50, |Param|: 5216.73, |GParam|: 28.46, Training: 122/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3450/11961, Batch size: 16, LR: 0.1000, PPL: 1955.66, |Param|: 5216.76, |GParam|: 29.16, Training: 122/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3500/11961, Batch size: 16, LR: 0.1000, PPL: 1971.72, |Param|: 5216.83, |GParam|: 28.62, Training: 122/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3550/11961, Batch size: 16, LR: 0.1000, PPL: 1986.65, |Param|: 5216.91, |GParam|: 26.22, Training: 122/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3600/11961, Batch size: 16, LR: 0.1000, PPL: 2000.82, |Param|: 5216.92, |GParam|: 27.59, Training: 122/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3650/11961, Batch size: 16, LR: 0.1000, PPL: 2016.12, |Param|: 5216.96, |GParam|: 30.20, Training: 123/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3700/11961, Batch size: 16, LR: 0.1000, PPL: 2031.28, |Param|: 5217.05, |GParam|: 31.83, Training: 123/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3750/11961, Batch size: 16, LR: 0.1000, PPL: 2046.36, |Param|: 5217.20, |GParam|: 33.28, Training: 123/52/70 total/source/target tokens/sec Epoch: 3, Batch: 3800/11961, Batch size: 16, LR: 0.1000, PPL: 2061.38, |Param|: 
5217.35, |GParam|: 29.30, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 3850/11961, Batch size: 16, LR: 0.1000, PPL: 2075.22, |Param|: 5217.49, |GParam|: 27.70, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 3900/11961, Batch size: 16, LR: 0.1000, PPL: 2093.29, |Param|: 5217.57, |GParam|: 31.25, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 3950/11961, Batch size: 16, LR: 0.1000, PPL: 2109.97, |Param|: 5217.65, |GParam|: 34.36, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 4000/11961, Batch size: 16, LR: 0.1000, PPL: 2127.02, |Param|: 5217.81, |GParam|: 39.35, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 4050/11961, Batch size: 16, LR: 0.1000, PPL: 2142.29, |Param|: 5217.94, |GParam|: 45.69, Training: 123/53/70 total/source/target tokens/sec Epoch: 3, Batch: 4100/11961, Batch size: 16, LR: 0.1000, PPL: 2154.32, |Param|: 5218.12, |GParam|: 37.17, Training: 124/53/70 total/source/target tokens/sec Epoch: 3, Batch: 4150/11961, Batch size: 16, LR: 0.1000, PPL: 2168.99, |Param|: 5218.28, |GParam|: 27.87, Training: 124/53/70 total/source/target tokens/sec Epoch: 3, Batch: 4200/11961, Batch size: 16, LR: 0.1000, PPL: 2182.32, |Param|: 5218.41, |GParam|: 54.31, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4250/11961, Batch size: 16, LR: 0.1000, PPL: 2198.03, |Param|: 5218.54, |GParam|: 40.72, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4300/11961, Batch size: 16, LR: 0.1000, PPL: 2214.95, |Param|: 5218.75, |GParam|: 33.30, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4350/11961, Batch size: 16, LR: 0.1000, PPL: 2230.18, |Param|: 5219.02, |GParam|: 36.92, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4400/11961, Batch size: 16, LR: 0.1000, PPL: 2245.99, |Param|: 5219.34, |GParam|: 38.40, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4450/11961, Batch 
size: 16, LR: 0.1000, PPL: 2261.78, |Param|: 5219.63, |GParam|: 45.95, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4500/11961, Batch size: 16, LR: 0.1000, PPL: 2277.34, |Param|: 5219.91, |GParam|: 32.75, Training: 124/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4550/11961, Batch size: 16, LR: 0.1000, PPL: 2291.09, |Param|: 5220.16, |GParam|: 36.39, Training: 125/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4600/11961, Batch size: 16, LR: 0.1000, PPL: 2305.89, |Param|: 5220.40, |GParam|: 45.67, Training: 125/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4650/11961, Batch size: 16, LR: 0.1000, PPL: 2321.74, |Param|: 5220.70, |GParam|: 32.24, Training: 125/54/70 total/source/target tokens/sec Epoch: 3, Batch: 4700/11961, Batch size: 16, LR: 0.1000, PPL: 2337.60, |Param|: 5221.04, |GParam|: 36.90, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 4750/11961, Batch size: 16, LR: 0.1000, PPL: 2353.35, |Param|: 5221.42, |GParam|: 30.49, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 4800/11961, Batch size: 16, LR: 0.1000, PPL: 2369.10, |Param|: 5221.76, |GParam|: 36.95, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 4850/11961, Batch size: 16, LR: 0.1000, PPL: 2385.42, |Param|: 5222.16, |GParam|: 37.88, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 4900/11961, Batch size: 16, LR: 0.1000, PPL: 2402.55, |Param|: 5222.51, |GParam|: 49.34, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 4950/11961, Batch size: 16, LR: 0.1000, PPL: 2416.66, |Param|: 5222.91, |GParam|: 32.86, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 5000/11961, Batch size: 16, LR: 0.1000, PPL: 2431.39, |Param|: 5223.32, |GParam|: 33.66, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 5050/11961, Batch size: 16, LR: 0.1000, PPL: 2446.23, |Param|: 5223.73, |GParam|: 33.37, Training: 125/55/70 total/source/target 
tokens/sec Epoch: 3, Batch: 5100/11961, Batch size: 16, LR: 0.1000, PPL: 2462.46, |Param|: 5224.17, |GParam|: 42.37, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 5150/11961, Batch size: 16, LR: 0.1000, PPL: 2479.55, |Param|: 5224.60, |GParam|: 46.75, Training: 125/55/70 total/source/target tokens/sec Epoch: 3, Batch: 5200/11961, Batch size: 16, LR: 0.1000, PPL: 2497.65, |Param|: 5225.10, |GParam|: 36.47, Training: 126/55/70 total/source/target tokens/sec Epoch: 3, Batch: 5250/11961, Batch size: 16, LR: 0.1000, PPL: 2514.99, |Param|: 5225.62, |GParam|: 32.91, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5300/11961, Batch size: 16, LR: 0.1000, PPL: 2531.82, |Param|: 5226.21, |GParam|: 27.39, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5350/11961, Batch size: 16, LR: 0.1000, PPL: 2552.70, |Param|: 5226.82, |GParam|: 48.53, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5400/11961, Batch size: 16, LR: 0.1000, PPL: 2568.23, |Param|: 5227.38, |GParam|: 39.03, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5450/11961, Batch size: 16, LR: 0.1000, PPL: 2584.18, |Param|: 5227.96, |GParam|: 50.56, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5500/11961, Batch size: 16, LR: 0.1000, PPL: 2601.21, |Param|: 5228.50, |GParam|: 48.01, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5550/11961, Batch size: 16, LR: 0.1000, PPL: 2615.45, |Param|: 5229.15, |GParam|: 45.71, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5600/11961, Batch size: 16, LR: 0.1000, PPL: 2631.33, |Param|: 5229.77, |GParam|: 34.35, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5650/11961, Batch size: 16, LR: 0.1000, PPL: 2648.86, |Param|: 5230.39, |GParam|: 39.13, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5700/11961, Batch size: 16, LR: 0.1000, PPL: 2665.28, |Param|: 5231.00, |GParam|: 
41.30, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5750/11961, Batch size: 16, LR: 0.1000, PPL: 2679.63, |Param|: 5231.68, |GParam|: 34.36, Training: 126/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5800/11961, Batch size: 16, LR: 0.1000, PPL: 2694.53, |Param|: 5232.42, |GParam|: 35.02, Training: 127/56/70 total/source/target tokens/sec Epoch: 3, Batch: 5850/11961, Batch size: 16, LR: 0.1000, PPL: 2710.79, |Param|: 5233.09, |GParam|: 39.73, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 5900/11961, Batch size: 16, LR: 0.1000, PPL: 2725.91, |Param|: 5233.78, |GParam|: 38.44, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 5950/11961, Batch size: 16, LR: 0.1000, PPL: 2740.60, |Param|: 5234.52, |GParam|: 40.15, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6000/11961, Batch size: 16, LR: 0.1000, PPL: 2757.55, |Param|: 5235.36, |GParam|: 26.99, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6050/11961, Batch size: 16, LR: 0.1000, PPL: 2775.32, |Param|: 5236.13, |GParam|: 35.57, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6100/11961, Batch size: 16, LR: 0.1000, PPL: 2791.03, |Param|: 5236.99, |GParam|: 46.93, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6150/11961, Batch size: 16, LR: 0.1000, PPL: 2806.36, |Param|: 5237.76, |GParam|: 42.39, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6200/11961, Batch size: 16, LR: 0.1000, PPL: 2822.65, |Param|: 5238.56, |GParam|: 53.29, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6250/11961, Batch size: 16, LR: 0.1000, PPL: 2837.25, |Param|: 5239.42, |GParam|: 27.86, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6300/11961, Batch size: 16, LR: 0.1000, PPL: 2854.95, |Param|: 5240.29, |GParam|: 34.74, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6350/11961, Batch size: 16, LR: 0.1000, 
PPL: 2871.12, |Param|: 5241.08, |GParam|: 32.17, Training: 127/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6400/11961, Batch size: 16, LR: 0.1000, PPL: 2886.52, |Param|: 5241.93, |GParam|: 44.88, Training: 128/57/70 total/source/target tokens/sec Epoch: 3, Batch: 6450/11961, Batch size: 16, LR: 0.1000, PPL: 2903.52, |Param|: 5242.69, |GParam|: 44.01, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6500/11961, Batch size: 16, LR: 0.1000, PPL: 2918.32, |Param|: 5243.44, |GParam|: 50.22, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6550/11961, Batch size: 16, LR: 0.1000, PPL: 2932.88, |Param|: 5244.17, |GParam|: 40.41, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6600/11961, Batch size: 16, LR: 0.1000, PPL: 2947.82, |Param|: 5245.13, |GParam|: 36.81, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6650/11961, Batch size: 16, LR: 0.1000, PPL: 2961.76, |Param|: 5246.03, |GParam|: 35.87, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6700/11961, Batch size: 16, LR: 0.1000, PPL: 2979.29, |Param|: 5247.00, |GParam|: 36.41, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6750/11961, Batch size: 16, LR: 0.1000, PPL: 2993.78, |Param|: 5247.98, |GParam|: 37.45, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6800/11961, Batch size: 16, LR: 0.1000, PPL: 3009.78, |Param|: 5248.94, |GParam|: 36.29, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6850/11961, Batch size: 16, LR: 0.1000, PPL: 3026.18, |Param|: 5250.03, |GParam|: 44.86, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6900/11961, Batch size: 16, LR: 0.1000, PPL: 3043.20, |Param|: 5251.01, |GParam|: 42.88, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 6950/11961, Batch size: 16, LR: 0.1000, PPL: 3059.08, |Param|: 5252.04, |GParam|: 39.97, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, 
Batch: 7000/11961, Batch size: 16, LR: 0.1000, PPL: 3074.81, |Param|: 5253.10, |GParam|: 39.53, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 7050/11961, Batch size: 16, LR: 0.1000, PPL: 3087.81, |Param|: 5254.14, |GParam|: 37.69, Training: 128/58/70 total/source/target tokens/sec Epoch: 3, Batch: 7100/11961, Batch size: 16, LR: 0.1000, PPL: 3102.87, |Param|: 5255.16, |GParam|: 36.33, Training: 129/58/70 total/source/target tokens/sec Epoch: 3, Batch: 7150/11961, Batch size: 16, LR: 0.1000, PPL: 3117.43, |Param|: 5256.23, |GParam|: 42.55, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7200/11961, Batch size: 16, LR: 0.1000, PPL: 3132.52, |Param|: 5257.40, |GParam|: 37.91, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7250/11961, Batch size: 16, LR: 0.1000, PPL: 3149.27, |Param|: 5258.56, |GParam|: 34.88, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7300/11961, Batch size: 16, LR: 0.1000, PPL: 3165.71, |Param|: 5259.71, |GParam|: 40.36, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7350/11961, Batch size: 16, LR: 0.1000, PPL: 3181.44, |Param|: 5260.81, |GParam|: 36.90, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7400/11961, Batch size: 16, LR: 0.1000, PPL: 3196.67, |Param|: 5261.99, |GParam|: 48.05, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7450/11961, Batch size: 16, LR: 0.1000, PPL: 3212.26, |Param|: 5263.20, |GParam|: 43.05, Training: 129/59/70 total/source/target tokens/sec Epoch: 3, Batch: 7500/11961, Batch size: 16, LR: 0.1000, PPL: 3226.31, |Param|: 5264.52, |GParam|: 34.30, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7550/11961, Batch size: 16, LR: 0.1000, PPL: 3238.53, |Param|: 5265.72, |GParam|: 35.80, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7600/11961, Batch size: 16, LR: 0.1000, PPL: 3252.65, |Param|: 5266.93, |GParam|: 35.53, Training: 129/59/69 
total/source/target tokens/sec Epoch: 3, Batch: 7650/11961, Batch size: 16, LR: 0.1000, PPL: 3271.81, |Param|: 5268.35, |GParam|: 44.37, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7700/11961, Batch size: 16, LR: 0.1000, PPL: 3287.17, |Param|: 5269.72, |GParam|: 35.91, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7750/11961, Batch size: 16, LR: 0.1000, PPL: 3301.89, |Param|: 5270.94, |GParam|: 46.07, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7800/11961, Batch size: 16, LR: 0.1000, PPL: 3316.14, |Param|: 5272.24, |GParam|: 52.79, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7850/11961, Batch size: 16, LR: 0.1000, PPL: 3333.28, |Param|: 5273.67, |GParam|: 48.43, Training: 129/59/69 total/source/target tokens/sec Epoch: 3, Batch: 7900/11961, Batch size: 16, LR: 0.1000, PPL: 3346.44, |Param|: 5275.18, |GParam|: 54.60, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 7950/11961, Batch size: 16, LR: 0.1000, PPL: 3359.82, |Param|: 5276.63, |GParam|: 35.34, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8000/11961, Batch size: 16, LR: 0.1000, PPL: 3376.59, |Param|: 5278.21, |GParam|: 50.58, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8050/11961, Batch size: 16, LR: 0.1000, PPL: 3389.81, |Param|: 5279.72, |GParam|: 40.87, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8100/11961, Batch size: 16, LR: 0.1000, PPL: 3406.49, |Param|: 5281.38, |GParam|: 37.75, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8150/11961, Batch size: 16, LR: 0.1000, PPL: 3422.81, |Param|: 5282.95, |GParam|: 41.53, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8200/11961, Batch size: 16, LR: 0.1000, PPL: 3439.02, |Param|: 5284.65, |GParam|: 36.78, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8250/11961, Batch size: 9, LR: 0.1000, PPL: 3455.80, |Param|: 
5286.37, |GParam|: 54.82, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8300/11961, Batch size: 16, LR: 0.1000, PPL: 3470.15, |Param|: 5288.04, |GParam|: 39.87, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8350/11961, Batch size: 16, LR: 0.1000, PPL: 3482.90, |Param|: 5289.56, |GParam|: 36.92, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8400/11961, Batch size: 16, LR: 0.1000, PPL: 3498.11, |Param|: 5291.17, |GParam|: 40.73, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8450/11961, Batch size: 16, LR: 0.1000, PPL: 3514.49, |Param|: 5293.03, |GParam|: 36.36, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8500/11961, Batch size: 16, LR: 0.1000, PPL: 3527.52, |Param|: 5294.55, |GParam|: 45.46, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8550/11961, Batch size: 16, LR: 0.1000, PPL: 3541.47, |Param|: 5296.27, |GParam|: 45.12, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8600/11961, Batch size: 16, LR: 0.1000, PPL: 3558.15, |Param|: 5298.24, |GParam|: 42.51, Training: 130/60/69 total/source/target tokens/sec Epoch: 3, Batch: 8650/11961, Batch size: 16, LR: 0.1000, PPL: 3573.57, |Param|: 5300.08, |GParam|: 43.83, Training: 130/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8700/11961, Batch size: 16, LR: 0.1000, PPL: 3587.63, |Param|: 5301.86, |GParam|: 55.12, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8750/11961, Batch size: 16, LR: 0.1000, PPL: 3602.19, |Param|: 5303.76, |GParam|: 43.87, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8800/11961, Batch size: 16, LR: 0.1000, PPL: 3615.59, |Param|: 5305.28, |GParam|: 41.79, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8850/11961, Batch size: 16, LR: 0.1000, PPL: 3629.64, |Param|: 5307.36, |GParam|: 34.22, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8900/11961, Batch 
size: 16, LR: 0.1000, PPL: 3643.69, |Param|: 5309.24, |GParam|: 53.60, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 8950/11961, Batch size: 16, LR: 0.1000, PPL: 3661.16, |Param|: 5311.39, |GParam|: 43.59, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9000/11961, Batch size: 16, LR: 0.1000, PPL: 3676.87, |Param|: 5313.22, |GParam|: 42.48, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9050/11961, Batch size: 16, LR: 0.1000, PPL: 3691.15, |Param|: 5315.56, |GParam|: 49.58, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9100/11961, Batch size: 16, LR: 0.1000, PPL: 3708.59, |Param|: 5318.06, |GParam|: 39.21, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9150/11961, Batch size: 16, LR: 0.1000, PPL: 3722.81, |Param|: 5319.83, |GParam|: 41.27, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9200/11961, Batch size: 16, LR: 0.1000, PPL: 3735.77, |Param|: 5321.91, |GParam|: 36.42, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9250/11961, Batch size: 16, LR: 0.1000, PPL: 3748.80, |Param|: 5324.03, |GParam|: 39.79, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9300/11961, Batch size: 16, LR: 0.1000, PPL: 3763.25, |Param|: 5326.24, |GParam|: 44.44, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9350/11961, Batch size: 16, LR: 0.1000, PPL: 3778.10, |Param|: 5328.45, |GParam|: 49.33, Training: 131/61/69 total/source/target tokens/sec Epoch: 3, Batch: 9400/11961, Batch size: 16, LR: 0.1000, PPL: 3791.75, |Param|: 5330.57, |GParam|: 48.37, Training: 131/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9450/11961, Batch size: 16, LR: 0.1000, PPL: 3808.55, |Param|: 5333.06, |GParam|: 41.63, Training: 131/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9500/11961, Batch size: 16, LR: 0.1000, PPL: 3825.82, |Param|: 5335.73, |GParam|: 42.14, Training: 131/62/69 total/source/target 
tokens/sec Epoch: 3, Batch: 9550/11961, Batch size: 16, LR: 0.1000, PPL: 3840.76, |Param|: 5337.93, |GParam|: 44.35, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9600/11961, Batch size: 16, LR: 0.1000, PPL: 3856.85, |Param|: 5339.89, |GParam|: 50.35, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9650/11961, Batch size: 16, LR: 0.1000, PPL: 3869.95, |Param|: 5341.82, |GParam|: 47.60, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9700/11961, Batch size: 16, LR: 0.1000, PPL: 3881.68, |Param|: 5344.19, |GParam|: 39.68, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9750/11961, Batch size: 16, LR: 0.1000, PPL: 3899.63, |Param|: 5346.86, |GParam|: 53.08, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9800/11961, Batch size: 16, LR: 0.1000, PPL: 3915.57, |Param|: 5349.61, |GParam|: 42.57, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9850/11961, Batch size: 16, LR: 0.1000, PPL: 3931.92, |Param|: 5352.11, |GParam|: 39.48, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9900/11961, Batch size: 16, LR: 0.1000, PPL: 3945.64, |Param|: 5354.83, |GParam|: 33.47, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 9950/11961, Batch size: 16, LR: 0.1000, PPL: 3959.39, |Param|: 5357.37, |GParam|: 60.47, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10000/11961, Batch size: 16, LR: 0.1000, PPL: 3973.57, |Param|: 5359.92, |GParam|: 42.87, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10050/11961, Batch size: 16, LR: 0.1000, PPL: 3988.17, |Param|: 5362.78, |GParam|: 41.93, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10100/11961, Batch size: 16, LR: 0.1000, PPL: 4004.83, |Param|: 5365.50, |GParam|: 47.96, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10150/11961, Batch size: 16, LR: 0.1000, PPL: 4020.94, |Param|: 5368.28, |GParam|: 
41.16, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10200/11961, Batch size: 16, LR: 0.1000, PPL: 4034.98, |Param|: 5370.77, |GParam|: 36.51, Training: 132/62/69 total/source/target tokens/sec Epoch: 3, Batch: 10250/11961, Batch size: 16, LR: 0.1000, PPL: 4052.48, |Param|: 5373.85, |GParam|: 43.24, Training: 132/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10300/11961, Batch size: 16, LR: 0.1000, PPL: 4067.46, |Param|: 5376.61, |GParam|: 59.32, Training: 132/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10350/11961, Batch size: 16, LR: 0.1000, PPL: 4081.46, |Param|: 5379.32, |GParam|: 44.79, Training: 132/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10400/11961, Batch size: 16, LR: 0.1000, PPL: 4095.41, |Param|: 5382.44, |GParam|: 38.94, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10450/11961, Batch size: 16, LR: 0.1000, PPL: 4110.15, |Param|: 5385.57, |GParam|: 43.73, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10500/11961, Batch size: 16, LR: 0.1000, PPL: 4127.22, |Param|: 5388.24, |GParam|: 47.71, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10550/11961, Batch size: 16, LR: 0.1000, PPL: 4141.44, |Param|: 5391.43, |GParam|: 42.32, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10600/11961, Batch size: 16, LR: 0.1000, PPL: 4158.55, |Param|: 5394.73, |GParam|: 48.63, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10650/11961, Batch size: 16, LR: 0.1000, PPL: 4172.03, |Param|: 5397.64, |GParam|: 44.93, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10700/11961, Batch size: 16, LR: 0.1000, PPL: 4186.99, |Param|: 5400.47, |GParam|: 43.57, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10750/11961, Batch size: 16, LR: 0.1000, PPL: 4201.96, |Param|: 5403.26, |GParam|: 61.70, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10800/11961, Batch size: 16, 
LR: 0.1000, PPL: 4217.21, |Param|: 5406.21, |GParam|: 58.22, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10850/11961, Batch size: 16, LR: 0.1000, PPL: 4231.00, |Param|: 5409.56, |GParam|: 54.72, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10900/11961, Batch size: 16, LR: 0.1000, PPL: 4246.65, |Param|: 5412.94, |GParam|: 45.56, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 10950/11961, Batch size: 16, LR: 0.1000, PPL: 4260.94, |Param|: 5415.74, |GParam|: 50.17, Training: 133/63/69 total/source/target tokens/sec Epoch: 3, Batch: 11000/11961, Batch size: 16, LR: 0.1000, PPL: 4277.39, |Param|: 5418.66, |GParam|: 40.42, Training: 133/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11050/11961, Batch size: 16, LR: 0.1000, PPL: 4292.29, |Param|: 5422.04, |GParam|: 55.34, Training: 133/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11100/11961, Batch size: 16, LR: 0.1000, PPL: 4308.34, |Param|: 5424.75, |GParam|: 45.19, Training: 133/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11150/11961, Batch size: 16, LR: 0.1000, PPL: 4326.16, |Param|: 5428.11, |GParam|: 47.82, Training: 133/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11200/11961, Batch size: 16, LR: 0.1000, PPL: 4340.69, |Param|: 5431.01, |GParam|: 47.21, Training: 133/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11250/11961, Batch size: 16, LR: 0.1000, PPL: 4356.24, |Param|: 5433.86, |GParam|: 44.92, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11300/11961, Batch size: 16, LR: 0.1000, PPL: 4368.08, |Param|: 5436.50, |GParam|: 44.84, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11350/11961, Batch size: 16, LR: 0.1000, PPL: 4383.07, |Param|: 5439.04, |GParam|: 58.27, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11400/11961, Batch size: 16, LR: 0.1000, PPL: 4396.13, |Param|: 5441.63, |GParam|: 53.55, Training: 134/64/69 total/source/target 
tokens/sec Epoch: 3, Batch: 11450/11961, Batch size: 16, LR: 0.1000, PPL: 4411.35, |Param|: 5443.97, |GParam|: 46.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11500/11961, Batch size: 16, LR: 0.1000, PPL: 4426.25, |Param|: 5446.74, |GParam|: 58.34, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11550/11961, Batch size: 16, LR: 0.1000, PPL: 4440.89, |Param|: 5449.24, |GParam|: 61.00, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11600/11961, Batch size: 16, LR: 0.1000, PPL: 4454.14, |Param|: 5451.93, |GParam|: 55.25, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11650/11961, Batch size: 16, LR: 0.1000, PPL: 4470.29, |Param|: 5454.86, |GParam|: 47.98, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11700/11961, Batch size: 16, LR: 0.1000, PPL: 4482.16, |Param|: 5457.60, |GParam|: 69.41, Training: 134/64/69 total/source/target tokens/sec Epoch: 3, Batch: 11750/11961, Batch size: 16, LR: 0.1000, PPL: 4494.36, |Param|: 5459.63, |GParam|: 60.14, Training: 134/65/69 total/source/target tokens/sec Epoch: 3, Batch: 11800/11961, Batch size: 16, LR: 0.1000, PPL: 4503.98, |Param|: 5461.97, |GParam|: 66.52, Training: 134/65/69 total/source/target tokens/sec Epoch: 3, Batch: 11850/11961, Batch size: 16, LR: 0.1000, PPL: 4511.03, |Param|: 5464.10, |GParam|: 52.58, Training: 134/65/69 total/source/target tokens/sec Epoch: 3, Batch: 11900/11961, Batch size: 16, LR: 0.1000, PPL: 4522.15, |Param|: 5466.70, |GParam|: 48.70, Training: 134/65/69 total/source/target tokens/sec Epoch: 3, Batch: 11950/11961, Batch size: 16, LR: 0.1000, PPL: 4530.55, |Param|: 5468.70, |GParam|: 69.45, Training: 134/65/69 total/source/target tokens/sec Train 4532.8107766977 Valid 2960.5173303691 saving checkpoint to demo-model_epoch3.00_2960.52.t7 Epoch: 4, Batch: 50/11961, Batch size: 16, LR: 0.1000, PPL: 687.48, |Param|: 5467.09, |GParam|: 19.13, Training: 92/17/74 total/source/target 
tokens/sec Epoch: 4, Batch: 100/11961, Batch size: 16, LR: 0.1000, PPL: 457.41, |Param|: 5466.66, |GParam|: 13.54, Training: 96/23/73 total/source/target tokens/sec Epoch: 4, Batch: 150/11961, Batch size: 16, LR: 0.1000, PPL: 422.73, |Param|: 5466.22, |GParam|: 12.91, Training: 99/27/72 total/source/target tokens/sec Epoch: 4, Batch: 200/11961, Batch size: 16, LR: 0.1000, PPL: 418.28, |Param|: 5465.85, |GParam|: 11.91, Training: 101/29/72 total/source/target tokens/sec Epoch: 4, Batch: 250/11961, Batch size: 16, LR: 0.1000, PPL: 438.03, |Param|: 5465.59, |GParam|: 13.20, Training: 103/30/72 total/source/target tokens/sec Epoch: 4, Batch: 300/11961, Batch size: 16, LR: 0.1000, PPL: 450.93, |Param|: 5465.25, |GParam|: 12.46, Training: 104/32/71 total/source/target tokens/sec Epoch: 4, Batch: 350/11961, Batch size: 16, LR: 0.1000, PPL: 446.87, |Param|: 5464.87, |GParam|: 14.05, Training: 105/33/71 total/source/target tokens/sec Epoch: 4, Batch: 400/11961, Batch size: 16, LR: 0.1000, PPL: 462.29, |Param|: 5464.50, |GParam|: 16.39, Training: 106/34/71 total/source/target tokens/sec Epoch: 4, Batch: 450/11961, Batch size: 16, LR: 0.1000, PPL: 472.19, |Param|: 5464.10, |GParam|: 13.47, Training: 107/35/71 total/source/target tokens/sec Epoch: 4, Batch: 500/11961, Batch size: 16, LR: 0.1000, PPL: 485.63, |Param|: 5463.75, |GParam|: 17.32, Training: 107/36/71 total/source/target tokens/sec Epoch: 4, Batch: 550/11961, Batch size: 16, LR: 0.1000, PPL: 497.89, |Param|: 5463.37, |GParam|: 20.28, Training: 108/37/71 total/source/target tokens/sec Epoch: 4, Batch: 600/11961, Batch size: 16, LR: 0.1000, PPL: 522.43, |Param|: 5462.96, |GParam|: 15.21, Training: 108/37/71 total/source/target tokens/sec Epoch: 4, Batch: 650/11961, Batch size: 16, LR: 0.1000, PPL: 545.40, |Param|: 5462.57, |GParam|: 19.47, Training: 109/38/71 total/source/target tokens/sec Epoch: 4, Batch: 700/11961, Batch size: 16, LR: 0.1000, PPL: 561.76, |Param|: 5462.14, |GParam|: 17.07, Training: 110/38/71 
total/source/target tokens/sec Epoch: 4, Batch: 750/11961, Batch size: 16, LR: 0.1000, PPL: 578.23, |Param|: 5461.83, |GParam|: 19.52, Training: 110/39/71 total/source/target tokens/sec Epoch: 4, Batch: 800/11961, Batch size: 16, LR: 0.1000, PPL: 594.42, |Param|: 5461.42, |GParam|: 16.59, Training: 110/39/71 total/source/target tokens/sec Epoch: 4, Batch: 850/11961, Batch size: 16, LR: 0.1000, PPL: 609.93, |Param|: 5461.04, |GParam|: 15.64, Training: 111/40/71 total/source/target tokens/sec Epoch: 4, Batch: 900/11961, Batch size: 16, LR: 0.1000, PPL: 623.52, |Param|: 5460.68, |GParam|: 15.43, Training: 112/40/71 total/source/target tokens/sec Epoch: 4, Batch: 950/11961, Batch size: 16, LR: 0.1000, PPL: 639.50, |Param|: 5460.30, |GParam|: 20.13, Training: 112/41/71 total/source/target tokens/sec Epoch: 4, Batch: 1000/11961, Batch size: 16, LR: 0.1000, PPL: 653.81, |Param|: 5459.88, |GParam|: 20.31, Training: 112/41/71 total/source/target tokens/sec Epoch: 4, Batch: 1050/11961, Batch size: 16, LR: 0.1000, PPL: 670.48, |Param|: 5459.48, |GParam|: 18.09, Training: 113/42/71 total/source/target tokens/sec Epoch: 4, Batch: 1100/11961, Batch size: 16, LR: 0.1000, PPL: 684.25, |Param|: 5459.13, |GParam|: 17.33, Training: 113/42/71 total/source/target tokens/sec Epoch: 4, Batch: 1150/11961, Batch size: 16, LR: 0.1000, PPL: 699.00, |Param|: 5458.82, |GParam|: 23.23, Training: 114/43/70 total/source/target tokens/sec Epoch: 4, Batch: 1200/11961, Batch size: 16, LR: 0.1000, PPL: 714.85, |Param|: 5458.47, |GParam|: 21.67, Training: 114/43/70 total/source/target tokens/sec Epoch: 4, Batch: 1250/11961, Batch size: 16, LR: 0.1000, PPL: 729.38, |Param|: 5458.14, |GParam|: 22.33, Training: 114/43/70 total/source/target tokens/sec Epoch: 4, Batch: 1300/11961, Batch size: 16, LR: 0.1000, PPL: 737.88, |Param|: 5457.75, |GParam|: 19.49, Training: 114/43/70 total/source/target tokens/sec Epoch: 4, Batch: 1350/11961, Batch size: 16, LR: 0.1000, PPL: 749.12, |Param|: 5457.38, |GParam|: 
23.25, Training: 115/44/70 total/source/target tokens/sec Epoch: 4, Batch: 1400/11961, Batch size: 16, LR: 0.1000, PPL: 747.71, |Param|: 5457.00, |GParam|: 21.25, Training: 115/44/70 total/source/target tokens/sec Epoch: 4, Batch: 1450/11961, Batch size: 16, LR: 0.1000, PPL: 760.76, |Param|: 5456.72, |GParam|: 21.28, Training: 115/44/70 total/source/target tokens/sec Epoch: 4, Batch: 1500/11961, Batch size: 16, LR: 0.1000, PPL: 775.61, |Param|: 5456.47, |GParam|: 20.80, Training: 116/45/70 total/source/target tokens/sec Epoch: 4, Batch: 1550/11961, Batch size: 16, LR: 0.1000, PPL: 791.10, |Param|: 5456.24, |GParam|: 24.58, Training: 116/45/70 total/source/target tokens/sec Epoch: 4, Batch: 1600/11961, Batch size: 16, LR: 0.1000, PPL: 806.82, |Param|: 5456.10, |GParam|: 27.48, Training: 116/45/70 total/source/target tokens/sec Epoch: 4, Batch: 1650/11961, Batch size: 16, LR: 0.1000, PPL: 822.72, |Param|: 5455.94, |GParam|: 31.34, Training: 116/45/70 total/source/target tokens/sec Epoch: 4, Batch: 1700/11961, Batch size: 16, LR: 0.1000, PPL: 836.99, |Param|: 5455.78, |GParam|: 24.14, Training: 116/46/70 total/source/target tokens/sec Epoch: 4, Batch: 1750/11961, Batch size: 16, LR: 0.1000, PPL: 851.93, |Param|: 5455.61, |GParam|: 34.29, Training: 117/46/70 total/source/target tokens/sec Epoch: 4, Batch: 1800/11961, Batch size: 16, LR: 0.1000, PPL: 865.74, |Param|: 5455.44, |GParam|: 25.11, Training: 117/46/70 total/source/target tokens/sec Epoch: 4, Batch: 1850/11961, Batch size: 16, LR: 0.1000, PPL: 878.32, |Param|: 5455.27, |GParam|: 29.80, Training: 117/46/70 total/source/target tokens/sec Epoch: 4, Batch: 1900/11961, Batch size: 16, LR: 0.1000, PPL: 890.57, |Param|: 5455.15, |GParam|: 26.11, Training: 117/46/70 total/source/target tokens/sec Epoch: 4, Batch: 1950/11961, Batch size: 16, LR: 0.1000, PPL: 902.18, |Param|: 5455.00, |GParam|: 26.12, Training: 117/47/70 total/source/target tokens/sec Epoch: 4, Batch: 2000/11961, Batch size: 16, LR: 0.1000, PPL: 913.05, 
|Param|: 5454.84, |GParam|: 40.70, Training: 118/47/70 total/source/target tokens/sec Epoch: 4, Batch: 2050/11961, Batch size: 16, LR: 0.1000, PPL: 923.15, |Param|: 5454.72, |GParam|: 25.35, Training: 118/47/70 total/source/target tokens/sec Epoch: 4, Batch: 2100/11961, Batch size: 16, LR: 0.1000, PPL: 935.39, |Param|: 5454.59, |GParam|: 29.64, Training: 118/47/70 total/source/target tokens/sec Epoch: 4, Batch: 2150/11961, Batch size: 16, LR: 0.1000, PPL: 946.86, |Param|: 5454.44, |GParam|: 27.47, Training: 118/48/70 total/source/target tokens/sec Epoch: 4, Batch: 2200/11961, Batch size: 16, LR: 0.1000, PPL: 958.52, |Param|: 5454.27, |GParam|: 26.34, Training: 118/48/70 total/source/target tokens/sec Epoch: 4, Batch: 2250/11961, Batch size: 16, LR: 0.1000, PPL: 965.31, |Param|: 5454.13, |GParam|: 24.39, Training: 119/48/70 total/source/target tokens/sec Epoch: 4, Batch: 2300/11961, Batch size: 16, LR: 0.1000, PPL: 975.43, |Param|: 5454.02, |GParam|: 30.59, Training: 119/48/70 total/source/target tokens/sec Epoch: 4, Batch: 2350/11961, Batch size: 16, LR: 0.1000, PPL: 984.62, |Param|: 5453.90, |GParam|: 30.33, Training: 119/48/70 total/source/target tokens/sec Epoch: 4, Batch: 2400/11961, Batch size: 16, LR: 0.1000, PPL: 992.74, |Param|: 5453.79, |GParam|: 22.08, Training: 119/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2450/11961, Batch size: 16, LR: 0.1000, PPL: 1001.00, |Param|: 5453.69, |GParam|: 26.96, Training: 119/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2500/11961, Batch size: 16, LR: 0.1000, PPL: 1010.04, |Param|: 5453.57, |GParam|: 27.36, Training: 120/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2550/11961, Batch size: 16, LR: 0.1000, PPL: 1020.46, |Param|: 5453.50, |GParam|: 27.91, Training: 120/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2600/11961, Batch size: 16, LR: 0.1000, PPL: 1031.02, |Param|: 5453.43, |GParam|: 24.12, Training: 120/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2650/11961, Batch 
size: 16, LR: 0.1000, PPL: 1039.24, |Param|: 5453.34, |GParam|: 36.21, Training: 120/49/70 total/source/target tokens/sec Epoch: 4, Batch: 2700/11961, Batch size: 16, LR: 0.1000, PPL: 1048.00, |Param|: 5453.27, |GParam|: 32.07, Training: 120/50/70 total/source/target tokens/sec Epoch: 4, Batch: 2750/11961, Batch size: 16, LR: 0.1000, PPL: 1056.16, |Param|: 5453.20, |GParam|: 27.49, Training: 120/50/70 total/source/target tokens/sec Epoch: 4, Batch: 2800/11961, Batch size: 16, LR: 0.1000, PPL: 1062.81, |Param|: 5453.12, |GParam|: 42.13, Training: 120/50/70 total/source/target tokens/sec Epoch: 4, Batch: 2850/11961, Batch size: 16, LR: 0.1000, PPL: 1069.58, |Param|: 5453.06, |GParam|: 29.58, Training: 121/50/70 total/source/target tokens/sec Epoch: 4, Batch: 2900/11961, Batch size: 16, LR: 0.1000, PPL: 1075.15, |Param|: 5452.99, |GParam|: 30.60, Training: 121/50/70 total/source/target tokens/sec Epoch: 4, Batch: 2950/11961, Batch size: 16, LR: 0.1000, PPL: 1083.37, |Param|: 5452.95, |GParam|: 28.16, Training: 121/50/70 total/source/target tokens/sec Epoch: 4, Batch: 3000/11961, Batch size: 16, LR: 0.1000, PPL: 1090.58, |Param|: 5452.91, |GParam|: 28.96, Training: 121/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3050/11961, Batch size: 16, LR: 0.1000, PPL: 1099.03, |Param|: 5452.85, |GParam|: 26.23, Training: 121/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3100/11961, Batch size: 16, LR: 0.1000, PPL: 1105.79, |Param|: 5452.81, |GParam|: 31.73, Training: 121/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3150/11961, Batch size: 16, LR: 0.1000, PPL: 1114.04, |Param|: 5452.78, |GParam|: 30.86, Training: 121/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3200/11961, Batch size: 16, LR: 0.1000, PPL: 1121.56, |Param|: 5452.74, |GParam|: 31.49, Training: 122/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3250/11961, Batch size: 16, LR: 0.1000, PPL: 1128.56, |Param|: 5452.71, |GParam|: 35.25, Training: 122/51/70 total/source/target 
tokens/sec Epoch: 4, Batch: 3300/11961, Batch size: 16, LR: 0.1000, PPL: 1136.00, |Param|: 5452.65, |GParam|: 34.84, Training: 122/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3350/11961, Batch size: 16, LR: 0.1000, PPL: 1143.08, |Param|: 5452.62, |GParam|: 35.45, Training: 122/51/70 total/source/target tokens/sec Epoch: 4, Batch: 3400/11961, Batch size: 16, LR: 0.1000, PPL: 1150.62, |Param|: 5452.58, |GParam|: 25.64, Training: 122/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3450/11961, Batch size: 16, LR: 0.1000, PPL: 1157.19, |Param|: 5452.53, |GParam|: 30.22, Training: 122/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3500/11961, Batch size: 16, LR: 0.1000, PPL: 1165.40, |Param|: 5452.51, |GParam|: 27.53, Training: 122/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3550/11961, Batch size: 16, LR: 0.1000, PPL: 1172.49, |Param|: 5452.49, |GParam|: 26.89, Training: 122/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3600/11961, Batch size: 16, LR: 0.1000, PPL: 1179.70, |Param|: 5452.46, |GParam|: 30.12, Training: 123/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3650/11961, Batch size: 16, LR: 0.1000, PPL: 1187.50, |Param|: 5452.45, |GParam|: 27.54, Training: 123/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3700/11961, Batch size: 16, LR: 0.1000, PPL: 1194.63, |Param|: 5452.44, |GParam|: 31.55, Training: 123/52/70 total/source/target tokens/sec Epoch: 4, Batch: 3750/11961, Batch size: 16, LR: 0.1000, PPL: 1202.26, |Param|: 5452.40, |GParam|: 30.81, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 3800/11961, Batch size: 16, LR: 0.1000, PPL: 1209.42, |Param|: 5452.39, |GParam|: 30.07, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 3850/11961, Batch size: 16, LR: 0.1000, PPL: 1215.98, |Param|: 5452.38, |GParam|: 27.77, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 3900/11961, Batch size: 16, LR: 0.1000, PPL: 1224.48, |Param|: 5452.37, |GParam|: 
29.84, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 3950/11961, Batch size: 16, LR: 0.1000, PPL: 1232.41, |Param|: 5452.32, |GParam|: 31.08, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 4000/11961, Batch size: 16, LR: 0.1000, PPL: 1240.54, |Param|: 5452.32, |GParam|: 28.02, Training: 123/53/70 total/source/target tokens/sec Epoch: 4, Batch: 4050/11961, Batch size: 16, LR: 0.1000, PPL: 1247.75, |Param|: 5452.31, |GParam|: 34.78, Training: 124/53/70 total/source/target tokens/sec Epoch: 4, Batch: 4100/11961, Batch size: 16, LR: 0.1000, PPL: 1253.69, |Param|: 5452.30, |GParam|: 32.38, Training: 124/53/70 total/source/target tokens/sec Epoch: 4, Batch: 4150/11961, Batch size: 16, LR: 0.1000, PPL: 1260.61, |Param|: 5452.30, |GParam|: 30.93, Training: 124/53/70 total/source/target tokens/sec Epoch: 4, Batch: 4200/11961, Batch size: 16, LR: 0.1000, PPL: 1266.65, |Param|: 5452.30, |GParam|: 35.46, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4250/11961, Batch size: 16, LR: 0.1000, PPL: 1274.43, |Param|: 5452.28, |GParam|: 43.30, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4300/11961, Batch size: 16, LR: 0.1000, PPL: 1282.80, |Param|: 5452.29, |GParam|: 32.84, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4350/11961, Batch size: 16, LR: 0.1000, PPL: 1290.17, |Param|: 5452.31, |GParam|: 37.67, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4400/11961, Batch size: 16, LR: 0.1000, PPL: 1297.30, |Param|: 5452.32, |GParam|: 43.20, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4450/11961, Batch size: 16, LR: 0.1000, PPL: 1305.17, |Param|: 5452.32, |GParam|: 35.72, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4500/11961, Batch size: 16, LR: 0.1000, PPL: 1312.41, |Param|: 5452.33, |GParam|: 39.39, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4550/11961, Batch size: 16, LR: 0.1000, 
PPL: 1319.01, |Param|: 5452.33, |GParam|: 37.59, Training: 124/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4600/11961, Batch size: 16, LR: 0.1000, PPL: 1326.43, |Param|: 5452.36, |GParam|: 36.73, Training: 125/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4650/11961, Batch size: 16, LR: 0.1000, PPL: 1333.61, |Param|: 5452.37, |GParam|: 29.23, Training: 125/54/70 total/source/target tokens/sec Epoch: 4, Batch: 4700/11961, Batch size: 16, LR: 0.1000, PPL: 1340.78, |Param|: 5452.40, |GParam|: 34.52, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 4750/11961, Batch size: 16, LR: 0.1000, PPL: 1348.70, |Param|: 5452.44, |GParam|: 32.44, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 4800/11961, Batch size: 16, LR: 0.1000, PPL: 1356.56, |Param|: 5452.47, |GParam|: 26.11, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 4850/11961, Batch size: 16, LR: 0.1000, PPL: 1365.05, |Param|: 5452.53, |GParam|: 30.48, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 4900/11961, Batch size: 16, LR: 0.1000, PPL: 1374.27, |Param|: 5452.58, |GParam|: 37.06, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 4950/11961, Batch size: 16, LR: 0.1000, PPL: 1381.79, |Param|: 5452.63, |GParam|: 36.63, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5000/11961, Batch size: 16, LR: 0.1000, PPL: 1388.98, |Param|: 5452.65, |GParam|: 40.94, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5050/11961, Batch size: 16, LR: 0.1000, PPL: 1395.93, |Param|: 5452.69, |GParam|: 37.72, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5100/11961, Batch size: 16, LR: 0.1000, PPL: 1403.83, |Param|: 5452.74, |GParam|: 37.77, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5150/11961, Batch size: 16, LR: 0.1000, PPL: 1411.88, |Param|: 5452.78, |GParam|: 40.16, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, 
Batch: 5200/11961, Batch size: 16, LR: 0.1000, PPL: 1420.65, |Param|: 5452.83, |GParam|: 35.29, Training: 125/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5250/11961, Batch size: 16, LR: 0.1000, PPL: 1428.88, |Param|: 5452.89, |GParam|: 41.48, Training: 126/55/70 total/source/target tokens/sec Epoch: 4, Batch: 5300/11961, Batch size: 16, LR: 0.1000, PPL: 1436.93, |Param|: 5452.93, |GParam|: 29.28, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5350/11961, Batch size: 16, LR: 0.1000, PPL: 1446.84, |Param|: 5452.99, |GParam|: 36.39, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5400/11961, Batch size: 16, LR: 0.1000, PPL: 1455.24, |Param|: 5453.06, |GParam|: 54.04, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5450/11961, Batch size: 16, LR: 0.1000, PPL: 1463.43, |Param|: 5453.12, |GParam|: 44.93, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5500/11961, Batch size: 16, LR: 0.1000, PPL: 1472.14, |Param|: 5453.20, |GParam|: 43.82, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5550/11961, Batch size: 16, LR: 0.1000, PPL: 1479.54, |Param|: 5453.24, |GParam|: 33.11, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5600/11961, Batch size: 16, LR: 0.1000, PPL: 1487.53, |Param|: 5453.29, |GParam|: 30.77, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5650/11961, Batch size: 16, LR: 0.1000, PPL: 1496.48, |Param|: 5453.36, |GParam|: 33.16, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5700/11961, Batch size: 16, LR: 0.1000, PPL: 1504.72, |Param|: 5453.45, |GParam|: 43.09, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5750/11961, Batch size: 16, LR: 0.1000, PPL: 1511.84, |Param|: 5453.52, |GParam|: 34.74, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5800/11961, Batch size: 16, LR: 0.1000, PPL: 1519.33, |Param|: 5453.61, |GParam|: 41.04, Training: 126/56/70 
total/source/target tokens/sec Epoch: 4, Batch: 5850/11961, Batch size: 16, LR: 0.1000, PPL: 1527.87, |Param|: 5453.67, |GParam|: 37.53, Training: 126/56/70 total/source/target tokens/sec Epoch: 4, Batch: 5900/11961, Batch size: 16, LR: 0.1000, PPL: 1536.36, |Param|: 5453.75, |GParam|: 39.51, Training: 127/57/70 total/source/target tokens/sec Epoch: 4, Batch: 5950/11961, Batch size: 16, LR: 0.1000, PPL: 1543.71, |Param|: 5453.84, |GParam|: 41.44, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6000/11961, Batch size: 16, LR: 0.1000, PPL: 1552.07, |Param|: 5453.92, |GParam|: 30.33, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6050/11961, Batch size: 16, LR: 0.1000, PPL: 1560.76, |Param|: 5454.02, |GParam|: 39.46, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6100/11961, Batch size: 16, LR: 0.1000, PPL: 1568.55, |Param|: 5454.10, |GParam|: 40.51, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6150/11961, Batch size: 16, LR: 0.1000, PPL: 1576.56, |Param|: 5454.20, |GParam|: 38.88, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6200/11961, Batch size: 16, LR: 0.1000, PPL: 1585.20, |Param|: 5454.32, |GParam|: 46.58, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6250/11961, Batch size: 16, LR: 0.1000, PPL: 1592.74, |Param|: 5454.44, |GParam|: 30.62, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6300/11961, Batch size: 16, LR: 0.1000, PPL: 1601.89, |Param|: 5454.56, |GParam|: 33.84, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6350/11961, Batch size: 16, LR: 0.1000, PPL: 1610.65, |Param|: 5454.67, |GParam|: 34.36, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6400/11961, Batch size: 16, LR: 0.1000, PPL: 1618.82, |Param|: 5454.79, |GParam|: 37.54, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6450/11961, Batch size: 16, LR: 0.1000, PPL: 1628.79, |Param|: 
5454.92, |GParam|: 42.77, Training: 127/57/69 total/source/target tokens/sec Epoch: 4, Batch: 6500/11961, Batch size: 16, LR: 0.1000, PPL: 1636.95, |Param|: 5455.03, |GParam|: 45.41, Training: 127/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6550/11961, Batch size: 16, LR: 0.1000, PPL: 1645.05, |Param|: 5455.14, |GParam|: 35.40, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6600/11961, Batch size: 16, LR: 0.1000, PPL: 1652.73, |Param|: 5455.25, |GParam|: 42.15, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6650/11961, Batch size: 16, LR: 0.1000, PPL: 1660.53, |Param|: 5455.37, |GParam|: 33.47, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6700/11961, Batch size: 16, LR: 0.1000, PPL: 1669.74, |Param|: 5455.48, |GParam|: 41.42, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6750/11961, Batch size: 16, LR: 0.1000, PPL: 1677.50, |Param|: 5455.61, |GParam|: 33.92, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6800/11961, Batch size: 16, LR: 0.1000, PPL: 1685.91, |Param|: 5455.72, |GParam|: 37.35, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6850/11961, Batch size: 16, LR: 0.1000, PPL: 1694.85, |Param|: 5455.86, |GParam|: 37.43, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6900/11961, Batch size: 16, LR: 0.1000, PPL: 1703.76, |Param|: 5455.98, |GParam|: 37.12, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 6950/11961, Batch size: 16, LR: 0.1000, PPL: 1712.49, |Param|: 5456.13, |GParam|: 37.28, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 7000/11961, Batch size: 16, LR: 0.1000, PPL: 1720.93, |Param|: 5456.26, |GParam|: 42.58, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 7050/11961, Batch size: 16, LR: 0.1000, PPL: 1728.20, |Param|: 5456.39, |GParam|: 45.00, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 7100/11961, Batch 
size: 16, LR: 0.1000, PPL: 1736.17, |Param|: 5456.52, |GParam|: 36.45, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 7150/11961, Batch size: 16, LR: 0.1000, PPL: 1744.50, |Param|: 5456.67, |GParam|: 38.72, Training: 128/58/69 total/source/target tokens/sec Epoch: 4, Batch: 7200/11961, Batch size: 16, LR: 0.1000, PPL: 1752.73, |Param|: 5456.81, |GParam|: 39.21, Training: 128/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7250/11961, Batch size: 16, LR: 0.1000, PPL: 1761.69, |Param|: 5456.94, |GParam|: 35.10, Training: 128/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7300/11961, Batch size: 16, LR: 0.1000, PPL: 1770.80, |Param|: 5457.09, |GParam|: 37.10, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7350/11961, Batch size: 16, LR: 0.1000, PPL: 1779.60, |Param|: 5457.25, |GParam|: 41.50, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7400/11961, Batch size: 16, LR: 0.1000, PPL: 1787.54, |Param|: 5457.39, |GParam|: 44.71, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7450/11961, Batch size: 16, LR: 0.1000, PPL: 1795.81, |Param|: 5457.54, |GParam|: 41.37, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7500/11961, Batch size: 16, LR: 0.1000, PPL: 1803.77, |Param|: 5457.70, |GParam|: 37.50, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7550/11961, Batch size: 16, LR: 0.1000, PPL: 1811.07, |Param|: 5457.84, |GParam|: 33.39, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7600/11961, Batch size: 16, LR: 0.1000, PPL: 1819.24, |Param|: 5458.01, |GParam|: 35.36, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7650/11961, Batch size: 16, LR: 0.1000, PPL: 1829.24, |Param|: 5458.16, |GParam|: 36.11, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7700/11961, Batch size: 16, LR: 0.1000, PPL: 1837.57, |Param|: 5458.32, |GParam|: 35.56, Training: 129/59/69 total/source/target 
tokens/sec Epoch: 4, Batch: 7750/11961, Batch size: 16, LR: 0.1000, PPL: 1846.44, |Param|: 5458.49, |GParam|: 42.79, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7800/11961, Batch size: 16, LR: 0.1000, PPL: 1854.24, |Param|: 5458.66, |GParam|: 52.51, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7850/11961, Batch size: 16, LR: 0.1000, PPL: 1863.13, |Param|: 5458.82, |GParam|: 46.25, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7900/11961, Batch size: 16, LR: 0.1000, PPL: 1870.59, |Param|: 5458.98, |GParam|: 36.45, Training: 129/59/69 total/source/target tokens/sec Epoch: 4, Batch: 7950/11961, Batch size: 16, LR: 0.1000, PPL: 1878.52, |Param|: 5459.14, |GParam|: 38.49, Training: 129/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8000/11961, Batch size: 16, LR: 0.1000, PPL: 1887.78, |Param|: 5459.30, |GParam|: 43.47, Training: 129/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8050/11961, Batch size: 16, LR: 0.1000, PPL: 1895.28, |Param|: 5459.45, |GParam|: 41.15, Training: 129/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8100/11961, Batch size: 16, LR: 0.1000, PPL: 1904.64, |Param|: 5459.60, |GParam|: 37.07, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8150/11961, Batch size: 16, LR: 0.1000, PPL: 1913.41, |Param|: 5459.78, |GParam|: 40.13, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8200/11961, Batch size: 16, LR: 0.1000, PPL: 1922.29, |Param|: 5459.97, |GParam|: 38.05, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8250/11961, Batch size: 9, LR: 0.1000, PPL: 1931.80, |Param|: 5460.12, |GParam|: 54.44, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8300/11961, Batch size: 16, LR: 0.1000, PPL: 1939.73, |Param|: 5460.28, |GParam|: 38.55, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8350/11961, Batch size: 16, LR: 0.1000, PPL: 1946.83, |Param|: 5460.43, |GParam|: 34.12, 
Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8400/11961, Batch size: 16, LR: 0.1000, PPL: 1954.90, |Param|: 5460.60, |GParam|: 39.97, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8450/11961, Batch size: 16, LR: 0.1000, PPL: 1964.10, |Param|: 5460.78, |GParam|: 40.31, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8500/11961, Batch size: 16, LR: 0.1000, PPL: 1972.67, |Param|: 5460.94, |GParam|: 44.71, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8550/11961, Batch size: 16, LR: 0.1000, PPL: 1980.99, |Param|: 5461.11, |GParam|: 42.98, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8600/11961, Batch size: 16, LR: 0.1000, PPL: 1990.41, |Param|: 5461.30, |GParam|: 46.78, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8650/11961, Batch size: 16, LR: 0.1000, PPL: 1999.25, |Param|: 5461.49, |GParam|: 45.61, Training: 130/60/69 total/source/target tokens/sec Epoch: 4, Batch: 8700/11961, Batch size: 16, LR: 0.1000, PPL: 2007.35, |Param|: 5461.67, |GParam|: 47.85, Training: 130/61/69 total/source/target tokens/sec Epoch: 4, Batch: 8750/11961, Batch size: 16, LR: 0.1000, PPL: 2015.75, |Param|: 5461.86, |GParam|: 42.78, Training: 130/61/69 total/source/target tokens/sec Epoch: 4, Batch: 8800/11961, Batch size: 16, LR: 0.1000, PPL: 2023.44, |Param|: 5462.04, |GParam|: 45.07, Training: 130/61/69 total/source/target tokens/sec Epoch: 4, Batch: 8850/11961, Batch size: 16, LR: 0.1000, PPL: 2031.48, |Param|: 5462.22, |GParam|: 38.76, Training: 130/61/69 total/source/target tokens/sec Epoch: 4, Batch: 8900/11961, Batch size: 16, LR: 0.1000, PPL: 2039.63, |Param|: 5462.41, |GParam|: 48.54, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 8950/11961, Batch size: 16, LR: 0.1000, PPL: 2049.13, |Param|: 5462.62, |GParam|: 38.30, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9000/11961, Batch size: 16, LR: 0.1000, PPL: 
2058.61, |Param|: 5462.80, |GParam|: 46.53, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9050/11961, Batch size: 16, LR: 0.1000, PPL: 2066.79, |Param|: 5462.99, |GParam|: 50.29, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9100/11961, Batch size: 16, LR: 0.1000, PPL: 2076.03, |Param|: 5463.21, |GParam|: 43.62, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9150/11961, Batch size: 16, LR: 0.1000, PPL: 2084.24, |Param|: 5463.40, |GParam|: 45.51, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9200/11961, Batch size: 16, LR: 0.1000, PPL: 2092.09, |Param|: 5463.58, |GParam|: 41.30, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9250/11961, Batch size: 16, LR: 0.1000, PPL: 2100.12, |Param|: 5463.79, |GParam|: 38.90, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9300/11961, Batch size: 16, LR: 0.1000, PPL: 2108.93, |Param|: 5463.99, |GParam|: 45.83, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9350/11961, Batch size: 16, LR: 0.1000, PPL: 2117.46, |Param|: 5464.22, |GParam|: 48.73, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9400/11961, Batch size: 16, LR: 0.1000, PPL: 2125.91, |Param|: 5464.42, |GParam|: 48.28, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9450/11961, Batch size: 16, LR: 0.1000, PPL: 2135.15, |Param|: 5464.60, |GParam|: 40.55, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9500/11961, Batch size: 16, LR: 0.1000, PPL: 2144.74, |Param|: 5464.84, |GParam|: 41.30, Training: 131/61/69 total/source/target tokens/sec Epoch: 4, Batch: 9550/11961, Batch size: 16, LR: 0.1000, PPL: 2153.36, |Param|: 5465.05, |GParam|: 43.59, Training: 131/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9600/11961, Batch size: 16, LR: 0.1000, PPL: 2162.77, |Param|: 5465.24, |GParam|: 48.66, Training: 131/62/69 total/source/target tokens/sec Epoch: 4, Batch: 
9650/11961, Batch size: 16, LR: 0.1000, PPL: 2171.09, |Param|: 5465.44, |GParam|: 43.76, Training: 131/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9700/11961, Batch size: 16, LR: 0.1000, PPL: 2178.57, |Param|: 5465.67, |GParam|: 48.46, Training: 131/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9750/11961, Batch size: 16, LR: 0.1000, PPL: 2189.03, |Param|: 5465.89, |GParam|: 46.28, Training: 131/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9800/11961, Batch size: 16, LR: 0.1000, PPL: 2198.17, |Param|: 5466.12, |GParam|: 43.18, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9850/11961, Batch size: 16, LR: 0.1000, PPL: 2207.83, |Param|: 5466.36, |GParam|: 41.87, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9900/11961, Batch size: 16, LR: 0.1000, PPL: 2216.26, |Param|: 5466.60, |GParam|: 33.76, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 9950/11961, Batch size: 16, LR: 0.1000, PPL: 2224.46, |Param|: 5466.83, |GParam|: 52.50, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10000/11961, Batch size: 16, LR: 0.1000, PPL: 2233.42, |Param|: 5467.05, |GParam|: 44.94, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10050/11961, Batch size: 16, LR: 0.1000, PPL: 2242.44, |Param|: 5467.29, |GParam|: 45.63, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10100/11961, Batch size: 16, LR: 0.1000, PPL: 2252.58, |Param|: 5467.55, |GParam|: 52.75, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10150/11961, Batch size: 16, LR: 0.1000, PPL: 2262.10, |Param|: 5467.80, |GParam|: 45.66, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10200/11961, Batch size: 16, LR: 0.1000, PPL: 2271.02, |Param|: 5468.05, |GParam|: 40.58, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10250/11961, Batch size: 16, LR: 0.1000, PPL: 2281.15, |Param|: 5468.32, |GParam|: 46.62, Training: 132/62/69 
total/source/target tokens/sec Epoch: 4, Batch: 10300/11961, Batch size: 16, LR: 0.1000, PPL: 2290.64, |Param|: 5468.58, |GParam|: 48.53, Training: 132/62/69 total/source/target tokens/sec Epoch: 4, Batch: 10350/11961, Batch size: 16, LR: 0.1000, PPL: 2299.61, |Param|: 5468.84, |GParam|: 46.33, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10400/11961, Batch size: 16, LR: 0.1000, PPL: 2308.72, |Param|: 5469.11, |GParam|: 38.95, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10450/11961, Batch size: 16, LR: 0.1000, PPL: 2317.54, |Param|: 5469.40, |GParam|: 47.88, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10500/11961, Batch size: 16, LR: 0.1000, PPL: 2327.57, |Param|: 5469.65, |GParam|: 49.16, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10550/11961, Batch size: 16, LR: 0.1000, PPL: 2336.13, |Param|: 5469.93, |GParam|: 43.15, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10600/11961, Batch size: 16, LR: 0.1000, PPL: 2346.05, |Param|: 5470.21, |GParam|: 47.68, Training: 132/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10650/11961, Batch size: 16, LR: 0.1000, PPL: 2354.24, |Param|: 5470.51, |GParam|: 51.68, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10700/11961, Batch size: 16, LR: 0.1000, PPL: 2363.57, |Param|: 5470.84, |GParam|: 42.95, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10750/11961, Batch size: 16, LR: 0.1000, PPL: 2372.52, |Param|: 5471.13, |GParam|: 49.06, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10800/11961, Batch size: 16, LR: 0.1000, PPL: 2381.84, |Param|: 5471.47, |GParam|: 60.52, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10850/11961, Batch size: 16, LR: 0.1000, PPL: 2390.11, |Param|: 5471.83, |GParam|: 37.45, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10900/11961, Batch size: 16, LR: 0.1000, PPL: 2399.64, 
|Param|: 5472.17, |GParam|: 47.20, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 10950/11961, Batch size: 16, LR: 0.1000, PPL: 2408.25, |Param|: 5472.51, |GParam|: 51.29, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 11000/11961, Batch size: 16, LR: 0.1000, PPL: 2417.63, |Param|: 5472.88, |GParam|: 43.71, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 11050/11961, Batch size: 16, LR: 0.1000, PPL: 2426.72, |Param|: 5473.27, |GParam|: 54.30, Training: 133/63/69 total/source/target tokens/sec Epoch: 4, Batch: 11100/11961, Batch size: 16, LR: 0.1000, PPL: 2436.24, |Param|: 5473.66, |GParam|: 46.01, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11150/11961, Batch size: 16, LR: 0.1000, PPL: 2447.11, |Param|: 5474.06, |GParam|: 58.45, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11200/11961, Batch size: 16, LR: 0.1000, PPL: 2456.12, |Param|: 5474.44, |GParam|: 51.14, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11250/11961, Batch size: 16, LR: 0.1000, PPL: 2465.45, |Param|: 5474.83, |GParam|: 44.54, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11300/11961, Batch size: 16, LR: 0.1000, PPL: 2473.31, |Param|: 5475.21, |GParam|: 56.41, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11350/11961, Batch size: 16, LR: 0.1000, PPL: 2482.22, |Param|: 5475.62, |GParam|: 49.57, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11400/11961, Batch size: 16, LR: 0.1000, PPL: 2490.33, |Param|: 5476.03, |GParam|: 55.53, Training: 133/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11450/11961, Batch size: 16, LR: 0.1000, PPL: 2499.95, |Param|: 5476.46, |GParam|: 49.73, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11500/11961, Batch size: 16, LR: 0.1000, PPL: 2509.20, |Param|: 5476.89, |GParam|: 51.95, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 
11550/11961, Batch size: 16, LR: 0.1000, PPL: 2518.37, |Param|: 5477.31, |GParam|: 55.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11600/11961, Batch size: 16, LR: 0.1000, PPL: 2526.73, |Param|: 5477.74, |GParam|: 56.07, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11650/11961, Batch size: 16, LR: 0.1000, PPL: 2537.03, |Param|: 5478.20, |GParam|: 51.14, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11700/11961, Batch size: 16, LR: 0.1000, PPL: 2544.70, |Param|: 5478.63, |GParam|: 51.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11750/11961, Batch size: 16, LR: 0.1000, PPL: 2553.19, |Param|: 5479.00, |GParam|: 57.51, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11800/11961, Batch size: 16, LR: 0.1000, PPL: 2560.18, |Param|: 5479.42, |GParam|: 59.29, Training: 134/64/69 total/source/target tokens/sec Epoch: 4, Batch: 11850/11961, Batch size: 16, LR: 0.1000, PPL: 2565.87, |Param|: 5479.77, |GParam|: 44.02, Training: 134/65/69 total/source/target tokens/sec Epoch: 4, Batch: 11900/11961, Batch size: 16, LR: 0.1000, PPL: 2573.56, |Param|: 5480.11, |GParam|: 46.07, Training: 134/65/69 total/source/target tokens/sec Epoch: 4, Batch: 11950/11961, Batch size: 16, LR: 0.1000, PPL: 2580.50, |Param|: 5480.42, |GParam|: 90.12, Training: 134/65/69 total/source/target tokens/sec Train 2582.1220978721 Valid 2958.3082902242 saving checkpoint to demo-model_epoch4.00_2958.31.t7
Script started on Monday 24 October 2016 08:55:52 AM IST hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master$ th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model using CUDA on GPU 1... loading data... done! 
Source vocab size: 50004, Target vocab size: 150004 Source max sent len: 50, Target max sent len: 52 Number of additional features on source side: 0 Switching on memory preallocation loading demo-model_epoch4.00_2958.31.t7... Number of parameters: 84236504 (active: 84236504) Epoch: 5, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 375825299.43, |Param|: 5407.84, |GParam|: 503.37, Training: 131/61/69 total/source/target tokens/sec Epoch: 5, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 145308733.29, |Param|: 5407.19, |GParam|: 130.81, Training: 132/63/69 total/source/target tokens/sec Epoch: 5, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 85249666.69, |Param|: 5406.86, |GParam|: 1190.36, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 59144104.98, |Param|: 5407.06, |GParam|: 160.50, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 47474726.02, |Param|: 5407.71, |GParam|: 119.46, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 39104983.86, |Param|: 5408.40, |GParam|: 503.27, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 31825540.48, |Param|: 5409.29, |GParam|: 166.30, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 27134578.14, |Param|: 5410.12, |GParam|: 72.62, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 23789065.75, |Param|: 5410.93, |GParam|: 78.36, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 21650306.94, |Param|: 5411.21, |GParam|: 94.56, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 19412749.66, |Param|: 5411.67, |GParam|: 63.95, 
Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 17653859.36, |Param|: 5412.05, |GParam|: 94.65, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 650/11961, Batch size: 16, LR: 0.0500, PPL: 16016575.71, |Param|: 5412.53, |GParam|: 501.79, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 14725156.44, |Param|: 5413.06, |GParam|: 55.03, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 13521841.96, |Param|: 5413.54, |GParam|: 101.09, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 12586591.06, |Param|: 5414.11, |GParam|: 556.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 11583556.98, |Param|: 5414.80, |GParam|: 74.18, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 10563873.33, |Param|: 5415.39, |GParam|: 58.28, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 9845186.09, |Param|: 5416.12, |GParam|: 82.23, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 9231901.39, |Param|: 5416.77, |GParam|: 100.02, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 8552437.49, |Param|: 5417.53, |GParam|: 47.91, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 7961960.94, |Param|: 5418.20, |GParam|: 306.29, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 7474140.02, |Param|: 5418.98, |GParam|: 120.41, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 
1200/11961, Batch size: 16, LR: 0.0500, PPL: 7023617.36, |Param|: 5419.82, |GParam|: 33.52, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 6623046.40, |Param|: 5420.69, |GParam|: 116.39, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 6318415.97, |Param|: 5421.56, |GParam|: 48.06, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 6005388.10, |Param|: 5422.43, |GParam|: 87.92, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 5666401.48, |Param|: 5423.24, |GParam|: 49.01, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 5354662.72, |Param|: 5424.01, |GParam|: 33.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 5045090.79, |Param|: 5424.79, |GParam|: 68.16, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 4768232.09, |Param|: 5425.56, |GParam|: 72.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 4480429.04, |Param|: 5426.22, |GParam|: 62.47, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 4205743.11, |Param|: 5426.81, |GParam|: 133.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 3960205.29, |Param|: 5427.36, |GParam|: 85.19, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 3714690.46, |Param|: 5427.88, |GParam|: 49.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 3484905.87, |Param|: 5428.33, 
|GParam|: 84.35, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 3268761.37, |Param|: 5428.76, |GParam|: 69.10, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 3077613.60, |Param|: 5429.18, |GParam|: 56.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1950/11961, Batch size: 16, LR: 0.0500, PPL: 2893075.40, |Param|: 5429.55, |GParam|: 46.39, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 2720887.17, |Param|: 5429.98, |GParam|: 49.97, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 2577265.24, |Param|: 5430.34, |GParam|: 93.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 2435359.13, |Param|: 5430.74, |GParam|: 63.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 2297617.91, |Param|: 5431.17, |GParam|: 52.57, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL
Script started on Monday 24 October 2016 02:43:53 PM IST hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master$ th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model using CUDA on GPU 1... loading data... done! Source vocab size: 50004, Target vocab size: 150004 Source max sent len: 50, Target max sent len: 52 Number of additional features on source side: 0 Switching on memory preallocation loading demo-model_epoch4.00_2958.31.t7... 
Number of parameters: 84236504 (active: 84236504) Epoch: 5, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 375825299.43, |Param|: 5407.84, |GParam|: 503.37, Training: 129/60/68 total/source/target tokens/sec Epoch: 5, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 145308733.29, |Param|: 5407.19, |GParam|: 130.81, Training: 131/62/68 total/source/target tokens/sec Epoch: 5, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 85249666.69, |Param|: 5406.86, |GParam|: 1190.36, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 59144104.98, |Param|: 5407.06, |GParam|: 160.50, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 47474726.02, |Param|: 5407.71, |GParam|: 119.46, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 39104983.86, |Param|: 5408.40, |GParam|: 503.27, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 31825540.48, |Param|: 5409.29, |GParam|: 166.30, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 27134578.14, |Param|: 5410.12, |GParam|: 72.62, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 23789065.75, |Param|: 5410.93, |GParam|: 78.36, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 21650306.94, |Param|: 5411.21, |GParam|: 94.56, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 19412749.66, |Param|: 5411.67, |GParam|: 63.95, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 17653859.36, |Param|: 5412.05, |GParam|: 94.65, Training: 134/65/69 total/source/target tokens/sec Epoch: 5, Batch: 
650/11961, Batch size: 16, LR: 0.0500, PPL: 16016575.71, |Param|: 5412.53, |GParam|: 501.79, Training: 134/65/69 total/source/target tokens/sec Epoch: 5, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 14725156.44, |Param|: 5413.06, |GParam|: 55.03, Training: 134/65/69 total/source/target tokens/sec Epoch: 5, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 13521841.96, |Param|: 5413.54, |GParam|: 101.09, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 12586591.06, |Param|: 5414.11, |GParam|: 556.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 11583556.98, |Param|: 5414.80, |GParam|: 74.18, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 10563873.33, |Param|: 5415.39, |GParam|: 58.28, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 9845186.09, |Param|: 5416.12, |GParam|: 82.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 9231901.39, |Param|: 5416.77, |GParam|: 100.02, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 8552437.49, |Param|: 5417.53, |GParam|: 47.91, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 7961960.94, |Param|: 5418.20, |GParam|: 306.29, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 7474140.02, |Param|: 5418.98, |GParam|: 120.41, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1200/11961, Batch size: 16, LR: 0.0500, PPL: 7023617.36, |Param|: 5419.82, |GParam|: 33.52, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 6623046.40, |Param|: 5420.69, 
|GParam|: 116.39, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 6318415.97, |Param|: 5421.56, |GParam|: 48.06, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 6005388.10, |Param|: 5422.43, |GParam|: 87.92, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 5666401.48, |Param|: 5423.24, |GParam|: 49.01, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 5354662.72, |Param|: 5424.01, |GParam|: 33.38, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 5045090.79, |Param|: 5424.79, |GParam|: 68.16, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 4768232.09, |Param|: 5425.56, |GParam|: 72.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 4480429.04, |Param|: 5426.22, |GParam|: 62.47, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 4205743.11, |Param|: 5426.81, |GParam|: 133.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 3960205.29, |Param|: 5427.36, |GParam|: 85.19, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 3714690.46, |Param|: 5427.88, |GParam|: 49.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 3484905.87, |Param|: 5428.33, |GParam|: 84.35, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 3268761.37, |Param|: 5428.76, |GParam|: 69.10, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, 
Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 3077613.60, |Param|: 5429.18, |GParam|: 56.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1950/11961, Batch size: 16, LR: 0.0500, PPL: 2893075.40, |Param|: 5429.55, |GParam|: 46.39, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 2720887.17, |Param|: 5429.98, |GParam|: 49.97, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 2577265.24, |Param|: 5430.34, |GParam|: 93.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 2435359.13, |Param|: 5430.74, |GParam|: 63.96, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 2297617.91, |Param|: 5431.17, |GParam|: 52.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL: 2169378.67, |Param|: 5431.54, |GParam|: 56.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2250/11961, Batch size: 16, LR: 0.0500, PPL: 2060297.19, |Param|: 5431.93, |GParam|: 75.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2300/11961, Batch size: 16, LR: 0.0500, PPL: 1954468.52, |Param|: 5432.31, |GParam|: 67.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2350/11961, Batch size: 16, LR: 0.0500, PPL: 1847907.68, |Param|: 5432.70, |GParam|: 45.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2400/11961, Batch size: 16, LR: 0.0500, PPL: 1761524.18, |Param|: 5433.10, |GParam|: 30.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2450/11961, Batch size: 16, LR: 0.0500, PPL: 1674580.42, |Param|: 5433.55, |GParam|: 25.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2500/11961, Batch size: 16, LR: 0.0500, PPL: 1599688.07, |Param|: 
5433.97, |GParam|: 50.17, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2550/11961, Batch size: 16, LR: 0.0500, PPL: 1531041.61, |Param|: 5434.32, |GParam|: 72.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2600/11961, Batch size: 16, LR: 0.0500, PPL: 1463770.60, |Param|: 5434.73, |GParam|: 99.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2650/11961, Batch size: 16, LR: 0.0500, PPL: 1399232.62, |Param|: 5435.16, |GParam|: 55.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2700/11961, Batch size: 16, LR: 0.0500, PPL: 1335312.77, |Param|: 5435.54, |GParam|: 65.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2750/11961, Batch size: 16, LR: 0.0500, PPL: 1276944.33, |Param|: 5435.91, |GParam|: 68.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2800/11961, Batch size: 16, LR: 0.0500, PPL: 1226620.60, |Param|: 5436.26, |GParam|: 75.68, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2850/11961, Batch size: 16, LR: 0.0500, PPL: 1177485.35, |Param|: 5436.70, |GParam|: 46.10, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2900/11961, Batch size: 16, LR: 0.0500, PPL: 1130714.42, |Param|: 5437.10, |GParam|: 60.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 2950/11961, Batch size: 16, LR: 0.0500, PPL: 1083957.58, |Param|: 5437.52, |GParam|: 85.64, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3000/11961, Batch size: 16, LR: 0.0500, PPL: 1045251.37, |Param|: 5437.94, |GParam|: 31.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3050/11961, Batch size: 16, LR: 0.0500, PPL: 1005596.94, |Param|: 5438.35, |GParam|: 67.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3100/11961, Batch size: 16, LR: 0.0500, PPL: 967709.47, |Param|: 5438.79, |GParam|: 88.67, Training: 133/64/68 total/source/target tokens/sec 
Epoch: 5, Batch: 3150/11961, Batch size: 16, LR: 0.0500, PPL: 934049.73, |Param|: 5439.23, |GParam|: 99.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3200/11961, Batch size: 16, LR: 0.0500, PPL: 903250.47, |Param|: 5439.65, |GParam|: 52.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3250/11961, Batch size: 16, LR: 0.0500, PPL: 874853.63, |Param|: 5440.06, |GParam|: 61.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3300/11961, Batch size: 16, LR: 0.0500, PPL: 845726.28, |Param|: 5440.51, |GParam|: 28.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3350/11961, Batch size: 16, LR: 0.0500, PPL: 818841.93, |Param|: 5440.97, |GParam|: 79.19, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3400/11961, Batch size: 16, LR: 0.0500, PPL: 792350.68, |Param|: 5441.42, |GParam|: 64.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3450/11961, Batch size: 16, LR: 0.0500, PPL: 766834.90, |Param|: 5441.84, |GParam|: 88.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3500/11961, Batch size: 16, LR: 0.0500, PPL: 744506.17, |Param|: 5442.26, |GParam|: 71.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3550/11961, Batch size: 16, LR: 0.0500, PPL: 722512.62, |Param|: 5442.64, |GParam|: 69.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3600/11961, Batch size: 16, LR: 0.0500, PPL: 701816.91, |Param|: 5443.04, |GParam|: 92.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3650/11961, Batch size: 16, LR: 0.0500, PPL: 682441.76, |Param|: 5443.44, |GParam|: 50.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3700/11961, Batch size: 16, LR: 0.0500, PPL: 663877.27, |Param|: 5443.86, |GParam|: 235.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3750/11961, Batch size: 16, LR: 0.0500, PPL: 644957.14, |Param|: 5444.31, 
|GParam|: 63.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3800/11961, Batch size: 16, LR: 0.0500, PPL: 629467.85, |Param|: 5444.73, |GParam|: 58.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3850/11961, Batch size: 16, LR: 0.0500, PPL: 615133.68, |Param|: 5445.12, |GParam|: 97.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3900/11961, Batch size: 16, LR: 0.0500, PPL: 596943.29, |Param|: 5445.61, |GParam|: 45.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 3950/11961, Batch size: 16, LR: 0.0500, PPL: 582319.40, |Param|: 5446.02, |GParam|: 57.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4000/11961, Batch size: 16, LR: 0.0500, PPL: 567044.69, |Param|: 5446.41, |GParam|: 60.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4050/11961, Batch size: 16, LR: 0.0500, PPL: 552885.40, |Param|: 5446.83, |GParam|: 171.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4100/11961, Batch size: 16, LR: 0.0500, PPL: 539461.49, |Param|: 5447.24, |GParam|: 45.93, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4150/11961, Batch size: 16, LR: 0.0500, PPL: 525844.71, |Param|: 5447.64, |GParam|: 85.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4200/11961, Batch size: 16, LR: 0.0500, PPL: 513522.85, |Param|: 5448.06, |GParam|: 91.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4250/11961, Batch size: 16, LR: 0.0500, PPL: 501792.24, |Param|: 5448.48, |GParam|: 69.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4300/11961, Batch size: 16, LR: 0.0500, PPL: 488145.32, |Param|: 5448.84, |GParam|: 42.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4350/11961, Batch size: 16, LR: 0.0500, PPL: 476425.57, |Param|: 5449.23, |GParam|: 45.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 
4400/11961, Batch size: 16, LR: 0.0500, PPL: 466043.41, |Param|: 5449.62, |GParam|: 84.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4450/11961, Batch size: 16, LR: 0.0500, PPL: 456183.60, |Param|: 5450.02, |GParam|: 100.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4500/11961, Batch size: 16, LR: 0.0500, PPL: 447537.73, |Param|: 5450.41, |GParam|: 38.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4550/11961, Batch size: 16, LR: 0.0500, PPL: 438453.16, |Param|: 5450.84, |GParam|: 50.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4600/11961, Batch size: 16, LR: 0.0500, PPL: 429388.95, |Param|: 5451.27, |GParam|: 74.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4650/11961, Batch size: 16, LR: 0.0500, PPL: 421339.35, |Param|: 5451.64, |GParam|: 53.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4700/11961, Batch size: 16, LR: 0.0500, PPL: 413021.89, |Param|: 5452.00, |GParam|: 49.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4750/11961, Batch size: 16, LR: 0.0500, PPL: 404922.01, |Param|: 5452.39, |GParam|: 71.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4800/11961, Batch size: 16, LR: 0.0500, PPL: 398137.91, |Param|: 5452.79, |GParam|: 49.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4850/11961, Batch size: 16, LR: 0.0500, PPL: 390268.13, |Param|: 5453.16, |GParam|: 44.00, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4900/11961, Batch size: 16, LR: 0.0500, PPL: 383058.03, |Param|: 5453.55, |GParam|: 51.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 4950/11961, Batch size: 16, LR: 0.0500, PPL: 376070.13, |Param|: 5453.95, |GParam|: 47.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5000/11961, Batch size: 16, LR: 0.0500, PPL: 369556.87, |Param|: 5454.34, |GParam|: 53.94, 
Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5050/11961, Batch size: 16, LR: 0.0500, PPL: 362483.56, |Param|: 5454.73, |GParam|: 50.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5100/11961, Batch size: 16, LR: 0.0500, PPL: 355928.66, |Param|: 5455.12, |GParam|: 67.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5150/11961, Batch size: 16, LR: 0.0500, PPL: 349557.64, |Param|: 5455.53, |GParam|: 35.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5200/11961, Batch size: 16, LR: 0.0500, PPL: 343556.76, |Param|: 5455.90, |GParam|: 37.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5250/11961, Batch size: 16, LR: 0.0500, PPL: 337514.97, |Param|: 5456.29, |GParam|: 46.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5300/11961, Batch size: 16, LR: 0.0500, PPL: 331367.15, |Param|: 5456.66, |GParam|: 68.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5350/11961, Batch size: 16, LR: 0.0500, PPL: 325797.48, |Param|: 5457.01, |GParam|: 47.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5400/11961, Batch size: 16, LR: 0.0500, PPL: 320977.70, |Param|: 5457.33, |GParam|: 54.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5450/11961, Batch size: 16, LR: 0.0500, PPL: 315782.05, |Param|: 5457.69, |GParam|: 40.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5500/11961, Batch size: 16, LR: 0.0500, PPL: 310188.85, |Param|: 5458.06, |GParam|: 65.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5550/11961, Batch size: 16, LR: 0.0500, PPL: 305065.72, |Param|: 5458.41, |GParam|: 38.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5600/11961, Batch size: 16, LR: 0.0500, PPL: 300394.39, |Param|: 5458.82, |GParam|: 68.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5650/11961, Batch size: 
16, LR: 0.0500, PPL: 295416.98, |Param|: 5459.22, |GParam|: 126.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5700/11961, Batch size: 16, LR: 0.0500, PPL: 291013.08, |Param|: 5459.56, |GParam|: 61.19, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5750/11961, Batch size: 16, LR: 0.0500, PPL: 286641.43, |Param|: 5459.88, |GParam|: 81.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5800/11961, Batch size: 16, LR: 0.0500, PPL: 282064.14, |Param|: 5460.25, |GParam|: 51.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5850/11961, Batch size: 16, LR: 0.0500, PPL: 277838.18, |Param|: 5460.63, |GParam|: 37.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5900/11961, Batch size: 16, LR: 0.0500, PPL: 273984.47, |Param|: 5460.97, |GParam|: 48.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 5950/11961, Batch size: 16, LR: 0.0500, PPL: 269997.19, |Param|: 5461.30, |GParam|: 64.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6000/11961, Batch size: 16, LR: 0.0500, PPL: 266334.38, |Param|: 5461.65, |GParam|: 49.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6050/11961, Batch size: 16, LR: 0.0500, PPL: 262569.26, |Param|: 5461.98, |GParam|: 78.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6100/11961, Batch size: 16, LR: 0.0500, PPL: 258748.31, |Param|: 5462.34, |GParam|: 41.99, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6150/11961, Batch size: 16, LR: 0.0500, PPL: 255265.25, |Param|: 5462.67, |GParam|: 48.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6200/11961, Batch size: 16, LR: 0.0500, PPL: 251925.79, |Param|: 5463.07, |GParam|: 19.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6250/11961, Batch size: 10, LR: 0.0500, PPL: 248459.48, |Param|: 5463.44, |GParam|: 89.86, Training: 133/64/68 
total/source/target tokens/sec Epoch: 5, Batch: 6300/11961, Batch size: 16, LR: 0.0500, PPL: 244985.41, |Param|: 5463.83, |GParam|: 35.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6350/11961, Batch size: 16, LR: 0.0500, PPL: 241883.31, |Param|: 5464.17, |GParam|: 75.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6400/11961, Batch size: 16, LR: 0.0500, PPL: 238958.17, |Param|: 5464.54, |GParam|: 58.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6450/11961, Batch size: 16, LR: 0.0500, PPL: 235707.02, |Param|: 5464.86, |GParam|: 36.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6500/11961, Batch size: 16, LR: 0.0500, PPL: 232704.42, |Param|: 5465.24, |GParam|: 52.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6550/11961, Batch size: 16, LR: 0.0500, PPL: 229554.27, |Param|: 5465.59, |GParam|: 47.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6600/11961, Batch size: 16, LR: 0.0500, PPL: 226612.58, |Param|: 5465.95, |GParam|: 94.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6650/11961, Batch size: 16, LR: 0.0500, PPL: 223921.26, |Param|: 5466.26, |GParam|: 26.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6700/11961, Batch size: 16, LR: 0.0500, PPL: 221353.38, |Param|: 5466.62, |GParam|: 74.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6750/11961, Batch size: 16, LR: 0.0500, PPL: 218732.46, |Param|: 5466.98, |GParam|: 18.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6800/11961, Batch size: 16, LR: 0.0500, PPL: 215769.51, |Param|: 5467.30, |GParam|: 74.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6850/11961, Batch size: 16, LR: 0.0500, PPL: 213194.34, |Param|: 5467.64, |GParam|: 42.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6900/11961, Batch size: 16, LR: 0.0500, 
PPL: 210965.36, |Param|: 5467.96, |GParam|: 32.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 6950/11961, Batch size: 16, LR: 0.0500, PPL: 208651.45, |Param|: 5468.27, |GParam|: 51.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7000/11961, Batch size: 16, LR: 0.0500, PPL: 206244.01, |Param|: 5468.60, |GParam|: 59.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7050/11961, Batch size: 16, LR: 0.0500, PPL: 203654.76, |Param|: 5468.90, |GParam|: 50.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7100/11961, Batch size: 16, LR: 0.0500, PPL: 201439.58, |Param|: 5469.19, |GParam|: 173.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7150/11961, Batch size: 16, LR: 0.0500, PPL: 199004.46, |Param|: 5469.52, |GParam|: 59.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7200/11961, Batch size: 16, LR: 0.0500, PPL: 196622.60, |Param|: 5469.85, |GParam|: 75.71, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7250/11961, Batch size: 16, LR: 0.0500, PPL: 194462.94, |Param|: 5470.19, |GParam|: 76.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7300/11961, Batch size: 16, LR: 0.0500, PPL: 192266.69, |Param|: 5470.51, |GParam|: 46.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7350/11961, Batch size: 16, LR: 0.0500, PPL: 190205.08, |Param|: 5470.83, |GParam|: 70.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7400/11961, Batch size: 16, LR: 0.0500, PPL: 188102.19, |Param|: 5471.18, |GParam|: 38.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7450/11961, Batch size: 16, LR: 0.0500, PPL: 185792.57, |Param|: 5471.53, |GParam|: 35.32, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7500/11961, Batch size: 16, LR: 0.0500, PPL: 183824.15, |Param|: 5471.85, |GParam|: 52.17, Training: 133/64/68 total/source/target 
tokens/sec Epoch: 5, Batch: 7550/11961, Batch size: 16, LR: 0.0500, PPL: 181987.38, |Param|: 5472.16, |GParam|: 60.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7600/11961, Batch size: 16, LR: 0.0500, PPL: 180091.47, |Param|: 5472.47, |GParam|: 51.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7650/11961, Batch size: 16, LR: 0.0500, PPL: 178200.14, |Param|: 5472.82, |GParam|: 61.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7700/11961, Batch size: 16, LR: 0.0500, PPL: 176212.78, |Param|: 5473.11, |GParam|: 66.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7750/11961, Batch size: 16, LR: 0.0500, PPL: 174447.87, |Param|: 5473.42, |GParam|: 61.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7800/11961, Batch size: 16, LR: 0.0500, PPL: 172723.70, |Param|: 5473.73, |GParam|: 72.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7850/11961, Batch size: 16, LR: 0.0500, PPL: 170938.58, |Param|: 5474.04, |GParam|: 29.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7900/11961, Batch size: 16, LR: 0.0500, PPL: 169067.70, |Param|: 5474.39, |GParam|: 17.93, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 7950/11961, Batch size: 16, LR: 0.0500, PPL: 167386.41, |Param|: 5474.69, |GParam|: 46.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8000/11961, Batch size: 16, LR: 0.0500, PPL: 165653.34, |Param|: 5474.97, |GParam|: 44.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8050/11961, Batch size: 16, LR: 0.0500, PPL: 164046.17, |Param|: 5475.27, |GParam|: 49.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8100/11961, Batch size: 16, LR: 0.0500, PPL: 162478.08, |Param|: 5475.55, |GParam|: 52.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8150/11961, Batch size: 16, LR: 0.0500, PPL: 160850.23, 
|Param|: 5475.88, |GParam|: 26.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8200/11961, Batch size: 16, LR: 0.0500, PPL: 159338.22, |Param|: 5476.23, |GParam|: 45.89, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8250/11961, Batch size: 16, LR: 0.0500, PPL: 157888.73, |Param|: 5476.53, |GParam|: 65.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8300/11961, Batch size: 16, LR: 0.0500, PPL: 156457.65, |Param|: 5476.83, |GParam|: 42.28, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8350/11961, Batch size: 16, LR: 0.0500, PPL: 154993.02, |Param|: 5477.13, |GParam|: 20.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8400/11961, Batch size: 16, LR: 0.0500, PPL: 153529.00, |Param|: 5477.45, |GParam|: 47.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8450/11961, Batch size: 16, LR: 0.0500, PPL: 152061.52, |Param|: 5477.76, |GParam|: 37.96, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8500/11961, Batch size: 16, LR: 0.0500, PPL: 150653.59, |Param|: 5478.04, |GParam|: 60.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8550/11961, Batch size: 16, LR: 0.0500, PPL: 149348.01, |Param|: 5478.33, |GParam|: 57.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8600/11961, Batch size: 16, LR: 0.0500, PPL: 148081.34, |Param|: 5478.61, |GParam|: 42.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8650/11961, Batch size: 16, LR: 0.0500, PPL: 146759.67, |Param|: 5478.89, |GParam|: 78.99, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8700/11961, Batch size: 16, LR: 0.0500, PPL: 145508.58, |Param|: 5479.16, |GParam|: 75.83, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8750/11961, Batch size: 16, LR: 0.0500, PPL: 144291.58, |Param|: 5479.40, |GParam|: 49.81, Training: 133/64/68 total/source/target tokens/sec 
Epoch: 5, Batch: 8800/11961, Batch size: 16, LR: 0.0500, PPL: 142989.41, |Param|: 5479.68, |GParam|: 35.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8850/11961, Batch size: 16, LR: 0.0500, PPL: 141910.10, |Param|: 5479.94, |GParam|: 40.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8900/11961, Batch size: 16, LR: 0.0500, PPL: 140645.44, |Param|: 5480.22, |GParam|: 33.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 8950/11961, Batch size: 16, LR: 0.0500, PPL: 139501.50, |Param|: 5480.50, |GParam|: 48.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9000/11961, Batch size: 16, LR: 0.0500, PPL: 138342.23, |Param|: 5480.77, |GParam|: 56.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9050/11961, Batch size: 16, LR: 0.0500, PPL: 137158.60, |Param|: 5481.04, |GParam|: 170.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9100/11961, Batch size: 16, LR: 0.0500, PPL: 136036.90, |Param|: 5481.30, |GParam|: 72.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9150/11961, Batch size: 16, LR: 0.0500, PPL: 134853.30, |Param|: 5481.61, |GParam|: 49.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9200/11961, Batch size: 16, LR: 0.0500, PPL: 133785.10, |Param|: 5481.87, |GParam|: 55.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9250/11961, Batch size: 16, LR: 0.0500, PPL: 132681.40, |Param|: 5482.15, |GParam|: 76.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9300/11961, Batch size: 16, LR: 0.0500, PPL: 131513.46, |Param|: 5482.43, |GParam|: 25.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9350/11961, Batch size: 16, LR: 0.0500, PPL: 130435.45, |Param|: 5482.66, |GParam|: 69.07, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9400/11961, Batch size: 16, LR: 0.0500, PPL: 129480.96, |Param|: 5482.91, 
|GParam|: 78.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9450/11961, Batch size: 16, LR: 0.0500, PPL: 128518.83, |Param|: 5483.18, |GParam|: 41.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9500/11961, Batch size: 16, LR: 0.0500, PPL: 127511.97, |Param|: 5483.45, |GParam|: 53.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9550/11961, Batch size: 16, LR: 0.0500, PPL: 126584.38, |Param|: 5483.69, |GParam|: 75.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9600/11961, Batch size: 16, LR: 0.0500, PPL: 125616.02, |Param|: 5483.94, |GParam|: 67.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9650/11961, Batch size: 16, LR: 0.0500, PPL: 124681.35, |Param|: 5484.18, |GParam|: 57.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9700/11961, Batch size: 16, LR: 0.0500, PPL: 123717.06, |Param|: 5484.42, |GParam|: 28.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9750/11961, Batch size: 16, LR: 0.0500, PPL: 122834.68, |Param|: 5484.66, |GParam|: 53.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9800/11961, Batch size: 16, LR: 0.0500, PPL: 121944.59, |Param|: 5484.89, |GParam|: 74.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9850/11961, Batch size: 16, LR: 0.0500, PPL: 121043.14, |Param|: 5485.12, |GParam|: 36.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9900/11961, Batch size: 16, LR: 0.0500, PPL: 120183.10, |Param|: 5485.35, |GParam|: 46.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 9950/11961, Batch size: 16, LR: 0.0500, PPL: 119299.77, |Param|: 5485.62, |GParam|: 23.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10000/11961, Batch size: 16, LR: 0.0500, PPL: 118418.96, |Param|: 5485.84, |GParam|: 61.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 
10050/11961, Batch size: 16, LR: 0.0500, PPL: 117434.12, |Param|: 5486.12, |GParam|: 55.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10100/11961, Batch size: 16, LR: 0.0500, PPL: 116610.09, |Param|: 5486.35, |GParam|: 46.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10150/11961, Batch size: 16, LR: 0.0500, PPL: 115840.38, |Param|: 5486.58, |GParam|: 45.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10200/11961, Batch size: 16, LR: 0.0500, PPL: 115029.78, |Param|: 5486.80, |GParam|: 65.62, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10250/11961, Batch size: 16, LR: 0.0500, PPL: 114179.23, |Param|: 5487.04, |GParam|: 74.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10300/11961, Batch size: 16, LR: 0.0500, PPL: 113309.71, |Param|: 5487.28, |GParam|: 64.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10350/11961, Batch size: 16, LR: 0.0500, PPL: 112570.67, |Param|: 5487.52, |GParam|: 23.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10400/11961, Batch size: 16, LR: 0.0500, PPL: 111825.61, |Param|: 5487.72, |GParam|: 19.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10450/11961, Batch size: 16, LR: 0.0500, PPL: 111019.48, |Param|: 5487.93, |GParam|: 45.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10500/11961, Batch size: 16, LR: 0.0500, PPL: 110239.30, |Param|: 5488.16, |GParam|: 75.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10550/11961, Batch size: 16, LR: 0.0500, PPL: 109435.21, |Param|: 5488.36, |GParam|: 37.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10600/11961, Batch size: 16, LR: 0.0500, PPL: 108688.86, |Param|: 5488.56, |GParam|: 75.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10650/11961, Batch size: 16, LR: 0.0500, PPL: 107936.14, |Param|: 5488.79, 
|GParam|: 30.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10700/11961, Batch size: 16, LR: 0.0500, PPL: 107203.03, |Param|: 5488.99, |GParam|: 79.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10750/11961, Batch size: 16, LR: 0.0500, PPL: 106467.61, |Param|: 5489.23, |GParam|: 65.71, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10800/11961, Batch size: 16, LR: 0.0500, PPL: 105819.85, |Param|: 5489.42, |GParam|: 45.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10850/11961, Batch size: 16, LR: 0.0500, PPL: 105153.81, |Param|: 5489.64, |GParam|: 23.89, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10900/11961, Batch size: 16, LR: 0.0500, PPL: 104458.03, |Param|: 5489.87, |GParam|: 80.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 10950/11961, Batch size: 16, LR: 0.0500, PPL: 103801.43, |Param|: 5490.09, |GParam|: 62.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11000/11961, Batch size: 16, LR: 0.0500, PPL: 103129.74, |Param|: 5490.27, |GParam|: 70.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11050/11961, Batch size: 16, LR: 0.0500, PPL: 102493.47, |Param|: 5490.51, |GParam|: 50.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11100/11961, Batch size: 16, LR: 0.0500, PPL: 101833.36, |Param|: 5490.72, |GParam|: 48.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11150/11961, Batch size: 16, LR: 0.0500, PPL: 101108.21, |Param|: 5490.94, |GParam|: 54.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11200/11961, Batch size: 16, LR: 0.0500, PPL: 100446.07, |Param|: 5491.18, |GParam|: 80.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11250/11961, Batch size: 16, LR: 0.0500, PPL: 99838.14, |Param|: 5491.40, |GParam|: 66.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, 
Batch: 11300/11961, Batch size: 16, LR: 0.0500, PPL: 99251.19, |Param|: 5491.58, |GParam|: 90.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11350/11961, Batch size: 16, LR: 0.0500, PPL: 98690.95, |Param|: 5491.82, |GParam|: 65.21, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11400/11961, Batch size: 16, LR: 0.0500, PPL: 98089.97, |Param|: 5492.02, |GParam|: 77.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11450/11961, Batch size: 16, LR: 0.0500, PPL: 97517.25, |Param|: 5492.23, |GParam|: 71.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11500/11961, Batch size: 16, LR: 0.0500, PPL: 96917.71, |Param|: 5492.43, |GParam|: 88.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11550/11961, Batch size: 16, LR: 0.0500, PPL: 96284.77, |Param|: 5492.63, |GParam|: 54.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11600/11961, Batch size: 16, LR: 0.0500, PPL: 95716.78, |Param|: 5492.82, |GParam|: 57.62, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11650/11961, Batch size: 16, LR: 0.0500, PPL: 95098.65, |Param|: 5493.00, |GParam|: 56.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11700/11961, Batch size: 16, LR: 0.0500, PPL: 94578.59, |Param|: 5493.20, |GParam|: 47.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11750/11961, Batch size: 16, LR: 0.0500, PPL: 94014.13, |Param|: 5493.42, |GParam|: 58.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11800/11961, Batch size: 16, LR: 0.0500, PPL: 93431.58, |Param|: 5493.62, |GParam|: 48.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11850/11961, Batch size: 16, LR: 0.0500, PPL: 92891.99, |Param|: 5493.82, |GParam|: 67.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11900/11961, Batch size: 16, LR: 0.0500, PPL: 92360.13, |Param|: 5494.00, |GParam|: 
44.28, Training: 133/64/68 total/source/target tokens/sec Epoch: 5, Batch: 11950/11961, Batch size: 16, LR: 0.0500, PPL: 91837.79, |Param|: 5494.19, |GParam|: 66.32, Training: 133/64/68 total/source/target tokens/sec Train 91718.663677416 Valid 6079.4950706264 saving checkpoint to demo-model_epoch5.00_6079.50.t7 Epoch: 6, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 16191.38, |Param|: 5494.40, |GParam|: 69.73, Training: 132/63/69 total/source/target tokens/sec Epoch: 6, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 16465.85, |Param|: 5494.59, |GParam|: 203.15, Training: 132/63/69 total/source/target tokens/sec Epoch: 6, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 16661.21, |Param|: 5494.76, |GParam|: 73.00, Training: 132/63/69 total/source/target tokens/sec Epoch: 6, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 16747.30, |Param|: 5494.92, |GParam|: 43.75, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 16781.55, |Param|: 5495.11, |GParam|: 47.37, Training: 134/64/69 total/source/target tokens/sec Epoch: 6, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 17028.67, |Param|: 5495.31, |GParam|: 37.92, Training: 134/65/69 total/source/target tokens/sec Epoch: 6, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 17010.37, |Param|: 5495.45, |GParam|: 78.56, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 16790.48, |Param|: 5495.58, |GParam|: 45.17, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 17011.51, |Param|: 5495.73, |GParam|: 37.18, Training: 134/65/69 total/source/target tokens/sec Epoch: 6, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 16983.71, |Param|: 5495.88, |GParam|: 41.73, Training: 134/65/69 total/source/target tokens/sec Epoch: 6, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 16692.28, |Param|: 5496.00, |GParam|: 51.27, 
Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 16672.06, |Param|: 5496.14, |GParam|: 84.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 6, Batch: 650/11961, Batch size: 16, LR: 0.0500, PPL: 16659.50, |Param|: 5496.29, |GParam|: 85.45, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 16785.19, |Param|: 5496.41, |GParam|: 100.51, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 16786.24, |Param|: 5496.53, |GParam|: 67.60, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 16714.52, |Param|: 5496.65, |GParam|: 66.79, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 16698.74, |Param|: 5496.76, |GParam|: 70.77, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 16625.04, |Param|: 5496.87, |GParam|: 42.85, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 16573.68, |Param|: 5496.98, |GParam|: 70.12, Training: 133/64/69 total/source/target tokens/sec Epoch: 6, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 16535.20, |Param|: 5497.10, |GParam|: 66.69, Training: 133/64/69 total/source/target tokens/sec ^C/home/hans/torch/install/bin/luajit: /home/hans/.luarocks/share/lua/5.1/nn/MM.lua:85: interrupted! 
stack traceback: /home/hans/.luarocks/share/lua/5.1/nn/MM.lua:85: in function 'updateGradInput' /home/hans/.luarocks/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval' /home/hans/.luarocks/share/lua/5.1/nngraph/gmodule.lua:454: in function 'updateGradInput' /home/hans/.luarocks/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval' /home/hans/.luarocks/share/lua/5.1/nngraph/gmodule.lua:454: in function 'updateGradInput' /home/hans/.luarocks/share/lua/5.1/nn/Module.lua:31: in function 'backward' train.lua:535: in function 'train_batch' train.lua:745: in function 'train' train.lua:1071: in function 'main' train.lua:1074: in main chunk [C]: in function 'dofile' ...hans/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk [C]: at 0x00405d50 hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master$ exit exit Script done on Tuesday 25 October 2016 11:03:19 AM IST hans@hans-Lenovo-IdeaPad-Y500:~/Documents/HANS/MAC/SUCCESSFUL MODELS/ADD/seq2seq-attn-master$ th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model using CUDA on GPU 1... loading data... done! Source vocab size: 50004, Target vocab size: 150004 Source max sent len: 50, Target max sent len: 52 Number of additional features on source side: 0 Switching on memory preallocation loading demo-model_epoch4.00_2958.31.t7... 
Number of parameters: 84236504 (active: 84236504) Epoch: 5, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 375825299.43, |Param|: 5407.84, |GParam|: 503.37, Training: 130/61/69 total/source/target tokens/sec Epoch: 5, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 145308733.29, |Param|: 5407.19, |GParam|: 130.81, Training: 132/62/69 total/source/target tokens/sec Epoch: 5, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 85249666.69, |Param|: 5406.86, |GParam|: 1190.36, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 59144104.98, |Param|: 5407.06, |GParam|: 160.50, Training: 132/63/69 total/source/target tokens/sec Epoch: 5, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 47474726.02, |Param|: 5407.71, |GParam|: 119.46, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 39104983.86, |Param|: 5408.40, |GParam|: 503.27, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 31825540.48, |Param|: 5409.29, |GParam|: 166.30, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 27134578.14, |Param|: 5410.12, |GParam|: 72.62, Training: 133/63/69 total/source/target tokens/sec Epoch: 5, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 23789065.75, |Param|: 5410.93, |GParam|: 78.36, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 21650306.94, |Param|: 5411.21, |GParam|: 94.56, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 19412749.66, |Param|: 5411.67, |GParam|: 63.95, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 17653859.36, |Param|: 5412.05, |GParam|: 94.65, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 
650/11961, Batch size: 16, LR: 0.0500, PPL: 16016575.71, |Param|: 5412.53, |GParam|: 501.79, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 14725156.44, |Param|: 5413.06, |GParam|: 55.03, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 13521841.96, |Param|: 5413.54, |GParam|: 101.09, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 12586591.06, |Param|: 5414.11, |GParam|: 556.76, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 11583556.98, |Param|: 5414.80, |GParam|: 74.18, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 10563873.33, |Param|: 5415.39, |GParam|: 58.28, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 9845186.09, |Param|: 5416.12, |GParam|: 82.23, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 9231901.39, |Param|: 5416.77, |GParam|: 100.02, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 8552437.49, |Param|: 5417.53, |GParam|: 47.91, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 7961960.94, |Param|: 5418.20, |GParam|: 306.29, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 7474140.02, |Param|: 5418.98, |GParam|: 120.41, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1200/11961, Batch size: 16, LR: 0.0500, PPL: 7023617.36, |Param|: 5419.82, |GParam|: 33.52, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 6623046.40, |Param|: 5420.69, 
|GParam|: 116.39, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 6318415.97, |Param|: 5421.56, |GParam|: 48.06, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 6005388.10, |Param|: 5422.43, |GParam|: 87.92, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 5666401.48, |Param|: 5423.24, |GParam|: 49.01, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 5354662.72, |Param|: 5424.01, |GParam|: 33.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 5045090.79, |Param|: 5424.79, |GParam|: 68.16, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 4768232.09, |Param|: 5425.56, |GParam|: 72.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 4480429.04, |Param|: 5426.22, |GParam|: 62.47, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 4205743.11, |Param|: 5426.81, |GParam|: 133.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 3960205.29, |Param|: 5427.36, |GParam|: 85.19, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 3714690.46, |Param|: 5427.88, |GParam|: 49.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 3484905.87, |Param|: 5428.33, |GParam|: 84.35, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 3268761.37, |Param|: 5428.76, |GParam|: 69.10, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, 
Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 3077613.60, |Param|: 5429.18, |GParam|: 56.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 1950/11961, Batch size: 16, LR: 0.0500, PPL: 2893075.40, |Param|: 5429.55, |GParam|: 46.39, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 2720887.17, |Param|: 5429.98, |GParam|: 49.97, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 2577265.24, |Param|: 5430.34, |GParam|: 93.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 2435359.13, |Param|: 5430.74, |GParam|: 63.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 2297617.91, |Param|: 5431.17, |GParam|: 52.57, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL: 2169378.67, |Param|: 5431.54, |GParam|: 56.46, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2250/11961, Batch size: 16, LR: 0.0500, PPL: 2060297.19, |Param|: 5431.93, |GParam|: 75.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2300/11961, Batch size: 16, LR: 0.0500, PPL: 1954468.52, |Param|: 5432.31, |GParam|: 67.98, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2350/11961, Batch size: 16, LR: 0.0500, PPL: 1847907.68, |Param|: 5432.70, |GParam|: 45.80, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2400/11961, Batch size: 16, LR: 0.0500, PPL: 1761524.18, |Param|: 5433.10, |GParam|: 30.18, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2450/11961, Batch size: 16, LR: 0.0500, PPL: 1674580.42, |Param|: 5433.55, |GParam|: 25.49, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2500/11961, Batch size: 16, LR: 0.0500, PPL: 1599688.07, |Param|: 
5433.97, |GParam|: 50.17, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2550/11961, Batch size: 16, LR: 0.0500, PPL: 1531041.61, |Param|: 5434.32, |GParam|: 72.81, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2600/11961, Batch size: 16, LR: 0.0500, PPL: 1463770.60, |Param|: 5434.73, |GParam|: 99.36, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2650/11961, Batch size: 16, LR: 0.0500, PPL: 1399232.62, |Param|: 5435.16, |GParam|: 55.20, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2700/11961, Batch size: 16, LR: 0.0500, PPL: 1335312.77, |Param|: 5435.54, |GParam|: 65.14, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2750/11961, Batch size: 16, LR: 0.0500, PPL: 1276944.33, |Param|: 5435.91, |GParam|: 68.37, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2800/11961, Batch size: 16, LR: 0.0500, PPL: 1226620.60, |Param|: 5436.26, |GParam|: 75.68, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2850/11961, Batch size: 16, LR: 0.0500, PPL: 1177485.35, |Param|: 5436.70, |GParam|: 46.10, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2900/11961, Batch size: 16, LR: 0.0500, PPL: 1130714.42, |Param|: 5437.10, |GParam|: 60.12, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 2950/11961, Batch size: 16, LR: 0.0500, PPL: 1083957.58, |Param|: 5437.52, |GParam|: 85.64, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3000/11961, Batch size: 16, LR: 0.0500, PPL: 1045251.37, |Param|: 5437.94, |GParam|: 31.52, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3050/11961, Batch size: 16, LR: 0.0500, PPL: 1005596.94, |Param|: 5438.35, |GParam|: 67.94, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3100/11961, Batch size: 16, LR: 0.0500, PPL: 967709.47, |Param|: 5438.79, |GParam|: 88.67, Training: 133/64/69 total/source/target tokens/sec 
Epoch: 5, Batch: 3150/11961, Batch size: 16, LR: 0.0500, PPL: 934049.73, |Param|: 5439.23, |GParam|: 99.86, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3200/11961, Batch size: 16, LR: 0.0500, PPL: 903250.47, |Param|: 5439.65, |GParam|: 52.54, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3250/11961, Batch size: 16, LR: 0.0500, PPL: 874853.63, |Param|: 5440.06, |GParam|: 61.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3300/11961, Batch size: 16, LR: 0.0500, PPL: 845726.28, |Param|: 5440.51, |GParam|: 28.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3350/11961, Batch size: 16, LR: 0.0500, PPL: 818841.93, |Param|: 5440.97, |GParam|: 79.19, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3400/11961, Batch size: 16, LR: 0.0500, PPL: 792350.68, |Param|: 5441.42, |GParam|: 64.84, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3450/11961, Batch size: 16, LR: 0.0500, PPL: 766834.90, |Param|: 5441.84, |GParam|: 88.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3500/11961, Batch size: 16, LR: 0.0500, PPL: 744506.17, |Param|: 5442.26, |GParam|: 71.37, Training: 133/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3550/11961, Batch size: 16, LR: 0.0500, PPL: 722512.62, |Param|: 5442.64, |GParam|: 69.67, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3600/11961, Batch size: 16, LR: 0.0500, PPL: 701816.91, |Param|: 5443.04, |GParam|: 92.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3650/11961, Batch size: 16, LR: 0.0500, PPL: 682441.76, |Param|: 5443.44, |GParam|: 50.54, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3700/11961, Batch size: 16, LR: 0.0500, PPL: 663877.27, |Param|: 5443.86, |GParam|: 235.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3750/11961, Batch size: 16, LR: 0.0500, PPL: 644957.14, |Param|: 5444.31, 
|GParam|: 63.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3800/11961, Batch size: 16, LR: 0.0500, PPL: 629467.85, |Param|: 5444.73, |GParam|: 58.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3850/11961, Batch size: 16, LR: 0.0500, PPL: 615133.68, |Param|: 5445.12, |GParam|: 97.29, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3900/11961, Batch size: 16, LR: 0.0500, PPL: 596943.29, |Param|: 5445.61, |GParam|: 45.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 3950/11961, Batch size: 16, LR: 0.0500, PPL: 582319.40, |Param|: 5446.02, |GParam|: 57.73, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4000/11961, Batch size: 16, LR: 0.0500, PPL: 567044.69, |Param|: 5446.41, |GParam|: 60.13, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4050/11961, Batch size: 16, LR: 0.0500, PPL: 552885.40, |Param|: 5446.83, |GParam|: 171.65, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4100/11961, Batch size: 16, LR: 0.0500, PPL: 539461.49, |Param|: 5447.24, |GParam|: 45.93, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4150/11961, Batch size: 16, LR: 0.0500, PPL: 525844.71, |Param|: 5447.64, |GParam|: 85.54, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4200/11961, Batch size: 16, LR: 0.0500, PPL: 513522.85, |Param|: 5448.06, |GParam|: 91.90, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4250/11961, Batch size: 16, LR: 0.0500, PPL: 501792.24, |Param|: 5448.48, |GParam|: 69.16, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4300/11961, Batch size: 16, LR: 0.0500, PPL: 488145.32, |Param|: 5448.84, |GParam|: 42.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4350/11961, Batch size: 16, LR: 0.0500, PPL: 476425.57, |Param|: 5449.23, |GParam|: 45.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 
4400/11961, Batch size: 16, LR: 0.0500, PPL: 466043.41, |Param|: 5449.62, |GParam|: 84.91, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4450/11961, Batch size: 16, LR: 0.0500, PPL: 456183.60, |Param|: 5450.02, |GParam|: 100.98, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4500/11961, Batch size: 16, LR: 0.0500, PPL: 447537.73, |Param|: 5450.41, |GParam|: 38.18, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4550/11961, Batch size: 16, LR: 0.0500, PPL: 438453.16, |Param|: 5450.84, |GParam|: 50.39, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4600/11961, Batch size: 16, LR: 0.0500, PPL: 429388.95, |Param|: 5451.27, |GParam|: 74.84, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4650/11961, Batch size: 16, LR: 0.0500, PPL: 421339.35, |Param|: 5451.64, |GParam|: 53.30, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4700/11961, Batch size: 16, LR: 0.0500, PPL: 413021.89, |Param|: 5452.00, |GParam|: 49.44, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4750/11961, Batch size: 16, LR: 0.0500, PPL: 404922.01, |Param|: 5452.39, |GParam|: 71.29, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4800/11961, Batch size: 16, LR: 0.0500, PPL: 398137.91, |Param|: 5452.79, |GParam|: 49.73, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4850/11961, Batch size: 16, LR: 0.0500, PPL: 390268.13, |Param|: 5453.16, |GParam|: 44.00, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4900/11961, Batch size: 16, LR: 0.0500, PPL: 383058.03, |Param|: 5453.55, |GParam|: 51.24, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 4950/11961, Batch size: 16, LR: 0.0500, PPL: 376070.13, |Param|: 5453.95, |GParam|: 47.82, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5000/11961, Batch size: 16, LR: 0.0500, PPL: 369556.87, |Param|: 5454.34, |GParam|: 53.94, 
Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5050/11961, Batch size: 16, LR: 0.0500, PPL: 362483.56, |Param|: 5454.73, |GParam|: 50.09, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5100/11961, Batch size: 16, LR: 0.0500, PPL: 355928.66, |Param|: 5455.12, |GParam|: 67.90, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5150/11961, Batch size: 16, LR: 0.0500, PPL: 349557.64, |Param|: 5455.53, |GParam|: 35.88, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5200/11961, Batch size: 16, LR: 0.0500, PPL: 343556.76, |Param|: 5455.90, |GParam|: 37.02, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5250/11961, Batch size: 16, LR: 0.0500, PPL: 337514.97, |Param|: 5456.29, |GParam|: 46.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5300/11961, Batch size: 16, LR: 0.0500, PPL: 331367.15, |Param|: 5456.66, |GParam|: 68.85, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5350/11961, Batch size: 16, LR: 0.0500, PPL: 325797.48, |Param|: 5457.01, |GParam|: 47.02, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5400/11961, Batch size: 16, LR: 0.0500, PPL: 320977.70, |Param|: 5457.33, |GParam|: 54.94, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5450/11961, Batch size: 16, LR: 0.0500, PPL: 315782.05, |Param|: 5457.69, |GParam|: 40.80, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5500/11961, Batch size: 16, LR: 0.0500, PPL: 310188.85, |Param|: 5458.06, |GParam|: 65.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5550/11961, Batch size: 16, LR: 0.0500, PPL: 305065.72, |Param|: 5458.41, |GParam|: 38.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5600/11961, Batch size: 16, LR: 0.0500, PPL: 300394.39, |Param|: 5458.82, |GParam|: 68.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5650/11961, Batch size: 
16, LR: 0.0500, PPL: 295416.98, |Param|: 5459.22, |GParam|: 126.60, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5700/11961, Batch size: 16, LR: 0.0500, PPL: 291013.08, |Param|: 5459.56, |GParam|: 61.19, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5750/11961, Batch size: 16, LR: 0.0500, PPL: 286641.43, |Param|: 5459.88, |GParam|: 81.33, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5800/11961, Batch size: 16, LR: 0.0500, PPL: 282064.14, |Param|: 5460.25, |GParam|: 51.14, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5850/11961, Batch size: 16, LR: 0.0500, PPL: 277838.18, |Param|: 5460.63, |GParam|: 37.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 5900/11961, Batch size: 16, LR: 0.0500, PPL: 273984.47, |Param|: 5460.97, |GParam|: 48.73, Training: 134/65/69 total/source/target tokens/sec Epoch: 5, Batch: 5950/11961, Batch size: 16, LR: 0.0500, PPL: 269997.19, |Param|: 5461.30, |GParam|: 64.50, Training: 134/65/69 total/source/target tokens/sec Epoch: 5, Batch: 6000/11961, Batch size: 16, LR: 0.0500, PPL: 266334.38, |Param|: 5461.65, |GParam|: 49.92, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6050/11961, Batch size: 16, LR: 0.0500, PPL: 262569.26, |Param|: 5461.98, |GParam|: 78.41, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6100/11961, Batch size: 16, LR: 0.0500, PPL: 258748.31, |Param|: 5462.34, |GParam|: 41.99, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6150/11961, Batch size: 16, LR: 0.0500, PPL: 255265.25, |Param|: 5462.67, |GParam|: 48.53, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6200/11961, Batch size: 16, LR: 0.0500, PPL: 251925.79, |Param|: 5463.07, |GParam|: 19.38, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6250/11961, Batch size: 10, LR: 0.0500, PPL: 248459.48, |Param|: 5463.44, |GParam|: 89.86, Training: 134/64/69 
total/source/target tokens/sec Epoch: 5, Batch: 6300/11961, Batch size: 16, LR: 0.0500, PPL: 244985.41, |Param|: 5463.83, |GParam|: 35.35, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6350/11961, Batch size: 16, LR: 0.0500, PPL: 241883.31, |Param|: 5464.17, |GParam|: 75.27, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6400/11961, Batch size: 16, LR: 0.0500, PPL: 238958.17, |Param|: 5464.54, |GParam|: 58.67, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6450/11961, Batch size: 16, LR: 0.0500, PPL: 235707.02, |Param|: 5464.86, |GParam|: 36.87, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6500/11961, Batch size: 16, LR: 0.0500, PPL: 232704.42, |Param|: 5465.24, |GParam|: 52.13, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6550/11961, Batch size: 16, LR: 0.0500, PPL: 229554.27, |Param|: 5465.59, |GParam|: 47.01, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6600/11961, Batch size: 16, LR: 0.0500, PPL: 226612.58, |Param|: 5465.95, |GParam|: 94.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6650/11961, Batch size: 16, LR: 0.0500, PPL: 223921.26, |Param|: 5466.26, |GParam|: 26.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6700/11961, Batch size: 16, LR: 0.0500, PPL: 221353.38, |Param|: 5466.62, |GParam|: 74.65, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6750/11961, Batch size: 16, LR: 0.0500, PPL: 218732.46, |Param|: 5466.98, |GParam|: 18.39, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6800/11961, Batch size: 16, LR: 0.0500, PPL: 215769.51, |Param|: 5467.30, |GParam|: 74.23, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6850/11961, Batch size: 16, LR: 0.0500, PPL: 213194.34, |Param|: 5467.64, |GParam|: 42.84, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6900/11961, Batch size: 16, LR: 0.0500, 
PPL: 210965.36, |Param|: 5467.96, |GParam|: 32.48, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 6950/11961, Batch size: 16, LR: 0.0500, PPL: 208651.45, |Param|: 5468.27, |GParam|: 51.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7000/11961, Batch size: 16, LR: 0.0500, PPL: 206244.01, |Param|: 5468.60, |GParam|: 59.81, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7050/11961, Batch size: 16, LR: 0.0500, PPL: 203654.76, |Param|: 5468.90, |GParam|: 50.36, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7100/11961, Batch size: 16, LR: 0.0500, PPL: 201439.58, |Param|: 5469.19, |GParam|: 173.66, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7150/11961, Batch size: 16, LR: 0.0500, PPL: 199004.46, |Param|: 5469.52, |GParam|: 59.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7200/11961, Batch size: 16, LR: 0.0500, PPL: 196622.60, |Param|: 5469.85, |GParam|: 75.71, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7250/11961, Batch size: 16, LR: 0.0500, PPL: 194462.94, |Param|: 5470.19, |GParam|: 76.41, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7300/11961, Batch size: 16, LR: 0.0500, PPL: 192266.69, |Param|: 5470.51, |GParam|: 46.26, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7350/11961, Batch size: 16, LR: 0.0500, PPL: 190205.08, |Param|: 5470.83, |GParam|: 70.57, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7400/11961, Batch size: 16, LR: 0.0500, PPL: 188102.19, |Param|: 5471.18, |GParam|: 38.85, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7450/11961, Batch size: 16, LR: 0.0500, PPL: 185792.57, |Param|: 5471.53, |GParam|: 35.32, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7500/11961, Batch size: 16, LR: 0.0500, PPL: 183824.15, |Param|: 5471.85, |GParam|: 52.17, Training: 134/64/69 total/source/target 
tokens/sec Epoch: 5, Batch: 7550/11961, Batch size: 16, LR: 0.0500, PPL: 181987.38, |Param|: 5472.16, |GParam|: 60.37, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7600/11961, Batch size: 16, LR: 0.0500, PPL: 180091.47, |Param|: 5472.47, |GParam|: 51.46, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7650/11961, Batch size: 16, LR: 0.0500, PPL: 178200.14, |Param|: 5472.82, |GParam|: 61.18, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7700/11961, Batch size: 16, LR: 0.0500, PPL: 176212.78, |Param|: 5473.11, |GParam|: 66.97, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7750/11961, Batch size: 16, LR: 0.0500, PPL: 174447.87, |Param|: 5473.42, |GParam|: 61.25, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7800/11961, Batch size: 16, LR: 0.0500, PPL: 172723.70, |Param|: 5473.73, |GParam|: 72.88, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7850/11961, Batch size: 16, LR: 0.0500, PPL: 170938.58, |Param|: 5474.04, |GParam|: 29.70, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7900/11961, Batch size: 16, LR: 0.0500, PPL: 169067.70, |Param|: 5474.39, |GParam|: 17.93, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 7950/11961, Batch size: 16, LR: 0.0500, PPL: 167386.41, |Param|: 5474.69, |GParam|: 46.88, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8000/11961, Batch size: 16, LR: 0.0500, PPL: 165653.34, |Param|: 5474.97, |GParam|: 44.31, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8050/11961, Batch size: 16, LR: 0.0500, PPL: 164046.17, |Param|: 5475.27, |GParam|: 49.94, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8100/11961, Batch size: 16, LR: 0.0500, PPL: 162478.08, |Param|: 5475.55, |GParam|: 52.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8150/11961, Batch size: 16, LR: 0.0500, PPL: 160850.23, 
|Param|: 5475.88, |GParam|: 26.79, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8200/11961, Batch size: 16, LR: 0.0500, PPL: 159338.22, |Param|: 5476.23, |GParam|: 45.89, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8250/11961, Batch size: 16, LR: 0.0500, PPL: 157888.73, |Param|: 5476.53, |GParam|: 65.49, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8300/11961, Batch size: 16, LR: 0.0500, PPL: 156457.65, |Param|: 5476.83, |GParam|: 42.28, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8350/11961, Batch size: 16, LR: 0.0500, PPL: 154993.02, |Param|: 5477.13, |GParam|: 20.20, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8400/11961, Batch size: 16, LR: 0.0500, PPL: 153529.00, |Param|: 5477.45, |GParam|: 47.49, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8450/11961, Batch size: 16, LR: 0.0500, PPL: 152061.52, |Param|: 5477.76, |GParam|: 37.96, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8500/11961, Batch size: 16, LR: 0.0500, PPL: 150653.59, |Param|: 5478.04, |GParam|: 60.03, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8550/11961, Batch size: 16, LR: 0.0500, PPL: 149348.01, |Param|: 5478.33, |GParam|: 57.36, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8600/11961, Batch size: 16, LR: 0.0500, PPL: 148081.34, |Param|: 5478.61, |GParam|: 42.24, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8650/11961, Batch size: 16, LR: 0.0500, PPL: 146759.67, |Param|: 5478.89, |GParam|: 78.99, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8700/11961, Batch size: 16, LR: 0.0500, PPL: 145508.58, |Param|: 5479.16, |GParam|: 75.83, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8750/11961, Batch size: 16, LR: 0.0500, PPL: 144291.58, |Param|: 5479.40, |GParam|: 49.81, Training: 134/64/69 total/source/target tokens/sec 
Epoch: 5, Batch: 8800/11961, Batch size: 16, LR: 0.0500, PPL: 142989.41, |Param|: 5479.68, |GParam|: 35.24, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8850/11961, Batch size: 16, LR: 0.0500, PPL: 141910.10, |Param|: 5479.94, |GParam|: 40.76, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8900/11961, Batch size: 16, LR: 0.0500, PPL: 140645.44, |Param|: 5480.22, |GParam|: 33.91, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 8950/11961, Batch size: 16, LR: 0.0500, PPL: 139501.50, |Param|: 5480.50, |GParam|: 48.52, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9000/11961, Batch size: 16, LR: 0.0500, PPL: 138342.23, |Param|: 5480.77, |GParam|: 56.54, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9050/11961, Batch size: 16, LR: 0.0500, PPL: 137158.60, |Param|: 5481.04, |GParam|: 170.59, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9100/11961, Batch size: 16, LR: 0.0500, PPL: 136036.90, |Param|: 5481.30, |GParam|: 72.45, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9150/11961, Batch size: 16, LR: 0.0500, PPL: 134853.30, |Param|: 5481.61, |GParam|: 49.41, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9200/11961, Batch size: 16, LR: 0.0500, PPL: 133785.10, |Param|: 5481.87, |GParam|: 55.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9250/11961, Batch size: 16, LR: 0.0500, PPL: 132681.40, |Param|: 5482.15, |GParam|: 76.03, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9300/11961, Batch size: 16, LR: 0.0500, PPL: 131513.46, |Param|: 5482.43, |GParam|: 25.53, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9350/11961, Batch size: 16, LR: 0.0500, PPL: 130435.45, |Param|: 5482.66, |GParam|: 69.07, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9400/11961, Batch size: 16, LR: 0.0500, PPL: 129480.96, |Param|: 5482.91, 
|GParam|: 78.18, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9450/11961, Batch size: 16, LR: 0.0500, PPL: 128518.83, |Param|: 5483.18, |GParam|: 41.85, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9500/11961, Batch size: 16, LR: 0.0500, PPL: 127511.97, |Param|: 5483.45, |GParam|: 53.14, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9550/11961, Batch size: 16, LR: 0.0500, PPL: 126584.38, |Param|: 5483.69, |GParam|: 75.57, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9600/11961, Batch size: 16, LR: 0.0500, PPL: 125616.02, |Param|: 5483.94, |GParam|: 67.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9650/11961, Batch size: 16, LR: 0.0500, PPL: 124681.35, |Param|: 5484.18, |GParam|: 57.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9700/11961, Batch size: 16, LR: 0.0500, PPL: 123717.06, |Param|: 5484.42, |GParam|: 28.56, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9750/11961, Batch size: 16, LR: 0.0500, PPL: 122834.68, |Param|: 5484.66, |GParam|: 53.20, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9800/11961, Batch size: 16, LR: 0.0500, PPL: 121944.59, |Param|: 5484.89, |GParam|: 74.18, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9850/11961, Batch size: 16, LR: 0.0500, PPL: 121043.14, |Param|: 5485.12, |GParam|: 36.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9900/11961, Batch size: 16, LR: 0.0500, PPL: 120183.10, |Param|: 5485.35, |GParam|: 46.37, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 9950/11961, Batch size: 16, LR: 0.0500, PPL: 119299.77, |Param|: 5485.62, |GParam|: 23.03, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10000/11961, Batch size: 16, LR: 0.0500, PPL: 118418.96, |Param|: 5485.84, |GParam|: 61.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 
10050/11961, Batch size: 16, LR: 0.0500, PPL: 117434.12, |Param|: 5486.12, |GParam|: 55.16, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10100/11961, Batch size: 16, LR: 0.0500, PPL: 116610.09, |Param|: 5486.35, |GParam|: 46.66, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10150/11961, Batch size: 16, LR: 0.0500, PPL: 115840.38, |Param|: 5486.58, |GParam|: 45.22, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10200/11961, Batch size: 16, LR: 0.0500, PPL: 115029.78, |Param|: 5486.80, |GParam|: 65.62, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10250/11961, Batch size: 16, LR: 0.0500, PPL: 114179.23, |Param|: 5487.04, |GParam|: 74.33, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10300/11961, Batch size: 16, LR: 0.0500, PPL: 113309.71, |Param|: 5487.28, |GParam|: 64.53, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10350/11961, Batch size: 16, LR: 0.0500, PPL: 112570.67, |Param|: 5487.52, |GParam|: 23.91, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10400/11961, Batch size: 16, LR: 0.0500, PPL: 111825.61, |Param|: 5487.72, |GParam|: 19.01, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10450/11961, Batch size: 16, LR: 0.0500, PPL: 111019.48, |Param|: 5487.93, |GParam|: 45.98, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10500/11961, Batch size: 16, LR: 0.0500, PPL: 110239.30, |Param|: 5488.16, |GParam|: 75.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10550/11961, Batch size: 16, LR: 0.0500, PPL: 109435.21, |Param|: 5488.36, |GParam|: 37.92, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10600/11961, Batch size: 16, LR: 0.0500, PPL: 108688.86, |Param|: 5488.56, |GParam|: 75.45, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10650/11961, Batch size: 16, LR: 0.0500, PPL: 107936.14, |Param|: 5488.79, 
|GParam|: 30.26, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10700/11961, Batch size: 16, LR: 0.0500, PPL: 107203.03, |Param|: 5488.99, |GParam|: 79.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10750/11961, Batch size: 16, LR: 0.0500, PPL: 106467.61, |Param|: 5489.23, |GParam|: 65.71, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10800/11961, Batch size: 16, LR: 0.0500, PPL: 105819.85, |Param|: 5489.42, |GParam|: 45.50, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10850/11961, Batch size: 16, LR: 0.0500, PPL: 105153.81, |Param|: 5489.64, |GParam|: 23.89, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10900/11961, Batch size: 16, LR: 0.0500, PPL: 104458.03, |Param|: 5489.87, |GParam|: 80.48, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 10950/11961, Batch size: 16, LR: 0.0500, PPL: 103801.43, |Param|: 5490.09, |GParam|: 62.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11000/11961, Batch size: 16, LR: 0.0500, PPL: 103129.74, |Param|: 5490.27, |GParam|: 70.13, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11050/11961, Batch size: 16, LR: 0.0500, PPL: 102493.47, |Param|: 5490.51, |GParam|: 50.86, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11100/11961, Batch size: 16, LR: 0.0500, PPL: 101833.36, |Param|: 5490.72, |GParam|: 48.55, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11150/11961, Batch size: 16, LR: 0.0500, PPL: 101108.21, |Param|: 5490.94, |GParam|: 54.16, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11200/11961, Batch size: 16, LR: 0.0500, PPL: 100446.07, |Param|: 5491.18, |GParam|: 80.36, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11250/11961, Batch size: 16, LR: 0.0500, PPL: 99838.14, |Param|: 5491.40, |GParam|: 66.85, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, 
Batch: 11300/11961, Batch size: 16, LR: 0.0500, PPL: 99251.19, |Param|: 5491.58, |GParam|: 90.31, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11350/11961, Batch size: 16, LR: 0.0500, PPL: 98690.95, |Param|: 5491.82, |GParam|: 65.21, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11400/11961, Batch size: 16, LR: 0.0500, PPL: 98089.97, |Param|: 5492.02, |GParam|: 77.42, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11450/11961, Batch size: 16, LR: 0.0500, PPL: 97517.25, |Param|: 5492.23, |GParam|: 71.61, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11500/11961, Batch size: 16, LR: 0.0500, PPL: 96917.71, |Param|: 5492.43, |GParam|: 88.04, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11550/11961, Batch size: 16, LR: 0.0500, PPL: 96284.77, |Param|: 5492.63, |GParam|: 54.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11600/11961, Batch size: 16, LR: 0.0500, PPL: 95716.78, |Param|: 5492.82, |GParam|: 57.62, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11650/11961, Batch size: 16, LR: 0.0500, PPL: 95098.65, |Param|: 5493.00, |GParam|: 56.70, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11700/11961, Batch size: 16, LR: 0.0500, PPL: 94578.59, |Param|: 5493.20, |GParam|: 47.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11750/11961, Batch size: 16, LR: 0.0500, PPL: 94014.13, |Param|: 5493.42, |GParam|: 58.79, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11800/11961, Batch size: 16, LR: 0.0500, PPL: 93431.58, |Param|: 5493.62, |GParam|: 48.69, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11850/11961, Batch size: 16, LR: 0.0500, PPL: 92891.99, |Param|: 5493.82, |GParam|: 67.50, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11900/11961, Batch size: 16, LR: 0.0500, PPL: 92360.13, |Param|: 5494.00, |GParam|: 
44.28, Training: 134/64/69 total/source/target tokens/sec Epoch: 5, Batch: 11950/11961, Batch size: 16, LR: 0.0500, PPL: 91837.79, |Param|: 5494.19, |GParam|: 66.32, Training: 134/64/69 total/source/target tokens/sec Train 91718.663677416 Valid 6079.4950706264 saving checkpoint to demo-model_epoch5.00_6079.50.t7 Epoch: 6, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 16191.38, |Param|: 5494.40, |GParam|: 69.73, Training: 131/62/69 total/source/target tokens/sec Epoch: 6, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 16465.85, |Param|: 5494.59, |GParam|: 203.15, Training: 132/62/69 total/source/target tokens/sec Epoch: 6, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 16661.21, |Param|: 5494.76, |GParam|: 73.00, Training: 132/63/68 total/source/target tokens/sec Epoch: 6, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 16747.30, |Param|: 5494.92, |GParam|: 43.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 16781.55, |Param|: 5495.11, |GParam|: 47.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 17028.67, |Param|: 5495.31, |GParam|: 37.92, Training: 133/65/68 total/source/target tokens/sec Epoch: 6, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 17010.37, |Param|: 5495.45, |GParam|: 78.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 16790.48, |Param|: 5495.58, |GParam|: 45.17, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 17011.51, |Param|: 5495.73, |GParam|: 37.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 16983.71, |Param|: 5495.88, |GParam|: 41.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 16692.28, |Param|: 5496.00, |GParam|: 51.27, 
Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 16672.06, |Param|: 5496.14, |GParam|: 84.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 650/11961, Batch size: 16, LR: 0.0500, PPL: 16659.50, |Param|: 5496.29, |GParam|: 85.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 16785.19, |Param|: 5496.41, |GParam|: 100.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 16786.24, |Param|: 5496.53, |GParam|: 67.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 16714.52, |Param|: 5496.65, |GParam|: 66.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 16698.74, |Param|: 5496.76, |GParam|: 70.77, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 16625.04, |Param|: 5496.87, |GParam|: 42.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 16573.68, |Param|: 5496.98, |GParam|: 70.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 16535.20, |Param|: 5497.10, |GParam|: 66.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 16611.37, |Param|: 5497.23, |GParam|: 51.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 16627.66, |Param|: 5497.34, |GParam|: 69.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 16582.54, |Param|: 5497.47, |GParam|: 42.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1200/11961, Batch size: 16, LR: 0.0500, 
PPL: 16595.05, |Param|: 5497.59, |GParam|: 37.17, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 16569.63, |Param|: 5497.68, |GParam|: 40.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 16506.71, |Param|: 5497.77, |GParam|: 64.32, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 16491.00, |Param|: 5497.86, |GParam|: 52.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 16561.66, |Param|: 5497.95, |GParam|: 70.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 16582.03, |Param|: 5498.06, |GParam|: 62.63, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 16629.31, |Param|: 5498.14, |GParam|: 105.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 16651.11, |Param|: 5498.23, |GParam|: 37.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 16612.18, |Param|: 5498.31, |GParam|: 54.10, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 16565.03, |Param|: 5498.39, |GParam|: 44.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 16522.32, |Param|: 5498.48, |GParam|: 112.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 16522.62, |Param|: 5498.57, |GParam|: 59.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 16497.41, |Param|: 5498.65, |GParam|: 80.59, Training: 133/64/68 total/source/target tokens/sec 
Epoch: 6, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 16441.11, |Param|: 5498.73, |GParam|: 75.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 16378.22, |Param|: 5498.81, |GParam|: 37.77, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 1950/11961, Batch size: 16, LR: 0.0500, PPL: 16391.04, |Param|: 5498.89, |GParam|: 45.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 16399.75, |Param|: 5498.97, |GParam|: 65.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 16425.55, |Param|: 5499.06, |GParam|: 35.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 16425.25, |Param|: 5499.14, |GParam|: 57.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 16378.83, |Param|: 5499.23, |GParam|: 49.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL: 16338.93, |Param|: 5499.32, |GParam|: 40.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2250/11961, Batch size: 16, LR: 0.0500, PPL: 16304.26, |Param|: 5499.39, |GParam|: 88.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2300/11961, Batch size: 16, LR: 0.0500, PPL: 16315.54, |Param|: 5499.47, |GParam|: 52.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2350/11961, Batch size: 16, LR: 0.0500, PPL: 16306.80, |Param|: 5499.54, |GParam|: 65.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2400/11961, Batch size: 16, LR: 0.0500, PPL: 16299.68, |Param|: 5499.61, |GParam|: 53.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2450/11961, Batch size: 16, LR: 0.0500, PPL: 16232.02, |Param|: 5499.69, |GParam|: 
47.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2500/11961, Batch size: 16, LR: 0.0500, PPL: 16253.14, |Param|: 5499.77, |GParam|: 72.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2550/11961, Batch size: 16, LR: 0.0500, PPL: 16256.23, |Param|: 5499.83, |GParam|: 65.15, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2600/11961, Batch size: 16, LR: 0.0500, PPL: 16207.19, |Param|: 5499.88, |GParam|: 81.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2650/11961, Batch size: 16, LR: 0.0500, PPL: 16181.89, |Param|: 5499.95, |GParam|: 74.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2700/11961, Batch size: 16, LR: 0.0500, PPL: 16108.27, |Param|: 5500.01, |GParam|: 54.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2750/11961, Batch size: 16, LR: 0.0500, PPL: 16085.30, |Param|: 5500.08, |GParam|: 38.64, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2800/11961, Batch size: 16, LR: 0.0500, PPL: 16068.76, |Param|: 5500.14, |GParam|: 62.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2850/11961, Batch size: 16, LR: 0.0500, PPL: 16086.77, |Param|: 5500.20, |GParam|: 35.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2900/11961, Batch size: 16, LR: 0.0500, PPL: 16099.87, |Param|: 5500.28, |GParam|: 59.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 2950/11961, Batch size: 16, LR: 0.0500, PPL: 16068.93, |Param|: 5500.34, |GParam|: 56.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3000/11961, Batch size: 16, LR: 0.0500, PPL: 16032.52, |Param|: 5500.40, |GParam|: 54.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3050/11961, Batch size: 16, LR: 0.0500, PPL: 16023.02, |Param|: 5500.47, |GParam|: 47.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3100/11961, Batch size: 16, 
LR: 0.0500, PPL: 15986.59, |Param|: 5500.53, |GParam|: 26.89, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3150/11961, Batch size: 16, LR: 0.0500, PPL: 15954.41, |Param|: 5500.58, |GParam|: 81.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3200/11961, Batch size: 16, LR: 0.0500, PPL: 15933.63, |Param|: 5500.63, |GParam|: 19.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3250/11961, Batch size: 16, LR: 0.0500, PPL: 15902.98, |Param|: 5500.68, |GParam|: 68.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3300/11961, Batch size: 16, LR: 0.0500, PPL: 15904.53, |Param|: 5500.72, |GParam|: 62.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3350/11961, Batch size: 16, LR: 0.0500, PPL: 15912.72, |Param|: 5500.80, |GParam|: 64.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3400/11961, Batch size: 16, LR: 0.0500, PPL: 15906.43, |Param|: 5500.86, |GParam|: 39.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3450/11961, Batch size: 16, LR: 0.0500, PPL: 15899.10, |Param|: 5500.92, |GParam|: 44.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3500/11961, Batch size: 16, LR: 0.0500, PPL: 15897.38, |Param|: 5500.98, |GParam|: 47.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3550/11961, Batch size: 16, LR: 0.0500, PPL: 15865.69, |Param|: 5501.03, |GParam|: 30.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3600/11961, Batch size: 16, LR: 0.0500, PPL: 15859.45, |Param|: 5501.09, |GParam|: 52.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3650/11961, Batch size: 16, LR: 0.0500, PPL: 15820.91, |Param|: 5501.13, |GParam|: 43.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3700/11961, Batch size: 16, LR: 0.0500, PPL: 15830.53, |Param|: 5501.19, |GParam|: 67.03, Training: 133/64/68 total/source/target 
tokens/sec Epoch: 6, Batch: 3750/11961, Batch size: 16, LR: 0.0500, PPL: 15806.72, |Param|: 5501.26, |GParam|: 71.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3800/11961, Batch size: 16, LR: 0.0500, PPL: 15774.62, |Param|: 5501.32, |GParam|: 40.68, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3850/11961, Batch size: 16, LR: 0.0500, PPL: 15719.82, |Param|: 5501.37, |GParam|: 49.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3900/11961, Batch size: 16, LR: 0.0500, PPL: 15685.10, |Param|: 5501.42, |GParam|: 88.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 3950/11961, Batch size: 16, LR: 0.0500, PPL: 15673.26, |Param|: 5501.48, |GParam|: 66.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4000/11961, Batch size: 16, LR: 0.0500, PPL: 15666.42, |Param|: 5501.52, |GParam|: 62.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4050/11961, Batch size: 16, LR: 0.0500, PPL: 15653.23, |Param|: 5501.57, |GParam|: 74.64, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4100/11961, Batch size: 16, LR: 0.0500, PPL: 15616.88, |Param|: 5501.62, |GParam|: 50.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4150/11961, Batch size: 16, LR: 0.0500, PPL: 15594.66, |Param|: 5501.66, |GParam|: 57.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4200/11961, Batch size: 16, LR: 0.0500, PPL: 15553.47, |Param|: 5501.70, |GParam|: 47.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4250/11961, Batch size: 16, LR: 0.0500, PPL: 15535.27, |Param|: 5501.75, |GParam|: 41.15, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4300/11961, Batch size: 16, LR: 0.0500, PPL: 15538.52, |Param|: 5501.78, |GParam|: 57.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4350/11961, Batch size: 16, LR: 0.0500, PPL: 15527.71, |Param|: 5501.83, 
|GParam|: 58.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4400/11961, Batch size: 16, LR: 0.0500, PPL: 15511.42, |Param|: 5501.88, |GParam|: 66.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4450/11961, Batch size: 16, LR: 0.0500, PPL: 15497.60, |Param|: 5501.93, |GParam|: 35.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4500/11961, Batch size: 16, LR: 0.0500, PPL: 15474.85, |Param|: 5501.97, |GParam|: 41.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4550/11961, Batch size: 16, LR: 0.0500, PPL: 15438.78, |Param|: 5502.02, |GParam|: 75.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4600/11961, Batch size: 16, LR: 0.0500, PPL: 15417.84, |Param|: 5502.06, |GParam|: 82.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4650/11961, Batch size: 16, LR: 0.0500, PPL: 15408.44, |Param|: 5502.10, |GParam|: 59.00, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4700/11961, Batch size: 16, LR: 0.0500, PPL: 15380.27, |Param|: 5502.14, |GParam|: 70.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4750/11961, Batch size: 16, LR: 0.0500, PPL: 15356.35, |Param|: 5502.18, |GParam|: 27.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4800/11961, Batch size: 16, LR: 0.0500, PPL: 15372.65, |Param|: 5502.22, |GParam|: 126.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4850/11961, Batch size: 16, LR: 0.0500, PPL: 15370.32, |Param|: 5502.26, |GParam|: 59.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4900/11961, Batch size: 16, LR: 0.0500, PPL: 15338.02, |Param|: 5502.31, |GParam|: 34.95, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 4950/11961, Batch size: 16, LR: 0.0500, PPL: 15344.74, |Param|: 5502.35, |GParam|: 52.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5000/11961, Batch 
size: 16, LR: 0.0500, PPL: 15320.84, |Param|: 5502.40, |GParam|: 42.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5050/11961, Batch size: 16, LR: 0.0500, PPL: 15297.71, |Param|: 5502.43, |GParam|: 68.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5100/11961, Batch size: 16, LR: 0.0500, PPL: 15280.16, |Param|: 5502.47, |GParam|: 86.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5150/11961, Batch size: 16, LR: 0.0500, PPL: 15260.66, |Param|: 5502.50, |GParam|: 43.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5200/11961, Batch size: 16, LR: 0.0500, PPL: 15256.76, |Param|: 5502.55, |GParam|: 53.99, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5250/11961, Batch size: 16, LR: 0.0500, PPL: 15252.41, |Param|: 5502.58, |GParam|: 55.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5300/11961, Batch size: 16, LR: 0.0500, PPL: 15253.01, |Param|: 5502.62, |GParam|: 63.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5350/11961, Batch size: 16, LR: 0.0500, PPL: 15229.73, |Param|: 5502.65, |GParam|: 37.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5400/11961, Batch size: 16, LR: 0.0500, PPL: 15181.06, |Param|: 5502.68, |GParam|: 24.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5450/11961, Batch size: 16, LR: 0.0500, PPL: 15162.42, |Param|: 5502.72, |GParam|: 87.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5500/11961, Batch size: 16, LR: 0.0500, PPL: 15129.80, |Param|: 5502.75, |GParam|: 65.00, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5550/11961, Batch size: 16, LR: 0.0500, PPL: 15094.43, |Param|: 5502.78, |GParam|: 60.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5600/11961, Batch size: 16, LR: 0.0500, PPL: 15062.38, |Param|: 5502.80, |GParam|: 69.48, Training: 133/64/68 
total/source/target tokens/sec Epoch: 6, Batch: 5650/11961, Batch size: 16, LR: 0.0500, PPL: 15043.58, |Param|: 5502.83, |GParam|: 31.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5700/11961, Batch size: 16, LR: 0.0500, PPL: 15035.71, |Param|: 5502.87, |GParam|: 76.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5750/11961, Batch size: 16, LR: 0.0500, PPL: 15034.62, |Param|: 5502.89, |GParam|: 45.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5800/11961, Batch size: 16, LR: 0.0500, PPL: 15010.95, |Param|: 5502.93, |GParam|: 83.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5850/11961, Batch size: 16, LR: 0.0500, PPL: 14986.24, |Param|: 5502.97, |GParam|: 76.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5900/11961, Batch size: 16, LR: 0.0500, PPL: 14978.63, |Param|: 5503.00, |GParam|: 58.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 5950/11961, Batch size: 16, LR: 0.0500, PPL: 14944.57, |Param|: 5503.03, |GParam|: 18.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6000/11961, Batch size: 16, LR: 0.0500, PPL: 14928.36, |Param|: 5503.06, |GParam|: 45.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6050/11961, Batch size: 16, LR: 0.0500, PPL: 14906.13, |Param|: 5503.09, |GParam|: 48.00, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6100/11961, Batch size: 16, LR: 0.0500, PPL: 14888.29, |Param|: 5503.11, |GParam|: 29.10, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6150/11961, Batch size: 16, LR: 0.0500, PPL: 14864.88, |Param|: 5503.14, |GParam|: 48.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6200/11961, Batch size: 16, LR: 0.0500, PPL: 14857.58, |Param|: 5503.17, |GParam|: 44.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6250/11961, Batch size: 16, LR: 0.0500, PPL: 14837.16, 
|Param|: 5503.20, |GParam|: 69.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6300/11961, Batch size: 7, LR: 0.0500, PPL: 14811.46, |Param|: 5503.22, |GParam|: 111.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6350/11961, Batch size: 16, LR: 0.0500, PPL: 14795.37, |Param|: 5503.25, |GParam|: 46.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6400/11961, Batch size: 16, LR: 0.0500, PPL: 14770.47, |Param|: 5503.28, |GParam|: 64.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6450/11961, Batch size: 16, LR: 0.0500, PPL: 14748.63, |Param|: 5503.30, |GParam|: 25.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6500/11961, Batch size: 16, LR: 0.0500, PPL: 14744.94, |Param|: 5503.33, |GParam|: 83.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6550/11961, Batch size: 16, LR: 0.0500, PPL: 14732.98, |Param|: 5503.35, |GParam|: 40.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6600/11961, Batch size: 16, LR: 0.0500, PPL: 14709.82, |Param|: 5503.37, |GParam|: 25.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6650/11961, Batch size: 16, LR: 0.0500, PPL: 14691.72, |Param|: 5503.40, |GParam|: 63.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6700/11961, Batch size: 16, LR: 0.0500, PPL: 14664.75, |Param|: 5503.43, |GParam|: 47.71, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6750/11961, Batch size: 16, LR: 0.0500, PPL: 14640.88, |Param|: 5503.45, |GParam|: 62.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6800/11961, Batch size: 16, LR: 0.0500, PPL: 14614.84, |Param|: 5503.48, |GParam|: 76.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6850/11961, Batch size: 16, LR: 0.0500, PPL: 14604.74, |Param|: 5503.51, |GParam|: 62.43, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 
6900/11961, Batch size: 16, LR: 0.0500, PPL: 14588.59, |Param|: 5503.53, |GParam|: 55.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 6950/11961, Batch size: 16, LR: 0.0500, PPL: 14561.81, |Param|: 5503.55, |GParam|: 60.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7000/11961, Batch size: 16, LR: 0.0500, PPL: 14548.36, |Param|: 5503.56, |GParam|: 63.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7050/11961, Batch size: 16, LR: 0.0500, PPL: 14525.94, |Param|: 5503.58, |GParam|: 42.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7100/11961, Batch size: 16, LR: 0.0500, PPL: 14514.80, |Param|: 5503.61, |GParam|: 43.11, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7150/11961, Batch size: 16, LR: 0.0500, PPL: 14491.36, |Param|: 5503.63, |GParam|: 57.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7200/11961, Batch size: 16, LR: 0.0500, PPL: 14468.69, |Param|: 5503.66, |GParam|: 63.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7250/11961, Batch size: 16, LR: 0.0500, PPL: 14443.52, |Param|: 5503.68, |GParam|: 21.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7300/11961, Batch size: 16, LR: 0.0500, PPL: 14430.79, |Param|: 5503.70, |GParam|: 24.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7350/11961, Batch size: 16, LR: 0.0500, PPL: 14415.63, |Param|: 5503.72, |GParam|: 55.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7400/11961, Batch size: 16, LR: 0.0500, PPL: 14401.53, |Param|: 5503.74, |GParam|: 45.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7450/11961, Batch size: 16, LR: 0.0500, PPL: 14392.65, |Param|: 5503.76, |GParam|: 92.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7500/11961, Batch size: 16, LR: 0.0500, PPL: 14364.99, |Param|: 5503.78, |GParam|: 43.24, Training: 
133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7550/11961, Batch size: 16, LR: 0.0500, PPL: 14337.63, |Param|: 5503.80, |GParam|: 88.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7600/11961, Batch size: 16, LR: 0.0500, PPL: 14326.73, |Param|: 5503.82, |GParam|: 52.83, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7650/11961, Batch size: 16, LR: 0.0500, PPL: 14338.59, |Param|: 5503.82, |GParam|: 47.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7700/11961, Batch size: 16, LR: 0.0500, PPL: 14314.74, |Param|: 5503.85, |GParam|: 72.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7750/11961, Batch size: 16, LR: 0.0500, PPL: 14287.60, |Param|: 5503.87, |GParam|: 65.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7800/11961, Batch size: 16, LR: 0.0500, PPL: 14266.77, |Param|: 5503.89, |GParam|: 54.07, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7850/11961, Batch size: 16, LR: 0.0500, PPL: 14248.34, |Param|: 5503.92, |GParam|: 76.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7900/11961, Batch size: 16, LR: 0.0500, PPL: 14220.70, |Param|: 5503.94, |GParam|: 40.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 7950/11961, Batch size: 16, LR: 0.0500, PPL: 14207.59, |Param|: 5503.96, |GParam|: 102.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8000/11961, Batch size: 16, LR: 0.0500, PPL: 14185.56, |Param|: 5503.98, |GParam|: 98.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8050/11961, Batch size: 16, LR: 0.0500, PPL: 14167.07, |Param|: 5504.00, |GParam|: 60.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8100/11961, Batch size: 16, LR: 0.0500, PPL: 14152.66, |Param|: 5504.02, |GParam|: 61.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8150/11961, Batch size: 16, LR: 0.0500, PPL: 
14139.62, |Param|: 5504.03, |GParam|: 42.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8200/11961, Batch size: 16, LR: 0.0500, PPL: 14119.54, |Param|: 5504.04, |GParam|: 49.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8250/11961, Batch size: 16, LR: 0.0500, PPL: 14113.93, |Param|: 5504.07, |GParam|: 50.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8300/11961, Batch size: 16, LR: 0.0500, PPL: 14096.92, |Param|: 5504.08, |GParam|: 43.17, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8350/11961, Batch size: 16, LR: 0.0500, PPL: 14069.38, |Param|: 5504.10, |GParam|: 34.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8400/11961, Batch size: 16, LR: 0.0500, PPL: 14054.48, |Param|: 5504.12, |GParam|: 98.34, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8450/11961, Batch size: 16, LR: 0.0500, PPL: 14041.76, |Param|: 5504.14, |GParam|: 52.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8500/11961, Batch size: 16, LR: 0.0500, PPL: 14018.91, |Param|: 5504.15, |GParam|: 68.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8550/11961, Batch size: 16, LR: 0.0500, PPL: 14004.82, |Param|: 5504.16, |GParam|: 74.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8600/11961, Batch size: 16, LR: 0.0500, PPL: 13983.66, |Param|: 5504.17, |GParam|: 52.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8650/11961, Batch size: 16, LR: 0.0500, PPL: 13966.46, |Param|: 5504.18, |GParam|: 63.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8700/11961, Batch size: 16, LR: 0.0500, PPL: 13954.56, |Param|: 5504.19, |GParam|: 51.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8750/11961, Batch size: 16, LR: 0.0500, PPL: 13940.55, |Param|: 5504.21, |GParam|: 35.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 
6, Batch: 8800/11961, Batch size: 16, LR: 0.0500, PPL: 13928.67, |Param|: 5504.21, |GParam|: 70.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8850/11961, Batch size: 16, LR: 0.0500, PPL: 13914.17, |Param|: 5504.23, |GParam|: 94.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8900/11961, Batch size: 16, LR: 0.0500, PPL: 13899.37, |Param|: 5504.24, |GParam|: 120.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 8950/11961, Batch size: 16, LR: 0.0500, PPL: 13878.46, |Param|: 5504.25, |GParam|: 58.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9000/11961, Batch size: 16, LR: 0.0500, PPL: 13861.42, |Param|: 5504.26, |GParam|: 77.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9050/11961, Batch size: 16, LR: 0.0500, PPL: 13848.87, |Param|: 5504.26, |GParam|: 52.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9100/11961, Batch size: 16, LR: 0.0500, PPL: 13831.39, |Param|: 5504.27, |GParam|: 59.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9150/11961, Batch size: 16, LR: 0.0500, PPL: 13811.07, |Param|: 5504.28, |GParam|: 57.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9200/11961, Batch size: 16, LR: 0.0500, PPL: 13791.34, |Param|: 5504.29, |GParam|: 47.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9250/11961, Batch size: 16, LR: 0.0500, PPL: 13773.04, |Param|: 5504.30, |GParam|: 69.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9300/11961, Batch size: 16, LR: 0.0500, PPL: 13757.69, |Param|: 5504.30, |GParam|: 40.21, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9350/11961, Batch size: 16, LR: 0.0500, PPL: 13742.03, |Param|: 5504.31, |GParam|: 50.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9400/11961, Batch size: 16, LR: 0.0500, PPL: 13724.93, |Param|: 5504.32, |GParam|: 57.10, 
Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9450/11961, Batch size: 16, LR: 0.0500, PPL: 13703.73, |Param|: 5504.32, |GParam|: 39.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9500/11961, Batch size: 16, LR: 0.0500, PPL: 13687.10, |Param|: 5504.33, |GParam|: 55.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9550/11961, Batch size: 16, LR: 0.0500, PPL: 13679.02, |Param|: 5504.34, |GParam|: 70.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9600/11961, Batch size: 16, LR: 0.0500, PPL: 13663.00, |Param|: 5504.34, |GParam|: 61.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9650/11961, Batch size: 16, LR: 0.0500, PPL: 13640.95, |Param|: 5504.35, |GParam|: 42.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9700/11961, Batch size: 16, LR: 0.0500, PPL: 13634.90, |Param|: 5504.35, |GParam|: 63.77, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9750/11961, Batch size: 16, LR: 0.0500, PPL: 13615.10, |Param|: 5504.36, |GParam|: 67.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9800/11961, Batch size: 16, LR: 0.0500, PPL: 13598.26, |Param|: 5504.36, |GParam|: 19.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9850/11961, Batch size: 16, LR: 0.0500, PPL: 13581.27, |Param|: 5504.36, |GParam|: 68.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9900/11961, Batch size: 16, LR: 0.0500, PPL: 13567.72, |Param|: 5504.36, |GParam|: 53.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 9950/11961, Batch size: 16, LR: 0.0500, PPL: 13553.71, |Param|: 5504.36, |GParam|: 58.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10000/11961, Batch size: 16, LR: 0.0500, PPL: 13541.43, |Param|: 5504.37, |GParam|: 58.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10050/11961, Batch size: 16, LR: 
0.0500, PPL: 13523.29, |Param|: 5504.37, |GParam|: 63.96, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10100/11961, Batch size: 16, LR: 0.0500, PPL: 13513.75, |Param|: 5504.37, |GParam|: 57.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10150/11961, Batch size: 16, LR: 0.0500, PPL: 13493.17, |Param|: 5504.38, |GParam|: 63.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10200/11961, Batch size: 16, LR: 0.0500, PPL: 13475.61, |Param|: 5504.38, |GParam|: 29.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10250/11961, Batch size: 16, LR: 0.0500, PPL: 13451.44, |Param|: 5504.39, |GParam|: 75.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10300/11961, Batch size: 7, LR: 0.0500, PPL: 13436.94, |Param|: 5504.38, |GParam|: 69.98, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10350/11961, Batch size: 16, LR: 0.0500, PPL: 13417.78, |Param|: 5504.38, |GParam|: 34.77, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10400/11961, Batch size: 16, LR: 0.0500, PPL: 13400.59, |Param|: 5504.38, |GParam|: 52.96, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10450/11961, Batch size: 16, LR: 0.0500, PPL: 13388.83, |Param|: 5504.38, |GParam|: 26.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10500/11961, Batch size: 16, LR: 0.0500, PPL: 13373.65, |Param|: 5504.38, |GParam|: 111.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10550/11961, Batch size: 16, LR: 0.0500, PPL: 13363.96, |Param|: 5504.37, |GParam|: 76.43, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10600/11961, Batch size: 16, LR: 0.0500, PPL: 13347.38, |Param|: 5504.37, |GParam|: 42.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10650/11961, Batch size: 16, LR: 0.0500, PPL: 13333.26, |Param|: 5504.37, |GParam|: 55.81, Training: 133/64/68 
total/source/target tokens/sec Epoch: 6, Batch: 10700/11961, Batch size: 16, LR: 0.0500, PPL: 13314.79, |Param|: 5504.37, |GParam|: 74.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10750/11961, Batch size: 16, LR: 0.0500, PPL: 13300.66, |Param|: 5504.37, |GParam|: 41.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10800/11961, Batch size: 16, LR: 0.0500, PPL: 13282.86, |Param|: 5504.35, |GParam|: 64.47, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10850/11961, Batch size: 16, LR: 0.0500, PPL: 13269.91, |Param|: 5504.35, |GParam|: 53.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10900/11961, Batch size: 16, LR: 0.0500, PPL: 13255.44, |Param|: 5504.35, |GParam|: 54.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 10950/11961, Batch size: 16, LR: 0.0500, PPL: 13233.52, |Param|: 5504.35, |GParam|: 90.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11000/11961, Batch size: 16, LR: 0.0500, PPL: 13217.46, |Param|: 5504.34, |GParam|: 66.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11050/11961, Batch size: 16, LR: 0.0500, PPL: 13201.62, |Param|: 5504.34, |GParam|: 85.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11100/11961, Batch size: 16, LR: 0.0500, PPL: 13188.14, |Param|: 5504.33, |GParam|: 70.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11150/11961, Batch size: 16, LR: 0.0500, PPL: 13175.09, |Param|: 5504.32, |GParam|: 114.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11200/11961, Batch size: 16, LR: 0.0500, PPL: 13151.57, |Param|: 5504.32, |GParam|: 76.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11250/11961, Batch size: 16, LR: 0.0500, PPL: 13135.76, |Param|: 5504.31, |GParam|: 57.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11300/11961, Batch size: 16, LR: 0.0500, 
PPL: 13119.84, |Param|: 5504.31, |GParam|: 37.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11350/11961, Batch size: 16, LR: 0.0500, PPL: 13114.38, |Param|: 5504.29, |GParam|: 79.40, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11400/11961, Batch size: 16, LR: 0.0500, PPL: 13109.26, |Param|: 5504.28, |GParam|: 51.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11450/11961, Batch size: 16, LR: 0.0500, PPL: 13096.17, |Param|: 5504.28, |GParam|: 30.68, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11500/11961, Batch size: 16, LR: 0.0500, PPL: 13080.57, |Param|: 5504.28, |GParam|: 39.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11550/11961, Batch size: 16, LR: 0.0500, PPL: 13064.66, |Param|: 5504.27, |GParam|: 79.21, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11600/11961, Batch size: 16, LR: 0.0500, PPL: 13051.59, |Param|: 5504.26, |GParam|: 19.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11650/11961, Batch size: 16, LR: 0.0500, PPL: 13035.43, |Param|: 5504.26, |GParam|: 58.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11700/11961, Batch size: 16, LR: 0.0500, PPL: 13021.42, |Param|: 5504.26, |GParam|: 42.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11750/11961, Batch size: 16, LR: 0.0500, PPL: 13009.38, |Param|: 5504.25, |GParam|: 64.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11800/11961, Batch size: 16, LR: 0.0500, PPL: 12997.33, |Param|: 5504.24, |GParam|: 54.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11850/11961, Batch size: 16, LR: 0.0500, PPL: 12984.64, |Param|: 5504.23, |GParam|: 97.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 6, Batch: 11900/11961, Batch size: 16, LR: 0.0500, PPL: 12971.46, |Param|: 5504.22, |GParam|: 68.47, Training: 133/64/68 total/source/target 
tokens/sec Epoch: 6, Batch: 11950/11961, Batch size: 16, LR: 0.0500, PPL: 12953.38, |Param|: 5504.21, |GParam|: 34.11, Training: 133/64/68 total/source/target tokens/sec Train 12950.505419021 Valid 4128.7972338126 saving checkpoint to demo-model_epoch6.00_4128.80.t7 Epoch: 7, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 8438.71, |Param|: 5504.22, |GParam|: 78.81, Training: 133/64/69 total/source/target tokens/sec Epoch: 7, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 8269.30, |Param|: 5504.22, |GParam|: 51.59, Training: 132/63/69 total/source/target tokens/sec Epoch: 7, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 8147.86, |Param|: 5504.22, |GParam|: 17.88, Training: 132/62/69 total/source/target tokens/sec Epoch: 7, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 8197.88, |Param|: 5504.22, |GParam|: 67.23, Training: 132/63/69 total/source/target tokens/sec Epoch: 7, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 8288.60, |Param|: 5504.21, |GParam|: 72.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 8332.62, |Param|: 5504.21, |GParam|: 55.10, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 8299.24, |Param|: 5504.21, |GParam|: 62.83, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 8214.05, |Param|: 5504.20, |GParam|: 69.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 8220.49, |Param|: 5504.19, |GParam|: 40.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 8201.01, |Param|: 5504.18, |GParam|: 45.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 8202.00, |Param|: 5504.17, |GParam|: 73.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, 
Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 8165.18, |Param|: 5504.16, |GParam|: 56.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 650/11961, Batch size: 16, LR: 0.0500, PPL: 8119.97, |Param|: 5504.16, |GParam|: 31.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 8158.25, |Param|: 5504.14, |GParam|: 57.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 8155.47, |Param|: 5504.13, |GParam|: 33.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 8169.94, |Param|: 5504.12, |GParam|: 30.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 8166.96, |Param|: 5504.10, |GParam|: 69.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 8169.01, |Param|: 5504.09, |GParam|: 59.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 8240.92, |Param|: 5504.07, |GParam|: 55.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 8231.41, |Param|: 5504.06, |GParam|: 75.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 8241.95, |Param|: 5504.04, |GParam|: 74.95, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 8253.48, |Param|: 5504.02, |GParam|: 49.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 8260.14, |Param|: 5504.00, |GParam|: 45.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1200/11961, Batch size: 16, LR: 0.0500, PPL: 8283.32, |Param|: 5503.98, |GParam|: 74.09, Training: 133/64/68 
total/source/target tokens/sec Epoch: 7, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 8261.51, |Param|: 5503.97, |GParam|: 67.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 8292.32, |Param|: 5503.95, |GParam|: 38.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 8265.18, |Param|: 5503.93, |GParam|: 55.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 8305.39, |Param|: 5503.90, |GParam|: 51.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 8335.92, |Param|: 5503.88, |GParam|: 28.87, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 8355.55, |Param|: 5503.86, |GParam|: 79.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 8349.11, |Param|: 5503.84, |GParam|: 58.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 8331.20, |Param|: 5503.82, |GParam|: 49.06, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 8330.69, |Param|: 5503.81, |GParam|: 82.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 8307.68, |Param|: 5503.79, |GParam|: 51.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 8312.75, |Param|: 5503.77, |GParam|: 90.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 8331.24, |Param|: 5503.74, |GParam|: 67.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 8330.36, |Param|: 
5503.72, |GParam|: 53.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 8322.95, |Param|: 5503.70, |GParam|: 46.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 1950/11961, Batch size: 16, LR: 0.0500, PPL: 8332.38, |Param|: 5503.68, |GParam|: 65.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 8337.74, |Param|: 5503.65, |GParam|: 53.68, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 8339.35, |Param|: 5503.63, |GParam|: 46.09, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 8352.64, |Param|: 5503.60, |GParam|: 46.03, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 8356.87, |Param|: 5503.58, |GParam|: 63.19, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL: 8352.03, |Param|: 5503.55, |GParam|: 17.03, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2250/11961, Batch size: 16, LR: 0.0500, PPL: 8373.08, |Param|: 5503.52, |GParam|: 56.21, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2300/11961, Batch size: 16, LR: 0.0500, PPL: 8381.76, |Param|: 5503.49, |GParam|: 64.27, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2350/11961, Batch size: 16, LR: 0.0500, PPL: 8394.02, |Param|: 5503.47, |GParam|: 46.57, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2400/11961, Batch size: 16, LR: 0.0500, PPL: 8402.79, |Param|: 5503.44, |GParam|: 122.18, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2450/11961, Batch size: 16, LR: 0.0500, PPL: 8386.70, |Param|: 5503.42, |GParam|: 49.26, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2500/11961, Batch 
size: 16, LR: 0.0500, PPL: 8384.46, |Param|: 5503.39, |GParam|: 44.83, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2550/11961, Batch size: 16, LR: 0.0500, PPL: 8408.83, |Param|: 5503.35, |GParam|: 61.45, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2600/11961, Batch size: 16, LR: 0.0500, PPL: 8400.02, |Param|: 5503.33, |GParam|: 61.72, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2650/11961, Batch size: 16, LR: 0.0500, PPL: 8398.17, |Param|: 5503.30, |GParam|: 77.26, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2700/11961, Batch size: 16, LR: 0.0500, PPL: 8371.64, |Param|: 5503.28, |GParam|: 49.94, Training: 134/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2750/11961, Batch size: 16, LR: 0.0500, PPL: 8363.63, |Param|: 5503.25, |GParam|: 40.57, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2800/11961, Batch size: 16, LR: 0.0500, PPL: 8361.05, |Param|: 5503.22, |GParam|: 68.32, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2850/11961, Batch size: 16, LR: 0.0500, PPL: 8343.01, |Param|: 5503.19, |GParam|: 49.24, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2900/11961, Batch size: 16, LR: 0.0500, PPL: 8341.05, |Param|: 5503.16, |GParam|: 49.84, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 2950/11961, Batch size: 16, LR: 0.0500, PPL: 8331.98, |Param|: 5503.13, |GParam|: 69.83, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3000/11961, Batch size: 16, LR: 0.0500, PPL: 8326.44, |Param|: 5503.10, |GParam|: 50.76, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3050/11961, Batch size: 16, LR: 0.0500, PPL: 8339.05, |Param|: 5503.08, |GParam|: 34.37, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3100/11961, Batch size: 16, LR: 0.0500, PPL: 8338.79, |Param|: 5503.05, |GParam|: 49.09, Training: 133/65/68 total/source/target 
tokens/sec Epoch: 7, Batch: 3150/11961, Batch size: 16, LR: 0.0500, PPL: 8337.31, |Param|: 5503.02, |GParam|: 49.62, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3200/11961, Batch size: 16, LR: 0.0500, PPL: 8328.64, |Param|: 5502.98, |GParam|: 40.95, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3250/11961, Batch size: 16, LR: 0.0500, PPL: 8318.87, |Param|: 5502.96, |GParam|: 39.71, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3300/11961, Batch size: 16, LR: 0.0500, PPL: 8327.55, |Param|: 5502.93, |GParam|: 62.01, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3350/11961, Batch size: 16, LR: 0.0500, PPL: 8319.09, |Param|: 5502.90, |GParam|: 19.02, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3400/11961, Batch size: 16, LR: 0.0500, PPL: 8303.43, |Param|: 5502.87, |GParam|: 70.97, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3450/11961, Batch size: 4, LR: 0.0500, PPL: 8297.64, |Param|: 5502.85, |GParam|: 115.31, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3500/11961, Batch size: 16, LR: 0.0500, PPL: 8302.59, |Param|: 5502.81, |GParam|: 53.16, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3550/11961, Batch size: 16, LR: 0.0500, PPL: 8297.64, |Param|: 5502.78, |GParam|: 77.73, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3600/11961, Batch size: 16, LR: 0.0500, PPL: 8321.80, |Param|: 5502.74, |GParam|: 43.51, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3650/11961, Batch size: 16, LR: 0.0500, PPL: 8321.82, |Param|: 5502.70, |GParam|: 81.12, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3700/11961, Batch size: 16, LR: 0.0500, PPL: 8314.16, |Param|: 5502.67, |GParam|: 42.38, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3750/11961, Batch size: 16, LR: 0.0500, PPL: 8302.68, |Param|: 5502.64, |GParam|: 
56.95, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3800/11961, Batch size: 16, LR: 0.0500, PPL: 8297.40, |Param|: 5502.61, |GParam|: 52.77, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3850/11961, Batch size: 16, LR: 0.0500, PPL: 8277.77, |Param|: 5502.58, |GParam|: 56.51, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 3900/11961, Batch size: 16, LR: 0.0500, PPL: 8276.81, |Param|: 5502.55, |GParam|: 59.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 3950/11961, Batch size: 16, LR: 0.0500, PPL: 8273.83, |Param|: 5502.52, |GParam|: 63.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4000/11961, Batch size: 16, LR: 0.0500, PPL: 8271.88, |Param|: 5502.49, |GParam|: 74.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4050/11961, Batch size: 16, LR: 0.0500, PPL: 8282.66, |Param|: 5502.45, |GParam|: 60.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4100/11961, Batch size: 16, LR: 0.0500, PPL: 8286.26, |Param|: 5502.42, |GParam|: 27.05, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 4150/11961, Batch size: 16, LR: 0.0500, PPL: 8279.27, |Param|: 5502.38, |GParam|: 67.19, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 4200/11961, Batch size: 16, LR: 0.0500, PPL: 8262.78, |Param|: 5502.35, |GParam|: 42.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4250/11961, Batch size: 16, LR: 0.0500, PPL: 8250.68, |Param|: 5502.32, |GParam|: 44.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4300/11961, Batch size: 16, LR: 0.0500, PPL: 8251.83, |Param|: 5502.29, |GParam|: 24.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4350/11961, Batch size: 16, LR: 0.0500, PPL: 8250.04, |Param|: 5502.25, |GParam|: 44.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4400/11961, Batch size: 16, LR: 0.0500, 
PPL: 8243.73, |Param|: 5502.22, |GParam|: 77.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4450/11961, Batch size: 16, LR: 0.0500, PPL: 8254.71, |Param|: 5502.19, |GParam|: 79.18, Training: 133/65/68 total/source/target tokens/sec Epoch: 7, Batch: 4500/11961, Batch size: 16, LR: 0.0500, PPL: 8254.58, |Param|: 5502.15, |GParam|: 53.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4550/11961, Batch size: 16, LR: 0.0500, PPL: 8257.12, |Param|: 5502.12, |GParam|: 34.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4600/11961, Batch size: 16, LR: 0.0500, PPL: 8255.70, |Param|: 5502.08, |GParam|: 78.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4650/11961, Batch size: 16, LR: 0.0500, PPL: 8251.45, |Param|: 5502.04, |GParam|: 39.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4700/11961, Batch size: 16, LR: 0.0500, PPL: 8246.01, |Param|: 5502.00, |GParam|: 72.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4750/11961, Batch size: 16, LR: 0.0500, PPL: 8237.98, |Param|: 5501.97, |GParam|: 54.32, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4800/11961, Batch size: 16, LR: 0.0500, PPL: 8232.69, |Param|: 5501.94, |GParam|: 44.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4850/11961, Batch size: 16, LR: 0.0500, PPL: 8234.84, |Param|: 5501.90, |GParam|: 80.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4900/11961, Batch size: 16, LR: 0.0500, PPL: 8231.75, |Param|: 5501.87, |GParam|: 67.43, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 4950/11961, Batch size: 16, LR: 0.0500, PPL: 8233.80, |Param|: 5501.83, |GParam|: 59.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5000/11961, Batch size: 16, LR: 0.0500, PPL: 8226.18, |Param|: 5501.80, |GParam|: 43.64, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, 
Batch: 5050/11961, Batch size: 16, LR: 0.0500, PPL: 8220.03, |Param|: 5501.77, |GParam|: 56.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5100/11961, Batch size: 16, LR: 0.0500, PPL: 8214.34, |Param|: 5501.73, |GParam|: 39.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5150/11961, Batch size: 16, LR: 0.0500, PPL: 8208.48, |Param|: 5501.70, |GParam|: 75.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5200/11961, Batch size: 16, LR: 0.0500, PPL: 8212.16, |Param|: 5501.66, |GParam|: 69.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5250/11961, Batch size: 16, LR: 0.0500, PPL: 8207.92, |Param|: 5501.63, |GParam|: 81.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5300/11961, Batch size: 16, LR: 0.0500, PPL: 8202.83, |Param|: 5501.59, |GParam|: 93.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5350/11961, Batch size: 16, LR: 0.0500, PPL: 8209.45, |Param|: 5501.55, |GParam|: 66.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5400/11961, Batch size: 16, LR: 0.0500, PPL: 8213.30, |Param|: 5501.51, |GParam|: 25.96, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5450/11961, Batch size: 16, LR: 0.0500, PPL: 8208.40, |Param|: 5501.48, |GParam|: 67.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5500/11961, Batch size: 16, LR: 0.0500, PPL: 8204.78, |Param|: 5501.44, |GParam|: 22.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5550/11961, Batch size: 16, LR: 0.0500, PPL: 8201.60, |Param|: 5501.41, |GParam|: 47.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5600/11961, Batch size: 16, LR: 0.0500, PPL: 8190.22, |Param|: 5501.37, |GParam|: 75.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5650/11961, Batch size: 16, LR: 0.0500, PPL: 8182.24, |Param|: 5501.34, |GParam|: 40.01, Training: 133/64/68 
total/source/target tokens/sec Epoch: 7, Batch: 5700/11961, Batch size: 16, LR: 0.0500, PPL: 8170.60, |Param|: 5501.31, |GParam|: 47.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5750/11961, Batch size: 16, LR: 0.0500, PPL: 8165.97, |Param|: 5501.27, |GParam|: 50.40, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5800/11961, Batch size: 16, LR: 0.0500, PPL: 8171.43, |Param|: 5501.23, |GParam|: 33.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5850/11961, Batch size: 16, LR: 0.0500, PPL: 8165.36, |Param|: 5501.20, |GParam|: 50.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5900/11961, Batch size: 16, LR: 0.0500, PPL: 8157.99, |Param|: 5501.16, |GParam|: 73.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 5950/11961, Batch size: 16, LR: 0.0500, PPL: 8162.45, |Param|: 5501.12, |GParam|: 58.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6000/11961, Batch size: 16, LR: 0.0500, PPL: 8158.07, |Param|: 5501.09, |GParam|: 44.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6050/11961, Batch size: 16, LR: 0.0500, PPL: 8149.52, |Param|: 5501.06, |GParam|: 44.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6100/11961, Batch size: 16, LR: 0.0500, PPL: 8150.71, |Param|: 5501.01, |GParam|: 50.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6150/11961, Batch size: 16, LR: 0.0500, PPL: 8146.82, |Param|: 5500.98, |GParam|: 57.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6200/11961, Batch size: 16, LR: 0.0500, PPL: 8141.62, |Param|: 5500.94, |GParam|: 55.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6250/11961, Batch size: 16, LR: 0.0500, PPL: 8133.92, |Param|: 5500.90, |GParam|: 44.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6300/11961, Batch size: 16, LR: 0.0500, PPL: 8133.96, |Param|: 
5500.86, |GParam|: 50.10, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6350/11961, Batch size: 16, LR: 0.0500, PPL: 8128.73, |Param|: 5500.82, |GParam|: 73.19, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6400/11961, Batch size: 16, LR: 0.0500, PPL: 8133.50, |Param|: 5500.78, |GParam|: 76.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6450/11961, Batch size: 16, LR: 0.0500, PPL: 8131.14, |Param|: 5500.74, |GParam|: 43.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6500/11961, Batch size: 16, LR: 0.0500, PPL: 8134.25, |Param|: 5500.69, |GParam|: 34.62, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6550/11961, Batch size: 16, LR: 0.0500, PPL: 8125.35, |Param|: 5500.66, |GParam|: 74.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6600/11961, Batch size: 16, LR: 0.0500, PPL: 8121.49, |Param|: 5500.62, |GParam|: 31.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6650/11961, Batch size: 16, LR: 0.0500, PPL: 8118.00, |Param|: 5500.58, |GParam|: 32.15, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6700/11961, Batch size: 16, LR: 0.0500, PPL: 8112.57, |Param|: 5500.54, |GParam|: 56.71, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6750/11961, Batch size: 16, LR: 0.0500, PPL: 8113.72, |Param|: 5500.50, |GParam|: 39.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6800/11961, Batch size: 16, LR: 0.0500, PPL: 8108.95, |Param|: 5500.47, |GParam|: 54.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6850/11961, Batch size: 16, LR: 0.0500, PPL: 8103.74, |Param|: 5500.43, |GParam|: 43.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6900/11961, Batch size: 16, LR: 0.0500, PPL: 8099.01, |Param|: 5500.39, |GParam|: 99.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 6950/11961, Batch 
size: 16, LR: 0.0500, PPL: 8100.34, |Param|: 5500.35, |GParam|: 51.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7000/11961, Batch size: 16, LR: 0.0500, PPL: 8098.09, |Param|: 5500.31, |GParam|: 57.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7050/11961, Batch size: 16, LR: 0.0500, PPL: 8091.03, |Param|: 5500.27, |GParam|: 61.48, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7100/11961, Batch size: 16, LR: 0.0500, PPL: 8084.24, |Param|: 5500.24, |GParam|: 63.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7150/11961, Batch size: 16, LR: 0.0500, PPL: 8081.66, |Param|: 5500.20, |GParam|: 24.07, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7200/11961, Batch size: 16, LR: 0.0500, PPL: 8073.82, |Param|: 5500.17, |GParam|: 64.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7250/11961, Batch size: 16, LR: 0.0500, PPL: 8068.36, |Param|: 5500.13, |GParam|: 58.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7300/11961, Batch size: 16, LR: 0.0500, PPL: 8069.88, |Param|: 5500.08, |GParam|: 57.21, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7350/11961, Batch size: 16, LR: 0.0500, PPL: 8064.93, |Param|: 5500.04, |GParam|: 41.19, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7400/11961, Batch size: 16, LR: 0.0500, PPL: 8060.11, |Param|: 5500.00, |GParam|: 37.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7450/11961, Batch size: 16, LR: 0.0500, PPL: 8055.54, |Param|: 5499.96, |GParam|: 35.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7500/11961, Batch size: 16, LR: 0.0500, PPL: 8053.86, |Param|: 5499.92, |GParam|: 56.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7550/11961, Batch size: 16, LR: 0.0500, PPL: 8049.48, |Param|: 5499.87, |GParam|: 32.68, Training: 133/64/68 total/source/target 
tokens/sec Epoch: 7, Batch: 7600/11961, Batch size: 16, LR: 0.0500, PPL: 8042.71, |Param|: 5499.83, |GParam|: 66.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7650/11961, Batch size: 16, LR: 0.0500, PPL: 8039.45, |Param|: 5499.79, |GParam|: 78.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7700/11961, Batch size: 16, LR: 0.0500, PPL: 8039.57, |Param|: 5499.75, |GParam|: 68.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7750/11961, Batch size: 16, LR: 0.0500, PPL: 8039.04, |Param|: 5499.70, |GParam|: 47.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7800/11961, Batch size: 16, LR: 0.0500, PPL: 8042.03, |Param|: 5499.66, |GParam|: 62.06, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7850/11961, Batch size: 16, LR: 0.0500, PPL: 8037.49, |Param|: 5499.62, |GParam|: 54.06, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7900/11961, Batch size: 16, LR: 0.0500, PPL: 8030.33, |Param|: 5499.58, |GParam|: 64.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 7950/11961, Batch size: 16, LR: 0.0500, PPL: 8029.59, |Param|: 5499.54, |GParam|: 45.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8000/11961, Batch size: 16, LR: 0.0500, PPL: 8025.42, |Param|: 5499.50, |GParam|: 40.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8050/11961, Batch size: 16, LR: 0.0500, PPL: 8026.07, |Param|: 5499.45, |GParam|: 53.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8100/11961, Batch size: 16, LR: 0.0500, PPL: 8020.07, |Param|: 5499.42, |GParam|: 49.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8150/11961, Batch size: 16, LR: 0.0500, PPL: 8018.35, |Param|: 5499.37, |GParam|: 56.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8200/11961, Batch size: 16, LR: 0.0500, PPL: 8011.07, |Param|: 5499.34, |GParam|: 
37.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8250/11961, Batch size: 16, LR: 0.0500, PPL: 8013.05, |Param|: 5499.29, |GParam|: 62.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8300/11961, Batch size: 16, LR: 0.0500, PPL: 8009.39, |Param|: 5499.25, |GParam|: 58.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8350/11961, Batch size: 16, LR: 0.0500, PPL: 8003.40, |Param|: 5499.21, |GParam|: 56.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8400/11961, Batch size: 16, LR: 0.0500, PPL: 7998.58, |Param|: 5499.17, |GParam|: 67.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8450/11961, Batch size: 16, LR: 0.0500, PPL: 7993.57, |Param|: 5499.13, |GParam|: 49.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8500/11961, Batch size: 16, LR: 0.0500, PPL: 7986.48, |Param|: 5499.09, |GParam|: 54.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8550/11961, Batch size: 16, LR: 0.0500, PPL: 7983.10, |Param|: 5499.05, |GParam|: 79.47, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8600/11961, Batch size: 16, LR: 0.0500, PPL: 7977.06, |Param|: 5499.01, |GParam|: 48.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8650/11961, Batch size: 16, LR: 0.0500, PPL: 7966.27, |Param|: 5498.97, |GParam|: 66.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8700/11961, Batch size: 16, LR: 0.0500, PPL: 7961.82, |Param|: 5498.93, |GParam|: 61.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8750/11961, Batch size: 16, LR: 0.0500, PPL: 7961.10, |Param|: 5498.89, |GParam|: 50.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8800/11961, Batch size: 16, LR: 0.0500, PPL: 7952.36, |Param|: 5498.86, |GParam|: 54.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8850/11961, Batch size: 16, LR: 0.0500, 
PPL: 7953.04, |Param|: 5498.81, |GParam|: 37.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8900/11961, Batch size: 16, LR: 0.0500, PPL: 7951.50, |Param|: 5498.77, |GParam|: 80.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 8950/11961, Batch size: 15, LR: 0.0500, PPL: 7949.97, |Param|: 5498.73, |GParam|: 119.16, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9000/11961, Batch size: 16, LR: 0.0500, PPL: 7946.56, |Param|: 5498.69, |GParam|: 60.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9050/11961, Batch size: 16, LR: 0.0500, PPL: 7948.40, |Param|: 5498.64, |GParam|: 42.56, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9100/11961, Batch size: 16, LR: 0.0500, PPL: 7940.46, |Param|: 5498.61, |GParam|: 18.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9150/11961, Batch size: 16, LR: 0.0500, PPL: 7931.02, |Param|: 5498.57, |GParam|: 74.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9200/11961, Batch size: 16, LR: 0.0500, PPL: 7929.10, |Param|: 5498.53, |GParam|: 71.36, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9250/11961, Batch size: 16, LR: 0.0500, PPL: 7923.04, |Param|: 5498.49, |GParam|: 65.83, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9300/11961, Batch size: 16, LR: 0.0500, PPL: 7920.85, |Param|: 5498.45, |GParam|: 24.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9350/11961, Batch size: 16, LR: 0.0500, PPL: 7913.89, |Param|: 5498.41, |GParam|: 96.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9400/11961, Batch size: 16, LR: 0.0500, PPL: 7907.47, |Param|: 5498.37, |GParam|: 53.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9450/11961, Batch size: 16, LR: 0.0500, PPL: 7904.11, |Param|: 5498.33, |GParam|: 42.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, 
Batch: 9500/11961, Batch size: 16, LR: 0.0500, PPL: 7897.17, |Param|: 5498.29, |GParam|: 79.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9550/11961, Batch size: 16, LR: 0.0500, PPL: 7890.89, |Param|: 5498.25, |GParam|: 25.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9600/11961, Batch size: 16, LR: 0.0500, PPL: 7884.26, |Param|: 5498.21, |GParam|: 46.07, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9650/11961, Batch size: 16, LR: 0.0500, PPL: 7878.22, |Param|: 5498.17, |GParam|: 35.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9700/11961, Batch size: 16, LR: 0.0500, PPL: 7872.75, |Param|: 5498.12, |GParam|: 56.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9750/11961, Batch size: 16, LR: 0.0500, PPL: 7860.94, |Param|: 5498.09, |GParam|: 79.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9800/11961, Batch size: 16, LR: 0.0500, PPL: 7853.15, |Param|: 5498.05, |GParam|: 57.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9850/11961, Batch size: 16, LR: 0.0500, PPL: 7847.59, |Param|: 5498.01, |GParam|: 64.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9900/11961, Batch size: 16, LR: 0.0500, PPL: 7845.55, |Param|: 5497.97, |GParam|: 43.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 9950/11961, Batch size: 16, LR: 0.0500, PPL: 7841.08, |Param|: 5497.93, |GParam|: 88.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 7, Batch: 10000/11961, Batch size: 16, LR: 0.0500, PPL: 7840.66, |Param|: 5497.89, |GParam|: 57.19, Training: 132/64/68 total/source/target tokens/sec Epoch: 7, Batch: 10050/11961, Batch size: 16, LR: 0.0500, PPL: 7834.65, |Param|: 5497.84, |GParam|: 46.75, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10100/11961, Batch size: 16, LR: 0.0500, PPL: 7825.08, |Param|: 5497.81, |GParam|: 55.07, Training: 
129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10150/11961, Batch size: 16, LR: 0.0500, PPL: 7818.93, |Param|: 5497.77, |GParam|: 23.08, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10200/11961, Batch size: 16, LR: 0.0500, PPL: 7812.63, |Param|: 5497.74, |GParam|: 47.59, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10250/11961, Batch size: 16, LR: 0.0500, PPL: 7805.17, |Param|: 5497.70, |GParam|: 43.09, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10300/11961, Batch size: 16, LR: 0.0500, PPL: 7802.05, |Param|: 5497.66, |GParam|: 66.76, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10350/11961, Batch size: 16, LR: 0.0500, PPL: 7797.98, |Param|: 5497.62, |GParam|: 52.54, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10400/11961, Batch size: 16, LR: 0.0500, PPL: 7793.96, |Param|: 5497.58, |GParam|: 58.64, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10450/11961, Batch size: 16, LR: 0.0500, PPL: 7790.95, |Param|: 5497.54, |GParam|: 53.79, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10500/11961, Batch size: 16, LR: 0.0500, PPL: 7783.76, |Param|: 5497.50, |GParam|: 52.23, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10550/11961, Batch size: 16, LR: 0.0500, PPL: 7778.30, |Param|: 5497.46, |GParam|: 57.05, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10600/11961, Batch size: 16, LR: 0.0500, PPL: 7769.48, |Param|: 5497.43, |GParam|: 21.68, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10650/11961, Batch size: 16, LR: 0.0500, PPL: 7763.22, |Param|: 5497.39, |GParam|: 37.18, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10700/11961, Batch size: 16, LR: 0.0500, PPL: 7757.96, |Param|: 5497.35, |GParam|: 57.33, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10750/11961, Batch size: 16, LR: 0.0500, PPL: 
7756.28, |Param|: 5497.30, |GParam|: 93.59, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10800/11961, Batch size: 16, LR: 0.0500, PPL: 7749.63, |Param|: 5497.27, |GParam|: 67.52, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10850/11961, Batch size: 16, LR: 0.0500, PPL: 7746.73, |Param|: 5497.23, |GParam|: 101.53, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10900/11961, Batch size: 16, LR: 0.0500, PPL: 7744.25, |Param|: 5497.19, |GParam|: 66.02, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 10950/11961, Batch size: 16, LR: 0.0500, PPL: 7737.49, |Param|: 5497.15, |GParam|: 57.33, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 11000/11961, Batch size: 16, LR: 0.0500, PPL: 7731.55, |Param|: 5497.11, |GParam|: 36.55, Training: 129/62/66 total/source/target tokens/sec Epoch: 7, Batch: 11050/11961, Batch size: 16, LR: 0.0500, PPL: 7726.13, |Param|: 5497.08, |GParam|: 23.22, Training: 129/62/67 total/source/target tokens/sec Epoch: 7, Batch: 11100/11961, Batch size: 16, LR: 0.0500, PPL: 7723.02, |Param|: 5497.04, |GParam|: 65.98, Training: 129/62/67 total/source/target tokens/sec Epoch: 7, Batch: 11150/11961, Batch size: 16, LR: 0.0500, PPL: 7715.36, |Param|: 5497.00, |GParam|: 70.12, Training: 130/62/67 total/source/target tokens/sec Epoch: 7, Batch: 11200/11961, Batch size: 16, LR: 0.0500, PPL: 7710.37, |Param|: 5496.96, |GParam|: 65.62, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11250/11961, Batch size: 16, LR: 0.0500, PPL: 7706.87, |Param|: 5496.92, |GParam|: 48.79, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11300/11961, Batch size: 16, LR: 0.0500, PPL: 7704.17, |Param|: 5496.88, |GParam|: 45.93, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11350/11961, Batch size: 16, LR: 0.0500, PPL: 7698.86, |Param|: 5496.84, |GParam|: 71.12, Training: 130/63/67 total/source/target tokens/sec Epoch: 
7, Batch: 11400/11961, Batch size: 16, LR: 0.0500, PPL: 7690.51, |Param|: 5496.80, |GParam|: 20.25, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11450/11961, Batch size: 16, LR: 0.0500, PPL: 7686.93, |Param|: 5496.76, |GParam|: 57.43, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11500/11961, Batch size: 16, LR: 0.0500, PPL: 7685.65, |Param|: 5496.72, |GParam|: 56.96, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11550/11961, Batch size: 16, LR: 0.0500, PPL: 7679.74, |Param|: 5496.68, |GParam|: 77.48, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11600/11961, Batch size: 16, LR: 0.0500, PPL: 7676.59, |Param|: 5496.64, |GParam|: 121.79, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11650/11961, Batch size: 16, LR: 0.0500, PPL: 7668.97, |Param|: 5496.60, |GParam|: 79.64, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11700/11961, Batch size: 16, LR: 0.0500, PPL: 7660.23, |Param|: 5496.56, |GParam|: 49.59, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11750/11961, Batch size: 16, LR: 0.0500, PPL: 7654.72, |Param|: 5496.52, |GParam|: 44.82, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11800/11961, Batch size: 16, LR: 0.0500, PPL: 7650.58, |Param|: 5496.48, |GParam|: 42.79, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11850/11961, Batch size: 16, LR: 0.0500, PPL: 7647.61, |Param|: 5496.43, |GParam|: 28.80, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11900/11961, Batch size: 16, LR: 0.0500, PPL: 7642.25, |Param|: 5496.40, |GParam|: 66.48, Training: 130/63/67 total/source/target tokens/sec Epoch: 7, Batch: 11950/11961, Batch size: 16, LR: 0.0500, PPL: 7634.96, |Param|: 5496.36, |GParam|: 42.32, Training: 130/63/67 total/source/target tokens/sec Train 7633.9813832184 Valid 4228.3232003008 saving checkpoint to demo-model_epoch7.00_4228.32.t7 Epoch: 
8, Batch: 50/11961, Batch size: 16, LR: 0.0500, PPL: 5025.57, |Param|: 5496.32, |GParam|: 40.51, Training: 134/65/68 total/source/target tokens/sec Epoch: 8, Batch: 100/11961, Batch size: 16, LR: 0.0500, PPL: 5496.95, |Param|: 5496.29, |GParam|: 48.92, Training: 133/64/69 total/source/target tokens/sec Epoch: 8, Batch: 150/11961, Batch size: 16, LR: 0.0500, PPL: 5407.94, |Param|: 5496.26, |GParam|: 45.23, Training: 132/63/69 total/source/target tokens/sec Epoch: 8, Batch: 200/11961, Batch size: 16, LR: 0.0500, PPL: 5544.15, |Param|: 5496.22, |GParam|: 126.52, Training: 133/64/69 total/source/target tokens/sec Epoch: 8, Batch: 250/11961, Batch size: 16, LR: 0.0500, PPL: 5699.50, |Param|: 5496.19, |GParam|: 18.24, Training: 133/64/69 total/source/target tokens/sec Epoch: 8, Batch: 300/11961, Batch size: 16, LR: 0.0500, PPL: 5630.68, |Param|: 5496.16, |GParam|: 63.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 350/11961, Batch size: 16, LR: 0.0500, PPL: 5707.65, |Param|: 5496.12, |GParam|: 65.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 400/11961, Batch size: 16, LR: 0.0500, PPL: 5678.54, |Param|: 5496.10, |GParam|: 33.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 450/11961, Batch size: 16, LR: 0.0500, PPL: 5622.35, |Param|: 5496.07, |GParam|: 59.76, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 500/11961, Batch size: 16, LR: 0.0500, PPL: 5687.32, |Param|: 5496.03, |GParam|: 53.01, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 550/11961, Batch size: 16, LR: 0.0500, PPL: 5639.48, |Param|: 5496.00, |GParam|: 66.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 600/11961, Batch size: 16, LR: 0.0500, PPL: 5650.14, |Param|: 5495.96, |GParam|: 80.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 650/11961, Batch size: 16, LR: 0.0500, PPL: 5644.79, |Param|: 5495.93, |GParam|: 62.00, Training: 133/64/68 
total/source/target tokens/sec Epoch: 8, Batch: 700/11961, Batch size: 16, LR: 0.0500, PPL: 5624.64, |Param|: 5495.90, |GParam|: 45.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 750/11961, Batch size: 16, LR: 0.0500, PPL: 5681.62, |Param|: 5495.86, |GParam|: 101.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 800/11961, Batch size: 16, LR: 0.0500, PPL: 5705.92, |Param|: 5495.82, |GParam|: 89.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 850/11961, Batch size: 16, LR: 0.0500, PPL: 5708.39, |Param|: 5495.79, |GParam|: 42.78, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 900/11961, Batch size: 16, LR: 0.0500, PPL: 5726.95, |Param|: 5495.75, |GParam|: 61.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 950/11961, Batch size: 16, LR: 0.0500, PPL: 5742.41, |Param|: 5495.72, |GParam|: 54.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1000/11961, Batch size: 16, LR: 0.0500, PPL: 5756.49, |Param|: 5495.68, |GParam|: 34.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1050/11961, Batch size: 16, LR: 0.0500, PPL: 5753.33, |Param|: 5495.65, |GParam|: 45.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1100/11961, Batch size: 16, LR: 0.0500, PPL: 5757.03, |Param|: 5495.61, |GParam|: 90.32, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1150/11961, Batch size: 16, LR: 0.0500, PPL: 5752.16, |Param|: 5495.58, |GParam|: 52.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1200/11961, Batch size: 16, LR: 0.0500, PPL: 5761.62, |Param|: 5495.54, |GParam|: 53.63, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1250/11961, Batch size: 16, LR: 0.0500, PPL: 5770.85, |Param|: 5495.50, |GParam|: 66.95, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1300/11961, Batch size: 16, LR: 0.0500, PPL: 5749.41, |Param|: 5495.47, 
|GParam|: 72.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1350/11961, Batch size: 16, LR: 0.0500, PPL: 5737.38, |Param|: 5495.44, |GParam|: 55.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1400/11961, Batch size: 16, LR: 0.0500, PPL: 5729.70, |Param|: 5495.40, |GParam|: 39.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1450/11961, Batch size: 16, LR: 0.0500, PPL: 5744.78, |Param|: 5495.37, |GParam|: 129.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1500/11961, Batch size: 16, LR: 0.0500, PPL: 5739.49, |Param|: 5495.34, |GParam|: 50.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1550/11961, Batch size: 16, LR: 0.0500, PPL: 5721.42, |Param|: 5495.30, |GParam|: 65.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1600/11961, Batch size: 16, LR: 0.0500, PPL: 5733.36, |Param|: 5495.27, |GParam|: 49.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1650/11961, Batch size: 16, LR: 0.0500, PPL: 5731.93, |Param|: 5495.23, |GParam|: 20.83, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1700/11961, Batch size: 16, LR: 0.0500, PPL: 5740.21, |Param|: 5495.19, |GParam|: 60.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1750/11961, Batch size: 16, LR: 0.0500, PPL: 5717.77, |Param|: 5495.17, |GParam|: 52.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1800/11961, Batch size: 16, LR: 0.0500, PPL: 5712.79, |Param|: 5495.13, |GParam|: 47.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1850/11961, Batch size: 16, LR: 0.0500, PPL: 5727.60, |Param|: 5495.09, |GParam|: 46.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1900/11961, Batch size: 16, LR: 0.0500, PPL: 5722.16, |Param|: 5495.06, |GParam|: 44.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 1950/11961, Batch size: 16, 
LR: 0.0500, PPL: 5717.39, |Param|: 5495.02, |GParam|: 65.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2000/11961, Batch size: 16, LR: 0.0500, PPL: 5734.45, |Param|: 5494.98, |GParam|: 63.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2050/11961, Batch size: 16, LR: 0.0500, PPL: 5722.80, |Param|: 5494.95, |GParam|: 60.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2100/11961, Batch size: 16, LR: 0.0500, PPL: 5726.01, |Param|: 5494.91, |GParam|: 50.89, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2150/11961, Batch size: 16, LR: 0.0500, PPL: 5732.82, |Param|: 5494.87, |GParam|: 68.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2200/11961, Batch size: 16, LR: 0.0500, PPL: 5741.93, |Param|: 5494.84, |GParam|: 46.62, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2250/11961, Batch size: 16, LR: 0.0500, PPL: 5731.35, |Param|: 5494.80, |GParam|: 45.60, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2300/11961, Batch size: 16, LR: 0.0500, PPL: 5732.36, |Param|: 5494.77, |GParam|: 63.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2350/11961, Batch size: 16, LR: 0.0500, PPL: 5733.91, |Param|: 5494.73, |GParam|: 61.69, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2400/11961, Batch size: 16, LR: 0.0500, PPL: 5734.69, |Param|: 5494.69, |GParam|: 81.15, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2450/11961, Batch size: 16, LR: 0.0500, PPL: 5736.63, |Param|: 5494.65, |GParam|: 65.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2500/11961, Batch size: 16, LR: 0.0500, PPL: 5736.50, |Param|: 5494.62, |GParam|: 31.27, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2550/11961, Batch size: 16, LR: 0.0500, PPL: 5736.93, |Param|: 5494.58, |GParam|: 43.20, Training: 133/64/68 total/source/target tokens/sec 
Epoch: 8, Batch: 2600/11961, Batch size: 16, LR: 0.0500, PPL: 5731.35, |Param|: 5494.54, |GParam|: 37.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2650/11961, Batch size: 16, LR: 0.0500, PPL: 5746.94, |Param|: 5494.51, |GParam|: 61.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2700/11961, Batch size: 16, LR: 0.0500, PPL: 5758.76, |Param|: 5494.46, |GParam|: 47.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2750/11961, Batch size: 16, LR: 0.0500, PPL: 5750.24, |Param|: 5494.43, |GParam|: 40.70, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2800/11961, Batch size: 16, LR: 0.0500, PPL: 5755.92, |Param|: 5494.39, |GParam|: 89.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2850/11961, Batch size: 16, LR: 0.0500, PPL: 5773.44, |Param|: 5494.35, |GParam|: 69.22, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2900/11961, Batch size: 16, LR: 0.0500, PPL: 5768.13, |Param|: 5494.31, |GParam|: 40.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 2950/11961, Batch size: 16, LR: 0.0500, PPL: 5757.27, |Param|: 5494.28, |GParam|: 52.43, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3000/11961, Batch size: 16, LR: 0.0500, PPL: 5757.40, |Param|: 5494.24, |GParam|: 79.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3050/11961, Batch size: 16, LR: 0.0500, PPL: 5755.36, |Param|: 5494.20, |GParam|: 61.18, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3100/11961, Batch size: 16, LR: 0.0500, PPL: 5746.36, |Param|: 5494.17, |GParam|: 57.55, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3150/11961, Batch size: 16, LR: 0.0500, PPL: 5741.95, |Param|: 5494.14, |GParam|: 30.34, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3200/11961, Batch size: 16, LR: 0.0500, PPL: 5734.78, |Param|: 5494.10, |GParam|: 56.03, Training: 
133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3250/11961, Batch size: 16, LR: 0.0500, PPL: 5732.33, |Param|: 5494.07, |GParam|: 55.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3300/11961, Batch size: 16, LR: 0.0500, PPL: 5739.37, |Param|: 5494.03, |GParam|: 80.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3350/11961, Batch size: 16, LR: 0.0500, PPL: 5743.80, |Param|: 5493.99, |GParam|: 25.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3400/11961, Batch size: 16, LR: 0.0500, PPL: 5734.13, |Param|: 5493.95, |GParam|: 85.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3450/11961, Batch size: 16, LR: 0.0500, PPL: 5734.03, |Param|: 5493.92, |GParam|: 67.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3500/11961, Batch size: 16, LR: 0.0500, PPL: 5731.94, |Param|: 5493.88, |GParam|: 76.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3550/11961, Batch size: 16, LR: 0.0500, PPL: 5722.72, |Param|: 5493.85, |GParam|: 47.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3600/11961, Batch size: 16, LR: 0.0500, PPL: 5730.33, |Param|: 5493.81, |GParam|: 44.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3650/11961, Batch size: 16, LR: 0.0500, PPL: 5719.16, |Param|: 5493.78, |GParam|: 64.04, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3700/11961, Batch size: 16, LR: 0.0500, PPL: 5716.33, |Param|: 5493.74, |GParam|: 31.32, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3750/11961, Batch size: 16, LR: 0.0500, PPL: 5721.02, |Param|: 5493.70, |GParam|: 26.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3800/11961, Batch size: 16, LR: 0.0500, PPL: 5717.86, |Param|: 5493.66, |GParam|: 20.33, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3850/11961, Batch size: 16, LR: 0.0500, PPL: 5723.14, 
|Param|: 5493.62, |GParam|: 42.46, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3900/11961, Batch size: 16, LR: 0.0500, PPL: 5717.85, |Param|: 5493.59, |GParam|: 37.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 3950/11961, Batch size: 16, LR: 0.0500, PPL: 5717.25, |Param|: 5493.55, |GParam|: 58.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4000/11961, Batch size: 16, LR: 0.0500, PPL: 5714.43, |Param|: 5493.52, |GParam|: 63.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4050/11961, Batch size: 16, LR: 0.0500, PPL: 5716.98, |Param|: 5493.48, |GParam|: 46.08, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4100/11961, Batch size: 16, LR: 0.0500, PPL: 5715.78, |Param|: 5493.44, |GParam|: 54.35, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4150/11961, Batch size: 16, LR: 0.0500, PPL: 5713.86, |Param|: 5493.41, |GParam|: 51.95, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4200/11961, Batch size: 16, LR: 0.0500, PPL: 5713.68, |Param|: 5493.37, |GParam|: 49.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4250/11961, Batch size: 16, LR: 0.0500, PPL: 5716.13, |Param|: 5493.33, |GParam|: 40.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4300/11961, Batch size: 16, LR: 0.0500, PPL: 5709.59, |Param|: 5493.30, |GParam|: 68.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4350/11961, Batch size: 16, LR: 0.0500, PPL: 5709.33, |Param|: 5493.26, |GParam|: 60.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4400/11961, Batch size: 16, LR: 0.0500, PPL: 5707.47, |Param|: 5493.22, |GParam|: 70.45, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4450/11961, Batch size: 16, LR: 0.0500, PPL: 5704.97, |Param|: 5493.19, |GParam|: 91.90, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4500/11961, 
Batch size: 16, LR: 0.0500, PPL: 5710.39, |Param|: 5493.14, |GParam|: 64.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4550/11961, Batch size: 16, LR: 0.0500, PPL: 5707.57, |Param|: 5493.11, |GParam|: 45.41, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4600/11961, Batch size: 16, LR: 0.0500, PPL: 5706.90, |Param|: 5493.07, |GParam|: 73.11, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4650/11961, Batch size: 16, LR: 0.0500, PPL: 5705.37, |Param|: 5493.04, |GParam|: 110.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4700/11961, Batch size: 16, LR: 0.0500, PPL: 5701.57, |Param|: 5493.00, |GParam|: 41.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4750/11961, Batch size: 16, LR: 0.0500, PPL: 5709.09, |Param|: 5492.96, |GParam|: 30.43, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4800/11961, Batch size: 16, LR: 0.0500, PPL: 5708.44, |Param|: 5492.92, |GParam|: 24.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4850/11961, Batch size: 16, LR: 0.0500, PPL: 5704.96, |Param|: 5492.89, |GParam|: 32.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4900/11961, Batch size: 16, LR: 0.0500, PPL: 5705.62, |Param|: 5492.85, |GParam|: 33.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 4950/11961, Batch size: 16, LR: 0.0500, PPL: 5702.95, |Param|: 5492.82, |GParam|: 45.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5000/11961, Batch size: 16, LR: 0.0500, PPL: 5702.43, |Param|: 5492.78, |GParam|: 68.89, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5050/11961, Batch size: 16, LR: 0.0500, PPL: 5703.40, |Param|: 5492.74, |GParam|: 61.86, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5100/11961, Batch size: 16, LR: 0.0500, PPL: 5703.02, |Param|: 5492.71, |GParam|: 85.30, Training: 133/64/68 
total/source/target tokens/sec Epoch: 8, Batch: 5150/11961, Batch size: 16, LR: 0.0500, PPL: 5707.58, |Param|: 5492.67, |GParam|: 79.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5200/11961, Batch size: 16, LR: 0.0500, PPL: 5704.51, |Param|: 5492.63, |GParam|: 27.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5250/11961, Batch size: 5, LR: 0.0500, PPL: 5697.52, |Param|: 5492.60, |GParam|: 103.91, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5300/11961, Batch size: 16, LR: 0.0500, PPL: 5695.63, |Param|: 5492.57, |GParam|: 58.44, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5350/11961, Batch size: 16, LR: 0.0500, PPL: 5691.73, |Param|: 5492.53, |GParam|: 21.31, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5400/11961, Batch size: 16, LR: 0.0500, PPL: 5691.64, |Param|: 5492.49, |GParam|: 110.40, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5450/11961, Batch size: 16, LR: 0.0500, PPL: 5692.69, |Param|: 5492.45, |GParam|: 64.53, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5500/11961, Batch size: 16, LR: 0.0500, PPL: 5693.76, |Param|: 5492.42, |GParam|: 55.05, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5550/11961, Batch size: 16, LR: 0.0500, PPL: 5694.14, |Param|: 5492.38, |GParam|: 63.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5600/11961, Batch size: 16, LR: 0.0500, PPL: 5697.44, |Param|: 5492.34, |GParam|: 78.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5650/11961, Batch size: 16, LR: 0.0500, PPL: 5697.38, |Param|: 5492.30, |GParam|: 49.37, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5700/11961, Batch size: 16, LR: 0.0500, PPL: 5691.54, |Param|: 5492.27, |GParam|: 69.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5750/11961, Batch size: 16, LR: 0.0500, PPL: 5691.14, |Param|: 
5492.23, |GParam|: 54.80, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5800/11961, Batch size: 16, LR: 0.0500, PPL: 5690.85, |Param|: 5492.20, |GParam|: 61.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5850/11961, Batch size: 16, LR: 0.0500, PPL: 5695.70, |Param|: 5492.16, |GParam|: 41.39, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5900/11961, Batch size: 16, LR: 0.0500, PPL: 5692.32, |Param|: 5492.12, |GParam|: 52.67, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 5950/11961, Batch size: 16, LR: 0.0500, PPL: 5693.88, |Param|: 5492.08, |GParam|: 38.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6000/11961, Batch size: 16, LR: 0.0500, PPL: 5690.83, |Param|: 5492.05, |GParam|: 35.28, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6050/11961, Batch size: 16, LR: 0.0500, PPL: 5690.06, |Param|: 5492.01, |GParam|: 63.66, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6100/11961, Batch size: 16, LR: 0.0500, PPL: 5687.24, |Param|: 5491.98, |GParam|: 69.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6150/11961, Batch size: 16, LR: 0.0500, PPL: 5684.78, |Param|: 5491.95, |GParam|: 35.59, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6200/11961, Batch size: 16, LR: 0.0500, PPL: 5682.06, |Param|: 5491.91, |GParam|: 64.42, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6250/11961, Batch size: 16, LR: 0.0500, PPL: 5677.83, |Param|: 5491.88, |GParam|: 64.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6300/11961, Batch size: 16, LR: 0.0500, PPL: 5680.64, |Param|: 5491.83, |GParam|: 58.00, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6350/11961, Batch size: 16, LR: 0.0500, PPL: 5677.81, |Param|: 5491.80, |GParam|: 46.84, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6400/11961, Batch 
size: 16, LR: 0.0500, PPL: 5679.47, |Param|: 5491.76, |GParam|: 49.63, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6450/11961, Batch size: 16, LR: 0.0500, PPL: 5680.28, |Param|: 5491.72, |GParam|: 71.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6500/11961, Batch size: 16, LR: 0.0500, PPL: 5680.18, |Param|: 5491.68, |GParam|: 55.47, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6550/11961, Batch size: 16, LR: 0.0500, PPL: 5679.42, |Param|: 5491.64, |GParam|: 76.72, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6600/11961, Batch size: 16, LR: 0.0500, PPL: 5674.56, |Param|: 5491.61, |GParam|: 48.88, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6650/11961, Batch size: 16, LR: 0.0500, PPL: 5672.94, |Param|: 5491.57, |GParam|: 50.58, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6700/11961, Batch size: 16, LR: 0.0500, PPL: 5671.47, |Param|: 5491.54, |GParam|: 68.87, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6750/11961, Batch size: 16, LR: 0.0500, PPL: 5665.43, |Param|: 5491.50, |GParam|: 72.40, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6800/11961, Batch size: 16, LR: 0.0500, PPL: 5663.48, |Param|: 5491.47, |GParam|: 21.49, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6850/11961, Batch size: 16, LR: 0.0500, PPL: 5658.93, |Param|: 5491.43, |GParam|: 63.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6900/11961, Batch size: 16, LR: 0.0500, PPL: 5656.70, |Param|: 5491.40, |GParam|: 57.77, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 6950/11961, Batch size: 16, LR: 0.0500, PPL: 5655.51, |Param|: 5491.36, |GParam|: 57.54, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7000/11961, Batch size: 16, LR: 0.0500, PPL: 5655.77, |Param|: 5491.32, |GParam|: 30.30, Training: 133/64/68 total/source/target 
tokens/sec Epoch: 8, Batch: 7050/11961, Batch size: 16, LR: 0.0500, PPL: 5656.30, |Param|: 5491.29, |GParam|: 63.97, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7100/11961, Batch size: 16, LR: 0.0500, PPL: 5655.33, |Param|: 5491.25, |GParam|: 72.09, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7150/11961, Batch size: 16, LR: 0.0500, PPL: 5652.81, |Param|: 5491.22, |GParam|: 46.57, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7200/11961, Batch size: 16, LR: 0.0500, PPL: 5651.63, |Param|: 5491.18, |GParam|: 42.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7250/11961, Batch size: 16, LR: 0.0500, PPL: 5651.08, |Param|: 5491.15, |GParam|: 66.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7300/11961, Batch size: 16, LR: 0.0500, PPL: 5648.61, |Param|: 5491.11, |GParam|: 54.65, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7350/11961, Batch size: 16, LR: 0.0500, PPL: 5648.19, |Param|: 5491.08, |GParam|: 46.47, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7400/11961, Batch size: 16, LR: 0.0500, PPL: 5644.45, |Param|: 5491.05, |GParam|: 48.11, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7450/11961, Batch size: 16, LR: 0.0500, PPL: 5643.90, |Param|: 5491.01, |GParam|: 49.74, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7500/11961, Batch size: 16, LR: 0.0500, PPL: 5643.45, |Param|: 5490.97, |GParam|: 58.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7550/11961, Batch size: 16, LR: 0.0500, PPL: 5644.99, |Param|: 5490.93, |GParam|: 61.40, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7600/11961, Batch size: 16, LR: 0.0500, PPL: 5645.07, |Param|: 5490.89, |GParam|: 65.85, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7650/11961, Batch size: 16, LR: 0.0500, PPL: 5643.00, |Param|: 5490.85, |GParam|: 
83.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7700/11961, Batch size: 16, LR: 0.0500, PPL: 5639.62, |Param|: 5490.82, |GParam|: 63.94, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7750/11961, Batch size: 16, LR: 0.0500, PPL: 5641.40, |Param|: 5490.78, |GParam|: 45.02, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7800/11961, Batch size: 16, LR: 0.0500, PPL: 5637.39, |Param|: 5490.75, |GParam|: 26.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7850/11961, Batch size: 16, LR: 0.0500, PPL: 5635.14, |Param|: 5490.71, |GParam|: 85.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7900/11961, Batch size: 16, LR: 0.0500, PPL: 5636.59, |Param|: 5490.67, |GParam|: 44.03, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 7950/11961, Batch size: 16, LR: 0.0500, PPL: 5636.31, |Param|: 5490.64, |GParam|: 53.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8000/11961, Batch size: 16, LR: 0.0500, PPL: 5634.14, |Param|: 5490.60, |GParam|: 50.61, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8050/11961, Batch size: 16, LR: 0.0500, PPL: 5631.27, |Param|: 5490.56, |GParam|: 44.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8100/11961, Batch size: 16, LR: 0.0500, PPL: 5626.93, |Param|: 5490.53, |GParam|: 66.63, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8150/11961, Batch size: 16, LR: 0.0500, PPL: 5619.43, |Param|: 5490.50, |GParam|: 48.82, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8200/11961, Batch size: 16, LR: 0.0500, PPL: 5616.69, |Param|: 5490.47, |GParam|: 58.79, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8250/11961, Batch size: 16, LR: 0.0500, PPL: 5615.03, |Param|: 5490.43, |GParam|: 29.13, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8300/11961, Batch size: 16, LR: 0.0500, 
PPL: 5612.05, |Param|: 5490.40, |GParam|: 122.14, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8350/11961, Batch size: 16, LR: 0.0500, PPL: 5608.68, |Param|: 5490.36, |GParam|: 67.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8400/11961, Batch size: 16, LR: 0.0500, PPL: 5609.19, |Param|: 5490.33, |GParam|: 89.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8450/11961, Batch size: 16, LR: 0.0500, PPL: 5605.94, |Param|: 5490.29, |GParam|: 52.51, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8500/11961, Batch size: 16, LR: 0.0500, PPL: 5601.48, |Param|: 5490.26, |GParam|: 44.12, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8550/11961, Batch size: 16, LR: 0.0500, PPL: 5602.27, |Param|: 5490.22, |GParam|: 69.75, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8600/11961, Batch size: 16, LR: 0.0500, PPL: 5599.67, |Param|: 5490.19, |GParam|: 80.25, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8650/11961, Batch size: 16, LR: 0.0500, PPL: 5601.45, |Param|: 5490.15, |GParam|: 79.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8700/11961, Batch size: 16, LR: 0.0500, PPL: 5600.73, |Param|: 5490.11, |GParam|: 46.52, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8750/11961, Batch size: 16, LR: 0.0500, PPL: 5597.92, |Param|: 5490.08, |GParam|: 75.81, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8800/11961, Batch size: 16, LR: 0.0500, PPL: 5595.73, |Param|: 5490.04, |GParam|: 47.23, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8850/11961, Batch size: 16, LR: 0.0500, PPL: 5598.33, |Param|: 5490.00, |GParam|: 74.50, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 8900/11961, Batch size: 16, LR: 0.0500, PPL: 5592.45, |Param|: 5489.97, |GParam|: 71.20, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, 
Batch: 8950/11961, Batch size: 16, LR: 0.0500, PPL: 5590.44, |Param|: 5489.94, |GParam|: 66.26, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9000/11961, Batch size: 16, LR: 0.0500, PPL: 5588.36, |Param|: 5489.90, |GParam|: 55.30, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9050/11961, Batch size: 16, LR: 0.0500, PPL: 5584.09, |Param|: 5489.87, |GParam|: 70.62, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9100/11961, Batch size: 16, LR: 0.0500, PPL: 5582.35, |Param|: 5489.83, |GParam|: 81.92, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9150/11961, Batch size: 16, LR: 0.0500, PPL: 5580.17, |Param|: 5489.80, |GParam|: 81.73, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9200/11961, Batch size: 16, LR: 0.0500, PPL: 5574.99, |Param|: 5489.77, |GParam|: 40.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9250/11961, Batch size: 16, LR: 0.0500, PPL: 5570.87, |Param|: 5489.73, |GParam|: 75.29, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9300/11961, Batch size: 16, LR: 0.0500, PPL: 5565.73, |Param|: 5489.70, |GParam|: 32.24, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9350/11961, Batch size: 16, LR: 0.0500, PPL: 5564.06, |Param|: 5489.66, |GParam|: 64.38, Training: 133/64/68 total/source/target tokens/sec Epoch: 8, Batch: 9400/11961, Batch size: 16, LR: 0.0500, PPL: 5563.42, |Param|: