Why does KDDAug make baseline worse #4

Open
zhongshsh opened this issue Mar 20, 2023 · 4 comments

Comments


zhongshsh commented Mar 20, 2023

Baseline (updn, v2)

I ran the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode updn --debias none --output v2_updn --seed 0

and got the following log:

epoch 0, time: 166.78
	train_loss: 11.61, score: 37.06
	eval score: 49.19 (91.72)
	yn score: 68.32 other score: 38.98 num score: 32.80
epoch 1, time: 157.35
	train_loss: 3.79, score: 51.77
	eval score: 55.12 (91.72)
	yn score: 71.71 other score: 47.50 num score: 36.24
epoch 2, time: 159.59
	train_loss: 3.44, score: 56.62
	eval score: 57.38 (91.72)
	yn score: 73.28 other score: 50.15 num score: 39.04
epoch 3, time: 159.02
	train_loss: 3.24, score: 59.76
	eval score: 59.68 (91.72)
	yn score: 76.77 other score: 51.95 num score: 39.78
epoch 4, time: 158.03
	train_loss: 3.08, score: 62.20
	eval score: 60.85 (91.72)
	yn score: 78.06 other score: 53.19 num score: 40.39
epoch 5, time: 157.90
	train_loss: 2.96, score: 64.18
	eval score: 61.78 (91.72)
	yn score: 79.15 other score: 53.89 num score: 41.72
epoch 6, time: 156.29
	train_loss: 2.86, score: 65.90
	eval score: 62.31 (91.72)
	yn score: 79.52 other score: 54.48 num score: 42.44
epoch 7, time: 152.53
	train_loss: 2.77, score: 67.50
	eval score: 62.90 (91.72)
	yn score: 80.50 other score: 54.93 num score: 42.49
epoch 8, time: 153.10
	train_loss: 2.68, score: 68.95
	eval score: 63.15 (91.72)
	yn score: 80.72 other score: 55.26 num score: 42.46
epoch 9, time: 150.43
	train_loss: 2.60, score: 70.30
	eval score: 63.37 (91.72)
	yn score: 81.08 other score: 55.38 num score: 42.68
epoch 10, time: 153.21
	train_loss: 2.53, score: 71.61
	eval score: 63.51 (91.72)
	yn score: 81.20 other score: 55.63 num score: 42.53
epoch 11, time: 151.57
	train_loss: 2.46, score: 72.83
	eval score: 63.51 (91.72)
	yn score: 81.03 other score: 55.62 num score: 43.02
epoch 12, time: 154.94
	train_loss: 2.40, score: 74.09
	eval score: 63.69 (91.72)
	yn score: 81.50 other score: 55.68 num score: 42.82
epoch 13, time: 157.07
	train_loss: 2.34, score: 75.11
	eval score: 63.64 (91.72)
	yn score: 81.34 other score: 55.64 num score: 43.01
epoch 14, time: 150.56
	train_loss: 2.28, score: 76.12
	eval score: 63.79 (91.72)
	yn score: 81.44 other score: 55.82 num score: 43.20
epoch 15, time: 155.87
	train_loss: 2.23, score: 76.99
	eval score: 63.78 (91.72)
	yn score: 81.41 other score: 55.85 num score: 43.09
epoch 16, time: 156.20
	train_loss: 2.18, score: 77.83
	eval score: 63.72 (91.72)
	yn score: 81.30 other score: 55.77 num score: 43.29
epoch 17, time: 155.62
	train_loss: 2.13, score: 78.56
	eval score: 63.62 (91.72)
	yn score: 81.29 other score: 55.73 num score: 42.67
epoch 18, time: 153.85
	train_loss: 2.09, score: 79.26
	eval score: 63.82 (91.72)
	yn score: 81.51 other score: 55.82 num score: 43.20
epoch 19, time: 149.12
	train_loss: 2.05, score: 79.91
	eval score: 63.67 (91.72)
	yn score: 81.20 other score: 55.75 num score: 43.21
epoch 20, time: 156.56
	train_loss: 2.01, score: 80.50
	eval score: 63.58 (91.72)
	yn score: 81.18 other score: 55.65 num score: 42.98
epoch 21, time: 157.06
	train_loss: 1.97, score: 81.08
	eval score: 63.54 (91.72)
	yn score: 81.24 other score: 55.70 num score: 42.31
epoch 22, time: 159.29
	train_loss: 1.94, score: 81.44
	eval score: 63.61 (91.72)
	yn score: 81.31 other score: 55.71 num score: 42.61
epoch 23, time: 158.90
	train_loss: 1.91, score: 81.95
	eval score: 63.64 (91.72)
	yn score: 81.27 other score: 55.67 num score: 43.11
epoch 24, time: 153.60
	train_loss: 1.88, score: 82.34
	eval score: 63.47 (91.72)
	yn score: 81.12 other score: 55.60 num score: 42.50
epoch 25, time: 162.35
	train_loss: 1.85, score: 82.77
	eval score: 63.50 (91.72)
	yn score: 81.21 other score: 55.55 num score: 42.67
epoch 26, time: 150.99
	train_loss: 1.82, score: 83.18
	eval score: 63.50 (91.72)
	yn score: 81.21 other score: 55.53 num score: 42.75
epoch 27, time: 155.69
	train_loss: 1.79, score: 83.49
	eval score: 63.47 (91.72)
	yn score: 81.40 other score: 55.47 num score: 42.20
epoch 28, time: 149.41
	train_loss: 1.77, score: 83.81
	eval score: 63.42 (91.72)
	yn score: 81.24 other score: 55.51 num score: 42.09
epoch 29, time: 155.37
	train_loss: 1.74, score: 84.16
	eval score: 63.45 (91.72)
	yn score: 81.33 other score: 55.48 num score: 42.24
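
To compare runs without reading the logs line by line, the per-epoch scores can be pulled out with a small parser. This is a minimal sketch and not part of the repository: `parse_log` and `best_epoch` are hypothetical helper names, and the regular expressions assume exactly the log layout pasted above.

```python
import re

# Assumed log layout (copied from the output above): an "epoch N, time: T" line
# followed by indented "train_loss", "eval score" and "yn/other/num" lines.
EPOCH_RE = re.compile(r"epoch (\d+), time: ([\d.]+)")
EVAL_RE = re.compile(r"eval score: ([\d.]+) \(([\d.]+)\)")
TYPE_RE = re.compile(
    r"yn score: ([\d.]+) other score: ([\d.]+) num score: ([\d.]+)"
)


def parse_log(text):
    """Return one dict per epoch with the eval score and per-type scores."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = EPOCH_RE.match(line)
        if m:
            current = {"epoch": int(m.group(1))}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first epoch line
        m = EVAL_RE.match(line)
        if m:
            current["eval"] = float(m.group(1))
            continue
        m = TYPE_RE.match(line)
        if m:
            current["yn"], current["other"], current["num"] = map(float, m.groups())
    return records


def best_epoch(records):
    """Record with the highest eval score (the earliest one wins on ties)."""
    return max((r for r in records if "eval" in r), key=lambda r: r["eval"])


# Three epochs copied from the baseline log above.
sample = """epoch 17, time: 155.62
    train_loss: 2.13, score: 78.56
    eval score: 63.62 (91.72)
    yn score: 81.29 other score: 55.73 num score: 42.67
epoch 18, time: 153.85
    train_loss: 2.09, score: 79.26
    eval score: 63.82 (91.72)
    yn score: 81.51 other score: 55.82 num score: 43.20
epoch 19, time: 149.12
    train_loss: 2.05, score: 79.91
    eval score: 63.67 (91.72)
    yn score: 81.20 other score: 55.75 num score: 43.21
"""

print(best_epoch(parse_log(sample)))
```

In the full baseline log above, the highest eval score is 63.82 at epoch 18, and the last epoch ends at 63.45.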

Finetune

However, when I run the fine-tuning code, the eval score is worse than the baseline.

all_aug_dataset

CUDA_VISIBLE_DEVICES=0 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name all --dataset v2 --output v2_all_finetune --seed 0
epoch 0, time: 621.42
	train_loss: 19.15, score: 78.05
	eval score: 58.51 (91.72)
	yn score: 69.84 other score: 55.08 num score: 38.95
epoch 1, time: 600.32
	train_loss: 18.59, score: 79.09
	eval score: 58.25 (91.72)
	yn score: 69.47 other score: 54.94 num score: 38.57
epoch 2, time: 627.15
	train_loss: 18.43, score: 79.43
	eval score: 58.16 (91.72)
	yn score: 69.13 other score: 54.89 num score: 39.04
epoch 3, time: 596.63
	train_loss: 18.35, score: 79.63
	eval score: 57.81 (91.72)
	yn score: 68.39 other score: 54.79 num score: 38.84
epoch 4, time: 559.34
	train_loss: 18.28, score: 79.77
	eval score: 57.72 (91.72)
	yn score: 68.10 other score: 54.76 num score: 39.12
epoch 5, time: 555.54
	train_loss: 18.24, score: 79.88
	eval score: 57.59 (91.72)
	yn score: 67.94 other score: 54.75 num score: 38.66
epoch 6, time: 538.99
	train_loss: 18.20, score: 79.96
	eval score: 57.63 (91.72)
	yn score: 68.08 other score: 54.66 num score: 38.84
epoch 7, time: 537.62
	train_loss: 18.17, score: 80.04
	eval score: 57.53 (91.72)
	yn score: 67.75 other score: 54.71 num score: 38.85
epoch 8, time: 535.85
	train_loss: 18.15, score: 80.10
	eval score: 57.41 (91.72)
	yn score: 67.61 other score: 54.65 num score: 38.59
epoch 9, time: 572.04
	train_loss: 18.13, score: 80.15
	eval score: 57.28 (91.72)
	yn score: 67.27 other score: 54.58 num score: 38.84

CLIP-based filtering aug_dataset

CUDA_VISIBLE_DEVICES=1 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name total --dataset v2 --output v2_total_finetune --seed 0
epoch 0, time: 94.15
	train_loss: 16.94, score: 81.52
	eval score: 60.17 (91.72)
	yn score: 74.87 other score: 54.53 num score: 39.29
epoch 1, time: 87.94
	train_loss: 15.91, score: 82.73
	eval score: 59.61 (91.72)
	yn score: 73.58 other score: 54.47 num score: 38.95
epoch 2, time: 88.37
	train_loss: 15.70, score: 83.12
	eval score: 59.47 (91.72)
	yn score: 73.31 other score: 54.42 num score: 38.83
epoch 3, time: 90.02
	train_loss: 15.57, score: 83.35
	eval score: 59.00 (91.72)
	yn score: 72.20 other score: 54.31 num score: 38.88
epoch 4, time: 101.34
	train_loss: 15.49, score: 83.51
	eval score: 58.77 (91.72)
	yn score: 71.83 other score: 54.19 num score: 38.62
epoch 5, time: 91.20
	train_loss: 15.42, score: 83.66
	eval score: 58.85 (91.72)
	yn score: 72.30 other score: 54.05 num score: 38.38
epoch 6, time: 93.16
	train_loss: 15.37, score: 83.78
	eval score: 58.41 (91.72)
	yn score: 71.18 other score: 53.97 num score: 38.56
epoch 7, time: 90.26
	train_loss: 15.33, score: 83.86
	eval score: 58.47 (91.72)
	yn score: 71.52 other score: 53.93 num score: 38.15
epoch 8, time: 87.14
	train_loss: 15.29, score: 83.94
	eval score: 58.60 (91.72)
	yn score: 71.94 other score: 53.84 num score: 38.27
epoch 9, time: 99.16
	train_loss: 15.26, score: 84.02
	eval score: 58.16 (91.72)
	yn score: 70.87 other score: 53.75 num score: 38.32
zhongshsh changed the title from "Why does data-augmented finetune make baseline worse" to "Why does KDDAug make baseline worse" on Mar 20, 2023

ItemZheng (Owner) commented

Which teacher did you use? How about the performance of the ID teacher and the OOD teacher on the VQA v2 dataset?

zhongshsh (Author) commented

I trained the teacher by following the README. Specifically, I used this command:

CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 5 --output v2_lmh_css --seed 2048

Then I assigned new answers:

CUDA_VISIBLE_DEVICES=0 python assign_answer.py --dataset v2 --name other --split low --teacher_path logs/v2_lmh_css/model.pth

ItemZheng (Owner) commented

For VQA v2, I would train the teacher with the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 9 --output v2_lmh_css --seed 2048 --epoch 40

You can try again.
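
Compared with the teacher command posted earlier in this thread, the one above changes `--qvp` from 5 to 9 and adds `--epoch 40`. The sketch below shows one way to confirm such differences mechanically. `parse_flags` and `diff_flags` are hypothetical helpers written for this illustration; they only split the command strings and do not call anything in the repository.

```python
import shlex


def parse_flags(command):
    """Map each --flag in a command line to its value (True for bare flags)."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # The next token is the value unless it is itself another --flag.
            # A negative number such as -1 therefore counts as a value.
            if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                flags[tok] = tokens[i + 1]
                i += 2
                continue
            flags[tok] = True
        i += 1
    return flags


def diff_flags(old, new):
    """Return {flag: (old_value, new_value)} for every flag that differs."""
    a, b = parse_flags(old), parse_flags(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Both commands are copied verbatim from this thread.
FIRST = (
    "CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias "
    "--debias learned_mixin --topq 1 --topv -1 --qvp 5 "
    "--output v2_lmh_css --seed 2048"
)
SECOND = (
    "CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias "
    "--debias learned_mixin --topq 1 --topv -1 --qvp 9 "
    "--output v2_lmh_css --seed 2048 --epoch 40"
)

for flag, (old, new) in diff_flags(FIRST, SECOND).items():
    print(f"{flag}: {old} -> {new}")
```

A flag that is absent from one of the commands shows up with `None` on that side, so `--epoch` is reported as newly added rather than changed.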

zhongshsh (Author) commented Mar 24, 2023

I trained the teacher with:

CUDA_VISIBLE_DEVICES=4 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 9 --output v2_lmh_css_issue --seed 2048  --epoch 40

The log is as follows; the scores are higher than before.

epoch 0, time: 240.35
	train_loss: 6.22, score: 17.05
	eval score: 38.57 (91.72)
	yn score: 51.48 other score: 35.36 num score: 13.66
epoch 1, time: 194.00
	train_loss: 3.69, score: 34.63
	eval score: 40.82 (91.72)
	yn score: 40.48 other score: 45.30 num score: 24.97
epoch 2, time: 212.26
	train_loss: 3.35, score: 35.37
	eval score: 35.37 (91.72)
	yn score: 23.41 other score: 48.40 num score: 20.69
epoch 3, time: 206.79
	train_loss: 3.19, score: 39.23
	eval score: 37.90 (91.72)
	yn score: 24.16 other score: 50.52 num score: 29.86
epoch 4, time: 206.55
	train_loss: 3.06, score: 41.93
	eval score: 42.73 (91.72)
	yn score: 35.48 other score: 51.36 num score: 31.08
epoch 5, time: 204.88
	train_loss: 2.95, score: 44.03
	eval score: 44.77 (91.72)
	yn score: 38.17 other score: 52.71 num score: 33.81
epoch 6, time: 207.57
	train_loss: 2.86, score: 46.77
	eval score: 44.46 (91.72)
	yn score: 36.56 other score: 53.07 num score: 34.73
epoch 7, time: 206.11
	train_loss: 2.78, score: 49.78
	eval score: 48.31 (91.72)
	yn score: 46.27 other score: 53.78 num score: 33.59
epoch 8, time: 204.72
	train_loss: 2.73, score: 51.48
	eval score: 48.91 (91.72)
	yn score: 46.82 other score: 53.99 num score: 35.78
epoch 9, time: 209.48
	train_loss: 2.67, score: 52.84
	eval score: 48.97 (91.72)
	yn score: 46.76 other score: 54.21 num score: 35.65
epoch 10, time: 206.95
	train_loss: 2.62, score: 54.73
	eval score: 47.96 (91.72)
	yn score: 44.44 other score: 54.43 num score: 33.68
epoch 11, time: 206.52
	train_loss: 2.56, score: 56.71
	eval score: 48.97 (91.72)
	yn score: 46.31 other score: 54.58 num score: 35.48
epoch 12, time: 206.86
	train_loss: 2.51, score: 58.15
	eval score: 52.85 (91.72)
	yn score: 55.86 other score: 54.87 num score: 36.63
epoch 13, time: 202.63
	train_loss: 2.47, score: 60.24
	eval score: 52.95 (91.72)
	yn score: 55.77 other score: 54.85 num score: 37.70
epoch 14, time: 204.77
	train_loss: 2.42, score: 61.68
	eval score: 53.30 (91.72)
	yn score: 56.29 other score: 54.90 num score: 38.71
epoch 15, time: 207.28
	train_loss: 2.38, score: 63.30
	eval score: 54.19 (91.72)
	yn score: 58.96 other score: 55.12 num score: 37.05
epoch 16, time: 203.41
	train_loss: 2.34, score: 64.82
	eval score: 54.36 (91.72)
	yn score: 59.30 other score: 55.02 num score: 37.79
epoch 17, time: 204.64
	train_loss: 2.31, score: 65.78
	eval score: 56.17 (91.72)
	yn score: 63.46 other score: 55.09 num score: 39.36
epoch 18, time: 209.91
	train_loss: 2.29, score: 66.96
	eval score: 56.15 (91.72)
	yn score: 64.00 other score: 55.18 num score: 37.30
epoch 19, time: 206.39
	train_loss: 2.24, score: 68.34
	eval score: 56.17 (91.72)
	yn score: 63.59 other score: 55.01 num score: 39.25
epoch 20, time: 208.02
	train_loss: 2.23, score: 69.14
	eval score: 55.12 (91.72)
	yn score: 60.96 other score: 55.09 num score: 38.53
epoch 21, time: 209.03
	train_loss: 2.20, score: 70.46
	eval score: 54.36 (91.72)
	yn score: 59.11 other score: 54.77 num score: 39.19
epoch 22, time: 206.20
	train_loss: 2.18, score: 70.92
	eval score: 56.75 (91.72)
	yn score: 64.47 other score: 55.18 num score: 40.55
epoch 23, time: 209.92
	train_loss: 2.15, score: 72.00
	eval score: 56.62 (91.72)
	yn score: 64.29 other score: 55.25 num score: 39.77
epoch 24, time: 208.95
	train_loss: 2.12, score: 72.64
	eval score: 57.59 (91.72)
	yn score: 66.79 other score: 55.21 num score: 40.19
epoch 25, time: 208.71
	train_loss: 2.12, score: 73.41
	eval score: 57.88 (91.72)
	yn score: 67.65 other score: 55.35 num score: 39.39
epoch 26, time: 207.85
	train_loss: 2.08, score: 74.34
	eval score: 57.71 (91.72)
	yn score: 67.62 other score: 55.21 num score: 38.71
epoch 27, time: 209.99
	train_loss: 2.07, score: 74.69
	eval score: 58.24 (91.72)
	yn score: 69.08 other score: 55.11 num score: 38.91
epoch 28, time: 211.65
	train_loss: 2.06, score: 75.41
	eval score: 58.48 (91.72)
	yn score: 69.65 other score: 55.14 num score: 39.01
epoch 29, time: 208.22
	train_loss: 2.04, score: 75.85
	eval score: 58.56 (91.72)
	yn score: 69.85 other score: 55.20 num score: 38.88
epoch 30, time: 206.02
	train_loss: 2.01, score: 76.51
	eval score: 58.84 (91.72)
	yn score: 70.59 other score: 55.00 num score: 39.68
epoch 31, time: 209.33
	train_loss: 2.02, score: 76.54
	eval score: 58.52 (91.72)
	yn score: 69.40 other score: 55.03 num score: 40.53
epoch 32, time: 207.57
	train_loss: 1.99, score: 77.05
	eval score: 57.85 (91.72)
	yn score: 67.83 other score: 55.04 num score: 39.81
epoch 33, time: 211.12
	train_loss: 1.98, score: 77.48
	eval score: 58.95 (91.72)
	yn score: 70.91 other score: 54.99 num score: 39.56
epoch 34, time: 210.15
	train_loss: 1.97, score: 77.84
	eval score: 58.10 (91.72)
	yn score: 68.10 other score: 55.09 num score: 40.79
epoch 35, time: 210.03
	train_loss: 1.94, score: 78.30
	eval score: 58.96 (91.72)
	yn score: 70.89 other score: 54.92 num score: 39.94
epoch 36, time: 209.06
	train_loss: 1.91, score: 78.71
	eval score: 59.26 (91.72)
	yn score: 71.95 other score: 54.96 num score: 39.11
epoch 37, time: 207.74
	train_loss: 1.92, score: 78.87
	eval score: 57.91 (91.72)
	yn score: 67.98 other score: 54.95 num score: 40.17
epoch 38, time: 210.99
	train_loss: 1.89, score: 79.22
	eval score: 58.88 (91.72)
	yn score: 70.60 other score: 54.91 num score: 40.25
epoch 39, time: 210.63
	train_loss: 1.87, score: 79.44
	eval score: 58.61 (91.72)
	yn score: 70.33 other score: 54.78 num score: 39.47

But when I assign the new answers to the augmented data and fine-tune the backbone, the results still get worse.

CUDA_VISIBLE_DEVICES=0 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name all --dataset v2 --output v2_all_finetune_issue --seed 0
epoch 0, time: 532.43
	train_loss: 13.46, score: 83.21
	eval score: 62.51 (91.72)
	yn score: 79.48 other score: 55.25 num score: 41.24
epoch 1, time: 514.16
	train_loss: 12.96, score: 84.34
	eval score: 62.44 (91.72)
	yn score: 79.40 other score: 55.19 num score: 41.11
epoch 2, time: 539.51
	train_loss: 12.80, score: 84.78
	eval score: 62.35 (91.72)
	yn score: 79.24 other score: 55.16 num score: 41.02
epoch 3, time: 544.58
	train_loss: 12.70, score: 85.05
	eval score: 62.33 (91.72)
	yn score: 79.41 other score: 55.03 num score: 40.89
epoch 4, time: 525.27
	train_loss: 12.63, score: 85.26
	eval score: 62.21 (91.72)
	yn score: 79.13 other score: 54.97 num score: 41.00
epoch 5, time: 524.51
	train_loss: 12.58, score: 85.42
	eval score: 62.12 (91.72)
	yn score: 78.88 other score: 54.97 num score: 40.96
epoch 6, time: 522.15
	train_loss: 12.53, score: 85.56
	eval score: 62.08 (91.72)
	yn score: 78.94 other score: 54.95 num score: 40.60
epoch 7, time: 527.31
	train_loss: 12.50, score: 85.67
	eval score: 62.03 (91.72)
	yn score: 78.78 other score: 54.86 num score: 41.00
epoch 8, time: 517.79
	train_loss: 12.46, score: 85.76
	eval score: 61.94 (91.72)
	yn score: 78.66 other score: 54.82 num score: 40.83
epoch 9, time: 515.08
	train_loss: 12.44, score: 85.86
	eval score: 61.95 (91.72)
	yn score: 78.67 other score: 54.76 num score: 41.09
CUDA_VISIBLE_DEVICES=1 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name total --dataset v2 --output v2_total_finetune_issue --seed 0
epoch 0, time: 91.96
	train_loss: 11.77, score: 85.08
	eval score: 62.37 (91.72)
	yn score: 80.15 other score: 54.47 num score: 41.12
epoch 1, time: 101.57
	train_loss: 11.03, score: 86.24
	eval score: 62.26 (91.72)
	yn score: 79.92 other score: 54.40 num score: 41.19
epoch 2, time: 95.16
	train_loss: 10.84, score: 86.65
	eval score: 62.11 (91.72)
	yn score: 79.81 other score: 54.33 num score: 40.66
epoch 3, time: 94.00
	train_loss: 10.73, score: 86.94
	eval score: 62.04 (91.72)
	yn score: 79.75 other score: 54.23 num score: 40.67
epoch 4, time: 99.22
	train_loss: 10.64, score: 87.17
	eval score: 61.98 (91.72)
	yn score: 79.72 other score: 54.16 num score: 40.63
epoch 5, time: 98.48
	train_loss: 10.57, score: 87.32
	eval score: 61.97 (91.72)
	yn score: 79.77 other score: 54.11 num score: 40.54
epoch 6, time: 93.91
	train_loss: 10.52, score: 87.49
	eval score: 61.80 (91.72)
	yn score: 79.57 other score: 53.94 num score: 40.47
epoch 7, time: 95.05
	train_loss: 10.48, score: 87.61
	eval score: 61.81 (91.72)
	yn score: 79.74 other score: 53.86 num score: 40.37
epoch 8, time: 99.80
	train_loss: 10.44, score: 87.73
	eval score: 61.65 (91.72)
	yn score: 79.43 other score: 53.83 num score: 40.14
epoch 9, time: 97.78
	train_loss: 10.41, score: 87.82
	eval score: 61.63 (91.72)
	yn score: 79.41 other score: 53.73 num score: 40.42
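
To put the four fine-tuning runs side by side, the sketch below computes how far each one sits from the baseline. All numbers are copied from the logs in this thread. The reference is the baseline's last-epoch eval score (63.45). The thread does not say which epoch `logs/v2_updn/model.pth` was saved from, so using the last epoch is an assumption, and the run labels and the `gaps` helper are only illustrative.

```python
# Baseline eval score at its last epoch (epoch 29 in the log above).
# Assumption: treated here as the reference; the checkpoint epoch is not stated.
BASELINE_LAST = 63.45

# Eval score at fine-tuning epoch 0 and epoch 9, copied from the logs above.
RUNS = {
    "all, first teacher": (58.51, 57.28),
    "total, first teacher": (60.17, 58.16),
    "all, retrained teacher": (62.51, 61.95),
    "total, retrained teacher": (62.37, 61.63),
}


def gaps(runs, baseline):
    """Return {run: (gap at first epoch, gap at last epoch)} in score points."""
    return {
        name: (round(first - baseline, 2), round(last - baseline, 2))
        for name, (first, last) in runs.items()
    }


for name, (first_gap, last_gap) in gaps(RUNS, BASELINE_LAST).items():
    print(f"{name:26s} epoch 0: {first_gap:+.2f}   epoch 9: {last_gap:+.2f}")
```

With these numbers, the retrained teacher narrows the last-epoch gap from roughly 5 to 6 points to under 2 points, but every fine-tuned run still ends below the baseline and finishes lower than it started.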
