test_on_tap.py results don't match expected results. #13

Open
AssafSinger94 opened this issue Oct 15, 2023 · 4 comments
@AssafSinger94

AssafSinger94 commented Oct 15, 2023

Hello,
When running test_on_tap.py, I get different results from those reported in the testing section.
The mean d_avg over all 30 videos (output is added below) is 72.376, compared to the reported values of d_avg 70.6; survival_16 89.3; median_l2 6.9.
I downloaded the reference model using sh get_reference_model.sh, and I tested on tapvid_davis.pkl, which I downloaded and unzipped from https://storage.googleapis.com/dm-tapnet/tapvid_davis.zip.

I would really appreciate any assistance and clarifications on the matter!
Assaf

@AssafSinger94
Author

Attached below is the output:

model_name 1_128_i16_tap01_132907
loading TAPVID-DAVIS dataset...
found 30 videos in ./datasets/tapvid_davis
+--------------------------------------------------------+------------+
|                        Modules                         | Parameters |
+--------------------------------------------------------+------------+
|           module.fnet.layer3.0.conv1.weight            |   110592   |
|           module.fnet.layer3.0.conv2.weight            |   147456   |
|           module.fnet.layer3.1.conv1.weight            |   147456   |
|           module.fnet.layer3.1.conv2.weight            |   147456   |
|           module.fnet.layer4.0.conv1.weight            |   147456   |
|           module.fnet.layer4.0.conv2.weight            |   147456   |
|           module.fnet.layer4.1.conv1.weight            |   147456   |
|           module.fnet.layer4.1.conv2.weight            |   147456   |
|                module.fnet.conv2.weight                |   958464   |
|    module.delta_block.first_block_conv.conv.weight     |   275712   |
| module.delta_block.basicblock_list.2.conv2.conv.weight |   196608   |
| module.delta_block.basicblock_list.3.conv1.conv.weight |   196608   |
| module.delta_block.basicblock_list.3.conv2.conv.weight |   196608   |
| module.delta_block.basicblock_list.4.conv1.conv.weight |   393216   |
| module.delta_block.basicblock_list.4.conv2.conv.weight |   786432   |
| module.delta_block.basicblock_list.5.conv1.conv.weight |   786432   |
| module.delta_block.basicblock_list.5.conv2.conv.weight |   786432   |
| module.delta_block.basicblock_list.6.conv1.conv.weight |  1572864   |
| module.delta_block.basicblock_list.6.conv2.conv.weight |  3145728   |
| module.delta_block.basicblock_list.7.conv1.conv.weight |  3145728   |
| module.delta_block.basicblock_list.7.conv2.conv.weight |  3145728   |
+--------------------------------------------------------+------------+
total params: 17.57 M
reading ckpt from ./reference_model
...found checkpoint ./reference_model/model-000200000.pth
1_128_i16_tap01_132907; step 000001/30; rtime 0.01; itime 1.09; d_x 74.9; sur_x 100.0; med_x 1.8
1_128_i16_tap01_132907; step 000002/30; rtime 0.03; itime 0.91; d_x 71.1; sur_x 85.2; med_x 2.7
1_128_i16_tap01_132907; step 000003/30; rtime 0.04; itime 0.58; d_x 69.1; sur_x 87.8; med_x 2.9
1_128_i16_tap01_132907; step 000004/30; rtime 0.03; itime 0.97; d_x 72.6; sur_x 88.2; med_x 3.4
1_128_i16_tap01_132907; step 000005/30; rtime 0.03; itime 0.68; d_x 75.9; sur_x 90.1; med_x 2.8
1_128_i16_tap01_132907; step 000006/30; rtime 0.02; itime 0.71; d_x 77.7; sur_x 87.4; med_x 2.4
1_128_i16_tap01_132907; step 000007/30; rtime 0.02; itime 0.51; d_x 76.8; sur_x 88.6; med_x 3.7
1_128_i16_tap01_132907; step 000008/30; rtime 0.02; itime 1.16; d_x 75.5; sur_x 88.4; med_x 4.0
1_128_i16_tap01_132907; step 000009/30; rtime 0.04; itime 0.87; d_x 75.4; sur_x 88.4; med_x 4.0
1_128_i16_tap01_132907; step 000010/30; rtime 0.04; itime 1.15; d_x 70.9; sur_x 83.3; med_x 9.3
1_128_i16_tap01_132907; step 000011/30; rtime 0.04; itime 1.05; d_x 71.6; sur_x 84.4; med_x 8.7
1_128_i16_tap01_132907; step 000012/30; rtime 0.04; itime 0.93; d_x 71.3; sur_x 85.4; med_x 8.2
1_128_i16_tap01_132907; step 000013/30; rtime 0.03; itime 1.15; d_x 72.8; sur_x 86.5; med_x 7.6
1_128_i16_tap01_132907; step 000014/30; rtime 0.04; itime 0.97; d_x 72.4; sur_x 86.9; med_x 7.3
1_128_i16_tap01_132907; step 000015/30; rtime 0.03; itime 0.95; d_x 71.5; sur_x 86.8; med_x 8.1
1_128_i16_tap01_132907; step 000016/30; rtime 0.03; itime 1.00; d_x 71.7; sur_x 87.6; med_x 7.7
1_128_i16_tap01_132907; step 000017/30; rtime 0.04; itime 0.69; d_x 73.4; sur_x 88.3; med_x 7.3
1_128_i16_tap01_132907; step 000018/30; rtime 0.03; itime 0.75; d_x 73.1; sur_x 89.0; med_x 7.0
1_128_i16_tap01_132907; step 000019/30; rtime 0.03; itime 0.84; d_x 72.2; sur_x 89.0; med_x 6.9
1_128_i16_tap01_132907; step 000020/30; rtime 0.03; itime 0.60; d_x 71.6; sur_x 88.7; med_x 6.8
1_128_i16_tap01_132907; step 000021/30; rtime 0.03; itime 0.69; d_x 71.1; sur_x 88.8; med_x 6.6
1_128_i16_tap01_132907; step 000022/30; rtime 0.02; itime 0.73; d_x 72.0; sur_x 89.3; med_x 6.4
1_128_i16_tap01_132907; step 000023/30; rtime 0.03; itime 0.92; d_x 71.6; sur_x 89.3; med_x 6.2
1_128_i16_tap01_132907; step 000024/30; rtime 0.04; itime 1.07; d_x 71.1; sur_x 88.5; med_x 6.6
1_128_i16_tap01_132907; step 000025/30; rtime 0.04; itime 0.60; d_x 71.5; sur_x 88.6; med_x 6.8
1_128_i16_tap01_132907; step 000026/30; rtime 0.02; itime 0.65; d_x 70.8; sur_x 88.3; med_x 7.4
1_128_i16_tap01_132907; step 000027/30; rtime 0.02; itime 0.61; d_x 70.4; sur_x 88.5; med_x 7.4
1_128_i16_tap01_132907; step 000028/30; rtime 0.03; itime 0.95; d_x 70.4; sur_x 88.7; med_x 7.3
1_128_i16_tap01_132907; step 000029/30; rtime 0.05; itime 0.64; d_x 70.4; sur_x 89.0; med_x 7.1
1_128_i16_tap01_132907; step 000030/30; rtime 0.02; itime 0.60; d_x 70.5; sur_x 89.2; med_x 7.0

@AssafSinger94
Author

In addition, I wanted to ask about the following concerns regarding testing on TAP-Vid DAVIS:

  1. I noticed that during data loading in datasets.tapviddataset_fullseq.TapVidDavis, the "raw videos" are loaded from the pickle file. These are in 480x854 resolution, not the 256x256 resolution described in the paper.
  2. I see that right before model inference, the video and query points are resized to image_size (see test_on_tap.test_on_fullseq), which is set in test_on_tap.main to (512,896). Could you please elaborate on this resizing? When going over the paper I couldn't find any mention of it.

@aharley
Owner

aharley commented Oct 15, 2023

Thanks for the messages.

For d_avg: How did you compute 72.376? It looks like it's showing 70.5 in the snippet you posted. The d_x shown in each row is the running average, so 70.5 is the average across the 30 videos.
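To make the running-average point concrete, the sketch below takes the d_x column from the 30 log rows posted above and compares two numbers: the mean of that column and its final entry. The helper names are illustrative and not part of the repository. Averaging the logged column gives about 72.38, which is consistent with the 72.376 figure in the question, presumably because the running averages were themselves averaged.

```python
# d_x values copied from the 30 log rows above. Each entry is a running
# average over the videos seen so far, not a per-video score.
logged_d_x = [
    74.9, 71.1, 69.1, 72.6, 75.9, 77.7, 76.8, 75.5, 75.4, 70.9,
    71.6, 71.3, 72.8, 72.4, 71.5, 71.7, 73.4, 73.1, 72.2, 71.6,
    71.1, 72.0, 71.6, 71.1, 71.5, 70.8, 70.4, 70.4, 70.4, 70.5,
]


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def per_video_from_running(running):
    """Invert a running mean: x_n = n * r_n - (n - 1) * r_(n-1).

    Hypothetical helper for illustration only.
    """
    out = []
    prev = 0.0
    for n, r in enumerate(running, start=1):
        out.append(n * r - (n - 1) * prev)
        prev = r
    return out


# Averaging the running averages over-weights the early videos.
mean_of_running = mean(logged_d_x)

# The last running average already covers all 30 videos.
final_running = logged_d_x[-1]

print(f"mean of logged d_x column: {mean_of_running:.3f}")
print(f"final running average:     {final_running:.1f}")
```

Per-video scores recovered with per_video_from_running from the log are only approximate, because the logged values are rounded to one decimal and the rounding error is amplified by the row index.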

For resolution: I think some papers use 256x256 at test time, but we find that higher-resolution input helps performance, if you can afford it. The stats are still computed at 256x256 though.
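As a rough illustration of the resolution handling described here, the sketch below rescales point coordinates between the three resolutions mentioned in this thread: 480x854 raw frames, the (512,896) model input, and the 256x256 grid used for the stats. This is not the repository's code. The function name, the (height, width) reading of (512,896), and the simple per-axis scaling convention are all assumptions made for illustration.

```python
def rescale_points(points_xy, from_hw, to_hw):
    """Scale (x, y) pixel coordinates from one resolution to another.

    Hypothetical helper: sizes are (height, width), and each axis is
    scaled independently, so the aspect ratio is allowed to change.
    """
    from_h, from_w = from_hw
    to_h, to_w = to_hw
    sx = to_w / from_w
    sy = to_h / from_h
    return [(x * sx, y * sy) for x, y in points_xy]


raw_hw = (480, 854)     # raw DAVIS frames stored in the pickle
model_hw = (512, 896)   # assumed (H, W) reading of image_size
eval_hw = (256, 256)    # resolution at which the stats are computed

# Example query points in raw-frame pixels: frame center and top-left corner.
queries_raw = [(427.0, 240.0), (0.0, 0.0)]

# Queries are moved to the model's input resolution before inference...
queries_model = rescale_points(queries_raw, raw_hw, model_hw)

# ...and predictions are mapped to 256x256 before computing metrics.
points_eval = rescale_points(queries_model, model_hw, eval_hw)

print(queries_model)
print(points_eval)
```

Under this convention, running the model at a higher resolution does not change the scale of the reported pixel errors, since distances are measured only after mapping to 256x256.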

@chandlj

chandlj commented Nov 17, 2023

@aharley I'm getting similar results (~70 d_x). The PointOdyssey paper reported ~63 on this metric. I'm also getting ~7 on the MTE metric, while the paper reported ~4. My results for survival are in line with the paper. I was curious whether you had changed or otherwise improved the reference model since the paper, or whether there is a bug somewhere?
