Wrong Sampling Frame Rate and Questions about Data #1
Hi! @MeiliMa
Dataset files we used are the original ones; see the following code references:
Vertical/scripts/add_ethucy_datasets.py, lines 20 to 24 (commit f6e6736)
Vertical/scripts/add_ethucy_datasets.py, lines 34 to 38 (commit f6e6736)
Vertical/scripts/add_ethucy_datasets.py, lines 48 to 52 (commit f6e6736)
SDD dataset files we used are their original annotations, e.g.:
0 1354 1121 1406 1184 4000 1 0 0 "Biker"
0 1354 1121 1406 1184 4001 1 0 1 "Biker"
0 1354 1121 1406 1184 4002 1 0 1 "Biker"
0 1354 1121 1406 1184 4003 1 0 1 "Biker"
0 1354 1121 1406 1184 4004 1 0 1 "Biker"
0 1354 1121 1406 1184 4005 1 0 1 "Biker"
0 1354 1121 1406 1184 4006 1 0 1 "Biker"
These annotations come with bounding boxes, and we process these files and divide trajectories with Vertical/scripts/sdd_txt2csv.py, lines 41 to 46 (commit f6e6736).
Here scale = 100 is a scaling parameter that brings the data to a scale similar to ETH-UCY; the results reported in our paper have been corrected (*100) using this parameter (see lines 28 to 41 in f6e6736). The videos are annotated at 30 fps, so frame_step = 0.4 / (1/30) = 12 frames, and the sampling interval is therefore also 0.4 seconds.
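For concreteness, here is a minimal sketch of this processing step, assuming the standard SDD column order (track_id, xmin, ymin, xmax, ymax, frame, lost, occluded, generated, label) and that coordinates are simply divided by scale; it is not the repository's actual sdd_txt2csv.py:

```python
# Minimal sketch (not the repository's sdd_txt2csv.py) of turning one SDD
# bounding-box annotation line into a center-point sample on an
# ETH-UCY-like scale. Assumed column order (standard SDD format):
# track_id xmin ymin xmax ymax frame lost occluded generated "label"
def sdd_line_to_sample(line, scale=100.0):
    parts = line.split()
    track_id = int(parts[0])
    xmin, ymin, xmax, ymax = map(float, parts[1:5])
    frame = int(parts[5])
    label = parts[9].strip('"')
    # bounding box -> center point, divided by `scale` so coordinates
    # land in a range similar to the ETH-UCY world coordinates
    x = (xmin + xmax) / 2.0 / scale
    y = (ymin + ymax) / 2.0 / scale
    return frame, track_id, x, y, label

# SDD videos are annotated at 30 fps, so a 0.4 s sampling interval
# corresponds to a step of 0.4 / (1/30) = 12 labeled frames.
frame_step = round(0.4 / (1.0 / 30.0))  # -> 12
print(sdd_line_to_sample('0 1354 1121 1406 1184 4000 1 0 0 "Biker"'))
```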
I hope this helps to resolve your question, i.e., why our data seem different from the Trajectron++ split (which appears to have been used first by Social GAN), even though both come from the same original file.
I see that some previous works (with no Trajectron++ sources) have also actually used a 6-frame step on the eth sub-dataset. Furthermore, I have checked the ETH-UCY data used by Y-Net and Trajectron++ that you mentioned above. It appears that Y-Net uses pixel-based ETH-UCY data, and its total amount (884 distinct lines in eth) is smaller than the original data used in both Trajectron++ (5492 lines in eth) and our method (8909 lines in eth). We will try to use the new data obtained after the Trajectron++ interpolation on this dataset in the future. Please also point out if you have better insights.
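For reference, a line-count comparison like the one above can be reproduced with a short script; the file names below are placeholders for the respective eth annotation files, not real paths:

```python
# Rough sketch of the line-count comparison described above.
# Paths are placeholders for the eth annotation files used by each method.
def count_unique_annotation_lines(path):
    with open(path) as f:
        return len({line.strip() for line in f if line.strip()})

for name, path in [("ours", "eth_ours.txt"),
                   ("trajectron++", "eth_trajectronpp.txt"),
                   ("ynet (pixel)", "eth_ynet.txt")]:
    print(name, count_unique_annotation_lines(path))
```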
@MeiliMa
I compared the annotation lines of the ETH-UCY dataset used in the different approaches, where one 2D coordinate of an agent at a given moment counts as one annotation line.
It shows that the data used to train our model are almost the same as Trajectron++'s (except for the ETH-eth subset). After a rough comparison, we also found that the amount of SDD data used by Y-Net is smaller than the original SDD dataset. Therefore, to ensure a fair comparison, we have produced our data from the raw sources (SDD: the original SDD release; ETH: the BIWI Walking Pedestrians dataset), as there are now so many versions of processed data. The INFO section of the original ETH README states:
The annotation was done at 2.5 fps, that is with a timestep of 0.4 seconds.
NOTES:
This sequence was acquired from the top of the ETH main building, Zurich, by Stefano Pellegrini and Andreas Ess in 2009. Therefore, the annotation step is 0.4 s, just as in our dataset files.
A 6-frame sampling interval leads to observation and prediction horizons of 1.92 s and 2.88 s for 8- and 12-frame samples respectively, while a 10-frame interval leads to 3.2 s and 4.8 s. So the time horizon of the data used by Trajectron++ is about 1.7 times as long as that of your data. Could this be a reason why your results on ETH are much better than Trajectron++'s?
Besides the frame-sampling difference, the data used by Trajectron++, SocialGAN, etc. have only two decimal places, while your data have four. I do not know how much error this introduces, given that the gap between your results and Trajectron++'s is quite small on datasets other than ETH.
For SDD, the following are the scenes used by YNet for training and testing, according to the data they shared:
The files you used are, according to sdd.plist:
So you use 36 training scenes and 12 testing scenes, while YNet uses only 30 training scenes but 17 testing scenes. Anyway, this is great work. I hope you can show its advantages clearly by drawing a fairer comparison with the existing approaches.
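As a quick sanity check of the horizon arithmetic above (assuming the 25 fps frame rate of the ETH-UCY videos):

```python
# Sanity check of the horizon arithmetic: ETH-UCY videos run at 25 fps,
# so the wall-clock horizon is n_steps * frame_interval / fps (in seconds).
FPS = 25
for frame_interval in (6, 10):
    obs = 8 * frame_interval / FPS    # observation horizon (8 samples)
    pred = 12 * frame_interval / FPS  # prediction horizon (12 samples)
    print(f"{frame_interval}-frame interval: obs {obs:.2f} s, pred {pred:.2f} s")
# 6-frame interval: obs 1.92 s, pred 2.88 s
# 10-frame interval: obs 3.20 s, pred 4.80 s
```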
Why was this issue closed? I think you need to solve the problem, or at least add a remark about it in your paper, before closing this issue.
Original ETH dataset files:
I do not think your statement makes sense. Subsampling from 25 FPS to 2.5 FPS means that you need to draw samples every 10 frames, so why do you say it is correct to use a 6-frame interval? I know there are different versions of the training/testing splits and of the data processing for these datasets; that is normal, as many researchers work in this field. But it is your duty to draw a fair comparison when publishing the paper. Since Trajectron++ and the other baselines have published their code, you can run it on your data if you think their data usage is wrong, rather than saying nothing in your paper and misleading the readers. I think this issue should be left open to let others know there is a data-usage difference between this work and the other baselines, or you should state it explicitly in the README file and your paper.
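A minimal sketch of the subsampling being argued for, assuming whitespace-separated rows whose first column is the frame number (as in the raw biwi_eth.txt annotations):

```python
# Minimal sketch: keeping one labeled frame out of every 10 downsamples
# 25 fps annotations to 2.5 fps, i.e. one sample every 0.4 s.
# Assumes the first whitespace-separated column of each row is the frame number.
def subsample_to_2p5_fps(lines, interval=10):
    return [line for line in lines
            if int(float(line.split()[0])) % interval == 0]
```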
I apologize for the lack of clarity in my explanation. In simple terms, the original dataset file labels frames at intervals of six, e.g.
But note that this phenomenon only appears in the ETH-eth sub-dataset.
This should be the actual original data annotated at 2.5 FPS: https://github.com/crowdbotp/OpenTraj/blob/master/datasets/ETH/seq_eth/biwi_eth_10fps.txt
The data you used were wrongly annotated; the clarification can be found at
I do not agree with you. The original file can also be found at
And the info can be found at:
The same statement can be found in the README.txt of the data downloaded from the official link. Regardless of which version of the data is correct, yours or the one used by SocialGAN, Trajectron++, AgentFormer, etc., you should mention it in your paper rather than saying nothing and misleading the readers; this should also cover the differences in the other datasets besides ETH. As the data with the 10-frame interval have been widely used, if you do think they are wrong, I really hope you can point this out in your paper and help the community correct it.
Their original video file can be downloaded at https://data.vision.ee.ethz.ch/cvl/aem/ewap_dataset_full.tgz.
@MeiliMa I re-read the Y-Net article carefully and they say
In fact, the SDD dataset contains many other kinds of agents, such as bicycles and cars, which are filtered out of Y-Net's training data; this makes the prediction easier than on the original full dataset (which is what we use), because bicycles, cars, etc. move faster than people and also differ in their interaction relationships. However, most of the baselines before Y-Net considered all kinds of agents at the same time. Why not ask them to use the full dataset, rather than assume that I am using the wrong dataset? We fully respect and understand what the Y-Net authors are doing, as they provide good enough work, but I do not agree with your use of that word. This issue has also been pointed out in the recent CVPR 2022 work "End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps":
where the mentioned [27] and [28] are both works from the Y-Net authors. Besides SDD, a series of classical works on trajectory prediction (like Social LSTM) use the same ETH-UCY dataset files as we do, so we cannot accept that claim. Finally, we will point out the differences between these datasets.
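For illustration, restricting an SDD annotation file to pedestrians only (roughly what a pedestrian-only split does) is a small filter; this is a hypothetical helper, assuming the standard SDD format in which the label is the last, quoted column:

```python
# Hypothetical helper: keep only "Pedestrian" annotations from an SDD file.
# Assumes the label is the final, quoted column of each annotation row.
def keep_pedestrians_only(lines):
    return [line for line in lines if line.rstrip().endswith('"Pedestrian"')]
```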
Your work looks awesome. While trying to reproduce it myself, I found that the sampling frame rate in the code is different from the one reported in the paper.
The sampling is controlled by the following code:
Vertical/codes/dataset/__manager.py, lines 188 to 189 (commit f6e6736)
where frame_step is 1, leading to an interval of 6 frames in ETH/UCY. However, the interval mentioned in the paper, and the one used by the other baselines, is actually 10. Using a 6-frame interval largely reduces the trajectory length and makes the prediction easier.
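To make this concrete, here is a hypothetical illustration (not the actual code at the referenced lines) of how a frame_step of 1 over annotations that are already spaced 6 frames apart yields 0.24 s sampling instead of the 0.4 s used by most baselines:

```python
# Hypothetical illustration only -- NOT the code at __manager.py lines 188-189.
# If the annotation file labels every 6th video frame and frame_step is 1,
# consecutive samples end up 6 raw frames apart (0.24 s at 25 fps)
# instead of 10 frames (0.4 s).
labeled_frames = list(range(780, 1200, 6))  # eth-style labels, assumed spacing
frame_step = 1
obs_len, pred_len = 8, 12

obs = labeled_frames[:obs_len * frame_step:frame_step]
pred = labeled_frames[obs_len * frame_step:(obs_len + pred_len) * frame_step:frame_step]
print((obs[1] - obs[0]) / 25.0)  # 0.24 s between consecutive samples
```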
For ETH/UCY, your data are different from those of some baselines you compare against in the paper. For example, the data used by Trajectron++ can be found at https://github.com/StanfordASL/Trajectron-plus-plus/tree/master/experiments/pedestrians/raw. They use the same data as most baselines, including SocialGAN, AgentFormer, TransformerTF, etc.
For SDD, the data used by YNet can be found at https://github.com/HarshayuGirase/Human-Path-Prediction/tree/master/ynet#pretrained-models-data-and-config-files, which are also different from those you used.
To draw a fair comparison, I think you need to make sure you use the same training/testing data as the baselines listed in your paper.