
where is the data? #1

Open
baiziyuandyufei opened this issue Jul 18, 2019 · 7 comments

Comments

@baiziyuandyufei

Hello! I ran your project on my computer. The command I ran was
python run.py train
but the program reported
No such file or directory: 'glove_word2id'
I think the code does not include the data file, does it?
Thank you!

@manueltonneau

Before running the run.py file, you have to run the glove_embedding.py file on the raw data. This will produce two files needed for run.py: glove_embeddings.npy and a glove_word2id dictionary.
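For reference, a rough sanity check along these lines shows the expected order: generate the two files first, then start training. This is only a sketch; it assumes glove_word2id is a pickled dict mapping tokens to row indices, which the thread doesn't confirm, and that both files sit in the working directory.

```python
# Sketch: confirm the artifacts produced by glove_embedding.py exist
# before launching `python run.py train`. File names come from the
# comment above; the pickled-dict format of glove_word2id is an assumption.
import os
import pickle

import numpy as np

required = ["glove_embeddings.npy", "glove_word2id"]
missing = [name for name in required if not os.path.exists(name)]
if missing:
    raise FileNotFoundError(
        "Run glove_embedding.py on the raw data first; missing: %s" % missing
    )

embeddings = np.load("glove_embeddings.npy")   # (vocab_size, embedding_dim) matrix
with open("glove_word2id", "rb") as fh:        # assumed to be a pickled dict
    word2id = pickle.load(fh)

print(embeddings.shape, len(word2id))
```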

@changtianluckyforever

> Before running the run.py file, you have to run the glove_embedding.py file on the raw data. This will produce two files needed for run.py: glove_embeddings.npy and a glove_word2id dictionary.

Hello! Excuse me, @manueltonneau, I am also reading this project repository. In the text_preprocessing.ipynb file, it imports en_disaster.csv. Could you please let me know where we can find it? Thanks!

@manueltonneau

Unfortunately, it seems that @sebsk did not share the data csv, or at least, I didn't find it. Looking at the paper, you'll get more information about the data sources:

[screenshot of the paper's description of the data sources]

@changtianluckyforever

> Unfortunately, it seems that @sebsk did not share the data csv, or at least, I didn't find it. Looking at the paper, you'll get more information about the data sources:

Thanks!!!!

@rajae-Bens

Hi,

Did you find the data? Can you share it with us?

Thank you

@sebsk
Owner

sebsk commented Dec 22, 2020

Hey guys, sorry for the super late reply. I did not anticipate that people would ever notice this project (happy to know :)). The tweet text data I used in this project is here: https://drive.google.com/drive/folders/1HwR_VKiAZsHPbifoxXrMMekfdEW59Mqk?usp=sharing

I have split the data into df_train, df_val, and df_test, but it does not have to be split this way. Those three files contain preprocessed data, while "en_disaster" is the raw dataset.
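For anyone who prefers to redo the split from the raw file, a rough sketch could look like the following. The 80/10/10 ratio, the random seed, and the output file names are illustrative assumptions, not the exact split used for the shared df_train/df_val/df_test files.

```python
# Sketch: recreate a train/val/test split from the raw en_disaster.csv.
# Ratio and seed are assumptions, not the original split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("en_disaster.csv")

# Hold out 20% of the rows, then split that half-and-half into val/test.
df_train, df_rest = train_test_split(df, test_size=0.2, random_state=42)
df_val, df_test = train_test_split(df_rest, test_size=0.5, random_state=42)

df_train.to_csv("df_train.csv", index=False)
df_val.to_csv("df_val.csv", index=False)
df_test.to_csv("df_test.csv", index=False)
```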

Sorry again for the late response, and big thanks for the attention!

@baiziyuandyufei
Author

@sebsk Thank you!!!……^_^
