To train on my own dataset #85
One more thing: my dataset is in a non-Latin language. Do I need to make any modifications to train.py or any other files?
Hello,
(2020.8.20 updated) Please read the data filtering part carefully.
Furthermore, by default we make the label lowercase here. So, for your own dataset, you should modify that part. Another way is just to comment out the filtering part (use --data_filtering_off).
If you do not use the character filtering part and use … Hope it helps.
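The filtering described above can be sketched as follows. This is a hypothetical helper that mimics the behaviour the comments describe (lowercasing the label, then rejecting it if it contains any character outside `--character`), not the repository's exact code:

```python
import re

def is_kept(label, character_set, sensitive=False):
    # Drop a sample when its label contains any character outside the
    # --character set; by default the label is lowercased first, which
    # mirrors the default pipeline behaviour described above.
    if not sensitive:
        label = label.lower()
    out_of_char = f'[^{re.escape(character_set)}]'
    return re.search(out_of_char, label) is None

print(is_kept("Hello", "abcdefghijklmnopqrstuvwxyz"))  # True: lowercased first
print(is_kept("Hello", "0123456789"))                  # False: filtered out
```

This also shows why an uppercase-only dataset can end up empty: with case-sensitive labels and a lowercase alphabet, every sample is rejected.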
Thanks for the reply. I'm working with Japanese in particular. It has thousands of characters, so I have to copy all of them to the character list. Am I correct?
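For a large alphabet such as Japanese, you don't have to type the character list by hand. A sketch that collects every unique character from a label file; the `{imagepath}\t{label}` line format is an assumption following create_lmdb_dataset.py's gt.txt convention:

```python
def build_charset(label_file):
    # Collect every unique character appearing in the labels of a gt file
    # with lines like "image_path\tlabel" and return them as one string
    # suitable for the --character argument.
    chars = set()
    with open(label_file, encoding='utf-8') as f:
        for line in f:
            if '\t' not in line:
                continue  # skip malformed lines
            _, label = line.rstrip('\n').split('\t', 1)
            chars.update(label)
    return ''.join(sorted(chars))
```

You could then paste the returned string into `--character`; for Japanese it will typically contain a few thousand characters.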
@boy977
Best
@ku21fan Hello. I tried to use the --PAD option along with --rgb, but I got an error.
@boy977 Yes, you will need to change some code to use the --PAD option along with --rgb.
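One way to make the padding channel-aware, sketched with NumPy. The function name and the (C, H, W) layout are assumptions, not the repository's code; the point is that the channel count is read from the array instead of being hard-coded to 1, which is the kind of change --rgb needs:

```python
import numpy as np

def pad_right(img, target_w):
    # Pad an image array of shape (C, H, W) to width target_w by
    # repeating its last column; C may be 1 (grayscale) or 3 (RGB).
    c, h, w = img.shape
    out = np.zeros((c, h, target_w), dtype=img.dtype)
    out[:, :, :w] = img
    if w < target_w:
        out[:, :, w:] = img[:, :, w - 1:w]  # extend the border column
    return out

gray = pad_right(np.ones((1, 32, 90), dtype=np.float32), 100)
rgb = pad_right(np.ones((3, 32, 90), dtype=np.float32), 100)
print(gray.shape, rgb.shape)  # (1, 32, 100) (3, 32, 100)
```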
@ku21fan One more dumb question: what does "norm_ED" stand for?
@boy977 norm_ED is normalised edit distance; it's another metric used to validate STR models.
@rahzaazhar About the norm_ED value: currently in the source code it is the sum of edit distances over all test cases. (It is not "normalized" yet.)
@dviettu134 Thank you for the comment. Best
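For reference, a minimal sketch of edit distance and one common normalisation (dividing by the longer string's length). The exact definition has changed across versions of the code, so treat the normalisation below as an assumption and check the version you run:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[len(b)]

def norm_ed(pred, gt):
    # One common normalisation: edit distance over the longer length.
    if not pred and not gt:
        return 0.0
    return levenshtein(pred, gt) / max(len(pred), len(gt))

print(levenshtein("kitten", "sitting"))  # 3
```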
Hi ku21fan ['miim_b', 'saad_m', 'raa_e', ''] Please help!
@ooza Hello, in my opinion there are 2 easy ways.
or if you can't do this for some reason,
Hope it helps. Best
@ku21fan Hello,
I ran into this problem too. How do I generate the training set?
I have had a problem with my alphabet: the model predicted only digits. The reason was that the characters were in uppercase.
@ooza @ku21fan Could you please help me with combining two Arabic characters into a new character, like شك = ش + ك?
@rm2886 I don't have enough time to test my solution, but I want to help you, so I would try to change the --character argument from a string to a list. For example, if your alphabet is "abc", you would use ["a", "b", "c"]. This allows adding different combinations of symbols to the alphabet. Perhaps you should also change the data format from {imagepath}\t{label}\n to {imagepath}\t{l a be l} or something like that, to get a pair (imagepath - ["l", "a", "be", "l"]): when the data are prepared, labels are processed by iterating over a string, but you would need to iterate over a list (because you want the model to treat a group of symbols as one symbol).
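A sketch of this list-alphabet idea: greedy longest-match tokenisation, so that a multi-character unit such as an Arabic combination counts as one symbol. The helper name is hypothetical, not code from the repository:

```python
def tokenize(label, alphabet):
    # Split a label into tokens from `alphabet` (a list, not a string),
    # always preferring the longest matching token, so e.g. "شك" can be
    # treated as a single symbol when it is listed in the alphabet.
    tokens = sorted(alphabet, key=len, reverse=True)
    out, i = [], 0
    while i < len(label):
        for t in tokens:
            if label.startswith(t, i):
                out.append(t)
                i += len(t)
                break
        else:
            raise ValueError(f'character {label[i]!r} not in alphabet')
    return out

print(tokenize("labell", ["l", "a", "be"]))  # ['l', 'a', 'be', 'l', 'l']
print(tokenize("شك", ["ش", "ك", "شك"]))      # ['شك']
```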
Something similar to what @2113vm said happened to me. My dataset was only digits + uppercase letters, so, as @ku21fan suggested, I skipped these lines: deep-text-recognition-benchmark/dataset.py, lines 209 to 210 in d38c3cb,
and set --character to "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ". However, if you don't use the option --data_filtering_off, you also have to change deep-text-recognition-benchmark/dataset.py, line 171 in d38c3cb
accordingly.
Otherwise, all letters will be skipped. This happened to me because I didn't use --data_filtering_off. In my case it might have been easier to use it and forget about the filtering part, because I had already filtered my dataset, but I didn't notice. Anyway, I have found @ku21fan's training code pretty comfortable; the way you print and log all the loss, accuracy, and ground truth vs. prediction information during training is really useful and makes the process much easier. Thank you! Hope it helps someone!
@ku21fan
Hi, I am trying to fine-tune the normal case-insensitive model (TPS-ResNet-BiLSTM-Attn) by running the following command. I have also added 4 additional characters to opt.character.
It's still showing the following error. Am I missing something? Thanks for the help. Iknoor
The reason is that your config is not the same as the one used to train TPS-ResNet-BiLSTM-Attn_15000.pth. You should not change the alphabet.
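If you do want to fine-tune with an extended alphabet, one workaround is to load only the pretrained tensors whose shapes still match and let the resized prediction head start from fresh initialisation. A sketch on plain name-to-array dicts; with PyTorch you would pass the contents of state_dict() and then call model.load_state_dict(filtered, strict=False). The layer names below are illustrative:

```python
import numpy as np

def filter_state(pretrained, model_state):
    # Keep only tensors whose name and shape match the new model, so
    # layers resized by a bigger --character set (e.g. the prediction
    # head) are simply skipped instead of raising a size-mismatch error.
    return {k: v for k, v in pretrained.items()
            if k in model_state and v.shape == model_state[k].shape}

pretrained = {'FeatureExtraction.conv.weight': np.zeros((64, 1, 3, 3)),
              'Prediction.generator.weight': np.zeros((38, 256))}
model = {'FeatureExtraction.conv.weight': np.zeros((64, 1, 3, 3)),
         'Prediction.generator.weight': np.zeros((42, 256))}  # 4 new chars
kept = filter_state(pretrained, model)
print(sorted(kept))  # only the matching feature-extraction weight survives
```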
I see that your dataset code is already inconsistent with TRBA. If you change it to non-Arabic characters, is it possible to make changes without data filtering, or to use UNK tokens?
Hello everyone, I have been trying for a long time to solve the error 'AssertionError: datasets should not be an empty iterable'.
This way worked for me. As a rule of thumb, I would recommend trying different ways of accessing the folder in which your dataset was created. If the 'empty iterable' error changes to something related to pickle, and the detailed error shows your number of samples, you are on the right track. Best
Unfortunately, I have tried this, but I got another error in the data loader. It reads the data using this command:
Has anyone faced this issue?
Solved by adding
I have tried all the steps to resolve this error while training on the IAM dataset, but I keep getting the same error again and again.
@pchalotra I'm getting the same error. Did you solve this?
Any idea?
I had the num_samples = 0 error and I found an easier solution; I hope it will help someone as well.
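One common cause of num_samples = 0 is that the image paths in gt.txt don't resolve relative to the input folder, so every sample fails the validity check during LMDB creation. A sketch for counting how many lines would actually survive; the function and argument names are hypothetical, with image_root playing the role of the folder the paths are relative to:

```python
import os

def count_valid(gt_file, image_root='.'):
    # Count gt.txt lines whose image file actually exists; if this
    # returns 0, the created LMDB will report num-samples = 0 and
    # training will fail with an empty dataset.
    n = 0
    with open(gt_file, encoding='utf-8') as f:
        for line in f:
            if '\t' not in line:
                continue
            path, label = line.rstrip('\n').split('\t', 1)
            if label and os.path.exists(os.path.join(image_root, path)):
                n += 1
    return n
```

Running this before create_lmdb_dataset.py tells you immediately whether the paths in your gt file line up with your image folder.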
RuntimeError: Error(s) in loading state_dict for DataParallel:
Hi. I created an lmdb dataset from my own data by running create_lmdb_dataset.py, then ran the train command on it and got the following output:
CUDA_VISIBLE_DEVICES=0 python3 train.py --train_data result/train --valid_data result/test --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn
dataset_root: result/train
opt.select_data: ['MJ', 'ST']
opt.batch_ratio: ['0.5', '0.5']
dataset_root: result/train dataset: MJ
Traceback (most recent call last):
  File "train.py", line 283, in <module>
    train(opt)
  File "train.py", line 26, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "/home/mor-ai/Work/deep-text-recognition-benchmark/dataset.py", line 37, in __init__
    _dataset = hierarchical_dataset(root=opt.train_data, opt=opt, select_data=[selected_d])
  File "/home/mor-ai/Work/deep-text-recognition-benchmark/dataset.py", line 106, in hierarchical_dataset
    concatenated_dataset = ConcatDataset(dataset_list)
  File "/home/mor-ai/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 187, in __init__
    assert len(datasets) > 0, 'datasets should not be an empty iterable'
AssertionError: datasets should not be an empty iterable
Can you help me resolve this?
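A sketch of the folder-selection behaviour behind this assertion: only sub-directories whose path contains one of the --select_data tags are kept, so with the default ['MJ', 'ST'] and a root like result/train nothing matches and ConcatDataset receives an empty list. This is a simplified re-implementation for illustration, not the repository's code:

```python
import os

def find_lmdb_dirs(root, select_data=('MJ', 'ST')):
    # Walk the dataset root and keep only leaf directories (the ones
    # holding data.mdb) whose path contains one of the select_data tags.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames:  # leaf directory
            if any(tag in dirpath for tag in select_data):
                found.append(dirpath)
    return found
```

One commonly suggested fix for custom datasets is to run with --select_data "/" and --batch_ratio "1", so the single "/" tag matches every sub-directory under --train_data.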