
How to train with custom data - AssertionError: datasets should not be an empty iterable #172

Closed
pendex900x opened this issue Jun 1, 2020 · 13 comments


pendex900x commented Jun 1, 2020

I use this command to train: !python3 '/content/deep-text-recognition-benchmark/create_lmdb_dataset.py' --inputPath '/content/deep-text-recognition-benchmark/train' --gtFile '/content/gt.txt' --outputPath '/content/deep-text-recognition-benchmark/result'
I want to use it to detect license plates.

inputPath contains the input images, with names like 'CJFY10.jpg'.
outputPath is an empty folder.
gtFile is a txt file with this format:

C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DVHS56.png DVHS56
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DYVS72.png DYVS72
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HDYP18.png HDYP18
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HKHT72.png HKHT72
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HPXC69.png HPXC69

And the error when I run the command is:

Traceback (most recent call last):
  File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 87, in <module>
    fire.Fire(createDataset)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 47, in createDataset
    imagePath, label = datalist[i].strip('\n').split('\t')
ValueError: not enough values to unpack (expected 2, got 1)

What am I doing wrong? Any help is welcome.


vipin-kunam commented Jun 2, 2020

Make sure your gt file does not contain a stray space in the middle or at the end of a line. Every line should have the path and the label separated by a single space. This error occurs because a particular line in the gt file contains only a space.
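As a quick sanity check (an editorial sketch, not part of the repo; the `check_gt` helper name is mine, and the default separator follows create_lmdb_dataset.py, which splits on '\t'), you can scan the gt file for lines that do not split into exactly a path and a label:

```python
# Sketch: report gt-file line numbers that do not split into exactly
# (image_path, label) on the chosen separator. Such lines trigger
# "ValueError: not enough values to unpack (expected 2, got 1)".
def check_gt(text, sep='\t'):
    bad = []
    for i, line in enumerate(text.splitlines(), start=1):
        if len(line.rstrip('\n').split(sep)) != 2:
            bad.append(i)
    return bad

if __name__ == '__main__':
    sample = "train/DVHS56.png\tDVHS56\ntrain/DYVS72.png DYVS72\n \n"
    print(check_gt(sample))  # -> [2, 3]: space-separated line, blank line
```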


pendex900x commented Jun 2, 2020

> Make sure your gt file does not contain a stray space in the middle or at the end of a line. Every line should have the path and the label separated by a single space. This error occurs because a particular line in the gt file contains only a space.

I had to modify this line:
env = lmdb.open(outputPath, map_size=1099511627776)
to this:
env = lmdb.open(outputPath, map_size=10995116277)

The output was 2 files in the result folder: data.mdb and lock.mdb.

I changed that line because of disk space problems. Now I'm running it on my local computer and not in Google Colab.
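For context (an editorial sketch; the `suggest_map_size` helper and the 2x headroom factor are my assumptions, not from the repo): lmdb's map_size is an upper bound on database size, and on some systems, notably Windows, data.mdb can be preallocated to the full map_size, which is why shrinking it from ~1 TB avoids disk-space problems. One way to pick a value is to size it from the input images:

```python
# Sketch: estimate an lmdb map_size from the total size of the input
# images, with headroom, instead of hard-coding a huge constant.
import os

def suggest_map_size(image_dir, headroom=2):
    total = 0
    for root, _, files in os.walk(image_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return max(total * headroom, 10 * 1024 * 1024)  # floor of 10 MB

# Usage (assumed): env = lmdb.open(outputPath, map_size=suggest_map_size('train'))
```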

I also did step 1 of this [link] by @ku21fan (#85), except for the last part:

> you should change opt.character into your own character list

And now I try to run: python train.py --train_data 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train' --valid_data 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val' --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn

But the output is:

Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length
--------------------------------------------------------------------------------
dataset_root: 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train'
opt.select_data: ['/']
opt.batch_ratio: ['1']
--------------------------------------------------------------------------------
dataset_root:    'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train'       dataset: /
Traceback (most recent call last):
  File "train.py", line 314, in <module>
    train(opt)
  File "train.py", line 31, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "C:\Users\X\Desktop\deep-text-recognition-benchmark-master\dataset.py", line 42, in __init__
    _dataset, _dataset_log = hierarchical_dataset(root=opt.train_data, opt=opt, select_data=[selected_d])
  File "C:\Users\X\Desktop\deep-text-recognition-benchmark-master\dataset.py", line 124, in hierarchical_dataset
    concatenated_dataset = ConcatDataset(dataset_list)
  File "C:\Users\X\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 68, in __init__
    assert len(datasets) > 0, 'datasets should not be an empty iterable'
AssertionError: datasets should not be an empty iterable

Why does this error happen? AssertionError: datasets should not be an empty iterable

@pendex900x pendex900x changed the title How to train with custom data How to train with custom data - AssertionError: datasets should not be an empty iterable Jun 2, 2020

ku21fan commented Jun 5, 2020

Hello,

  1. The image path and label should be separated with a 'tab':
     {imagepath}\t{label}\n
     (\t means 'tab')

So I recommend replacing '.png ' (.png + space) with '.png\t' (.png + tab) in your gt file.

  2. AssertionError: datasets should not be an empty iterable
     -> this error occurs when the lmdb dataset is empty.
     So, once you succeed in creating the lmdb dataset, it will also be solved.

Hope it helps,
Best
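The '.png + space' to '.png + tab' replacement suggested above can be sketched as follows (an editorial sketch; the `rewrite_gt` helper name is mine):

```python
# Sketch: turn '<path>.png <label>' lines into the tab-separated
# '<path>.png\t<label>' format that create_lmdb_dataset.py expects.
def rewrite_gt(text):
    return text.replace('.png ', '.png\t')

fixed = rewrite_gt("train/DVHS56.png DVHS56\ntrain/DYVS72.png DYVS72\n")
print(fixed)
```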


jingjie181 commented Oct 29, 2021

Hello @ku21fan,

I have followed your instructions and changed the gt file to follow the format {imagepath}\t{label}\n; however, I am still facing the AssertionError. Is there any way to check whether the lmdb dataset was created properly?
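One thing worth checking (an editorial sketch; the `lmdb_dirs` helper name is mine): `hierarchical_dataset` in dataset.py walks the `--train_data` root looking for directories that contain a data.mdb file, so `--train_data` should point at the folder (e.g. `train/result`), not at the data.mdb file itself. A quick stdlib check:

```python
import os

def lmdb_dirs(root):
    """Return subdirectories under root that contain a data.mdb file."""
    found = []
    for dirpath, _, files in os.walk(root):
        if 'data.mdb' in files:
            found.append(dirpath)
    return found

# If this prints [] for your --train_data path, train.py will hit the
# 'datasets should not be an empty iterable' assertion.
# print(lmdb_dirs('train'))
```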

I am currently trying to train the model on Thai. I have changed the character list as follows:

[screenshot: modified character list]

and also updated --select_data and --batch_ratio as follows:

[screenshot: updated --select_data and --batch_ratio options]

but I am still facing the same error when I try to train using the command
train.py --train_data train/result/data.mdb --valid_data validation/result/data.mdb --Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC --data_filtering_off

[screenshot: AssertionError traceback]

Is there anything I am missing?


lvforce commented Nov 5, 2021

> Hello @ku21fan,
>
> I have followed your instructions and changed the gt file to follow the format {imagepath}\t{label}\n; however, I am still facing the AssertionError. Is there any way to check whether the lmdb dataset was created properly? [...]
> Is there anything I am missing?

did you resolve it?


678098 commented Jan 25, 2022

So I've encountered the same problem. It looks like the train.py args --select_data and --batch_ratio need to be set together to stop directories with other names being filtered out.

So if you have a train directory, you can specify --select_data train --batch_ratio 0.5 to make it work.
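For example (a sketch based on the comment above; the directory layout, paths, and option values are illustrative, not from the repo docs):

```shell
# Assumed layout: data_lmdb/train/data.mdb and data_lmdb/train/lock.mdb.
# --select_data train keeps only dataset directories whose path
# contains 'train'; --batch_ratio sets that dataset's share of the batch.
python train.py \
  --train_data data_lmdb --valid_data data_lmdb \
  --select_data train --batch_ratio 1 \
  --Transformation None --FeatureExtraction VGG \
  --SequenceModeling BiLSTM --Prediction CTC
```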

@bilalltf

I'm facing the same issue. Did anyone resolve it?

@wjbmattingly

Did anyone ever resolve this issue?


TheChwal commented Apr 3, 2023

Same issue here

@MauroLeidi

Maybe you have a blank line at the end if you are using the pandas to_csv(sep='\t') function. You should remove any blank line at the end of the file (a lone \n on the last line).
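A minimal sketch of that cleanup (the `strip_blank_lines` helper name is mine; with pandas, passing `index=False, header=False` to `to_csv` also helps keep the gt file to plain path/label rows):

```python
# Sketch: drop empty or whitespace-only lines, which make the
# split('\t') in create_lmdb_dataset.py fail.
def strip_blank_lines(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return '\n'.join(lines) + ('\n' if lines else '')

print(strip_blank_lines("a.png\tA\nb.png\tB\n\n"))
```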

@Vaibhavsun

Has anyone found a solution for 'datasets should not be an empty iterable'?

@MathieuPaillart

Same issue here


S0mbre commented Dec 3, 2024

Same here. Issue remains.
