
How to train with custom data - AssertionError: datasets should not be an empty iterable #172

Closed
pendex900x opened this issue Jun 1, 2020 · 13 comments


pendex900x commented Jun 1, 2020

I use this command to train: !python3 '/content/deep-text-recognition-benchmark/create_lmdb_dataset.py' --inputPath '/content/deep-text-recognition-benchmark/train' --gtFile '/content/gt.txt' --outputPath '/content/deep-text-recognition-benchmark/result'
I want to use it to detect license plates.

inputPath contains the input images, with names like 'CJFY10.jpg'.
outputPath is an empty folder.
gtFile is a txt file with this format:

C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DVHS56.png DVHS56
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train/DYVS72.png DYVS72
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HDYP18.png HDYP18
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HKHT72.png HKHT72
C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val/HPXC69.png HPXC69

And the error when I run the command is:

Traceback (most recent call last):
  File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 87, in <module>
    fire.Fire(createDataset)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 468, in _Fire
    target=component.__name__)
  File "/usr/local/lib/python3.6/dist-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/deep-text-recognition-benchmark/create_lmdb_dataset.py", line 47, in createDataset
    imagePath, label = datalist[i].strip('\n').split('\t')
ValueError: not enough values to unpack (expected 2, got 1)

What am I doing wrong? Any help is welcome.


vipin-kunam commented Jun 2, 2020

Make sure your gt file does not contain a stray space in the middle or at the end of a line. Every line should have the path and the label separated by a single space. This error occurs because a particular line in the gt file contains only a space.
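As a quick sanity check (an editorial sketch, not part of the repo; the `check_gt` helper name is mine, and the default separator follows create_lmdb_dataset.py, which splits on '\t'), you can scan the gt file for lines that do not split into exactly a path and a label:

```python
# Sketch: report gt-file line numbers that do not split into exactly
# (image_path, label) on the chosen separator. Such lines trigger
# "ValueError: not enough values to unpack (expected 2, got 1)".
def check_gt(text, sep='\t'):
    bad = []
    for i, line in enumerate(text.splitlines(), start=1):
        if len(line.rstrip('\n').split(sep)) != 2:
            bad.append(i)
    return bad

if __name__ == '__main__':
    sample = "train/DVHS56.png\tDVHS56\ntrain/DYVS72.png DYVS72\n \n"
    print(check_gt(sample))  # -> [2, 3]: space-separated line, blank line
```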


pendex900x commented Jun 2, 2020

> Make sure your gt file does not contain a stray space in the middle or at the end of a line. Every line should have the path and the label separated by a single space. This error occurs because a particular line in the gt file contains only a space.

I had to modify this line:
env = lmdb.open(outputPath, map_size=1099511627776)
to this:
env = lmdb.open(outputPath, map_size=10995116277)

The output was 2 files in the result folder: data.mdb and lock.mdb.

I changed that line because of disk space problems. Now I'm running it on my local computer and not in Google Colab.
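For context (an editorial sketch; the `suggest_map_size` helper and the 2x headroom factor are my assumptions, not from the repo): lmdb's map_size is an upper bound on database size, and on some systems, notably Windows, data.mdb can be preallocated to the full map_size, which is why shrinking it from ~1 TB avoids disk-space problems. One way to pick a value is to size it from the input images:

```python
# Sketch: estimate an lmdb map_size from the total size of the input
# images, with headroom, instead of hard-coding a huge constant.
import os

def suggest_map_size(image_dir, headroom=2):
    total = 0
    for root, _, files in os.walk(image_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return max(total * headroom, 10 * 1024 * 1024)  # floor of 10 MB

# Usage (assumed): env = lmdb.open(outputPath, map_size=suggest_map_size('train'))
```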

I also did step 1 of this [link] by @ku21fan (#85), except for the last part:

> you should change opt.character into your own character list

And now I try to run: python train.py --train_data 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train' --valid_data 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/val' --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn

But the output is:

Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length
--------------------------------------------------------------------------------
dataset_root: 'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train'
opt.select_data: ['/']
opt.batch_ratio: ['1']
--------------------------------------------------------------------------------
dataset_root:    'C:/Users/X/Desktop/deep-text-recognition-benchmark-master/train'       dataset: /
Traceback (most recent call last):
  File "train.py", line 314, in <module>
    train(opt)
  File "train.py", line 31, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "C:\Users\X\Desktop\deep-text-recognition-benchmark-master\dataset.py", line 42, in __init__
    _dataset, _dataset_log = hierarchical_dataset(root=opt.train_data, opt=opt, select_data=[selected_d])
  File "C:\Users\X\Desktop\deep-text-recognition-benchmark-master\dataset.py", line 124, in hierarchical_dataset
    concatenated_dataset = ConcatDataset(dataset_list)
  File "C:\Users\X\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 68, in __init__
    assert len(datasets) > 0, 'datasets should not be an empty iterable'
AssertionError: datasets should not be an empty iterable

Why does this error happen? AssertionError: datasets should not be an empty iterable

@pendex900x pendex900x changed the title How to train with custom data How to train with custom data - AssertionError: datasets should not be an empty iterable Jun 2, 2020

ku21fan commented Jun 5, 2020

Hello,

  1. The image path and label should be separated with a 'tab':
     {imagepath}\t{label}\n
     (\t means 'tab')

So I recommend replacing '.png ' (.png + space) with '.png\t' (.png + tab) in your gt file.

  2. AssertionError: datasets should not be an empty iterable
     -> this error occurs when the lmdb dataset is empty.
     So, once you succeed in creating the lmdb dataset, it will also be solved.

Hope it helps,
Best
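The '.png + space' to '.png + tab' replacement suggested above can be sketched as follows (an editorial sketch; the `rewrite_gt` helper name is mine):

```python
# Sketch: turn '<path>.png <label>' lines into the tab-separated
# '<path>.png\t<label>' format that create_lmdb_dataset.py expects.
def rewrite_gt(text):
    return text.replace('.png ', '.png\t')

fixed = rewrite_gt("train/DVHS56.png DVHS56\ntrain/DYVS72.png DYVS72\n")
print(fixed)
```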


jingjie181 commented Oct 29, 2021

Hello @ku21fan,

I have followed your instructions and changed the gt file to follow the format {imagepath}\t{label}\n; however, I am still facing the AssertionError. Is there any way to check whether the lmdb dataset was created properly?
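One thing worth checking (an editorial sketch; the `lmdb_dirs` helper name is mine): `hierarchical_dataset` in dataset.py walks the `--train_data` root looking for directories that contain a data.mdb file, so `--train_data` should point at the folder (e.g. `train/result`), not at the data.mdb file itself. A quick stdlib check:

```python
import os

def lmdb_dirs(root):
    """Return subdirectories under root that contain a data.mdb file."""
    found = []
    for dirpath, _, files in os.walk(root):
        if 'data.mdb' in files:
            found.append(dirpath)
    return found

# If this prints [] for your --train_data path, train.py will hit the
# 'datasets should not be an empty iterable' assertion.
# print(lmdb_dirs('train'))
```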

I am currently trying to train the model on Thai. I have changed the character list as follows:

[screenshot: modified character list]

and also updated --select_data and --batch_ratio as follows:

[screenshot: updated --select_data and --batch_ratio options]

but I am still facing the same error when I try to train using the command
train.py --train_data train/result/data.mdb --valid_data validation/result/data.mdb --Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC --data_filtering_off

[screenshot: AssertionError traceback]

Is there anything I am missing?


lvforce commented Nov 5, 2021

> Hello @ku21fan,
>
> I have followed your instructions and changed the gt file to follow the format {imagepath}\t{label}\n; however, I am still facing the AssertionError. Is there any way to check whether the lmdb dataset was created properly? [...]
> Is there anything I am missing?

did you resolve it?


678098 commented Jan 25, 2022

So I've encountered the same problem. It looks like the train.py args --select_data and --batch_ratio need to be set together to stop directories with other names being filtered out.

So if you have a train directory, you can specify --select_data train --batch_ratio 0.5 to make it work.
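For example (a sketch based on the comment above; the directory layout, paths, and option values are illustrative, not from the repo docs):

```shell
# Assumed layout: data_lmdb/train/data.mdb and data_lmdb/train/lock.mdb.
# --select_data train keeps only dataset directories whose path
# contains 'train'; --batch_ratio sets that dataset's share of the batch.
python train.py \
  --train_data data_lmdb --valid_data data_lmdb \
  --select_data train --batch_ratio 1 \
  --Transformation None --FeatureExtraction VGG \
  --SequenceModeling BiLSTM --Prediction CTC
```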

@bilalltf

I'm facing the same issue. Did anyone resolve it?

@wjbmattingly

Did anyone ever resolve this issue?


TheChwal commented Apr 3, 2023

Same issue here

@MauroLeidi

Maybe you have a blank line at the end if you are using the pandas to_csv(sep='\t') function. You should remove any blank line at the end of the file (a lone \n on the last line).
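A minimal sketch of that cleanup (the `strip_blank_lines` helper name is mine; with pandas, passing `index=False, header=False` to `to_csv` also helps keep the gt file to plain path/label rows):

```python
# Sketch: drop empty or whitespace-only lines, which make the
# split('\t') in create_lmdb_dataset.py fail.
def strip_blank_lines(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return '\n'.join(lines) + ('\n' if lines else '')

print(strip_blank_lines("a.png\tA\nb.png\tB\n\n"))
```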

@Vaibhavsun

Has anyone found a solution for 'datasets should not be an empty iterable'?

@MathieuPaillart

Same issue here


S0mbre commented Dec 3, 2024

Same here. Issue remains.
