Problem with loss = nan during training #214
Hi!
I used VoTT to export the annotations to XML (Pascal VOC format), but I noticed some values were out of bounds, so I wrote a script to correct those mistakes.
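For reference, a correction script like the one mentioned could look roughly like this (a minimal sketch: the directory layout, the clamping rule, and the `clamp_boxes` name are assumptions, not the script actually used):

```python
import os
import xml.etree.ElementTree as ET

def clamp_boxes(xml_dir):
    """Clamp Pascal VOC bounding boxes to the image bounds, rewriting in place."""
    for name in os.listdir(xml_dir):
        if not name.endswith(".xml"):
            continue
        path = os.path.join(xml_dir, name)
        tree = ET.parse(path)
        root = tree.getroot()
        width = int(root.find("size/width").text)
        height = int(root.find("size/height").text)
        for box in root.iter("bndbox"):
            # Clamp x coordinates to [0, width] and y coordinates to [0, height].
            for tag, limit in (("xmin", width), ("xmax", width),
                               ("ymin", height), ("ymax", height)):
                el = box.find(tag)
                el.text = str(min(max(int(float(el.text)), 0), limit))
        tree.write(path)
```

This only fixes out-of-bounds values; it does not reorder swapped min/max pairs.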
Is there really no possible answer to my problem?
Sorry for the late reply... First, I would like to know how you are getting your data: 1. Are you using VoTT, LabelImg, or something else to generate your tfrecord files? # You already said you were using VoTT, so no need to answer this one.
I hope you can answer as soon as possible so you can help me.
No problem! 1 and 2: We used VoTT to create Pascal VOC format annotations for the images (we started with an older TensorFlow version a year or so ago that required this format). These were then converted using a slightly edited voc2012.py script:
And I also added this to train.py:
3: The class names are in Dutch, so there's that, but the content of the .names file is: 4: I think I did; however, I hardcoded them into train.py (I will change that if necessary and try again, but this has always worked in the past):
For point 5, I already added the content of a tfrecord file in the opening post (excluding the encoded part). It outputs the same information as using your script.
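For anyone who wants to inspect a tfrecord's contents the same way, a small dump helper along these lines works in TF 2.x (a sketch; `dump_tfrecord` is a hypothetical helper, not part of the repo):

```python
import tensorflow as tf

def dump_tfrecord(path, max_records=1):
    """Print every feature of the first records except the raw image bytes.

    Returns the list of feature keys seen, for convenience.
    """
    keys = []
    for raw in tf.data.TFRecordDataset(path).take(max_records):
        example = tf.train.Example()
        example.ParseFromString(raw.numpy())
        for key, feature in sorted(example.features.feature.items()):
            keys.append(key)
            if key != "image/encoded":  # skip printing the encoded image data
                print(key, feature)
    return keys
```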
Okay, it will take me a little while... Right now I'm preparing my lunch :P, but when that's done I will start to analyze your code.
No problem! I will also keep looking for the problem.
Ok, the only issue I can think of is that VoTT returned corrupted files...
To rule that out, you need to change your code to generate a single tfrecord per image. I'll give you the code that I use to convert Pascal VOC format to single tfrecord files in the next comment, because I'm on a different PC :P.
import hashlib
import io
import os
import xml.etree.ElementTree as ET
from collections import namedtuple

import pandas as pd
import tensorflow as tf
from PIL import Image

import dataset_util
## GENERATE CSV ##
def xml_to_csv(path):
    xml_list = []
    for xml_file, filename in [(path + "/" + a, a) for a in os.listdir(path) if ".xml" in a]:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (filename.replace(".xml", ".jpg"),
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

def create_csv():
    for folder in ['train', 'test']:  # Modify if you have more than one folder
        image_path = os.path.join(os.getcwd(), ('images/' + folder))  # Images path
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv(('images/' + folder + '_labels.csv'), index=None)
        print('Successfully converted xml to csv.')

create_csv()
## GENERATE TFRECORDS ##
def create_function(lista_tags):
    def class_text_to_int(row_label):
        if row_label in lista_tags:
            return lista_tags.index(row_label) + 1
        else:
            return None
    return class_text_to_int

class_text_to_int = create_function(list)  # Tag list HERE (replace `list` with your list of class names)

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]

def create_tf_example(group, path):
    with tf.compat.v1.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    key = hashlib.sha256(encoded_jpg).hexdigest()
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    difficult_obj = []
    truncated = []
    poses = []
    for i, (index, row) in enumerate(group.object.iterrows()):
        if i == 100:
            break
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
        difficult_obj.append(int(False))
        truncated.append(int(False))
        poses.append("Unspecified".encode('utf8'))
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
        'image/object/truncated': dataset_util.int64_list_feature(truncated),
        'image/object/view': dataset_util.bytes_list_feature(poses),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

path = os.path.join(os.getcwd(), "./images/train")  # PATH images
examples = pd.read_csv("./02_17_20.csv")  # Path CSV
grouped = split(examples, 'filename')
for group in grouped:
    a = "./tfrecords/" + list(group.object.filename)[0].replace(".jpg", "") + ".tfrecord"  # PATH tfrecords
    writer = tf.compat.v1.python_io.TFRecordWriter(a)
    tf_example = create_tf_example(group, path)
    writer.write(tf_example.SerializeToString())
    writer.close()
Also, you need dataset_util.py, which contains:
import tensorflow as tf
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def int64_list_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def bytes_list_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

def float_list_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def read_examples_list(path):
    with tf.compat.v1.gfile.GFile(path) as fid:
        lines = fid.readlines()
    return [line.strip().split(' ')[0] for line in lines]

def recursive_parse_xml_to_dict(xml):
    if not xml:
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = recursive_parse_xml_to_dict(child)
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}
Needless to say, these are not my scripts (I modified the first one for my convenience... and the second one too, I guess :P). The original versions had several duplicate imports because this is a somewhat old version of my code (the new version is very specific, which is why I didn't paste it here). The original material is from [here](https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10).
Thank you very much!
I think I might have found the mistake in my annotation boxes. Sometimes ymin is larger than ymax, and sometimes ymax is larger than ymin. Do you think this could be the cause of the problem? And if so, which way should I save them: with ymax as the larger one, or ymin?
Yes, that is just a mistake. In Pascal VOC coordinates the origin is the top-left corner, so ymax should always be the larger value (and likewise xmax larger than xmin).
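A simple way to repair boxes with swapped coordinates (a sketch; `fix_box` is a hypothetical helper, not part of the scripts above) is to sort each axis pair so that the min is always the smaller value:

```python
def fix_box(xmin, ymin, xmax, ymax):
    """Return the box with coordinates ordered so that min <= max on each axis."""
    xmin, xmax = sorted((xmin, xmax))
    ymin, ymax = sorted((ymin, ymax))
    return xmin, ymin, xmax, ymax
```

Running this over every `bndbox` before conversion would make the tfrecords consistent regardless of how VoTT ordered the values.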
Problems have been fixed by the solutions offered in this thread. Closing the issue now. |
Hello, I've been trying to train this model on a custom dataset (11k training images and 1k validation images), but the loss always becomes nan after a while.
Eager_tf mode:
I also went and checked my tfrecords (which I create using a slightly edited voc2012.py) but I don't see anything wrong with the outputs:
In my classes.names file I have 29 classes and 'koffer' is indeed on line 17.
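To double-check that a label index maps to the expected line of the .names file, a tiny loader can help (a sketch assuming the usual one-class-per-line format, with 1-based numbering to match the converter's `index + 1` labels):

```python
def load_class_names(path):
    """Map 1-based line numbers to class names from a .names file."""
    with open(path, encoding="utf-8") as f:
        return {i: line.strip() for i, line in enumerate(f, start=1) if line.strip()}
```

Note that some YOLO tooling numbers classes from 0 instead, so the offset is worth verifying against whichever converter produced the labels.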
If need be, I could also post the edited version of voc2012.py; however, I only edited the file reading, because my folder layout is a bit different from what's been posted here.
I've noticed that some xmin or ymin values can be really small e.g.:
But I don't think that's the source of the problem. I've also checked with #128, but my annotations in the .xml files already have xmin, xmax, ymin and ymax values that are correct (so not out of bounds or anything).
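The bounds check described here can be scripted over the CSV produced by the conversion script roughly as follows (an assumed sketch; `find_bad_boxes` is a hypothetical name, and the column names mirror the converter's CSV output):

```python
import pandas as pd

def find_bad_boxes(csv_path):
    """Return the CSV rows whose boxes are out of bounds or inverted."""
    df = pd.read_csv(csv_path)
    bad = (
        (df["xmin"] < 0) | (df["ymin"] < 0)
        | (df["xmax"] > df["width"]) | (df["ymax"] > df["height"])
        | (df["xmin"] >= df["xmax"]) | (df["ymin"] >= df["ymax"])
    )
    return df[bad]
```

An empty result means the annotations pass both the bounds and the ordering check.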