This repository has been archived by the owner on Dec 1, 2021. It is now read-only.

As for issue #484: remove pandas dependency #1072

Merged 13 commits on Jul 30, 2020

Conversation

Contributor

@lm-lily lm-lily commented May 29, 2020

What this patch does to fix the issue.

Remove pandas dependencies.

Link to any relevant issues or pull requests.

Fix for issue #484.

@lm-lily lm-lily requested review from ruimashita and iizukak May 29, 2020 09:28
@blueoil-butler blueoil-butler bot added the CI: auto-run Run CI automatically label May 29, 2020
@bo-code-review-bot

This PR needs Approvals as follows.

  • Ownership Approval for / from iizukak, tkng, ruimashita
  • Readability Approval for Python from tkng, tsawada, tfujiwar

Please choose reviewers and request reviews!

Click to see how to approve each review

You can approve this PR with the following trigger comments.

  • Approve all reviews requested to you (readability and ownership) and LGTM review
    Approval, LGTM

  • Approve all ownership reviews
    Ownership Approval or OA

  • Approve all readability reviews
    Readability Approval or RA

  • Approve specified review targets

    • Example of Ownership Reviewer of /: Ownership Approval for / or OA for /
    • Example of Readability Reviewer of Python: Readability Approval for Python or RA for Python
  • Approve LGTM review
    LGTM

See all trigger comments

Please replace [Target] with the review target

  • Ownership Approval
    • Ownership Approval for [Target]
    • OA for [Target]
    • Ownership Approval
    • OA
    • Approval
  • Readability Approval
    • Readability Approval for [Target]
    • RA for [Target]
    • [Target] Readability Approval
    • [Target] RA
    • Readability Approval
    • RA
    • Approval
  • LGTM
    • LGTM
    • lgtm

@lm-lily lm-lily removed request for ruimashita and iizukak May 29, 2020 09:50
@lm-lily lm-lily requested a review from iizukak June 1, 2020 06:34
@iizukak
Member

iizukak commented Jun 2, 2020

@lm-lily Can you remove pandas dependency from setup.cfg?
https://github.com/blue-oil/blueoil/blob/master/setup.cfg#L26

Member

@iizukak iizukak left a comment

@lm-lily Thank you. Some comments added.

@@ -54,12 +54,12 @@ def num_max_boxes(self):
         return 42

     def _annotation_file_from_image_id(self, image_id):
-        annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
+        annotation_file = os.path.join(self.annotations_dir, "{}.xml".format(image_id))
Member

Why was :06d removed?
Is this related to pandas?

Contributor Author

The file name extracted by my submitted code (after removing the pandas DataFrame method) failed with the :06d format.
I had to adjust this part to make it work.

Member

> The file name extracted using my submitted code (after removing pandas dataframe method) failed with :06d format.

Sorry, why does this happen?

Contributor Author

> > The file name extracted using my submitted code (after removing pandas dataframe method) failed with :06d format.
>
> Sorry, Why this happen?

In the sample data, image_id has the format 2007_000032. If we apply the format {:06} to it, the file name will not match the format we need: it becomes 2007000032 instead of 2007_000032, which causes a file-not-found error.

Contributor

@lm-lily I'm still not sure why this is related to removing pandas. If this is an existing bug, please make another PR to fix that.

Contributor Author

@lm-lily lm-lily Jul 20, 2020

@tfujiwar
Thank you Fujiwara san.
I will revert this change.

Contributor Author

@lm-lily lm-lily Jul 20, 2020

@tfujiwar
lmnet-test failed when I reverted the above code.

[2020-07-20T05:41:28Z] self = <blueoil.datasets.pascalvoc_2007.Pascalvoc2007 object at 0x7ffb8908dc50>
[2020-07-20T05:41:28Z] image_id = '000085'
[2020-07-20T05:41:28Z]
[2020-07-20T05:41:28Z]     def _annotation_file_from_image_id(self, image_id):
[2020-07-20T05:41:28Z] >       annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
[2020-07-20T05:41:28Z] E       ValueError: Unknown format code 'd' for object of type 'str'
[2020-07-20T05:41:28Z]
[2020-07-20T05:41:28Z] ../blueoil/datasets/pascalvoc_2007.py:57: ValueError

My previous explanation was wrong.
In the original code, the data was handled by a pandas DataFrame, which converts data types internally.
My code reads the file name as a string, and that caused the lmnet-test failure.
Since the file name can safely be used as a whole string, I replaced {:06d} with {} as a quick fix.
Since this issue is related to the pandas removal, I will apply this fix again. Please advise if you have a better suggestion.
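
The failure described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: the id '000085' comes from the CI log, while the integer 85 is an assumption about what pandas type inference would have produced for the same field.

```python
# Illustration only: '000085' is the image id shown in the CI log.
image_id_str = "000085"  # what a plain text read returns
image_id_int = 85        # assumed result of pandas inferring an integer column

# Integer ids work with the zero-padded integer format.
padded = "{:06d}.xml".format(image_id_int)

# String ids do not: the 'd' format code is only valid for integers.
try:
    "{:06d}.xml".format(image_id_str)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# The plain placeholder keeps the string exactly as it was read.
plain = "{}.xml".format(image_id_str)

print(padded, plain, error_message)
```

Both approaches produce the same file name here, which is why switching to {} is safe once the ids are kept as zero-padded strings.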

Contributor

Thanks for your investigation. I've understood the reason.

         return annotation_file

     def _image_file_from_image_id(self, image_id):
         """Return image file name of a image."""
-        return os.path.join(self.jpegimages_dir, "{:06d}.jpg".format(image_id))
+        return os.path.join(self.jpegimages_dir, "{}.jpg".format(image_id))
Member

Same here.

Contributor Author

The reason is the same as in my reply above.

names=['image_id'])
image_id = list()

with open(filename) as f:
Member

Why don't you use the csv module?

Contributor Author

These lines read a .txt file, not a CSV file.

Member

OK Thanks.


image_files = df.image_files.tolist()
label_files = df.label_files.tolist()
image_files, label_files = list(), list()
Member

Why don't you use the csv module?

Contributor Author

@lm-lily lm-lily Jun 6, 2020

This is for reading a text file instead of a csv file.

Member

The Python csv module can load space-separated files.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter

It is better to use the standard library than to implement this ourselves.
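
For illustration, a space-separated listing can be parsed with the standard csv module roughly as follows. This is only a sketch: the two-column layout and the file names are hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io

# Hypothetical listing: "<image file> <label file>" on each line.
sample = io.StringIO(
    "images/0001.png labels/0001.png\n"
    "images/0002.png labels/0002.png\n"
)

image_files, label_files = [], []
for row in csv.reader(sample, delimiter=" "):
    if not row:
        continue  # skip blank lines
    image_file, label_file = row
    image_files.append(image_file)
    label_files.append(label_file)

print(image_files, label_files)
```

One caveat: unlike delim_whitespace=True in pandas, a single-character delimiter does not collapse runs of spaces, so consecutive spaces produce empty fields.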

@lm-lily
Contributor Author

lm-lily commented Jun 5, 2020

> @lm-lily Can you remove pandas dependency from setup.cfg?
> https://github.com/blue-oil/blueoil/blob/master/setup.cfg#L26

@iizukak
delta-mark still remains in Blueoil (issue #1066).
Is it OK to remove the pandas dependency from setup.cfg before issue #1066 is resolved?

@iizukak
Member

iizukak commented Jun 9, 2020

@lm-lily OK. Then it would be good to remove the DeLTA Mark dataset loader first.
Thank you for trying to solve a lot of issues!

@lm-lily lm-lily linked an issue Jun 9, 2020 that may be closed by this pull request
@lm-lily lm-lily self-assigned this Jun 9, 2020
@lm-lily
Contributor Author

lm-lily commented Jun 9, 2020

This PR is on hold until PR #1086 is merged into Blueoil.

@CLAassistant

CLAassistant commented Jun 12, 2020

CLA assistant check
All committers have signed the CLA.

@lm-lily lm-lily requested a review from iizukak June 30, 2020 08:05
@iizukak
Member

iizukak commented Jul 2, 2020

@lm-lily Thank you for the review request. I'll review again soon.

Member

@iizukak iizukak left a comment

@lm-lily
I left some comments.
I could not fully understand some of the code, so I apologize if any of my comments are unreasonable.

delim_whitespace=True,
header=None,
names=['image_id'])
image_id = list()
Member

image_id looks like a single value.
How about image_ids?

Contributor Author

In reply to #1072 (comment)

Using read().splitlines() is possible.
I will modify my code to use read().
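
A minimal sketch of the read().splitlines() approach follows. The helper name read_image_ids and the file name trainval.txt are hypothetical; the sample ids are made up for the example.

```python
import os
import tempfile


def read_image_ids(filename):
    """Return one image id per line, as strings without trailing newlines."""
    with open(filename) as f:
        return f.read().splitlines()


# Usage with a temporary file standing in for the real id list.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trainval.txt")
    with open(path, "w") as f:
        f.write("000005\n000007\n000009\n")
    image_ids = read_image_ids(path)

print(image_ids)
```

Because the ids stay strings, leading zeros are preserved, which matches the switch from {:06d} to {} in the file name helpers.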

Contributor Author

> image_id looks like single value.
> How about image_ids ?

Noted.

image_files = df.image_files.tolist()
label_files = df.label_files.tolist()
image_files, label_files = list(), list()
with open(filename) as f:
Member

The Python csv module can load space-separated files.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter

It is better to use the standard library than to implement this ourselves.

Contributor Author

@lm-lily lm-lily Jul 13, 2020

My reply is the same as in #1072 (comment); the same applies to #1072 (comment).


with open(output_csv, "w") as fp:
wr = csv.writer(fp)
wr.writerow(columns)
for row in data_by_row:
Member

Contributor Author

Thank you! It's a good suggestion, I have changed it.

columns.sort()
df = pd.DataFrame([], columns=columns)

values_step_dict = {}
Member

Do we need this initialization code?

Contributor Author

It is not necessary. I will remove this.
Thank you for pointing it out.

df = df[["step"] + columns]
for step in step_list:
if step not in values_step_dict:
values_step_dict[step] = ''
Member

The key of values_step_dict looks like it should be a value, not a step. Is that correct?
I'm not sure why the key on this line is step.

Contributor Author

@iizukak
I am sorry, this was my mistake: I forgot to change line 37 to (event.value, event.step). The order was reversed.
Thank you for catching this significant mistake!

@lm-lily lm-lily requested a review from iizukak July 13, 2020 15:01
Member

@iizukak iizukak left a comment

OA

@iizukak iizukak requested a review from tfujiwar July 20, 2020 00:14
@iizukak
Member

iizukak commented Jul 20, 2020

@lm-lily
Thanks. Please wait for the Python readability review.

@lm-lily
Contributor Author

lm-lily commented Jul 28, 2020

@tfujiwar
May I know about the review status? Thank you.

Contributor

@tfujiwar tfujiwar left a comment

Thanks for working on this! Sorry for taking a long time... I left some comments.

Comment on lines 59 to 60
linelist = [line.rstrip('\n') for line in open(os.path.join(self.data_dir, 'imagenet_classes.txt'))]
return linelist
Contributor

Could you update the code to close the file?

with open(...) as f:
    return [line.rstrip('\n') for ...]
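
Spelled out, the suggestion could look like the sketch below. The function name load_class_names and the sample class names are hypothetical; only imagenet_classes.txt comes from the code under review.

```python
import os
import tempfile


def load_class_names(data_dir, filename="imagenet_classes.txt"):
    """Read class names, one per line; the file is closed when the block exits."""
    with open(os.path.join(data_dir, filename)) as f:
        return [line.rstrip("\n") for line in f]


# Usage with a temporary directory standing in for data_dir.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "imagenet_classes.txt"), "w") as f:
        f.write("tench\ngoldfish\n")
    classes = load_class_names(tmp_dir)

print(classes)
```

The with statement closes the file even if reading raises an exception, whereas a bare open(...) inside a comprehension leaves closing to the garbage collector.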

Comment on lines +80 to +83
for metrics_key in metrics_keys:
    if not value_matrix:
        step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
        value_matrix.append(step_list)
Contributor

With these lines, the list of steps comes from the first metrics_key. Is that the same as in the original code? Did you check that the result is the same?

Contributor Author

@tfujiwar
Yes, I have verified the result.
The list of steps is the same for every metrics_key, so it is enough to read it once from any metrics_key.

Contributor

OK. Then what about this code? I think we don't need to use a for-loop.

value_matrix = [sorted(_step_list(event_accumulator, metrics_keys[0]), reverse=True)]

Contributor Author

@lm-lily lm-lily Jul 29, 2020

@tfujiwar
This won't work: it raises TypeError: 'set' object is not subscriptable, because metrics_keys is a set.
How about one of the following options?

for metrics_key in metrics_keys:
    step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
    value_matrix.append(step_list)
    break

or

metrics_key = next(iter(metrics_keys))
step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
value_matrix.append(step_list)

Which solution is preferable?
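
The set behaviour discussed here can be checked on its own. This is a small sketch with hypothetical metric names; it does not touch the event accumulator code.

```python
# Hypothetical metric names; the real metrics_keys comes from the event files.
metrics_keys = {"accuracy", "loss"}

# Indexing a set fails because sets are unordered and not subscriptable.
try:
    metrics_keys[0]
    index_error = None
except TypeError as exc:
    index_error = str(exc)

# Taking an arbitrary element works without a loop.
first_key = next(iter(metrics_keys))

# If a reproducible choice matters, sort first (an alternative not raised in the thread).
stable_first = sorted(metrics_keys)[0]

print(index_error, first_key, stable_first)
```

Since the step list is reported to be identical for every metrics_key, an arbitrary element is sufficient for this use.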

Contributor

Hmm, I've understood the situation. It might be better to refactor the existing code also but this PR is OK 👍

@lm-lily lm-lily requested a review from tfujiwar July 30, 2020 12:58
Contributor

@tfujiwar tfujiwar left a comment

RA

@lm-lily
Contributor Author

lm-lily commented Jul 30, 2020

/ready

@bo-mergebot
Contributor

bo-mergebot bot commented Jul 30, 2020

⏳Merge job is queued...

@bo-mergebot bo-mergebot bot merged commit 53dae5a into master Jul 30, 2020
@bo-mergebot bo-mergebot bot deleted the issue_484_remove_pandas branch July 30, 2020 23:30
oatawa1 added a commit to oatawa1/blueoil that referenced this pull request Aug 3, 2020
Labels: CI: auto-run (Run CI automatically)
Linked issue: remove pandas dependency
4 participants