As for issue #484: remove pandas dependency #1072

lm-lily · 2020-05-29T09:28:23Z

What this patch does to fix the issue.

Remove pandas dependencies.

Link to any relevant issues or pull requests.

Fix for issue #484.

bo-code-review-bot · 2020-05-29T09:28:29Z

This PR needs Approvals as follows.

Ownership Approval for / from iizukak, tkng, ruimashita
Readability Approval for Python from tkng, tsawada, tfujiwar

Please choose reviewers and request reviews!

Click to see how to approve each review

You can approve this PR by triggered comments as follows.

Approve all reviews requested to you (readability and ownership) and LGTM review
Approval, LGTM
Approve all ownership reviews
Ownership Approval or OA
Approve all readability reviews
Readability Approval or RA
Approve specified review targets
- Example of Ownership Reviewer of /: Ownership Approval for / or OA for /
- Example of Readability Reviewer of Python: Readability Approval for Python or RA for Python
Approve LGTM review
LGTM

See all trigger comments

Please replace [Target] with the review target.

Ownership Approval
- Ownership Approval for [Target]
- OA for [Target]
- Ownership Approval
- OA
- Approval
Readability Approval
- Readability Approval for [Target]
- RA for [Target]
- [Target] Readability Approval
- [Target] RA
- Readability Approval
- RA
- Approval
LGTM
- LGTM
- lgtm

iizukak · 2020-06-02T01:16:21Z

@lm-lily Can you remove pandas dependency from setup.cfg?
https://github.com/blue-oil/blueoil/blob/master/setup.cfg#L26

iizukak

@lm-lily Thank you. Some comments added.

iizukak · 2020-06-02T01:20:11Z

blueoil/datasets/pascalvoc_2007.py

@@ -54,12 +54,12 @@ def num_max_boxes(self):
            return 42

    def _annotation_file_from_image_id(self, image_id):
-        annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
+        annotation_file = os.path.join(self.annotations_dir, "{}.xml".format(image_id))


Why was :06d removed?
Is it related to pandas?

The file name extracted using my submitted code (after removing pandas dataframe method) failed with :06d format.
I have to adjust this part to make it right.

The file name extracted using my submitted code (after removing pandas dataframe method) failed with :06d format.

Sorry, why does this happen?

The image_id looks like this sample: 2007_000032. If we apply the format {:06} to it, the file name will not match the format we need: it becomes 2007000032 instead of 2007_000032, which causes a file-not-found error.

@lm-lily I'm still not sure why this is related to removing pandas. If this is an existing bug, please make another PR to fix that.

@tfujiwar
Thank you Fujiwara san.
I will revert this change.

@tfujiwar
lmnet-test failed when I reverted the above code.

[2020-07-20T05:41:28Z] self = <blueoil.datasets.pascalvoc_2007.Pascalvoc2007 object at 0x7ffb8908dc50>
[2020-07-20T05:41:28Z] image_id = '000085'
[2020-07-20T05:41:28Z]
[2020-07-20T05:41:28Z] def _annotation_file_from_image_id(self, image_id):
[2020-07-20T05:41:28Z] > annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
[2020-07-20T05:41:28Z] E ValueError: Unknown format code 'd' for object of type 'str'
[2020-07-20T05:41:28Z]
[2020-07-20T05:41:28Z] ../blueoil/datasets/pascalvoc_2007.py:57: ValueError

My previous explanation was wrong.
In the original code, the data is handled by a pandas DataFrame, which converts the data type internally.
My code reads the file name as a string, and that caused the lmnet-test failure.
Since the file name is safe to use as read, I replaced {:06d} with {} as a quick fix.
Since this issue is related to the pandas removal, I will fix this code again. Please advise if you have a better suggestion.

Thanks for your investigation. I've understood the reason.
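The failure in the log above can be reproduced in isolation. The following is a minimal sketch, assuming the image IDs arrive as plain strings once pandas is gone and that pandas previously delivered them as integers; the helper names `annotation_name_int` and `annotation_name_str` are hypothetical and not part of blueoil.

```python
def annotation_name_int(image_id):
    """Zero-pad an integer ID, as the original "{:06d}" format did."""
    return "{:06d}.xml".format(image_id)


def annotation_name_str(image_id):
    """Use the ID verbatim, as the "{}" format in this PR does."""
    return "{}.xml".format(image_id)


# An integer ID (assumed to be what the pandas-based loader produced).
print(annotation_name_int(85))        # 000085.xml

# A string ID read directly from the text file keeps its leading zeros.
print(annotation_name_str("000085"))  # 000085.xml

# The integer format code 'd' is not valid for a str argument.
try:
    annotation_name_int("000085")
except ValueError as error:
    print("ValueError:", error)
```

With string IDs the zero padding is already present in the file, so formatting with `{}` yields the same file name without any type conversion.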

iizukak · 2020-06-02T01:20:17Z

blueoil/datasets/pascalvoc_2007.py

        return annotation_file

    def _image_file_from_image_id(self, image_id):
        """Return image file name of a image."""
-        return os.path.join(self.jpegimages_dir, "{:06d}.jpg".format(image_id))
+        return os.path.join(self.jpegimages_dir, "{}.jpg".format(image_id))


The reason is the same as in the reply above.

iizukak · 2020-06-02T01:20:46Z

blueoil/datasets/pascalvoc_base.py

-            names=['image_id'])
+        image_id = list()
+
+        with open(filename) as f:


Why don't you use the csv module?

The lines here are about reading a txt file, not csv file.

Can we use readlines to read the lines as a list?

https://docs.python.org/ja/3/library/codecs.html?highlight=readlines#codecs.StreamReader.readlines

iizukak · 2020-06-02T01:27:13Z

blueoil/datasets/camvid.py


-        image_files = df.image_files.tolist()
-        label_files = df.label_files.tolist()
+        image_files, label_files = list(), list()


Why don't you use the csv module?

This is for reading a text file, not a csv file.

The Python csv module can load space-separated files.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter

It's better to use the standard library than to implement this ourselves.
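As an illustration of the suggestion above, here is a minimal sketch of reading a space-separated two-column list with `csv.reader`. The sample paths and the in-memory `io.StringIO` file are assumptions standing in for the real CamVid list file; this is not the code that was merged.

```python
import csv
import io

# Hypothetical stand-in for a list file with "image label" pairs per line.
sample = io.StringIO(
    "images/0001.png labels/0001.png\n"
    "images/0002.png labels/0002.png\n"
)

image_files, label_files = [], []
# delimiter=" " makes the reader split each line on a single space.
for row in csv.reader(sample, delimiter=" "):
    if not row:  # skip blank lines
        continue
    image_file, label_file = row
    image_files.append(image_file)
    label_files.append(label_file)

print(image_files)
print(label_files)
```

One caveat: with `delimiter=" "`, runs of several spaces produce empty fields, so this approach fits files that use exactly one space between columns.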

lm-lily · 2020-06-05T13:50:13Z

@lm-lily Can you remove pandas dependency from setup.cfg?
https://github.com/blue-oil/blueoil/blob/master/setup.cfg#L26

@iizukak
delta-mark still remains in Blueoil (issue #1066).
Is it OK to remove the pandas dependency from setup.cfg before issue #1066 is solved?

iizukak · 2020-06-09T00:07:01Z

@lm-lily OK. Then it's better to remove the DeLTA Mark dataset loader first.
Thank you for trying to solve a lot of issues!

lm-lily · 2020-06-09T14:46:48Z

This PR is on hold until PR #1086 is merged into Blueoil.

CLAassistant · 2020-06-12T06:46:53Z

All committers have signed the CLA.

CLAassistant · 2020-06-12T06:47:08Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

iizukak · 2020-07-02T03:41:32Z

@lm-lily Thank you for the review request. I'll review again soon.

iizukak

@lm-lily
I left some comments.
I could not fully understand some of the code.
I apologize if any of my comments are unreasonable.

iizukak · 2020-07-02T23:48:15Z

blueoil/datasets/pascalvoc_base.py

-            names=['image_id'])
+        image_id = list()
+
+        with open(filename) as f:


iizukak · 2020-07-02T23:50:28Z

blueoil/datasets/pascalvoc_base.py

-            names=['image_id'])
+        image_id = list()
+
+        with open(filename) as f:


Can we use readlines to read the lines as a list?

https://docs.python.org/ja/3/library/codecs.html?highlight=readlines#codecs.StreamReader.readlines

iizukak · 2020-07-02T23:52:21Z

blueoil/datasets/pascalvoc_base.py

-            delim_whitespace=True,
-            header=None,
-            names=['image_id'])
+        image_id = list()


image_id looks like a single value.
How about image_ids?

In reply to #1072 (comment)

read().splitlines() is possible.
I will modify my code to use read().

image_id looks like a single value.
How about image_ids?

Noted.
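The `read().splitlines()` approach mentioned in this thread can be sketched as follows. This is a minimal example under stated assumptions: the function name `read_image_ids` and the sample file contents are hypothetical, and the plural `image_ids` follows the naming suggested above.

```python
import os
import tempfile


def read_image_ids(filename):
    """Return one image ID per line, without trailing newlines."""
    with open(filename) as f:
        return f.read().splitlines()


# Write a small hypothetical ID list to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trainval.txt")
    with open(path, "w") as f:
        f.write("000005\n000007\n000085\n")
    image_ids = read_image_ids(path)

print(image_ids)  # ['000005', '000007', '000085']
```

Unlike `readlines()`, `splitlines()` drops the line terminators, so no extra `rstrip` step is needed.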

iizukak · 2020-07-02T23:58:45Z

blueoil/datasets/pascalvoc_2007.py

@@ -54,12 +54,12 @@ def num_max_boxes(self):
            return 42

    def _annotation_file_from_image_id(self, image_id):
-        annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
+        annotation_file = os.path.join(self.annotations_dir, "{}.xml".format(image_id))


The file name extracted using my submitted code (after removing pandas dataframe method) failed with :06d format.

Sorry, why does this happen?

iizukak · 2020-07-03T00:05:26Z

blueoil/datasets/camvid.py

-        image_files = df.image_files.tolist()
-        label_files = df.label_files.tolist()
+        image_files, label_files = list(), list()
+        with open(filename) as f:


The Python csv module can load space-separated files.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter

It's better to use the standard library than to implement this ourselves.

Same reply as in #1072 (comment); the same applies to #1072 (comment).

iizukak · 2020-07-03T00:06:01Z

blueoil/datasets/camvid.py


-        image_files = df.image_files.tolist()
-        label_files = df.label_files.tolist()
+        image_files, label_files = list(), list()


The Python csv module can load space-separated files.
https://docs.python.org/ja/3/library/csv.html#csv.Dialect.delimiter

It's better to use the standard library than to implement this ourselves.

iizukak · 2020-07-03T00:12:12Z

blueoil/cmd/output_event.py

+    with open(output_csv, "w") as fp:
+        wr = csv.writer(fp)
+        wr.writerow(columns)
+        for row in data_by_row:


Can we use writerows?
https://docs.python.org/ja/3/library/csv.html#csv.csvwriter.writerows

Thank you! It's a good suggestion, I have changed it.
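For reference, a minimal sketch of the `writerows` suggestion. The column names and values are made up for illustration, and an in-memory `io.StringIO` buffer replaces the real `output_csv` file.

```python
import csv
import io

# Hypothetical header and rows standing in for the event data.
columns = ["step", "accuracy", "loss"]
data_by_row = [[100, 0.71, 0.93], [200, 0.78, 0.64]]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)       # header: a single row
writer.writerows(data_by_row)  # all data rows in one call, no explicit loop

print(buffer.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with `newline=""` so that the writer controls line endings itself.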

iizukak · 2020-07-03T00:33:30Z

blueoil/cmd/output_event.py

-    columns.sort()
-    df = pd.DataFrame([], columns=columns)
+
+    values_step_dict = {}


Do we need this initialization code?

It is not necessary. I will remove this.
Thank you for pointing it out.

iizukak · 2020-07-03T00:38:58Z

blueoil/cmd/output_event.py

-    df = df[["step"] + columns]
+            for step in step_list:
+                if step not in values_step_dict:
+                    values_step_dict[step] = ''


The key of values_step_dict looks like a value, not a step. Is that correct?
I'm not sure why the key on this line is step.

@iizukak
I am sorry, this is my mistake: I forgot to change line #37 to (event.value, event.step). The order was reversed.
Thank you for catching this significant mistake!

iizukak

OA

iizukak · 2020-07-20T00:15:11Z

@lm-lily
Thanks. Please wait for the Python readability review.

lm-lily · 2020-07-28T22:15:46Z

@tfujiwar
May I know about the review status? Thank you.

tfujiwar

Thanks for working on this! Sorry for taking a long time... I left some comments.

tfujiwar · 2020-07-28T07:41:44Z

blueoil/datasets/ilsvrc_2012.py

+        linelist = [line.rstrip('\n') for line in open(os.path.join(self.data_dir, 'imagenet_classes.txt'))]
+        return linelist


Could you update the code to close the file?

with open(...) as f:
    return [line.rstrip('\n') for ...]
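A complete version of the pattern suggested above might look like the sketch below. The function name `read_classes` and the sample class names are assumptions for illustration; only the `with open(...)` structure comes from the review comment.

```python
import os
import tempfile


def read_classes(path):
    """Read one class name per line; the file is closed on leaving the block."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# Write a small hypothetical class list to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "imagenet_classes.txt")
    with open(path, "w") as f:
        f.write("tench\ngoldfish\n")
    classes = read_classes(path)

print(classes)  # ['tench', 'goldfish']
```

The context manager closes the file even if an exception is raised while reading, whereas a bare `open(...)` inside a list comprehension leaves closing to the garbage collector.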

tfujiwar · 2020-07-28T07:44:23Z

blueoil/datasets/pascalvoc_2007.py

@@ -54,12 +54,12 @@ def num_max_boxes(self):
            return 42

    def _annotation_file_from_image_id(self, image_id):
-        annotation_file = os.path.join(self.annotations_dir, "{:06d}.xml".format(image_id))
+        annotation_file = os.path.join(self.annotations_dir, "{}.xml".format(image_id))


Thanks for your investigation. I've understood the reason.

tfujiwar · 2020-07-29T03:02:06Z

blueoil/cmd/output_event.py

+    for metrics_key in metrics_keys:
+        if not value_matrix:
+            step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
+            value_matrix.append(step_list)


With these lines, the list of steps comes from the first metrics_key. Is that the same as in the original code? Did you check that the result is the same?

@tfujiwar
Yes, I have verified the result.
The list of steps is the same for every metrics_key, so it is enough to read it once from any metrics_key.

OK. Then what about this code? I think we don't need to use a for-loop.

value_matrix = [sorted(_step_list(event_accumulator, metrics_keys[0]), reverse=True)]

@tfujiwar
This doesn't work: it raises TypeError: 'set' object is not subscriptable, because metrics_keys is a set.
How about the following options?

for metrics_key in metrics_keys:
    step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
    value_matrix.append(step_list)
    break

or

metrics_key = next(iter(metrics_keys))
step_list = sorted(_step_list(event_accumulator, metrics_key), reverse=True)
value_matrix.append(step_list)

Which solution is preferable?

Hmm, I've understood the situation. It might be better to refactor the existing code also but this PR is OK 👍
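The set-indexing problem discussed in this thread can be shown in a few lines. This is a minimal sketch with a made-up `metrics_keys` set; it does not use the real event accumulator.

```python
# Hypothetical metric names; in the PR, metrics_keys is a set.
metrics_keys = {"accuracy", "loss"}

# Sets have no positional indexing, so metrics_keys[0] raises TypeError.
try:
    metrics_keys[0]
except TypeError as error:
    print("TypeError:", error)

# next(iter(...)) returns one element without building a list.
first_key = next(iter(metrics_keys))
print(first_key in metrics_keys)  # True
```

Which element comes back is arbitrary, since sets are unordered. That is acceptable here only because, as stated above, the list of steps is the same for every metrics_key.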

tfujiwar

RA

lm-lily · 2020-07-30T23:06:53Z

/ready

bo-mergebot · 2020-07-30T23:06:54Z

⏳Merge job is queued...

As for issue #484: remove pandas dependency

13bd280

lm-lily requested review from ruimashita and iizukak May 29, 2020 09:28

blueoil-butler bot added the CI: auto-run Run CI automatically label May 29, 2020

lm-lily removed request for ruimashita and iizukak May 29, 2020 09:50

pep8 check fix

ce64e28

lm-lily requested a review from iizukak June 1, 2020 06:34

iizukak reviewed Jun 2, 2020

lm-lily linked an issue Jun 9, 2020 that may be closed by this pull request

remove pandas dependency #484

Closed

lm-lily self-assigned this Jun 9, 2020

pandas removal from setup.cfg

840d345

Merge branch 'master' into issue_484_remove_pandas

a02324e

lm-lily requested a review from iizukak June 30, 2020 08:05

iizukak reviewed Jul 3, 2020

lm-lily and others added 2 commits July 13, 2020 21:52

Improve codes as suggested by reviewer.

5cf97c0

Merge branch 'master' into issue_484_remove_pandas

3fd6b92

lm-lily requested a review from iizukak July 13, 2020 15:01

iizukak approved these changes Jul 20, 2020

iizukak requested a review from tfujiwar July 20, 2020 00:14

revert not-panda-dependency-related code.

957ec5d

lm-lily and others added 3 commits July 20, 2020 14:25

Merge branch 'issue_484_remove_pandas' of https://github.com/blue-oil…

a6ba93e

…/blueoil into issue_484_remove_pandas

to fix filename data type problem after removing pandas library.

4703a9e

Merge branch 'master' into issue_484_remove_pandas

4df9745

tfujiwar reviewed Jul 29, 2020

lm-lily added 2 commits July 29, 2020 15:03

modify coding style as suggested by reviewer.

f675e31

Merge branch 'issue_484_remove_pandas' of https://github.com/blue-oil…

46d1913

…/blueoil into issue_484_remove_pandas

lm-lily requested a review from tfujiwar July 30, 2020 12:58

tfujiwar approved these changes Jul 30, 2020

Merge branch 'master' into issue_484_remove_pandas

e22deac

bo-mergebot bot merged commit 53dae5a into master Jul 30, 2020

bo-mergebot bot deleted the issue_484_remove_pandas branch July 30, 2020 23:30

tfujiwar mentioned this pull request Jul 30, 2020

Enhance the dataset class to accept the dataset in PASCALVOC format with configuration. #1140

Merged

oatawa1 added a commit to oatawa1/blueoil that referenced this pull request Aug 3, 2020

Resolved as in blue-oil#1072 (image_id will be changed from int to str)

25004f1
