Example training code over oxford pet dataset #144

Merged: @eddyxu merged 18 commits into main from lei/train_pet.py on Sep 7, 2022.
@eddyxu (Contributor) commented Sep 6, 2022

A PyTorch Lightning model to train classification models on the Oxford Pet dataset.

@eddyxu force-pushed the lei/train_pet.py branch 2 times, most recently from e22cce3 to fdedf66, September 6, 2022 21:10
@eddyxu changed the base branch from main to lei/pytorch, September 6, 2022 21:11
@eddyxu self-assigned this, Sep 6, 2022
@eddyxu requested a review from changhiskhan, September 6, 2022 21:23
@eddyxu marked this pull request as ready for review, September 6, 2022 21:25
@eddyxu (Contributor, Author) commented Sep 6, 2022

This PR is built on top of the lei/pytorch branch.

@eddyxu (Contributor, Author) commented Sep 7, 2022

Rough benchmarks:

Configuration:

  • g5.xlarge
  • 10 epochs
  • num_workers=4
time ./train_pet.py -F raw s3://eto-public/datasets/oxford_pet/
9:55.60
time ./train_pet.py -F lance s3://eto-public/datasets/oxford_pet/oxford_pet.lance/train/
3:35.39
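A quick check of what the two timings imply, assuming they are total elapsed wall-clock times (minutes:seconds) for the whole 10-epoch run. Because each total also includes process startup and any fixed overhead, the per-epoch figures below are only a rough average, not a measured epoch time.

```python
def parse_elapsed(text: str) -> float:
    """Convert an elapsed time such as '9:55.60' (minutes:seconds) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


EPOCHS = 10  # from the configuration above

raw_s = parse_elapsed("9:55.60")    # -F raw
lance_s = parse_elapsed("3:35.39")  # -F lance
speedup = raw_s / lance_s

print(f"raw:   {raw_s:.2f} s total, {raw_s / EPOCHS:.2f} s/epoch (avg)")
print(f"lance: {lance_s:.2f} s total, {lance_s / EPOCHS:.2f} s/epoch (avg)")
print(f"speedup: {speedup:.2f}x")
```

With these numbers the Lance run finishes roughly 2.8x faster end to end (595.60 s vs 215.39 s).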

The Dataset / Scanner does not work well with partition pruning, though.

@changhiskhan (Contributor) left a comment:

A couple of nits.

@@ -173,6 +173,15 @@ class PlainDecoderImpl : public Decoder {
return fmt::format("PlainEncoder({})", type_->ToString());
}

protected:

@changhiskhan (Contributor):

Just curious, why protected instead of private?

@eddyxu (Contributor, Author):

This one is used by BooleanDecoder as well.

@@ -78,8 +78,8 @@ def read_metadata(self, check_quality=False) -> pd.DataFrame:
no_index = pd.Index(names.values).difference(df.filename)
self._data_quality_issues["missing_index"] = no_index

# TODO lance doesn't support writing booleans yet
with_xmls['segmented'] = with_xmls.segmented.astype(pd.Int8Dtype())
with_xmls['segmented'] = with_xmls.segmented.apply(

@changhiskhan (Contributor):

with_xmls.segmented.astype('boolean') should work (https://pandas.pydata.org/docs/user_guide/boolean.html), no?

@eddyxu (Contributor, Author):

Yup, I tried, but it does not work: the segmented column mixes strings and float NaN.
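The astype('boolean') failure discussed above can be reproduced with a small column that mixes strings and float NaN. The sample values ("0"/"1" strings, NaN where no XML exists) are an assumption for illustration, and the to_numeric-then-Int8 route is one possible workaround, not the code this PR merged (its replacement lines are truncated in the diff above).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the parsed XML column: strings plus NaN for missing XMLs.
segmented = pd.Series(["1", "0", np.nan, "1"])

# pandas' nullable "boolean" dtype only accepts bool-like values, so strings are rejected.
try:
    segmented.astype("boolean")
    boolean_cast_failed = False
except (TypeError, ValueError):
    boolean_cast_failed = True

# Workaround sketch: parse the strings as numbers (NaN stays NaN, giving float64),
# then cast to nullable Int8 so missing values become <NA> instead of NaN.
as_int8 = pd.to_numeric(segmented).astype(pd.Int8Dtype())

# Once the values are numeric, a nullable boolean cast also works if it is ever needed.
as_bool = as_int8.astype("boolean")

print(boolean_cast_failed)
print(as_int8.tolist())
print(as_bool.tolist())
```

Keeping Int8 rather than boolean matches the TODO in the diff, which notes that Lance could not write booleans at the time.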

Base automatically changed from lei/pytorch to main September 7, 2022 17:43
@eddyxu merged commit 106133c into main Sep 7, 2022
@eddyxu deleted the lei/train_pet.py branch September 7, 2022 18:09