Example training code over oxford pet dataset #144

eddyxu · 2022-09-06T21:08:45Z

A PyTorch Lightning model to train classification models over the Oxford Pet dataset.

eddyxu · 2022-09-06T21:32:48Z

This PR is built on top of the lei/pytorch branch.

eddyxu · 2022-09-07T05:33:25Z

Rough Benchmarks:

Configuration

g5.xlarge
10 epochs
num_workers=4

time ./train_pet.py -F raw s3://eto-public/datasets/oxford_pet/
9:55.60
time ./train_pet.py -F lance s3://eto-public/datasets/oxford_pet/oxford_pet.lance/train/
3:35.39
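
For a rough sense of scale, the two timings above can be compared directly. This is only a sketch: it assumes the figures are wall-clock times in minutes:seconds form as printed by the shell's time, and parse_elapsed is a hypothetical helper, not part of train_pet.py.

```python
def parse_elapsed(text: str) -> float:
    """Convert an elapsed time such as '9:55.60' (assumed minutes:seconds) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Timings copied from the benchmark runs above.
raw_seconds = parse_elapsed("9:55.60")
lance_seconds = parse_elapsed("3:35.39")

# Ratio of raw-format time to Lance-format time for the same 10-epoch run.
speedup = raw_seconds / lance_seconds
print(f"raw: {raw_seconds:.2f}s, lance: {lance_seconds:.2f}s, speedup: {speedup:.2f}x")
# prints "raw: 595.60s, lance: 215.39s, speedup: 2.77x"
```

This is a single run per format, so the ratio should be read as indicative rather than a careful measurement.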

The Dataset / Scanner does not work well with partition pruning, though.

changhiskhan

couple of nits

changhiskhan · 2022-09-07T17:16:53Z

cpp/src/lance/encodings/plain.cc

@@ -173,6 +173,15 @@ class PlainDecoderImpl : public Decoder {
    return fmt::format("PlainEncoder({})", type_->ToString());
  }

+ protected:


just curious why protected instead of private?

This one is used by BooleanDecoder as well.

changhiskhan · 2022-09-07T17:20:47Z

python/benchmarks/parse_pet.py

@@ -78,8 +78,8 @@ def read_metadata(self, check_quality=False) -> pd.DataFrame:
            no_index = pd.Index(names.values).difference(df.filename)
            self._data_quality_issues["missing_index"] = no_index

-        # TODO lance doesn't support writing booleans yet
-        with_xmls['segmented'] = with_xmls.segmented.astype(pd.Int8Dtype())
+        with_xmls['segmented'] = with_xmls.segmented.apply(


with_xmls.segmented.astype('boolean') should work (https://pandas.pydata.org/docs/user_guide/boolean.html), no?

Yup, I tried, but it does not work: the segmented column is a mix of strings and float NaN.
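
To illustrate the point in this thread, here is a minimal sketch of converting a column that mixes strings and float NaN into pandas' nullable boolean dtype. The sample values and the to_nullable_bool helper are assumptions made up for illustration; the actual contents of the segmented column and the function passed to apply in this PR are not shown in the diff above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `segmented` column: strings mixed with float NaN.
segmented = pd.Series(["1", "0", np.nan, "1"], dtype=object)

# A direct cast is attempted only to record whether it succeeds on this pandas version.
try:
    segmented.astype("boolean")
    direct_cast_ok = True
except (TypeError, ValueError):
    direct_cast_ok = False


def to_nullable_bool(value):
    """Map a string flag to True/False and keep missing values as pd.NA."""
    if pd.isna(value):
        return pd.NA
    return str(value).strip().lower() in {"1", "true", "yes"}


# Element-wise mapping first, then a cast to the nullable boolean dtype.
converted = segmented.apply(to_nullable_bool).astype("boolean")
print(direct_cast_ok, converted.dtype)
```

Mapping each element explicitly avoids relying on how astype interprets strings, and pd.NA keeps missing entries distinct from False.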

eddyxu force-pushed the lei/train_pet.py branch 2 times, most recently from e22cce3 to fdedf66 September 6, 2022 21:10

eddyxu changed the base branch from main to lei/pytorch September 6, 2022 21:11

eddyxu self-assigned this Sep 6, 2022

eddyxu requested a review from changhiskhan September 6, 2022 21:23

eddyxu marked this pull request as ready for review September 6, 2022 21:25

eddyxu added the python, benchmark, and PyTorch (PyTorch support) labels Sep 7, 2022

changhiskhan approved these changes Sep 7, 2022

Base automatically changed from lei/pytorch to main September 7, 2022 17:43

eddyxu and others added 17 commits September 7, 2022 10:50

add pytorch dataset

2346f80

pass batch size via scanner

9a5138e

fix tests

2c716ff

fix py38 compatiblity

16e8945

address comments

78cbd5a

add train pet benchmark

e9eaf15

use torch dataset

5f2aadc

add train loop

861b61c

cleanup

763ac6c

training loop for classifcation

03d1fcf

revert unnecessary change

5c3a7f1

tune DataLoader number of workers

a701460

read raw dataset

1697a38

make sure read images in 3 channels

6b0ca73

fix label type

936642f

fixes

8528b31

support filters

fd73cdf

fix optional check

0d1b2a7

eddyxu force-pushed the lei/train_pet.py branch from 28a0c02 to 0d1b2a7 September 7, 2022 17:54

eddyxu merged commit 106133c into main Sep 7, 2022

eddyxu deleted the lei/train_pet.py branch September 7, 2022 18:09
