read data from an hdf rather than a csv #29
@@ -270,7 +270,7 @@ def generate_missing_data(column: pd.Series, *_: Any) -> pd.Series:
     :returns: pd.Series of empty strings with the index of column.
     """

-    return pd.Series("", index=column.index)
+    return pd.Series(pd.NA, index=column.index)


 def generate_typographical_errors(
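To illustrate what the change above means for downstream code, here is a minimal sketch comparing the two return values. The index and variable names are made up for the example and do not come from the repository.

```python
import pandas as pd

# Hypothetical index standing in for column.index
index = pd.RangeIndex(3)

as_empty_strings = pd.Series("", index=index)   # old return value
as_missing = pd.Series(pd.NA, index=index)      # new return value

# Empty strings are ordinary values, so isna() does not flag them.
empty_flagged = int(as_empty_strings.isna().sum())
# pd.NA is recognised by isna(), so these entries can be detected as missing.
na_flagged = int(as_missing.isna().sum())

print(empty_flagged, na_flagged)
```

With `pd.NA`, callers can rely on `isna()` or `dropna()` rather than comparing against `""`.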
@@ -322,6 +322,7 @@ def keyboard_corrupt(truth, corrupted_pr, addl_pr, rng):
     include_original_token_level = configuration.include_original_token_level

     rng = np.random.default_rng(seed=randomness_stream.seed)
+    column = column.astype(str)
This will convert any NaNs to "nan" and proceed to corrupt that. We shouldn't have any NaNs at this point though, right? Because those get dropped up front when this gets called?
Yes, that's correct, but it's definitely great to call this out.
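For reference, a minimal sketch of the "drop missing values up front" pattern discussed in this thread. `noise_non_missing` and the `str.upper` stand-in are hypothetical and are not the project's actual call path; the point is only that the string cast never sees a missing entry.

```python
import numpy as np
import pandas as pd


def noise_non_missing(column: pd.Series, corrupt) -> pd.Series:
    """Hypothetical helper: apply `corrupt` only to non-missing entries."""
    present = column.dropna()                  # missing values never reach astype(str)
    noised = present.astype(str).map(corrupt)
    # Reindexing restores the original positions; dropped entries stay missing.
    return noised.reindex(column.index)


column = pd.Series(["anna", np.nan, "bob"], dtype=object)
result = noise_non_missing(column, str.upper)  # str.upper stands in for the noise function
print(result)
```

Because the missing entry is removed before the cast, it is never turned into a literal string and never corrupted.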

     for idx in column.index:
         noised_value = keyboard_corrupt(
             column[idx],
Was this actually causing a problem or do you just find this more readable?
This just made debugging easier since I could put a breakpoint between the function call and the assignment to the series.
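A small sketch of the pattern described here, assuming a toy stand-in for the noise function. `toy_corrupt` is hypothetical and only swaps the first two characters. It is not `keyboard_corrupt`.

```python
import pandas as pd


def toy_corrupt(value: str) -> str:
    """Hypothetical stand-in for the real noise function."""
    return value[1] + value[0] + value[2:] if len(value) > 1 else value


column = pd.Series(["anna", "bob"], index=[10, 20])

for idx in column.index:
    # Holding the result in a local variable leaves room for a breakpoint
    # between the function call and the write back into the series.
    noised_value = toy_corrupt(column.loc[idx])
    column.loc[idx] = noised_value

print(column.tolist())
```

The behaviour matches assigning the call result directly. The intermediate name only makes the value inspectable before it is stored.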