Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging and progress bar for generation of forms #67

Merged
merged 2 commits into from
Apr 14, 2023

Conversation

mattkappel
Copy link
Contributor

@mattkappel mattkappel commented Apr 14, 2023

Add logging and progress bar for generation of forms

Description

  • Category: feature
  • JIRA issue: MIC-3987

Changes

  • Adds tqdm to setup requirements
  • Adds tqdm progress bar over the main noise_form loop
  • Adds a logging message on opening a data file
  • Sets default log level to INFO, configurable to DEBUG if verbose=True on an interface

Testing

Interactive output looks as expected:

>>> import pseudopeople as pp
>>> data = pp.generate_decennial_census()
2023-04-14 10:16:37.443 | INFO     | pseudopeople.interface:_generate_form:62 - Loading data from /Users/mkappel/git/pseudopeople/pseudopeople/src/pseudopeople/data/sample_forms/decennial_census_observer/decennial_census_observer.parquet.
Applying noise: 100%|███████████████████████████| 8/8 [00:00<00:00, 53.98type/s]
>>> data2 = pp.generate_taxes_w2_and_1099()
2023-04-14 10:19:22.760 | INFO     | pseudopeople.interface:_generate_form:62 - Loading data from /Users/mkappel/git/pseudopeople/pseudopeople/src/pseudopeople/data/sample_forms/tax_w2_observer/tax_w2_observer.parquet.
Applying noise: 100%|███████████████████████████| 8/8 [00:00<00:00, 25.11type/s]

@mattkappel mattkappel requested a review from a team as a code owner April 14, 2023 17:21
@@ -48,7 +49,7 @@ def noise_form(
randomness = get_randomness_stream(form.name, seed)

noise_configuration = configuration[form.name]
for noise_type in NOISE_TYPES:
for noise_type in tqdm(NOISE_TYPES, desc="Applying noise", unit="type"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love tqdm - it's just so easy. But what is unit="type" doing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it is doing xxx type/s. Hmm...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe unit="noise types"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default it "it" so instead of "it/sec" it says "type/sec". I mean "type" to be noise type, but it could be any string we want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep it short to save on the terminal space. There is already the description of "Applying noise". There should be no ambiguity there...

Copy link
Collaborator

@rmudambi rmudambi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two notes:

  • We should set the default log level to INFO if it is not already at that level. That should be part of this PR
  • How does this work when there are multiple files being noised (i.e. say there were two decennial census files in the sample data directory rather than 1)? When they run on the full data they will have 334 files. This does not need to be resolved in this PR.

@stevebachmeier stevebachmeier self-requested a review April 14, 2023 17:30
@stevebachmeier
Copy link
Contributor

  • How does this work when there are multiple files being noised (i.e. say there were two decennial census files in the sample data directory rather than 1)? When they run on the full data they will have 334 files. This does not need to be resolved in this PR.

tqdm is currently wrapping the actual noising loop, so each of the 334 files would get their own progress bar one after another since it's done in serial.

@mattkappel
Copy link
Contributor Author

@rmudambi What Steve said. I tried wrapping the file reading with tqdm, too, but that got messy, since the loops are intertwined.

I'm still looking into where/how we ought to set the log level.

@mattkappel mattkappel force-pushed the feature/mic-3987-add-tqdm-logging branch from b7bab14 to c3dad05 Compare April 14, 2023 18:23
@mattkappel mattkappel merged commit 96a6b90 into develop Apr 14, 2023
@mattkappel mattkappel deleted the feature/mic-3987-add-tqdm-logging branch April 14, 2023 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants