Mock qa data #6465
Conversation
Passing run #21870 (Review all test suite changes for PR #6465)
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff            @@
##           master    #6465    +/-   ##
========================================
- Coverage   93.04%   92.94%    -0.10%
========================================
  Files         264      264
  Lines       10303    10303
  Branches      835      835
========================================
- Hits         9586     9576       -10
- Misses        589      598        +9
- Partials      128      129        +1

View full report in Codecov by Sentry.
Cool.
Can we have an issue number, a changelog entry, etc.?
I'd also dispute that the qa data is mock - it is real qa about mock data, which is kind of different (i.e. we're actually doing the checks instead of just filling in some made up numbers), so maybe change the names to reflect that.
@@ -35,6 +35,7 @@ defaults:
   FLOWAPI_FLOWDB_USER: flowapi
   FLOWMACHINE_FLOWDB_PASSWORD: foo
   FLOWAPI_FLOWDB_PASSWORD: foo
+  TEST_QA_CHECK: 1
Hm. I kind of feel like this should default to on really.
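For reference, one way the flag could be flipped to default-on while still allowing an explicit opt-out (a sketch only: the variable name comes from the PR diff, but the accepted "off" values here are an assumption):

```shell
# Default TEST_QA_CHECK to on when unset; callers can still disable
# the checks explicitly (the opt-out values here are hypothetical).
TEST_QA_CHECK="${TEST_QA_CHECK:-1}"

case "$TEST_QA_CHECK" in
    0|false|no|"") run_qa_checks=no ;;   # explicit opt-out
    *)             run_qa_checks=yes ;;  # default: checks run
esac

echo "run_qa_checks=$run_qa_checks"
```

Note that the PR's `[ "$TEST_QA_CHECK" ]` test treats any non-empty value (including `0`) as on, which is why a default-on scheme needs to name its off values explicitly.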
@@ -76,3 +76,6 @@ else
   echo "Must set SYNTHETIC_DATA_GENERATOR environment variable to 'sql' or 'python'."
   exit 1
 fi
+if [ "$TEST_QA_CHECK" ]; then
+  pipenv run python ./run_qa_checks.py --dates ${DISASTER_END:-"2015-01-01"} --event-types calls mds sms
Do we not want to just run it for all the dates in the database really?
And all event types
Do we have internal sources of truth for those? I'd guess it's etl.available_dates for dates, but I'm not sure what it would be for the various flavours of events.
etl.etl_records or available_tables
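A sketch of deriving the dates and event types from the database instead of hardcoding them, per the suggestion above. sqlite stands in for flowdb here, and the column names for etl.available_dates (cdr_type, cdr_date) are an assumption; the real script would query postgres via SQLAlchemy:

```python
import sqlite3

# Stand-in for flowdb; the available_dates shape is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE available_dates (cdr_type TEXT, cdr_date TEXT)")
conn.executemany(
    "INSERT INTO available_dates VALUES (?, ?)",
    [("calls", "2015-01-01"), ("sms", "2015-01-01"), ("calls", "2015-01-02")],
)

rows = conn.execute("SELECT cdr_type, cdr_date FROM available_dates").fetchall()

# Derive the full sets rather than passing --dates / --event-types by hand.
event_types = sorted({cdr_type for cdr_type, _ in rows})
dates = sorted({date for _, date in rows})
```

The same pattern would work against available_tables for the event-type list.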
flowdb/testdata/bin/run_qa_checks.py
Outdated
from itertools import product
from pathlib import Path
from typing import List
from jinja2 import Environment, PackageLoader, Template
Needs adding to the Pipfile and a relock.
flowdb/testdata/bin/run_qa_checks.py
Outdated
# NOTE: given this gets run _after_ all the ingestion, 'dates' may be an irrelevance
# unless final_table is normally one of the dated partitions
The final table will almost always be a dated partition table.
flowdb/testdata/bin/run_qa_checks.py
Outdated
    for date, type, template in product(qa_scn.dates, qa_scn.tables, templates)
)

with engine.connect() as conn:
iirc you need an explicit commit now, or to use engine.begin().
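For reference, the SQLAlchemy 2.x behaviour being described: engine.begin() commits on successful exit, while engine.connect() needs an explicit conn.commit() after writes. A minimal, self-contained illustration (the table name here is invented):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction and commits it automatically on
# successful exit, so no explicit conn.commit() is needed here.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE qa_outcomes (outcome TEXT)"))
    conn.execute(text("INSERT INTO qa_outcomes VALUES ('pass')"))

# With engine.connect(), writes would need conn.commit() in 2.x;
# plain reads are fine either way.
with engine.connect() as conn:
    rows = conn.execute(text("SELECT outcome FROM qa_outcomes")).fetchall()
```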
flowdb/testdata/bin/run_qa_checks.py
Outdated
db_port = os.getenv("POSTGRES_PORT", "9000")
db_password = os.getenv(
    "POSTGRES_PASSWORD", "flowflow"
)  # In here for dev, don't think we'll need it in prod
You mean for locally running this script I assume?
Yep
Use os.environ I think, because you don't esp want it diverging from development_environment.
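The distinction being suggested, illustrated with the standard library (the variable names mirror the PR; the unset variable is invented for the example):

```python
import os

# Simulate the container environment for the sake of the example.
os.environ["POSTGRES_PASSWORD"] = "flowflow"

# os.getenv with a default silently papers over a missing variable,
# which is how the script's settings could drift from development_environment.
password_lenient = os.getenv("POSTGRES_PASSWORD", "flowflow")

# os.environ[...] fails fast with KeyError when the variable is unset.
password_strict = os.environ["POSTGRES_PASSWORD"]

try:
    os.environ["SOME_UNSET_VARIABLE_XYZ"]
    unset_raised = False
except KeyError:
    unset_raised = True
```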
flowdb/testdata/bin/run_qa_checks.py
Outdated
env = Environment(loader=PackageLoader("flowetl", "qa_checks/qa_checks"))

# @James; if something like this already exists, I'll grab
It is in the operator, so I think I'd stick to what you have here.
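For context, a sketch of the template-rendering pattern the script uses: render one QA query per (date, table) combination. DictLoader stands in for the PR's PackageLoader("flowetl", "qa_checks/qa_checks"), and the template body and table-naming scheme here are invented for illustration:

```python
from itertools import product
from jinja2 import Environment, DictLoader

# Inline template standing in for the qa_checks directory (hypothetical body).
env = Environment(
    loader=DictLoader({"count_rows.sql": "SELECT count(*) FROM {{ final_table }};"})
)

dates = ["2015-01-01"]
tables = ["calls", "sms"]

# One rendered check per (date, table) pair, as in the script's product() loop.
queries = [
    env.get_template("count_rows.sql").render(
        final_table=f"events.{table}_{date.replace('-', '')}"
    )
    for date, table in product(dates, tables)
]
```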
flowdb/Pipfile
Outdated
@@ -8,6 +8,7 @@ psutil = "*"

 [dev-packages]
 black = {extras = ["jupyter"],version = "==24.2.0"}
+jinja2 = "*"
Should go in flowdb/testdata/test_data rather than the root one; it isn't needed in the base image, only the test/synth ones.
POSTGRES_DB and POSTGRES_PORT aren't in development_environment, so they're hardcoded. Not sure if this is a good idea for POSTGRES_PORT.
@@ -7,6 +7,7 @@ name = "pypi"
 tohu = "*"
 "geoalchemy2" = "*"
 sqlalchemy = "*"
+jinja2 = "*"
I think you'll need to add to testdata/synthetic_data/Pipfile as well.
flowdb/testdata/bin/run_qa_checks.py
Outdated
def get_available_tables(engine: Engine):
    with engine.begin() as conn:
        tables = conn.execute(
            "SELECT table_name FROM available_tables WHERE has_subscribers"
Could also just get this from the available dates one now I think of it, but this is good too :)
flowdb/testdata/bin/run_qa_checks.py
Outdated
import argparse

env = Environment(loader=PackageLoader("flowetl", "qa_checks/qa_checks"))
Presumably we need to copy these into the container as well?
Just sticking them into the dockerignore should grab it, right?
Happy now 👌
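For reference, re-including files that a .dockerignore would otherwise exclude is done with a `!` pattern. A hedged sketch only: the paths below are hypothetical and would need to match the repo's actual layout and build context:

```
# .dockerignore sketch (paths are assumptions): re-include the QA check
# templates so PackageLoader can find them inside the image
flowetl/*
!flowetl/flowetl/qa_checks/
```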
Closes #6467
Description
Adds the option to run QA checks on mock data, on by default