Removing Sample Csv Data Loader and Cleaning up Sample Data Loader #206
Small nit, but as you mentioned, we should deprecate the csv sample data loader to avoid this kind of confusion...
hive,gold,test_schema,test_table2,col1,[email protected],10
hive,gold,test_schema,test_table2,col1,[email protected],10
hive,gold,test_schema,test_table1,col1,[email protected],500
hive,gold,test_schema,test_table1,col1,[email protected],100
Is usage the main change here?
Yeah, sorry about this. It should probably be in its own MR. While testing I noticed that popular tables no longer show up; I believe this is due to the requirement that a table have a minimum of 10 unique users. I was lazy and thought I would add the fix to this MR as well.
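To illustrate the threshold I mean, here is a rough sketch. It is not the actual popularity query; the column order is assumed from the sample rows above, and `popular_tables`, `MIN_UNIQUE_USERS`, the key format, and the `example.com` addresses are all made up for the example.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, inferred from the sample usage rows in this thread.
FIELDS = ["database", "cluster", "schema", "table", "column", "user_email", "read_count"]

# Threshold mentioned in the thread; the constant name is hypothetical.
MIN_UNIQUE_USERS = 10


def popular_tables(usage_csv, min_users=MIN_UNIQUE_USERS):
    """Return {table_key: total_reads} for tables read by at least min_users distinct users."""
    users = defaultdict(set)
    reads = defaultdict(int)
    for row in csv.DictReader(io.StringIO(usage_csv), fieldnames=FIELDS):
        # Illustrative key format only.
        key = "{database}://{cluster}.{schema}/{table}".format(**row)
        users[key].add(row["user_email"])
        reads[key] += int(row["read_count"])
    return {k: reads[k] for k in users if len(users[k]) >= min_users}


# Synthetic data: test_table1 gets 10 distinct readers, test_table2 only 2.
rows = ["hive,gold,test_schema,test_table1,col1,user%d@example.com,100" % i for i in range(10)]
rows += ["hive,gold,test_schema,test_table2,col1,user%d@example.com,10" % i for i in range(2)]
result = popular_tables("\n".join(rows))
print(result)
```

With only a couple of distinct users per table in the sample data, nothing clears a threshold like this, which matches what I saw in the frontend.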
If it's OK, I created this issue, which I can definitely work on, to remove sample_csv_data_loader. It would involve refactoring sample_data_loader to load CSVs the way sample_csv_data_loader does, which I believe will make it easier to understand and maintain: amundsen-io/amundsen#316
@samshuster sounds good, I am OK either way: 1. merge this PR first, then open another PR for the refactor; 2. use the same PR for the refactor as well. Your call :)
OK, sure, I will do the refactor in this PR. Working on that now.
Removed sample_csv_data_loader.py and simplified sample_data_loader.py. Finally, modified sample data so that the popular table shows by default again.
bc14897 to aff0447
OK @feng-tao, it is refactored. I know the diff looks scary, but it does successfully load neo4j, and I confirmed everything looks good in the frontend.
Thanks @samshuster for the hard work. Will take a look. cc @jinhyukchang
LGTM. I updated a comment to make clear that we use the csv extractor for demo purposes.
* Patch to fix backward incompatible issue on table metadata (amundsen-io#195)
* Backward compatible patch
* Increment version
* Update
* escaping backslashes in neo4j publisher (amundsen-io#191)
* escaping backslashes in neo4j publisher
* removing an extra backslash
* Bumping versions to 1.5.3
* Mode Dashboard extractor with Generic REST API Query (amundsen-io#194)
* Initial check in on REST API Query
* Working version
* docstring
* Update
* Update
* Make unit test happy
* Update docstring
* Update
* Update
* Update
* Adding unit tests
* Updated README.md
* jsonpath_rw to extra_requires
* update models for 2.0.0 (amundsen-io#188)
* update models for 2.0.0 schema_name -> schema in table model update search model to use schema, last_updated_timestamp
* update name -> full_name
* Mode Dashboard execution timestamp and state (amundsen-io#197)
* Initial commit
* Update
* Update
* Update
* flake8
* Update docstring
* Update
* Update
* Add mode_dashboard_constants and mode_dashboard_utils
* Add more to Util
* Added back model class to ModeDashboardExtractor
* flake8
* Address PR comments
* Fix Sample Scripts and Data (amundsen-io#199)
* Change schema_name -> schema in sample_table_programmatic_source.csv
* Update schema_name -> schema in sample_data_loader
* Change name -> full_name in sample_user.csv
* Change name -> full_name in sample_data_loader.py
* Retrigger CLA Check
* Fix broken HiveTableMetadataExtractor. (amundsen-io#201)
* Fix.
* Update corresponding tests.
* Replace schema_name with and fix presto as well.
* Replace unit test with schema, since we renamed schema_name back to .
* Update setup.py
Co-authored-by: Tao Feng <[email protected]>
* Follow up patches on v2 field name changes (amundsen-io#202)
* Mode dashboard owner (amundsen-io#200)
* Mode dashboard owner
* Not having dashboard_owner to create User node
* Update
* Support unicode in file_system_neo4j_csv_loader (amundsen-io#203)
* Support unicode in file_system_neo4j_csv_loader
* increment version
* Update
* Update
* Let Python3 use csv not unicodecsv
* Add badges to Neo4jExtractor and elastic search (amundsen-io#204)
* Add badges to Neo4jSearchExtractor
* update publisher to have badges
* update elastic search document
* fix typo
* update name
* filter tags by type
* typo
* do not filter tags because then i can't get badges on staging :|
* update tests
* fix tests
* use amunsen_common for elastic search index
* revert commit using amundsencommon
* add comment
* make backwards compatible
* remove badges from tags
* Mode Dashboard last successful execution (amundsen-io#205)
* Mode Dashboard last successful execution
* Increment version
* Remove py27 Antlar presto sql usage extractor (amundsen-io#208)
* Remove py27 Antlar presto sql usage extractor
* remove more
* remove pytest macro
* Removing Sample Csv Data Loader and Cleaning up Sample Data Loader (amundsen-io#206)
* refactored TableColumnCsvExtractor to the csv_extractors file. removing sample_csv_data_loader.py simplified sample_data_loader.py Finally, modified sample data so that popular table shows by default again.
* Update sample_data_loader.py
Co-authored-by: Tao Feng <[email protected]>
* Added created timestamp and last modified timestamp on Dashboard (amundsen-io#207)
* Added created timestamp and last modified timestamp on Dashboard
* Remove owner and reload time from dashboard_metadata
* Move dashboard_metadata under dashboard package
* Update
* Update
* Added dashboard group url and dashboard url
* Added cluster node in Dashboard
* Optimize Neo4j Cypher query on Neo4jSearchDataExtractor (amundsen-io#213)
* Optimize Neo4j Cypher query on Neo4jSearchDataExtractor
* flake8
* Add Dashboard Table model (amundsen-io#210)
* Add Dashboard Table model
* Update dashboard_table.py
* Update test_dashboard_table.py
* bump version
* Remove antlr4 deps (amundsen-io#214)
* Update dashboard ES doc (amundsen-io#218)
* Remove neo4j dashboard/metric extractor (amundsen-io#219)
* Add default cypher query for user/dashboard entity to search extractor (amundsen-io#220)
* Update the query publish tag to match entity (amundsen-io#221)
* issue-320 documentation of models (amundsen-io#217)
* Start if issue-320 documentatino of models
* adding more documentation
* Adding BigQuery to the readme (amundsen-io#222)
* Adding Bigquery to the readme
* fixing link
* Rename sample_dag.py to hive_sample_dag.py (amundsen-io#223)
To clarify what this sample is about.
A year ago this folder only had one file :)
* Adding Mode dashboard usage extractor and generic loader (amundsen-io#225)
* Adding Mode dashboard usage extractor and generic loader
* Update
* Increment version
* Add query and chart on Dashboard (amundsen-io#224)
* Add query and chart on Dashboard
* Update
* Increment version
* Update test case
* Add title field in user model (amundsen-io#227)
* Add title field in user model
* fix one more test
* Slight change from title to role name (amundsen-io#228)
* Update python version for databuilder (amundsen-io#229)
* Dashboard usage model (amundsen-io#226)
* Added Dashboard usage model
* Update
* Update
* Update
* Clarify ES publisher warning on first run (amundsen-io#230)
* Add role_name to user ES doc (amundsen-io#232)
* minor mods to get tests to pass
* tweak schema field after testing demo execution
* move Table into simple_sql_parser now that we're not trying to conform to a missing presto parser
* add role_name to sample_data_loader
Co-authored-by: Jin Hyuk Chang <[email protected]>
Co-authored-by: Luke Lowery <[email protected]>
Co-authored-by: friendtocephalopods <[email protected]>
Co-authored-by: Jacob Kim <[email protected]>
Co-authored-by: Robert Yi <[email protected]>
Co-authored-by: Tao Feng <[email protected]>
Co-authored-by: christina stead <[email protected]>
Co-authored-by: Tao Feng <[email protected]>
Co-authored-by: samshuster <[email protected]>
Co-authored-by: jornh <[email protected]>
Summary of Changes
Refactored TableColumnCsvExtractor to the csv_extractors file.
Added TableColumnCsvExtractor to sample_csv_data_loader.py, and also fixed another issue in sample_csv_data_loader.py relating to the index.
Finally, modified the sample data so that the popular table shows by default again.
I would recommend that we actually replace sample_data_loader with sample_csv_data_loader in the future.
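For reviewers unfamiliar with the idea, a minimal sketch of what a combined table/column CSV extractor might do, assuming (from the name alone) that it joins a table CSV with a column CSV. This is not the code in csv_extractors; `extract_tables` and the header names `name`, `schema`, `table_name`, and `sort_order` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def extract_tables(table_csv, column_csv):
    """Yield one dict per table row, with its column names attached in sort order."""
    # Group column rows by the table they belong to.
    columns = defaultdict(list)
    for col in csv.DictReader(io.StringIO(column_csv)):
        columns[col["table_name"]].append(col)

    # Emit each table together with its ordered column names.
    for table in csv.DictReader(io.StringIO(table_csv)):
        cols = sorted(columns[table["name"]], key=lambda c: int(c["sort_order"]))
        yield {
            "name": table["name"],
            "schema": table["schema"],
            "columns": [c["name"] for c in cols],
        }


# Hypothetical sample data, not the repository's CSV files.
TABLES = "name,schema\ntest_table1,test_schema\ntest_table2,test_schema\n"
COLUMNS = (
    "name,table_name,sort_order\n"
    "col2,test_table1,2\n"
    "col1,test_table1,1\n"
    "col1,test_table2,1\n"
)

tables = list(extract_tables(TABLES, COLUMNS))
print(tables)
```

Keeping the join in one extractor means the loader script only has to point it at two files, which is the simplification I am after.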
Tests
No tests modified
Documentation
No documentation modified.
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test