Fix an error in importing sample data #11077
[REVIEW NOTIFICATION] This pull request has not been approved. To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
I'll review this next week. I think some more changes are needed.
I think we should consider the following:
It might be good to consider if we need to combine these or not. My take on it is that we should not do this as the procedure for TiDB Cloud uses the cloud UI to import and uses data that was already converted to dumpling format instead of the original CSV data. We could use the S3 hosted files for the onprem example as well as
The referenced issue links to https://docs.pingcap.com/tidb/stable/quick-start-with-tidb on which there are examples for "MySQL Client" and Side note: Maybe we should add a note about
People that are just starting with TiDB are likely to come from a MySQL background. This means that they might already be familiar with "MySQL Client", While showing the functionality of Lightning it might also be good to not overwhelm them with new tools. Maybe we should list both the
If we use
The example uses
I think the goal of importing the sample data is:
https://s3.amazonaws.com/capitalbikeshare-data/index.html has data from 2010 until last month (2022-09). Maybe we should use the newer data. Note that the newer files have different fields.
To see the full benefits of TiDB, it might be necessary to load more years of data than the example uses. Maybe list the commands to import all data as well.
Might be good to give an example about TiFlash with this data (adding replica and explain, etc)
OSS Insight is a full example as it has the data and an open-source application. Maybe we want to either use OSS Insight data here or create a small example frontend for the bikeshare data to give a more complete example. If we do this, we could use some of the example queries from the TiDB Cloud Playground functionality. Many of these points are probably out of scope for this specific PR. For now I think for this PR we should:
PR needs rebase.
Co-authored-by: Daniël van Eeden <[email protected]>
This seems fine. However, if I recall correctly, not all CSV files have the same set of columns, as the format changed at some point in time. Does this work correctly for all CSV files?
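One way to check the column question before importing is to group the downloaded files by their header row. The following is only a minimal sketch, assuming the CSV files sit unpacked in one local directory and each starts with a header line; the function name `group_files_by_header` is hypothetical and not part of this PR or of any TiDB tool.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_files_by_header(directory):
    """Map each distinct header row (as a tuple) to the CSV files that use it.

    Hypothetical helper for illustration; `directory` is assumed to hold
    the downloaded and unpacked CSV files.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(handle), [])
        groups[tuple(header)].append(path.name)
    return dict(groups)
```

If the returned dictionary has more than one key, the files do not all share the same columns, and the import procedure would need to be checked against each group separately.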
Co-authored-by: Daniël van Eeden <[email protected]>
Yes. I did a quick test using the procedure and I could import all the files:
mysql> SELECT COUNT(*) + 26 AS adjusted_row_count FROM bikeshare.trips;
+--------------------+
| adjusted_row_count |
+--------------------+
| 19117669 |
+--------------------+
1 row in set (0.00 sec)
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hfxsd. The full list of commands accepted by this bot can be found here. The pull request process is described here.
In response to a cherrypick label: new pull request created to branch
What is changed, added or deleted? (Required)
The LOAD DATA INFILE step does not work with the tiup client, so this PR changes it to another method.
close #6647
Which TiDB version(s) do your changes apply to? (Required)
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?