Fix an error in importing sample data #11077

Merged

hfxsd
Collaborator

@hfxsd hfxsd commented Oct 28, 2022

What is changed, added or deleted? (Required)

The LOAD DATA INFILE step does not work with tiup client, so this PR changes the import step to use another method.

close #6647

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v6.4 (TiDB 6.4 versions)
  • v6.3 (TiDB 6.3 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)
  • v5.2 (TiDB 5.2 versions)
  • v5.1 (TiDB 5.1 versions)
  • v5.0 (TiDB 5.0 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the labels missing-translation-status (This PR does not have translation status info.) and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) Oct 28, 2022
@hfxsd hfxsd linked an issue Oct 28, 2022 that may be closed by this pull request
@hfxsd hfxsd self-assigned this Oct 28, 2022
@hfxsd hfxsd added the labels type/bugfix (This PR fixes a bug.), translation/doing (This PR's assignee is translating this PR.), and area/general (Relates to TiDB overview, architecture, and other general descriptions.) and removed the label missing-translation-status (This PR does not have translation status info.) Oct 28, 2022
@hfxsd hfxsd requested a review from dveeden October 28, 2022 07:55
@dveeden
Contributor

dveeden commented Oct 28, 2022

I'll review this next week. I think some more changes are needed.

@hfxsd hfxsd added the label translation/no-need (No need to translate this PR.) and removed the label translation/doing (This PR's assignee is translating this PR.) Oct 28, 2022
@lilin90 lilin90 changed the title from "quick start: fix an error in importing sample data" to "Fix an error in importing sample data" Oct 29, 2022
@dveeden
Contributor

dveeden commented Oct 31, 2022

I think we should consider the following:

  1. There are two documents about this with different procedures

It might be good to consider whether we need to combine these. My take is that we should not, because the procedure for TiDB Cloud uses the cloud UI to import and uses data that was already converted to dumpling format instead of the original CSV data. We could use the S3-hosted files for the on-premises example as well, since tidb-lightning supports S3.

  2. Client strategy

The referenced issue links to https://docs.pingcap.com/tidb/stable/quick-start-with-tidb, which has examples for "MySQL Client" and tiup client. On TiDB Cloud we have WebSQL, which is based on https://github.com/xo/usql, just like tiup client. However, there isn't an example for tiup client on the TiDB Cloud connect page. It also doesn't look like tiup client was intended to be used for anything except local playgrounds. On this dialog we also list https://github.com/dbcli/mycli. I often use "MySQL Shell", which is listed on https://docs.pingcap.com/tidb/stable/dev-guide-connect-to-tidb#mysql-shell as well. There are also many GUI clients that people use, like https://github.com/dbeaver/dbeaver and https://github.com/mysql/mysql-workbench. We should make sure the method that we choose here aligns with the long-term plans for client support.

image

Side note: Maybe we should add a note about tiup client on https://docs.pingcap.com/tidb/stable/sql-statement-load-data

  3. Audience

People who are just starting with TiDB are likely to come from a MySQL background. This means that they might already be familiar with "MySQL Client", LOAD DATA..., etc., but might not have experience with tidb-lightning.

While showing the functionality of Lightning it might also be good to not overwhelm them with new tools.

Maybe we should list both the tidb-lightning method and the LOAD DATA... method.

  4. TiDB Cloud

If we use LOAD DATA... or tidb-lightning with tidb backend then this procedure should also work on TiDB Cloud. Let's try to keep all of this working with TiDB Cloud and document if anything special is needed for that.

  5. Using a schema file

The example uses no-schema = true and manually creates the schema. We could put the SQL in a file and let tidb-lightning take care of this.

  6. The goal

I think the goal of importing the sample data is:

  • Giving people some data in TiDB that they can explore with SQL.
    • Maybe add some example queries?
    • What about adding indexes?
    • Using the data in dumpling format from S3 might be faster
  • Having people learn how to import data into TiDB

  7. Newer data

https://s3.amazonaws.com/capitalbikeshare-data/index.html has data from 2010 until last month (2022-09). Maybe we should use the newer data. Note that the newer files have different fields.

  8. Data amount

To see the full benefits of TiDB, it might be necessary to load more years of data than the example uses. Maybe list the commands to import all data as well.

  9. TiFlash

It might be good to give an example of using TiFlash with this data (adding a replica, running EXPLAIN, etc.).

  10. OSS Insight

OSS Insight is a full example, as it has the data and an open source application. Maybe we want to either use OSS Insight data here or create a small example frontend for the bikeshare data to give a more complete example.

If we do this we could use some of the example queries from the TiDB Cloud Playground functionality.
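For the TiFlash point above, the following is a minimal sketch of what such an example could look like. It assumes the cluster has at least one TiFlash node and that the data lives in bikeshare.trips. The helper name tiflash_sql is hypothetical and only prints the statements, so they can be reviewed before being sent to a server.

```shell
#!/usr/bin/env bash
# Sketch: SQL for trying TiFlash on the imported bikeshare.trips table.
# Assumption: the cluster has at least one TiFlash node.
# tiflash_sql is a hypothetical helper; it only prints SQL text.

tiflash_sql() {
  cat <<'SQL'
-- Ask TiDB to create one TiFlash replica of the table.
ALTER TABLE bikeshare.trips SET TIFLASH REPLICA 1;
-- Check replication progress. AVAILABLE becomes 1 when the replica is ready.
SELECT TABLE_SCHEMA, TABLE_NAME, AVAILABLE, PROGRESS
  FROM information_schema.tiflash_replica
 WHERE TABLE_SCHEMA = 'bikeshare' AND TABLE_NAME = 'trips';
-- Inspect the plan. A tiflash task in the output means the replica is used.
EXPLAIN SELECT COUNT(*) FROM bikeshare.trips;
SQL
}
```

The output can be piped to any MySQL-compatible client, for example `tiflash_sql | mysql --host 127.0.0.1 --port 4000 -u root`, where the host, port, and user are assumed local playground defaults and might differ in other deployments.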

Many of these points are probably out of scope for this specific PR. For now, I think we should:

  1. Keep both methods (LOAD DATA... and tidb-lightning)
  2. Consider putting the SQL in a file for tidb-lightning.
  3. Add some example queries. Maybe add an index.
  4. Load newer data
  5. List optional steps to load more data
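As an illustration of the optional step for loading more data, here is a small sketch that prints download URLs for a range of years. The helper name bikeshare_urls is hypothetical, and the archive naming pattern is an assumption that should be checked against the bucket index linked above, since newer data may be published under different names.

```shell
#!/usr/bin/env bash
# Sketch: print download URLs for a range of yearly Capital Bikeshare archives.
# Assumption: yearly archives are named <year>-capitalbikeshare-tripdata.zip.
# Verify the names against the bucket index before downloading.

bikeshare_urls() {
  local from="$1" to="$2" year
  local base="https://s3.amazonaws.com/capitalbikeshare-data"
  for ((year = from; year <= to; year++)); do
    echo "$base/$year-capitalbikeshare-tripdata.zip"
  done
}
```

Printing the URLs first keeps the download step explicit. For example, `bikeshare_urls 2010 2017 | xargs -n 1 curl -L -O` would fetch each archive into the current directory.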

import-example-data.md (outdated, resolved review thread)
@ti-chi-bot

ti-chi-bot bot commented Jun 26, 2023

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot bot added the label needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jun 26, 2023
Contributor

@dveeden dveeden left a comment

This seems fine. However, if I recall correctly, not all CSV files have the same set of columns, because the layout changed at some point. Does this work correctly for all CSV files?
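One way to answer this question before importing is to group the files by their header line. The following is a minimal sketch, assuming all CSV files sit in one directory and each starts with a header line. The function name header_groups is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: group CSV files by their header line to spot column-layout changes.
# Assumption: the first line of every *.csv file is its header.

header_groups() {
  local dir="${1:-.}" f
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue            # skip the literal pattern when nothing matches
    head -n 1 "$f" | tr -d '\r'        # drop a trailing CR in case of CRLF line endings
  done | sort | uniq -c | sort -rn     # one output line per distinct header, most common first
}
```

If the function prints more than one line, the files do not all share the same columns, and the import procedure needs to account for that.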

import-example-data.md (outdated, resolved review thread)
@ti-chi-bot ti-chi-bot bot added the label needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) Jan 7, 2025

ti-chi-bot bot commented Jan 7, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-07 07:24:36.886290641 +0000 UTC m=+252020.175122345: ☑️ agreed by dveeden.

Co-authored-by: Daniël van Eeden <[email protected]>
@alastori
Contributor

alastori commented Jan 23, 2025

> This seems fine. However if I recall correctly not all CSV files had the same set of columns as that changed at some point in time. Does this work correctly for all CSV files?

Yes. I did a quick test using the procedure and I could import all the files:

# Total files
% ls -1 *.csv | wc -l
      26

# Total lines
% wc -l *.csv | awk '/total/ {print $1}'
19117669

mysql> SELECT COUNT(*) + 26 AS adjusted_row_count FROM bikeshare.trips;
+--------------------+
| adjusted_row_count |
+--------------------+
|           19117669 |
+--------------------+
1 row in set (0.00 sec)
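The check above adds 26 to the row count because each of the 26 files contributes one header line. The same comparison can be sketched from the shell side, assuming every CSV file has exactly one header line and ends with a newline. The function name expected_rows is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compute the row count expected in bikeshare.trips from local CSV files.
# Assumption: every *.csv file has exactly one header line and ends with a newline.

expected_rows() {
  local dir="${1:-.}"
  local files=0 lines=0 n f
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue            # skip the literal pattern when nothing matches
    n=$(wc -l < "$f")
    lines=$((lines + n))
    files=$((files + 1))
  done
  # Data rows = all lines minus one header line per file.
  echo $((lines - files))
}
```

With the numbers reported above (19117669 lines in 26 files), the function would print 19117643, which is the value that `SELECT COUNT(*) FROM bikeshare.trips;` should return without the adjustment.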

@hfxsd hfxsd added the lgtm label Jan 23, 2025
@hfxsd
Collaborator Author

hfxsd commented Jan 23, 2025

/approve


ti-chi-bot bot commented Jan 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hfxsd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Jan 23, 2025
@hfxsd hfxsd added the label needs-cherry-pick-release-8.5 (Should cherry pick this PR to release-8.5 branch.) Jan 23, 2025
@ti-chi-bot ti-chi-bot bot merged commit 46660a9 into pingcap:master Jan 23, 2025
9 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/docs that referenced this pull request Jan 23, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #20131.

Labels
  • approved
  • area/general: Relates to TiDB overview, architecture, and other general descriptions.
  • lgtm
  • needs-1-more-lgtm: Indicates a PR needs 1 more LGTM.
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • translation/no-need: No need to translate this PR.
  • type/bugfix: This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Import Example Database does not work with 'tiup client'
4 participants