Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create comprehensive data type tests for MySQL #3562

Closed
sherifnada opened this issue May 24, 2021 · 6 comments · Fixed by #3810
Closed

Create comprehensive data type tests for MySQL #3562

sherifnada opened this issue May 24, 2021 · 6 comments · Fixed by #3810
Assignees
Labels
area/connectors Connector related issues lang/java type/enhancement New feature or request

Comments

@sherifnada
Copy link
Contributor

sherifnada commented May 24, 2021

Tell us about the problem you're trying to solve

child of airbytehq/airbyte-internal-issues#61. See that issue's description for an understanding of what we are trying to achieve here.

Describe the solution you’d like

Example table:

Data Type Category of edge case example values
varchar weird characters emojis: "hi there:eyes: "
weird charactersets: UTF-16 or UTF-8
varchar null null
float extreme values NaN, infinity
float values with lots of decimal values 1.23451236211

How to do this ticket:

  1. Please list out all the data types supported natively by the database. For example, for Postgres, all the types can be found here: https://www.postgresql.org/docs/9.5/datatype.html.
  2. Include a link to the reference docs you used so anyone reviewing the proposal can easily compare the reference docs to the types you listed.
  3. Fill out the example table above: in the "Data Type" column, list every data type supported natively by the database. You can probably find this in the database docs e.g: https://dev.mysql.com/doc/refman/8.0/en/data-types.html
  4. Suggest 1 or more categories of edge cases and examples of the values per data type
  5. Have someone on the team review it to stress test the values proposed
  6. Create smaller tickets to encapsulate the work required to create the test cases for the proposed test cases
  7. Place the smaller tickets into the sprint as appropriate and being work on them

FAQs when implementing this ticket

Where to place the tests?

You should create a new test file for the connector under the integration tests directory for the connector: https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-mysql/src/test-integration/java/io/airbyte/integrations/source/mysql

How should I structure the tests?

There's probably a few ways to go about writing these tests. For all of them, we assume you've started up a TestContainer.

First way is to create one monolithic table with many columns, each column resembling a value from your proposed table above. This will probably be hard to maintain so i'd suggest against it.

You can alternatively create a separate table, one per data type e.g: test_integers for ints, test_varchar for varchars, etc... and verify that the connector can accurately read all those values.

There will probably be some significant code duplication if we do this naively, so please explore how we can parametrize this implementation. Ideally, we would be able to copy the same pattern to all our other databases so we can apply the same testing here.

Please create a sub-ticket of this one (GH doesn't have a concept of sub-tickets so just create another ticket and include a URL to this ticket somewhere in the description body) to encapsulate the first part of this ticket (filling out the table) and another ticket to encapsulate the second part(s) of this ticket (implementation).

┆Issue is synchronized with this Asana task by Unito

@sherifnada sherifnada added type/enhancement New feature or request area/connectors Connector related issues lang/java labels May 24, 2021
@DoNotPanicUA DoNotPanicUA self-assigned this May 26, 2021
@DoNotPanicUA
Copy link
Contributor

DoNotPanicUA commented May 29, 2021

Source links:

Data Type Category of edge case Example values Comment Common test result CDC test result
tinyint null null ok ok
tinyint edge values -128, 127 ok ok
smallint null null ok ok
smallint edge values -32768, 32767 ok ok
smallint zerofill 000001 ok ok
mediumint null null ok ok
mediumint edge values -8388608, 8388607 ok ok
mediumint zerofill 000001 ok ok
int null null ok ok
int edge values -2147483648, 2147483647 ok ok
int zerofill 000001 ok ok
bigint null null ok ok
bigint extreem value 9223372036854775807 ok ok
float null null is subtype of double ok ok
double extreem value 1e308 Insert value: power(10, 308) ok ok
double extreem value 1.4013e-45 Insert value: 1/power(10, 45) ok ok
double null null ok ok
decimal null null is subtype of double ok ok
bit binary value 0x01, 0x00 wrong result. True/False instead of 1/0 wrong result. True/False instead of 1/0
bit null null ok ok
date null null ok ok
date Zero date 0000-00-00, 2021-00-00, 2021-05-00 precondition for insert: SET @@sql_mode='' fail ok
date Zero date 0000-00-00, 2021-00-00, 2021-05-00 precondition for insert: SET @@sql_mode='' fail ok
datetime null null ok ok
datetime Zero date 0000-00-00 00:00:00 precondition for insert: SET @@sql_mode='' fail ok
timestamp null null is subtype ob datetime ok ok
time edge value -838:59:59.000000 fail ok
time null null ok ok
varchar null null ok ok
varchar character set йцу column type: varchar(50) character set cp1251 ok wrong value, but value is wrongly decoded тест
varchar character set column type: varchar(50) character set utf16; insert value: 0xfffd ok wrong value, but result value contains additional symbol �
varchar special char !"#$%&'()*+,-./:;<=>?@[]^_`{|}~ ok ok
varbinary binary value 0x717171 wrong value, binary instead of text wrong value, binary instead of text
varbinary null null ok ok
blob binary value 0x717171 wrong value, binary instead of text wrong value, binary instead of text
blob null null ok ok
mediumtext null null ok ok
mediumtext extreem value 000000000.....0 --max_allowed_packet=200M might required to get data; insert value: lpad('0', 16777214, '0') wrong value, NULL instead of value wrong value, NULL instead of value
tinytext null null ok ok
text null null ok ok
mediumtext null null ok ok
longtext null null ok ok
json null null ok ok
json specific type {"a": 10, "b": 15} ok ok
point null null There are about 7 sililar types the Spatial family. Shall I also test them? ok ok
point specific type 0x00000000001010000F03 Insert value: (ST_GeomFromText('POINT(1 1)')); ok ok
bool/boolean unexpected value null, 127, -128 wrong value, but looks like we have a bug in bool processing for MySQL. According to the MySql doc. only 0 is false. But we treat 0 or less values as false. wrong value, but value parsed as number instead of boolean (true/false)

@subodh1810
Copy link
Contributor

subodh1810 commented May 31, 2021

@DoNotPanicUA This looks great. I think you have covered all the data types and values.

Regarding the question

There are about 7 similar types the Spatial family. Shall I also test them?

I think yes, even though we have not mentioned these data types in our document, it would be nice if we would have an idea of how Airbyte would tackle these data types. I would also advice that dont spend too much time on these data types and just have a test so that we understand the output of syncs.

For the tests that are failing, I would advice just raise a PR with all your tests and ignore/disable the failing ones, as part of another ticket we would go back and fix the code to fix those tests.

Also have you tested this against the CDC path of MySQL as well?

@DoNotPanicUA
Copy link
Contributor

@subodh1810
Thank you for your feedback.
If I get the difference right between CDC and common source testing, there will be very slight difference in the stream configuration and some grants manipulations.
I'm going to finalize abstract method for such tests and then add a test for CDC. I will put my test results into the table.

@DoNotPanicUA
Copy link
Contributor

@subodh1810
I've added test results for the CDC. It works much more stable, tricky data can't fail the processing, but I found few cases where result might be wrong. Please check the result table.

@subodh1810
Copy link
Contributor

@DoNotPanicUA the result table looks fine. I think you can raise the PR for these tests.

@DoNotPanicUA
Copy link
Contributor

The ticket scope is extended during the PR review.
Additionally, the comprehensive test will compare a stream outcome value and expected values defined by a developer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues lang/java type/enhancement New feature or request
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants