*: support fixed dimension vector #55002

Merged

Conversation

EricZequan
Contributor

@EricZequan EricZequan commented Jul 29, 2024

What problem does this PR solve?

Issue Number: ref #54245

Problem Summary:

What changed and how does it work?

Support fixed-dimension vector types such as VECTOR(3).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
mysql> DROP TABLE t;
Query OK, 0 rows affected (0.12 sec)

mysql> create table t(embedding VECTOR);
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO t VALUES ('[1,2,3]'), ('[4,5]'), ('[6]');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t MODIFY COLUMN embedding VECTOR(0);
ERROR 1105 (HY000): vector has 3 dimensions, does not fit VECTOR(0)
mysql> ALTER TABLE t MODIFY COLUMN embedding VECTOR(16001);
ERROR 1105 (HY000): vector cannot have more than 16000 dimensions
mysql> ALTER TABLE t MODIFY COLUMN embedding VECTOR(3);
ERROR 1105 (HY000): vector has 2 dimensions, does not fit VECTOR(3)
mysql> DELETE FROM t WHERE VEC_DIMS(embedding) != 3;
Query OK, 2 rows affected (0.01 sec)

mysql> ALTER TABLE t MODIFY COLUMN embedding VECTOR(3);
Query OK, 0 rows affected (0.14 sec)

mysql> SHOW COLUMNS FROM t;
+-----------+------------------+------+------+---------+-------+
| Field     | Type             | Null | Key  | Default | Extra |
+-----------+------------------+------+------+---------+-------+
| embedding | vector<float>(3) | YES  |      | NULL    |       |
+-----------+------------------+------+------+---------+-------+
1 row in set (0.00 sec)

mysql> ALTER TABLE t MODIFY COLUMN embedding VECTOR;
Query OK, 0 rows affected (0.07 sec)

mysql> 
mysql> SHOW COLUMNS FROM t;
+-----------+---------------+------+------+---------+-------+
| Field     | Type          | Null | Key  | Default | Extra |
+-----------+---------------+------+------+---------+-------+
| embedding | vector<float> | YES  |      | NULL    |       |
+-----------+---------------+------+------+---------+-------+
1 row in set (0.00 sec)
  • No need to test
    • I checked and no code files have been changed.
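The manual test above exercises two checks when a column is changed to VECTOR(n): a bound on the declared dimension, and a per-row check that existing vectors fit. The following is a minimal Go sketch of that behavior, not the PR's actual implementation. The function names, the `unspecifiedDim` sentinel for a plain VECTOR column, and the order of the two checks are assumptions made for illustration; only the error messages and the 16000 limit are taken from the session output.

```go
package main

import "fmt"

// maxVectorDimension mirrors the limit reported in the manual test output.
const maxVectorDimension = 16000

// unspecifiedDim is a hypothetical sentinel for a plain VECTOR column
// (no fixed dimension). The real representation may differ.
const unspecifiedDim = -1

// checkColumnDim rejects declared dimensions above the limit.
// Lower-bound handling is not shown in the source and is left out here.
func checkColumnDim(dim int) error {
	if dim > maxVectorDimension {
		return fmt.Errorf("vector cannot have more than %d dimensions", maxVectorDimension)
	}
	return nil
}

// checkVectorFits reports whether one stored vector fits a column of the
// given dimension. Any vector fits a column without a fixed dimension.
func checkVectorFits(vec []float32, dim int) error {
	if dim == unspecifiedDim || len(vec) == dim {
		return nil
	}
	return fmt.Errorf("vector has %d dimensions, does not fit VECTOR(%d)", len(vec), dim)
}

// checkModifyColumn returns the first error that changing the column to the
// given dimension would hit, scanning existing rows in order.
func checkModifyColumn(rows [][]float32, dim int) error {
	if err := checkColumnDim(dim); err != nil {
		return err
	}
	for _, vec := range rows {
		if err := checkVectorFits(vec, dim); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Same data as the INSERT statement in the manual test.
	rows := [][]float32{{1, 2, 3}, {4, 5}, {6}}
	for _, dim := range []int{0, 16001, 3, unspecifiedDim} {
		fmt.Printf("dim=%d: %v\n", dim, checkModifyColumn(rows, dim))
	}
}
```

Under these assumptions, the sketch yields the same three error messages as the ALTER TABLE statements in the session, and accepts VECTOR(3) once only three-dimensional rows remain.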

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

EricZequan and others added 23 commits July 15, 2024 18:33
Signed-off-by: “EricZequan” <[email protected]>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 29, 2024

tiprow bot commented Jul 29, 2024

Hi @EricZequan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@EricZequan
Contributor Author

/retest


tiprow bot commented Jul 29, 2024

@EricZequan: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest



codecov bot commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 72.13115% with 17 lines in your changes missing coverage. Please review.

Please upload report for BASE (feature/vector-search/vector-data-type@5389de9). Learn more about missing BASE report.

Additional details and impacted files
@@                             Coverage Diff                             @@
##             feature/vector-search/vector-data-type     #55002   +/-   ##
===========================================================================
  Coverage                                          ?   75.4096%           
===========================================================================
  Files                                             ?       1561           
  Lines                                             ?     439477           
  Branches                                          ?          0           
===========================================================================
  Hits                                              ?     331408           
  Misses                                            ?      87508           
  Partials                                          ?      20561           
Flag Coverage Δ
integration 50.8258% <0.0000%> (?)
unit 71.6977% <72.1311%> (?)


Components Coverage Δ
dumpling 52.9656% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 63.1957% <0.0000%> (?)

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 2, 2024
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 2, 2024
Member

@breezewish breezewish left a comment

image

This part is missing?

pkg/expression/integration_test/integration_test.go Outdated Show resolved Hide resolved
pkg/types/datum.go Show resolved Hide resolved
Signed-off-by: “EricZequan” <[email protected]>
Signed-off-by: “EricZequan” <[email protected]>
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Aug 5, 2024
// CheckVectorDimValid checks if the vector's dimension is valid.
func CheckVectorDimValid(dim int) error {
	const (
		maxVectorDimension = 16000
Contributor

The maximum dimension supported in MySQL is 16383. According to the feature spec, we should align with it. /cc @breezewish

Member

Nice catch. Please also add a TiDB-CSE PR to update this limit. @EricZequan

Contributor Author

Roger that.🫡
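To make the effect of the suggested change concrete, here is a small Go sketch comparing the two limits discussed in this thread. It is an illustration only: `checkDimWithLimit` and the two constant names are hypothetical and are not part of the PR. The values 16000 and 16383 come from the reviewed snippet and the review comment, and the error text follows the message shown in the manual test.

```go
package main

import "fmt"

const (
	currentMaxDim  = 16000 // limit in the reviewed snippet
	proposedMaxDim = 16383 // limit suggested in the review comment
)

// checkDimWithLimit is a hypothetical helper that rejects a declared
// dimension above the given limit. Lower-bound handling is not covered.
func checkDimWithLimit(dim, limit int) error {
	if dim > limit {
		return fmt.Errorf("vector cannot have more than %d dimensions", limit)
	}
	return nil
}

func main() {
	// Boundary values around both limits.
	for _, dim := range []int{16000, 16001, 16383, 16384} {
		fmt.Printf("dim=%d current=%v proposed=%v\n",
			dim,
			checkDimWithLimit(dim, currentMaxDim),
			checkDimWithLimit(dim, proposedMaxDim))
	}
}
```

With the higher limit, declared dimensions from 16001 through 16383 would be accepted instead of rejected; anything above 16383 is still refused.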

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Aug 5, 2024

ti-chi-bot bot commented Aug 5, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-08-05 07:37:47.886995853 +0000 UTC m=+251197.754094939: ☑️ agreed by breezewish.
  • 2024-08-05 08:46:43.746153468 +0000 UTC m=+255333.613252557: ☑️ agreed by tangenta.

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Aug 5, 2024
@pingcap pingcap deleted a comment from tiprow bot Aug 5, 2024
@pingcap pingcap deleted a comment from EricZequan Aug 5, 2024
@pingcap pingcap deleted a comment from tiprow bot Aug 6, 2024

ti-chi-bot bot commented Aug 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: breezewish, hawkingrei, tangenta, XuHuaiyu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Aug 6, 2024
@ti-chi-bot ti-chi-bot bot merged commit 7fff125 into pingcap:feature/vector-search/vector-data-type Aug 6, 2024
21 checks passed
@pingcap pingcap deleted a comment from ti-chi-bot bot Aug 6, 2024
Labels
approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants