Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vtctldserver] OnlineDDL protobuf message introduction #12195

Closed

Conversation

ajm188
Copy link
Contributor

@ajm188 ajm188 commented Jan 31, 2023

Description

Introducing a formal protobuf type for OnlineDDL types to use across the existing onlineddl code, vtctldserver (and client), and eventually vtadmin.

This is a prerequisite to moving any of the command functionality, as it will help us ensure we're maintaining compatibility if both the new and old code is using the same underlying types.

In order to do this I needed to slightly change some of the field names, which I've (attempted) to do in a backwards-compatible way, issuing deprecation warnings anywhere we parse in an "old" way.

Related Issue(s)

Closes #13713

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Copy link
Contributor

vitess-bot bot commented Jan 31, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a test is added or modified, there should be a documentation on top of the test to explain what the expected behavior is what the test does.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Jan 31, 2023
@ajm188 ajm188 force-pushed the andrew/vtctldserver-onlineddl-protobuf branch from d90ffd8 to c05f69f Compare February 1, 2023 19:31
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2023

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Mar 4, 2023
@ajm188 ajm188 removed the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Mar 6, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Apr 6, 2023

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Apr 6, 2023
@ajm188 ajm188 removed the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Apr 9, 2023
@github-actions
Copy link
Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label May 10, 2023
@github-actions
Copy link
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this May 17, 2023
Andrew Mason added 16 commits August 1, 2023 06:21
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
wip
Signed-off-by: Andrew Mason <[email protected]>
Signed-off-by: Andrew Mason <[email protected]>
Andrew Mason added 4 commits August 2, 2023 19:59
@ajm188 ajm188 added Type: Feature Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) and removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says labels Aug 3, 2023
@ajm188 ajm188 marked this pull request as ready for review August 3, 2023 20:17
@ajm188 ajm188 changed the title [vtctldserver] WIP: onlineddl protobuf [vtctldserver] OnlineDDL protobuf message introduction Aug 4, 2023
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajm188 wow, what a massive effort! 🙇 I've left several comments throughout the PR. But, I would also like to carefully question a basic premise here. I'm not in on the many intricates of vtctld and you would know better than me; also, you seem to already have a plan for OnlineDDL commands which we haven't discussed, so I definitely might just miss something.

This change is so massive first and foremost due to moving OnlineDDL to proto. The question being: why do we need OnlineDDL to be in proto? While I don't have a clear grasp of how I'd like to see vtctldclient OnlineDDL commands implemented, I am not sure I see the need for OnlineDDL to pass as a message, in either direction. All I think is needed are a command ("cancel", "complete", "show") and a UUID/"all", for most cases. I don't see that the state of an OnlineDDL struct should pass back or forth?

Basically, the entire existence of the OnlineDDL command was made as a convenience. Most of the commands can be applied via ApplySchema anyway. So it's mostly syntactic sugar. It was created at a time where we envisioned VExec to be a new mechanism with which to control all-things-internal, but we've moved away from that idea. But the command vtctl OnlineDDL caught on, being useful, and so we want to keep supporting it -- however I'd hate for this otherwise syntactic-sugar command to cause such a disruption in the code/design. In my comments below I raise a few concerns over some of the changes.

I'm happy to discuss further. This is such a tremendous work 👏 and again I might miss something very important, but I also want to be cautious and make sure we don't make changes that aren't strictly necessary 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain what the functions in this file do? I'm a bit lost.

@@ -287,7 +287,7 @@ func commandReloadSchemaShard(cmd *cobra.Command, args []string) error {
func init() {
ApplySchema.Flags().MarkDeprecated("--allow-long-unavailability", "")
ApplySchema.Flags().MarkDeprecated("--skip-preflight", "Deprecated. Assumed to be always 'true'")
ApplySchema.Flags().StringVar(&applySchemaOptions.DDLStrategy, "ddl-strategy", string(schema.DDLStrategyDirect), "Online DDL strategy, compatible with @@ddl_strategy session variable (examples: 'gh-ost', 'pt-osc', 'gh-ost --max-load=Threads_running=100'.")
ApplySchema.Flags().StringVar(&applySchemaOptions.DDLStrategy, "ddl-strategy", strings.ToLower(tabletmanagerdatapb.OnlineDDL_Strategy_name[int32(tabletmanagerdatapb.OnlineDDL_DIRECT)]), "Online DDL strategy, compatible with @@ddl_strategy session variable (examples: 'gh-ost', 'pt-osc', 'gh-ost --max-load=Threads_running=100'.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a helper function to simplify strings.ToLower(tabletmanagerdatapb.OnlineDDL_Strategy_name[int32(...)])? This kind of logic recurs a few times in the code and I find it difficult to parse.

@@ -221,11 +223,11 @@ func TestSchemaChange(t *testing.T) {

testWithInitialSchema(t)
t.Run("create non_online", func(t *testing.T) {
_ = testOnlineDDLStatement(t, alterTableNormalStatement, string(schema.DDLStrategyDirect), "vtctl", "non_online", "")
_ = testOnlineDDLStatement(t, alterTableNormalStatement, "direct", "vtctl", "non_online", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a helper function that translates OnlineDDL_DIRECT to the string "direct"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably based on tabletmanagerdatapb.OnlineDDL_Strategy_name[int32(strategy)]?

return tabletmanagerdatapb.OnlineDDL_DIRECT, nil
}

lowerName := strings.ToUpper(name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowerName := strings.ToUpper(...) is contradictory is confusing. Let's do use ToLower() and look for texts like "gh-ost" instread of "GH-OST"

Suggested change
lowerName := strings.ToUpper(name)
lowerName := strings.ToLower(name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah it should actually be upperName because we need the ALL_CAPS to index into the enum table

Comment on lines +49 to +57
var assertion func(t assert.TestingT, value bool, msgAndArgs ...any) bool
switch direct {
case true:
assertion = assert.True
case false:
assertion = assert.False
}

assertion(t, NewDDLStrategySetting(strategy, "").IsDirect())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reads simpler:

Suggested change
var assertion func(t assert.TestingT, value bool, msgAndArgs ...any) bool
switch direct {
case true:
assertion = assert.True
case false:
assertion = assert.False
}
assertion(t, NewDDLStrategySetting(strategy, "").IsDirect())
assert.Equal(t, direct, NewDDLStrategySetting(strategy, "").IsDirect())

// should use the IsReadyToComplete and MarkReadyToComplete methods to
// safely modify the value of that field rather than accessing it through
// the embedded protobuf message directly.
m sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wary of adding a mutex member of OnlineDDL. Throughout the existing code we pass OnlineDDL by reference and by value, and this does not play well with mutex members as their behavior can be inconsistent. OnlineDDL was designed to be a data-only object. I'd say we can perhaps enforce users to use sync/atomic Bool or similar?

onlineDDL := &OnlineDDL{}
err := json.Unmarshal(bytes, onlineDDL)
return onlineDDL, err
func FromJSON(data []byte) (*OnlineDDL, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it not be safer to create a temporary struct in the likeness of the old OnlineDDL struct, read the JSON into that struct, then copy over the results to the "new" OnlineDDL fields?

onlineDDL.m.Lock()
defer onlineDDL.m.Unlock()

return onlineDDL.ReadyToComplete, onlineDDL.OnlineDDL.WasReadyToComplete
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use sync/atomic here instead of using a Mutex?

// safely modify the value of that field rather than accessing it through
// the embedded protobuf message directly.
m sync.Mutex
*tabletmanagerdatapb.OnlineDDL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use tabletmanagerdatapb.OnlineDDL by value and not by pointer? This will be more inline with the current usage. Again, I'm concerned about copying OnlineDDL values: with *tabletmanagerdatapb.OnlineDDL we risk two instances of OnlineDDL sharing data. Maybe there's nothing in the code to worry about, and if all the tests pass then I guess everything's fine. But the original design was to share nothing, and I'm concerned we might miss a leak.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajm188 wow, what a massive effort! 🙇 I've left several comments throughout the PR. But, I would also like to carefully question a basic premise here. I'm not in on the many intricates of vtctld and you would know better than me; also, you seem to already have a plan for OnlineDDL commands which we haven't discussed, so I definitely might just miss something.

This change is so massive first and foremost due to moving OnlineDDL to proto. The question being: why do we need OnlineDDL to be in proto? While I don't have a clear grasp of how I'd like to see vtctldclient OnlineDDL commands implemented, I am not sure I see the need for OnlineDDL to pass as a message, in either direction. All I think is needed are a command ("cancel", "complete", "show") and a UUID/"all", for most cases. I don't see that the state of an OnlineDDL struct should pass back or forth?

Basically, the entire existence of the OnlineDDL command was made as a convenience. Most of the commands can be applied via ApplySchema anyway. So it's mostly syntactic sugar. It was created at a time where we envisioned VExec to be a new mechanism with which to control all-things-internal, but we've moved away from that idea. But the command vtctl OnlineDDL caught on, being useful, and so we want to keep supporting it -- however I'd hate for this otherwise syntactic-sugar command to cause such a disruption in the code/design. In my comments below I raise a few concerns over some of the changes.

I'm happy to discuss further. This is such a tremendous work 👏 and again I might miss something very important, but I also want to be cautious and make sure we don't make changes that aren't strictly necessary 🙏

@shlomi-noach
Copy link
Contributor

(No idea why the review comment submitted twice).

Just a bit more context: the OnlineDDL object is used internally by the tablet, specifically by the onlineddl/Executor to manage the state of migrations. It's only ever used within go/vt/vttablet/onlineddl/.... Externally, there are commands to manipulate migrations, but then those just hand off the UUID of a migration and the keyspace. These commands do not understand the meaning of OnlineDDL's structure. For those commands, IMHO there is no advantage to having a proto for OnlineDDL.

I think the closes to exposing OnlineDDL is SHOW VITESS_MIGRATIONS that runs a SELECT * FROM _vt.schema_migrations. Some of the columns returned by this query, map into an OnlineDDL struct. But by far not all of them. The majority of columns are not represented as fields in OnlineDDL. Hence, the command SHOW VITESS_MIGRATIONS (or OnlineDDL show) needs to somehow return all those columns anyway; having some of them in a proto-ed OnlineDDL doesn't solve that need and so there is IMHO no advantage to having a proto for OnlineDDL for this command, either.

@ajm188
Copy link
Contributor Author

ajm188 commented Aug 7, 2023

@shlomi-noach yep, you are exactly correct (regarding "wait, hang on, do we need this?") and i came to the same realization yesterday when I started on onlineddl show. What we actually need is a protobuf representation of the _vt.schema_migrations table, and the only things that are actually useful in this PR are the status/strategy enums and parsers, which don't even need to be defined within tabletmanagerdata.

Wish I had gone with the whole-hog approach rather than this way as I would have realized that sooner, but c'est la vie!

I've got the protobuf working in another branch which I'll open shortly, I have some open questions about exactly how much we want to be showing

@ajm188
Copy link
Contributor Author

ajm188 commented Aug 8, 2023

we don't need this, instead see #13738

@ajm188 ajm188 closed this Aug 8, 2023
@ajm188 ajm188 deleted the andrew/vtctldserver-onlineddl-protobuf branch August 8, 2023 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define protobuf message for OnlineDDL messages
3 participants