Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored the analyze-schema regexes as mentioned in the issue #519 #761

Merged
merged 12 commits into from
Feb 7, 2023

Conversation

priyanshi-yb
Copy link
Contributor

@priyanshi-yb priyanshi-yb commented Feb 1, 2023

closes #519

Added automation tests for most of the regexes.

Also

  1. removed the CREATE INDEX CONCURRENTLY case, as supported on 2.14 (refer [YSQL] Support CREATE INDEX CONCURRENTLY yugabyte-db#10799).
  2. changed the case for REINDEX CONCURRENTLY to REINDEX anything as any REINDEX is not supported

@priyanshi-yb priyanshi-yb requested a review from amit-yb February 1, 2023 09:01
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from 1d8119f to 381cab3 Compare February 1, 2023 10:34
@priyanshi-yb priyanshi-yb assigned rahulb-yb and unassigned rahulb-yb Feb 1, 2023
Copy link
Contributor

@amit-yb amit-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments.

@@ -50,6 +50,35 @@ type sqlInfo struct {
formattedStmt string
}

var (
anything = `.*`
ws = `[\s\n\t]*`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be at least one white-space character? I mean, the regex should be [\s\n\t]+.
If you want an "optional white-space", then define a separate regex.

Comment on lines 59 to 60
commaSeperatedStrings = `[^,]+(?:,[^,]+){0,}`
commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can name the first expression as optionalCommaSeparatedTokens and the second one as commaSeparatedTokens.
For the second one, did you mean one or more than one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is more than one comma separated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not what the implementation does:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I am explaining in wrong way. It is to capture more than 1 comma separated string. If it will be 1 then it won't be comma separated.
Basically what I am trying to say is that comma separated will be catched with second one and if it won't be comma separated which means only single string then can also be catched by first.
Oh got it! My naming changed the meaning of them. Your's makes sense. Thanks! I will change it.

ifNotExists = opt("IF NOT EXISTS")
commaSeperatedStrings = `[^,]+(?:,[^,]+){0,}`
commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}`
normalIdent = `[a-zA-Z0-9_]+`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "normal"? :-)
How about unquotedIdent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to unqualifiedIdent

}

func opt(tokens ...string) string {
return fmt.Sprintf("(%s%s)?", cat(tokens...), ws)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you will need the trailing white-space token here.

intvlRegex = regexp.MustCompile(`(?i)CREATE TABLE (IF NOT EXISTS )?([a-zA-Z0-9_."]+) .*interval PRIMARY`)
createConvRegex = re("CREATE", opt("DEFAULT"), "CONVERSION", capture(ident))
alterConvRegex = re("ALTER", "CONVERSION", capture(ident))
gistRegex = re("CREATE", "INDEX", ifNotExists, capture(ident), "ON", capture(ident), anything, "USING GIST")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"USING", "GIST"
That will take care of multiple white-spaces between the two tokens.
Same applies to other similar occurrences.

amRegex = re("CREATE", "ACCESS", "METHOD", capture(ident))
idxConcRegex = re("REINDEX ", anything, " ", capture(ident))
storedRegex = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED")
partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func parenth(s) string {
    return cat(`(`, s, `)`)
}

Then you can use something like:

parenth(capture(optionalCommaSeparatedTokens))

What is the meaning of "[A-Za-z]+"?

Copy link
Contributor Author

@priyanshi-yb priyanshi-yb Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[A-Za-z]+" it means that to capture RANGE/HASH/LIST/.. type of the partition.

storedRegex = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED")
partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`)
likeAllRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, "LIKE", anything, "INCLUDING ALL")
likeRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you need to escape the ( character even inside the raw quotes (i.e. backticks).

Copy link
Contributor Author

@priyanshi-yb priyanshi-yb Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like in this case

re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE")

if I didn't escape it will ask to add a closing brace but we want it to just match as brace in the SQL query.

@priyanshi-yb priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from 9998417 to 25c3364 Compare February 3, 2023 11:21
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from e851ed3 to 8a8da6e Compare February 3, 2023 18:51
}

func opt(tokens ...string) string {
return fmt.Sprintf("(%s)?", strings.Join(tokens, ws))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cat() instead of strings.Join().

ident = `[a-zA-Z0-9_."]+`
ifExists = opt("IF EXISTS")
ifNotExists = opt("IF NOT EXISTS")
optionalCommaSeperatedStrings = `[^,]+(?:,[^,]+){0,}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to refer the items as "tokens" instead of strings. Comma separated strings imply something like ("a", "b", "c")

ws = `[\s\n\t]+`
multiWs = `[\s\n\t]*`
ident = `[a-zA-Z0-9_."]+`
ifExists = opt("IF EXISTS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opt("IF", "EXISTS")

multiWs = `[\s\n\t]*`
ident = `[a-zA-Z0-9_."]+`
ifExists = opt("IF EXISTS")
ifNotExists = opt("IF NOT EXISTS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opt("IF", "NOT", "EXISTS")

@priyanshi-yb priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from d3370cf to 0f08a44 Compare February 6, 2023 10:04
Copy link
Contributor

@amit-yb amit-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with just one minor comment. Good work 👍 !!

}

func parenth(s string) string {
return `\(` + s + `\)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cat(optionalWhiteSpaces, (, optionalWhiteSpaces, s, optionalWhiteSpaces, ), optionalWhiteSpaces)

Looking at the above expression, I think it is better to rename optionalWhiteSpaces as ows. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah or if that is not much readable, can be changed to optionalWs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amit-yb changed it to optionalWS

Copy link
Contributor

@amit-yb amit-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@priyanshi-yb priyanshi-yb merged commit b15ed84 into main Feb 7, 2023
@priyanshi-yb priyanshi-yb deleted the priyanshi/refactor-analyze-schema branch February 7, 2023 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify regexes in the analyze-schema impl using regex builder
3 participants