-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored the analyze-schema regexes as mentioned in the issue #519 #761
Conversation
1d8119f
to
381cab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor comments.
yb-voyager/cmd/analyzeSchema.go
Outdated
@@ -50,6 +50,35 @@ type sqlInfo struct { | |||
formattedStmt string | |||
} | |||
|
|||
var ( | |||
anything = `.*` | |||
ws = `[\s\n\t]*` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be at least one white-space character? I mean, the regex should be [\s\n\t]+
.
If you want an "optional white-space", then define a separate regex.
yb-voyager/cmd/analyzeSchema.go
Outdated
commaSeperatedStrings = `[^,]+(?:,[^,]+){0,}` | ||
commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can name the first expression as optionalCommaSeparatedTokens
and the second one as commaSeparatedTokens
.
For the second one, did you mean one or more than one
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is more than one comma separated
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I am explaining in wrong way. It is to capture more than 1 comma separated string. If it will be 1 then it won't be comma separated.
Basically what I am trying to say is that comma separated will be catched with second one and if it won't be comma separated which means only single string then can also be catched by first.
Oh got it! My naming changed the meaning of them. Your's makes sense. Thanks! I will change it.
yb-voyager/cmd/analyzeSchema.go
Outdated
ifNotExists = opt("IF NOT EXISTS") | ||
commaSeperatedStrings = `[^,]+(?:,[^,]+){0,}` | ||
commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}` | ||
normalIdent = `[a-zA-Z0-9_]+` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "normal"? :-)
How about unquotedIdent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to unqualifiedIdent
yb-voyager/cmd/analyzeSchema.go
Outdated
} | ||
|
||
func opt(tokens ...string) string { | ||
return fmt.Sprintf("(%s%s)?", cat(tokens...), ws) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that you will need the trailing white-space token here.
yb-voyager/cmd/analyzeSchema.go
Outdated
intvlRegex = regexp.MustCompile(`(?i)CREATE TABLE (IF NOT EXISTS )?([a-zA-Z0-9_."]+) .*interval PRIMARY`) | ||
createConvRegex = re("CREATE", opt("DEFAULT"), "CONVERSION", capture(ident)) | ||
alterConvRegex = re("ALTER", "CONVERSION", capture(ident)) | ||
gistRegex = re("CREATE", "INDEX", ifNotExists, capture(ident), "ON", capture(ident), anything, "USING GIST") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"USING", "GIST"
That will take care of multiple white-spaces between the two tokens.
Same applies to other similar occurrences.
yb-voyager/cmd/analyzeSchema.go
Outdated
amRegex = re("CREATE", "ACCESS", "METHOD", capture(ident)) | ||
idxConcRegex = re("REINDEX ", anything, " ", capture(ident)) | ||
storedRegex = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED") | ||
partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func parenth(s) string {
return cat(`(`, s, `)`)
}
Then you can use something like:
parenth(capture(optionalCommaSeparatedTokens))
What is the meaning of "[A-Za-z]+"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"[A-Za-z]+"
it means that to capture RANGE/HASH/LIST/..
type of the partition.
yb-voyager/cmd/analyzeSchema.go
Outdated
storedRegex = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED") | ||
partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`) | ||
likeAllRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, "LIKE", anything, "INCLUDING ALL") | ||
likeRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you need to escape the (
character even inside the raw quotes (i.e. backticks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, like in this case
re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE")
if I didn't escape it will ask to add a closing brace but we want it to just match as brace in the SQL query.
9998417
to
25c3364
Compare
…REINDEX is not supported (yugabyte/yugabyte-db#10267)
e851ed3
to
8a8da6e
Compare
yb-voyager/cmd/analyzeSchema.go
Outdated
} | ||
|
||
func opt(tokens ...string) string { | ||
return fmt.Sprintf("(%s)?", strings.Join(tokens, ws)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use cat()
instead of strings.Join()
.
yb-voyager/cmd/analyzeSchema.go
Outdated
ident = `[a-zA-Z0-9_."]+` | ||
ifExists = opt("IF EXISTS") | ||
ifNotExists = opt("IF NOT EXISTS") | ||
optionalCommaSeperatedStrings = `[^,]+(?:,[^,]+){0,}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to refer the items as "tokens" instead of strings
. Comma separated strings imply something like ("a", "b", "c")
yb-voyager/cmd/analyzeSchema.go
Outdated
ws = `[\s\n\t]+` | ||
multiWs = `[\s\n\t]*` | ||
ident = `[a-zA-Z0-9_."]+` | ||
ifExists = opt("IF EXISTS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opt("IF", "EXISTS")
yb-voyager/cmd/analyzeSchema.go
Outdated
multiWs = `[\s\n\t]*` | ||
ident = `[a-zA-Z0-9_."]+` | ||
ifExists = opt("IF EXISTS") | ||
ifNotExists = opt("IF NOT EXISTS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opt("IF", "NOT", "EXISTS")
d3370cf
to
0f08a44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with just one minor comment. Good work 👍 !!
} | ||
|
||
func parenth(s string) string { | ||
return `\(` + s + `\)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cat(optionalWhiteSpaces,
(, optionalWhiteSpaces, s, optionalWhiteSpaces,
), optionalWhiteSpaces)
Looking at the above expression, I think it is better to rename optionalWhiteSpaces
as ows
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah or if that is not much readable, can be changed to optionalWs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amit-yb changed it to optionalWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
closes #519
Added automation tests for most of the regexes.
Also