Refactored the analyze-schema regexes as mentioned in the issue #519 #761

priyanshi-yb · 2023-02-01T09:01:52Z

closes #519

Added automation tests for most of the regexes.

Also:

Removed the CREATE INDEX CONCURRENTLY case, as it is supported on 2.14 (refer to [YSQL] Support CREATE INDEX CONCURRENTLY yugabyte-db#10799).
Changed the REINDEX CONCURRENTLY case to match any REINDEX statement, since no form of REINDEX is supported.

amit-yb

Just a few minor comments.

amit-yb · 2023-02-02T06:55:44Z

yb-voyager/cmd/analyzeSchema.go

@@ -50,6 +50,35 @@ type sqlInfo struct {
 	formattedStmt string
 }

+var (
+	anything                       = `.*`
+	ws                             = `[\s\n\t]*`


Shouldn't there be at least one white-space character? I mean, the regex should be [\s\n\t]+.
If you want an "optional white-space", then define a separate regex.
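To illustrate the point above, here is a minimal sketch comparing the two quantifiers. The helper `matches` and the name `optionalWS` for the zero-or-more variant are assumptions made for this example, not code from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	optionalWS = `[\s\n\t]*` // zero or more: the pattern as first submitted
	ws         = `[\s\n\t]+` // one or more: the reviewer's suggestion
)

// matches reports whether stmt starts with CREATE <sep> TABLE, where sep is
// the whitespace pattern under test. Hypothetical helper for illustration.
func matches(sep, stmt string) bool {
	return regexp.MustCompile(`(?i)^CREATE` + sep + `TABLE`).MatchString(stmt)
}

func main() {
	// With "*", two keywords run together are still accepted.
	fmt.Println(matches(optionalWS, "CREATETABLE foo")) // true
	// With "+", at least one whitespace character is required.
	fmt.Println(matches(ws, "CREATETABLE foo"))     // false
	fmt.Println(matches(ws, "CREATE \n TABLE foo")) // true
}
```

The zero-or-more form silently accepts `CREATETABLE`, which is why a separate optional-whitespace pattern is the safer choice for the places that really need it.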

amit-yb · 2023-02-02T06:58:14Z

yb-voyager/cmd/analyzeSchema.go

+	commaSeperatedStrings          = `[^,]+(?:,[^,]+){0,}`
+	commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}`


You can name the first expression as optionalCommaSeparatedTokens and the second one as commaSeparatedTokens.
For the second one, did you mean one or more than one?

It is for more than one comma-separated string.

That is not what the implementation does:

Yes, I think I explained it the wrong way. It is meant to capture more than one comma-separated string; if there is only one, it isn't comma-separated.
Basically, what I am trying to say is that a comma-separated list is caught by the second one, and a single string with no commas can also be caught by the first.
Oh, got it! My naming changed their meaning. Yours makes sense. Thanks! I will change it.
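The difference between the two quantifiers discussed in this thread can be checked directly. This is a small sketch; the anchoring helper `fullMatch` is an assumption added for the example, and the constant names follow the reviewer's suggested naming.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	// {0,}: one token, optionally followed by more comma-separated tokens.
	optionalCommaSeparatedTokens = `[^,]+(?:,[^,]+){0,}`
	// {1,}: one token followed by at least one more, so at least one comma.
	commaSeparatedTokens = `[^,]+(?:,[^,]+){1,}`
)

// fullMatch anchors the pattern so it must cover the whole input.
// Hypothetical helper for illustration.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(s)
}

func main() {
	fmt.Println(fullMatch(optionalCommaSeparatedTokens, "a"))     // true
	fmt.Println(fullMatch(optionalCommaSeparatedTokens, "a,b,c")) // true
	fmt.Println(fullMatch(commaSeparatedTokens, "a"))             // false
	fmt.Println(fullMatch(commaSeparatedTokens, "a,b,c"))         // true
}
```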

amit-yb · 2023-02-02T06:59:05Z

yb-voyager/cmd/analyzeSchema.go

+	ifNotExists                    = opt("IF NOT EXISTS")
+	commaSeperatedStrings          = `[^,]+(?:,[^,]+){0,}`
+	commaSeperatedStringsMoreThan1 = `[^,]+(?:,[^,]+){1,}`
+	normalIdent                    = `[a-zA-Z0-9_]+`


What is "normal"? :-)
How about unquotedIdent?

Changed to unqualifiedIdent.

amit-yb · 2023-02-02T07:01:20Z

yb-voyager/cmd/analyzeSchema.go

+}
+
+func opt(tokens ...string) string {
+	return fmt.Sprintf("(%s%s)?", cat(tokens...), ws)


I don't think that you will need the trailing white-space token here.

amit-yb · 2023-02-02T07:04:29Z

yb-voyager/cmd/analyzeSchema.go

-	intvlRegex            = regexp.MustCompile(`(?i)CREATE TABLE (IF NOT EXISTS )?([a-zA-Z0-9_."]+) .*interval PRIMARY`)
+	createConvRegex       = re("CREATE", opt("DEFAULT"), "CONVERSION", capture(ident))
+	alterConvRegex        = re("ALTER", "CONVERSION", capture(ident))
+	gistRegex             = re("CREATE", "INDEX", ifNotExists, capture(ident), "ON", capture(ident), anything, "USING GIST")


"USING", "GIST"
That will take care of multiple white-spaces between the two tokens.
Same applies to other similar occurrences.
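As a rough sketch of why splitting the tokens helps: the PR does not show the bodies of `cat` and `re` in this thread, so the versions below are assumptions that simply join tokens with the mandatory whitespace pattern and compile case-insensitively.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const ws = `[\s\n\t]+`

// cat joins tokens with the mandatory whitespace pattern.
// Assumed behavior; the real helper may differ.
func cat(tokens ...string) string {
	return strings.Join(tokens, ws)
}

// re compiles the joined tokens as a case-insensitive regex.
// Assumed behavior; the real helper may differ.
func re(tokens ...string) *regexp.Regexp {
	return regexp.MustCompile("(?i)" + cat(tokens...))
}

func main() {
	stmt := "CREATE INDEX idx ON tbl USING\n    GIST (col)"

	// A single "USING GIST" token only matches exactly one space.
	fmt.Println(re("USING GIST").MatchString(stmt)) // false
	// Separate tokens are joined by ws, so any run of whitespace matches.
	fmt.Println(re("USING", "GIST").MatchString(stmt)) // true
}
```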

amit-yb · 2023-02-02T07:13:14Z

yb-voyager/cmd/analyzeSchema.go

+	amRegex               = re("CREATE", "ACCESS", "METHOD", capture(ident))
+	idxConcRegex          = re("REINDEX ", anything, " ", capture(ident))
+	storedRegex           = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED")
+	partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`)


func parenth(s string) string { return cat(`(`, s, `)`) }

Then you can use something like:

parenth(capture(optionalCommaSeparatedTokens))

What is the meaning of "[A-Za-z]+"?

"[A-Za-z]+" captures the partition type (RANGE, HASH, LIST, ...).

amit-yb · 2023-02-02T07:15:50Z

yb-voyager/cmd/analyzeSchema.go

+	storedRegex           = re(capture(normalIdent), capture(normalIdent), "GENERATED", "ALWAYS", anything, "STORED")
+	partitionColumnsRegex = re("CREATE", "TABLE", ifNotExists, capture(ident), `\(`+capture(commaSeperatedStrings)+`\) PARTITION BY`, capture("[A-Za-z]+"), `\(`, capture(commaSeperatedStrings), `\)`)
+	likeAllRegex          = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, "LIKE", anything, "INCLUDING ALL")
+	likeRegex             = re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE")


Are you sure you need to escape the ( character even inside raw quotes (i.e. backticks)?

Yes, as in this case:

re("CREATE", "TABLE", ifNotExists, capture(ident), anything, `\(`, "LIKE")

If I don't escape it, the regex compiler treats it as the start of a group and asks for a closing parenthesis, but we want it to match a literal parenthesis in the SQL query.
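A quick way to confirm this: backticks only stop Go from interpreting the backslash, while the regex engine still sees `(` as a group opener. The helpers `compiles` and `matchesLikeClause` below are hypothetical names used only for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether pattern is a valid regular expression.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// matchesLikeClause checks for a literal "(" followed by LIKE.
func matchesLikeClause(stmt string) bool {
	return regexp.MustCompile(`(?i)\([\s\n\t]*LIKE`).MatchString(stmt)
}

func main() {
	// An unescaped "(" opens a group that is never closed, so Compile fails.
	fmt.Println(compiles(`(`)) // false
	// The escaped form is a literal parenthesis and compiles fine.
	fmt.Println(compiles(`\(`)) // true
	fmt.Println(matchesLikeClause("CREATE TABLE t2 ( LIKE t1 )")) // true
}
```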

amit-yb · 2023-02-06T08:45:57Z

yb-voyager/cmd/analyzeSchema.go

+}
+
+func opt(tokens ...string) string {
+	return fmt.Sprintf("(%s)?", strings.Join(tokens, ws))


Use cat() instead of strings.Join().

amit-yb · 2023-02-06T08:47:41Z

yb-voyager/cmd/analyzeSchema.go

+	ident                         = `[a-zA-Z0-9_."]+`
+	ifExists                      = opt("IF EXISTS")
+	ifNotExists                   = opt("IF NOT EXISTS")
+	optionalCommaSeperatedStrings = `[^,]+(?:,[^,]+){0,}`


It is better to refer to the items as "tokens" instead of strings. Comma-separated strings imply something like ("a", "b", "c").

amit-yb · 2023-02-06T08:48:37Z

yb-voyager/cmd/analyzeSchema.go

+	ws                            = `[\s\n\t]+`
+	multiWs                       = `[\s\n\t]*`
+	ident                         = `[a-zA-Z0-9_."]+`
+	ifExists                      = opt("IF EXISTS")


opt("IF", "EXISTS")

amit-yb · 2023-02-06T08:48:57Z

yb-voyager/cmd/analyzeSchema.go

+	multiWs                       = `[\s\n\t]*`
+	ident                         = `[a-zA-Z0-9_."]+`
+	ifExists                      = opt("IF EXISTS")
+	ifNotExists                   = opt("IF NOT EXISTS")


opt("IF", "NOT", "EXISTS")
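The effect of passing separate tokens to `opt` can be sketched as follows. The body of `opt` mirrors the `strings.Join` version quoted earlier in this review; `fullMatch` is a hypothetical helper added only to anchor the pattern for the demonstration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const ws = `[\s\n\t]+`

// opt wraps the whitespace-joined tokens in an optional group.
func opt(tokens ...string) string {
	return fmt.Sprintf("(%s)?", strings.Join(tokens, ws))
}

// fullMatch anchors the pattern so it must cover the whole input.
func fullMatch(pattern, s string) bool {
	return regexp.MustCompile("(?i)^" + pattern + "$").MatchString(s)
}

func main() {
	input := "IF  NOT\tEXISTS" // two spaces, then a tab

	// One literal token: the embedded single spaces must match exactly.
	fmt.Println(fullMatch(opt("IF NOT EXISTS"), input)) // false
	// Separate tokens: each gap accepts any run of whitespace.
	fmt.Println(fullMatch(opt("IF", "NOT", "EXISTS"), input)) // true
	// Either way the group is optional, so the empty string matches.
	fmt.Println(fullMatch(opt("IF", "NOT", "EXISTS"), "")) // true
}
```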

amit-yb

LGTM with just one minor comment. Good work 👍 !!

amit-yb · 2023-02-06T11:47:48Z

yb-voyager/cmd/analyzeSchema.go

+}
+
+func parenth(s string) string {
+	return `\(` + s + `\)`


cat(optionalWhiteSpaces, (, optionalWhiteSpaces, s, optionalWhiteSpaces, ), optionalWhiteSpaces)

Looking at the above expression, I think it is better to rename optionalWhiteSpaces as ows. What do you think?

Yes, or if that is not readable enough, it could be changed to optionalWs?

@amit-yb changed it to optionalWS
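A possible shape for the whitespace-tolerant `parenth` discussed here is sketched below. It uses plain concatenation rather than `cat`, because how `cat` treats optional whitespace is not shown in this thread; `capturedColumn` and the sample pattern are hypothetical and exist only for the demonstration.

```go
package main

import (
	"fmt"
	"regexp"
)

const optionalWS = `[\s\n\t]*`

// parenth wraps s in literal parentheses, allowing optional whitespace
// before, inside, and after them. Sketch only; not the merged code.
func parenth(s string) string {
	return optionalWS + `\(` + optionalWS + s + optionalWS + `\)` + optionalWS
}

// capturedColumn returns the single column inside PARTITION BY RANGE (...),
// or "" if the statement does not match.
func capturedColumn(stmt string) string {
	r := regexp.MustCompile(`(?i)PARTITION\s+BY\s+RANGE` + parenth(`(\w+)`))
	m := r.FindStringSubmatch(stmt)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(capturedColumn("PARTITION BY RANGE(id)"))         // id
	fmt.Println(capturedColumn("PARTITION BY RANGE (\n  id\n)")) // id
}
```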

amit-yb

LGTM.

priyanshi-yb requested a review from amit-yb February 1, 2023 09:01

priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from 1d8119f to 381cab3 Compare February 1, 2023 10:34

priyanshi-yb requested a review from sanyamsinghal February 1, 2023 10:52

priyanshi-yb assigned rahulb-yb and unassigned rahulb-yb Feb 1, 2023

priyanshi-yb requested a review from rahulb-yb February 1, 2023 10:52

amit-yb reviewed Feb 2, 2023

priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from 9998417 to 25c3364 Compare February 3, 2023 11:21

priyanshi-yb added 8 commits February 3, 2023 18:34

refactored the analyze-schema regexes as mentioned in the issue #519

46988ef

added automation tests in analyze-schema

7ec3439

removed the create index concurrently case from analyze-schema (worki…

4f38940

…ng on 2.14)

removed test case for the CREATE INDEX CONCURRENTLY

d5b7bcd

removed unneccessary comments of regexes

c7f9e7b

changed the case for REINDEX CONCURRENTLY to REINDEX anything as any …

34667e9

…REINDEX is not supported (yugabyte/yugabyte-db#10267)

changes as per review

f667432

change as per review

8a8da6e

priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from e851ed3 to 8a8da6e Compare February 3, 2023 18:51

minor fix for go build

eea2906

amit-yb reviewed Feb 6, 2023

some more changes as per review

0f08a44

priyanshi-yb force-pushed the priyanshi/refactor-analyze-schema branch from d3370cf to 0f08a44 Compare February 6, 2023 10:04

amit-yb approved these changes Feb 6, 2023

priyanshi-yb added 2 commits February 7, 2023 04:27

minor change

71bfa70

minor change

ae8f518

amit-yb approved these changes Feb 7, 2023

priyanshi-yb merged commit b15ed84 into main Feb 7, 2023

priyanshi-yb deleted the priyanshi/refactor-analyze-schema branch February 7, 2023 07:52
