Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48344][SQL] Prepare SQL Scripting for addition of Execution Framework #48879

Conversation

miland-db
Copy link
Contributor

@miland-db miland-db commented Nov 18, 2024

What changes were proposed in this pull request?

This PR is Initial refactoring of SQL Scripting to prepare it for addition of Execution Framework:

  • Move all files to proper directories/paths.
  • Convert SqlScriptingLogicalOperators to SqlScriptingLogicalPlans.
  • Remove CompoundNestedStatementIteratorExec because it is unnecessary abstraction.
  • Remove parseScript because it is no more needed. Parsing is done in parsePlan method.

Why are the changes needed?

This changes are needed so execution of SQL Scripts can be implemented properly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@miland-db miland-db marked this pull request as draft November 18, 2024 13:01
@miland-db miland-db marked this pull request as ready for review November 18, 2024 13:49
@miland-db
Copy link
Contributor Author

@davidm-db
Copy link
Contributor

I'm not sure at which point, but we should introduce two things missing here:

  1. Top level compound body should be forbidden - until we find a way to resolve/fix the problem with ambiguity when it is provided.
  2. exitCompoundOrSingleStatement() in parsers.scala should be implemented.

@HyukjinKwon HyukjinKwon changed the title [SPARK-48344] Prepare SQL Scripting for addition of Execution Framework [SPARK-48344][SQL] Prepare SQL Scripting for addition of Execution Framework Nov 19, 2024

/**
* Trait for all SQL Scripting logical operators that are product of parsing phase.
* These operators will be used by the SQL Scripting interpreter to generate execution nodes.
*/
sealed trait CompoundPlanStatement
sealed trait CompoundPlanStatement extends LogicalPlan

/**
* Logical operator representing result of parsing a single SQL statement
* that is supposed to be executed against Spark.
* @param parsedPlan Result of SQL statement parsing.
*/
case class SingleStatement(parsedPlan: LogicalPlan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this abstraction? It's only used as the condition in other statements like IfElseStatement, but seems we can just use LogicalPlan now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though we can use LogicalPlan I would keep this abstraction to differ between SQL Scripts and allowed statements there and "normal" statements outside of script. There will be multiple iterations of refactoring so I would do a follow-up to remove this abstraction if it is really not needed.

One place where it is being used is in AbstractSqlParser -> parsePlan method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we will also need some kind of abstraction so we know that we should eagerly execute scripts via Spark Connect. So I would keep this for now and discuss this later.

@miland-db miland-db requested a review from cloud-fan November 19, 2024 15:45
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 8791767 Nov 20, 2024
MaxGekk pushed a commit that referenced this pull request Dec 12, 2024
…s and Local Variables

### What changes were proposed in this pull request?
This PR is **third** in series of refactoring and introducing SQL Scripting Execution Framework:
- Introducing `SqlScriptingExecutionContext`, object to keep current state of script execution.
- Introducing `Frames` and `Scopes` to support Local Variables and Error Handlers resolution.
- Decoupled `SqlScriptingIterator` from `SqlScriptExecution`.
- Enabling execution of SQL Scripting using `sql()` API.
- Updated `SqlScriptingExecutionNodeSuite` so tests remain independent of concept of Frames and Scopes.

First [PR](#48879)
Second [PR](#48950)

### Why are the changes needed?
This changes are needed to enable introduction of Error Handling mechanism and Local Variables.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests that were updated to support new concepts introduced in this PR.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #49006 from miland-db/milan-dankovic_data/refactor-execution-3.

Lead-authored-by: Milan Dankovic <[email protected]>
Co-authored-by: David Milicevic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Dec 17, 2024
…s and Local Variables

### What changes were proposed in this pull request?
This PR is **third** in series of refactoring and introducing SQL Scripting Execution Framework:
- Introducing `SqlScriptingExecutionContext`, object to keep current state of script execution.
- Introducing `Frames` and `Scopes` to support Local Variables and Error Handlers resolution.
- Decoupled `SqlScriptingIterator` from `SqlScriptExecution`.
- Enabling execution of SQL Scripting using `sql()` API.
- Updated `SqlScriptingExecutionNodeSuite` so tests remain independent of concept of Frames and Scopes.

First [PR](apache#48879)
Second [PR](apache#48950)

### Why are the changes needed?
This changes are needed to enable introduction of Error Handling mechanism and Local Variables.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests that were updated to support new concepts introduced in this PR.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#49006 from miland-db/milan-dankovic_data/refactor-execution-3.

Lead-authored-by: Milan Dankovic <[email protected]>
Co-authored-by: David Milicevic <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants