Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] [spark-connector] add IntervalYearType and IntervalDayType for spark connector #2596

Closed
FANNG1 opened this issue Mar 19, 2024 · 14 comments
Assignees
Labels
improvement Improvements on everything

Comments

@FANNG1
Copy link
Contributor

FANNG1 commented Mar 19, 2024

What would you like to be improved?

you could refer #2484 add IntervalYearType and IntervalDayType support for spark connector, add transform logic in SparkTypeConverter and add test in TestSparkTypeConverter

How should we improve?

No response

@FANNG1 FANNG1 added improvement Improvements on everything good first issue Good for newcomers and removed good first issue Good for newcomers labels Mar 19, 2024
@FANNG1
Copy link
Contributor Author

FANNG1 commented Mar 19, 2024

@mchades , I checked spark types, Gravitino IntervalYearType and IntervalDayType couldn't convert to Spark YearMonthIntervalType(startField, endField) and DayTimeIntervalType(startField, endField) because they have parameters. Yes?

@mchades
Copy link
Contributor

mchades commented Mar 20, 2024

Gravitino IntervalYearType can convert to Spark YearMonthIntervalType(YEAR, MONTH)
Gravitino IntervalDayType can convert to Spark DayTimeIntervalType(DAY, SECOND)

FYI: https://substrait.io/types/type_classes/

@charliecheng630
Copy link
Contributor

charliecheng630 commented Mar 23, 2024

@FANNG1 @mchades I would like to work on this.
After tracing code, should IntervalYearType and IntervalDayType be added with parameters (startField, endField)?

@FANNG1
Copy link
Contributor Author

FANNG1 commented Mar 23, 2024

@FANNG1 @mchades I would like to work on this. After tracing code, should IntervalYearType and IntervalDayType be added with parameters (startField, endField)?

I'm not sure about this, How should Spark DayTimeIntervalType(DAY, MINUTE) translate to Gravitino? @mchades

@charliecheng630
Copy link
Contributor

charliecheng630 commented Mar 23, 2024

@mchades Do we need to add parameters in Types? If needed, I would like to implement.

@mchades
Copy link
Contributor

mchades commented Mar 24, 2024

I think we should understand how the interval type is actually used in Spark in order to determine if additional parameters are needed.
Could you provide code examples of how Spark defines and reads/writes the interval type? @FANNG1

@FANNG1
Copy link
Contributor Author

FANNG1 commented Apr 8, 2024

Take YearMonthIntervalType for example, it has two parameters startField and endFiled which could be YEAR and MONTH, valid types are:

  • interval year, like INTERVAL '100' YEAR
  • interval month, like INTERVAL '2' MONTH
  • interval year to month, like INTERVAL '100 2' YEAR TO MONTH
  override val typeName: String = {
    val startFieldName = fieldToString(startField)
    val endFieldName = fieldToString(endField)
    if (startFieldName == endFieldName) {
      s"interval $startFieldName"
    } else if (startField < endField) {
      s"interval $startFieldName to $endFieldName"
    } else {
      throw QueryCompilationErrors.invalidDayTimeIntervalType(startFieldName, endFieldName)
    }
  }

@mchades
Copy link
Contributor

mchades commented Apr 9, 2024

This issue should be considered together with #2373

@FANNG1
Copy link
Contributor Author

FANNG1 commented Apr 9, 2024

Gravitino IntervalYearType can convert to Spark YearMonthIntervalType(YEAR, MONTH) Gravitino IntervalDayType can convert to Spark DayTimeIntervalType(DAY, SECOND)

FYI: https://substrait.io/types/type_classes/

seems we could do the transform like this, because the simpleString of YearMonthIntervalType is YearMonthIntervalType(YEAR, MONTH) which is used to transform to hive types in spark. @mchades WDYT?

@mchades
Copy link
Contributor

mchades commented Apr 10, 2024

Gravitino IntervalYearType can convert to Spark YearMonthIntervalType(YEAR, MONTH) Gravitino IntervalDayType can convert to Spark DayTimeIntervalType(DAY, SECOND)
FYI: https://substrait.io/types/type_classes/

seems we could do the transform like this, because the simpleString of YearMonthIntervalType is YearMonthIntervalType(YEAR, MONTH) which is used to transform to hive types in spark. @mchades WDYT?

we can convert it like this first.

@FANNG1
Copy link
Contributor Author

FANNG1 commented Apr 11, 2024

@charliecheng630 do you still like to work on this?

@FANNG1
Copy link
Contributor Author

FANNG1 commented Apr 14, 2024

@charliecheng630 , this issue is expected to be fix in 0.5 in next week, If you didn't have time, I'd like to work on this.

@charliecheng630
Copy link
Contributor

@FANNG1 I probably won't make it in time to work on this issue.

@FANNG1 FANNG1 self-assigned this Apr 15, 2024
@jerryshao jerryshao added this to the Gravitino June Release milestone Apr 24, 2024
@FANNG1
Copy link
Contributor Author

FANNG1 commented May 16, 2024

Spark don't support create table or load table with IntervalXXType column, not plan to support it unless a real need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvements on everything
Projects
None yet
Development

No branches or pull requests

4 participants