Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow connector reviewcomment fixes #28

Conversation

lithinwxd
Copy link
Collaborator

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@@ -0,0 +1,82 @@
name: arrow tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename to arrow flight tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

FlightInfo flightInfo = clientHandler.getFlightInfo(request, connectorSession);
List<Field> fields = flightInfo.getSchema().getFields();
Optional<Schema> flightschema = clientHandler.getSchema(flightDescriptor, connectorSession);
List<Field> fields = flightschema.map(schema1 -> schema1.getFields()).orElse(Collections.emptyList());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the method reference Schema::getFields instead of schema1 -> schema1.getFields()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ConnectorHandleResolver handleResolver,
ConnectorSplitManager splitManager,
ConnectorPageSourceProvider pageSourceProvider,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this extra line

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
}

static Block buildBlockFromVector(FieldVector vector, Type type)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't this be made public?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

appendValueToBuilder(elementType, elementBuilder, value);
}
}
arrayBuilder.closeEntry();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closeEntry() ensures that the system knows where one array ends and the next begins.

if (value instanceof Integer) {
type.writeLong(builder, (Integer) value);
}
else if (value instanceof JsonStringArrayList) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't JsonStringArrayList occur outside of ListVectors? Shouldn't we handle that case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is needed only for List implemenation may be not required for other primitive vectors

return;
}

if (type instanceof VarcharType) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we handling all the types that we handled in the buildBlockFrom.... methods?

Copy link
Collaborator Author

@lithinwxd lithinwxd Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are handling row types varchar vectors and some of the primitive types . should we handle all ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't we get those types in ListVector?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remaining types are timestamp , date , tiny int , small int etc.., we will add for them also

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return FlightDescriptor.command(request.getCommand());
}

private TestingArrowFlightRequest getArrowFlightRequest(ArrowFlightConfig config, String schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config is available in class member, no need to pass in method parameters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is config still needed in this method's parameters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

{
public String database;
public String password;
public final String database;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be private, with getters added for them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
else {
int dictionaryIndex = encodedVector.get(i);
if (dictionary.isNull(dictionaryIndex)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we use Dictionary and DictionaryEncoder.decode https://arrow.apache.org/docs/java/ipc.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

throw new UnsupportedOperationException("Unsupported vector type: " + vector.getClass().getSimpleName());
}

public static Block buildBlockFromEncodedVector(FieldVector encodedVector, FieldVector dictionary)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we accept Dictionary type itself here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public static Block buildBlockFromEncodedVector(FieldVector encodedVector, Dictionary dictionary)
{
// Validate inputs
if (encodedVector == null || dictionary == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use requireNonNull method

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@elbinpallimalilibm
Copy link
Collaborator

Do the tests all pass?

<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-algorithm</artifactId>
<version>18.0.0</version>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't the arrow version be specified in properties?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this version is specific only to arrow-algorithm , Its used only once . should we move this to properties?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there already an arrow.version property defined, can't we use that? is this version different?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is 18 , anyways i will change it to 17 and recheck

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then why not set all arrow dependencies to 18

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay will change it accordingly to 18.1.0 which is latest , will re run the test cases

throw new PrestoException(
StandardErrorCode.INVALID_CAST_ARGUMENT,
"Invalid table handle: Expected an instance of ArrowTableHandle but received "
+ table.getClass().getSimpleName() + ". Please check the table type being used.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User doesn't have an option to check table type. Error message doesn't need the string "Please check....."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


public void closeRootallocator()
{
allocator.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if allocator is null

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@elbinpallimalilibm elbinpallimalilibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added my suggestions

import static java.lang.String.format;
import static java.util.Objects.requireNonNull;

public class ArrowOutputTableHandle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this class used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for json round test case in TestArrowHandle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why create a class if it's nowhere used in main

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this is not needed , will delete it

}
}

public static void handleTimestampType(Type type, BlockBuilder builder, Object value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of "handle...." name, wouldn't "writeTimestamp...." name be apt? Similarly for other "handle...." methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// Get the Arrow indices vector
IntVector indicesVector = (IntVector) fieldVector;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to check type before casting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Encode using dictionary
Dictionary dictionary = new Dictionary(dictionaryVector, new DictionaryEncoding(1L, false, null));
IntVector encodedVector = (IntVector) DictionaryEncoder.encode(rawVector, dictionary);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will DictionaryEncoder.encode always return IntVector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added different testcases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DictionaryEncoder.encode will return a BaseIntVector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

BigIntVector bigIntIndicesVector = (BigIntVector) fieldVector;
int[] ids = new int[bigIntIndicesVector.getValueCount()];
for (int i = 0; i < bigIntIndicesVector.getValueCount(); i++) {
ids[i] = (int) bigIntIndicesVector.get(i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

down casting big int to int will result in data loss, right? so better we don't support creating dictionaryblock from bigint indices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah okay will remove that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

rawVector.setValueCount(VECTOR_LENGTH);

// Encode using dictionary
Dictionary dictionary = new Dictionary(dictionaryVector, new DictionaryEncoding(1L, false, null));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should mention the index type when dictionary encoding is created.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@elbinpallimalilibm elbinpallimalilibm merged commit 15e1025 into arrow-connector-workspace Dec 9, 2024
52 of 53 checks passed
elbinpallimalilibm added a commit that referenced this pull request Dec 9, 2024
* Arrow Flight connector template

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: elbinpallimalilibm <[email protected]>

* Move Arrow type to Presto type mapping to a separate function

* Fix checkstyle errors

* Refactor null check

* Remove explicit versios

* Remove explicit versions

* Remove explicit versions in pom (#25)

* Remove explicit versios

* Remove explicit versions

* Remove explicit declarations in pom

* Resolve upper bound error

* Do not exclude netty dependencies

* Remove explicit versions from pom

* Add integration smoke test

* Use flight descriptor instead of ArrowFlightRequest

* Remove config getter from flight client handler

* Arrow connector reviewcomment fixes (#28)

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Use flight descriptor instead of ArrowFlightRequest

* Arrow page utils changes

* Use flight descriptor instead of ArrowFlightRequest

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Remove config getter from flight client handler

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Arrow page utils changes

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Review comment fixes

* Review comment fixes

* Review comment fixes - Changed config

* Added support for small int tiny int date and timestamp

* Review comment fixes - Dictionary encoding and other tests

* Review comment fixes - Added Tests as per comments

* Removed license header unwanted place

* Removed duplicate CI job

* Review comment fixes

* Added more testcases

* Fixed review comments and added support for other datatypes

* Minor fixes on method argument

* Fixed review comments - cosmetic changes and unneccessary class removal

* Arrow - DictionaryEncoding usecase

* Removed description which looks like generated

* Added support for dictionary encoding

* Fixed review comments

* Fixed review comments -added index type

---------

Co-authored-by: Elbin Pallimalil <[email protected]>
Co-authored-by: Elbin Pallimalil <[email protected]>

---------

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
@@ -3,7 +3,6 @@
Arrow Flight Connector
======================
This connector allows querying multiple data sources that are supported by an Arrow Flight server.
Apache Arrow enhances performance and efficiency in data-intensive applications through its columnar memory layout, zero-copy reads, vectorized execution, cross-language interoperability, rich data type support, and optimization for modern hardware. These features collectively reduce overhead, improve data processing speeds, and facilitate seamless data exchange between different systems and languages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you a give a brief overview and link to the official documentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -15,15 +15,26 @@

import com.fasterxml.jackson.annotation.JsonProperty;

public class TestingInteractionProperties
public final class TestingInteractionProperties
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the Immutable annotation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -222,7 +223,7 @@ else if (selectStatement != null) {
@Override
public Runnable acceptPut(CallContext callContext, FlightStream flightStream, StreamListener<PutResult> streamListener)
{
return null;
throw new UnsupportedOperationException("This operation is not supported");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listActions should also throw unsupported operation exception

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

throw new PrestoException(
StandardErrorCode.INVALID_CAST_ARGUMENT,
"Invalid table handle: Expected an instance of ArrowTableHandle but received "
+ table.getClass().getSimpleName() + "");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to append empty string

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

elbinpallimalilibm added a commit that referenced this pull request Dec 9, 2024
* Arrow Flight connector template

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: elbinpallimalilibm <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: Timothy Meehan <[email protected]>
Co-authored-by: Timothy Meehan <[email protected]>
Co-authored-by: Timothy Meehan <[email protected]>

* Arrow Flight connector template

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: elbinpallimalilibm <[email protected]>

* Move Arrow type to Presto type mapping to a separate function

* Fix checkstyle errors

* Refactor null check

* Remove explicit versios

* Remove explicit versions

* Remove explicit versions in pom (#25)

* Remove explicit versios

* Remove explicit versions

* Remove explicit declarations in pom

* Resolve upper bound error

* Do not exclude netty dependencies

* Remove explicit versions from pom

* Add integration smoke test

* Use flight descriptor instead of ArrowFlightRequest

* Remove config getter from flight client handler

* Arrow connector reviewcomment fixes (#28)

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Use flight descriptor instead of ArrowFlightRequest

* Arrow page utils changes

* Use flight descriptor instead of ArrowFlightRequest

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Remove config getter from flight client handler

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Arrow page utils changes

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Review comment fixes

* Review comment fixes

* Review comment fixes - Changed config

* Added support for small int tiny int date and timestamp

* Review comment fixes - Dictionary encoding and other tests

* Review comment fixes - Added Tests as per comments

* Removed license header unwanted place

* Removed duplicate CI job

* Review comment fixes

* Added more testcases

* Fixed review comments and added support for other datatypes

* Minor fixes on method argument

* Fixed review comments - cosmetic changes and unneccessary class removal

* Arrow - DictionaryEncoding usecase

* Removed description which looks like generated

* Added support for dictionary encoding

* Fixed review comments

* Fixed review comments -added index type

---------

Co-authored-by: Elbin Pallimalil <[email protected]>
Co-authored-by: Elbin Pallimalil <[email protected]>

* Arrow connector reviewcomment fixes elbin (#32)

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Arrow page utils changes

* Use flight descriptor instead of ArrowFlightRequest

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Arrow CI job

* added verison in property file

* testing after removing hyphen

* remove path to make sure CI is run during every build

* changed yaml to run CI on evry pull requests and every update except doc update

* review comment fixes

* Arrow page utils changes

* Arrow page utils changes - fixed checkstyle issues

* Review comment fixes - Root allocator and typos

* Review comment fixes

* Review comment fixes

* Review comment fixes - Changed config

* Added support for small int tiny int date and timestamp

* Review comment fixes - Dictionary encoding and other tests

* Review comment fixes - Added Tests as per comments

* Removed license header unwanted place

* Removed duplicate CI job

* Review comment fixes

* Added more testcases

* Fixed review comments and added support for other datatypes

* Minor fixes on method argument

* Fixed review comments - cosmetic changes and unneccessary class removal

* Arrow - DictionaryEncoding usecase

* Removed description which looks like generated

* Added support for dictionary encoding

* Fixed review comments

* Fixed review comments -added index type

* Added documentation link for arrow connector

* Made TestingInteractionProperties immutable

* Made TestingInteractionProperties immutable

---------

Co-authored-by: lithinpurushothaman <[email protected]>

* Update ArrowPageUtilsTest.java

* Update base-arrow-flight.rst

---------

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: Timothy Meehan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants