-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow connector reviewcomment fixes #28
Arrow connector reviewcomment fixes #28
Conversation
Use flight descriptor instead of ArrowFlightRequest
.github/workflows/arrow-tests.yml
Outdated
@@ -0,0 +1,82 @@ | |||
name: arrow tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename to arrow flight tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
FlightInfo flightInfo = clientHandler.getFlightInfo(request, connectorSession); | ||
List<Field> fields = flightInfo.getSchema().getFields(); | ||
Optional<Schema> flightschema = clientHandler.getSchema(flightDescriptor, connectorSession); | ||
List<Field> fields = flightschema.map(schema1 -> schema1.getFields()).orElse(Collections.emptyList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the method reference Schema::getFields
instead of schema1 -> schema1.getFields()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ConnectorHandleResolver handleResolver, | ||
ConnectorSplitManager splitManager, | ||
ConnectorPageSourceProvider pageSourceProvider, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
{ | ||
} | ||
|
||
static Block buildBlockFromVector(FieldVector vector, Type type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this be made public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
appendValueToBuilder(elementType, elementBuilder, value); | ||
} | ||
} | ||
arrayBuilder.closeEntry(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closeEntry() ensures that the system knows where one array ends and the next begins.
if (value instanceof Integer) { | ||
type.writeLong(builder, (Integer) value); | ||
} | ||
else if (value instanceof JsonStringArrayList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't JsonStringArrayList
occur outside of ListVectors? Shouldn't we handle that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is needed only for List implemenation may be not required for other primitive vectors
return; | ||
} | ||
|
||
if (type instanceof VarcharType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we handling all the types that we handled in the buildBlockFrom....
methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are handling row types varchar vectors and some of the primitive types . should we handle all ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
won't we get those types in ListVector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining types are timestamp , date , tiny int , small int etc.., we will add for them also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Remove config getter from flight client handler
return FlightDescriptor.command(request.getCommand()); | ||
} | ||
|
||
private TestingArrowFlightRequest getArrowFlightRequest(ArrowFlightConfig config, String schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config
is available in class member, no need to pass in method parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is config
still needed in this method's parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
{ | ||
public String database; | ||
public String password; | ||
public final String database; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be private, with getters added for them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
else { | ||
int dictionaryIndex = encodedVector.get(i); | ||
if (dictionary.isNull(dictionaryIndex)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we use Dictionary
and DictionaryEncoder.decode
https://arrow.apache.org/docs/java/ipc.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
throw new UnsupportedOperationException("Unsupported vector type: " + vector.getClass().getSimpleName()); | ||
} | ||
|
||
public static Block buildBlockFromEncodedVector(FieldVector encodedVector, FieldVector dictionary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we accept Dictionary
type itself here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public static Block buildBlockFromEncodedVector(FieldVector encodedVector, Dictionary dictionary) | ||
{ | ||
// Validate inputs | ||
if (encodedVector == null || dictionary == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use requireNonNull
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Do the tests all pass? |
presto-base-arrow-flight/pom.xml
Outdated
<dependency> | ||
<groupId>org.apache.arrow</groupId> | ||
<artifactId>arrow-algorithm</artifactId> | ||
<version>18.0.0</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't the arrow version be specified in properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this version is specific only to arrow-algorithm , Its used only once . should we move this to properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there already an arrow.version property defined, can't we use that? is this version different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is 18 , anyways i will change it to 17 and recheck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then why not set all arrow dependencies to 18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay will change it accordingly to 18.1.0 which is latest , will re run the test cases
throw new PrestoException( | ||
StandardErrorCode.INVALID_CAST_ARGUMENT, | ||
"Invalid table handle: Expected an instance of ArrowTableHandle but received " | ||
+ table.getClass().getSimpleName() + ". Please check the table type being used."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User doesn't have an option to check table type. Error message doesn't need the string "Please check....."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
public void closeRootallocator() | ||
{ | ||
allocator.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if allocator is null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added my suggestions
import static java.lang.String.format; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class ArrowOutputTableHandle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this class used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for json round test case in TestArrowHandle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why create a class if it's nowhere used in main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess this is not needed , will delete it
} | ||
} | ||
|
||
public static void handleTimestampType(Type type, BlockBuilder builder, Object value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of "handle...." name, wouldn't "writeTimestamp...." name be apt? Similarly for other "handle...." methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
// Get the Arrow indices vector | ||
IntVector indicesVector = (IntVector) fieldVector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to check type before casting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
// Encode using dictionary | ||
Dictionary dictionary = new Dictionary(dictionaryVector, new DictionaryEncoding(1L, false, null)); | ||
IntVector encodedVector = (IntVector) DictionaryEncoder.encode(rawVector, dictionary); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will DictionaryEncoder.encode
always return IntVector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added different testcases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictionaryEncoder.encode will return a BaseIntVector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
BigIntVector bigIntIndicesVector = (BigIntVector) fieldVector; | ||
int[] ids = new int[bigIntIndicesVector.getValueCount()]; | ||
for (int i = 0; i < bigIntIndicesVector.getValueCount(); i++) { | ||
ids[i] = (int) bigIntIndicesVector.get(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
down casting big int to int will result in data loss, right? so better we don't support creating dictionaryblock from bigint indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah okay will remove that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
rawVector.setValueCount(VECTOR_LENGTH); | ||
|
||
// Encode using dictionary | ||
Dictionary dictionary = new Dictionary(dictionaryVector, new DictionaryEncoding(1L, false, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should mention the index type when dictionary encoding is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
15e1025
into
arrow-connector-workspace
* Arrow Flight connector template Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: elbinpallimalilibm <[email protected]> * Move Arrow type to Presto type mapping to a separate function * Fix checkstyle errors * Refactor null check * Remove explicit versios * Remove explicit versions * Remove explicit versions in pom (#25) * Remove explicit versios * Remove explicit versions * Remove explicit declarations in pom * Resolve upper bound error * Do not exclude netty dependencies * Remove explicit versions from pom * Add integration smoke test * Use flight descriptor instead of ArrowFlightRequest * Remove config getter from flight client handler * Arrow connector reviewcomment fixes (#28) * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Use flight descriptor instead of ArrowFlightRequest * Arrow page utils changes * Use flight descriptor instead of ArrowFlightRequest * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Remove config getter from flight client handler * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Arrow page utils changes * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Review comment fixes * Review comment fixes * Review comment fixes - Changed config * Added support for small int tiny int date and timestamp * Review comment fixes - Dictionary encoding and other tests * Review comment fixes - Added Tests as per comments * Removed license header unwanted place * Removed duplicate CI job * Review comment fixes * Added more testcases * Fixed review comments and added support for other datatypes * Minor fixes on method argument * Fixed review comments - cosmetic changes and unneccessary class removal * Arrow - DictionaryEncoding usecase * Removed description which looks like generated * Added support for dictionary encoding * Fixed review comments * Fixed review comments -added index type --------- Co-authored-by: Elbin Pallimalil <[email protected]> Co-authored-by: Elbin Pallimalil <[email protected]> --------- Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]>
@@ -3,7 +3,6 @@ | |||
Arrow Flight Connector | |||
====================== | |||
This connector allows querying multiple data sources that are supported by an Arrow Flight server. | |||
Apache Arrow enhances performance and efficiency in data-intensive applications through its columnar memory layout, zero-copy reads, vectorized execution, cross-language interoperability, rich data type support, and optimization for modern hardware. These features collectively reduce overhead, improve data processing speeds, and facilitate seamless data exchange between different systems and languages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you a give a brief overview and link to the official documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -15,15 +15,26 @@ | |||
|
|||
import com.fasterxml.jackson.annotation.JsonProperty; | |||
|
|||
public class TestingInteractionProperties | |||
public final class TestingInteractionProperties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the Immutable annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -222,7 +223,7 @@ else if (selectStatement != null) { | |||
@Override | |||
public Runnable acceptPut(CallContext callContext, FlightStream flightStream, StreamListener<PutResult> streamListener) | |||
{ | |||
return null; | |||
throw new UnsupportedOperationException("This operation is not supported"); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listActions
should also throw unsupported operation exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
throw new PrestoException( | ||
StandardErrorCode.INVALID_CAST_ARGUMENT, | ||
"Invalid table handle: Expected an instance of ArrowTableHandle but received " | ||
+ table.getClass().getSimpleName() + ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to append empty string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Arrow Flight connector template Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: elbinpallimalilibm <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: Timothy Meehan <[email protected]> Co-authored-by: Timothy Meehan <[email protected]> Co-authored-by: Timothy Meehan <[email protected]> * Arrow Flight connector template Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: elbinpallimalilibm <[email protected]> * Move Arrow type to Presto type mapping to a separate function * Fix checkstyle errors * Refactor null check * Remove explicit versios * Remove explicit versions * Remove explicit versions in pom (#25) * Remove explicit versios * Remove explicit versions * Remove explicit declarations in pom * Resolve upper bound error * Do not exclude netty dependencies * Remove explicit versions from pom * Add integration smoke test * Use flight descriptor instead of ArrowFlightRequest * Remove config getter from flight client handler * Arrow connector reviewcomment fixes (#28) * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Use flight descriptor instead of ArrowFlightRequest * Arrow page utils changes * Use flight descriptor instead of ArrowFlightRequest * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Remove config getter from flight client handler * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Arrow page utils changes * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Review comment fixes * Review comment fixes * Review comment fixes - Changed config * Added support for small int tiny int date and timestamp * Review comment fixes - Dictionary encoding and other tests * Review comment fixes - Added Tests as per comments * Removed license header unwanted place * Removed duplicate CI job * Review comment fixes * Added more testcases * Fixed review comments and added support for other datatypes * Minor fixes on method argument * Fixed review comments - cosmetic changes and unneccessary class removal * Arrow - DictionaryEncoding usecase * Removed description which looks like generated * Added support for dictionary encoding * Fixed review comments * Fixed review comments -added index type --------- Co-authored-by: Elbin Pallimalil <[email protected]> Co-authored-by: Elbin Pallimalil <[email protected]> * Arrow connector reviewcomment fixes elbin (#32) * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Arrow page utils changes * Use flight descriptor instead of ArrowFlightRequest * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Arrow CI job * added verison in property file * testing after removing hyphen * remove path to make sure CI is run during every build * changed yaml to run CI on evry pull requests and every update except doc update * review comment fixes * Arrow page utils changes * Arrow page utils changes - fixed checkstyle issues * Review comment fixes - Root allocator and typos * Review comment fixes * Review comment fixes * Review comment fixes - Changed config * Added support for small int tiny int date and timestamp * Review comment fixes - Dictionary encoding and other tests * Review comment fixes - Added Tests as per comments * Removed license header unwanted place * Removed duplicate CI job * Review comment fixes * Added more testcases * Fixed review comments and added support for other datatypes * Minor fixes on method argument * Fixed review comments - cosmetic changes and unneccessary class removal * Arrow - DictionaryEncoding usecase * Removed description which looks like generated * Added support for dictionary encoding * Fixed review comments * Fixed review comments -added index type * Added documentation link for arrow connector * Made TestingInteractionProperties immutable * Made TestingInteractionProperties immutable --------- Co-authored-by: lithinpurushothaman <[email protected]> * Update ArrowPageUtilsTest.java * Update base-arrow-flight.rst --------- Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: Timothy Meehan <[email protected]>
Description
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: