-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add examples for Delta Kernel API usage #1926
Conversation
d59897b
to
1d0c173
Compare
5bab08b
to
92b4189
Compare
92b4189
to
5f9b4e2
Compare
kernel/examples/table-reader/pom.xml
Outdated
|
||
<groupId>org.example</groupId> | ||
<artifactId>table-reader</artifactId> | ||
<version>3.0.0-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be hardcoding this? also why does this need to be same version as the actual delta artifacts that we will publish. unless there is a need to consistent, this is only going to cause more headache to keep it updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the version to 0.1-SNAPSHOT. Let me know if there is a better version name.
kernel/examples/run_examples.sh
Outdated
@@ -0,0 +1,49 @@ | |||
## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license.
kernel/examples/run_examples.sh
Outdated
## tables located in the <repo-root>/connectors/golden-tables/src/main/resources/golden | ||
## directory. | ||
## | ||
## Make sure to run this script from <repo-root> in order for the relative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this script be runnable from anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sort of lack of flexibility causes future pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also .. thats why writing such scripts in python gives a lot more flexibility and an entire library of tools to do these sort of stuff. There is a lot of python scripts in this repo (e.g., run-integration-tests.py) that you can steal from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote this in python (by copying the code from delta integration test).
kernel/examples/run_examples.sh
Outdated
-Dstaging.repo.url=${EXTRA_MAVEN_REPO:-"___"} \ | ||
-Ddelta-kernel.version=${STANDALONE_VERSION:-"3.0.0-SNAPSHOT"} \ | ||
-Dexec.args="${test}" | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you tested that this scripts fails (exit nonzero) when compilation and test in the examples fail?
StructType readSchema, | ||
Snapshot snapshot, | ||
int maxRowCount) throws IOException | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are mixing styles all over the place.. even within this function. can we stick to the following since we are all familiar with it (consistent with all the scala code).
try {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows our Java checkstyle guide and Kernel Java code is. Let me know if this needs to be changed.
import io.delta.kernel.data.DefaultJsonRow; | ||
import io.delta.kernel.data.Row; | ||
import io.delta.kernel.internal.types.TableSchemaSerDe; | ||
import io.delta.kernel.types.ArrayType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cant this be just types.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is following the Java checkstyle. Also using wildcard in imports is not recommended.
* @return | ||
* @throws Exception | ||
*/ | ||
private void readSnapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readData
snapshot is ambiguous. snapshot metadata or snapshot data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, changed to readData
...l/examples/table-reader/src/main/java/io/delta/kernel/examples/MultiThreadedTableReader.java
Outdated
Show resolved
Hide resolved
approved, but I left some comments. |
Description
Adds an example project that shows how to read a Delta table using the Kernel APIs. The sample program can also be used as a command line to read the Delta table.
Single threaded reader
Multi-threaded reader (simulating a distributed execution environment)
How was this patch tested?
Manual testing