[Java][Docs] Document environment variables/java properties #22595

asfimport · 2019-08-12T02:44:22Z

Specifically, "-Dio.netty.tryReflectionSetAccessible=true" for JVMs >= 9 and BoundsChecking/NullChecking for get.

Reporter: Micah Kornfield / @emkornfield
Assignee: Ji Liu / @tianchen92

PRs and other links:

GitHub Pull Request #5078

_Note: This issue was originally created as ARROW-6206. Please see the migration documentation for further details._

asfimport · 2019-08-12T02:51:17Z

Ji Liu / @tianchen92:
Where should the docs be updated? Is it /arrow/java/README.md?

I just noticed that some things in this file should also be updated, such as the 'Java Code Style Guide' section.

asfimport · 2019-08-12T03:16:20Z

Micah Kornfield / @emkornfield:
arrow/java/README.md is what I was thinking. At some point we might want to create more formal docs using reStructuredText at: https://github.com/apache/arrow/tree/master/docs/source

asfimport · 2019-08-12T03:25:14Z

Ji Liu / @tianchen92:
Yes, I think Wes has created an issue for reStructuredText: https://issues.apache.org/jira/browse/ARROW-5542.

One more question: why should the JVM parameter be set for JVM >= 9?

I'm not quite familiar with this, but it seems

<io.netty.tryReflectionSetAccessible>true</io.netty.tryReflectionSetAccessible> is already in pom.xml. Doesn't that work?

asfimport · 2019-08-12T03:54:26Z

Micah Kornfield / @emkornfield:
It could probably be set for all versions, but I think it is only required past JVM 8 (I could be mistaken, though).

"<io.netty.tryReflectionSetAccessible>true</io.netty.tryReflectionSetAccessible> is already in pom.xml, it doesn't work?"

The only reference I found was for test execution in pom.xml. I think consumers of the library have to set this property themselves when running the JVM. But my Maven knowledge is weak, so I might be misunderstanding something.
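Because the pom.xml entry only applies to the project's own test runs, a consumer could add a guard like the following to their launcher. This is a minimal sketch assuming, as suggested above (and not verified here), that the property only matters on Java 9+; the class `NettyReflectionFlag` is hypothetical and not part of Arrow or Netty. Setting the property in code only helps if it runs before any Netty class reads it, so passing `-Dio.netty.tryReflectionSetAccessible=true` on the `java` command line is the safer default.

```java
import java.util.Objects;

/** Hypothetical helper (not part of Arrow or Netty) for the Netty reflection property. */
public final class NettyReflectionFlag {
    public static final String PROPERTY = "io.netty.tryReflectionSetAccessible";

    private NettyReflectionFlag() {}

    /** Parses a spec version such as "1.8" (Java 8) or "11" into its major number. */
    public static int majorVersion(String specVersion) {
        Objects.requireNonNull(specVersion, "specVersion");
        // Java 8 and earlier report "1.x"; Java 9+ report the major number directly.
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int end = 0;
        while (end < v.length() && Character.isDigit(v.charAt(end))) {
            end++;
        }
        return Integer.parseInt(v.substring(0, end));
    }

    /** True on Java 9+ when the user has not set the property at all. */
    public static boolean needsFlag(String specVersion, String currentValue) {
        return majorVersion(specVersion) >= 9 && currentValue == null;
    }

    /**
     * Sets the property if needed. Assumption: this runs before any Netty class
     * has read the property; otherwise it has no effect.
     */
    public static boolean ensureSet() {
        String spec = System.getProperty("java.specification.version");
        if (needsFlag(spec, System.getProperty(PROPERTY))) {
            System.setProperty(PROPERTY, "true");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean changed = ensureSet();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY) + " (set here: " + changed + ")");
    }
}
```

The guard only fills in a missing value, so an explicit `-Dio.netty.tryReflectionSetAccessible=false` from the user is left alone.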

asfimport · 2019-08-12T04:10:30Z

Ji Liu / @tianchen92:
Seems you are right, the reference was in test execution.

If you would like to provide a PR for this, I think the 'Java Code Style Guide' could also be updated (unused imports, redundant modifiers). Or I can take this issue :) If so, please let me know if any other information should be updated besides the items above.

asfimport · 2019-08-12T04:21:00Z

Micah Kornfield / @emkornfield:
I would say we don't need to include "stricter" things in the README, since they aren't really exceptions.

asfimport · 2019-08-12T04:24:26Z

Ji Liu / @tianchen92:
Fine, no problem.

asfimport · 2019-08-13T10:01:55Z

Jim Northrup:
NIO is not going to go away, and Java is not going to stop harboring unreproducible NIO bugs.

Is there a charter for which Java use cases will be supported, and THEN, which of those will leverage NIO and which can use pure heap implementations of objects exclusively?

The utilities should abandon all hope of stability or useful benchmarks while there is an NIO component in a piece of code. The Oracle engineers this year are certainly not on the same page as the JDK 8 team, or the JDK 6 team.

Unsafe/NIO use cases number about two:

If you're utilizing mmap files to minimize page faults, go there.
If you're talking to cross-platform structs and mailboxes, you have no choice.
If you're squirreling away heap objects using something greater than the -Xmx setting, you should probably engineer it through mmap file access instead of using native handles directly; the latter is extremely unstable in my experience.
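For the mmap route mentioned in the last point, the JDK's own entry point is `FileChannel.map`. Below is a minimal, self-contained sketch that sizes a file, maps it, writes a few longs through the mapping, and reads them back; the `MmapSketch` class and `writeAndSum` helper are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical example of memory-mapped file access with plain JDK NIO. */
public final class MmapSketch {
    private MmapSketch() {}

    /** Writes the values into a mapped file and returns their sum read back from the mapping. */
    public static long writeAndSum(Path file, long[] values) throws IOException {
        long size = (long) values.length * Long.BYTES;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            raf.setLength(size); // size the file explicitly before mapping it
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (long v : values) {
                buf.putLong(v); // writes go to the mapped pages, not the Java heap
            }
            buf.force(); // ask the OS to flush dirty pages to the file
            buf.flip(); // rewind to read back what was written
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.getLong();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        System.out.println(writeAndSum(tmp, new long[] {1, 2, 3, 4}));
    }
}
```

The standard NIO API has no public call to unmap a `MappedByteBuffer`; the mapping is released when the buffer is garbage-collected, which is one reason deleting a still-mapped file can fail on some platforms.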

asfimport · 2019-08-13T16:02:36Z

Micah Kornfield / @emkornfield:
"Is there a charter for what Java use cases will be supported, and THEN, what among these items will leverage NIO, and what among these can use pure heap implementations of objects exclusively?"

I don't fully understand this question. My best attempt to answer it below:

The system property is needed because we use Netty as an off-heap memory allocator; this could potentially be replaced with something JNI-based.

The core of the current Java implementation is off-heap memory. If you have specific requirements or use cases in mind, discussing them on the dev@ or user@ mailing list is probably the way to go.
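To make the heap/off-heap distinction concrete, here is a sketch using plain JDK NIO rather than Arrow's or Netty's allocator (the `OffHeapSketch` class is hypothetical). A heap `ByteBuffer` is backed by a `byte[]` that the garbage collector manages, while a direct buffer lives in native memory outside the Java heap; the read/write API is the same for both.

```java
import java.nio.ByteBuffer;

/** Hypothetical comparison of heap and direct (off-heap) buffers using plain JDK NIO. */
public final class OffHeapSketch {
    private OffHeapSketch() {}

    /** Stores 0..n-1 as ints using absolute puts, then returns their sum. */
    public static long fillAndSum(ByteBuffer buf) {
        int n = buf.capacity() / Integer.BYTES;
        for (int i = 0; i < n; i++) {
            buf.putInt(i * Integer.BYTES, i); // absolute put: position is not changed
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        int bytes = 1024 * Integer.BYTES;
        // Heap buffer: a byte[] on the Java heap, sized against -Xmx.
        ByteBuffer heap = ByteBuffer.allocate(bytes);
        // Direct buffer: native memory, limited by -XX:MaxDirectMemorySize rather than -Xmx.
        ByteBuffer direct = ByteBuffer.allocateDirect(bytes);
        System.out.println("heap   isDirect=" + heap.isDirect() + " sum=" + fillAndSum(heap));
        System.out.println("direct isDirect=" + direct.isDirect() + " sum=" + fillAndSum(direct));
    }
}
```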

Could you provide a link to the text you quoted? I'd be interested in reading it.

asfimport · 2019-08-13T16:04:47Z

Micah Kornfield / @emkornfield:
@tianchen92 thanks for volunteering to do this.

asfimport · 2019-08-14T06:16:26Z

Micah Kornfield / @emkornfield:
Issue resolved by pull request 5078
#5078

asfimport · 2019-08-20T15:33:58Z

Jim Northrup:

Could you provide a link to the text you quoted? I'd be interested in reading it.

This is the benefit of having written what amounts to a Netty analog over the course of 4 years, including an SSL/TLS sockets layer for HTTP at one point. Ultimately there is danger in long-lived services using NIO, end of story.

Process cleanup by the underlying OS will be the best protection against Java NIO/JNI memory handles. If you have a daemon or other long-running process, or you must use direct buffers, assume that the reference counting is imperfect and that it will bite you one day (it may take days) if you trust it. So things that use NIO should be short-lived and, wherever possible, encapsulated in their own process.
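One common discipline for the reference-counting risk described above is to tie every allocation to a scope with try-with-resources and have the allocator fail loudly on close if anything is still outstanding (Arrow's `BufferAllocator` is itself `AutoCloseable`). The `TrackingAllocator` below is a hypothetical, simplified sketch of that idea, not Arrow's API. It only does accounting: the JDK still frees the underlying direct memory when each `ByteBuffer` is garbage-collected.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical allocator that tracks outstanding bytes so leaks surface on close(). */
public final class TrackingAllocator implements AutoCloseable {
    private final AtomicLong outstanding = new AtomicLong();

    /** A reference-counted buffer; its bytes are "returned" when the count reaches zero. */
    public final class Buf implements AutoCloseable {
        private final ByteBuffer data;
        private final AtomicInteger refCount = new AtomicInteger(1);

        private Buf(int size) {
            this.data = ByteBuffer.allocateDirect(size);
            outstanding.addAndGet(size);
        }

        public ByteBuffer data() {
            return data;
        }

        /** Adds a reference, e.g. when handing the buffer to another owner. */
        public Buf retain() {
            refCount.incrementAndGet();
            return this;
        }

        /** Drops one reference; updates accounting at zero and rejects over-release. */
        public void release() {
            int left = refCount.decrementAndGet();
            if (left == 0) {
                outstanding.addAndGet(-data.capacity());
            } else if (left < 0) {
                throw new IllegalStateException("buffer released too many times");
            }
        }

        @Override
        public void close() {
            release();
        }
    }

    public Buf allocate(int size) {
        return new Buf(size);
    }

    public long outstandingBytes() {
        return outstanding.get();
    }

    /** Fails loudly if any buffer from this allocator was never fully released. */
    @Override
    public void close() {
        long leaked = outstanding.get();
        if (leaked != 0) {
            throw new IllegalStateException("leaked " + leaked + " bytes");
        }
    }

    public static void main(String[] args) {
        try (TrackingAllocator alloc = new TrackingAllocator()) {
            try (TrackingAllocator.Buf buf = alloc.allocate(64)) {
                buf.data().putLong(0, 42L);
                System.out.println("outstanding while open: " + alloc.outstandingBytes());
            }
            System.out.println("outstanding after close: " + alloc.outstandingBytes());
        }
    }
}
```

Failing at close time does not prevent a leak, but it turns a slow, days-later failure into an immediate, attributable one.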

Netty is the JBoss-endorsed C10k Java representative with the popular market share. IIUC, Arrow is a team that picked up Netty-derived off-heap tools naively and demonstrated that in 2019 they are still prone to some gotchas that are a little more serious than edge cases, even when the unit tests pass. Indeed, my initial testing with writing JDBC to Arrow on kilobytes of records succeeded well, and gave me the confidence to assume this would do the job faster than Python. And so began this thread on 800+ megabytes of data.

Considering the age and size of the Netty ecosystem, there is no lack of scrutiny or open-source virtue here. It's a VM-level weakness that Java NIO is still something like peanuts in the kitchen; you should really put a consumer-facing notice on where NIO is and is not present.

asfimport · 2019-08-21T03:45:33Z

Micah Kornfield / @emkornfield:
"iiuc arrow is a team that picked up netty derived off-heap tools naively and demonstrated that in 2019 it's still prone to some gotchas that are a little bit stronger than edge cases when the unit tests pass."

It is true the Java Arrow library has a steep learning curve and could use better documentation so new developers aren't bitten. There has also been less focus on the non-core Java libraries (i.e., adapters) until recently, and we need to do something to distinguish their maturity levels so these types of things are less surprising. If you have suggestions, please let us know. I would suggest perhaps sending mail to the dev@ or user@ mailing lists, since generally more people monitor those than conversations on JIRA. FWIW, the core library was adapted from Apache Drill and is used by Dremio in their product, both of which, IIUC, are long-running processes that provide competitive analytic performance (I don't know how prone to resource leakage they are).

"and gave me the confidence to assume this will do the job faster than python. and so began this thread on 800+ megabytes of data."

I'm sorry you ran into this. If you are working in the Python ecosystem, Turbodbc might be your best bet for getting data into Arrow. In general, most of the Python code is just a facade on top of C++, so I would expect it to be pretty performant. Please discuss on the mailing list or continue to file JIRAs if you are seeing unexpected performance/behavior. We want to know.

"you should really put a consumer facing notice on where NIO is and is not present."

Would you mind opening up a JIRA/Pull Request describing how you think it is best to publicize it?

asfimport · 2019-08-21T05:30:26Z

Jim Northrup:
(Previously responded via email; sorry if this creates a dupe.)

I admire Arrow for doing a thing well. I hope that if I simply call “mvn maven-versions-plugin:latest” in the future, this simple JDBC code will work better than before.

I appreciate the attention to the details.

I think the gist of this discussion is that TensorFlow one-hot columns may quickly test the expected norms of Arrow. Likewise, time-series datasets have us blowing gaskets all over the place in terms of time-to-completion and RAM using pandas. What do we do with a 300 GB NumPy dataset living in swap that takes 3 days to build? There are no LSTM examples that demonstrate anything but toy datasets.

Turbodbc looks like a good fit for reducing transcription times.

For what I need in the space of Arrow, I think the ideal tool is something that works in and out of NumPy and delegates to and from Apache Geode or Hazelcast as the main substrate.

If perchance Arrow can act as a window to memory grids, all the better.

As I find the time for signups and 2FAs, I will compose this for the lists.

asfimport · 2019-08-21T14:20:35Z

Wes McKinney / @wesm:

As I find the time for signups and 2fa’s I will compose this to the lists

This shouldn't be too complicated; all you have to do is send an e-mail to [email protected]

asfimport closed this as completed Aug 14, 2019

asfimport assigned tianchen92 Jan 10, 2023

asfimport added this to the 0.15.0 milestone Jan 11, 2023
