Added draft proposal for WFLY-15659: Transaction SlotStore config #446
base: main
=== Dev Contacts
* mailto:{email}[{author}]
You may add me, yourself, or both here.
=== Nice-to-Have Requirements
Extend the server dependency model to allow use of the Persistent Memory library, mashona, to support SlotStore use on pmem hardware. Alternatively, the components consuming it could simply bundle their own copies, trading version flexibility against footprint.
If it is only for transactions and Infinispan, could we initially just add the library as a resource in the module.xml file?
It would be really problematic for the WildFly build to deal with two different versions of the same library being provided from the same feature pack, which is what we'd be talking about with WildFly's own use of mashona.
Is it expected that different consumers of mashona, e.g. Narayana and Infinispan, aren't going to be able to align on a consistent mashona version?
If not, the simplest approach is to provide a separate module, consistent with how most artifacts in WildFly are provided.
The upstreams for Narayana and Infinispan will inevitably go through periods when they diverge somewhat in the version of the mashona library they use, since they don't release, or subsequently get updated in WF, in lockstep. On the other hand, they should generally agree on which version of the mashona library API they use, as with e.g. jboss-logging, so it mostly shouldn't matter if the WildFly pom overrides the minor/patch version that each upstream prefers for the sake of unity.
== Community Documentation
////
We need something in the jbosstm docs and in the WildFly transaction model docs.
https://docs.wildfly.org/25/Admin_Guide#Transactions_Subsystem is the relevant Admin Guide section.
@jmesnil FYI
The current options include a filesystem-based store, using either a file per transaction or an append-only log (reusing code from HornetQ / ActiveMQ Artemis), or a JDBC database.
Narayana upstream now also offers the SlotStore, a filesystem-based store that employs an efficient memory-mapping approach. Additionally, and uniquely, this store can utilise Persistent Memory (pmem) hardware where available for very fast transaction logging.
Is this expected to be more efficient than the journal store even without pmem hardware?
YMMV.
The journal model works by gathering a number of tx log records into a single disk flush, which is great if you happen to have a lot of concurrent tx going on. With the trend toward smaller deployments, such as containers with only one microservice, that's less of an advantage. It comes at the cost of higher-latency transactions, as each must wait to join the next batch. It also forces a global ordering on the tx, which is needed if you're a resource manager (databases, message systems) because data updates have to respect causal order. But it's unnecessary overhead for the tx manager, for which the tx are not ordered.
The SlotStore does one disk flush per tx, which at first glance makes it really inefficient. However, modern SSDs can sustain a much higher flush rate than HDDs could. They also benefit from the added concurrency, because they can stripe the writes internally better than an HDD with few heads can. It also means each tx can flush immediately instead of waiting for a batch to fill or a timer to fire, which can reduce latency. At some point you hit a scale ceiling beyond which batching is still beneficial. On pmem that ceiling is crazy high, since a flush is in the same cost ballpark as the thread coordination. On an enterprise SSD it's not quite so high, but it's still above the point that many smaller deployments with low tx concurrency ever reach.
To be fair, if its batch interval is tuned to the SSD it's on, the journal can be almost as good even at lower concurrency since you'll essentially be running batches of size 1, though it's still got more thread coordination overhead than the SlotStore. Not that anyone ever tunes it, and the defaults we ship are... somewhat dated compared to modern hardware capabilities. But that's a whole other discussion.
So, not guaranteed to be a win for everyone, but helpful in some use cases.
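The one-flush-per-transaction model can be shown with a minimal Java sketch of a fixed-slot, memory-mapped log. This is an illustration under stated assumptions, not Narayana's SlotStore. It does not use the mashona library, and the class `FixedSlotLog` and its methods are hypothetical. The sketch assumes:

* one record per slot;
* a 4-byte length header, where 0 marks a free slot;
* plain `MappedByteBuffer.force()` for durability.

A real store would also need a slot allocator and checksums or ordered flushes to detect torn writes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical fixed-slot log for illustration only; NOT Narayana's SlotStore API.
// Layout: numberOfSlots slots of bytesPerSlot bytes each. Each slot starts with a
// 4-byte length header (0 = free), followed by the record bytes.
final class FixedSlotLog implements AutoCloseable {
    private static final int HEADER_BYTES = Integer.BYTES;

    private final FileChannel channel;
    private final MappedByteBuffer mapped;
    private final int numberOfSlots;
    private final int bytesPerSlot;

    FixedSlotLog(Path file, int numberOfSlots, int bytesPerSlot) throws IOException {
        long size = (long) numberOfSlots * bytesPerSlot;
        if (numberOfSlots <= 0 || bytesPerSlot <= HEADER_BYTES || size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid slot sizing");
        }
        this.numberOfSlots = numberOfSlots;
        this.bytesPerSlot = bytesPerSlot;
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Assumption: the file is either new or was created earlier with the same sizing.
        boolean fresh = channel.size() < size;
        if (fresh) {
            channel.write(ByteBuffer.allocate(1), size - 1); // grow the file to its final size
        }
        // The whole store is mapped once, up front. Java has no public unmap API,
        // so the sizing effectively has to be chosen ahead of time.
        this.mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        if (fresh) {
            for (int slot = 0; slot < numberOfSlots; slot++) {
                mapped.putInt(slot * bytesPerSlot, 0); // explicitly mark every slot free
            }
            mapped.force();
        }
    }

    /** Stores a record in the slot and forces it to storage: one flush per transaction. */
    void write(int slot, byte[] record) {
        checkSlot(slot);
        if (record.length == 0 || record.length > bytesPerSlot - HEADER_BYTES) {
            throw new IllegalArgumentException("record must be 1.." + (bytesPerSlot - HEADER_BYTES) + " bytes");
        }
        int base = slot * bytesPerSlot;
        ByteBuffer view = mapped.duplicate(); // independent position, shared content
        view.position(base + HEADER_BYTES);
        view.put(record);                     // payload first...
        mapped.putInt(base, record.length);   // ...then the header that marks the slot used
        mapped.force();                       // no batching: flush immediately
    }

    /** Returns the record held in the slot, or null if the slot is free. */
    byte[] read(int slot) {
        checkSlot(slot);
        int base = slot * bytesPerSlot;
        int length = mapped.getInt(base);
        if (length == 0) {
            return null;
        }
        byte[] record = new byte[length];
        ByteBuffer view = mapped.duplicate();
        view.position(base + HEADER_BYTES);
        view.get(record);
        return record;
    }

    /** Frees the slot once the transaction no longer needs its log record. */
    void clear(int slot) {
        checkSlot(slot);
        mapped.putInt(slot * bytesPerSlot, 0);
        mapped.force();
    }

    private void checkSlot(int slot) {
        if (slot < 0 || slot >= numberOfSlots) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
    }

    @Override
    public void close() throws IOException {
        channel.close(); // the mapping itself stays valid until it is garbage collected
    }
}
```

Every `write` pays for its own `force()`, with no waiting for a batch. That is the latency-versus-throughput trade-off described above, relative to a group-committing journal.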
Extending the server's transaction management model to allow configuration of these options would give users access to this new functionality.
Although the SlotStore code is part of Narayana, the mashona library used to support pmem is independent and may also be utilised by other components requiring similar hardware support, e.g. Infinispan and messaging. For this reason, it may be better suited to packaging as a separate module.
I talked elsewhere about ways to use the subsystem code to allow such a module to be optionally provisioned. But it seems mashona-logwriting is a 32kb jar so that seems like extreme overkill. :)
Indeed. I think the decision probably revolves more around the build and version flexibility than the footprint.
=== Hard Requirements
Extend the server management model to facilitate configuration of the new SlotStore transaction log type.
Some details on what config options will be available would be good.
Config is at two levels: telling the tx engine to use the SlotStore and where to put it on the filesystem, then sizing it. Because it's memory mapped and Java doesn't like unmapping things, you pretty much have to pick the sizing ahead of time. That means choosing the number of slots (roughly the number of concurrent tx you expect) and the size of each slot (how much information each tx record contains). Both of those are relatively small, so the best bet may be to just overprovision significantly by default. I'm almost tempted not to expose the sizing params at all (as with many of the 100+ tx config options, they could still be tweaked by system properties, just not through the model), but maybe that's just inviting trouble.
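The two sizing parameters can be sketched as below, assuming they are read from system properties with generous defaults. The property names `example.slotstore.numberOfSlots` and `example.slotstore.bytesPerSlot` are placeholders, as are the default values and the class `SlotStoreSizing`. None of these are Narayana's or WildFly's actual configuration keys. The mapped region is the product of the two values and is fixed once the store is created.

```java
// Hypothetical sizing helper for illustration only. Property names and defaults are
// placeholders, NOT Narayana's or WildFly's actual configuration keys.
final class SlotStoreSizing {
    static final String SLOTS_PROPERTY = "example.slotstore.numberOfSlots";
    static final String SLOT_SIZE_PROPERTY = "example.slotstore.bytesPerSlot";

    // Placeholder defaults: both values are small, so overprovisioning is cheap.
    static final int DEFAULT_SLOTS = 256;
    static final int DEFAULT_BYTES_PER_SLOT = 1024;

    final int numberOfSlots; // roughly the number of concurrent transactions expected
    final int bytesPerSlot;  // how much data each transaction record may contain

    SlotStoreSizing(int numberOfSlots, int bytesPerSlot) {
        if (numberOfSlots <= 0 || bytesPerSlot <= 0) {
            throw new IllegalArgumentException("sizing must be positive");
        }
        this.numberOfSlots = numberOfSlots;
        this.bytesPerSlot = bytesPerSlot;
    }

    /** Reads overrides from system properties, falling back to the defaults. */
    static SlotStoreSizing fromSystemProperties() {
        return new SlotStoreSizing(
                Integer.getInteger(SLOTS_PROPERTY, DEFAULT_SLOTS),
                Integer.getInteger(SLOT_SIZE_PROPERTY, DEFAULT_BYTES_PER_SLOT));
    }

    /** Size of the memory-mapped region, which cannot easily change after creation. */
    long mappedBytes() {
        return (long) numberOfSlots * bytesPerSlot;
    }

    /** True if the expected load fits without running out of slots or slot space. */
    boolean accommodates(int concurrentTransactions, int largestRecordBytes) {
        return concurrentTransactions <= numberOfSlots && largestRecordBytes <= bytesPerSlot;
    }
}
```

With these placeholder defaults the store maps 256 × 1024 = 262,144 bytes (256 KiB). That illustrates why overprovisioning by default is cheap.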
* Testing of the pmem options will require appropriate hardware, though this can be simulated by system configuration (similar to a RAM disk)
How much of this is covered within Narayana testing?
The ObjectStore interface: all of it. The SlotStore implementation of it: no idea. Theoretically all of it, but that would require running all the store tests against each store implementation, which makes for quite a large matrix, and I don't know what the current CI setup does. pmem: none, as the tx CI doesn't have any. I run the mashona tests on real pmem hardware for each release, but don't run the tx test suite there, though that should be possible. If the CI for that has to live somewhere, it feels better on the Narayana side, using fake pmem or just the SlotStore on SSD, rather than on the mashona side. The advantage of real pmem hardware is accurate perf numbers for e.g. regression testing, not functional testing.
* Testing of the new server configuration options will require new tests, patterned on those for existing store configurations.
* Testing of the SlotStore itself can be accomplished by using the same transaction tests that exercise existing store types, but changing the server config to use the new store type.
Are the relevant tests (somewhat) known? Are they fairly concentrated or are we talking about running significant chunks of the testsuite with an adjusted config?
The WF tests? Anything that hits the tx store, which by default is anything running a tx across two resources, e.g. an MDB writing to a database. Historically I'd have been more worried that the set wasn't big enough than that it was too large to run efficiently, but my knowledge of the app server test coverage is out of date, to put it mildly. Coverage can be supplemented somewhat by running with the 1PC optimization disabled, so that any tx with even a single resource gets logged to the store, but I'd guess that's still not huge. What's the approach for non-default store configs today? Is the full WF suite run for each of the fs store, journal and JDBC store, or does the bulk of that get exercised only upstream in Narayana testing?
=== Testing By
// Put an x in the relevant field to indicate if testing will be done by Engineering or QE. | ||
// Discuss with QE during the Kickoff state to decide this | ||
* [ ] Engineering
I think it will be the case that Engineering tests this so I guess this would be checked?
/cc @mmusgrov
Time to pick this up again after my PTO. If there are no more questions, I guess the next step is to redraft the PR with all the additional material from the Q&A here so we can move on to implementation?