-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected commit of transaction using Oracle #36265
Comments
I didn't try this locally yet (have to download an Oracle container on a low-bandwidth connection...), but IIUC the main problem is that the interceptor doesn't trigger a rollback before Agroal gets stopped (and connections get closed). I wonder if the root cause isn't simply that your thread gets interrupted, leading to
(And yes I agree it's a crazily dangerous default, but that's the spec.) Could you try annotating your method with this instead, and tell me if it solves your problem? @Transactional(rollbackOn={Exception.class}) |
Thanks for your reply! The result is still the same. I also verified this "commit on close" behavior with basic JDBC.
|
That's for sure, and from what I can see it's well-known to Oracle users; found this in particular. I'm not too worried about Oracle's behavior though, more about the fact that the transaction wasn't rolled back by the interceptor (so, before the connection was closed), especially with There must be something wrong on shutdown, maybe Agroal stopping before application threads, I don't know. Needs further investigation. |
Note I did fix the issue with a workaround by using a AgroalDataSourceListener. The code is dirty as there is no out-of-the-box solution to set a custom AgroalDataSourceListener in Quarkus. But in essence I just do a rollback in
|
Note the issue is only happening on graceful shutdowns. |
That's the thing: if threads are killed, the call to And all that should happen before the DB connection gets closed, because why on earth would Quarkus stop DB connection pools before killing application threads... If it doesn't happen that way, there's something wrong... Another theory would be that |
Unfortunately I didn't have time to make progress on this.
I had to work on the Agroal code recently, and noticed this: quarkus/extensions/agroal/runtime/src/main/java/io/quarkus/agroal/runtime/DataSources.java Lines 407 to 415 in 5a878c6
If Vert.x threads are killed in the same way, there's indeed a chance the order is undetermined (vertx threads may be killed before or after agroal connection pools, depending on who knows what) |
Maybe killing the applications threads don't cause an explicit rollback of the transaction? I can't really comment on Quarkus internals, but the root cause of my issue is that Oracle commits on close. I'm not sure what the best solution is for Quarkus. |
If you used
Yeah that... is really a dubious design choice. I guess we could override this behavior by default if there's a setting in the JDBC driver, but then if you had to resort to complicated listeners, I guess there isn't such a setting.
Exactly. Because there might be other, similar problems caused by this (still hypothetical) wrong shutdown order. |
HI! Probably, we also encountered the mentioned unexpected commit. This commit occurred for us when we tried to shut down Wildfly(27) gracefully. At that time, we noticed that in some cases, only a part of the tables was committed to the Oracle database (we experienced the same on PostgreSQL). There exists a property in version 1.5.9.Final ironjacamar jdbc where this can be set, and after that, the cleanup function handles the commit as expected for us (checked the main branch, its also looks good to me): https://github.com/ironjacamar/ironjacamar/blob/1.5.9.Final/adapters/src/main/java/org/jboss/jca/adapters/jdbc/BaseWrapperManagedConnection.java#L362C10-L362C10 Without this, we also observed that records appeared in the database that shouldn't have. Some tickets we started from: |
Thanks for the input! I tried looking for some property to set before, but didn't find any. In sqlplus it's the same behavior and you can turn it off with |
Hey @segersb , while working on something else I stumbled upon this: https://quarkus.io/guides/lifecycle#graceful-shutdown Could you confirm that when you talk about graceful shutdowns, you mean it in this specific sense (you set This probably doesn't change the problem, but would be interesting to know. |
Hello @yrodiere We haven't set anything specific for that parameter, so we just use the default. We detected the issue when running our Quarkus app on OpenShift. When a redeploy is done in OpenShift while this batch is running, we end up having partial data. When running the standard When running the quarkus:dev via java.exe + CTRL-C locally, we were able to replicate the issue. This led us to the conclusion that the shutdown method is somehow involved. |
Hi, we also have this exact issue. First observed in our Kubernetes-based test environment, where Kubernetes sends a SIGTERM to the Quarkus app, after which a partial transaction got committed to the Oracle DB. I can reliably reproduce it using Quarkus dev mode and pressing Using the workaround from @segersb above (#36265 (comment)) almost, but not quite, fixes the issue. The call to rollback happens without throwing an exception, but a SINGLE table entry is still inserted anyway. One thing I've noted is that "Arjuna" also tries to rollback the transaction, but fails because the connection was already closed:
I'd be very happy if Quarkus could reliably rollback all open transactions BEFORE closing the connections. |
Hi @ivu-mawi, Thanks for your input! Does your project maybe have multiple datasources? If not I'm a bit worried we'll also encounter issues in our projects. |
Hi, also thanks for reporting this and the workaround. I'm afraid we also use a single, default datasource. The single inserted entry doesn't always occur. But I'm guessing it only doesn't occur when I press |
I think my workaround now will be to simply delay the shutdown until all active transactions are finished:
This might mean waiting too long, so Kubernetes will SIGKILL our app. But that means the the oracle connection will not be closed properly as well, and even Oracle does a rollback then in my tests, same as But I don't know how this would interact with any other ShutdownEvent Observers, if we had any. Would this delay the other ones, preventing their graceful shutdown until SIGKILL happens? Not a good solution in that case. |
That's a creative workaround! Normally you should be able to use @priority annotation to make sure other shutdown listeners go first. SIGKILL always worked for me as well because Oracle only commits when connections are closed properly. Another approach might be to combine the Quarkus property |
Thanks, One thing I've noted is that any I agree with the rest of what you said. I'll stick with my "dynamic" delay, because it allows a normal graceful shutdown if no transactions are running, or very short transactions, which I expect is the much more likely scenario for our app. So the SIGKILL from Kubernetes would only be a rare last resort kind of thing, if someone happens to trigger a restart during one of those rare long transactions. Still, would be much nicer if Quarkus rolls back before close of course :) |
Great it helped! Interestingly the doc for using If you still have issues, you could also play with the values of those properties. |
I agree it's a problem, and I agree it should be solved. It's on my todo-list, but that todo-list is long and does not include just Quarkus. If you want this to be solved quicker, feel free to send a PR or report this through paid support (on RHBQ, ...). Failing that, you can rely on workarounds (I think a few were mentioned?) and I guess you can hope that the upgrade of Agroal #39072 will change something. |
Stumbling across this after days of investigating a swath of lost data that was a consequence of some intermittent state being erroneously committed, I am thankful to have found this discussion and being able to present the workaround originally presented by @segersb in #36265 (comment) with the new capabilities added by #36263 incorporated to simplify it: /**
* Oracle has the weird behavior that it performs an implicit commit on normal connection closure (even with auto-commit disabled),
* which happens on application shutdown. To prevent whatever intermittent state we have during shutdown from being committed,
* we add an explicit rollback to each connection closure. If the connection has already received a COMMIT, the rollback will not work, which is fine.
* <p>
* The code unwraps the {@link Connection} so that we perform the rollback directly on the underlying database connection,
* and not on e.g. Agroal's {@link ConnectionWrapper} which can prevent the rollback from actual being executed due to some safeguards.
*/
@ApplicationScoped
public class CustomOracleRollback implements AgroalPoolInterceptor {
@Override
public void onConnectionDestroy(Connection connection) {
try {
if (connection.isClosed() || connection.getAutoCommit()) {
return;
}
connection.unwrap(Connection.class).rollback();
} catch (SQLException e) {
Log.tracef(e, "Ignoring exception during rollback on connection close");
}
}
} |
Hello @Felk I also did the same on our side, though I did have to add an extra check before doing the rollback. The
|
Good point, thanks! I edited my comment to include those two conditions |
This is starting to look like the most likely explanation given what I learned in #34547 (comment):
So a possible fix would be to make sure Agroal shuts down after Vertx... that probably means the Vertx extension needs to use cc @cescoffier |
Vert.x should be the last to shut down. Otherwise, many things would not be functional. We can imagine an SPI indicating when Vert.x has been shut down (it's an async operation so after completion) for the cases where things must happen after (it should be rare and used with a lot of caution) |
I see, and yes I understand, as Hibernate Reactive for example would no longer work after vertx shut down. And may not be able to shut down, as shutdown in Hibernate Reactive may imply executing SQL statements... Could we perhaps have an event after vertx has stopped accepting new incoming requests and finished processing existing requests? That's more or less what we need here: the guarantee that nothing external will attempt to initiate use of datasources. Though I suppose other components unrelated to vertx (Cron? other servers with more exotic protocols?) could be triggering something similar. We may want a different form of synchronization... a two-step shutdown perhaps? |
I think we should aim at fixing this thing for the next LTS, either by going Yoann's way ^. It's not the first time that we had issues with shutdown ordering so taking a step back and listing all the issues to find a global solution would be awesome. In the meantime, I wonder if we should include #36265 (comment) in the I will create a PR with ^ to get the ball rolling. |
… close When a connection is closed, Oracle will commit the unfinished business, which is a very surprising behavior in most cases and could lead to data loss. I added a -Dquarkus-oracle-no-automatic-rollback-on-connection-close to disable this behavior if absolutely needed. If we have feedback that this is not something we should do always, we can add a proper configuration property (with a default value to determine). Fixes quarkusio#36265 Wouldn't have been possible without the nice work of @segersb and @Felk. Co-authored-by: Felix König <[email protected]>
I created #45301 for discussion. Thanks for all the good discussions in this issue! |
… close When a connection is closed, Oracle will commit the unfinished business, which is a very surprising behavior in most cases and could lead to data loss. I added a -Dquarkus-oracle-no-automatic-rollback-on-connection-close to disable this behavior if absolutely needed. If we have feedback that this is not something we should do always, we can add a proper configuration property (with a default value to determine). Fixes quarkusio#36265 Wouldn't have been possible without the nice work of @segersb and @Felk. Co-authored-by: Felix König <[email protected]>
Describe the bug
The Oracle JDBC driver performs a commit when a connection is closed without an explicit commit or rollback.
When Quarkus is shut down gracefully, connections are closed properly, and this causes unfinished transactions to be committed.
Consider the following simple rest endpoint.
We insert a record in the database, wait 2 minutes and then throw an exception.
Expected behavior
I would expect that nothing is inserted in the database.
Actual behavior
When doing a graceful shutdown during the wait period, the insert is committed.
How to Reproduce?
See code above for the simple rest endpoint.
To perform a graceful shutdown locally we need to start quarkus-dev via a java command, and not
mvn quarkus:dev
.Fortunately when I execute quarkus-dev via IntelliJ, the java command is the first line of the output.
This allows us to use CTRL-C, which will do a graceful shutdown.
Output of
uname -a
orver
No response
Output of
java -version
No response
GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (ie. output of
mvnw --version
orgradlew --version
)No response
Additional information
No response
The text was updated successfully, but these errors were encountered: