-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batch split queries #10878
Comments
Note: see also #10465, which is very similar but comes from a slightly different angle. |
See also #5952 |
Unless I'm mistaken, this is no longer relevant as we generate one SQL query per LINQ query since 3.0. Closing. |
Reopening and assigning to @smitpatel at his request, could be done as part of the workaround for #18022. |
I've checked what command batching does to isolation/consistency in SQL Server and PostgreSQL, and the results are negative: merely batching statements into the same DbCommand doesn't have any impact on isolation. In other words, different database state can be observed by different statements in the same DbCommand, if not within an explicit transaction. It's still possible to have consistency by wrapping the batch in a serializable transaction (or snapshot in SQL Server) - see test code. We probably shouldn't do this implicitly, since it implies error or deadlocking behavior that doesn't occur otherwise. Test code[NonParallelizable]
public class BatchTransactionIsolationTest
{
const string PostgresConnectionString = ...;
const string SqlServerConnectionString = ...;
[Test]
public async Task SqlServer([Values(true, false)] bool wrapInSerializableTx)
{
using var conn1 = new SqlConnection(SqlServerConnectionString);
using var conn2 = new SqlConnection(SqlServerConnectionString);
conn1.Open();
conn2.Open();
var createSql = @"
IF OBJECT_ID('dbo.data', 'U') IS NOT NULL
DROP TABLE data;
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
using (var createCmd = new SqlCommand(createSql, conn1))
createCmd.ExecuteNonQuery();
var tx = wrapInSerializableTx
? conn1.BeginTransaction(IsolationLevel.Snapshot)
: null;
using var batchCommand = new SqlCommand(@"
SELECT * FROM data;
WAITFOR DELAY '00:00:05';
SELECT * FROM data", conn1, tx);
using var modifyCommand = new SqlCommand(@"
UPDATE data SET id=2 WHERE id=1;
INSERT INTO data (id) VALUES (10)", conn2);
var t = batchCommand.ExecuteReaderAsync();
Thread.Sleep(1000);
modifyCommand.ExecuteNonQuery();
var reader = await t;
Console.WriteLine("Before wait:");
while (reader.Read())
Console.WriteLine(reader.GetInt32(0));
Assert.True(reader.NextResult());
Console.WriteLine("After wait:");
while (reader.Read())
Console.WriteLine(reader.GetInt32(0));
}
[Test]
public async Task Postgres([Values(true, false)]bool wrapInSerializableTx)
{
using var conn1 = new NpgsqlConnection(PostgresConnectionString);
using var conn2 = new NpgsqlConnection(PostgresConnectionString);
conn1.Open();
conn2.Open();
var createSql = @"
DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
using (var createCmd = new NpgsqlCommand(createSql, conn1))
createCmd.ExecuteNonQuery();
if (wrapInSerializableTx)
conn1.BeginTransaction(IsolationLevel.Serializable);
using var batchCommand = new NpgsqlCommand(@"
SELECT * FROM data;
SELECT pg_sleep(5);
SELECT * FROM data", conn1);
using var modifyCommand = new NpgsqlCommand(@"
UPDATE data SET id=2 WHERE id=1;
INSERT INTO data (id) VALUES (10)", conn2);
var t = batchCommand.ExecuteReaderAsync();
Thread.Sleep(1000);
modifyCommand.ExecuteNonQuery();
var reader = await t;
Console.WriteLine("Before wait:");
while (reader.Read())
Console.WriteLine(reader.GetInt32(0));
Assert.True(reader.NextResult());
Assert.True(reader.NextResult()); // pg_sleep returns a void resultset
Console.WriteLine("After wait:");
while (reader.Read())
Console.WriteLine(reader.GetInt32(0));
}
} |
@roji Thanks for checking this. It's unfortunate, but I think it re-enforces that we need to retain both collection-include ehaviors for now. |
I completely agree. I also think it points towards retaining single-query as the default behavior, but I think we all agree we need some way to opt into split query. |
Note: batching becomes impossible if we decide to go in the direction of #12776 (fetching dependents by foreign keys instead of reevaluating the principal query). |
In some scenarios, EF Core internally generates more than one read in order to execute a single query. This is notably the case when executing (some?) joins when using a non-MARS provider. Performance could be (greatly!) improved by batching those multiple reads into a single batch internally. Unless I'm mistaken, in the case you generate multiple queries, the results for the earlier queries have to be buffered anyway, so there's no additional penalty in the buffering implied in the batching.
Note: this issue isn't about exposing a batching read API to the user (that's what #10879 is about).
NOTE: batching becomes impossible if we decide to go in the direction of #12776 (fetching dependents by foreign keys instead of reevaluating the principal query).
The text was updated successfully, but these errors were encountered: