Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.AccessViolationException at System.Text.UTF8Encoding.GetCharCount(Byte*, Int32, System.Text.DecoderNLS) #321

Closed
jay1891 opened this issue Dec 17, 2019 · 40 comments

Comments

@jay1891
Copy link

jay1891 commented Dec 17, 2019

Hi,

When adding records to the database with SQLitePCL 2.0.2 I'm getting sometimes the following exception. The entire .NET application crashes, so I can not catch the exception. What could be the reason for this?

Exception Info: System.AccessViolationException
at System.Text.UTF8Encoding.GetCharCount(Byte*, Int32, System.Text.DecoderNLS)
at System.String.CreateStringFromEncoding(Byte*, Int32, System.Text.Encoding)
at SQLitePCL.utf8z.utf8_to_string()
at SQLitePCL.raw.sqlite3_prepare_v2(SQLitePCL.sqlite3, System.String, SQLitePCL.sqlite3_stmt ByRef, System.String ByRef)
at Microsoft.Data.Sqlite.SqliteCommand+d__64.MoveNext()
at Microsoft.Data.Sqlite.SqliteCommand+d__54.MoveNext()
at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(System.Data.CommandBehavior)
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReaderAsync(System.Data.CommandBehavior, System.Threading.CancellationToken)
at Microsoft.Data.Sqlite.SqliteCommand+d__60.MoveNext()

The Dlls are the following:

Dlls

@ericsink
Copy link
Owner

Well that's interesting.

sqlite3_prepare_v2() returns a pointer to the "tail", which is the part of the SQL string that was unused. During the attempt to convert this back to a string, a memory access violation occurs. This could happen because the tail is not zero-terminated, or maybe because the tail is invalid utf8.

But the tail should point to the same memory given, which was a utf8 string allocated and created by that same overload of prepare_v2().

So there's a bit of a mystery here.

Do you know what SQL string is being passed down? I'm wondering if anything about it looks weird.

I'm a little surprised that Microsoft.Data.Sqlite is calling this particular prepare_v2() overload, but I haven't looked at its code to figure out why. @bricelam Any insight here?

It seems possible you have found a bug, but (1) this piece of code has been pretty well tested, and (2) if there's a bug, I'm not seeing it yet.

@jay1891
Copy link
Author

jay1891 commented Dec 17, 2019

I shall try to see if I can get the SQL string. With SQLite version 1.1.14 it ran for months and it didn't occur. We are now only upgrading to SQLite 2.0.2 and in every hour this crash occurs.

@ericsink
Copy link
Owner

With SQLite version 1.1.14 it ran for months and it didn't occur. We are now only upgrading to SQLite 2.0.2 and in every hour this crash occurs.

That area of code definitely changed between 1.1 and 2.0.

@jay1891
Copy link
Author

jay1891 commented Dec 19, 2019

Yesterday a LoggerFactory was set on the DbContextOptionsBuilder, so we could log the SQL statements, as shown below. It ran all day without any exception. Today I removed the LoggerFactory again and after a few minutes the application crashed.

var optionsBuilder =
new DbContextOptionsBuilder<App...DbContext>()
.EnableSensitiveDataLogging()
.EnableDetailedErrors();
.UseLoggerFactory(AppMsLogging.LoggerFactory);

We currently don't use a LoggerFactory on the DbContext because it is a database with one table that is just used for intensive logging of events in one table, every second and is isolated from the rest of the application data. The database is a +-10GB file and contains continuously the events of the last 7 days.

So I think we can solve it (as a workaround) by adding a LoggerFactory to the DbContextOptionsBuilder. I will do some more testing, but because the exception is not occuring I can not capture the SQL statement.

@ericsink
Copy link
Owner

I am interested in trying to reproduce this problem. Can you share your code, or some piece of it? If not, how much can you tell me about how I would write a contained program that is similar?

Thanks.

@jay1891
Copy link
Author

jay1891 commented Dec 19, 2019

As attachment below I added a little description of the application.

Application description.pdf

Is this helpful?

@ericsink
Copy link
Owner

Yes, that's helpful.

More questions:

So once per second you insert one row? Each row in a separate transaction?

Is this .NET Framework or .NET Core? Which version?

If .NET Core, what's the OS underneath? Windows?

(You're on recent EF, so I guess would mean you're on .NET Core...)

How exactly does the character encoding issue fit in? It looks like you're passing down System.String, so by the time SQLitePCLRaw gets the string, it should be UTF-16, right?

@jay1891
Copy link
Author

jay1891 commented Dec 19, 2019

So once per second you insert one row? Each row in a separate transaction?
=> In normal operations and during which also the exception occurs, there are currently +-80 rows/second saved and this inside 1 default transaction that happens at SaveChanges().

Is this .NET Framework or .NET Core? Which version?
If .NET Core, what's the OS underneath? Windows?

=> Windows 10, and .NET Framework v4.6.1

(You're on recent EF, so I guess would mean you're on .NET Core...)
=> No, all the libraries are still compatible with .NET standard 2.0, so it can run on .NET Framework v4.6.1

How exactly does the character encoding issue fit in? It looks like you're passing down System.String, so by the time SQLitePCLRaw gets the string, it should be UTF-16, right?
=> Yes eventually we just use .NET strings. I just mention the character set, because the exception happens inside UTF8 encoding functionality, and maybe it is related to our character set. Although all characters are used almost all the time, and sometimes the exception happens only after >30 minutes, and all characters were already been processed at least once before (I have the impression).

@jay1891
Copy link
Author

jay1891 commented Dec 20, 2019

Yesterday I did a little test with the NullLoggerFactory and the exception also occurs.

@ericsink
Copy link
Owner

I'm still thinking it might be helpful to know what string was passed in.

I'm wondering if the following info might help. Perhaps you could use this to modify your code to catch the exception and then dump out the diagnostic info?

https://docs.microsoft.com/en-us/dotnet/api/system.runtime.exceptionservices.handleprocesscorruptedstateexceptionsattribute?view=netframework-4.8

Just a thought...

@jay1891
Copy link
Author

jay1891 commented Dec 20, 2019

Thank you for the information Eric. Because I could not use the attributes on an async method I am now testing with the following approach which was also referenced in the document you mentionded.

https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/legacycorruptedstateexceptionspolicy-element?view=netframework-4.8

If you have any remarks, please let me know.

@ericsink
Copy link
Owner

Anything that can give us some insight into what's going on is worth exploring.

I've also wondered if it would be possible to configure debugger settings to run the code under a debugger and inspect things at the moment of the crash.

@jay1891
Copy link
Author

jay1891 commented Dec 21, 2019

After adding <legacyCorruptedStateExceptionsPolicy enabled="true"/> in the app.config, and removing the LoggerFactory, I could capture the actual data (MessageHistory rows) that is added to the DbContext and being saved by SaveChangesAsync().

The following is a zip file which contains a json file with all the MessageHistory rows that are being added when the crash (exception) occurs.
<...>

I could not retrieve the SQL statement(s). I try to see if I can attach a debugger and maybe get a dump.

@jay1891
Copy link
Author

jay1891 commented Dec 21, 2019

I wrote the following test for the MessageHistory.json file which contains the messages on which the exception occurs in the application, but the exception is not occuring in the test.

    [TestMethod]
    public async Task AccessViolationExceptionShouldNotOccur()
    {
        SqliteConnection connection = new SqliteConnection("DataSource=c:\\Temp\\MessageHistoriesTest.db");
        connection.Open();

        try
        {
            // Arrange
            string messagesHistoryAsJsonString = File.ReadAllText("Resources\\MessageHistory.json");

            IMessageHistoryEntityRepository messageHistoryEntityRepository = EfDomainEntityRepositoryFactory.CreateForMessageHistory(connection);
            List<MessageHistory> messageHistories = JsonConvert.DeserializeObject<List<MessageHistory>>(messagesHistoryAsJsonString);

            // Act
            await messageHistoryEntityRepository.AddRangeAsync(messageHistories.ToArray());

            // Assert
        }
        finally
        {
            connection.Close();
        }
    }

The database that is created during the test is the following

<...>

I forgot to mention that there is a small second table, but it contains only 2 fields of which 1 is a reference to the main table.

@jay1891
Copy link
Author

jay1891 commented Dec 21, 2019

To clarify the (test) code a little: The DbContext is encapsulated inside a wrapping entity repository (class) in which the DbContext is injected and on which some basic operations are available. The basic operation for AddRange is the following:

    public async Task<TEntity[]> AddRangeAsync(TEntity[] entities)
    {
        try
        {
            this.DbContext.Set<TEntity>().AddRange(entities);
            await this.DbContext.SaveChangesAsync();
        }
        catch (Exception exception)
        {
            #region Logging
            this.Logger.LogError(exception, exception.Message);
            #endregion

            Exception appDbException = this.DbExceptionConverter.ToAppDbException(exception);
            if (appDbException != null)
            {
                throw appDbException;
            }
            else
            {
                throw;
            }
        }

        return entities;
    }

@Pressacco
Copy link

Pressacco commented Feb 5, 2020

We have also encountered this problem. Were you able to identify the root cause of the issue?

We are currently using:

image

@jay1891
Copy link
Author

jay1891 commented Feb 6, 2020

Hi Pressacco,

No we have not yet identified the root cause. It is on our todo list.

@Pressacco
Copy link

Pressacco commented Feb 6, 2020

I don't know if this helps, but in our case:

  • Specific Character Set (0008,0005) = ISO_IR 100 (Latin alphabet No. 1)
  • PatientName (0010,0010) = had Germanic umlaut characters ö

Our next step will be to run the DICOM through a validator to ensure that the DICOM attributes are valid.

Would you happen to know of any good (and free) DICOM validators ??

@levant
Copy link

levant commented Mar 25, 2020

I've narrowed-down the root cause of this to the heap allocation from the "new byte[nlen]" in "to_utf8_with_z()" not being pinned later in "utf8_to_string()" in the "fixed (byte* q = sp)" block. Since it's not pinned, it may be moved by the GC and that appears to happen leading to the access violation. I confirmed this with the following code which briefly disables garbage collection during those calls in sqlite3_prepare_v2 as shown below. We were using EntityFramework v3.1.2 with SQLiteRaw v2.0.2.669. We have reverted to EntityFramework v2.2.6 which depends on SQLiteRaw v1.1.12 as a workaround for now. We use .Net Framework 4.8. The issue for the "fixed()" block may actually be in the CLR or .Net runtime.

C:\SQLitePCL.raw\src\SQLitePCLRaw.core\raw.cs
static public int sqlite3_prepare_v2(sqlite3 db, string sql, out sqlite3_stmt stmt, out string tail)
{
int rc = 0;
tail = null;
stmt = null;
if (GC.TryStartNoGCRegion(1000000))
{
rc = sqlite3_prepare_v2(db, sql.to_utf8z(), out stmt, out var sp_tail);
tail = sp_tail.utf8_to_string();
GC.EndNoGCRegion();
}
else
{
Log.Debug("FAILED TryStartNoGCRegion");
}
return rc;
}

@ericsink
Copy link
Owner

@levant Do you have more specific information about why you believe that specific thing array is the thing that is not pinned?

I agree that if suspending the GC avoids the problem then that would seem to mean that something is being accessed without being pinned. And I agree that the byte array from to_utf8_with_z is the obvious candidate. I'm just wondering if you narrowed it down for 100% certain.

I wouldn't question this, but the fact is that the array is pinned (with the fixed statement). Unless I am misusing fixed, but I can't find any indication in the docs that I'm doing this wrong.

        public string utf8_to_string()
        {
            if (sp.Length == 0)
            {
                return null;
            }

            unsafe
            {
                fixed (byte* q = sp)
                {
                    return Encoding.UTF8.GetString(q, sp.Length - 1);
                }
            }
        }

Unless, as you suggest, fixed isn't doing what it is supposed to be doing. But if that were true, I would think it likely that such a bug would be runtime-specific, ie., it might show up in .NET Framework but not .NET Core, for example.

The only thing that strikes me as odd here is that I am referencing sp.Length inside the pin. sp is the Span for the byte array. I'd have to look at the CIL output to see if it's doing something weird here. But if so, the easy workaround is to get the Length into a local variable before the pin.

@levant
Copy link

levant commented Mar 27, 2020

Yes @ericsink that is exactly my conclusion as you said: “ Unless, as you suggest, fixed isn't doing what it is supposed to be doing. But if that were true, I would think it likely that such a bug would be runtime-specific, ie., it might show up in .NET Framework but not .NET Core, for example.”

I don’t have time to dig deeper as we have already found a workable solution for us of going back to EntityFramework 2.2.6 whose dependency is your SQLiteRaw 1.1.12 which predates the Span enhancements. Keep in mind that Span was introduced by MS relatively recently so there may be poor support for it in some libraries like the older .Net 4.8 that we use vs .NetCore or the recently available consolidation into .Net 5.0 (available on preview).

@wolfkumpitsch
Copy link

great work, guys. encountering same problem. thanks to this, got some starting points for investigation. cheers

@wolfkumpitsch
Copy link

Hi @ericsink,
in case this helps: i've got the same problem and this article helped.
btw, this also seems related to issue #901?

i've got issues that sound very similar: application crashes with Access Violation while importing xml logs into a database.
this is my first project using sqlite, but not my first rodeo.
i'm using a VS2019 commandline project, 20 threads to import data in parallel. each thread creates its own DbContext, does its work inside a transaction, saves data in database.

the program is(was): .net 4.7.2/win10/sqlite nuget packages



after reading this thread, i converted my importer to .netcore 3.1
straight conversion: created new equivalent .netcore project type and copied existing source files in. no modifications where needed.

it used to crash, as per last crash log below, every few minutes.
after conversion: no more crashing.

hope this triggers some ideas.

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.Text.UTF8Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
at System.String.CreateStringFromEncoding(Byte* bytes, Int32 byteLength, Encoding encoding)
at System.Text.Encoding.GetString(Byte* bytes, Int32 byteCount)
at SQLitePCL.utf8z.utf8_to_string()
at SQLitePCL.raw.sqlite3_prepare_v2(sqlite3 db, String sql, sqlite3_stmt& stmt, String& tail)
at Microsoft.Data.Sqlite.SqliteCommand.d__64.MoveNext()
at Microsoft.Data.Sqlite.SqliteCommand.d__54.MoveNext()
at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
at Microsoft.Data.Sqlite.SqliteCommand.d__60.MoveNext()
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine) at Microsoft.Data.Sqlite.SqliteCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken) at System.Data.Common.DbCommand.ExecuteReaderAsync(CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.<ExecuteReaderAsync>d__17.MoveNext() at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.d__29.MoveNext()
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.d__8.MoveNext()
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalDatabase.SaveChangesAsync(IList1 entries, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.<SaveChangesAsync>d__97.MoveNext() at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IList1 entriesToSave, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.<SaveChangesAsync>d__101.MoveNext() at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine)
at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(DbContext _, Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.Internal.NoopExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func4 operation, Func4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.DbContext.d__54.MoveNext()
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.Start[TStateMachine](TStateMachine& stateMachine) at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(CancellationToken cancellationToken) at ...Thingo.ConApp.Program.<ProcessAsync>d__19.MoveNext() in ......Thingo.ConApp\Program.cs:line 536 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Threading.Tasks.Task.WhenAllPromise1.Invoke(Task ignored) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetResult(TResult result) at ...Thingo.ConApp.Program.<ProcessBlobAsync>d__20.MoveNext() in ......Thingo.ConApp\Program.cs:line 671 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetResult(TResult result) at ...Thingo.ConApp.Program.<ReadFileAsync>d__21.MoveNext() in ......Thingo.ConApp\Program.cs:line 678 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetResult(TResult result) at System.IO.StreamReader.<ReadToEndAsyncInternal>d__62.MoveNext() at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder1.SetResult(TResult result) at System.IO.StreamReader.<ReadBufferAsync>d__97.MoveNext() at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task.FinishStageThree() at System.Threading.Tasks.Task1.TrySetResult(TResult result)
at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise`1.CompleteFromAsyncResult(IAsyncResult asyncResult)
at System.IO.Stream.ReadWriteTask.InvokeAsyncCallback(Object completedTask)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.IO.Stream.ReadWriteTask.System.Threading.Tasks.ITaskCompletionAction.Invoke(Task completingTask)
at System.Threading.Tasks.Task.FinishContinuations()
at System.Threading.Tasks.Task.FinishStageThree()
at System.Threading.Tasks.Task.FinishStageTwo()
at System.Threading.Tasks.Task.Finish(Boolean bUserDelegateExecuted)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Press any key to continue . . .

@jay1891
Copy link
Author

jay1891 commented Jun 3, 2020

Hi @ericsink, I disabled a part of our logic to prevent this issue, but in another area where I can't disable this issue and where we intensively create records, the issue occurs continuously. Could I help in any way to solve this issue, or what could be done? At the moment I don't know what else I can do.

@ericsink
Copy link
Owner

ericsink commented Jun 3, 2020

@jay1891 Looks like the big thing I need here is a small sample app that reproduces the problem. If I have that, I can do a workaround or fix. Any chance you could provide that?

@jay1891
Copy link
Author

jay1891 commented Jun 3, 2020

@ericsink ok, I try to provide the sample app.

@jay1891
Copy link
Author

jay1891 commented Jun 3, 2020

@ericsink We the solution below I can reproduce it 1 in 5 times in the first 15 seconds after pressing "Create number of entities". It is very difficult (for me) to find a perfect reproduction example. If you have any suggestions, please let me know. Could you try to reproduce it with this sample?

WpfAppSqliteTesting.zip

accessviolation 1

accessviolation 2

@ericsink
Copy link
Owner

ericsink commented Jun 3, 2020

In my case, the error happened on the eighth try. I was beginning to think it would not happen at all. :-)

FWIW:

If I run the sample app twice in a row without Clean in between, I get a different error, unable to load e_sqlite3 on some sort of a Migrate call. Do you see that too?

In the 7 test runs before I saw the "memory is corrupt error", each time it failed at around 35-40 seconds with a SQLite error 5, db is locked. Do you see that too?

I assume both of the above are unrelated to the memory corruption problem, but I thought I should mention them just in case.

@jay1891
Copy link
Author

jay1891 commented Jun 3, 2020

With a little luck it happened 3 times after each other. :-)

  1. "If I run the sample app twice in a row without Clean in between, I get a different error, unable to load e_sqlite3 on some sort of a Migrate call. Do you see that too?"

I didn't had if before, but running the application again that I added here, I had the issue too. I migrated from package.config to PackageReference and now it doesn't occur anymore.

  1. "In the 7 test runs before I saw the "memory is corrupt error", each time it failed at around 35-40 seconds with a SQLite error 5, db is locked. Do you see that too?"

Yes that occurs when I run it too
Also the following error occured but there the app had a high concurrency when adding entities to the db, which I think is not a real/relevant use case.

2020-06-03 19_54_31-Window

  1. I had also the following error, but it only happened once.

2020-06-03 19_44_01-Window

ericsink added a commit that referenced this issue Jun 3, 2020
@ericsink
Copy link
Owner

ericsink commented Jun 3, 2020

FYI, I've done some simplifications to your repro app and checked it in to the tree under test_nupkgs/issue321.

My test runs do seem to suggest that the problem is specific to .NET Framework and does not exist with .NET Core.

ericsink added a commit that referenced this issue Jun 3, 2020
…, put sp.Length into a local so it doesn't get referenced inside the fixed block. I didn't really expect this to make a difference, and sure enough, it did not. the crash still happens.
ericsink added a commit that referenced this issue Jun 3, 2020
…he issue, put sp.Length into a local so it doesn't get referenced inside the fixed block. I didn't really expect this to make a difference, and sure enough, it did not. the crash still happens."

This reverts commit d980630.
@ericsink
Copy link
Owner

ericsink commented Jun 3, 2020

FWIW, as another stab in the dark, I tried updating the repro app to System.Memory 4.5.4, and that didn't help either.

@jay1891
Copy link
Author

jay1891 commented Jun 3, 2020

If it could be of any value; I adjusted your repro.cs (see below) and with this adjustment I can reproduce it with every run of the console app (5/5)

repro.zip

@kpreisser
Copy link

kpreisser commented Jun 4, 2020

Hi,
just some random thoughts; I don't know if this will help or whether you are already aware of this:

When using the Span<T> types in .NET Core and in .NET Framework (with the System.Memory package), one difference is that due to the type being built-in in .NET Core 2.1 and higher, it has additional capabilities e.g. to track objects when the span was initialized using a pointer, which is not the case with .NET Framework.

For example, given a code like the following:

    static ReadOnlySpan<byte> GetSpan()
    {
        var myArray = new byte[50000000];

        //// Option 1: Create a span from pointer
        fixed (byte* ptr = myArray)
            return new ReadOnlySpan<byte>(ptr, myArray.Length);

        // Option 2: Create a span from array (ref)
        //return myArray;
    }

    // Get a span and then force objects to be GCed.
    ReadOnlySpan<byte> byteSpan = GetSpan();
    GC.Collect();

    // Try to access the whole span.
    string str;
    fixed (byte* ptr = byteSpan)
        str = Encoding.UTF8.GetString(ptr, byteSpan.Length);

That code creates a Span<byte> from a pointer (that was obtained by pinning a byte array) and later tries to access it. When running that code with .NET Framework 4.7.2, it will normally crash with an AccessViolationException, as due to the GC.Collect(), the byte array from which the pointer was obtained to initialize the Span<byte> will already have been GCed and so the memory was freed.

In contrast, when you run it on .NET Core 2.1 or higher, it will work without a problem as the Span<byte> tracks the object the pointer points to (the array), and will prevent it from being GCed as long as the Span<byte> is present on the stack (but I don't know how that works exactly).
But when you for example create the span with new ReadOnlySpan<byte>(ptr - 1000, myArray.Length), and later access it using Encoding.UTF8.GetString(ptr + 1000, byteLength), an AccessViolationException will also occur in .NET Core because when initializing the span with the modified pointer, it can no longer track the underlying object.

However, from looking at the code in utf8z.cs and util.cs, I wouldn't think this behavior is applicable there because the Span<byte> appears to be created from a byte[] rather than a byte*, and in that case the Span should have a ref to the field in the array and thus also prevent the object from being GCed in .NET Framework (you can test this when commenting option 1 in the code and uncommenting option 2). Therefore, I currently also don't have any ideas how that code can lead to a AccessViolationException.

Thanks!

@kpreisser
Copy link

kpreisser commented Jun 4, 2020

I took a closer look on the call stack and the relevant code when the crash happens, and I think the situation described in my previous post can actually be the cause of this issue (and explain why it crashes on .NET Framework but works on .NET Core):

In the call stack of the AccessViolationException, we can see that sqlite3_prepare_v2 is called, which then calls utf8z.utf8_to_string:

   at System.Text.UTF8Encoding.GetCharCount(Byte* bytes, Int32 count, DecoderNLS baseDecoder)
   at System.String.CreateStringFromEncoding(Byte* bytes, Int32 byteLength, Encoding encoding)
   at SQLitePCL.utf8z.utf8_to_string()
   at SQLitePCL.raw.sqlite3_prepare_v2(sqlite3 db, String sql, sqlite3_stmt& stmt, String& tail)

The called methods raw.sqlite3_prepare_v2 overloads look like this:

static public int sqlite3_prepare_v2(sqlite3 db, utf8z sql, out sqlite3_stmt stmt, out utf8z tail)
{
int rc = Provider.sqlite3_prepare_v2(db, sql, out var p, out tail);
stmt = sqlite3_stmt.From(p, db);
return rc;
}
static public int sqlite3_prepare_v2(sqlite3 db, string sql, out sqlite3_stmt stmt, out string tail)
{
int rc = sqlite3_prepare_v2(db, sql.to_utf8z(), out stmt, out var sp_tail);
tail = sp_tail.utf8_to_string();
return rc;
}

Notice that the method with a string parameter creates a utf8z from a string, which allocates a new byte[] and then returns a ReadOnlySpan<byte> for it. This utf8z is then passed to the provider implementation of sqlite3_prepare_v2 which looks like this (for cdecl):

unsafe int ISQLite3Provider.sqlite3_prepare_v2(sqlite3 db, utf8z sql, out IntPtr stm, out utf8z tail)
{
fixed (byte* p_sql = sql)
{
var rc = NativeMethods.sqlite3_prepare_v2(db, p_sql, -1, out stm, out var p_tail);
// TODO we could skip the strlen by using the length we were given
tail = utf8z.FromPtr(p_tail);
return rc;
}
}

We can see that it uses a fixed block to pin the span referencing the byte array and then calls the native sqlite3_prepare_v2. While the array is still pinned, it creates a utf8z (and therefore a ReadOnlySpan<byte>) from the returned p_tail value (byte* pointer).

If I understand the SQLite documentation correctly, sqlite3_prepare_v2 sets the pzTail to point to a different part of the string supplied in parameter p_sql.

The method raw.sqlite3_prepare_v2 then calls sp_tail.utf8_to_string, which needs to access the pointer.

However, remember that the pointer passed to the native sqlite3_prepare_v2 points to a byte array that was allocated in util.to_utf8_with_z. During the call of the native sqlite3_prepare_v2 function, this array was pinned, but once we call sp_tail.utf8_to_string(), the original byte array is no longer pinned (on .NET Framework) and also no longer referenced (as the utf8z and therefore the ReadOnlySpan<byte> that references the array no longer exists), so the Garbage Collector could already have collected it (or moved it), even though we still have the tail Span that points somewhere within the original byte array location. Then, when we want to access the tail by calling .utf8_to_string, the AccessViolationException occurs.

This works correctly on .NET Core because as mentioned above, while the byte array is pinned, the ReadOnlySpan<byte> is created from the p_tail pointer that points to a different part of the byte array, and therefore as long as the tail Span is on the stack, the runtime will prevent the byte array from being GCed (and probably from being moved by the GC).


Note that there is a different overload of SQLite3Provider_dynamic_cdecl.sqlite3_prepare_v2 that uses a ReadOnlySpan<byte> rather than utf8z as parameter:

unsafe int ISQLite3Provider.sqlite3_prepare_v2(sqlite3 db, ReadOnlySpan<byte> sql, out IntPtr stm, out ReadOnlySpan<byte> tail)
{
fixed (byte* p_sql = sql)
{
var rc = NativeMethods.sqlite3_prepare_v2(db, p_sql, sql.Length, out stm, out var p_tail);
var len_consumed = (int) (p_tail - p_sql);
int len_remain = sql.Length - len_consumed;
if (len_remain > 0)
{
tail = sql.Slice(len_consumed, len_remain);
}
else
{
tail = ReadOnlySpan<byte>.Empty;
}
return rc;
}
}

I think that one should work correctly, since instead of creating a new ReadOnlySpan<byte> from a byte* pointer, it slices the original span referencing the byte array, and so the new Span should also reference the byte array and therefore prevent it from being GCed.

@ericsink
Copy link
Owner

ericsink commented Jun 4, 2020

@kpreisser I'm still digesting what you've written above, but for now I want to say thanks for taking the time to investigate this and comment!

ericsink added a commit that referenced this issue Jun 4, 2020
…rence on the span explicitly instead of letting the C# 7.3 fixed statement call it implicitly. no change in behavior or effect. working on #321
ericsink added a commit that referenced this issue Jun 4, 2020
…ableReference on the span explicitly instead of letting the C# 7.3 fixed statement call it implicitly. no change in behavior or effect. working on #321"

This reverts commit d0d81f3.
ericsink added a commit that referenced this issue Jun 4, 2020
…n investigation by @kpreisser I have changed the problematic overload of raw.sqlite3_prepare_v2() to call a different overload of same in the provider to avoid creating a span from a pointer.  with this change, I can run the issue321 repro app at 20 iterations on net461 without a failure.  so this seems to confirm with @kpreisser said, but now every case where I create a span from a pointer needs to be reviewed for this kind of lifetime issue.
@wolfkumpitsch
Copy link

wolfkumpitsch commented Jun 4, 2020 via email

@ericsink
Copy link
Owner

The 2.0.4-pre release I uploaded to nuget.org today contains the fix for this problem.

@jakubsuchybio
Copy link

Hi guys,
I also stumbled upon this issue. Same exact problem, however in my case I'm developing sqlite database filler for random mock data for performance testing our main product.
After reading all this, my solution was just to specify SQLitePCLRaw.bundle_e_sqlite3 nuget explicitly in version 2.0.4
But before I didn't use that package directly, but via EFCore 3.1.7, which references SQLitePCLRaw.bundle_e_sqlite3 in version 2.0.2.
I looked at latest EFCore 3.1.18 and it still references 2.0.2
Only newer EFCore 5.0.0+ references 2.0.4
However we use EFCore with .NET Framework 4.8 and can't upgrade to .NET 5+ or .NET Core, so we are stuck with EFCore 3.1.* which is the latest version that still supports .NET Framework 4.8. The EFCore 5.0.0+ doesn't support .NET Framework 4.8 :(

Is there a way for you guys to push EFCore 3.1.* to use the newer SQLitePCLRaw.bundle_e_sqlite3 in version 2.0.4? So this doesn't happen to anyone else still on .NET Framework?

Thanks

@bricelam
Copy link
Contributor

Is there a way for you guys to push EFCore 3.1.* to use the newer SQLitePCLRaw.bundle_e_sqlite3 in version 2.0.4? So this doesn't happen to anyone else still on .NET Framework?

Can you file an issue on dotnet/efcore so we can discuss?

ericsink added a commit that referenced this issue Aug 16, 2021
…. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321.  additional detective work on this happened while trying to figure out #430.
ericsink added a commit that referenced this issue Aug 17, 2021
…re_v2/v3. these overloads were not memory safe, as they returned a span made from a pointer obtained from a fixed block. technically, this is a breaking change, but it is likely that nothing was using those overloads. my own test suite was not, and Microsoft.Data.Sqlite was not. there was discussion of the problem with these overloads in #321.  additional detective work on this happened while trying to figure out #430."

This reverts commit 264be3a.
@ericsink ericsink mentioned this issue Aug 25, 2021
@ericsink
Copy link
Owner

Looks like this got fixed but never closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants