Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] out-of-bounds write in SUNMatScaleAddI_Sparse #581

Closed
dweindl opened this issue Sep 28, 2024 · 4 comments · Fixed by #584
Closed

[BUG] out-of-bounds write in SUNMatScaleAddI_Sparse #581

dweindl opened this issue Sep 28, 2024 · 4 comments · Fixed by #584

Comments

@dweindl
Copy link
Contributor

dweindl commented Sep 28, 2024

Current Behavior:

For the example below, I encounter out-of-bounds writes in SUNMatScaleAddI_Sparse at

Expected Behavior:

No out-of-bounds writes.

Steps To Reproduce:

  • Download test_SUNMatScaleAddI_Sparse.tar.gz

  • Unpack: tar -xzf test_SUNMatScaleAddI_Sparse.tar.gz && cd test_SUNMatScaleAddI_Sparse

  • Configure & build: cmake -S. -Bbuild && cmake --build build (this might need some adaptation to find the sundials configuration)

  • valgrind build/sparsetest:

     Invalid write of size 8
        at 0x10C882: SUNMatScaleAddI_Sparse (sunmatrix_sparse.c:753)
        by 0x111B9C: SUNMatScaleAddI (sundials_matrix.c:201)
        by 0x109630: main (in build/sparsetest)
      Address 0x4a8f588 is 8 bytes before a block of size 312 alloc'd
        at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
        by 0x109D07: SUNSparseMatrix (sunmatrix_sparse.c:130)
        by 0x10943F: main (in build/sparsetest)
     
     Invalid write of size 8
        at 0x10C8C4: SUNMatScaleAddI_Sparse (sunmatrix_sparse.c:754)
        by 0x111B9C: SUNMatScaleAddI (sundials_matrix.c:201)
        by 0x109630: main (in build/sparsetest)
      Address 0x4a8f408 is 8 bytes before a block of size 312 alloc'd
        at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
        by 0x109C81: SUNSparseMatrix (sunmatrix_sparse.c:127)
        by 0x10943F: main (in build/sparsetest)
    

Environment:

  • SUNDIALS version: 7.1.1
  • OS: Ubuntu 24.04
  • Compiler: gcc 13.2.0

Anything else:

Running the same example with SUNMatScaleAddI_Sparse from sundials 6.6.2 works as expected.
It seems the problem was introduced in #257.

@Steven-Roberts
Copy link
Collaborator

Hi @dweindl, thanks for raising the issue and the nice example to reproduce it. It seems the issue is the sparse matrix addition algorithm is assuming the row indices for a column are sorted which is not the case for your matrix. For example, the column 0 row indices are {0, 1, 3, 2, 4, 10, 7, 8}. I'm not sure if this is a documentation issue or a bug in the algorithm (probably the latter), but I'll discuss with the team tomorrow to plan out a fix. Either way, I see a related out of bounds bug here when N>M:

As a temporary fix, you can reorder the row indices to be ascending:

sunindextype indexvals_data[] = {0, 1, 2, 3, 4, 7, 8, 10, 0, 1, 3, 7, 0, 2, 4, 8, 0, 1, 3, 6, 7, 0, 2, 4, 6, 8, 1, 3, 5, 9, 2, 4, 6, 9, 0, 10, 0, 0, 0};

This ran cleanly through valgrind.

@dweindl
Copy link
Contributor Author

dweindl commented Oct 1, 2024

Thanks for looking into that, @Steven-Roberts. I can confirm that ordering the indexvals fixes the problem.
So far, (while we were still using sundials 5.8.0,) everything seemed to work fine with unordered indexvals.

@drreynolds
Copy link
Collaborator

About 18 months ago, another user submitted PR #257 to adjust how this function was implemented. Essentially, our original implementation (that was in sundials 5.8.0) required O(M*N) loop iterations, instead of the O(NNZ) iterations that one would assume should be necessary. I'm guessing that their efficiency-based fix added an assumption that the indexvals for each row/column are sorted (I'll note that this assumption is satisfied by all of our existing documentation examples, code examples, and unit tests).

I see three options:

  1. We make explicit the assumption that the indexvals are sorted within each row/column. This would update the documentation, retain the current implementation, and potentially require some users to sort the values appropriately.
  2. We revert the current implementation, allowing non-sorted indexvals, but increasing the complexity of the implementation from O(NNZ) back to O(M*N).
  3. Someone devises a new implementation that retains the O(NNZ) complexity but supports non-sorted indexvals.

@Steven-Roberts Steven-Roberts linked a pull request Oct 2, 2024 that will close this issue
balos1 pushed a commit that referenced this issue Oct 4, 2024
Fixes #581

The new implementation eliminates all temporary storage/mallocs and only
traverses the matrix entries once when diagonal elements don't need to
be added. The sparse matrix example shows about a 1.5x speedup on
50000x50000 matrices.
@balos1
Copy link
Member

balos1 commented Oct 4, 2024

Fixed in #584.

@balos1 balos1 closed this as completed Oct 4, 2024
dweindl added a commit to AMICI-dev/AMICI that referenced this issue Oct 7, 2024
We've been lagging behind SUNDIALS development. This updates the SUNDIALS library from v5.8.0 to [v7.1.1](https://sundials.readthedocs.io/en/latest/Changelog_link.html#changes-to-sundials-in-release-7-1-1).

Note:
* This includes the post-7.1.1 changes to `cmake/tpl/FindKLU.cmake` from LLNL/sundials#582
* This includes the post-7.1.1 changes to [`sundials/src/sunmatrix/sparse/sunmatrix_sparse.c:SUNMatScaleAddI_Sparse`](https://github.com/LLNL/sundials/blob/736369d543cb80956b1ba87377ffc0c8cf6b342b/src/sunmatrix/sparse/sunmatrix_sparse.c#L625C1-L694C2) from  LLNL/sundials#584, fixing LLNL/sundials#581

The majority of changes inside amici are related to the introduction of [SUNContext](https://sundials.readthedocs.io/en/latest/sundials/SUNContext_link.html#the-suncontext-type), which is required for the construction of all SUNDIALS objects. Other trivial changes are related to renamed types and macros.

Furthermore, the error handling strategy has changed. As a consequence, simulation error messages will look slightly different.

Closes #1565
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants