segfault from libsolv during dependency checking for reposync from source with large metadata file. #543

Open
filsdepatrick opened this issue Nov 20, 2023 · 6 comments

Comments

@filsdepatrick

Hello. My team manages the repository-mirror service at my company. We mirror RPM repos from our devops team, and we seem to have hit a bug in libsolv on OL8. When reposync mirrors a repo with a large (527 MB) metadata file (xxxxxx-filelists.xml.gz), it segfaults and dumps core, and the kernel logs the following:
2023-11-14T03:06:57.243146-08:00 repository-mirror002 kernel: [45200.322222] Code: 89 45 00 48 01 f0 48 8d 50 01 81 fb ff 1f 00 00 76 35 81 fb ff ff ff 07 77 55 81 fb ff ff 0f 00 77 6c 89 d9 c1 e9 0d 83 c9 80 <88> 08 89 d9 48 8d 42 01 48 83 c2 02 c1 e9 06 83 c9 80 88 4a fe eb
2023-11-14T03:11:41.520088-08:00 repository-mirror002. kernel: [45484.589441] reposync[1210499]: segfault at 7f93641f8013 ip 00007f9520a9a6bb sp 00007ffe9edf7250 error 6 in libsolv.so.1[7f9520a62000+90000]
This error reproduces with the distro version of the libsolv package (0.7.20-4), with the version of libsolv from OL9 (0.7.22-4), and with the latest version from https://github.com/openSUSE/libsolv (0.7.26).
The segfault occurs after the metadata from the source mirror has been downloaded completely, while memory is being reallocated during the package dependency analysis that reposync initiates through calls into libsolv. The strace and gdb output collected during the failures is pasted below.

strace output of reposync failure:

using libsolv-0.7.20-4:

mremap(0x7f3c9bbf2000, 2080378880, 2147487744, MREMAP_MAYMOVE) = 0x7f3c9bbf2000
mremap(0x7f3c9bbf2000, 2147487744, 2281705472, MREMAP_MAYMOVE) = 0x7f3c9bbf2000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f3c1bbf2013} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
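
One possible reading of the fault address in this trace, offered as a hypothesis and not a confirmed diagnosis: `si_addr` sits almost exactly 2 GiB below the mapping returned by the last `mremap()`, which is what a buffer offset slightly above 2**31 would produce if it were held in a signed 32-bit integer. The sketch below only redoes that arithmetic with the two addresses from the trace; the helper name `as_int32` is made up for illustration.

```python
# Values copied from the strace output above.
MAPPING_BASE = 0x7F3C9BBF2000  # address returned by the last mremap()
FAULT_ADDR = 0x7F3C1BBF2013    # si_addr reported with SIGSEGV


def as_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# The fault lies below the mapping, so this delta is negative.
delta = FAULT_ADDR - MAPPING_BASE

# Assumption: the intended offset was positive and wrapped at 32 bits.
intended_offset = delta + (1 << 32)

print(f"delta from mapping base: {delta}")
print(f"candidate intended offset: {intended_offset}")
print(f"bytes past 2**31: {intended_offset - 2**31}")
print(f"wraps back to delta: {as_int32(intended_offset) == delta}")
```

If this reading holds, the write was aimed a few bytes past the 2 GiB mark of the buffer and landed below the mapping instead, which would match `SEGV_MAPERR`.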

using libsolv-0.7.26:

mremap(0x7f8b3a6f3000, 2147487744, 2281705472, MREMAP_MAYMOVE) = 0x7f8c4e6f7000
mremap(0x7f8c4e6f7000, 2281705472, 18446744071562072064, MREMAP_MAYMOVE) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 18446744071562072064, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 18446744071562207232, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 18446744071562072064, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory allocating 1844674"..., 53Out of memory allocating 18446744071562070016 bytes!
) = 53
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 2451293
gettid() = 2451293
tgkill(2451293, 2451293, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2451293, si_uid=0} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
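
The huge sizes in the 0.7.26 trace can be decoded the same way. This is a sketch of the arithmetic only, assuming the sizes are negative signed values that were widened to 64 bits; the function and variable names are illustrative and not taken from libsolv.

```python
def as_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit number as a signed 64-bit integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


# Sizes copied from the strace output above.
REQUESTED = {
    "mremap new size": 18446744071562072064,
    "abort message": 18446744071562070016,
}

decoded = {}
for label, size in REQUESTED.items():
    signed = as_signed64(size)
    # Assumption: the value wrapped once at 32 bits before being widened.
    candidate = signed + (1 << 32)
    decoded[label] = (signed, candidate)
    print(f"{label}: signed={signed}, candidate={candidate}, "
          f"past 2**31 by {candidate - 2**31}")
```

Under that assumption the `mremap` request decodes to 2147487744 bytes, which is the same number that appears as the old size in the preceding `mremap` call, so the sizes seem to stop growing correctly right at the 2 GiB boundary.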

(gdb) bt full
#0 0x00007ffff6701f6f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007ffff2f86784 in data_addblob.isra () from /lib64/libsolv.so.1
No symbol table info available.
#2 0x00007ffff2f87f22 in repodata_serialize_key.isra () from /lib64/libsolv.so.1
No symbol table info available.
#3 0x00007ffff2f924ef in repodata_internalize () from /lib64/libsolv.so.1
No symbol table info available.
#4 0x00007ffff2d3f548 in repo_add_rpmmd () from /lib64/libsolvext.so.1
No symbol table info available.
#5 0x00007ffff3dc2279 in load_filelists_cb(s_Repo*, _IO_FILE*) () from /lib64/libdnf.so.2
No symbol table info available.
#6 0x00007ffff3dc4ddb in load_ext(_DnfSack*, libdnf::Repo*, _hy_repo_repodata, char const*, char const*, int ()(s_Repo, _IO_FILE*), _GError**) () from /lib64/libdnf.so.2
No symbol table info available.
#7 0x00007ffff3dc5427 in dnf_sack_load_repo () from /lib64/libdnf.so.2
No symbol table info available.
#8 0x00007fffe66d562e in load_repo(_SackObject*, _object*, _object*) () from /usr/lib64/python3.6/site-packages/hawkey/_hawkey.so
No symbol table info available.
#9 0x00007ffff7537b84 in PyCFunction_Call () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#10 0x00007ffff754526f in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#11 0x00007ffff751b8d8 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#12 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#13 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#14 0x00007ffff749c744 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#15 0x00007ffff751bac0 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#16 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#17 0x00007ffff754052a in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#18 0x00007ffff751b8d8 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#19 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#20 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#21 0x00007ffff751b8d8 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#22 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#23 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#24 0x00007ffff751b8d8 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#25 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#26 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#27 0x00007ffff751b8d8 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#28 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#29 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#30 0x00007ffff749c744 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#31 0x00007ffff751bac0 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#32 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#33 0x00007ffff753f8e8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#34 0x00007ffff749c744 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#35 0x00007ffff751bac0 in fast_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#36 0x00007ffff753ec97 in call_function () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#37 0x00007ffff754052a in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#38 0x00007ffff749c744 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#39 0x00007ffff755c593 in PyEval_EvalCode () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#40 0x00007ffff75aaa62 in run_mod () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#41 0x00007ffff747ce9c in PyRun_FileExFlags () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#42 0x00007ffff748209e in PyRun_SimpleFileExFlags () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#43 0x00007ffff7482918 in Py_Main.cold.3359 () from /lib64/libpython3.6m.so.1.0
No symbol table info available.
#44 0x0000555555400b96 in main ()
No symbol table info available.
That one crashes doing a memcpy() in the data_addblob inline:

(gdb) disass 0x00007ffff2f86784
Dump of assembler code for function data_addblob.isra.11:
0x00007ffff2f86730 <+0>: push r14
0x00007ffff2f86732 <+2>: mov r14,rdx
0x00007ffff2f86735 <+5>: push r13
0x00007ffff2f86737 <+7>: mov r13,rdi
0x00007ffff2f8673a <+10>: push r12
0x00007ffff2f8673c <+12>: movsxd r12,ecx
0x00007ffff2f8673f <+15>: push rbp
0x00007ffff2f86740 <+16>: mov rbp,r12
0x00007ffff2f86743 <+19>: push rbx
0x00007ffff2f86744 <+20>: mov rbx,rsi
0x00007ffff2f86747 <+23>: movsxd rax,DWORD PTR [rsi]
0x00007ffff2f8674a <+26>: mov rdi,QWORD PTR [rdi]
0x00007ffff2f8674d <+29>: cmp r12,0x1
0x00007ffff2f86751 <+33>: je 0x7ffff2f86790 <data_addblob.isra.11+96>
0x00007ffff2f86753 <+35>: lea rsi,[r12+rax*1]
0x00007ffff2f86757 <+39>: lea rcx,[rax-0x1]
0x00007ffff2f8675b <+43>: lea rdx,[rsi-0x1]
0x00007ffff2f8675f <+47>: or rcx,0x3ff
0x00007ffff2f86766 <+54>: or rdx,0x3ff
0x00007ffff2f8676d <+61>: cmp rcx,rdx
0x00007ffff2f86770 <+64>: jne 0x7ffff2f8679f <data_addblob.isra.11+111>
0x00007ffff2f86772 <+66>: mov QWORD PTR [r13+0x0],rdi
0x00007ffff2f86776 <+70>: mov rdx,r12
0x00007ffff2f86779 <+73>: mov rsi,r14
0x00007ffff2f8677c <+76>: add rdi,rax
0x00007ffff2f8677f <+79>: call 0x7ffff2f5b0b0 memcpy@plt
=> 0x00007ffff2f86784 <+84>: add DWORD PTR [rbx],ebp
0x00007ffff2f86786 <+86>: pop rbx
0x00007ffff2f86787 <+87>: pop rbp
0x00007ffff2f86788 <+88>: pop r12
0x00007ffff2f8678a <+90>: pop r13
0x00007ffff2f8678c <+92>: pop r14
0x00007ffff2f8678e <+94>: ret
0x00007ffff2f8678f <+95>: nop
0x00007ffff2f86790 <+96>: mov rdx,rax
0x00007ffff2f86793 <+99>: lea rsi,[rax+0x1]
0x00007ffff2f86797 <+103>: and edx,0x3ff
0x00007ffff2f8679d <+109>: jne 0x7ffff2f86772 <data_addblob.isra.11+66>
0x00007ffff2f8679f <+111>: mov ecx,0x3ff
0x00007ffff2f867a4 <+116>: mov edx,0x1
0x00007ffff2f867a9 <+121>: call 0x7ffff2f5a970 solv_extend_realloc@plt
0x00007ffff2f867ae <+126>: mov rdi,rax
0x00007ffff2f867b1 <+129>: movsxd rax,DWORD PTR [rbx]
0x00007ffff2f867b4 <+132>: jmp 0x7ffff2f86772 <data_addblob.isra.11+66>
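
For readers who do not want to trace the listing by hand, here is a rough Python model of what the disassembly appears to do. It is my reading of the instructions above, not libsolv source code, and the function names are invented for this sketch. The two details it tries to capture are the 1 KiB block check (the `0x3ff` mask) that decides whether `solv_extend_realloc` is called, and the `movsxd` loads that sign-extend the 32-bit length before it is added to the buffer pointer.

```python
BLOCK_MASK = 0x3FF  # the 0x3ff constant in the listing, i.e. 1 KiB blocks


def sign_extend32(value: int) -> int:
    """Model movsxd: widen a 32-bit value to 64 bits, keeping its sign."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def needs_extend(cur_len: int, add_len: int) -> bool:
    """Model the checks at <+29>..<+64> and <+96>..<+109>."""
    cur = sign_extend32(cur_len)
    add = sign_extend32(add_len)
    if add == 1:
        # Single-byte append: extend only when sitting on a block boundary.
        return (cur & BLOCK_MASK) == 0
    # Otherwise extend when the append crosses into a new block.
    return ((cur - 1) | BLOCK_MASK) != ((cur + add - 1) | BLOCK_MASK)


def copy_offset(cur_len: int) -> int:
    """Model `add rdi,rax` at <+76>: the offset added to the buffer pointer."""
    return sign_extend32(cur_len)


print(needs_extend(1000, 10))   # stays inside the first block
print(needs_extend(1020, 10))   # crosses the 1024-byte boundary
print(copy_offset(2**31 - 1))   # largest length that stays positive
print(copy_offset(2**31 + 19))  # goes negative once the length passes 2**31
```

In this model the copy offset turns negative as soon as the stored length exceeds 2**31 - 1, so the destination of the `memcpy()` would fall below the start of the buffer.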

These are the rpm versions for the 3 relevant rpms in case they matter:

2023-11-17 05:22:35PST [ user@repository-mirror002:~ ]
$ rpm -qi libdnf
Name : libdnf
Version : 0.63.0
Release : 14.0.1.el8_8
Architecture: x86_64
Install Date: Tue 20 Jun 2023 08:49:36 AM PDT
Group : Unspecified
Size : 2417728
License : LGPLv2+
Signature : RSA/SHA256, Tue 16 May 2023 05:08:46 PM PDT, Key ID 82562ea9ad986da3
Source RPM : libdnf-0.63.0-14.0.1.el8_8.src.rpm
Build Date : Tue 16 May 2023 05:06:01 PM PDT
Build Host : build-ol8-x86_64.oracle.com
Relocations : (not relocatable)
Vendor : Oracle America
URL : https://github.com/rpm-software-management/libdnf
Summary : Library providing simplified C and Python API to libsolv
Description :
A Library providing simplified C and Python API to libsolv.
2023-11-17 05:22:47PST [ user@repository-mirror002:~ ]
$ rpm -qi libsolv
Name : libsolv
Version : 0.7.20
Release : 4.el8_7
Architecture: x86_64
Install Date: Thu 16 Nov 2023 12:35:44 PM PST
Group : Unspecified
Size : 803747
License : BSD
Signature : RSA/SHA256, Wed 14 Dec 2022 06:39:09 AM PST, Key ID 82562ea9ad986da3
Source RPM : libsolv-0.7.20-4.el8_7.src.rpm
Build Date : Wed 14 Dec 2022 06:35:33 AM PST
Build Host : build-ol8-x86_64.oracle.com
Relocations : (not relocatable)
Vendor : Oracle America
URL : https://github.com/openSUSE/libsolv
Summary : Package dependency solver
Description :
A free package dependency solver using a satisfiability algorithm. The
library is based on two major, but independent, blocks:

  • Using a dictionary approach to store and retrieve package
    and dependency information.

  • Using satisfiability, a well known and researched topic, for
    resolving package dependencies.
2023-11-17 05:22:53PST [ user@repository-mirror002:~ ]
$ rpm -qi yum-utils
Name : yum-utils
Version : 4.0.21
Release : 19.0.1.el8_8
Architecture: noarch
Install Date: Thu 09 Nov 2023 01:03:11 PM PST
Group : Unspecified
Size : 23135
License : GPLv2+
Signature : RSA/SHA256, Tue 16 May 2023 04:57:29 PM PDT, Key ID 82562ea9ad986da3
Source RPM : dnf-plugins-core-4.0.21-19.0.1.el8_8.src.rpm
Build Date : Tue 16 May 2023 04:54:24 PM PDT
Build Host : build-ol8-x86_64.oracle.com
Relocations : (not relocatable)
Vendor : Oracle America
URL : https://github.com/rpm-software-management/dnf-plugins-core
Summary : Yum-utils CLI compatibility layer
Description :
As a Yum-utils CLI compatibility layer, supplies in CLI shims for
debuginfo-install, repograph, package-cleanup, repoclosure, repomanage,
repoquery, reposync, repotrack, repodiff, builddep, config-manager, debug,
download and yum-groups-manager that use new implementations using DNF.

@mlschroe
Member

Can you please also create a backtrace for the 0.7.26 version?

@mlschroe
Member

Can I access the repository with the big repodata so that I can reproduce the crash?

@filsdepatrick
Author

Unfortunately I'm unable to reproduce the error now as the source repo has been pruned down to about 6k packages, and the filelists.xml.gz file is now only 250MB in size. That seems to support my assumption that the error is related to the size of the metadata. The repo itself is an internal corporate repo with proprietary packages, so I'm not able to provide access to it.

@filsdepatrick
Author

filsdepatrick commented Nov 29, 2023

I was able to reproduce the segmentation fault by creating a large repo from RPMs taken from several of our devops upstream repos. The repo has over 13k RPMs and a filelists.xml.gz file of approximately 510 MB. I'm attaching the backtrace of the reposync process attempting to mirror this repo. This was run with libsolv-0.7.26.

backtrace_with_libsolv-0.7.26.txt

@filsdepatrick
Author

I can provide the primary.xml.gz file that has the packages, sizes, provides and dependencies. Although I cannot give access to the source repository itself, the primary.xml file should help to create a repo that can be used to reproduce the error. The backtrace attached to the previous comment has the reposync command as we're running it at the top of the file.

2ab0ae584b7537d0f21de98c29468ecb1cbc3964a07fce0ca2820195cc92152c-primary.xml.gz

@filsdepatrick
Author

filsdepatrick commented Dec 6, 2023

I can provide some additional test results from reproducing this issue in a lab using the test repo described above on an internal upstream mirror:

Summary:
Starting state of test repo:
13279 rpms
filelists.xml.gz 510 MB

I tested for the segfault after reducing the test repo by 1000 packages and regenerating the metadata on each iteration. After the first reduction, the segfault still occurred on the downstream lab host when using reposync to mirror the repo. After the second reduction of 1000 packages, the segfault no longer occurred.

-1000 packages segfault occurs (filelists.xml.gz 428 MB)
-1000 packages no segfault (filelists.xml.gz 371 MB)
To find the lower threshold, I added back the most recently eliminated packages by halves and observed:

+500 packages no segfault (filelists.xml.gz 389 MB)
+250 packages no segfault (filelists.xml.gz 407 MB)
+125 packages no segfault (filelists.xml.gz 419 MB)
+68 packages no segfault (filelists.xml.gz 424 MB)
+28 packages no segfault (filelists.xml.gz 426 MB)
+29 (remaining packages to bring to same state as last segfault) segfault recurs (filelists.xml.gz 428 MB)

The segfault occurred once the filelists.xml.gz file reached 428 MB, but did not occur when that metadata file was 426 MB.
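
The halving steps above can be tabulated to see where the threshold falls in terms of package count. This is a small bookkeeping sketch and not part of the original test procedure. It assumes that each reduction removed exactly 1000 packages and that the "+N" steps are cumulative, which the steadily growing file sizes suggest.

```python
START_RPMS = 13279  # starting state of the test repo

# (packages added back in this step, segfault observed, filelists.xml.gz in MB)
STEPS = [
    (500, False, 389),
    (250, False, 407),
    (125, False, 419),
    (68, False, 424),
    (28, False, 426),
    (29, True, 428),
]

base = START_RPMS - 2000  # repo size after both 1000-package reductions
restored = 0
last_ok = None
first_bad = None

for added, crashed, size_mb in STEPS:
    restored += added
    total = base + restored
    if crashed:
        if first_bad is None:
            first_bad = (total, size_mb)
    else:
        last_ok = (total, size_mb)
    print(f"{total} rpms, {size_mb} MB, segfault={crashed}")

print(f"largest repo without segfault: {last_ok}")
print(f"smallest repo with segfault:   {first_bad}")
```

Under those assumptions the threshold lies somewhere within the last 29 packages that were added back.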

The metadata file listed 74501694 files and directories when the segfault occurred, versus 74133397 at the next smaller metadata file size where there was no segfault, a difference of 368297 entries.
