-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Apache Commons Compress to 1.27.1 #1961
Comments
Hi @tc-wleite! I think we can upgrade this and other dependencies on master, targeting 4.2.0. I already plan to upgrade Tika for 4.2.0 and I should run a large regression test before releasing it. |
I noticed that #2091 pointed out a possible vulnerability in 1.21, recommending the upgrade to 1.26. |
There are 2 fixed CVEs in 1.26, CVE-2024-26308 (OOME) is worse to me, let's upgrade. Sorry @wladimirleite, I missed this in the queue. |
Reopening, while testing #2093, I detected in a small test a lot of *.ar (application/x-archive) files stopped to be expanded. I'll test Compress 1.24 and 1.25 to see if that *.ar regression disappears, but I think we should do a large test before this upgrade. |
CommonsCompress-1.25 doesn't have the *.ar regression I detected. I'll downgrade to it on master, but I'll leave this open until we run a larger test to see if there are other regressions. |
That is bad news.
Are the vulnerabilities solved in 1.25? |
I think no. Although non ideal, we have timeouts to protect against infinite loops (DoS) and external parsing to isolate OOMEs. Fortunately there is no (known) Remote Code Execution vulnerability. |
I meant Fortunately. |
If the only problem with 1.26 is with AR, maybe one option would be using |
That worked! |
@lfcnassif, working on a real case, very important valid ZIP files were not expanded. Fortunately, I noticed the parser exceptions and tried to figure out what was going on. |
Hi @wladimirleite, thank you for reporting that. ZIP is much more important to handle correctly than AR. I'll upgrade when I have some time. But could you test the original 7zip sample referenced by the first post here with commons compress 1.27.1? |
Sure! I will try with that 7z. |
One question: does 4.1.6 version handle those valid zip files correctly? |
No, it doesn't. |
I found out that @lfcnassif, as compressed files are quite important (may contain a lot of files), have you ever considered using a |
Great!
Not for generic containers, but for specialized ones, yes, e.g. for DOCX files run PackageParser if OOXMLParser fails (running RawStringParser over ZIP files doesn't extract too much info), for DOC files run GenericOLEParser if OfficeParser fails, etc. A possible bad side effect would be extracting duplicated subitems if first parser fails after extracting a subset of them...
Yes, I put it there thinking about use cases like above, but I never tested it. Taking a quick look at the code, at least the |
Running a small to medium size regression test between compress 1.25.0 and 1.27.1... |
Tests seem fine, there are very minor differences for good and for bad, but the critical *.ar regression I detected with compress 1.26.0 is gone with 1.27.1. |
@wladimirleite, please feel free to open another issue to tackle the problem of some ZIPs still not being handled by compress-1.27.1, maybe using a MultipleParser. |
Great! For now upgrading to 1.27.1 will already solve a lot of cases. |
Processing a 7z of ~7 GB, another user here got several errors.
Testing the archive (with the 7-zip program), it seems fine.
After investigating other possibilities, I tried to use the latest version of Apache Commons Compress (1.24.0), and the processing finished successfully (currently, version 1.21 is used).
@lfcnassif, I know that upgrading this kind of dependency is critical, so I will leave this issue as a suggestion to be evaluated.
The text was updated successfully, but these errors were encountered: