-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect header checksum error #111
Comments
Duplicate of #91? |
Looks like it is. I only looked at the first post or two and didn't see the header checksum thing. Within the TimeZones package, it erred on tzdata___.tar.gz: 1997b, 95d, 95g 95e, 1999f, 96e, 95i, 93g, and 1997c for me. They are all old and IANA is likely hosting the original .tar.gzs, so I wonder if there is some kind of .tar standard issue here too. |
I'm pretty reluctant to ignore header checksums — if so, why bother having a checksum? Even if #91 were implemented and there was an option to turn off checksum verification, we wouldn't want to turn that option on in the Pkg or Artifacts code, so it wouldn't help in this case. I think that if there are really old, broken tarballs and you want to use them as artifacts, they should probably be rehosted with correctly formatted tarballs. You could potentially ask IANA to reformat those tarballs but keep the content the same. I'm not sure how one contacts them to ask that. |
On the other hand, when Pkg is downloading something and has a SHA1 hash of the entire file that it verifies before unpacking, the checksum becomes somewhat redundant, so maybe we could set an option to ignore header checksums in that case since the entire content has already been checksummed. |
Maybe try to reach out to Paul Eggert? |
Looking at the headers of these files, it seems that they just didn't fill the chksum field, as if they got midway through calculating it and then stopped instead of writing it. I wonder if the way they were making the files back then had it as a two-step process or had some kind of bug that failed to write them. I somewhat doubt IANA is going to change an outdated standards file at this point, but perhaps it's worth a shot. The checksum is a bit redundant, so I am okay with ignoring it but I understand the trepidation. My temporary solution is to just blacklist the download by creating empty directory in .julia/artifacts/$sha1 for these folders (will update as I find more). # blacklist tzdata 1997b, 95d, 95g 95e, 1999f, 96e, 95i, 93g, 1997c, 96c, 1996n, 94f, 94a,
# 96d, 94d, 95f, 95b, 96b, 96i, 95c, 96a, 96h, 94e, 94h, 95h, 95l, 94b, 95k, 95m
mkdir 00af50ba7c66b49ad97fdf7a529e476aec5039ad \
4181abd2be1fc8f2964c0681d8bc3ff7d38c299a \
2ad4eaf2fd0db4fd756019cca85d1139303e0c6f \
6ac70b261703c39615f2e7797aa3662f57e97a9a \
02232194ac597e358d30e1f7da36e8ef2ecea1ca \
d34c0c6aeb16e0a4e1b4b06573459df0f4c86122 \
d5506bf9dc54bddf87ce34e54075f1bdb8e1e01c \
e7ddeb4a45080afa6dbdd7c38df81bcf8b9d8f1a \
191a315cdfccae9f67c584f39c7341182fef2d7d \
0098bd8f0a47871e728d40fca476b7c42f988894 \
41df1c2e7f1ab4ae8e9c3ed1904c807238a11e16 \
120dc3d384840054741fa6d3b1d4bb8eba2396b6 \
3792c6bcf2dd410edabc0de920b3262e9bcf08b9 \
c3abe4165d88d36b015eb341b65ff5c5a4c98bf4 \
ae7b60dc0c96fc76c945b5f95b464c2672a1f12b \
f052bec9396067db314ae37cba669e1fdb528742 \
5cee98fd13b7cc03af48b926534759f515d2c1b6 \
214e8413b5da90d6e5d9f6d916be43b48a9f0669 \
33143e05ac18f5c2618df3b30ade30b488ecbc83 \
b973690beafaad07f8fe8172a664e3995665d510 \
d979d12f0912737480e554e2915459b905220779 \
463561ed73c004096d46613a9c94af2b635c4ebf \
90b27ec406fbe1e30a7b486edde037c7fa475c27 \
a7192b7b16d072863839ee24cc68cb944f942fa4 \
ab0e8405a16af3653a5610979ab2a7392ec4fbba \
64afaaed2b88c38a29f9fb4198f22662323d3ceb \
ff665b96f7d50272d0a8e0e321d530c8b09380a5 \
4641d762bdfd45005ffe23a21bace16fb2e5c50a \
f7b20d08ea0fae0f43fce023421032eb9c839f27 All of the problematic .tar.gzs are < 2000, but all .tar.gzs below 2000 aren't problematic. |
I contacted Paul about this and he replied. It looks like the bad checksums are due to a faulty SunOS I think that patching and rehosting them or disabling checksum verification (as it is redundant to an extent with SHA varification) are still probably the best options. |
I looked into this more and it turns out that the checksum isn't invalid. Rather, these old tarballs use a strange format when emitting numbers: they put leading spaces before the digits of the integer value. This is contrary to the spec, which states:
However, this seems harmless to support parsing, so I'm preparing a PR to do that. The reason I had been misdiagnosing this as the checksum field being blank (all spaces) is a combination of things:
Quite a conspiracy to make this seem like a different problem than it actually was: the parsing of a field with leading spaces as zero was causing the checksum value to appear to be zero and dumping the header before restoring the checksum data made the contents appear to be all spaces, which matches the claimed value. |
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support.
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support.
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support. This adds a fragment of this tarball: https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz This is a collection of example sparse matrix files. The fragment is a single header entry followed by one block of data and is a valid tarball by itself with the idiosyncratic field formatting of Solaris `tar`.
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support. This adds a fragment of this tarball: https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz This is a collection of example sparse matrix files. The fragment is a single header entry followed by one block of data and is a valid tarball by itself with the idiosyncratic field formatting of Solaris `tar`.
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support. This adds a fragment of this tarball: https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz This is a collection of example sparse matrix files. The fragment is a single header entry followed by one block of data and is a valid tarball by itself with the idiosyncratic field formatting of Solaris `tar`. (cherry picked from commit ed0c4d2)
Some old versions of `tar` (e.g. Solaris) emitted integer fields with leadings *spaces* which is technically contrary to the standard, but seems harmless to support. This adds a fragment of this tarball: https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz This is a collection of example sparse matrix files. The fragment is a single header entry followed by one block of data and is a valid tarball by itself with the idiosyncratic field formatting of Solaris `tar`. (cherry picked from commit ed0c4d2)
Hi @StefanKarpinski, can you make a release so this fix is generally available without using master? |
Actually, I just tried master with https://data.iana.org/time-zones/releases/tzdata1997b.tar.gz, and it still fails with (build) pkg> status
Status `~/Workspace/ai/FifaTrialSmoothing/build/Project.toml`
[9b87118b] PackageCompiler v1.4.0 `https://github.com/kmsquire/PackageCompiler.jl.git#kms/fix-include-lazy-artifacts`
[a4e569a6] Tar `https://github.com/JuliaIO/Tar.jl.git#master`
julia> using Tar
julia> Tar.list("../tzdata1997b.tar")
ERROR: incorrect header checksum = 0; should be 2782
"africa\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 444 \0 21140 \0 0 \0 53756 6305412777 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] read_standard_header(io::IOStream; buf::Vector{UInt8}, tee::Base.DevNull)
@ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:496
[3] iterate_headers(callback::Tar.var"#71#72"{Vector{Tar.Header}}, tar::IOStream; raw::Bool, strict::Bool, buf::Vector{UInt8})
@ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:17
[4] #68
@ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:131 [inlined]
[5] open(f::Tar.var"#68#69"{Bool, Bool, Tar.var"#71#72"{Vector{Tar.Header}}}, args::String; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Base ./io.jl:330
[6] open
@ ./io.jl:328 [inlined]
[7] arg_read
@ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/ArgTools/src/ArgTools.jl:60 [inlined]
[8] #list#67
@ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:130 [inlined]
[9] #list#70
@ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:141 [inlined]
[10] list(tarball::String)
@ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:140
[11] top-level scope
@ REPL[2]:1 |
Okay, I guess this might be because Tar is part of the standard library, and I was actually using that version above. Edit: And because I was on Julia 1.6.1 |
Yeah, this is a major annoyance of stdlibs—you cannot freely load newer versions of them. Does it work for you on Julia master? And hopefully also on the Julia 1.7 betas? |
It works fine on 1.6.2. I haven't had the chance to test further. |
Ok, that seems good enough to verify that it's solved. Thanks for the feedback! |
As indicated in this ticket, unpacking tzdata1997b.tar.gz seems to cause Tar.jl to throw an error:
CentOS7, Julia 1.6.1
The
tar
utility in bash takes no issue with this file.The text was updated successfully, but these errors were encountered: