Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect header checksum error #111

Closed
Byrth opened this issue May 10, 2021 · 14 comments · Fixed by #116
Closed

Incorrect header checksum error #111

Byrth opened this issue May 10, 2021 · 14 comments · Fixed by #116

Comments

@Byrth
Copy link

Byrth commented May 10, 2021

As indicated in this ticket, unpacking tzdata1997b.tar.gz seems to cause Tar.jl to throw an error:

Pkg.PlatformEngines.download_verify_unpack("https://data.iana.org/time-zones/releases/tzdata1997b.tar.gz", "8a6c0801bb2474342a48d6c2253e7983d50d4c6d80a1d455c06fa4e4434df3b3", "/root/.julia/artifacts/jl_rJlNRK"; verbose = true, quiet_download = false)

CentOS7, Julia 1.6.1

The tar utility in bash takes no issue with this file.

@giordano
Copy link
Member

Duplicate of #91?

@Byrth
Copy link
Author

Byrth commented May 10, 2021

Looks like it is. I only looked at the first post or two and didn't see the header checksum thing.

Within the TimeZones package, it erred on tzdata___.tar.gz: 1997b, 95d, 95g 95e, 1999f, 96e, 95i, 93g, and 1997c for me.

They are all old and IANA is likely hosting the original .tar.gzs, so I wonder if there is some kind of .tar standard issue here too.

@Byrth Byrth closed this as completed May 10, 2021
@StefanKarpinski
Copy link
Member

I'm pretty reluctant to ignore header checksums — if so, why bother having a checksum? Even if #91 were implemented and there was an option to turn off checksum verification, we wouldn't want to turn that option on in the Pkg or Artifacts code, so it wouldn't help in this case. I think that if there are really old, broken tarballs and you want to use them as artifacts, they should probably be rehosted with correctly formatted tarballs. You could potentially ask IANA to reformat those tarballs but keep the content the same. I'm not sure how one contacts them to ask that.

@StefanKarpinski
Copy link
Member

On the other hand, when Pkg is downloading something and has a SHA1 hash of the entire file that it verifies before unpacking, the checksum becomes somewhat redundant, so maybe we could set an option to ignore header checksums in that case since the entire content has already been checksummed.

@giordano
Copy link
Member

Maybe try to reach out to Paul Eggert?

@Byrth
Copy link
Author

Byrth commented May 10, 2021

Looking at the headers of these files, it seems that they just didn't fill the chksum field, as if they got midway through calculating it and then stopped instead of writing it. I wonder if the way they were making the files back then had it as a two-step process or had some kind of bug that failed to write them. I somewhat doubt IANA is going to change an outdated standards file at this point, but perhaps it's worth a shot.

The checksum is a bit redundant, so I am okay with ignoring it but I understand the trepidation. My temporary solution is to just blacklist the download by creating empty directory in .julia/artifacts/$sha1 for these folders (will update as I find more).

# blacklist tzdata 1997b, 95d, 95g 95e, 1999f, 96e, 95i, 93g, 1997c, 96c, 1996n, 94f, 94a,
# 96d, 94d, 95f, 95b, 96b, 96i, 95c, 96a, 96h, 94e, 94h, 95h, 95l, 94b, 95k, 95m
mkdir 00af50ba7c66b49ad97fdf7a529e476aec5039ad \
    4181abd2be1fc8f2964c0681d8bc3ff7d38c299a \
    2ad4eaf2fd0db4fd756019cca85d1139303e0c6f \
    6ac70b261703c39615f2e7797aa3662f57e97a9a \
    02232194ac597e358d30e1f7da36e8ef2ecea1ca \
    d34c0c6aeb16e0a4e1b4b06573459df0f4c86122 \
    d5506bf9dc54bddf87ce34e54075f1bdb8e1e01c \
    e7ddeb4a45080afa6dbdd7c38df81bcf8b9d8f1a \
    191a315cdfccae9f67c584f39c7341182fef2d7d \
    0098bd8f0a47871e728d40fca476b7c42f988894 \
    41df1c2e7f1ab4ae8e9c3ed1904c807238a11e16 \
    120dc3d384840054741fa6d3b1d4bb8eba2396b6 \
    3792c6bcf2dd410edabc0de920b3262e9bcf08b9 \
    c3abe4165d88d36b015eb341b65ff5c5a4c98bf4 \
    ae7b60dc0c96fc76c945b5f95b464c2672a1f12b \
    f052bec9396067db314ae37cba669e1fdb528742 \
    5cee98fd13b7cc03af48b926534759f515d2c1b6 \
    214e8413b5da90d6e5d9f6d916be43b48a9f0669 \
    33143e05ac18f5c2618df3b30ade30b488ecbc83 \
    b973690beafaad07f8fe8172a664e3995665d510 \
    d979d12f0912737480e554e2915459b905220779 \
    463561ed73c004096d46613a9c94af2b635c4ebf \
    90b27ec406fbe1e30a7b486edde037c7fa475c27 \
    a7192b7b16d072863839ee24cc68cb944f942fa4 \ 
    ab0e8405a16af3653a5610979ab2a7392ec4fbba \
    64afaaed2b88c38a29f9fb4198f22662323d3ceb \
    ff665b96f7d50272d0a8e0e321d530c8b09380a5 \
    4641d762bdfd45005ffe23a21bace16fb2e5c50a \
    f7b20d08ea0fae0f43fce023421032eb9c839f27

All of the problematic .tar.gzs are < 2000, but all .tar.gzs below 2000 aren't problematic.

@Byrth
Copy link
Author

Byrth commented May 13, 2021

Maybe try to reach out to Paul Eggert?

I contacted Paul about this and he replied. It looks like the bad checksums are due to a faulty SunOS tar utility and he suggests that we could perhaps allow them anyway.

I think that patching and rehosting them or disabling checksum verification (as it is redundant to an extent with SHA varification) are still probably the best options.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jun 2, 2021

I looked into this more and it turns out that the checksum isn't invalid. Rather, these old tarballs use a strange format when emitting numbers: they put leading spaces before the digits of the integer value. This is contrary to the spec, which states:

All other fields are leading zero-filled octal numbers using digits from the ISO/IEC 646:1991 standard IRV. Each numeric field is terminated by one or more <space> or NUL characters.

However, this seems harmless to support parsing, so I'm preparing a PR to do that. The reason I had been misdiagnosing this as the checksum field being blank (all spaces) is a combination of things:

  • we parse a checksum with leading spaces as the value zero
  • in order to validate the checksum of the header, the field is filled with spaces
  • we restore the original checksum data back if the check passes (so Tar.list sees the right value)
  • but we were checking the value and throwing an error before restoring the checksum data

Quite a conspiracy to make this seem like a different problem than it actually was: the parsing of a field with leading spaces as zero was causing the checksum value to appear to be zero and dumping the header before restoring the checksum data made the contents appear to be all spaces, which matches the claimed value.

StefanKarpinski added a commit that referenced this issue Jun 2, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support.
StefanKarpinski added a commit that referenced this issue Jun 2, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support.
StefanKarpinski added a commit that referenced this issue Jun 3, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support. This adds a fragment of this tarball:

    https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz

This is a collection of example sparse matrix files. The fragment is a
single header entry followed by one block of data and is a valid tarball
by itself with the idiosyncratic field formatting of Solaris `tar`.
StefanKarpinski added a commit that referenced this issue Jun 4, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support. This adds a fragment of this tarball:

    https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz

This is a collection of example sparse matrix files. The fragment is a
single header entry followed by one block of data and is a valid tarball
by itself with the idiosyncratic field formatting of Solaris `tar`.
StefanKarpinski added a commit that referenced this issue Jun 4, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support. This adds a fragment of this tarball:

    https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz

This is a collection of example sparse matrix files. The fragment is a
single header entry followed by one block of data and is a valid tarball
by itself with the idiosyncratic field formatting of Solaris `tar`.

(cherry picked from commit ed0c4d2)
StefanKarpinski added a commit that referenced this issue Jun 4, 2021
Some old versions of `tar` (e.g. Solaris) emitted integer fields with
leadings *spaces* which is technically contrary to the standard, but
seems harmless to support. This adds a fragment of this tarball:

    https://sparse.tamu.edu/MM/Oberwolfach/LF10.tar.gz

This is a collection of example sparse matrix files. The fragment is a
single header entry followed by one block of data and is a valid tarball
by itself with the idiosyncratic field formatting of Solaris `tar`.

(cherry picked from commit ed0c4d2)
@kmsquire
Copy link

Hi @StefanKarpinski, can you make a release so this fix is generally available without using master?

@kmsquire
Copy link

kmsquire commented Sep 13, 2021

Actually, I just tried master with https://data.iana.org/time-zones/releases/tzdata1997b.tar.gz, and it still fails with

(build) pkg> status
      Status `~/Workspace/ai/FifaTrialSmoothing/build/Project.toml`
  [9b87118b] PackageCompiler v1.4.0 `https://github.com/kmsquire/PackageCompiler.jl.git#kms/fix-include-lazy-artifacts`
  [a4e569a6] Tar `https://github.com/JuliaIO/Tar.jl.git#master`

julia> using Tar

julia> Tar.list("../tzdata1997b.tar")
ERROR: incorrect header checksum = 0; should be 2782
"africa\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0   444 \0 21140 \0     0 \0      53756  6305412777         \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] read_standard_header(io::IOStream; buf::Vector{UInt8}, tee::Base.DevNull)
    @ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:496
  [3] iterate_headers(callback::Tar.var"#71#72"{Vector{Tar.Header}}, tar::IOStream; raw::Bool, strict::Bool, buf::Vector{UInt8})
    @ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/extract.jl:17
  [4] #68
    @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:131 [inlined]
  [5] open(f::Tar.var"#68#69"{Bool, Bool, Tar.var"#71#72"{Vector{Tar.Header}}}, args::String; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:330
  [6] open
    @ ./io.jl:328 [inlined]
  [7] arg_read
    @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/ArgTools/src/ArgTools.jl:60 [inlined]
  [8] #list#67
    @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:130 [inlined]
  [9] #list#70
    @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:141 [inlined]
 [10] list(tarball::String)
    @ Tar /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/Tar/src/Tar.jl:140
 [11] top-level scope
    @ REPL[2]:1

@kmsquire kmsquire reopened this Sep 13, 2021
@kmsquire
Copy link

kmsquire commented Sep 14, 2021

Okay, I guess this might be because Tar is part of the standard library, and I was actually using that version above.

Edit: And because I was on Julia 1.6.1

@StefanKarpinski
Copy link
Member

Yeah, this is a major annoyance of stdlibs—you cannot freely load newer versions of them. Does it work for you on Julia master? And hopefully also on the Julia 1.7 betas?

@kmsquire
Copy link

It works fine on 1.6.2. I haven't had the chance to test further.

@StefanKarpinski
Copy link
Member

Ok, that seems good enough to verify that it's solved. Thanks for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants