Pre-POSIX.1-1988 (i.e. v7) tar header #76

Closed
Xflofoxx opened this issue Feb 2, 2017 · 14 comments · Fixed by #204
Labels
enhancement Add new functionality 🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt help wanted

Comments

@Xflofoxx

Xflofoxx commented Feb 2, 2017


Hi, is it possible to add the headers for Pre-POSIX.1-1988 (i.e. v7) tar recognition?

Reference here: https://en.wikipedia.org/wiki/Tar_(computing)#cite_note-2

Thanks in advance!

stroncium earned $40.00 by resolving this issue!

@Xflofoxx Xflofoxx changed the title Pre-POSIX.1-1988 (i.e. v7) tar header: Pre-POSIX.1-1988 (i.e. v7) tar header Feb 2, 2017
@sindresorhus sindresorhus added enhancement Add new functionality help wanted labels Feb 2, 2017
@mifi
Contributor

mifi commented Feb 2, 2017

Should be simple. Can libmagic (file cmd) recognize this?

@Xflofoxx
Author

Xflofoxx commented Feb 2, 2017

Yes, here is the output:

xflofoxx@mypc:~% file compressed.tar.bz2
compressed.tar.bz2: bzip2 compressed data, block size = 900k

@mifi
Contributor

mifi commented Feb 2, 2017

BZ2 is another format with its own magic header, right?

@Xflofoxx
Author

Xflofoxx commented Feb 2, 2017

Yes, sorry, let me explain better.
I'm using the library https://github.com/kevva/decompress from @kevva. It correctly decompresses the bzip2 file, but the file type of the decompressed data (which should be tar) is then not identified correctly.

My first idea was to ask the file-type library. If correct identification is not possible, I will ask the decompress library to introduce an option such as "bypass tar check".

So the bz2 format is correctly identified, but the nested tar content is not.

@mifi
Contributor

mifi commented Feb 2, 2017

OK, then please try running "file" on the nested tar content instead of the bz2.

@Xflofoxx
Author

Xflofoxx commented Feb 2, 2017

xflofoxx@mypc:~% bunzip2 compressed.tar.bz2

xflofoxx@mypc:~% file compressed.tar
compressed.tar: tar archive

@mifi
Contributor

mifi commented Feb 2, 2017

As far as I can see, file-type should already recognize tar. Are you sure that it does not?

@Xflofoxx
Author

Xflofoxx commented Feb 2, 2017

At line 102 there is this check:
if (buf[257] === 0x75 && buf[258] === 0x73 && buf[259] === 0x74 && buf[260] === 0x61 && buf[261] === 0x72) {
    return {
        ext: 'tar',
        mime: 'application/x-tar'
    };
}
According to Wikipedia, this correctly recognizes the UStar format: it checks bytes 257-261 for the UStar indicator "ustar".

If I create a new tar, I get the right format. The one I am trying to untar is a file taken from a fire detection system. The strange thing is that I can untar it just like the one I created, but the checked bytes have the following values:
buf[257] = 0
buf[258] = 0
buf[259] = 0
buf[260] = 0
buf[261] = 0
This is why I think the file was created with an old version of tar, and I assume it is a "Pre-POSIX.1-1988 (i.e. v7) tar".
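To illustrate why the existing check misses such a file, here is a minimal sketch using synthetic 512-byte header blocks rather than the real archive. The helper name `hasUstarMagic` is hypothetical and is not part of file-type; it only mirrors the byte comparison quoted above.

```javascript
// Hypothetical helper mirroring the quoted check: it looks for the
// ASCII string "ustar" at byte offsets 257-261 of the first block.
function hasUstarMagic(buf) {
  return buf.length >= 262 && buf.toString('latin1', 257, 262) === 'ustar';
}

// Synthetic header blocks (assumption: not taken from a real archive).
const ustarHeader = Buffer.alloc(512);
ustarHeader.write('ustar', 257, 'latin1');

// Bytes 257-261 stay NUL here, as in the values reported above.
const v7Header = Buffer.alloc(512);

console.log(hasUstarMagic(ustarHeader)); // true
console.log(hasUstarMagic(v7Header)); // false
```

A header whose magic field is all NUL bytes therefore falls through this check, regardless of whether the rest of the block is a valid tar header.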

The fields defined by the original Unix tar format are listed in the table below. The link indicator/file type table includes some modern extensions. When a field is unused it is filled with NUL bytes. The header uses 257 bytes, then is padded with NUL bytes to make it fill a 512 byte record. There is no "magic number" in the header, for file identification. - Wikipedia
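As a rough sketch of the layout described in that quote, the original header fields can be read at fixed offsets that add up to 257 bytes. The names `V7_FIELDS`, `readText`, `readOctal`, and `parseV7Header` are illustrative assumptions, not functions from file-type or decompress, and the sample header is synthetic.

```javascript
// Field layout of the original (v7) tar header as [offset, length] in bytes.
// The lengths sum to 257; the rest of the 512-byte record is NUL padding.
const V7_FIELDS = {
  name: [0, 100],
  mode: [100, 8],
  uid: [108, 8],
  gid: [116, 8],
  size: [124, 12],
  mtime: [136, 12],
  chksum: [148, 8],
  linkflag: [156, 1],
  linkname: [157, 100],
};

// Read a NUL-terminated text field.
function readText(buf, [start, length]) {
  const raw = buf.toString('latin1', start, start + length);
  const nul = raw.indexOf('\0');
  return nul === -1 ? raw : raw.slice(0, nul);
}

// Numeric fields are stored as ASCII octal digits; an empty field reads as 0.
function readOctal(buf, field) {
  const text = readText(buf, field).trim();
  return text === '' ? 0 : parseInt(text, 8);
}

function parseV7Header(buf) {
  return {
    name: readText(buf, V7_FIELDS.name),
    mode: readOctal(buf, V7_FIELDS.mode),
    size: readOctal(buf, V7_FIELDS.size),
    mtime: readOctal(buf, V7_FIELDS.mtime),
    chksum: readOctal(buf, V7_FIELDS.chksum),
  };
}

// Synthetic example header (assumption: hand-built, not a real archive).
const header = Buffer.alloc(512);
header.write('hello.txt', 0, 'latin1');
header.write('0000644\0', 100, 'latin1');
header.write('00000000005\0', 124, 'latin1');

const parsed = parseV7Header(header);
console.log(parsed.name, parsed.mode.toString(8), parsed.size); // hello.txt 644 5
```

Because none of these fields is a fixed signature, a detector has to rely on the plausibility of the values (for example, the checksum) rather than on a magic number.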

Do you think it's still possible to identify the file?

@Xflofoxx
Author

Xflofoxx commented Feb 2, 2017

For completeness: if I "manually" save the buffer from the library into a tar file, the file command reports "data" and not "tar archive".

@mifi
Contributor

mifi commented Feb 3, 2017

From https://github.com/threatstack/libmagic/blob/master/magic/Magdir/archive:
pre-POSIX "tar" archives are handled in the C code.

I found some code here:
https://github.com/threatstack/libmagic/blob/master/src/is_tar.c

However, I don't have time to analyze how it works right now. Feel free to try :)

@Xflofoxx
Author

Xflofoxx commented Feb 3, 2017

Wow... thanks a lot!

@noisyui

noisyui commented Jan 21, 2019

Actually, the code can be borrowed from node-tar: if there are no magic bytes at offset 257 in the header, decode the header of the tar buffer and check the value of cksumValid.
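The checksum idea can be sketched without any dependency. This is not node-tar's code and not necessarily what file-type ended up shipping; it assumes the standard tar rule that the stored checksum is the sum of all 512 header bytes with the checksum field (offsets 148-155) counted as eight ASCII spaces. The names `tarChecksumMatches` and `makeV7Header` are hypothetical.

```javascript
// Return true if the first 512 bytes carry a self-consistent tar checksum.
function tarChecksumMatches(buf) {
  if (buf.length < 512) return false;

  // The checksum is stored as ASCII octal digits, usually NUL-terminated.
  const text = buf.toString('latin1', 148, 156).split('\0')[0].trim();
  const stored = parseInt(text, 8);
  if (Number.isNaN(stored)) return false; // e.g. an all-NUL block

  let sum = 8 * 0x20; // the checksum field itself counts as eight spaces
  for (let i = 0; i < 148; i++) sum += buf[i];
  for (let i = 156; i < 512; i++) sum += buf[i];
  return sum === stored;
}

// Build a synthetic v7-style header (no "ustar" magic) with a valid checksum.
function makeV7Header(name) {
  const h = Buffer.alloc(512);
  h.write(name, 0, 'latin1');
  h.write('0000644\0', 100, 'latin1');
  h.fill(0x20, 148, 156); // spaces while the sum is computed
  let sum = 0;
  for (const byte of h) sum += byte;
  h.write(sum.toString(8).padStart(6, '0') + '\0 ', 148, 'latin1');
  return h;
}

const valid = makeV7Header('hello.txt');
const corrupted = Buffer.from(valid);
corrupted[0] ^= 0xff; // flip bits in the name so the sum no longer matches

console.log(tarChecksumMatches(valid)); // true
console.log(tarChecksumMatches(corrupted)); // false
console.log(tarChecksumMatches(Buffer.alloc(512))); // false
```

Running this check only when the "ustar" magic is absent keeps the fast path unchanged and uses the checksum purely as a fallback for older archives.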

@IssueHuntBot

@IssueHunt has funded $40.00 to this issue.


stroncium added a commit to stroncium/file-type that referenced this issue Apr 14, 2019
sindresorhus pushed a commit that referenced this issue Apr 19, 2019
@IssueHuntBot

@sindresorhus has rewarded $36.00 to @stroncium. See it on IssueHunt

  • 💰 Total deposit: $40.00
  • 🎉 Repository reward(0%): $0.00
  • 🔧 Service fee(10%): $4.00

@issuehunt-oss issuehunt-oss bot added the 🎁 Rewarded on Issuehunt This issue has been rewarded on Issuehunt label May 10, 2019

5 participants