Add support for `.a` archives and (by extension) ELF (shared) objects
#1
Comments
While researching Mach-O, I stumbled upon ELFSharp, which could be an alternative for parsing ELF files. However, I think this runs into the same issues I mentioned earlier of it doing way more than we need and me wanting to understand ELF better for the sake of properly implementing …
Found the spec for the archive file format! It's defined on page 152 of https://refspecs.linuxfoundation.org/elf/gabi41.pdf (Chapter 7: Formats and Protocols, section 2: Archive File.)
Added partial support for both Linux-style archives and ELF files. Still need to update the readme and publish a new NuGet package. Might also wait until I finish #3 too.
…onality. Made ELF handling in LinkImportsTransformation more robust.

The primary motivation behind this change was to enable reading Linux `.a` static libraries in LinkImportsTransformation. LibObjectFile has some issues parsing the PhysX static libraries and doesn't support automatically identifying the type of library archive being passed to it. (Windows `.lib` files and Linux `.a` files are the same-but-different never-standardized format and differentiating them is weird.) I wrote additional details about the motivations behind this change in PathogenDavid/Kaisa#1.

This change also includes tests for trying to import symbols from Linux `.a` static libraries as well as `.o` object files. (The latter is not an intended feature, but I did it on accident when writing the `.a` tests and realized it didn't work as intended.)

As a side-effect of this change, I made LinkImportsTransformation less picky about certain things. In particular it no longer looks at the names of sections but instead uses the section flags to identify the `ImportType` of the symbol.

I also added a LinkImportsTransformation.ContainsSymbol method so advanced consumers can use LinkImportsTransformation as an easy way to query information about libraries without interfacing with Kaisa themselves. (Eventually I want something like this more polished in Kaisa proper -- PathogenDavid/Kaisa#3)
Currently Biohazrd's `LinkImportsTransformation` uses LibObjectFile when parsing `.so` shared objects on Linux and does not support parsing `.a` library archives. (If you attempt to load a `.a` library archive it actually combusts violently since their header magic is the same as Windows `.lib` files but the format is slightly different.)

`.a` and `.lib` files are both `!<arch>`-magic archive files. This format isn't actually standardized and barely even has a name. (The Microsoft documentation literally just calls them archive files, which is decidedly the most infuriatingly impossible thing to search ever.)

I've been unable thus far to find a good documentation source on the format of `.a` files, but Wikipedia has a nicely detailed section (Archived PDF) on it that seems to be correct. (I suspect the closest thing to official "reference" documentation is the source code for `ar`.) As the Wikipedia article notes, the format was never properly standardized and has many variants. One major downside of this lack of standardization is there's no great way to differentiate the different types. (Which kinda makes sense, it's basically a no-compression archive format that has a bunch of non-standard extensions.)

As such we can't easily have Biohazrd peek at the `.a`/`.lib` file and try to decide whether it's a Windows-style archive or a GNU-style archive. If we could, in theory we could've deferred to LibObjectFile for the GNU-style ones and kept using Kaisa for the Windows-style ones. We could also just go off of the file extension, but I'm not crazy about doing this.

However, even if we did differentiate (since it is possible -- see differences below) LibObjectFile doesn't seem to properly support (modern? LLVM-generated?) `.a` files. I briefly tested it with `libPhysXCharacterKinematic_static_64.a` and it combusts violently regardless of which `ArArchiveKind` I specify. However however, parsing the actual archive isn't the problematic part, it's the object files within. I also briefly tested modifying Kaisa to parse the `.a` files and defer to LibObjectFile for parsing the ELF objects within and that combusted violently too. (Some object files parsed fine, but others complained about sections overlapping or invalid section info.)

So this leaves us with a decision: Fix LibObjectFile or parse the ELF files ourselves. While I'd love to upstream fixes to LibObjectFile, I'm actually inclined to have us parse the ELF files ourselves:
- … `LinkImportsTransformation` and others who consume its output.

As such I think I'm going to make an ELF parser in Kaisa. We can use this for both GNU-style archives and for parsing shared object files. I do plan to look at LibObjectFile as I go through the spec and see if the issue immediately jumps out at me. (Based on the error messages though, I think it's some failure to follow the letter of the spec so it's probably very subtle.)
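For illustration, here is a minimal Python sketch (not Kaisa's actual C# code) of walking the members of an `!<arch>`-magic archive. It assumes the common 60-byte member header layout (16-byte name, 12-byte date, 6-byte uid, 6-byte gid, 8-byte mode, 10-byte decimal size, 2-byte end marker) and 2-byte member alignment, as described in the gABI archive chapter. The names `iter_members` and `make_member` are hypothetical, and `make_member` exists only to build a tiny in-memory archive to try it on.

```python
ARCHIVE_MAGIC = b"!<arch>\n"   # 8-byte global header shared by .a and .lib
HEADER_SIZE = 60               # fixed-size, ASCII, space-padded member header
HEADER_END = b"`\n"            # last two bytes of every member header


def iter_members(data: bytes):
    """Yield (raw_name, body) for each archive member.

    Names are returned exactly as stored (e.g. "foo.o/", "/", "//", "/123");
    interpreting them is variant-specific and deliberately left out here.
    """
    if data[:8] != ARCHIVE_MAGIC:
        raise ValueError("not an !<arch> archive")
    pos = len(ARCHIVE_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != HEADER_END:
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii").strip())
        body_start = pos + HEADER_SIZE
        yield name, data[body_start:body_start + size]
        pos = body_start + size
        if pos % 2:            # members start on even offsets
            pos += 1


def make_member(name: bytes, body: bytes) -> bytes:
    """Build one member (header + body + padding) for demonstration only."""
    header = b"%-16s%-12s%-6s%-6s%-8s%-10d" % (
        name, b"0", b"0", b"0", b"644", len(body))
    padding = b"\n" if len(body) % 2 else b""
    return header + HEADER_END + body + padding


if __name__ == "__main__":
    demo = ARCHIVE_MAGIC + make_member(b"a.o/", b"\x7fELF!")
    for member_name, member_body in iter_members(demo):
        print(member_name, len(member_body))
```

Note that nothing in this loop tells the GNU and Windows variants apart, which is the point made above: the outer container parses identically, and the differences only show up in the special members and the object files inside.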
Differences between `.a` and `.lib`
Luckily it seems the GNU variant and the Microsoft variants are very closely related. These are the two main differences I'd identified:
- … (`//`) is delimited by `\n` instead of `\0`
- … `/` suffix just like shortnames do
- …

The first two are pretty easy to resolve. The last one slightly less so because COFF files aren't really identifiable. Luckily ELF files are since they have the header magic `0x7F, 'E', 'L', 'F'`. There's already a precedent for parsing the first 32 bits of the file to determine its type thanks to import archive members so I think it's pretty reasonable to put a check for ELF files here too. (In pedantic land this means we don't support a COFF member for the `0x457F` machine type (processor architecture) with exactly `0x464C` sections, but that's probably fine since that's absurd levels of pedantry.) (In fact I think I might add logic to skip parsing a COFF member if the machine type is invalid to avoid crashing when we interpret something that isn't a COFF member as a COFF member.)

After that the only issue is parsing the ELF files...
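A rough Python sketch of the checks discussed above, written as an illustration rather than as Kaisa's implementation. All function names are hypothetical. It assumes that import members are recognized by the first 32 bits being `0x0000` followed by `0xFFFF` (the PE/COFF import header signature), and that a longname is looked up by byte offset into the `//` member.

```python
import struct

ELF_MAGIC = b"\x7fELF"
IMPORT_MAGIC = b"\x00\x00\xff\xff"  # assumed: machine "unknown", then 0xFFFF


def classify_member(data: bytes) -> str:
    """Guess a member's type from its first 32 bits."""
    head = data[:4]
    if head == IMPORT_MAGIC:
        return "import"
    if head == ELF_MAGIC:
        return "elf"
    return "coff"  # COFF has no magic, so it is the fallback


def as_coff_prefix(data: bytes):
    """Read the first 4 bytes the way a COFF header would be read:
    a little-endian 16-bit machine type, then a 16-bit section count."""
    return struct.unpack_from("<HH", data, 0)


def longname_at(table: bytes, offset: int, gnu_style: bool) -> str:
    """Extract one name from the longnames (`//`) member.

    GNU-style entries end with "/" and are delimited by a newline;
    Windows-style entries are delimited by a NUL byte.
    """
    terminator = b"\n" if gnu_style else b"\0"
    end = table.find(terminator, offset)
    if end == -1:
        end = len(table)
    name = table[offset:end]
    if gnu_style and name.endswith(b"/"):
        name = name[:-1]
    return name.decode("ascii")


if __name__ == "__main__":
    machine, sections = as_coff_prefix(ELF_MAGIC)
    # The ELF magic read as COFF: the "absurd" machine type and section count.
    print(hex(machine), hex(sections))
```

Checking for the ELF magic before falling back to COFF is what makes the pedantic collision possible: a COFF member whose first four bytes happen to equal the ELF magic would be misclassified, which is the case the paragraph above chooses to ignore.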
ELF file spec
I found what is probably the most canonical ELF file spec on the Linux Foundation's reference specifications page. There are a few different specifications linked with no clear winner. The two "best" ones appear to be the TIS 1.2 spec from 1995 (Archive) (only contains ELF), the 1997 System-V ABI (Archive), or the draft spec from 2001.
There's also the AMD64-specific extensions to ELF. These aren't critical but are worth keeping in mind. The Linux refspecs link up to v0.99 (Archive) and I also have in my personal documentation folder a version 1.0 PDF (which is derived from https://gitlab.com/x86-psABIs/x86-64-ABI, which seems to be the canonical spec).
I'll probably end up basing things on the 2001 draft ELF spec (since presumably the fact that it's linked from this page means it's the version the kernel developers use) along with the 1.0 PDF of the AMD64 extensions. The draft specs are also presented as HTML which makes them much easier to link to in comments.
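As a small taste of what the spec describes, here is a hedged Python sketch (again, not Kaisa's C# code) that reads the identification bytes and the first two fields of the ELF header. The offsets and constants used (class at byte 4, data encoding at byte 5, version at byte 6, `e_type` and `e_machine` as 16-bit fields at offsets 16 and 18) come from the generic ELF header layout; the function name and the returned dictionary shape are made up for this example.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: 32, 2: 64}              # ELFCLASS32, ELFCLASS64
ELF_ENCODINGS = {1: "little", 2: "big"}   # ELFDATA2LSB, ELFDATA2MSB


def read_elf_ident(data: bytes) -> dict:
    """Decode e_ident plus e_type and e_machine from an ELF header."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version = data[4], data[5], data[6]
    if ei_class not in ELF_CLASSES or ei_data not in ELF_ENCODINGS:
        raise ValueError("unsupported ELF class or data encoding")
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine immediately follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "bits": ELF_CLASSES[ei_class],
        "endianness": ELF_ENCODINGS[ei_data],
        "ident_version": ei_version,
        "type": e_type,        # 1 = relocatable (.o), 3 = shared object (.so)
        "machine": e_machine,  # 62 = x86-64 in the AMD64 psABI
    }


if __name__ == "__main__":
    # A hand-built header prefix for a 64-bit little-endian x86-64 object file.
    sample = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 1, 62)
    print(read_elf_ident(sample))
```

Rejecting unknown class or encoding values up front follows the same idea as skipping COFF members with an invalid machine type: it is cheaper to refuse early than to misinterpret the rest of the file.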