Add support for .a archives and (by extension) ELF (shared) objects #1

Closed
PathogenDavid opened this issue Oct 28, 2021 · 3 comments

@PathogenDavid
Owner

Currently Biohazrd's LinkImportsTransformation uses LibObjectFile when parsing .so shared objects on Linux and does not support parsing .a library archives. (If you attempt to load a .a library archive it actually combusts violently since their header magic is the same as Windows .lib files but the format is slightly different.)

.a and .lib files are both !<arch>-magic archive files. This format isn't actually standardized and barely even has a name. (The Microsoft documentation literally just calls them archive files, which is decidedly the most infuriatingly impossible thing to search ever.)
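To make the shared container concrete, here is a minimal sketch of walking the members of an `!<arch>` archive. It assumes the common 60-byte member header (16-byte name, 12-byte date, 6-byte uid, 6-byte gid, 8-byte mode, 10-byte decimal size, 2-byte terminator) and two-byte alignment of members. Python is used only for illustration, and `iter_members` is a hypothetical name, not part of Kaisa or LibObjectFile.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name[16] date[12] uid[6] gid[6] mode[8] size[10] end[2]


def iter_members(data: bytes):
    """Yield (raw_name, payload) for each member of an ar-style archive.

    The raw name is returned untouched (minus space padding) because its
    interpretation is exactly where the GNU and Microsoft variants differ.
    """
    if data[:8] != AR_MAGIC:
        raise ValueError("not an !<arch> archive")
    pos = 8
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header terminator at offset {pos}")
        raw_name = header[0:16].rstrip(b" ")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + HEADER_SIZE
        yield raw_name, data[start:start + size]
        # Members start on even offsets; odd-sized payloads get one pad byte.
        pos = start + size + (size & 1)
```

Nothing in this loop can tell a Windows-style archive from a GNU-style one, since both pass the magic check and share the header layout.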

I've been unable thus far to find a good documentation source on the format of .a files, but Wikipedia has a nicely detailed section (Archived PDF) on it that seems to be correct. (I suspect the closest thing to official "reference" documentation is the source code for ar.) As the Wikipedia article notes, the format was never properly standardized and has many variants. One major downside of this lack of standardization is there's no great way to differentiate the different types. (Which kinda makes sense, it's basically a no compression archive format that has a bunch of non-standard extensions.)

As such we can't easily have Biohazrd peek at the .a/.lib file and try to decide whether it's a Windows-style archive or a GNU-style archive. If we could, in theory we could've deferred to LibObjectFile for the GNU-style ones and keep using Kaisa for the Windows-style ones. We could also just go off of the file extension, but I'm not crazy about doing this.

However, even if we did differentiate (since it is possible -- see differences below) LibObjectFile doesn't seem to properly support (modern? LLVM-generated?) .a files. I briefly tested it with libPhysXCharacterKinematic_static_64.a and it combusts violently regardless of which ArArchiveKind I specify. However however, parsing the actual archive isn't the problematic part, it's the object files within. I also briefly tested modifying Kaisa to parse the .a files and defer to LibObjectFile for parsing the ELF objects within and that combusted violently too. (Some object files parsed fine, but others complained about sections overlapping or invalid section info.)

So this leaves us with a decision: Fix LibObjectFile or parse the ELF files ourselves. While I'd love to upstream fixes to LibObjectFile, I'm actually inclined to have us parse the ELF files ourselves:

  • LibObjectFile is actually a little too flexible for our use-case and ends up putting a lot of unneeded ELF-specific logic down into LinkImportsTransformation and others who consume its output.
  • Because we need the ELF-specific logic, I've actually been meaning to read the ELF specification closer to understand if we're interpreting symbol tables as intended.
  • By the time I understand the ELF specification well enough to fix LibObjectFile I probably could've written a good-enough parser for Kaisa.

As such I think I'm going to make an ELF parser in Kaisa. We can use this for both GNU-style archives and for parsing shared object files. I do plan to look at LibObjectFile as I go through the spec and see if the issue immediately jumps out at me. (Based on the error messages though, I think it's some failure to follow the letter of the spec so it's probably very subtle.)

Differences between .a and .lib

Luckily it seems the GNU variant and the Microsoft variant are very closely related. These are the three main differences I've identified:

  • The longnames file (//) is delimited by \n instead of \0
  • Longnames have a / suffix just like shortnames do
  • The actual object files are in ELF instead of COFF
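The first two differences can be sketched as a single lookup into the longnames member (`//`). This is an illustrative Python sketch under the assumptions stated in the list above: entries are `\0`-terminated in the Microsoft variant, while in the GNU variant they are `\n`-terminated and carry a trailing `/`. The function name and the `gnu_style` flag are hypothetical.

```python
def resolve_long_name(table: bytes, offset: int, gnu_style: bool) -> str:
    """Look up a member name in the longnames (//) table.

    `offset` is the number that follows the '/' in the member header's
    name field (for example, a header name of '/20' means offset 20).
    """
    if gnu_style:
        # GNU variant: entries are newline-delimited and end with '/'.
        end = table.index(b"\n", offset)
        name = table[offset:end]
        if name.endswith(b"/"):
            name = name[:-1]
    else:
        # Microsoft variant: entries are null-terminated, no suffix.
        end = table.index(b"\0", offset)
        name = table[offset:end]
    return name.decode("utf-8")
```

A parser that does not know the variant up front could plausibly try the `\0` search first and fall back to `\n`, but that heuristic is an assumption and not something confirmed here.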

The first two are pretty easy to resolve. The last one slightly less so because COFF files aren't really identifiable. Luckily ELF files are since they have the header magic 0x7F, 'E', 'L', 'F'. There's already a precedent for parsing the first 32 bits of the file to determine its type thanks to import archive members, so I think it's pretty reasonable to put a check for ELF files here too. (In pedantic land this means we don't support a COFF member for the 0x457F machine type (processor architecture) with exactly 0x464C sections, but that's probably fine since that's absurd levels of pedantry.) (In fact I think I might add logic to skip parsing a COFF member if the machine type is invalid to avoid crashing when we interpret something that isn't a COFF member as a COFF member.)
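A rough sketch of that 32-bit check, in Python for illustration only. It assumes the import member signature from the PE/COFF format (a machine field of 0x0000 followed by 0xFFFF); how Kaisa actually structures this check is not shown in this thread, and `classify_member` is a hypothetical name.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def classify_member(payload: bytes) -> str:
    """Guess a member's kind from its first 32 bits."""
    if len(payload) < 4:
        return "unknown"
    if payload[:4] == ELF_MAGIC:
        return "elf"
    # Read the same four bytes the way a COFF header would:
    # Machine (uint16), then NumberOfSections (uint16), little-endian.
    first, second = struct.unpack_from("<HH", payload, 0)
    if first == 0x0000 and second == 0xFFFF:
        return "import"
    return "coff"


# The pedantic collision: the ELF magic read as COFF header fields.
machine, section_count = struct.unpack("<HH", ELF_MAGIC)
print(hex(machine), hex(section_count))  # 0x457f 0x464c
```

Because COFF has no magic of its own, anything that is neither ELF nor an import member falls through to `"coff"`, which is why validating the machine type before parsing is a sensible extra guard.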

After that the only issue is parsing the ELF files...

ELF file spec

I found what is probably the most canonical ELF file spec on the Linux Foundation's reference specifications page. There are a few different specifications linked with no clear winner. The "best" ones appear to be the TIS 1.2 spec from 1995 (Archive) (only contains ELF), the 1997 System V ABI (Archive), and the draft spec from 2001.

There's also the AMD64-specific extensions to ELF. These aren't critical but are worth keeping in mind. The Linux refspecs link up to v0.99 (Archive) and I also have in my personal documentation folder a version 1.0 PDF (which is derived from https://gitlab.com/x86-psABIs/x86-64-ABI which seems to be the canonical spec.)

I'll probably end up basing things on the 2001 draft ELF spec (since presumably the fact that it's linked from this page means it's the version the kernel developers use) along with the 1.0 PDF of the AMD64 extensions. The draft specs are also presented as HTML which makes them much easier to link to in comments.
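For orientation, here is a small sketch of the first fields any of these specs describe: the 16-byte `e_ident` block followed by `e_type` and `e_machine`. It is illustrative Python, not the parser planned for Kaisa, and `read_elf_ident` is a hypothetical name. The constants come from the generic ELF spec and the AMD64 supplement (`ELFCLASS64` = 2, `ELFDATA2LSB` = 1, `ET_REL` = 1, `EM_X86_64` = 62).

```python
import struct


def read_elf_ident(data: bytes) -> dict:
    """Decode the class, byte order, type, and machine of an ELF file."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])       # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])   # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")
    # e_type and e_machine directly follow the 16-byte e_ident block
    # and have the same size in both ELF32 and ELF64.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": bits,
        "little_endian": endian == "<",
        "e_type": e_type,
        "e_machine": e_machine,
    }
```

For a typical x86-64 Linux `.o` file this should report 64 bits, little-endian, `e_type` 1 (relocatable), and `e_machine` 62.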

@PathogenDavid
Owner Author

While researching Mach-O, I stumbled upon ELFSharp, which could be an alternative for parsing ELF files. However I think this runs into the same issues I mentioned earlier of it doing way more than we need and me wanting to understand ELF better for the sake of properly implementing LinkImportsTransformation. It's MIT-licensed though, so if ELF becomes a pain it might be worth looking into.

@PathogenDavid
Owner Author

PathogenDavid commented Oct 28, 2021

Found the spec for the archive file format! It's defined on page 152 of https://refspecs.linuxfoundation.org/elf/gabi41.pdf (Chapter 7: Formats and Protocols, section 2: Archive File.)

@PathogenDavid
Owner Author

Added partial support for both Linux-style archives and ELF files. Still need to update the readme and publish a new NuGet package. Might also wait until I finish #3 too.

PathogenDavid added a commit to MochiLibraries/Biohazrd that referenced this issue Oct 31, 2021
…onality. Made ELF handling in LinkImportsTransformation more robust.

The primary motivation behind this change was to enable reading Linux `.a` static libraries in LinkImportsTransformation. LibObjectFile has some issues parsing the PhysX static libraries and doesn't support automatically identifying the type of library archive being passed to it. (Windows `.lib` files and Linux `.a` files are the same-but-different never-standardized format and differentiating them is weird.) I wrote additional details about the motivations behind this change in PathogenDavid/Kaisa#1

This change also includes tests for trying to import symbols from Linux `.a` static libraries as well as `.o` object files. (The latter is not an intended feature, but I did it by accident when writing the `.a` tests and realized it didn't work as intended.)

As a side-effect of this change, I made LinkImportsTransformation less picky about certain things. In particular it no longer looks at the names of sections but instead uses the section flags to identify the `ImportType` of the symbol.

I also added a LinkImportsTransformation.ContainsSymbol method so advanced consumers can use LinkImportsTransformation as an easy way to query information about libraries without interfacing with Kaisa themselves. (Eventually I want something like this more polished in Kaisa proper -- PathogenDavid/Kaisa#3)
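
As a rough illustration of classifying by section flags rather than section names, the sketch below uses the standard ELF `sh_flags` bits (`SHF_WRITE` = 0x1, `SHF_ALLOC` = 0x2, `SHF_EXECINSTR` = 0x4). The function and the returned strings are hypothetical stand-ins; the actual `ImportType` values and the exact rules Biohazrd applies are not shown in this thread.

```python
SHF_WRITE = 0x1      # section is writable at run time
SHF_ALLOC = 0x2      # section occupies memory at run time
SHF_EXECINSTR = 0x4  # section contains executable instructions


def import_kind_from_section_flags(sh_flags: int) -> str:
    """Classify a symbol by the flags of the section that defines it."""
    if sh_flags & SHF_EXECINSTR:
        return "function"
    if sh_flags & SHF_ALLOC:
        return "data"
    return "unknown"
```

This avoids depending on names like `.text` or `.data`, which toolchains are free to vary (for example `.text.unlikely` or per-function sections).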