Adds support for Apple Binary Plist, version 00 #427

Merged on Oct 4, 2022 (14 commits)

Conversation

dgmcdona
Contributor

This adds support for decoding Apple Binary Plists. The only well documented version is 00, and is therefore the only one supported here. I have tested this on both large and small binary plists, including ones with nested dictionaries.

Owner

@wader wader left a comment

Added some comments, but in general looks good i think.

Maybe you want to add a bplist.md file? See wasm.md etc. Also remember to embed it and run make doc && WRITE_ACTUAL=1 go test ./format. You might need to rebase to make it work, as i re-added markdown doc support some days ago.

Also, you could possibly add torepr support, see msgpack.jq etc. I can help you with that if you're not familiar with jq.

@wader
Owner

wader commented Sep 15, 2022

Some test file would be nice. It should be enough to put a bplist in testdata/test.bplist and a testdata/test.fqtest with $ fq dv test.bplist, then run the tests with WRITE_ACTUAL and inspect whether the output looks sane.

format/bplist/bplist.go
return d.FE(int(n), decode.BigEndian)
})
case elementTypeDate:
d.FieldStrFn("value", func(d *decode.D) string {
Owner

Is this a 64-bit floating point unix time? If it was an integer you could do d.FieldF64("value", scalar.DescriptionActualUUnixTime), but this seems to be a float, so maybe we should add some helper for that?

scalar.DescriptionActualUUnixTime sets description so you can still get the number in jq expressions

Contributor Author

@dgmcdona dgmcdona Sep 17, 2022

Added a helper function for decoding unix time and Cocoa date from float values

@wader
Owner

wader commented Sep 15, 2022

A torepr function could look something like this for the current tree structure:

def _bplist_torepr:
  def _f:
    ( .object
    | if .type == "singleton" then .value
      elif .type == "int" then .value
      elif .type == "real" then .value
      elif .type == "date" then .value
      elif .type == "data" then .value.data
      elif .type == "ascii_string" then .value.value
      elif .type == "unicode_string" then .value.value
      elif .type == "uid" then .value
      elif .type == "array" then
        ( .elements
        | map(_f)
        )
      elif .type == "set" then
        ( .elements
        | map(_f)
        )
      elif .type == "dict" then
        ( .dictionary.entries
        | map({key: (.key | _f), value: (.value | _f)})
        | from_entries
        )
      else  error("unknown type: \(.type)")
      end
    );
  ( .objects
  | _f
  );

With this diff:

--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -1,6 +1,7 @@
 package bplist

 import (
+       "embed"
        "math"
        "time"

@@ -10,6 +11,9 @@ import (
        "github.com/wader/fq/pkg/scalar"
 )

+//go:embed bplist.jq
+var bplistFS embed.FS
+
 func init() {
        interp.RegisterFormat(decode.Format{
                Name:        format.BPLIST,
@@ -17,7 +21,9 @@ func init() {
                Description: "Apple Binary Property List",
                Groups:      []string{format.PROBE},
                DecodeFn:    bplistDecode,
+               Functions:   []string{"torepr"},
        })
+       interp.RegisterFS(bplistFS)
 }

 const (

You can do:

➜  fq git:(bplist) ✗ go run . torepr /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/c55ebecca7037d50a22fe39315a802984688c69f.asset/AssetData/parser/config.plist
{
  "CFBundleDevelopmentRegion": "en",
  "CFBundleExecutable": "$(EXECUTABLE_NAME)",
  "CFBundleIdentifier": "com.apple.NLP",
  "CFBundleInfoDictionaryVersion": "6.0",
  "CFBundleName": "$(PRODUCT_NAME)",
  "CFBundlePackageType": "BNDL",
  "CFBundleShortVersionString": "1.0",
  "CFBundleSignature": "????",
  "CFBundleVersion": "1",
  "CanonicalRegions": {
    "de": {
...

And things like $ fq -r 'torepr.CanonicalRegions | toyaml' file.bplist etc

@dgmcdona
Contributor Author

Is there a way to have the torepr function display the scalar mapped value instead of the value itself? Would like to have dates represented as a timestamp instead of a floating point value, since it's not obvious how this float has to be converted.

@wader
Owner

wader commented Sep 18, 2022

The -h formats test seems to fail; it is in ./interp and should probably be moved to format, hmm.

Maybe add a torepr test?

Will have a last look when i'm at a computer, but it looks very good now.

@wader
Owner

wader commented Sep 18, 2022

Is there a way to have the torepr function display the scalar mapped value instead of the value itself? Would like to have dates represented as a timestamp instead of a floating point value, since it's not obvious how this float has to be converted.

torepr will use the scalar's default "value", which is the symbolic value if set, otherwise the actual value. But in your timestamp case a scalar mapper can also set a string description; that string is used when showing the hexdump tree thingy and can also be accessed with todescription (there is also toactual, tosym, and tovalue).

Yeah, that is a bit unfortunate with the weird timestamp epoch. Could it make sense to make the sym value a float unix timestamp, or is that even more confusing? In other decoders i've kept it as numbers as it seemed nicer for queries (doing comparisons etc). Maybe it would make sense to add more time functions? jq has strptime but i haven't used it much myself.

There is some half-finished work in fq to make tovalue have an option to prefer the actual value; maybe something like that could be done to prefer the description? Any ideas?

@wader
Owner

wader commented Sep 18, 2022

Another alternative is to add an option to the bplist decoder for how timestamps should be handled? Maybe -o timestamp=unix, -o timestamp=iso8601, -o timestamp=cocoa etc?

@wader
Owner

wader commented Sep 18, 2022

I have a feeling it will probably also fail on some help output tests that are in format/all.

Maybe all help output tests should be moved into individual formats? i will probably do that later on, so that a format will not affect tests outside its own testdata directory.

@wader
Owner

wader commented Sep 18, 2022

An interesting future feature would be to write a toplist in jq, there is toxml :)

| | | value{}: 0x21-0x21.7 (1)
0x20| 09 | . | type: "singleton" (0) (Singleton value (null/bool)) 0x21-0x21.3 (0.4)
| | | value: true 0x22-NA (0)
0x20| 09 | . | unknown0: raw bits 0x21.4-0x21.7 (0.4)
Owner

I think these are the unused 4 bits when size is < 0xf? Maybe add an "unused" field for them etc?

Contributor Author

I'm not seeing this in the output anymore, so I think it is fixed? Can't remember right now if I explicitly fixed this, let me know if not.

Owner

Maybe you have some local change but i see it for the Info.plist test file:

➜  fq git:(bplist) ✗ go run . . format/bplist/testdata/com.apple.UIAutomation.plist
    │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│.{}: format/bplist/testdata/com.apple.UIAutomation.plist (bplist)
0x00│62 70 6c 69 73 74 30 30                  │bplist00      │  header{}:
0x00│                        d1 01 02 5f 10 13│        ..._..│  objects{}:
0x0e│55 49 41 75 74 6f 6d 61 74 69 6f 6e 45 6e│UIAutomationEn│
0x1c│61 62 6c 65 64 09                        │abled.        │
0x1c│               09                        │     .        │  unknown0: raw bits
0x1c│                  08 0b 21               │      ..!     │  offset_table[0:3]:
0x1c│                           00 00 00 00 00│         .....│  trailer{}:
0x2a│00 01 01 00 00 00 00 00 00 00 03 00 00 00│..............│
0x38│00 00 00 00 00 00 00 00 00 00 00 00 22│  │............"││

@wader
Owner

wader commented Sep 22, 2022

Hey, i made some changes to help and help tests in #430. For you i think it's just to run make doc again and possibly add a help test, see some help_*.fqtest file.

@dgmcdona
Contributor Author

Thanks! I've been super busy but hopefully I can put the finishing touches on this this weekend.

@wader
Owner

wader commented Sep 23, 2022

No worries, great, yes i think it's just a few small things left. But i will take an extra look if there is anything so that you have all comments and suggestions ready for the weekend

@dgmcdona
Contributor Author

dgmcdona commented Sep 24, 2022

Is FieldRawLen the correct way to decode a blob of binary data, such as in the Data type for binary plists? I'm having trouble extracting the data using toactual in the case that I might want to write it out into another file (although I may just be missing something). I get an error like:

can't convert actual value jq value &bitio.SectionReader{r:(*bitio.SectionReader)(0xc0009a53b0), bitBase:768, bitOff:768, bitLimit:128128}

Also, it seems weird that tovalue produces a base64 encoding, but it is truncated for longer values:

./fq 'torepr.SandboxProfileData | tovalue' /Users/davidmcdonald/Library/Containers/com.apple.LoginUserService/Container.plist

"<15920>AACgAJIAAAAFAAAALgASAV8DLAMWBJ8AnQCcAJ8AnwCfAJ8AngCfAJ4AnwCbAJ4AlACDAH4AfQCbAHwAfABhAF4AngBcAJsAmwCeAHwAUgBSAEwASQBSAFIAUgBSAJ8AUgBDAEAAnwCfAJ8AnwCeAJ8AnwCCAJ8AnwCfADsANQCeAJ8AnwCfAJ8AnwCfAJ8AnwCfADQAMAA0AC8ALAArAJ8AnwCfAJ8AnwCfAJ4AnwCeAJ8AnwCfAJ8AHwCfAJ8AnwAcAJ4AGwAbABoAFQCfAJ8AFACfAJ8AnwATABMAEwAOAA4AngCfAJ4AEwCfABMAngATABMAEwCeAAwAngALAA=="

In this case, only 0x100 of the 15920 bytes are encoded as base64, the rest is missing

@wader
Owner

wader commented Sep 24, 2022

Is FieldRawLen the correct way to decode a blob of binary data, such as in the Data type for binary plists? I'm having trouble extracting the data using toactual in the case that I might want to write it out into another file (although I may just be missing something). I get an error like:

can't convert actual value jq value &bitio.SectionReader{r:(*bitio.SectionReader)(0xc0009a53b0), bitBase:768, bitOff:768, bitLimit:128128}

Also, it seems weird that tovalue produces a base64 encoding, but it is truncated for longer values:

./fq 'torepr.SandboxProfileData | tovalue' /Users/davidmcdonald/Library/Containers/com.apple.LoginUserService/Container.plist

"<15920>AACgAJIAAAAFAAAALgASAV8DLAMWBJ8AnQCcAJ8AnwCfAJ8AngCfAJ4AnwCbAJ4AlACDAH4AfQCbAHwAfABhAF4AngBcAJsAmwCeAHwAUgBSAEwASQBSAFIAUgBSAJ8AUgBDAEAAnwCfAJ8AnwCeAJ8AnwCCAJ8AnwCfADsANQCeAJ8AnwCfAJ8AnwCfAJ8AnwCfADQAMAA0AC8ALAArAJ8AnwCfAJ8AnwCfAJ4AnwCeAJ8AnwCfAJ8AHwCfAJ8AnwAcAJ4AGwAbABoAFQCfAJ8AFACfAJ8AnwATABMAEwAOAA4AngCfAJ4AEwCfABMAngATABMAEwCeAAwAngALAA=="

In this case, only 0x100 of the 15920 bytes are encoded as base64, the rest is missing

Yes, raw is for raw bits (it also does not have to be a whole number of bytes). The short version is to use tobits/tobytes (depending on which unit size you want for slicing) to get the raw bits; otherwise it will behave more like a "preview" string. See the longer version below.

So at the moment you can do something like this:

# will show hexdump if stdout is a tty (to be safe), can do ... | cat if you really want raw bytes in the tty
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist

# will write raw data as stdout
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist > data

Longer version:

The problem comes from how to represent binary data as jq values. I've tried a couple of different variants (introducing a new binary type, arrays of ints, and strings); they all have different drawbacks and issues.

  • A new binary type feels natural to add, but it turns out the jq standard library code etc has assumptions about which types can exist. In fq you can still get the "real" type using _exttype: [0xff,0x90] | tobytes | _exttype is "binary". Also it still has to be able to become a JSON compatible value at some point.
  • An array of ints would still have to be special in some way to know how big each element is (bit/byte).
  • Strings in jq are arrays of unicode codepoints, so binary data might be interpreted as multi-byte codepoints, and i think there were other confusing issues making them unsuitable for binary, see ex: "åäö" | .,tobytes | length.

Also there is the issue of what to do with formats that have raw fields that can be very large (ex: mp4 mdat): include everything or truncate somehow?

So the current compromise is that raw will be a base64 string (to be jq compatible) and also be truncated by default. It is possible to change the tovalue truncate behaviour with -o bits_format=base64 (there is also md5).

Sorry for the long rant :) but it's very good someone else is messing around with this as i'm not that happy with the current design and i think it can be made better and less confusing, so feedback is very welcome.

BTW fq has support for "binary arrays" (similar to iolists in Erlang) so you can slice and concatenate parts into a new binary. Maybe not a very good example:

# build a binary array with a byte (0), a binary slice and a string (will be utf8 bytes) and try to decode it as a bplist (force to skip magic check).
$ go run . -n '"hello" | tobits | [0, .[8:16], "a string"] | tobytes | bplist({force: true}) | d'
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│.{}: (bplist)
   │                                               │                │  error: bplist: SeekAbs: failed at position 8 (read size 0 seek pos 0): invalid seek offset
   │                                               │                │  header{}:
0x0│00 65 61 20 73 74                              │.ea st          │    magic: "\x00ea st" (invalid)
0x0│                  72 69                        │      ri        │    version: "ri" (invalid)
0x0│                        6e 67│                 │        ng│     │  unknown0: raw bits

@dgmcdona
Contributor Author

Handling timestamps is tricky. On the one hand, it's tempting to render it as the description by default, since that is the behavior of plutil and presents the value clearly to the user. On the other hand, jq does have functions for converting timestamps between formats from the raw value. I think the best solution for now is to be true to the original format and render the value as the decoded floating point value, and access the description using todescription as you suggested, that way we stay consistent with the jq way of doing things. I've added a note on this in the format documentation.

@wader
Owner

wader commented Sep 25, 2022

singletons seems to produce unknown fields:

➜  fq git:(bplist) ✗ go run . -o line_bytes=10 'grep_by(.type=="singleton"), .unknown0, .unknown1 | dv' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
      │00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[0].value{}: 0x475b-0x475b.7 (1)
0x4754│                     09      │       .  │  type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
      │                             │          │  value: true 0x475c-NA (0)
      │00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[1].value{}: 0x475b-0x475b.7 (1)
0x4754│                     09      │       .  │  type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
      │                             │          │  value: true 0x475c-NA (0)
      │00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│                     09      │       .  │.unknown0: raw bits 0x475b.4-0x475b.7 (0.4)
      │00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│                        09   │        . │.unknown1: raw bits 0x475c-0x475c.7 (1)

Looks like the 12 bits after the type are seen as unknown (it should be one unknown field, but it seems there is a bug in the gap code, maybe because of synthetic fields; will have a look).

Also was a bit confused first until i realized that bplist can apparently use the index multiple times, quite cool.

@wader
Owner

wader commented Sep 25, 2022

The range gap issue is fixed in master by #431; the unknown field now shows up as one 12 bit field.

But i got a bit unsure whether the gap we're seeing in the example above is correct or not. Do "normal" bplists have them? i guess, based on how bplist uses offset tables, it would be possible to have ranges that are unused/unknown, but does that happen in practice? Maybe because of alignment etc?

Generally i've tried to make decoders behave so that they end up with gaps only for things they don't know about or that should not be there, ex unknown trailing data.

@wader
Owner

wader commented Sep 25, 2022

With #432 toactual and tosym behave the same as tovalue. Also fixed the error. I think that makes sense?

The code that handles this is starting to get a bit out of hand; it badly needs a rethink/refactor.

@wader
Owner

wader commented Sep 28, 2022

I'm thinking about releasing 0.0.10 soonish, would be nice to include this. If you're busy we can merge and i can try to fix the remaining things if you like?

@dgmcdona
Contributor Author

dgmcdona commented Sep 29, 2022

I'm thinking about releasing 0.0.10 soonish, would be nice to include this. If you're busy we can merge and i can try to fix the remaining things if you like?

Sorry, I've been drowning in work and thesis so I haven't had time to figure out those last unknown bytes. If you want to merge it for the next release, I'm happy to figure out the bugs when I get a chance, or you can if you have the time.

@wader
Owner

wader commented Sep 29, 2022

Sorry, I've been drowning in work and thesis so I haven't had time to figure out those last unknown bytes. If you want to merge it for the next release, I'm happy to figure out the bugs when I get a chance, or you can if you have the time.

No need to be sorry, focus on thesis! what is it about?

Ok i'll let you know here if i figure something out

@wader
Owner

wader commented Sep 30, 2022

Hey, this should fix the unknown field for singletons:

diff --git a/format/bplist/bplist.go b/format/bplist/bplist.go
index 7c74848c..83480d0d 100644
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -91,15 +91,11 @@ func decodeItem(d *decode.D, p *plist) {
        m := d.FieldU4("type", elementTypeMap)
        switch m {
        case elementTypeNullOrBoolOrFill:
-               t := d.U4()
-               switch t {
-               case null:
-                       d.FieldValueNil("value")
-               case boolTrue:
-                       d.FieldValueBool("value", true)
-               case boolFalse:
-                       d.FieldValueBool("value", false)
-               }
+               d.FieldU4("value", scalar.UToScalar{
+                       null:      scalar.S{Sym: nil},
+                       boolTrue:  scalar.S{Sym: true},
+                       boolFalse: scalar.S{Sym: false},
+               })
        case elementTypeInt:
                n := d.FieldUFn("size", func(d *decode.D) uint64 {
                        return 1 << d.U4()
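
To make the nibble layout behind this fix concrete, here is a standalone sketch of interpreting a bplist00 marker byte without fq's decode API. It assumes the usual bplist00 encoding, where the high 4 bits are the type and the low 4 bits carry the singleton value (0x0 null, 0x8 false, 0x9 true) or, for ints, the size exponent. The function and constant names are illustrative only.

```go
package main

import "fmt"

const (
	typeSingleton = 0x0 // null, bool or fill
	typeInt       = 0x1
)

// describeMarker explains a bplist00 marker byte. Both nibbles are
// consumed here, which is the point of the fix above: reading the low
// nibble as a field leaves no unknown bits behind.
func describeMarker(b byte) string {
	typ, low := b>>4, b&0x0f
	switch typ {
	case typeSingleton:
		switch low {
		case 0x0:
			return "null"
		case 0x8:
			return "false"
		case 0x9:
			return "true"
		}
		return fmt.Sprintf("singleton(unknown 0x%x)", low)
	case typeInt:
		// Int payload size is 2^low bytes, as in `1 << d.U4()`.
		return fmt.Sprintf("int(%d bytes)", uint64(1)<<low)
	}
	return fmt.Sprintf("type 0x%x", typ)
}

func main() {
	fmt.Println(describeMarker(0x09)) // true
	fmt.Println(describeMarker(0x13)) // int(8 bytes)
}
```

The 0x09 byte is the same one shown as `type: "singleton"` with `value: true` in the dumps earlier in this thread.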

@wader
Owner

wader commented Sep 30, 2022

What's maybe left is to add some torepr test, rebase on master, regenerate documentation and test the actual output.

After that i think we're done, or is there something more you want to do?

@wader
Owner

wader commented Oct 4, 2022

Let's merge and i'll fix the remaining things in master.

@wader wader merged commit 09ea08f into wader:master Oct 4, 2022
@wader
Owner

wader commented Oct 4, 2022

Thanks a lot for your contribution 🥳 Hope the decode API was ok to work with and that you might want to add more formats etc in the future!

@wader
Owner

wader commented Oct 4, 2022

Hmm noticed now that the bplist commits don't have your github email. Feel free to do some dummy PR if you want to show up as a contributor to the project.

@wader
Owner

wader commented Oct 5, 2022

@dgmcdona Hey, got your message but can't see it here? strange.

Anyways, no problem and totally understand! There was so little left and i wanted it to be part of 0.0.10 :) Yes, i'm looking forward to future contributions, and feel free to email me or open issues if you have ideas or want to discuss something, ex how fq could be used in forensics.

What kind of formats are common CF-formats? filesystems etc? I haven't used fq for that but it is designed to handle broken files and i try to divide formats into smaller "subformats" to make it possible to decode parts separately etc. For example i've used fq quite a lot to search for patterns and then try to decode and filter something at each match; as jq is a generator based language it is quite ergonomic to do. Ex, try to decode each occurrence of 0xfff8 as a FLAC frame:

tobytes as $b | scan([0xff,0xf8]) | $b[.start:] | flac_frame

It would be really great if someone else experimented with things like that, as the functions and behaviors of the binary type in fq are a bit strange at times and i don't really know myself how i would like it to work :)
