Adds support for Apple Binary Plist, version 00 #427
Added some comments, but in general looks good i think.
Maybe you want to add a bplist.md file? See wasm.md etc. Also remember to embed it and run make doc && WRITE_ACTUAL=1 go test ./format. You might need to rebase to make it work as i re-added markdown doc support some days ago.
Also could possibly add torepr support, see msgpack.jq etc. I can help you with that if you're not familiar with jq.
Some test file would be nice. Should be just to put a bplist in
format/bplist/bplist.go
Outdated
return d.FE(int(n), decode.BigEndian)
})
case elementTypeDate:
d.FieldStrFn("value", func(d *decode.D) string {
Is this a 64 bit floating point unix time? If it was an integer you could do d.FieldF64("value", scalar.DescriptionActualUUnixTime), but this seems to be a float, so maybe we should add some helper for that? scalar.DescriptionActualUUnixTime sets the description, so you can still get the number in jq expressions.
Added a helper function for decoding unix time and Cocoa date from float values
A torepr function could look something like this for the current tree structure:
def _bplist_torepr:
  def _f:
    ( .object
    | if .type == "singleton" then .value
      elif .type == "int" then .value
      elif .type == "real" then .value
      elif .type == "date" then .value
      elif .type == "data" then .value.data
      elif .type == "ascii_string" then .value.value
      elif .type == "unicode_string" then .value.value
      elif .type == "uid" then .value
      elif .type == "array" then
        ( .elements
        | map(_f)
        )
      elif .type == "set" then
        ( .elements
        | map(_f)
        )
      elif .type == "dict" then
        ( .dictionary.entries
        | map({key: (.key | _f), value: (.value | _f)})
        | from_entries
        )
      else error("unknown type: \(.type)")
      end
    );
  ( .objects
  | _f
  );
With this diff:
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -1,6 +1,7 @@
package bplist
import (
+ "embed"
"math"
"time"
@@ -10,6 +11,9 @@ import (
"github.com/wader/fq/pkg/scalar"
)
+//go:embed bplist.jq
+var bplistFS embed.FS
+
func init() {
interp.RegisterFormat(decode.Format{
Name: format.BPLIST,
@@ -17,7 +21,9 @@ func init() {
Description: "Apple Binary Property List",
Groups: []string{format.PROBE},
DecodeFn: bplistDecode,
+ Functions: []string{"torepr"},
})
+ interp.RegisterFS(bplistFS)
}
const (
You can do:
➜ fq git:(bplist) ✗ go run . torepr /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/c55ebecca7037d50a22fe39315a802984688c69f.asset/AssetData/parser/config.plist
{
"CFBundleDevelopmentRegion": "en",
"CFBundleExecutable": "$(EXECUTABLE_NAME)",
"CFBundleIdentifier": "com.apple.NLP",
"CFBundleInfoDictionaryVersion": "6.0",
"CFBundleName": "$(PRODUCT_NAME)",
"CFBundlePackageType": "BNDL",
"CFBundleShortVersionString": "1.0",
"CFBundleSignature": "????",
"CFBundleVersion": "1",
"CanonicalRegions": {
"de": {
...
And things like
Is there a way to have the |
The -h formats test seems to fail. It is in ./interp, should probably move it to format, hmm. Maybe add a torepr test? Will have a last look when i'm at a computer, but it looks very good now.
torepr will use the scalar's default "value", which is the symbolic value if set, otherwise the actual value. But in your timestamp case a scalar mapper can also set a string description; that string is used when showing the hexdump tree thingy and can also be accessed with
Yeah, that is a bit unfortunate with the weird timestamp epoch. Could it make sense to make the sym value a float unix timestamp, or is that even more confusing? In other decoders i've kept it as numbers as it seemed nicer for queries (doing comparisons etc). Maybe it would make sense to add more time functions? jq has
There is some half-finished work in fq to make
Another alternative is to add an option to the bplist decoder for how timestamps should be handled? Maybe
I have a feeling it will probably also fail on some help output tests that are in formats/all. Maybe all help output tests should be moved into individual formats? i will probably do that later on. Should probably do it so that a format will not affect tests outside its own testdata directory.
An interesting future feature would be to write a |
| | | value{}: 0x21-0x21.7 (1)
0x20| 09 | . | type: "singleton" (0) (Singleton value (null/bool)) 0x21-0x21.3 (0.4)
| | | value: true 0x22-NA (0)
0x20| 09 | . | unknown0: raw bits 0x21.4-0x21.7 (0.4)
I think these are the unused 4 bits when size is < 0xf? Maybe add an "unused" field for them etc?
I'm not seeing this in the output anymore, so I think it is fixed? Can't remember right now if I explicitly fixed this, let me know if not.
Maybe you have some local change but i see it for the Info.plist test file:
➜ fq git:(bplist) ✗ go run . . format/bplist/testdata/com.apple.UIAutomation.plist
│00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│.{}: format/bplist/testdata/com.apple.UIAutomation.plist (bplist)
0x00│62 70 6c 69 73 74 30 30 │bplist00 │ header{}:
0x00│ d1 01 02 5f 10 13│ ..._..│ objects{}:
0x0e│55 49 41 75 74 6f 6d 61 74 69 6f 6e 45 6e│UIAutomationEn│
0x1c│61 62 6c 65 64 09 │abled. │
0x1c│ 09 │ . │ unknown0: raw bits
0x1c│ 08 0b 21 │ ..! │ offset_table[0:3]:
0x1c│ 00 00 00 00 00│ .....│ trailer{}:
0x2a│00 01 01 00 00 00 00 00 00 00 03 00 00 00│..............│
0x38│00 00 00 00 00 00 00 00 00 00 00 00 22│ │............"││
Hey, i made some changes to help and help tests #430 for you i think it's just to run |
Thanks! I've been super busy but hopefully I can put the finishing touches on this this weekend. |
No worries, great, yes i think it's just a few small things left. But i will take an extra look if there is anything so that you have all comments and suggestions ready for the weekend |
Is
Also, it seems weird that
In this case, only 0x100 of the 15920 bytes are encoded as base64, the rest is missing |
Yes raw is for raw bits (also does not have to be even bytes). Short version is to use
So at the moment you can do something like this:
# will show hexdump if stdout is a tty (to be safe), can do ... | cat if you really want raw bytes in the tty
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
# will write raw data to stdout
$ go run . 'torepr.SandboxProfileData | tobytes' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist > data
Longer version: The problem comes from how to represent binary data as jq values. I've tried a couple of different variants, introduce a new
Also there is the issue of what to do with some formats that have raw fields that can be very large (ex: mp4 mdat): include all or truncate somehow? So the current compromise is that raw will be a base64 string (to be jq compatible) and also be truncated by default. It is possible to change the
Sorry for the long rant :) but it's very good that someone else is messing around with this, as i'm not that happy with the current design and i think it can be made better and less confusing, so feedback is very welcome.
BTW fq has support for "binary arrays" (similar to iolists in erlang) so you can slice and concatenate parts into a new binary. Maybe a not very good example:
# build a binary array with a byte (0), a binary slice and a string (will be utf8 bytes) and try to decode it as a bplist (force to skip magic check).
$ go run . -n '"hello" | tobits | [0, .[8:16], "a string"] | tobytes | bplist({force: true}) | d'
│00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│.{}: (bplist)
│ │ │ error: bplist: SeekAbs: failed at position 8 (read size 0 seek pos 0): invalid seek offset
│ │ │ header{}:
0x0│00 65 61 20 73 74 │.ea st │ magic: "\x00ea st" (invalid)
0x0│ 72 69 │ ri │ version: "ri" (invalid)
0x0│ 6e 67│ │ ng│ │ unknown0: raw bits |
Handling timestamps is tricky. On the one hand, it's tempting to render it as the description by default, since that is the behavior of |
Singletons seem to produce unknown fields:
➜ fq git:(bplist) ✗ go run . -o line_bytes=10 'grep_by(.type=="singleton"), .unknown0, .unknown1 | dv' /Users/wader//Library/Containers/com.apple.LoginUserService/Container.plist
│00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[0].value{}: 0x475b-0x475b.7 (1)
0x4754│ 09 │ . │ type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
│ │ │ value: true 0x475c-NA (0)
│00 01 02 03 04 05 06 07 08 09│0123456789│.objects.entries[3].value.entries[5].value.entries[1].value{}: 0x475b-0x475b.7 (1)
0x4754│ 09 │ . │ type: "singleton" (0) (Singleton value (null/bool)) 0x475b-0x475b.3 (0.4)
│ │ │ value: true 0x475c-NA (0)
│00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│ 09 │ . │.unknown0: raw bits 0x475b.4-0x475b.7 (0.4)
│00 01 02 03 04 05 06 07 08 09│0123456789│
0x4754│ 09 │ . │.unknown1: raw bits 0x475c-0x475c.7 (1)
Looks like 12 bits after the type are seen as unknown (should be one unknown field but it seems there is a bug in the gap code, maybe because of synthetic fields, will have a look). Also, i was a bit confused at first until i realized that bplist can apparently use the same index multiple times, quite cool.
Range gap issue fixed in master #431, now the unknown field shows up as one 12 bit field. But i got a bit unsure if the gap we're seeing in the example above is correct or not? Do "normal" bplists have them? I guess based on how bplist uses offset tables it would be possible to have ranges that are unused/unknown, but does that happen in practice? Maybe because of alignment etc? Generally i've tried to make decoders behave so that they end up with gaps only for things they don't know about/should not be there, ex unknown trailing data.
With #432 The code that handles this is starting to get a bit out of hand, badly needs a rethink/refactor |
I'm thinking about releasing 0.0.10 soonish, would be nice to include this. If you're busy we can merge and i can try to fix the remaining things if you like?
Sorry, I've been drowning in work and thesis so I haven't had time to figure out those last unknown bytes. If you want to merge it for the next release, I'm happy to figure out the bugs when I get a chance, or you can if you have the time. |
No need to be sorry, focus on thesis! what is it about? Ok i'll let you know here if i figure something out |
Hey, this should fix the unknown field for singletons:
diff --git a/format/bplist/bplist.go b/format/bplist/bplist.go
index 7c74848c..83480d0d 100644
--- a/format/bplist/bplist.go
+++ b/format/bplist/bplist.go
@@ -91,15 +91,11 @@ func decodeItem(d *decode.D, p *plist) {
m := d.FieldU4("type", elementTypeMap)
switch m {
case elementTypeNullOrBoolOrFill:
- t := d.U4()
- switch t {
- case null:
- d.FieldValueNil("value")
- case boolTrue:
- d.FieldValueBool("value", true)
- case boolFalse:
- d.FieldValueBool("value", false)
- }
+ d.FieldU4("value", scalar.UToScalar{
+ null: scalar.S{Sym: nil},
+ boolTrue: scalar.S{Sym: true},
+ boolFalse: scalar.S{Sym: false},
+ })
case elementTypeInt:
n := d.FieldUFn("size", func(d *decode.D) uint64 {
return 1 << d.U4()
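As a standalone illustration of why the low nibble matters, here is a small sketch (not fq code, splitMarker and singleton are hypothetical names) of how a version 00 marker byte splits into a 4 bit type and a 4 bit low nibble. It assumes the usual bplist00 encoding where type 0 uses the low nibble as the value: 0x0 null, 0x8 false, 0x9 true. The old code read that nibble without naming a field, which is presumably why it showed up as unknown:

```go
package main

import "fmt"

// splitMarker splits a marker byte into its high nibble (object type)
// and low nibble (value, size or count depending on the type).
func splitMarker(b byte) (typ, low byte) {
	return b >> 4, b & 0x0f
}

// singleton interprets a marker byte as a null/bool singleton.
// ok is false if the marker is not a known singleton.
func singleton(marker byte) (value any, ok bool) {
	typ, low := splitMarker(marker)
	if typ != 0x0 {
		return nil, false
	}
	switch low {
	case 0x0:
		return nil, true
	case 0x8:
		return false, true
	case 0x9:
		return true, true
	}
	return nil, false
}

func main() {
	// 0x09 is the singleton from the test file, 0xd1 is its top level dict.
	for _, b := range []byte{0x09, 0x08, 0x00, 0xd1} {
		typ, low := splitMarker(b)
		v, ok := singleton(b)
		fmt.Printf("marker 0x%02x: type=0x%x low=0x%x singleton=%v ok=%v\n", b, typ, low, v, ok)
	}
}
```

With the nibble decoded as a named value field, as in the diff above, the whole marker byte is covered by fields and no gap should be left for the singleton itself.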
Maybe what is left is to add some
After that i think we're done, or is there something more you want to do?
Let's merge and i fix the things in master |
Thanks a lot for your contribution 🥳 Hope the decode API was ok to work with and that you might want to add more formats etc in the future! |
Hmm noticed now that the bplist commits don't have your github email. Feel free to do some dummy PR if you want to show up as a contributor to the project. |
@dgmcdona Hey, got your message but can't see it here? Strange. Anyways, no problem and totally understand! There was so little left and i wanted it to be part of 0.0.10 :) Yes, i'm looking forward to future contributions, and feel free to email me or open issues if you have ideas or want to discuss something, ex how fq could be used in forensics. What kind of formats are common CF-formats? Filesystems etc? I haven't used fq for that, but it is designed to handle broken files and i try to divide formats into smaller "subformats" to make it possible to decode parts separately etc. For example, i've used fq quite a lot to search for patterns and then try to decode and filter something at each match. As jq is a generator based language it is quite ergonomic to do, ex try to decode each occurrence of 0xfff8 as a FLAC frame:
tobytes as $b | scan([0xff,0xf8]) | $b[.start:] | flac_frame
Would be really great if someone else experimented with things like that, as the functions and behaviors of the binary type in fq are a bit strange at times and i don't really know myself how i would like it to work :)
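A rough Go equivalent of the scan step, for anyone who wants to see the idea outside of jq: collect every offset where a byte pattern occurs, then hand the data from each offset to a decoder. scanOffsets is a hypothetical helper, not an fq API, and the decoding step is left out:

```go
package main

import (
	"bytes"
	"fmt"
)

// scanOffsets returns the start offset of every occurrence of pattern in
// data, including overlapping ones. Hypothetical helper for illustration.
func scanOffsets(data, pattern []byte) []int {
	var offsets []int
	if len(pattern) == 0 {
		return offsets
	}
	start := 0
	for {
		i := bytes.Index(data[start:], pattern)
		if i < 0 {
			return offsets
		}
		offsets = append(offsets, start+i)
		// Continue one byte after the match so overlaps are found too.
		start += i + 1
	}
}

func main() {
	data := []byte{0x00, 0xff, 0xf8, 0x01, 0xff, 0xf8}
	for _, off := range scanOffsets(data, []byte{0xff, 0xf8}) {
		// A real tool would try to decode data[off:] here and
		// skip the candidates that fail to decode.
		fmt.Printf("candidate at offset %d, %d bytes remaining\n", off, len(data)-off)
	}
}
```

The jq version gets the same effect lazily, since each match is emitted as it is found and failed decodes can simply be filtered out.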
This adds support for decoding Apple Binary Plists. The only well documented version is 00, and is therefore the only one supported here. I have tested this on both large and small binary plists, including ones with nested dictionaries.