Merge pull request #284 from wader/tofromencodings
interp: Add to/from<encoding> for some common serialzations, encoding…
wader authored May 28, 2022
2 parents 3f9f6b8 + 3b717c3 commit c8a9cf9
Showing 49 changed files with 3,340 additions and 337 deletions.
13 changes: 9 additions & 4 deletions README.md
@@ -6,7 +6,7 @@ Tool, language and decoders for working with binary data.

fq is inspired by the well known jq tool and language and allows you to work with binary formats the same way you would using jq. In addition it can present data like a hex viewer, transform, slice and concatenate binary data. It also supports nested formats and has an interactive REPL with auto-completion.

It was originally designed to query, inspect and debug media codecs and containers like mp4, flac, mp3, jpeg. Since then it has been extended to support a variety of formats like executables, packet captures (including TCP reassembly) and serialization formats like ASN1 BER, Avro, CBOR, protobuf.
It was originally designed to query, inspect and debug media codecs and containers like mp4, flac, mp3, jpeg. Since then it has been extended to support a variety of formats like executables, packet captures (including TCP reassembly) and serialization formats like JSON, YAML, XML, ASN1 BER, Avro, CBOR, protobuf.

In summary it aims to be jq, hexdump, dd and gdb for files combined into one.

@@ -129,7 +129,11 @@ xing,

[#]: sh-end

For details see [formats.md](doc/formats.md)
#### Other non-binary formats

fq can also convert to and from XML, JSON, jq-flavored JSON, YAML, TOML, CSV, URLs, hex strings, base64, string encodings etc.

For details see [formats.md](doc/formats.md) and [usage.md](doc/usage.md).

## Usage

@@ -141,7 +145,6 @@ For details see [usage.md](doc/usage.md)

- "fq - jq for binary formats" at [Binary Tools Summit 2022](https://binary-tools.net/summit.html) - [video](https://www.youtube.com/watch?v=GJOq_b0eb-s&list=PLTj8twuHdQz-JcX7k6eOwyVPDB8CyfZc8&index=1) - [slides](doc/presentations/bts2022/fq-bts2022-v1.pdf)


## Install

Use one of the methods listed below or download [release](https://github.com/wader/fq/releases) for your platform. Unarchive it and move the executable to `PATH` etc.
@@ -271,5 +274,7 @@ Licenses of direct dependencies:
- mapstructure https://github.com/mitchellh/mapstructure/blob/master/LICENSE (MIT)
- copystructure https://github.com/mitchellh/copystructure/blob/master/LICENSE (MIT)
- go-difflib https://github.com/pmezard/go-difflib/blob/master/LICENSE (BSD)
- golang/x/text https://github.com/golang/text/blob/master/LICENSE (BSD)
- golang/x/* https://github.com/golang/text/blob/master/LICENSE (BSD)
- golang/snappy https://github.com/golang/snappy/blob/master/LICENSE (BSD)
- github.com/BurntSushi/toml https://github.com/BurntSushi/toml/blob/master/COPYING (MIT)
- gopkg.in/yaml.v3 https://github.com/go-yaml/yaml/blob/v3/LICENSE (MIT)
245 changes: 245 additions & 0 deletions doc/usage.md
@@ -424,6 +424,251 @@ you currently have to do `fq -d raw 'mp3({force: true})' file`.
- `paste` read string from stdin until ^D. Useful for pasting text.
- Ex: `paste | frompem | asn1_ber | repl` read from stdin then decode and start a new sub-REPL with result.

### Encodings, serializations and hashes

In addition to binary formats, fq also supports converting to and from encodings and serialization formats.

At the moment fq does not have any dedicated argument for serialization formats, but raw string input `-R`, slurp `-s` and raw string output `-r` can make things easier. The combination `-Rs` will read all inputs into one string (same as jq).
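
The effect of `-r` can be illustrated outside fq. The Python sketch below is only an analogy, not fq's implementation: it assumes that a `to*` function returns a string and that, without raw output, that string is itself printed as a JSON string. The sample `value` is made up for illustration.

```python
import json

# A value as it might come out of a from* function (made-up sample data).
value = {"name": "fq", "tags": ["binary", "jq"]}

# Analogous to tojson({indent:2}): the result is a string containing JSON.
serialized = json.dumps(value, indent=2)

# Without raw output the string is encoded as a JSON string once more,
# so it gains surrounding quotes and escaped newlines and quotes.
without_raw = json.dumps(serialized)

# With raw output the string is written as-is.
with_raw = serialized

print(without_raw)
print(with_raw)
```

The first line printed is one long quoted string; the second is the indented JSON document most users expect.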

Note that `from*` functions output jq values and `to*` functions take jq values as input, so in some cases not all information will be properly preserved. For example, with XML, element and attribute order might change, and text and comment nodes might move and be merged.

Some example usages:

```sh
# read yml and do some query
$ fq -Rs 'fromyaml | ...' file.yml

# convert YAML to JSON
# note -r for raw string output; without it a JSON string containing JSON would be output
$ fq -Rsr 'fromyaml | tojson({indent:2})' file.yml

# add token to URL
$ echo -n "https://host.org" | fq -Rsr 'fromurl | .user.username="token" | tourl'
https://token@host.org

$ curl -s https://www.discogs.com/ | fq -Rsr 'fromhtml | .. | select(."-id" == "hot-releases")? | .div[].a."-aria-label"'
Arcade Fire - We
Bell Biv Devoe - Poison
Jazz Sabbath - Vol. 2
Jonathan Richman And The Modern Lovers* - Modern Lovers 88
Kylie* - Infinite Disco
Vince Guaraldi Trio - Baseball Theme

# shows how serialization functions can be used on any string, how to transform values and output some other format
# read and decode zip file and start an interactive REPL
$ fq -i . <(curl -sL https://github.com/stefangabos/world_countries/archive/master.zip)
# select an interesting xml file
zip> .local_files[] | select(.file_name == "world_countries-master/data/countries/en/world.xml").uncompressed | repl
# convert xml into jq values
> .local_files[95].uncompressed string> fromxml | repl
# sort countries by name and select the first one
>> object> .countries.country | sort_by(."-name") | first | repl
# see what current input is
>>> object> .
{
"-alpha2": "af",
"-alpha3": "afg",
"-id": "4",
"-name": "Afghanistan"
}
# remove "-" prefix from keys and convert to YAML and print it
>>> object> with_entries(.key |= .[1:]) | toyaml | print
alpha2: af
alpha3: afg
id: "4"
name: Afghanistan
# exit all REPLs back to shell
>>> object> ^D
>> object> ^D
> .local_files[95].uncompressed string> ^D
zip> ^D
```

- `fromxml`/`fromxml($opts)` Parse XML into jq values.<br>
`{seq: true}` preserve element ordering if more than one sibling.<br>
`{array: true}` use nested arrays to represent elements.<br>
- `fromhtml`/`fromhtml($opts)` Parse HTML into jq values.<br>
Same as `fromxml` but less strict and follows html5 parsing rules. Will always have a `html` root with `head` and `body` elements.<br>
`{array: true}` use nested arrays to represent elements.<br>
`{seq: true}` preserve element ordering if more than one sibling.<br>
- `toxml`/`toxml($opts)` Serialize jq value into XML.<br>
`{indent: number}` indent child elements.<br>
Assumes object representation if input is an object, and nested arrays if input is an array.<br>
Will automatically add a root `doc` element if jq value has more than one root element.<br>
If a `#seq` is found on at least one element all siblings will be sorted by sequence number. Attributes are always sorted.<br>

XML elements can be represented as jq values in two ways, as objects (inspired by [mxj](https://github.com/clbanning/mxj) and [xml.com's Converting Between XML and JSON](https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html)) or nested arrays. Both representations are lossy and might lose ordering of elements, text nodes and comments. In object representation `fromxml`, `fromhtml` and `toxml` support the `{seq:true}` option to parse/serialize `{"#seq": <number>}` attributes to preserve element sibling ordering.

The object version is denser and convenient to query, the nested arrays version is probably easier to use when generating XML.

Let's assume `$xml` is this XML document as a string:
```xml
<doc>
<child attr="1"></child>
<child attr="2">text</child>
<other>text</other>
</doc>
```

With object representation an element is represented as:
- Attributes as dash prefixed `-<key>` keys.
- Text nodes as `#text`.
- Comment nodes as `#comment` keys.
- For explicit sibling ordering `#seq` keys with a number, can be negative, assumed zero if missing.
- Child element with only text as `<name>` key with text as value.
- Child element with more than just text as `<name>` key with value an object.
- Multiple sibling child elements with the same name as `<name>` key with value as an array of strings and objects.
```jq
> $xml | fromxml
{
"doc": {
"child": [
{
"-attr": "1"
},
{
"#text": "text",
"-attr": "2"
}
],
"other": "text"
}
}
```
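
The object representation rules listed above can be sketched in a few lines of Python. This is a rough model using the standard library's `xml.etree.ElementTree`, not fq's actual implementation. The function names are hypothetical, comments and `#seq` are not handled, whitespace-only text is dropped, and mapping a completely empty element to `""` is an assumption.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one element to a string or an object (dict)."""
    # Attributes become dash-prefixed keys.
    obj = {f"-{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_value(child)
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)
        else:
            # Second sibling with the same name: switch to an array.
            obj[child.tag] = [obj[child.tag], value]
    text = (elem.text or "").strip()
    if text:
        if not obj:
            # Element with only text: the value is the text itself.
            return text
        obj["#text"] = text
    # Assumption: an element with nothing in it becomes an empty string.
    return obj if obj else ""


def xml_to_object(xml_string):
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_value(root)}


xml = """<doc>
<child attr="1"></child>
<child attr="2">text</child>
<other>text</other>
</doc>"""
result = xml_to_object(xml)
```

For the example document, `result` has the same structure as the `fromxml` output shown above, apart from key order, which Python dicts keep in insertion order rather than sorted.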

With nested array representation, an element is represented as an array with these values `["<name>", {attributes...}, [children...]]`:
- Index zero is the element name.
- The optional first object holds attributes (including `#text` and `#comment` keys).
- The optional first array holds child elements.
#
```jq
> $xml | fromxml({array: true})
[
"doc",
[
[
"child",
{
"attr": "1"
}
],
[
"child",
{
"#text": "text",
"attr": "2"
}
],
[
"other",
{
"#text": "text"
}
]
]
]
```
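
The nested array form can be modelled the same way. The Python sketch below is an approximation built on `xml.etree.ElementTree`, not fq's code. `element_to_array` is a hypothetical name, comments are ignored, and whitespace-only text is dropped.

```python
import xml.etree.ElementTree as ET


def element_to_array(elem):
    """Convert one element to [name, {attributes}?, [children]?]."""
    # Attributes keep their plain names; text goes in as "#text".
    attrs = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        attrs["#text"] = text
    children = [element_to_array(child) for child in elem]

    node = [elem.tag]
    if attrs:
        node.append(attrs)      # optional attributes object
    if children:
        node.append(children)   # optional array of child elements
    return node


xml = """<doc>
<child attr="1"></child>
<child attr="2">text</child>
<other>text</other>
</doc>"""
result = element_to_array(ET.fromstring(xml))
```

Because children stay in a list, document order is preserved without `#seq`, which is one reason this form can be easier to work with when generating XML.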
Parse and include `#seq` attributes if needed:
```jq
> $xml | fromxml({seq:true})
{
"doc": {
"child": [
{
"#seq": 0,
"-attr": "1"
},
{
"#seq": 1,
"#text": "text",
"-attr": "2"
}
],
"other": {
"#seq": 2,
"#text": "text"
}
}
}
```
Select values in `<doc>`, remove `<child>`, add a `<new>` element, serialize to XML with 2 space indent and print the string:
```jq
> $xml | fromxml.doc | del(.child) | .new = "abc" | {root: .} | toxml({indent: 2}) | println
<root>
<new>abc</new>
<other>text</other>
</root>
```
- `fromjson` Parse JSON into jq values.
- `tojson`/`tojson($opt)` Serialize jq value into JSON.<br>
`{indent: number}` indent array/object values.<br>
- `fromjq` Parse jq-flavoured JSON into jq values.
- `tojq`/`tojq($opt)` Serialize jq value into jq-flavoured JSON<br>
`{indent: number}` indent array/object values.<br>
jq-flavoured JSON has optional key quotes, `#` comments and can have trailing comma in arrays.
- `fromyaml` Parse YAML into jq values.
- `toyaml` Serialize jq value into YAML.
- `fromtoml` Parse TOML into jq values.
- `totoml` Serialize jq value into TOML.
- `fromcsv`/`fromcsv($opts)` Parse CSV into jq values.<br>
`{comma: string}` field separator, default ",".<br>
`{comment: string}` comment line character, default "#".<br>
- `tocsv`/`tocsv($opts)` Serialize jq value into CSV.<br>
`{comma: string}` field separator, default ",".<br>
- `fromxmlentities` Decode XML entities.
- `toxmlentities` Encode XML entities.
- `fromurlpath` Decode URL path component.
- `tourlpath` Encode URL path component.
- `fromurlencode` Decode URL query encoding.
- `tourlencode` Encode URL to query encoding.
- `fromurlquery` Decode URL query into object. For duplicate keys the value will be an array.
- `tourlquery` Encode object into query string.
- `fromurl` Decode URL into object.
```jq
> "schema://user:pass@host/path?key=value#fragement" | fromurl
{
"fragment": "fragement",
"host": "host",
"path": "/path",
"query": {
"key": "value"
},
"rawquery": "key=value",
"scheme": "schema",
"user": {
"password": "pass",
"username": "user"
}
}
```
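
The rule that duplicate query keys turn into an array can be sketched as follows. This Python snippet is an illustration of the described behaviour using `urllib.parse`, not fq's implementation, and `query_to_object` is a hypothetical name.

```python
from urllib.parse import parse_qsl


def query_to_object(query):
    """Decode a query string; repeated keys collect into a list."""
    result = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key not in result:
            result[key] = value
        elif isinstance(result[key], list):
            result[key].append(value)
        else:
            # Second occurrence of the key: switch to a list.
            result[key] = [result[key], value]
    return result


single = query_to_object("key=value")
repeated = query_to_object("a=1&a=2&b=3")
```

A consequence of this shape is that a query consumer has to handle both a string and an array for any given key.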
- `tourl` Encode object into URL string.
- `fromhex` Decode hexstring to binary.
- `tohex` Encode binary into hexstring.
- `frombase64`/`frombase64($opts)` Decode base64 encodings into binary.<br>
`{encoding:string}` encoding variant: `std` (default), `url`, `rawstd` or `rawurl`
- `tobase64`/`tobase64($opts)` Encode binary into base64 encodings.<br>
`{encoding:string}` encoding variant: `std` (default), `url`, `rawstd` or `rawurl`
- `tomd4` Hash binary using md4.
- `tomd5` Hash binary using md5.
- `tosha1` Hash binary using sha1.
- `tosha256` Hash binary using sha256.
- `tosha512` Hash binary using sha512.
- `tosha3_224` Hash binary using sha3 224.
- `tosha3_256` Hash binary using sha3 256.
- `tosha3_384` Hash binary using sha3 384.
- `tosha3_512` Hash binary using sha3 512.
- `toiso8859_1` Encode string as ISO8859-1 into binary.
- `fromiso8859_1` Decode binary as ISO8859-1 into string.
- `toutf8` Encode string as UTF8 into binary.
- `fromutf8` Decode binary as UTF8 into string.
- `toutf16` Encode string as UTF16 into binary.
- `fromutf16` Decode binary as UTF16 into string.
- `toutf16le` Encode string as UTF16 little-endian into binary.
- `fromutf16le` Decode binary as UTF16 little-endian into string.
- `toutf16be` Encode string as UTF16 big-endian into binary.
- `fromutf16be` Decode binary as UTF16 big-endian into string.
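
The four `encoding` variants accepted by `frombase64` and `tobase64` above presumably correspond to the padded and unpadded forms of the standard and URL-safe alphabets from RFC 4648. The Python sketch below shows how the variants differ under that assumption. The function names are hypothetical and this is not fq's code.

```python
import base64


def to_base64(data: bytes, encoding: str = "std") -> str:
    """Encode bytes using one of: std, url, rawstd, rawurl."""
    if encoding in ("std", "rawstd"):
        text = base64.standard_b64encode(data).decode("ascii")
    elif encoding in ("url", "rawurl"):
        text = base64.urlsafe_b64encode(data).decode("ascii")
    else:
        raise ValueError(f"unknown encoding variant: {encoding}")
    # The raw variants omit the trailing "=" padding.
    return text.rstrip("=") if encoding.startswith("raw") else text


def from_base64(text: str, encoding: str = "std") -> bytes:
    """Decode text produced by the matching to_base64 variant."""
    if encoding.startswith("raw"):
        # Restore the padding the raw variants leave out.
        text += "=" * (-len(text) % 4)
    if encoding in ("std", "rawstd"):
        return base64.standard_b64decode(text)
    if encoding in ("url", "rawurl"):
        return base64.urlsafe_b64decode(text)
    raise ValueError(f"unknown encoding variant: {encoding}")


data = b"\xfb\xff"
encoded = {name: to_base64(data, name) for name in ("std", "url", "rawstd", "rawurl")}
```

The two bytes are chosen so that both the alphabet difference (`+/` versus `-_`) and the padding difference show up in `encoded`.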

## Color and unicode output

fq by default tries to use colors if possible, this can be disabled with `-M`. You can also
14 changes: 7 additions & 7 deletions format/all/help.fqtest
@@ -11,7 +11,7 @@ out ... | aac_frame
out # Decode file using aac_frame options
out $ fq -d aac_frame -o object_type=1 . file
out # Decode value as aac_frame
out ... | aac_frame({object_type: 1})
out ... | aac_frame({object_type:1})
"help(adts)"
out adts: Audio Data Transport Stream decoder
out Examples:
@@ -114,7 +114,7 @@ out ... | avc_au
out # Decode file using avc_au options
out $ fq -d avc_au -o length_size=4 . file
out # Decode value as avc_au
out ... | avc_au({length_size: 4})
out ... | avc_au({length_size:4})
"help(avc_dcr)"
out avc_dcr: H.264/AVC Decoder Configuration Record decoder
out Examples:
@@ -276,7 +276,7 @@ out ... | flac_frame
out # Decode file using flac_frame options
out $ fq -d flac_frame -o bits_per_sample=16 . file
out # Decode value as flac_frame
out ... | flac_frame({bits_per_sample: 16})
out ... | flac_frame({bits_per_sample:16})
"help(flac_metadatablock)"
out flac_metadatablock: FLAC metadatablock decoder
out Examples:
@@ -338,7 +338,7 @@ out ... | hevc_au
out # Decode file using hevc_au options
out $ fq -d hevc_au -o length_size=4 . file
out # Decode value as hevc_au
out ... | hevc_au({length_size: 4})
out ... | hevc_au({length_size:4})
"help(hevc_dcr)"
out hevc_dcr: H.265/HEVC Decoder Configuration Record decoder
out Examples:
@@ -486,7 +486,7 @@ out ... | mp3
out # Decode file using mp3 options
out $ fq -d mp3 -o max_sync_seek=32768 -o max_unique_header_configs=5 . file
out # Decode value as mp3
out ... | mp3({max_sync_seek: 32768, max_unique_header_configs: 5})
out ... | mp3({max_sync_seek:32768,max_unique_header_configs:5})
"help(mp3_frame)"
out mp3_frame: MPEG audio layer 3 frame decoder
out Examples:
@@ -512,7 +512,7 @@ out ... | mp4
out # Decode file using mp4 options
out $ fq -d mp4 -o allow_truncated=false -o decode_samples=true . file
out # Decode value as mp4
out ... | mp4({allow_truncated: false, decode_samples: true})
out ... | mp4({allow_truncated:false,decode_samples:true})
out References and links
out ISO/IEC base media file format (MPEG-4 Part 12) https://en.wikipedia.org/wiki/ISO/IEC_base_media_file_format
out Quicktime file format https://developer.apple.com/standards/qtff-2001.pdf
@@ -774,6 +774,6 @@ out ... | zip
out # Decode file using zip options
out $ fq -d zip -o uncompress=true . file
out # Decode value as zip
out ... | zip({uncompress: true})
out ... | zip({uncompress:true})
out References and links
out https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
2 changes: 1 addition & 1 deletion format/asn1/testdata/laymans_guide_examples.fqtest
@@ -1,4 +1,4 @@
$ fq '.[] | . as $t | .hex | gsub("[^0-9a-f]";"") | hex | $t, (asn1_ber | dv)' laymans_guide_examples.json
$ fq '.[] | . as $t | .hex | gsub("[^0-9a-f]";"") | fromhex | $t, (asn1_ber | dv)' laymans_guide_examples.json
{
"decoded": "011011100101110111",
"hex": "03 04 06 6e 5d c0"
4 changes: 2 additions & 2 deletions format/cbor/testdata/appendix_a.fqtest
@@ -6,7 +6,7 @@
$ fq -i -d json . appendix_a.json
json> length
82
json> map(select(.decoded) | (.cbor | base64 | cbor | torepr) as $a | select( .decoded != $a) | {test: ., actual: $a})
json> map(select(.decoded) | (.cbor | frombase64 | cbor | torepr) as $a | select( .decoded != $a) | {test: ., actual: $a})
[
{
"actual": {
@@ -35,7 +35,7 @@ json> map(select(.decoded) | (.cbor | base64 | cbor | torepr) as $a | select( .d
}
}
]
json> .[] | select(.decoded) | .cbor | base64 | cbor | dv
json> .[] | select(.decoded) | .cbor | frombase64 | cbor | dv
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: (cbor) 0x0-0x0.7 (1)
0x0|00| |.| | major_type: "positive_int" (0) 0x0-0x0.2 (0.3)
0x0|00| |.| | short_count: 0 0x0.3-0x0.7 (0.5)
2 changes: 1 addition & 1 deletion format/cbor/testdata/cbor.fqtest
@@ -1,4 +1,4 @@
$ fq -n '"v2NGdW71Y0FtdCH/" | base64 | cbor | torepr'
$ fq -n '"v2NGdW71Y0FtdCH/" | frombase64 | cbor | torepr'
{
"Amt": -2,
"Fun": true
2 changes: 1 addition & 1 deletion format/mp4/testdata/colr_box.fqtest
@@ -1,4 +1,4 @@
$ fq -n '"000000106674797069736f6d0000020000000013636f6c726e636c7800010001000100" | hex | mp4 | dv'
$ fq -n '"000000106674797069736f6d0000020000000013636f6c726e636c7800010001000100" | fromhex | mp4 | dv'
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: (mp4) 0x0-0x22.7 (35)
| | | boxes[0:2]: 0x0-0x22.7 (35)
| | | [0]{}: box 0x0-0xf.7 (16)
2 changes: 1 addition & 1 deletion format/tiff/testdata/infinite.fqtest
@@ -1,4 +1,4 @@
$ fq.go -n '"SUkqAAwAAAAwMDAwAQAwMDAwMDAwMDAwMDAhAAAAMDAwAAAhAAAA" | base64 | tiff'
$ fq.go -n '"SUkqAAwAAAAwMDAwAQAwMDAwMDAwMDAwMDAhAAAAMDAwAAAhAAAA" | frombase64 | tiff'
|00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f|0123456789abcdef|.{}: (tiff)
| | | error: tiff: error at position 0x27: ifd loop detected for 33
0x00|49 49 2a 00 |II*. | endian: "little-endian" (0x49492a00)