-
-
Notifications
You must be signed in to change notification settings - Fork 0
BTLZH10
Attempt to specify the BTLZH 1.0 bitstream.
BTLZH 1.0 is a subset of BTLZA, which will be considered as "effectively frozen".
The BTLZH bitstream is a superset of the Deflate bitstream, and thus decoders for BTLZH are effectively backwards-compatible with Deflate and Deflate64.
Notes:
- BTLZH 1.0 will be limited to a 1MB sliding window.
- A decoder should be able to handle a window this big.
- An encoder shall not use a window larger than this size, or emit references outside this range.
- 0xWMFF
- W=Log2 Window Size minus 8
- M=Method
- 8=Deflate
- 9=Deflate64
- A=BTLZH
- FF=Flags and Check
- 0bLLDCCCCC
- LL=Level, 0=Fastest, 3=Best Compression
- D=Preset Dictionary
- CC=Check Value
- 0bLLDCCCCC
For methods 8 and 9, the compressed data is solely in Deflate or Deflate64 mode. Compressed data may not contain BTLZH specific blocks, even if it is to be decoded with a BTLZH decoder.
For method A, the data is in BTLZH Mode.
In these modes, the compressed data is to be followed by an Adler32 checksum of the expected output.
If the preset dictionary flag is set, the header is followed by an Adler32 checksum which may be used to identify the dictionary. The contents of this dictionary are not defined here, and are specific to the use-case. In the general case, use of this feature is invalid and undefined.
- 0x8M (Primary)
- 0xBM (Alternate)
- Only contains a method number, and is a single byte.
- Does not contain a checksum.
- Default Window Sizes:
- 32kB for Deflate
- 64kB for Deflate64
- 1MB for BTLZA / BTLZH
- Interferes with the normal Zlib header for Deflate64.
- Alternate may be used if the primary would pass the Mod31 check.
- Alternate will likewise be subject to the Mod31 check.
- However, it will not occur that both would pass the Mod31 check.
- Alternate is only valid if primary would pass the Mod31 check.
Blocks will be tagged using a 1-bit final-block indicator followed by a 2 bit block-type ID. Bits will be read from bytes starting from the LSB.
If the final-block bit is set, then this is the final block in this bitstream. Otherwise, more blocks will follow after this block.
Block Types will use a 2-bit ID:
- 0: Stored Data
- 1: Fixed Table
- 2: Dynamic Table
- 3: Escape Case
- 3-bit secondary type follows.
- 0=Stored Data 2
- 1=Arithmetic Mode Header (Invalid in BTLZH)
- 2=Dynamic Table 2
- 3=Fixed Table 2
- 4-7=Reserved / Invalid
If in Deflate64 Mode, the blocks will be nearly identical, but with a few slight differences:
- The dictionary will be extended to 64kB.
- The last 2 entries in the distance table will be used for these lengths.
- Symbol 285 will have 16 extra bits.
- It will encode a value from 3 to 65538.
Stored data is stored in an uncompressed form. The bitstream is aligned to the next byte, and the stored data will be represented in terms of raw bytes.
- len:WORD, nLen:WORD, data:byte[len]
- len: Length of stored data
- nLen: ~len
- data: raw byte data.
Stored data is stored without applying LZ or Huffman compression.
These will be stored as a length followed by the data bytes (encoded as sequences of 8 bits).
- Len:Dist(6)
- Bytes(Len*8)
This form of store represents the data within the bitstream, and as such the logical bytes need not be aligned to a byte address. Stored data will be present as part of the dictionary.
In this mode, the Huffman table will used fixed code-lengths.
- 0-143: 8 bits
- 144-255: 9 bits
- 256-279: 7 bits
- 280-287: 8 bits
Similar to the normal Fixed Huffman Table, except that a table-ID is given:
- Hl(3): TableID, Literal (TableID_Lit)
- Hd(2): TableID, Distance (TableID_Dist)
TableID, Literal:
- 0, Same literal table as before, expanded distance table
- 0-143: 8 bits
- 144-255: 9 bits
- 256-279: 7 bits
- 280-287: 8 bits
- Match lengths are limited to 386 bytes.
- 1, Expanded A.
- 0-127: 8 bits
- 128-383: 9 bits
- 0xxxxxx: 0-127 8 bit
- 1xxxxxxx: 128-383
- 2, Expanded B.
- 0-255: 9 bits
- 256-383: 8 bits
- 0xxxxxx: 256-383 8 bit
- 1xxxxxxx: 0-255
- 3-6, Reserved
- 7, Reuse tables from prior block
- 0:
- 0-63: 6 bits (Window: Full Range)
- 1:
- 0-31: 5 bits (Window: 64kB)
- 2:
- 0-15: 4 bits (Window: 256 bytes)
- 3:
- 0-7: 3 bits (Window: 16 bytes)
- Reuse Literal and Distance tables from prior block.
- TableID_Dist=0
- For piecewise streams, this may reference tables from the prior stream segment.
- Hl(5) bits: Number of coded literal symbols (minus 257, 257-288).
- Hd(5) bits: Number coded distance symbols (minus 1, 1-32).
- Hc(4) bits: Number of header symbols (minus 4, 4-19).
- Hcl(Hc*3): Header symbol code lengths
- Encoded in the following order:
- 16, 17, 18, 0, 8,7, 9,6, 10,5, 11,4, 12,3, 13,2, 14,1, 15, 19
- Encoded in the following order:
- Hcl(Hc*3): Header symbol code lengths
- 0-15: Code Length
- 16: Rc(2), Repeat prior code length 3-7 times.
- 17: Zc(3), Code length of 0 for 3 to 10 times.
- 18: Zc(7), Code length of 0 for 11 to 138 times.
- 19: Uc(1), Increment/Decrement prior length, 1=+1, 0=-1 (BTLZH Specific)
Similar to the normal Dynamic Huffman Table, except for a few sizes and similar:
- Hr(3): Reserved, must be 0
- Hl(7): Number of coded literal symbols (minus 257, 257-384).
- Hd(6): Number coded distance symbols (minus 1, 1-64).
- Hc(5): Number of header symbols (minus 4, 4-35).
- Hcl(Hc*3): Header symbol code lengths
The block header is followed by LZ coded block data.
Will consist of a mix of encoded LZ77 runs and literal bytes. A block of coded data will be terminated by an End Of Block marker.
Symbol Ranges
- 0-255: Literal Byte Values
- 256: End Of Block
- 257-285: LZ Runs (Deflate and BTLZH Modes)
- 286-317: LZ Runs (BTLZH Blocks Only)
- 318-333: Extended Runs (BTLZH Blocks Only)
- 1: Reuse Prior Length and Distance
- 2: Reuse Prior Length, New Distance Follows
- 3: Prior Prior Distance, New Length Follows (As a literal encoding a length).
- 334-383: Unused/Invalid
Say, if we encode:
- A, B, (L=6, D=2)
- A, B, A, B, A, B, A, B
The general pattern consists of a Huffman coded prefix followed by a certain number of extra bits. Length prefixes are encoded using the Literal table, and distance prefixes are encoded using the distance prefix table.
Run Length A (BTLZH):
Run Length A(*) (Prefix, Range, Extra Bits): 257-264, 3- 10, 0 265-268, 11- 18, 1 269-272, 19- 34, 2 273-276, 35- 66, 3 277-280, 67-130, 4 281-284, 131-258, 5 - 285-288, 259- 514, 6 289-292, 515- 1026, 7 293-296, 1027- 2050, 8 297-300, 2051- 4098, 9 301-304, 4099- 8194, 10 305-308, 8195-16386, 11 309-312, 16387-32770, 12 313-316, 32771-65537, 13 317, 65538 , 0 ...
Run Length B (Deflate and Deflate64 blocks):
Run Length B(*) (Prefix, Range, Extra Bits): 257-264, 3- 10, 0 265-268, 11- 18, 1 269-272, 19- 34, 2 273-276, 35- 66, 3 277-280, 67-130, 4 281-284, 131-258, 5 285, 258|3-65538, 0|16
The type of length table used will depend on the block type.
- Fixed Huffman 2 and Dynamic Huffman 2 will use A.
- Fixed Huffman and Dynamic Huffman will used B.
- The encoding of symbol 285 will depend on the mode:
- Deflate Mode: 258, 0
- Deflate64 Mode: 3-65538, 16
- BTLZH Mode: 3-65538, 16
- Symbols beyond 285 are not used in Deflate Mode Blocks.
- The encoding of symbol 285 will depend on the mode:
Run Distance (Prefix, Range, Extra Bits) 0- 3, 1- 4, 0 4/ 5, 5- 8, 1 6/ 7, 9- 16, 2 8/ 9, 17- 32, 3 10/11, 33- 64, 4 12/13, 65- 128, 5 14/15, 129- 256, 6 16/17, 257- 512, 7 18/19, 513- 1024, 8 20/21, 1025- 2048, 9 22/23, 2049- 4096, 10 24/25, 4097- 8192, 11 26/27, 8193-16384, 12 28/29, 16385-32768, 13 30/31, 32769-65536, 14 - 32/33, 65537-131072, 15 34/35, 131073-262144, 16 ...
Likewise for extended runs:
Extended Runs 318-321, 1- 4, 0 322/323, 5- 8, 1 324/325, 9- 16, 2 326/327, 17- 32, 3 328/329, 33- 64, 4 330/331, 65- 128, 5 332/333, 129- 256, 6
Extended runs treat the Variable-Length value as an opcode for a command. In this format, only 1-3 are valid.