Skip to content
Brendan G Bohannon edited this page Dec 10, 2021 · 1 revision

BtRP2 is a yet another byte-based LZ77 variant.

Its design takes some inspiration from the EA RefPack format, but changes were made in an effort to improve compression ratio and decoder performance. Another goal is to try to avoid becomming excessively complicated.

BtRP2 (Transposed, LE):

  • dddddddd-dlllrrr0 (l=3..10, d=0..511, r=0..7)
  • dddddddd-dddddlll-lllrrr01 (l=4..67, d=0..8191, r=0..7)
  • dddddddd-dddddddd-dlllllll-llrrr011 (l=4..515, d=0..131071, r=0..7)
  • rrrr0111 (Raw Bytes, r=(r+1)*8, 8..128)
  • rrr01111 (Long Match, *1)
  • rr011111 (r=1..3 bytes, 0=EOB)
  • rrrrrrrr-r0111111 (Long Raw, r=(r+1)*8, 8..4096)
    • d: Distance
    • l: Match Length
    • r: Literal Length
Values are encoded in little-endian order, with tag bits located in the LSB. Bits will be contiguous within the value, with shift-and-mask being used to extract individual elements.

1: Long Match will encode length and distance using variable-length encodings directly following the initial tag byte.

Length VLN:

  • lllllll0, 4.. 131
  • llllllll-llllll01, 132..16383
Distance VLN:
  • dddddddd-ddddddd0, 32K (0..32767)
  • dddddddd-dddddddd-dddddd01, 4M
While there are both more space efficient and faster ways to handle Length/Distance VLNs (such as via a combined encoding), this encoding is "reasonable" and the Long Match case appears to be relatively less common. The above encoding can be trivially extended to support larger values.

Note the lack of length/distance predictors: While predictors can be useful, their effectiveness in byte-oriented encoders is limited, and supporting these cases tends to have a detrimental effect on performance (they make more sense in Entropy-coded designs).

Simple File Format:

  • 0: 'RP2A'
  • 4: Compressed Size
  • 8: Uncompressed Size
  • 12: Checksum
  • Compressed data starts at 16.
This format will not attempt to deal with chunking or streaming.
Clone this wiki locally