Skip to content

Overflow pages

Volodkin Vladislav edited this page Jan 28, 2023 · 2 revisions

Motivation

In the current implementation of a b-tree node can store only a fixed amount of bytes, which is defined at the tree initialisation. This limit is defined by size of a page holding node. This implies a limit on the tuple (key + value) size, which would guarantee that N of such tuples would fit into a single page (where N is a degree of a b-tree). It is inefficient to make one page big enough to hold all the data of a practically unlimited tuples (<2^32 in length, or 4gb). This would not only mean that this GB-sized pages are loaded into RAM, but also that space on a disk is wasted if most of the tuples are small in size.

Problem

Need to store tuples which size is not limited to a page size.

Design

Here overflow pages will be introduced. Now tuples belonging to a node may be stored in several pages. These pages form a linked list. Node is still logically assigned with a single page and it's ID, but this page may reference another (overflow) page, which may also reference another page and so on. Overflow page will hold 'tail' of a tuple which did not fit into main page.

Equally limiting sizes of all of the cells in the main page

It is logical to hold prefixes of the same length in a main page as we will only need to load overflowed part of the key if two prefixes are identical. Though, it is impractical to do that in all of the overflow pages as it may require too much of them in case of a single long key (e.g, with 4096 page size and a tree degree of 10, we will have around 400 bytes per cell in one page, which will require 9 overflow pages for a single 4000-byte key).

Given that, we can limit keys equally in main page, allowing one key to take almost all of the place in the overflow page. I say almost, because overflow page must still always keep a little space for all of the N (tree degree) keys essential information, specifically:

  1. 8 bytes for start/end offsets;
  2. 4 bytes for cell’s overflowed part id in the next overflow page.

Layout

  1. File header: version, root node id
  2. Main page header: flags, cells count, overflow page id
  3. Overflow page header: cells count, overflow page id
  4. Main “cell header”: key length [uint32], tuple id in overflow page
  5. Overflow “cell header”: tuple id in overflow page
  6. Main page index: array of cell indexes (cells in the main page have fixed size)
  7. Overflow page index: array of [startOffset, endOffset] <no specific order, ids just correspond to those from referencing page>
  8. Internal pages also hold children right after offsets

Operations

Add

  1. Try find enough place for all of the data (overflow id, key len, key, value)
    1. Defragmentation may take place here
  2. Find at least 8 bytes for (overflow id, key len), we may find more free space here
    1. Defragmentation may also take place here
  3. Allocate OF page if inexistent
  4. Repeat 1-3 for OF page with data that is left

Remove:

  1. [starting from the last OF page in the list] Remove offset from the index
  2. Free page if that was the last offset
  3. Repeat 1-2 for previous OF pages

Update [simple version]:

  1. Remove
  2. Add