Commit

docs: add fragment diagram
wjones127 committed Jul 10, 2023
1 parent 86aa5d2 commit da0166d
Showing 2 changed files with 17 additions and 2 deletions.
14 changes: 14 additions & 0 deletions docs/format.rst
@@ -24,6 +24,9 @@ A ``Manifest`` file includes the metadata to describe a version of the dataset.
:start-at: // Manifest is
:end-at: } // Manifest

Fragments
~~~~~~~~~

``DataFragment`` represents a chunk of data in the dataset. It includes one or more ``DataFile`` entries,
where each ``DataFile`` can contain several columns of that chunk. It may also include a
``DeletionFile``, which is explained in a later section.
@@ -35,6 +38,17 @@ where each ``DataFile`` can contain several columns in the chunk of data. It als
:end-at: } // DataFile


The overall structure of a fragment is shown below. One or more data files store
the columns of a fragment. New columns can be added to a fragment by adding new
data files. The deletion file, if present, records the rows that have been
deleted from the fragment.

.. image:: _static/fragment_structure.png
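
The structure described above can be sketched in a few lines of Python. This is
only an illustrative model, not the library's API: the class names mirror the
protobuf messages, but ``add_columns``, ``all_field_ids``, ``live_rows``, the
``deleted_rows`` set standing in for the deletion file, and the file paths are
all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    fields: list[int]  # field ids stored in this file, in column order


@dataclass
class Fragment:
    id: int
    num_rows: int
    files: list[DataFile] = field(default_factory=list)
    # Hypothetical stand-in for the deletion file: local indices of deleted rows.
    deleted_rows: set[int] = field(default_factory=set)

    def add_columns(self, path: str, field_ids: list[int]) -> None:
        # New columns arrive as a new data file; existing files are untouched.
        self.files.append(DataFile(path, list(field_ids)))

    def all_field_ids(self) -> list[int]:
        # Every field id covered by the fragment, across all of its data files.
        return [fid for f in self.files for fid in f.fields]

    def live_rows(self) -> list[int]:
        # Local row indices that have not been marked as deleted.
        return [i for i in range(self.num_rows) if i not in self.deleted_rows]


# Hypothetical paths, for illustration only.
frag = Fragment(id=0, num_rows=4, files=[DataFile("data-0.lance", [0, 1])])
frag.add_columns("data-1.lance", [2])  # add a third column via a second file
frag.deleted_rows.add(2)               # mark local row 2 as deleted
```

In this model, adding a column never rewrites ``data-0.lance``, and deleting a
row only touches the deletion record.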

Every row has a unique id, a u64 composed of two u32s: the fragment id and the
local row id. The local row id is the index of the row within the fragment's
data files.
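
As a minimal sketch of how two u32s can be packed into one u64, the helpers
below place the fragment id in the upper 32 bits and the local row id in the
lower 32 bits. That bit layout is an assumption made for illustration; the text
above only says the id is composed of the two parts. The function names
``make_row_id`` and ``split_row_id`` are hypothetical.

```python
U32_MAX = 0xFFFF_FFFF


def make_row_id(fragment_id: int, local_row_id: int) -> int:
    """Pack two u32 values into a single u64 row id."""
    if not (0 <= fragment_id <= U32_MAX and 0 <= local_row_id <= U32_MAX):
        raise ValueError("both parts must fit in a u32")
    # Assumption: fragment id in the high 32 bits, local row id in the low 32.
    return (fragment_id << 32) | local_row_id


def split_row_id(row_id: int) -> tuple[int, int]:
    """Recover (fragment_id, local_row_id) from a packed row id."""
    return row_id >> 32, row_id & U32_MAX


row_id = make_row_id(3, 7)             # row at index 7 of fragment 3
fragment_id, local_row_id = split_row_id(row_id)
```

With this layout, ids from the same fragment are contiguous and sort by local
row index.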

File Structure
--------------

5 changes: 3 additions & 2 deletions protos/format.proto
@@ -141,8 +141,9 @@ message DataFile {
string path = 1;
// The ids of the fields/columns in this file.
//
// This must be equal in length to the number of columns in the file. The order
// of the ids determines the mapping from field id to column position.
// This must be equal in length to the number of columns in the file.
// The order of the ids determines the mapping from field id to column
// position.
repeated int32 fields = 2;
} // DataFile
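
To illustrate the comment on ``fields``: the position of each id in the list
gives the column position of that field in the file. The Python sketch below
shows one way to read that rule; ``column_positions`` and its ``num_columns``
parameter are hypothetical helpers, not part of the format or any library.

```python
def column_positions(fields: list[int], num_columns: int) -> dict[int, int]:
    """Map each field id to its column position within one data file."""
    # The list must have exactly one id per column in the file.
    if len(fields) != num_columns:
        raise ValueError("fields must have one id per column in the file")
    # The i-th id in the list names the field stored in the i-th column.
    return {field_id: position for position, field_id in enumerate(fields)}


# A file with three columns holding fields 4, 0 and 2, in that order.
positions = column_positions([4, 0, 2], num_columns=3)
```

Here field 4 is read from column 0, field 0 from column 1, and field 2 from
column 2.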

