
Commit

Add CNAME record for custom domain and fix minor typos (#13)
jackye1995 authored Nov 18, 2024
1 parent d17c9f1 commit 1435b91
Showing 7 changed files with 17 additions and 11 deletions.
1 change: 1 addition & 0 deletions docs/CNAME
@@ -0,0 +1 @@
trinitylake.io
17 changes: 11 additions & 6 deletions docs/format/location.md
@@ -5,7 +5,7 @@
A Trinity LakeHouse should be created at a location that we call **Root Location**.

Although TrinityLake in general does not depend on the directory concept in file system,
The root location is expected to behave like a directory where all files in the LakeHouse is stored in
The root location is expected to behave like a directory where all files in the LakeHouse are stored in
locations that have the root location as the prefix.

To avoid user confusion, we will always treat the root location as ending with a `/` even when the user input does not.
@@ -26,19 +26,24 @@ then the location value stored in the TrinityLake format should be `my-table-def

## File Name Size

All files stored in TrinityLake format must have a maximum file name size defined in the [LakeHouse definition file](./lakehouse.md).
All files stored in TrinityLake format must have a maximum file name size
defined in the [LakeHouse definition file](./lakehouse.md).

## Optimized File Name

A file name in the TrinityLake format is designed for optimized performance in storage.
Given an **Original File Name**, the **Optimized File Name** in storage can be calculated as the following:

1. Calculate the MurMur3 hash of the file name in bytes.
2. Get the first 20 bytes and convert it to binary string representation and use it as the prefix. This maximizes the throughput in object storages like S3.
3. For the first, second and third group of 4 characters in the prefix, further separated with `/`. This maximizes the throughput in file systems like HDFS if a full directory listing at root location is necessary.
2. Get the first 20 bytes and convert it to binary string representation and use it as the prefix.
This maximizes the throughput in object storages like S3.
3. For the first, second and third group of 4 characters in the prefix, further separated with `/`.
This maximizes the throughput in file systems like HDFS if a full directory listing at root location is necessary.
4. Concatenate the prefix before the original file name using the `-` character.

For example, an original file name `my-table-definition.binpb` will be transformed to `0101/0101/0101/10101100-my-table-definition.binpb`.
For example, an original file name `my-table-definition.binpb` will be transformed to
`0101/0101/0101/10101100-my-table-definition.binpb`.

Note that not all the file names will be optimized in this way.
A few system-internal files such as the [root node file](./storage.md#root-node-file-name) will not be stored using this scheme.
A few system-internal files such as the [root node file](./transaction.md#root-node-file-name)
will not be stored using this scheme.
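
The four steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions that the text does not confirm: the hash is the 32-bit MurmurHash3 variant with seed 0, the file name is encoded as UTF-8, and "the first 20 bytes" is read as the 20 most significant bits, since the example prefix `0101/0101/0101/10101100` contains exactly 20 binary characters. The function names `murmur3_32` and `optimized_file_name` are hypothetical and do not come from the TrinityLake code base.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit (variant and seed are assumptions, see above)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4

    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: mix the remaining 0 to 3 bytes.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def optimized_file_name(original: str) -> str:
    """Build '<4 bits>/<4 bits>/<4 bits>/<8 bits>-<original file name>'."""
    # Step 1: hash the file name bytes (UTF-8 is an assumption).
    digest = murmur3_32(original.encode("utf-8"))
    # Step 2: take the 20 most significant bits as a binary string.
    bits = format(digest, "032b")[:20]
    # Step 3: split off the first three groups of 4 characters with '/'.
    prefix = "/".join([bits[0:4], bits[4:8], bits[8:12], bits[12:20]])
    # Step 4: join the prefix and the original name with '-'.
    return f"{prefix}-{original}"


print(optimized_file_name("my-table-definition.binpb"))
```

The printed value has the same shape as the documented example, but its bits depend on the variant and seed chosen here, so it should not be expected to match `0101/0101/0101/10101100` exactly.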
2 changes: 1 addition & 1 deletion docs/format/storage.md
@@ -11,7 +11,7 @@ Each node file is in the [Apache Arrow IPC format](https://arrow.apache.org/docs
| 2 | pvalue | String | File location pointer to the value of the key, following [Location Specification](./location.md) | no | |
| 3 | pnode | String | File location pointer to the value of the node, following [Location Specification](./location.md) | no | |

## System-Reserved Rows for Root Node
## System-Internal Rows for Root Node

System-internal keys such as `lakehouse` will appear as the top rows in the file.
Such keys do not exist in non-root node, and do not participate in the tree storage algorithm.
1 change: 0 additions & 1 deletion docs/format/table/table-schema.md
@@ -1,3 +1,2 @@
# Table Schema

TODO
File renamed without changes
6 changes: 3 additions & 3 deletions docs/format/tree/search-tree-map.md
@@ -1,6 +1,6 @@
# Search Tree Map

A search tree can not only be used as the implementation of a set, but also a key-value map.
A search tree can not only be used as the implementation of a set, but also a key-value **Map**.
For database system applications of search trees like TrinityLake,
the pointer is typically stored as the value in map, which points to a much larger payload in memory.

@@ -15,8 +15,8 @@ then a 3-way search tree could look like the following:

A N-way search tree map can be persisted in storage.
Here we introduce one way to store it that is used in TrinityLake.
We will store each node of the search tree as a tabular file, that we call **Node Files**, with the following shape:
We will store each node of the search tree as a tabular file, that we call **Node Files**.

For example, using this mechanism, the previous 3-way search tree could look like the following 4 files in S3:

![search-tree-storage](search-tree-storage.png)
![search-tree-map-storage](search-tree-map-storage.png)
1 change: 1 addition & 0 deletions docs/format/versioning.md
@@ -36,6 +36,7 @@ until a past version is declared as deprecated.

Because of the backward and forward compatibility requirement, minor and patch versions are for information only
so that people can know what has been changing and update their implementations accordingly.
This is also why only the major version is directly recorded in the [LakeHouse definition](./lakehouse.md).

It is recommended that format implementations explicitly check the format version and
fail the reader accordingly for unsupported future format version.
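
One way a reader might apply this recommendation is sketched below. It assumes that the major version recorded in the LakeHouse definition is available as a plain integer. The names `SUPPORTED_MAJOR_VERSION`, `UnsupportedFormatVersionError`, and `check_format_version` are hypothetical and used only for illustration.

```python
# Highest major format version this hypothetical reader understands.
SUPPORTED_MAJOR_VERSION = 1


class UnsupportedFormatVersionError(Exception):
    """Raised when a LakeHouse uses a newer major format version."""


def check_format_version(recorded_major: int,
                         supported_major: int = SUPPORTED_MAJOR_VERSION) -> None:
    """Fail fast if the recorded major version is newer than supported."""
    if recorded_major > supported_major:
        raise UnsupportedFormatVersionError(
            f"format major version {recorded_major} is newer than "
            f"supported version {supported_major}"
        )


# A reader would call this before interpreting any other file.
check_format_version(1)
```

Only the major version is compared, in line with the statement that minor and patch versions are informational.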
