Add CNAME record for custom domain and fix minor typos (#13)

trinitylake-io · Nov 18, 2024 · 1435b91
1 parent d17c9f1
commit 1435b91
Showing 7 changed files with 17 additions and 11 deletions.
diff --git a/docs/CNAME b/docs/CNAME
@@ -0,0 +1 @@
+trinitylake.io
diff --git a/docs/format/location.md b/docs/format/location.md
@@ -5,7 +5,7 @@
 A Trinity LakeHouse should be created at a location that we call **Root Location**.
 
 Although TrinityLake in general does not depend on the directory concept in file system,
-The root location is expected to behave like a directory where all files in the LakeHouse is stored in
+The root location is expected to behave like a directory where all files in the LakeHouse are stored in
 locations that have the root location as the prefix.
 
 To avoid user confusion, we will always treat the root location as ending with a `/` even when the user input does not.
@@ -26,19 +26,24 @@ then the location value stored in the TrinityLake format should be `my-table-def
 
 ## File Name Size
 
-All files stored in TrinityLake format must have a maximum file name size defined in the [LakeHouse definition file](./lakehouse.md).
+All files stored in TrinityLake format must have a maximum file name size 
+defined in the [LakeHouse definition file](./lakehouse.md).
 
 ## Optimized File Name
 
 A file name in the TrinityLake format is designed for optimized performance in storage.
 Given an **Original File Name**, the **Optimized File Name** in storage can be calculated as the following:
 
 1. Calculate the MurMur3 hash of the file name in bytes.
-2. Get the first 20 bytes and convert it to binary string representation and use it as the prefix. This maximizes the throughput in object storages like S3.
-3. For the first, second and third group of 4 characters in the prefix, further separated with `/`. This maximizes the throughput in file systems like HDFS if a full directory listing at root location is necessary.
+2. Get the first 20 bytes and convert it to binary string representation and use it as the prefix. 
+   This maximizes the throughput in object storages like S3.
+3. For the first, second and third group of 4 characters in the prefix, further separated with `/`. 
+   This maximizes the throughput in file systems like HDFS if a full directory listing at root location is necessary.
 4. Concatenate the prefix before the original file name using the `-` character.
 
-For example, an original file name `my-table-definition.binpb` will be transformed to `0101/0101/0101/10101100-my-table-definition.binpb`.
+For example, an original file name `my-table-definition.binpb` will be transformed to 
+`0101/0101/0101/10101100-my-table-definition.binpb`.
 
 Note that not all the file names will be optimized in this way.
-A few system-internal files such as the [root node file](./storage.md#root-node-file-name) will not be stored using this scheme.
+A few system-internal files such as the [root node file](./transaction.md#root-node-file-name) 
+will not be stored using this scheme.
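
Note on the **Optimized File Name** steps in the hunk above: a minimal Python sketch of how they could be implemented, not the project's actual code. It rests on several assumptions. The text says "first 20 bytes", but the example prefix `0101/0101/0101/10101100` has 20 binary characters, so the sketch takes the first 20 *bits* of the hash. It also assumes the 32-bit MurmurHash3 variant with seed 0 and UTF-8 encoding of the file name, none of which the text specifies. `murmur3_32` and `optimized_file_name` are hypothetical names.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 (x86, 32-bit). Variant and seed are assumptions."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    rounded = n - (n % 4)

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        return (k * c2) & mask

    # Body: process the input in 4-byte little-endian blocks.
    for i in range(0, rounded, 4):
        h ^= mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[rounded:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def optimized_file_name(original: str) -> str:
    # Step 1: hash the file name bytes (UTF-8 is an assumption).
    h = murmur3_32(original.encode("utf-8"))
    # Step 2: first 20 bits as a binary string (assumed reading of "20 bytes").
    bits = format(h, "032b")[:20]
    # Step 3: split off the first three groups of 4 characters with "/".
    prefix = "/".join([bits[0:4], bits[4:8], bits[8:12], bits[12:20]])
    # Step 4: prepend the prefix to the original name using "-".
    return f"{prefix}-{original}"


if __name__ == "__main__":
    print(optimized_file_name("my-table-definition.binpb"))
```

The printed prefix will generally differ from the `0101/0101/0101/10101100` shown in the documentation, which appears to be illustrative rather than a computed hash.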
diff --git a/docs/format/storage.md b/docs/format/storage.md
@@ -11,7 +11,7 @@ Each node file is in the [Apache Arrow IPC format](https://arrow.apache.org/docs
 | 2  | pvalue | String     | File location pointer to the value of the key, following [Location Specification](./location.md)  | no        |         |
 | 3  | pnode  | String     | File location pointer to the value of the node, following [Location Specification](./location.md) | no        |         |
 
-## System-Reserved Rows for Root Node
+## System-Internal Rows for Root Node
 
 System-internal keys such as `lakehouse` will appear as the top rows in the file.
 Such keys do not exist in non-root node, and do not participate in the tree storage algorithm.

diff --git a/docs/format/table/table-schema.md b/docs/format/table/table-schema.md
@@ -1,3 +1,2 @@
 # Table Schema
 
-TODO
diff --git a/docs/format/tree/search-tree-storage.png b/docs/format/tree/search-tree-map-storage.png
diff --git a/docs/format/tree/search-tree-map.md b/docs/format/tree/search-tree-map.md
@@ -1,6 +1,6 @@
 # Search Tree Map
 
-A search tree can not only be used as the implementation of a set, but also a key-value map.
+A search tree can not only be used as the implementation of a set, but also a key-value **Map**.
 For database system applications of search trees like TrinityLake,
 the pointer is typically stored as the value in map, which points to a much larger payload in memory.
 
@@ -15,8 +15,8 @@ then a 3-way search tree could look like the following:
 
 A N-way search tree map can be persisted in storage.
 Here we introduce one way to store it that is used in TrinityLake.
-We will store each node of the search tree as a tabular file, that we call **Node Files**, with the following shape:
+We will store each node of the search tree as a tabular file, that we call **Node Files**.
 
 For example, using this mechanism, the previous 3-way search tree could look like the following 4 files in S3:
 
-![search-tree-storage](search-tree-storage.png)
+![search-tree-map-storage](search-tree-map-storage.png)
diff --git a/docs/format/versioning.md b/docs/format/versioning.md
@@ -36,6 +36,7 @@ until a past version is declared as deprecated.
 
 Because of the backward and forward compatibility requirement, minor and patch versions are for information only
 so that people can know what has been changing and update their implementations accordingly.
+This is also why only the major version is directly recorded in the [LakeHouse definition](./lakehouse.md).
 
 It is recommended that format implementations explicitly check the format version and 
 fail the reader accordingly for unsupported future format version.
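
Note on the version check recommended in the hunk above: a minimal Python sketch, assuming the reader has already parsed the major version from the LakeHouse definition. `MAX_SUPPORTED_MAJOR_VERSION`, `UnsupportedFormatVersionError`, and `check_format_version` are hypothetical names, and the value `1` is a placeholder rather than a real TrinityLake version.

```python
# Hypothetical: the highest major format version this reader understands.
MAX_SUPPORTED_MAJOR_VERSION = 1


class UnsupportedFormatVersionError(Exception):
    """Raised when a LakeHouse uses a newer major format version."""


def check_format_version(major_version: int) -> None:
    # Minor and patch versions are informational only, so only the
    # major version is compared here.
    if major_version > MAX_SUPPORTED_MAJOR_VERSION:
        raise UnsupportedFormatVersionError(
            f"format major version {major_version} is newer than the "
            f"supported maximum {MAX_SUPPORTED_MAJOR_VERSION}"
        )
```

Calling `check_format_version` before any other read lets a reader fail early instead of misinterpreting files written in a future format.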