apache · jerryshao · Feb 3, 2024 · Feb 2, 2024 · Feb 3, 2024 · jerryshao
diff --git a/docs/expression.md b/docs/expression.md
@@ -0,0 +1,135 @@
+---
+title: "Expression system of Gravitino"
+slug: /expression
+date: 2024-02-02
+keyword: expression function field literal reference
+license: Copyright 2024 Datastrato Pvt Ltd. This software is licensed under the Apache License version 2.
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This page introduces the expression system of Gravitino. Expressions are vital component of metadata definition, through expressions, you can define default values for columns and function arguments for [function partitioning](./table-partitioning-bucketing-sort-order-indexes.md#table-partitioning), [bucketing](./table-partitioning-bucketing-sort-order-indexes.md#table-bucketing), and sort term of [sort ordering](./table-partitioning-bucketing-sort-order-indexes.md#sort-ordering) in tables.
+Gravitino expression system divides expressions into three basic parts: field reference, literal, and function. Function expressions can contain field references, literals, and other function expressions.
+
+## Field reference
+
+Field reference is a reference to a field in a table.
+The following is an example of creating a field reference expression, demonstrating how to create a reference for the `student` field.
+
+<Tabs>
+  <TabItem value="Json" label="Json">
+
+```json
+[
+  {
+    "type": "field",
+    "fieldName": [
+      "student"
+    ]
+  }
+]
+```
+
+  </TabItem>
+  <TabItem value="java" label="Java">
+
+```java
+NamedReference field = NamedReference.field("student");
+```
+
+  </TabItem>
+</Tabs>
+
+## Literal
+
+Literal is a constant value.
+The following is an example of creating a literal expression, demonstrating how to create three different data types of literal expressions for the value `1024`.
+
+<Tabs>
+  <TabItem value="Json" label="Json">
+
+```json
+[
+  {
+    "type": "literal",
+    "dataType": "integer",
+    "value": "1024"
+  },
+  {
+    "type": "literal",
+    "dataType": "string",
+    "value": "1024"
+  },
+  {
+    "type": "literal",
+    "dataType": "decimal(10,2)",
+    "value": "1024"
+  }
+]
+```
+
+  </TabItem>
+  <TabItem value="java" label="Java">
+
+```java
+Literal<?>[] literals =
+    new Literal[] {
+    Literals.integerLiteral(1024),
+    Literals.stringLiteral("1024"),
+    Literals.decimalLiteral(Decimal.of("1024", 10, 2))
+    };
+```
+
+  </TabItem>
+</Tabs>
+
+## Function expression
+
+Function expression represents a function call with/without arguments. The arguments can be field references, literals, or other function expressions.
+The following is an example of creating a function expression, demonstrating how to create function expressions for `rand()` and `date_trunc('year', birthday)`.
+
+<Tabs>
+  <TabItem value="Json" label="Json">
+
+```json
+[
+  {
+    "type": "function",
+    "funcName": "rand",
+    "funcArgs": []
+  },
+  {
+    "type": "function",
+    "funcName": "date_trunc",
+    "funcArgs": [
+      {
+        "type": "literal",
+        "dataType": "string",
+        "value": "year"
+      },
+      {
+        "type": "field",
+        "fieldName": [
+          "birthday"
+        ]
+      }
+    ]
+  }
+]
+```
+
+  </TabItem>
+  <TabItem value="java" label="Java">
+
+```java
+FunctionExpression[] functionExpressions =
+        new FunctionExpression[] {
+          FunctionExpression.of("rand"),
+          FunctionExpression.of("date_trunc", Literals.stringLiteral("year"), NamedReference.field("birthday"))
+        };
+```
+
+  </TabItem>
+</Tabs>
+
diff --git a/docs/table-partitioning-bucketing-sort-order-indexes.md b/docs/table-partitioning-bucketing-sort-order-indexes.md
@@ -22,76 +22,41 @@ To create a partitioned table, you should provide the following two components t
 The `score`, `createTime`, and `city` appearing in the table below refer to the field names in a table.
 :::
 
-| Partitioning strategy | Description                                                    | JSON example                                                     | Java example                                           | Equivalent SQL semantics              |
-|-----------------------|----------------------------------------------------------------|------------------------------------------------------------------|--------------------------------------------------------|---------------------------------------|
-| `identity`            | Source value, unmodified.                                      | `{"strategy":"identity","fieldName":["score"]}`                  | `Transforms.identity("score")`                         | `PARTITION BY score`                  |
-| `hour`                | Extract a timestamp hour, as hours from '1970-01-01 00:00:00'. | `{"strategy":"hour","fieldName":["createTime"]}`                 | `Transforms.hour("createTime")`                        | `PARTITION BY hour(createTime)`       |
-| `day`                 | Extract a date or timestamp day, as days from '1970-01-01'.    | `{"strategy":"day","fieldName":["createTime"]}`                  | `Transforms.day("createTime")`                         | `PARTITION BY day(createTime)`        |
-| `month`               | Extract a date or timestamp month, as months from '1970-01-01' | `{"strategy":"month","fieldName":["createTime"]}`                | `Transforms.month("createTime")`                       | `PARTITION BY month(createTime)`      |
-| `year`                | Extract a date or timestamp year, as years from 1970.          | `{"strategy":"year","fieldName":["createTime"]}`                 | `Transforms.year("createTime")`                        | `PARTITION BY year(createTime)`       |
-| `bucket[N]`           | Hash of value, mod N.                                          | `{"strategy":"bucket","numBuckets":10,"fieldNames":[["score"]]}` | `Transforms.bucket(10, "score")`                       | `PARTITION BY bucket(10, score)`      |
-| `truncate[W]`         | Value truncated to width W.                                    | `{"strategy":"truncate","width":20,"fieldName":["score"]}`       | `Transforms.truncate(20, "score")`                     | `PARTITION BY truncate(20, score)`    |
-| `list`                | Partition the table by a list value.                           | `{"strategy":"list","fieldNames":[["createTime"],["city"]]}`     | `Transforms.list(new String[] {"createTime", "city"})` | `PARTITION BY list(createTime, city)` |
-| `range`               | Partition the table by a range value.                          | `{"strategy":"range","fieldName":["createTime"]}`                | `Transforms.range("createTime")`                       | `PARTITION BY range(createTime)`      |
-
-As well as the strategies mentioned before, you can use other functions strategies to partition the table, for example, the strategy can be `{"strategy":"functionName","fieldName":["score"]}`. The `functionName` can be any function name that you can use in SQL, for example, `{"strategy":"toDate","fieldName":["createTime"]}` is equivalent to `PARTITION BY toDate(createTime)` in SQL.
-For complex functions, please refer to [FunctionPartitioningDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/partitions/FunctionPartitioningDTO.java).
+| Partitioning strategy | Description                                                    | JSON example                                                                                            | Java example                                                                        | Equivalent SQL semantics              |
+|-----------------------|----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|---------------------------------------|
+| `identity`            | Source value, unmodified.                                      | `{"strategy":"identity","fieldName":["score"]}`                                                         | `Transforms.identity("score")`                                                      | `PARTITION BY score`                  |
+| `hour`                | Extract a timestamp hour, as hours from '1970-01-01 00:00:00'. | `{"strategy":"hour","fieldName":["createTime"]}`                                                        | `Transforms.hour("createTime")`                                                     | `PARTITION BY hour(createTime)`       |
+| `day`                 | Extract a date or timestamp day, as days from '1970-01-01'.    | `{"strategy":"day","fieldName":["createTime"]}`                                                         | `Transforms.day("createTime")`                                                      | `PARTITION BY day(createTime)`        |
+| `month`               | Extract a date or timestamp month, as months from '1970-01-01' | `{"strategy":"month","fieldName":["createTime"]}`                                                       | `Transforms.month("createTime")`                                                    | `PARTITION BY month(createTime)`      |
+| `year`                | Extract a date or timestamp year, as years from 1970.          | `{"strategy":"year","fieldName":["createTime"]}`                                                        | `Transforms.year("createTime")`                                                     | `PARTITION BY year(createTime)`       |
+| `bucket[N]`           | Hash of value, mod N.                                          | `{"strategy":"bucket","numBuckets":10,"fieldNames":[["score"]]}`                                        | `Transforms.bucket(10, "score")`                                                    | `PARTITION BY bucket(10, score)`      |
+| `truncate[W]`         | Value truncated to width W.                                    | `{"strategy":"truncate","width":20,"fieldName":["score"]}`                                              | `Transforms.truncate(20, "score")`                                                  | `PARTITION BY truncate(20, score)`    |
+| `list`                | Partition the table by a list value.                           | `{"strategy":"list","fieldNames":[["createTime"],["city"]]}`                                            | `Transforms.list(new String[] {"createTime", "city"})`                              | `PARTITION BY list(createTime, city)` |
+| `range`               | Partition the table by a range value.                          | `{"strategy":"range","fieldName":["createTime"]}`                                                       | `Transforms.range("createTime")`                                                    | `PARTITION BY range(createTime)`      |
+| `function`            | Partition the table by function expression.                    | `{"strategy":"function","funcName":"toYYYYMM","funcArgs":[{"type":"field","fieldName":["VisitDate"]}]}` | `Transforms.apply("toYYYYMM", new Expression[]{NamedReference.field("VisitDate")})` | `PARTITION BY toYYYYMM(VisitDate)`    |
+
+:::note
+For function partitioning, you should provide the function name and the function arguments. The function arguments must be an [expression](./expression.md).
+:::
 
 - Field names: It defines which fields Gravitino uses to partition the table.
 
 - In some cases, you require other information. For example, if the partitioning strategy is `bucket`, you should provide the number of buckets; if the partitioning strategy is `truncate`, you should provide the width of the truncate.
 
-The following is an example of creating a partitioned table:
-
-<Tabs>
-<TabItem value="Json" label="Json">
-
-```json
-[
-  {
-    "strategy": "identity",
-    "fieldName": [
-      "score"
-    ]
-  }
-]
-```
-
-</TabItem>
-<TabItem value="java" label="Java">
-
-```java
-new Transform[] {
-    // Partition by score
-    Transforms.identity("score")
-    }
-```
-
-</TabItem>
-</Tabs>
-
-
 ## Table bucketing
 
 To create a bucketed table, you should use the following three components to construct a valid bucketed table.
 
 - Strategy. It defines how Gravitino distributes table data across partitions.
 
-| Bucket strategy | Description                                                                                                                   | JSON     | Java             |
-|-----------------|-------------------------------------------------------------------------------------------------------------------------------|----------|------------------|
-| hash            | Bucket table using hash. Gravitino distributes table data into buckets based on the hash value of the key.                | `hash`   | `Strategy.HASH`  |
-| range           | Bucket table using range. Gravitino distributes table data into buckets based on a specified range or interval of values. | `range`  | `Strategy.RANGE` |
-| even            | Bucket table using even. Gravitino distributes table data, ensuring an equal distribution of data.                       | `even`   | `Strategy.EVEN`  |
-
-- Number. It defines how many buckets you use to bucket the table.
-- Function arguments. It defines the arguments of the strategy, Gravitino supports the following three kinds of arguments, for more, you can refer to Java class [FunctionArg](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/expressions/FunctionArg.java) and [DistributionDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/DistributionDTO.java) to use more complex function arguments.
-
-| Expression type | JSON example                                                   | Java example                                                                              | Equivalent SQL semantics | Description                       |
-|-----------------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------|--------------------------|-----------------------------------|
-| field           | `{"type":"field","fieldName":["score"]}`                       | `FieldReferenceDTO.of("score")`                                                           | `score`                  | The field reference value `score` |
-| function        | `{"type":"function","functionName":"hour","fieldName":["dt"]}` | `new FuncExpressionDTO.Builder().withFunctionName("hour").withFunctionArgs("dt").build()` | `hour(dt)`               | The function value `hour(dt)`     |
-| constant        | `{"type":"literal","value":10, "dataType": "integer"}`         | `new LiteralDTO.Builder().withValue("10").withDataType(Types.IntegerType.get()).build()`  | `10`                     | The integer literal `10`          |
+| Bucket strategy | Description                                                                                                               | JSON    | Java             |
+|-----------------|---------------------------------------------------------------------------------------------------------------------------|---------|------------------|
+| hash            | Bucket table using hash. Gravitino distributes table data into buckets based on the hash value of the key.                | `hash`  | `Strategy.HASH`  |
+| range           | Bucket table using range. Gravitino distributes table data into buckets based on a specified range or interval of values. | `range` | `Strategy.RANGE` |
+| even            | Bucket table using even. Gravitino distributes table data, ensuring an equal distribution of data.                        | `even`  | `Strategy.EVEN`  |
 
+- number. It defines how many buckets you use to bucket the table.
+- funcArgs. It defines the arguments of the strategy, the argument must be an [expression](./expression.md).
 
 <Tabs>
 <TabItem value="Json" label="Json">
@@ -127,20 +92,20 @@ To define a sorted order table, you should use the following three components to
 - Direction. It defines in which direction Gravitino sorts the table. The default value is `ascending`.
 
 | Direction  | Description                                 | JSON   | Java                       |
-|------------|---------------------------------------------| ------ | -------------------------- |
+|------------|---------------------------------------------|--------|----------------------------|
 | ascending  | Sorted by a field or a function ascending.  | `asc`  | `SortDirection.ASCENDING`  |
 | descending | Sorted by a field or a function descending. | `desc` | `SortDirection.DESCENDING` |
 
 - Null ordering. It describes how to handle null values when ordering
 
 | Null ordering Type | Description                             | JSON          | Java                       |
-|--------------------|-----------------------------------------| ------------- | -------------------------- |
+|--------------------|-----------------------------------------|---------------|----------------------------|
 | null_first         | Puts the null value in the first place. | `nulls_first` | `NullOrdering.NULLS_FIRST` |
 | null_last          | Puts the null value in the last place.  | `nulls_last`  | `NullOrdering.NULLS_LAST`  |
 
 Note: If the direction value is `ascending`, the default ordering value is `nulls_first` and if the direction value is `descending`, the default ordering value is `nulls_last`.
 
-- Sort term. It shows which field or function Gravitino uses to sort the table, please refer to the `Function arguments` in the table bucketing section.
+- sortTerm. It shows which field or function Gravitino uses to sort the table, must be an [expression](./expression.md).
 
 <Tabs>
 <TabItem value="Json" label="Json">
@@ -168,7 +133,7 @@ SortOrders.of(FieldReferenceDTO.of("score"), SortDirection.ASCENDING, NullOrderi
 
 
 :::tip
-**Not all catalogs may support those features.**. Please refer to the related document for more details.
+**Not all catalogs may support those features**. Please refer to the related document for more details.
 :::
 
 The following is an example of creating a partitioned, bucketed table, and sorted order table: