Tables with native data types are labeled in the UI with a badge: **auto-typed**.

## Key Benefits
These are the key benefits of using the Native Data Types feature:
- **Automatic Data Type Preservation:** Data types from the source are automatically respected, reducing the need for manual adjustments in Storage.
- **Faster Data Handling:** Native data types enable more efficient data manipulation, as well as faster loading and unloading, improving overall performance.
- **Simplified Transformations:** Read-only data access eliminates the need for casting, making data operations smoother and more streamlined.
- **Flexible Configurations:** Users can decide whether data types should be automatically fetched for each configuration when creating a table.
- **Improved Workspace Loading:** Loading data into a workspace is significantly faster than loading into a table without native data types, eliminating the need for additional casting.
- **Typed Columns in Workspaces:** Tables **accessed in a workspace** via the [read-only input mapping](/transformations/workspace/#read-only-input-mapping) already have typed columns, ensuring seamless data handling.

## Current Drawbacks
Using the Native Data Types feature also has its drawbacks:
- Data types in typed tables cannot be modified after creation. To change the data types, you must recreate the table. This limitation applies to both the UI and the API. See [How to Change Column Types](/storage/tables/data-types/#changing-types-of-existing-typed-columns).
- Keboola does not perform any type conversion during data loading. Your data must exactly match the column type defined in the table within Storage.
- Loading data with incompatible types will result in a failure.
You can configure the data type behavior in the UI component configuration settings.

In transformations, this option is not available. Instead, you define the data types in your query (if you need the table to be typed). If no types are defined, the table will default to storing data in VARCHAR format. In both cases, however, the table will still be marked as **auto-typed**.
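
For example, a minimal sketch of a Snowflake SQL transformation query that defines the types explicitly (the table and column names are illustrative):

```
-- Explicit casts give the output table typed columns;
-- without them, every column would be created as VARCHAR.
-- Table and column names are illustrative.
CREATE TABLE "orders" AS
    SELECT
        "id"::INTEGER AS "id",
        "amount"::DECIMAL(20,2) AS "amount",
        "created_at"::TIMESTAMP_NTZ AS "created_at"
    FROM "orders_raw";
```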

**Important:** Existing tables will not be affected by this feature. Also, if you do not see the **Automatic data types** option in the sidebar, it means the component does not support this feature.

### How to Create a Typed Table
The Native Data Types feature allows tables to be created with data types that match the original source or storage backend. Here’s how you can create typed tables:
- **Manually via API**
You can manually create typed tables using the [tables-definition endpoint](https://keboola.docs.apiary.io/#reference/tables/create-table-definition/create-new-table-definition). Ensure that the data types align with the storage backend (e.g., Snowflake, BigQuery) used in your project. Alternatively, [base types](/storage/tables/data-types/#base-types) can be used for compatibility.
- **Using a Component**
Extractors and transformations that match the storage backend (e.g., Snowflake SQL transformation on a Snowflake storage backend) will automatically create typed tables in Storage:
- **Matching Storage Backend:** Database extractors and transformations create storage tables using the same data types as the backend.
- **Mismatching Storage Backend:** Extractors use base types to ensure compatibility. [Learn more.](/storage/tables/data-types/#base-types)

<div class="clearfix"></div>
<div class="alert alert-warning" role="alert">
<i class="fas fa-exclamation-circle"></i>
<strong>Important:</strong> When a table is created, it defaults to the lengths and precisions specific to the Storage backend. For instance, in Snowflake, the NUMBER base type defaults to NUMBER(38,9), which might differ from the source database column type, such as NUMBER(10,2). To avoid this limitation:
- Manually create the table in advance using the [Table Definition API](https://keboola.docs.apiary.io/#reference/tables/create-table-definition/create-new-table-definition), specifying the correct lengths and precisions.
- Subsequent jobs writing data to this table will respect your defined schema as long as it matches the expected structure.
- Be cautious when dropping and recreating tables. If a job creates a table, it will default to the base type with backend-specific defaults, which might not align with your source.
</div>

**Example:**
To ensure typed tables are imported correctly into Storage, define your table in a Snowflake SQL transformation, adhering to the desired schema and data types:
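
For instance (a rough sketch; the names, types, and precisions are illustrative):

```
-- Sketch of a typed-table definition in a Snowflake SQL transformation;
-- names, types, and precisions are illustrative.
CREATE TABLE "typed_example" (
    "id"         NUMBER(10,2),
    "name"       VARCHAR(255),
    "created_at" TIMESTAMP_NTZ
);
```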

For detailed mappings, please refer to the [conversion table](/storage/tables/data-types/#base-types).

### How to Define Data Types

#### Using actual data types of the storage backend
For example, in the case of Snowflake, you can create a column with a specific type like `TIMESTAMP_NTZ` or `DECIMAL(20,2)`. This approach allows you to define all details of the data type, including precision and scale. An example of such a column definition in a table-definition API endpoint call might look like this:

```
{
  "name": "amount",
  "definition": {
    "type": "DECIMAL",
    "length": "20,2",
    "nullable": false
  }
}
```

#### Using Keboola-provided base types
Specifying native types using Keboola’s [base types](/storage/tables/data-types/#base-types) is ideal for component-provided types, as base types are storage backend agnostic and ensure compatibility across backends. They can also be used when defining tables via the table-definition API endpoint. The definition format is as follows:

```
{
  "name": "id",
  "basetype": "NUMERIC"
}
```
### Changing Types of Existing Typed Columns
You **cannot change the type of a column in a typed table once it has been created**. However, there are multiple workarounds to address this limitation:

**For tables using full load:** Drop the table and create a new one with the correct types. Then, load the data into the newly created table.

**For tables loaded incrementally:** You will need to create a new column with the desired type and migrate the data step by step:
- Assume you have a column `date` of type `VARCHAR` in a typed table, and you want to change it to `TIMESTAMP`.
- Start by adding a new column named `date_timestamp` of type `TIMESTAMP` to the table.
- Update all jobs filling the table to populate both the new column (`date_timestamp`) and the existing column (`date`); a sketch of such a job follows this list.
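
A minimal sketch of what such a job might look like in a Snowflake transformation (`source_data` is an illustrative input table, and `TRY_TO_TIMESTAMP` assumes the strings use a format Snowflake recognizes):

```
-- Populate both the existing VARCHAR column and the new typed column;
-- "source_data" is an illustrative input table.
CREATE TABLE "my_table" AS
    SELECT
        src."date",                                       -- existing VARCHAR column
        TRY_TO_TIMESTAMP(src."date") AS "date_timestamp"  -- new TIMESTAMP column
    FROM "source_data" AS src;
```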
### Converting a Non-Typed Table to a Typed Table
If you have a non-typed table, `non_typed_table`, with undefined data types and want to turn it into a typed table, you can do so with a transformation:

**Step 1: Create a Transformation**
{: .image-popup}
![Screenshot - Typed Table Transformation](/storage/tables/data-types/typed-table-transformation.png)

**Step 2: Define the Query**
In the queries section, write an SQL query to transform the column types. Use proper casting for each column to match the desired data types.

For example, if you need to format a date column, include the appropriate SQL casting or formatting function in your query.

```
CREATE TABLE "typed_table" AS
    SELECT
        -- illustrative columns; adjust the names and casts to your schema
        ntt."id"::INTEGER AS "id",
        ntt."amount"::DECIMAL(20,2) AS "amount",
        TO_TIMESTAMP(ntt."date") AS "date"
FROM "non_typed_table" AS ntt;
```

**Step 3: Run the Transformation**
Execute the transformation and wait for it to complete.

**Step 4: Verify the Schema**
Once the transformation is finished, check the schema of the newly created table, `typed_table`. It should now include the appropriate data types.
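
If you have SQL access through a workspace, you can also inspect the types directly; for example, on a Snowflake backend:

```
-- List the columns of the new table together with their data types (Snowflake)
DESCRIBE TABLE "typed_table";
```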

***Note:** [Incremental loading](/storage/tables/#incremental-loading) cannot be used when creating a typed table in this manner.*
