docs: add quickstart for multi-table synthesis (#89)

* docs: add quickstart for multi-table synthesis
* fix(linting): code formatting

---------

Co-authored-by: Fabiana Clemente <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
ydataai · Feb 20, 2024 · d7e99ca
1 parent bb5d9c4
commit d7e99ca

Showing 9 changed files with 60 additions and 1 deletion.
diff --git a/docs/assets/quickstart/synthetic_data/configure_schema_sd.webp b/docs/assets/quickstart/synthetic_data/configure_schema_sd.webp
diff --git a/docs/assets/quickstart/synthetic_data/generate_database.webp b/docs/assets/quickstart/synthetic_data/generate_database.webp
diff --git a/docs/assets/quickstart/synthetic_data/mt_anonymization.webp b/docs/assets/quickstart/synthetic_data/mt_anonymization.webp
diff --git a/docs/assets/quickstart/synthetic_data/mt_sd_trained.webp b/docs/assets/quickstart/synthetic_data/mt_sd_trained.webp
diff --git a/docs/assets/quickstart/synthetic_data/select_connector_sd_samples.webp b/docs/assets/quickstart/synthetic_data/select_connector_sd_samples.webp
diff --git a/docs/get-started/create_database_sd_generator.md b/docs/get-started/create_database_sd_generator.md
@@ -0,0 +1,57 @@
+# How to create your first Relational Database Synthetic Data generator
+
+:fontawesome-brands-youtube:{ .youtube }
+Check this quickstart video on <a href="https://youtu.be/40Q56xZbv00?si=T6DMZ-f8mAyPdzf7"><u>how to create your first Relational Database Synthetic Data generator</u></a>.
+
+To generate your first synthetic relational database, you need to have a Multi-Dataset already available in your Data Catalog.
+Check this tutorial to see how you can <a href="../create_multitable_dataset"><u>add your first dataset to Fabric’s Data Catalog</u></a>.
+
+With your database created as a Datasource, you can now start configuring your Synthetic Data (SD) generator to create a replica of your database.
+You can either select **"Synthetic Data"** from your left side menu, or you can select **"Create Synthetic Data"** in your project Home
+as shown in the image below.
+
+![Create Synthetic Data](../assets/quickstart/synthetic_data/create_synthetic_data.webp){: style="width:75%"}
+
+You'll be asked to select the dataset you wish to generate synthetic data from and verify the tables you'd like to
+include in the synthesis process, validating their data types - *Time-series* or *Tabular*.
+
+!!! Tip "Table data types are relevant for synthetic data quality"
+    If some of your tables hold time-series information (meaning there is a time relation between records), it is very important
+    that you update your tables' data types accordingly while configuring your synthetic data generator.
+    This ensures not only the quality of that particular table, but also the overall database quality and relations.
+
+![Configure the schema](../assets/quickstart/synthetic_data/configure_schema_sd.webp){: style="width:75%"}
+
+All primary keys (PK) and foreign keys (FK) identified from the database schema definition have an anonymization setting created automatically.
+By default, a standard incremental integer is used as the anonymization configuration, but you can switch to other pre-defined generation options
+or to a regex-based one (where you provide the expected generation pattern).
+
+![Multi-Table Anonymization](../assets/quickstart/synthetic_data/mt_anonymization.webp){: style="width:75%"}
+
+Finally, the last step of the process covers the **Synthetic Data** generator-specific configurations. For this particular case, we need to
+define both the *Display Name* and the *Destination connector*. The *Destination connector* is mandatory and lets you select the database where
+the generated synthetic database is expected to be written.
+After providing both inputs, we can finish the process by clicking the **"Save"** button, as shown in the image below.
+
+![Select a connector](../assets/quickstart/synthetic_data/select_connector_sd_samples.webp){: style="width:75%"}
+
+Your **Synthetic Data** generator is now training and listed under **"Synthetic Data"**. While the model is being trained, the *Status* will be
+🟡; as soon as the training completes successfully, it will transition to 🟢.
+Once the Synthetic Data generator has finished training, you're ready to start generating your first synthetic dataset.
+You can start by exploring an overview of the model configurations and even validate the quality of the synthetic data generator from a referential integrity
+point of view.
+
+![Synthetic data generator training completed](../assets/quickstart/synthetic_data/mt_sd_trained.webp){: style="width:75%"}
+
+Next, you can generate synthetic data samples by accessing the *Generation* tab or clicking *"Go to Generation"*.
+In this section, you can generate as many synthetic samples as you want.
+To do so, you need to define the size of the synthetic database relative to the real one. This ratio is provided as a percentage.
+In the example below, we have requested a sample with 100% size, meaning a synthetic database with the same size as the original.
+
+![Generate synthetic data records](../assets/quickstart/synthetic_data/generate_database.webp){: style="width:75%"}
+
+A new line will be shown in your *"Sample History"*, and as soon as the sample generation is completed you will be able to
+check the quality of the synthetic data already available in your destination database.
+
+**Congrats!** 🚀 You have now successfully created your first Relational **Synthetic Database** with Fabric.
+Get ready for your journey of improved quality data for AI.
diff --git a/docs/get-started/create_syntheticdata_generator.md b/docs/get-started/create_syntheticdata_generator.md
@@ -4,7 +4,7 @@
 Check this quickstart video on <a href="https://youtu.be/GsfggG9PhgE?si=ixlCaesd3cLFOCZm"><u>how to create your first Synthetic Data generator</u></a>.
 
 To generate your first synthetic data, you need to have a Dataset already available in your Data Catalog.
-Check this tutorial to see how you can <a href="upload_csv"><u>add your first dataset to Fabric’s Data Catalog</u></a>.
+Check this tutorial to see how you can <a href="../upload_csv"><u>add your first dataset to Fabric’s Data Catalog</u></a>.
 
 With your first dataset created, you are now able to start the creation of your Synthetic Data generator. You can either
 select **"Synthetic Data"** from your left side menu, or you can select **"Create Synthetic Data"** in your project Home

diff --git a/docs/get-started/index.md b/docs/get-started/index.md
@@ -7,5 +7,6 @@ data quality, data preparation workflows and how you can start leveraging synthe
 ### 📚 <a href="upload_csv"><u>Create your first Dataset with the Data Catalog</u></a>
 ### 💾 <a href="create_multitable_dataset"><u>Create your Multi-Table Dataset with the Data Catalog</u></a>
 ### ⚙️ <a href="create_syntheticdata_generator"><u>Create your first Synthetic Data generator</u></a>
+### 🗄️ <a href="create_database_sd_generator"><u>Create a Relational Database Synthetic Data generator</u></a>
 ### 🧪 <a href="create_lab"><u>Create your first Lab</u></a>
 ### 🌀 <a href="create_pipeline"><u>Create your first data Pipeline</u></a>
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -12,6 +12,7 @@ nav:
           - How to create your first Dataset from a CSV file: "get-started/upload_csv.md"
           - How to create your first Relational database in Fabric's Catalog: "get-started/create_multitable_dataset.md"
           - How to create your first Synthetic Data generator: "get-started/create_syntheticdata_generator.md"
+          - How to create your first Synthetic Data generator for databases: "get-started/create_database_sd_generator.md"
           - How to create your first Lab: "get-started/create_lab.md"
           - How to create your first Pipeline: "get-started/create_pipeline.md"
       - Fabric Community: "get-started/fabric_community.md"