diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock
index 1e39cd0..2972269 100644
--- a/docs/Gemfile.lock
+++ b/docs/Gemfile.lock
@@ -31,6 +31,7 @@ GEM
 safe_yaml (~> 1.0)
 terminal-table (>= 1.8, < 4.0)
 webrick (~> 1.7)
+ jekyll-mermaid (1.0.0)
 jekyll-sass-converter (3.0.0)
 sass-embedded (~> 1.54)
 jekyll-watch (2.2.1)
@@ -67,6 +68,7 @@ PLATFORMS
 
 DEPENDENCIES
 jekyll (~> 4.3.0)
+ jekyll-mermaid
 kramdown-parser-gfm
 tzinfo (~> 1.2)
 tzinfo-data
diff --git a/docs/_includes/block/divider.html b/docs/_includes/block/divider.html
new file mode 100644
index 0000000..54b2e75
--- /dev/null
+++ b/docs/_includes/block/divider.html
@@ -0,0 +1 @@
+
diff --git a/docs/_includes/block/etl-step.html b/docs/_includes/block/etl-step.html
index 82b50c4..ad57f28 100644
--- a/docs/_includes/block/etl-step.html
+++ b/docs/_includes/block/etl-step.html
@@ -18,4 +18,5 @@
-
\ No newline at end of file
+
+{% include block/divider.html %}
\ No newline at end of file
diff --git a/docs/_includes/block/mermaid.html b/docs/_includes/block/mermaid.html
new file mode 100644
index 0000000..e4ddd56
--- /dev/null
+++ b/docs/_includes/block/mermaid.html
@@ -0,0 +1,3 @@
+
+ {{ include.mermaid }}
+
\ No newline at end of file
diff --git a/docs/_includes/head.html b/docs/_includes/head.html
index 835b946..b55feed 100644
--- a/docs/_includes/head.html
+++ b/docs/_includes/head.html
@@ -2,3 +2,5 @@
+
+
\ No newline at end of file
diff --git a/docs/_includes/menu.html b/docs/_includes/menu.html
index a5c3935..c03da6a 100644
--- a/docs/_includes/menu.html
+++ b/docs/_includes/menu.html
@@ -33,6 +33,9 @@
 The concept
+
+ Execution Context
+
 Item types
diff --git a/docs/assets/css/custom.css b/docs/assets/css/custom.css
index 92e66e6..c2e3dca 100644
--- a/docs/assets/css/custom.css
+++ b/docs/assets/css/custom.css
@@ -1,4 +1,9 @@
 @import "code.css";
+
+img {
+ width: 100%;
+}
+
 #main-div {
 width: 100%
 }
diff --git a/docs/assets/images/concept-flows/flow-1.png b/docs/assets/images/concept-flows/flow-1.png
new file mode 100644
index 0000000..63816c6
Binary files /dev/null and b/docs/assets/images/concept-flows/flow-1.png differ
diff --git a/docs/assets/images/concept-flows/flow-2.png b/docs/assets/images/concept-flows/flow-2.png
new file mode 100644
index 0000000..4204fc4
Binary files /dev/null and b/docs/assets/images/concept-flows/flow-2.png differ
diff --git a/docs/assets/images/concept-flows/flow-3.png b/docs/assets/images/concept-flows/flow-3.png
new file mode 100644
index 0000000..240aa12
Binary files /dev/null and b/docs/assets/images/concept-flows/flow-3.png differ
diff --git a/docs/assets/images/concept-flows/flow-4.png 
b/docs/assets/images/concept-flows/flow-4.png
new file mode 100644
index 0000000..70ba0d2
Binary files /dev/null and b/docs/assets/images/concept-flows/flow-4.png differ
diff --git a/docs/doc/01-understand-the-etl/execution-context.md b/docs/doc/01-understand-the-etl/execution-context.md
index 5d972ff..fb9c83d 100644
--- a/docs/doc/01-understand-the-etl/execution-context.md
+++ b/docs/doc/01-understand-the-etl/execution-context.md
@@ -1,5 +1,72 @@
 ---
 layout: base
-title: PHP-ETL - Understand the ETL
+title: PHP-ETL - Understand the ETL
 subTitle: Execution Context - Why to have an execution context & what it does
----
\ No newline at end of file
+width: large
+---
+
+## Execution Context - Why to have an execution context & what it does
+
+In most of our examples our chain had access to the whole file system.
+This makes it impossible to run multiple chains together or to list the files each execution has generated.
+
+Both the 🎵 Symfony Bundle (and therefore the 🦢 Sylius integration) and the Magento2 Module will use contextual chains.
+This means the "main" operations only have access to a particular directory created for the execution of the chain.
+
+Additional operations such as the ExternalFileFinderOperation and ExternalFileProcessor will be used to
+process files that are either in a remote directory (SFTP, S3 bucket...) or on the local file system.
+This is needed because operations such as the CsvLoader cannot access those files unless they are copied into the contextual directory of the current execution.
+
+Let's start with a simple example.
+
+### Write the result of an API to a CSV File.
+
+{% capture description %}
+For this we will first create a new ContextFactory using PerExecutionContextFactory.
+This context factory will create a unique context for each execution. This means a unique directory to run the ETL
+in, and a unique logger.
+
+This is only needed if you are running the ETL in **🐘 standalone**. 
With any integration this should be automatically
+handled for you. This chapter is the last one that mentions standalone integrations.
+
+
+{% endcapture %}
+{% capture code %}
+```php
+ [
+ 'execution' => new PockExecution(new DateTime())
+ ]
+];
+
+$chainProcessor->process(
+ new ArrayIterator([[]]),
+ $options
+);
+```
+{% endcapture %}
+{% include block/etl-step.html code=code description=description %}
+
+Executing this will create a directory in `var/` with the output result. Every time you execute the chain a new
+directory will be created.
+
diff --git a/docs/doc/01-understand-the-etl/the-concept.md b/docs/doc/01-understand-the-etl/the-concept.md
index e7367c1..8d98f1f 100644
--- a/docs/doc/01-understand-the-etl/the-concept.md
+++ b/docs/doc/01-understand-the-etl/the-concept.md
@@ -37,6 +37,61 @@ so a GroupedItem can not be in the input of an operation, they can only be the o
 
 You can find the list of all native item types [here](doc/01-understand-the-etl/item-types.html).
 
+
+### How does it work
+
+We will have more detailed real use cases with sample data a bit further in the document.
+
+{% capture column1 %}
+In the simplest case the chain receives an iterator containing 2 items as input, and both items are processed by each chain operation.
+This could be, for example, a list of customers. Each operation changes the items.
+images/concept-flows
+{% endcapture %}
+{% capture column2 %}
+data:image/s3,"s3://crabby-images/5fe4d/5fe4dc29ce8a5d1b80e0c949b03db17de9750b28" alt="rr"
+{% endcapture %}
+{% include block/2column.html column1=column1 column2=column2 %}
+
+{% include block/divider.html %}
+
+{% capture column1 %}
+In the following example the iterator sends a single item. The first operation will then send GroupedItems containing 2 items.
+The first item could be a customer, and operation1 then fetches each order of that customer. 
+{% endcapture %}
+{% capture column2 %}
+data:image/s3,"s3://crabby-images/e2062/e20627975d1ad3148ba446c5fe4a37e13e5d5146" alt="rr"
+{% endcapture %}
+{% include block/2column.html column1=column1 column2=column2 %}
+
+{% include block/divider.html %}
+
+{% capture column1 %}
+We can also group items to make aggregations. The chain receives an iterator containing 2 items, and the first operation processes both items.
+It breaks the chain for the first item, and returns an aggregation of item1 & item2.
+This can be used to count the number of customers. This kind of grouping can use more memory and should therefore be used with care.
+{% endcapture %}
+{% capture column2 %}
+data:image/s3,"s3://crabby-images/c0b41/c0b41632f0971918c1e793aa8d545ef3483e060e" alt="rr"
+{% endcapture %}
+{% include block/2column.html column1=column1 column2=column2 %}
+
+{% include block/divider.html %}
+
+{% capture column1 %}
+Chains can also be split, which allows 2 different operations to be executed on the same item.
+{% endcapture %}
+{% capture column2 %}
+data:image/s3,"s3://crabby-images/39f04/39f0447796a45cca605c5c91f13b4645751e899a" alt="rrr"
+{% endcapture %}
+{% include block/2column.html column1=column1 column2=column2 %}
+
+{% include block/divider.html %}
+
+The split operation is among the building blocks of complex executions. There are additional operations to merge
+multiple branches or to repeat a part of the chain. 
+
+
+
 ## Example: Simple CSV Transformation
 
 To demonstrate PHP-ETL's capabilities, let's walk through a basic example where we read a CSV file,
@@ -148,7 +203,7 @@ $chainProcessor->process(
 #### 🎵 Symfony
 For instance, the following command will process two input files and merge their output:
 ```bash
-./bin/console etl:execute myetl.yaml "['./customers1.csv', './customers2.csv']"
+./bin/console etl:execute myetl "['./customers1.csv', './customers2.csv']"
 ```
 
 {% endcapture %}
diff --git a/docs/index.md b/docs/index.md
index e3c0157..00389b4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -7,7 +7,7 @@ subTitle:
 ## What is PHP-ETL
 
 PHP-ETL is the go-to library for executing complex data import, export, and transformation tasks within PHP applications.
-It offers seamless integrations with the [Symfony Framework](https://symfony.com/), [Sylius](https://sylius.com/fr/) , and can easily be extended to
+It offers seamless integrations with the [🎵 Symfony Framework](https://symfony.com/), [🦢 Sylius](https://sylius.com/fr/), and can easily be integrated into
 other CMS and frameworks, making it ideal for handling intricate data workflows with ease.
 
 ## Why PHP-ETL
@@ -29,6 +29,84 @@ PHP-ETL handles asynchronous operations (such as API calls) natively, allowing
 like loading data into the database while making API calls.
 The library also supports visualizing data flows through auto-generated diagrams, making complex workflows easier to understand and manage.
 
-## A screenshot
+## An execution tree
+
+{% capture mermaid %}
+flowchart TD
+
+subgraph Execution
+%% Nodes
+0B(Extract Get Article API Params Data