diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock index 1e39cd0..2972269 100644 --- a/docs/Gemfile.lock +++ b/docs/Gemfile.lock @@ -31,6 +31,7 @@ GEM safe_yaml (~> 1.0) terminal-table (>= 1.8, < 4.0) webrick (~> 1.7) + jekyll-mermaid (1.0.0) jekyll-sass-converter (3.0.0) sass-embedded (~> 1.54) jekyll-watch (2.2.1) @@ -67,6 +68,7 @@ PLATFORMS DEPENDENCIES jekyll (~> 4.3.0) + jekyll-mermaid kramdown-parser-gfm tzinfo (~> 1.2) tzinfo-data diff --git a/docs/_includes/block/divider.html b/docs/_includes/block/divider.html new file mode 100644 index 0000000..54b2e75 --- /dev/null +++ b/docs/_includes/block/divider.html @@ -0,0 +1 @@ +
diff --git a/docs/_includes/block/etl-step.html b/docs/_includes/block/etl-step.html index 82b50c4..ad57f28 100644 --- a/docs/_includes/block/etl-step.html +++ b/docs/_includes/block/etl-step.html @@ -18,4 +18,5 @@ - \ No newline at end of file + +{% include block/divider.html %} \ No newline at end of file diff --git a/docs/_includes/block/mermaid.html b/docs/_includes/block/mermaid.html new file mode 100644 index 0000000..e4ddd56 --- /dev/null +++ b/docs/_includes/block/mermaid.html @@ -0,0 +1,3 @@ +
+    {{ include.mermaid }}
+
\ No newline at end of file diff --git a/docs/_includes/head.html b/docs/_includes/head.html index 835b946..b55feed 100644 --- a/docs/_includes/head.html +++ b/docs/_includes/head.html @@ -2,3 +2,5 @@ + + \ No newline at end of file diff --git a/docs/_includes/menu.html b/docs/_includes/menu.html index a5c3935..c03da6a 100644 --- a/docs/_includes/menu.html +++ b/docs/_includes/menu.html @@ -33,6 +33,9 @@ The concept + + Execution Context + Item types diff --git a/docs/assets/css/custom.css b/docs/assets/css/custom.css index 92e66e6..c2e3dca 100644 --- a/docs/assets/css/custom.css +++ b/docs/assets/css/custom.css @@ -1,4 +1,9 @@ @import "code.css"; + +img { + width: 100%; +} + #main-div { width: 100% } diff --git a/docs/assets/images/concept-flows/flow-1.png b/docs/assets/images/concept-flows/flow-1.png new file mode 100644 index 0000000..63816c6 Binary files /dev/null and b/docs/assets/images/concept-flows/flow-1.png differ diff --git a/docs/assets/images/concept-flows/flow-2.png b/docs/assets/images/concept-flows/flow-2.png new file mode 100644 index 0000000..4204fc4 Binary files /dev/null and b/docs/assets/images/concept-flows/flow-2.png differ diff --git a/docs/assets/images/concept-flows/flow-3.png b/docs/assets/images/concept-flows/flow-3.png new file mode 100644 index 0000000..240aa12 Binary files /dev/null and b/docs/assets/images/concept-flows/flow-3.png differ diff --git a/docs/assets/images/concept-flows/flow-4.png b/docs/assets/images/concept-flows/flow-4.png new file mode 100644 index 0000000..70ba0d2 Binary files /dev/null and b/docs/assets/images/concept-flows/flow-4.png differ diff --git a/docs/doc/01-understand-the-etl/execution-context.md b/docs/doc/01-understand-the-etl/execution-context.md index 5d972ff..fb9c83d 100644 --- a/docs/doc/01-understand-the-etl/execution-context.md +++ b/docs/doc/01-understand-the-etl/execution-context.md @@ -1,5 +1,72 @@ --- layout: base -title: PHP-ETL - Understand the ETL +title: PHP-ETL - Understand the ETL 
subTitle: Execution Context - Why to have an execution context & what it does ---- \ No newline at end of file +width: large +--- + +## Execution Context - Why to have an execution context & what it does + +In most of our examples our chain had access to the whole file system. +This makes it impossible to run multiple chains together, or to list the files each execution has generated. + +Both the 🎵 Symfony Bundle (and therefore the 🦢 Sylius integration) and the Magento2 Module will use contextual chains. +This means the "main" operations only have access to a particular directory created for the execution of the chain. + +Additional operations such as the ExternalFileFinderOperation and ExternalFileProcessor will be used to +process files that are either on a remote directory (sftp, bucket s3...) or files that are on the local file system. +This is needed because operations such as the CsvLoader will not have access to those files unless they are copied into the contextual directory of the current execution. + +Let's start with a simple example. + +### Write the result of an API to a CSV File. + +{% capture description %} +For this we will first create a new ContextFactory using PerExecutionContextFactory. +This context factory will create unique contexts for each execution. This means a unique directory to run the etl +in, and a unique logger. + +This is only needed if you are running the etl in **🐘 standalone**. With any integration this should be automatically +handled for you. This chapter will be the last one where we mention standalone integrations. + + +{% endcapture %} +{% capture code %} +```php + [ + 'execution' => new PockExecution(new DateTime()) + ] +]; + +$chainProcessor->process( + new ArrayIterator([[]]), + $options +); +``` +{% endcapture %} +{% include block/etl-step.html code=code description=description %} + +Executing this will create a directory in `var/` with the output result. 
Every time you execute the chain a new +directory will be created. + diff --git a/docs/doc/01-understand-the-etl/the-concept.md b/docs/doc/01-understand-the-etl/the-concept.md index e7367c1..8d98f1f 100644 --- a/docs/doc/01-understand-the-etl/the-concept.md +++ b/docs/doc/01-understand-the-etl/the-concept.md @@ -37,6 +37,61 @@ so a GroupedItem can not be in the input of an operation, they can only be the o You can find the list of all native item types [here](doc/01-understand-the-etl/item-types.html). + +### How does it work + +We will have more detailed real use cases with sample data a bit further in the document. + +{% capture column1 %} +In the simplest case the chain receives an iterator containing 2 items as input, and both items are processed by each operation in the chain. +This could be for example a list of customers. Each operation changes the items. +images/concept-flows +{% endcapture %} +{% capture column2 %} +![rr](/assets/images/concept-flows/flow-1.png) +{% endcapture %} +{% include block/2column.html column1=column1 column2=column2 %} + +{% include block/divider.html %} + +{% capture column1 %} +In the following example the iterator sends a single item. The first operation will then send GroupedItems containing 2 items. +The first item could be a customer, and then we fetch each order of the customer in operation1. +{% endcapture %} +{% capture column2 %} +![rr](/assets/images/concept-flows/flow-2.png) +{% endcapture %} +{% include block/2column.html column1=column1 column2=column2 %} + +{% include block/divider.html %} + +{% capture column1 %} +We can also group items to make aggregations. The chain receives an iterator containing 2 items, and the first operation processes both items. +It breaks the chain for the first item, and returns an aggregation of item1 & item2. +This can be used to count the number of customers. This kind of grouping can use more memory and should therefore be used with care. 
+{% endcapture %} +{% capture column2 %} +![rr](/assets/images/concept-flows/flow-3.png) +{% endcapture %} +{% include block/2column.html column1=column1 column2=column2 %} + +{% include block/divider.html %} + +{% capture column1 %} +Chains can also be split, which allows 2 different operations to be executed on the same item. +{% endcapture %} +{% capture column2 %} +![rrr](/assets/images/concept-flows/flow-4.png) +{% endcapture %} +{% include block/2column.html column1=column1 column2=column2 %} + +{% include block/divider.html %} + +The split operation is among the building blocks of complex executions. There are additional operations to merge +multiple branches or to repeat a part of the chain. + + + ## Example: Simple CSV Transformation To demonstrate PHP-ETL’s capabilities, let’s walk through a basic example where we read a CSV file, @@ -148,7 +203,7 @@ $chainProcessor->process( #### 🎵 Symfony For instance, the following command will process two input files and merge their output: ```bash -./bin/console etl:execute myetl.yaml "['./customers1.csv', './customers2.csv']" +./bin/console etl:execute myetl "['./customers1.csv', './customers2.csv']" ``` {% endcapture %} diff --git a/docs/index.md b/docs/index.md index e3c0157..00389b4 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,7 +7,7 @@ subTitle: ## What is PHP-ETL PHP-ETL is the go-to library for executing complex data import, export, and transformation tasks within PHP applications. -It offers seamless integrations with the [Symfony Framework](https://symfony.com/), [Sylius](https://sylius.com/fr/) , and can easily be extended to +It offers seamless integrations with the [🎵 Symfony Framework](https://symfony.com/), [🦢 Sylius](https://sylius.com/fr/), and can easily be integrated into other CMS and &frameworks, making it ideal for handling intricate data workflows with ease. 
## Why PHP-ETL @@ -29,6 +29,84 @@ PHP-ETL handles asynchronous operations, such as API calls, natively, allowing like loading data into the database while making API calls. The library also supports visualizing data flows through auto-generated diagrams, making complex workflows easier to understand and manage. -## A screenshot +## An execution tree + +{% capture mermaid %} +flowchart TD + +subgraph Execution +%% Nodes +0B(Extract Get Article API Params Data

2 / 2
00:00.064) +style 0B fill:#EEE; +1B(Get products/articles until api stop's

2 / 2
00:00.000)@{ shape: hex} +subgraph 1S[Get articles until api stop's] +100B(Make get Article API call

4 / 1 / 0
00:05.243) +style 100B fill:#ffe294; +end +style 1B fill:#EEE; +2B(Write api response to file to keep history

4 / 4
00:00.057) +style 2B fill:#EEE; +3B(Split response

5 / 5
00:00.008) +style 3B fill:#EEE; +4B(Map Api fields with Sylius attributes code

2085 / 2085
00:01.482) +style 4B fill:#EEE; +5B(Branch to handle attribute option values & product imports

2085 / 2085
04:28.817)@{ shape: hex} +subgraph 5S[Branch to handle attribute option values & product imports] +500B(Split each attribute items

2085 / 2085
00:00.248) +style 500B fill:#EEE; +501B(Load Attribute from database

89571 / 89571
00:46.995) +style 501B fill:#EEE; +502B(Add new choices to select attributes

89571 / 2
00:09.363) +style 502B fill:#EEE; +503B(Persist attribute

2 / 2
00:00.001) +style 503B fill:#EEE; +510B(Flush Doctrine before importing products

2085 / 2085
00:00.961) +style 510B fill:#EEE; +511B(Load Product from database

2085 / 2085
00:00.904) +style 511B fill:#EEE; +512B(Create or Update product

2085 / 2085
00:27.247) +style 512B fill:#EEE; +513B(Add price to product

2085 / 2085
00:01.651) +style 513B fill:#EEE; +514B(Persist entities

2085 / 2085
00:00.338) +style 514B fill:#EEE; +515B(Flush entities

2085 / 2085
00:02.117) +style 515B fill:#EEE; +516B(Clear doctrine

2085 / 2085
00:00.213) +style 516B fill:#EEE; +517B(Prepare data for Set association product API

2085 / 2085
00:00.201) +style 517B fill:#EEE; +518B(Set Sylius Product ID association - API call

2085 / 2
00:00.687) +style 518B fill:#EEE; +519B(Log association response

2085 / 4168
00:00.012) +style 519B fill:#EEE; +end +style 5B fill:#EEE; +%% Links +0B --> 1B +1B --> 100B +1B --> 2B +1S ~~~ 2B +2B --> 3B +3B --> 4B +4B --> 5B +5B --> 500B +500B --> 501B +501B --> 502B +502B --> 503B +5B --> 510B +510B --> 511B +511B --> 512B +512B --> 513B +513B --> 514B +514B --> 515B +515B --> 516B +516B --> 517B +517B --> 518B +518B --> 519B +end +{% endcapture %} + +{% include block/mermaid.html mermaid=mermaid %}