doc: Added description on contextual executions
oliverde8 committed Nov 10, 2024
1 parent 7a7d841 commit 7dc70c0
Showing 14 changed files with 223 additions and 6 deletions.
2 changes: 2 additions & 0 deletions docs/Gemfile.lock
@@ -31,6 +31,7 @@ GEM
safe_yaml (~> 1.0)
terminal-table (>= 1.8, < 4.0)
webrick (~> 1.7)
jekyll-mermaid (1.0.0)
jekyll-sass-converter (3.0.0)
sass-embedded (~> 1.54)
jekyll-watch (2.2.1)
@@ -67,6 +68,7 @@ PLATFORMS

DEPENDENCIES
jekyll (~> 4.3.0)
jekyll-mermaid
kramdown-parser-gfm
tzinfo (~> 1.2)
tzinfo-data
1 change: 1 addition & 0 deletions docs/_includes/block/divider.html
@@ -0,0 +1 @@
<div class="ui divider"></div>
3 changes: 2 additions & 1 deletion docs/_includes/block/etl-step.html
@@ -18,4 +18,5 @@
</div>
</div>
</div>
</div>
</div>
{% include block/divider.html %}
3 changes: 3 additions & 0 deletions docs/_includes/block/mermaid.html
@@ -0,0 +1,3 @@
<pre class="mermaid">
{{ include.mermaid }}
</pre>
2 changes: 2 additions & 0 deletions docs/_includes/head.html
@@ -2,3 +2,5 @@
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/semantic.min.js"></script>

<link rel="stylesheet" href='{{ "/assets/css/custom.css" | absolute_url }}' />

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/mermaid.min.js"></script>
3 changes: 3 additions & 0 deletions docs/_includes/menu.html
@@ -33,6 +33,9 @@
<a class="item" href="/doc/01-understand-the-etl/the-concept.html">
The concept
</a>
<a class="item" href="/doc/01-understand-the-etl/execution-context.html">
Execution Context
</a>

<a class="item" href="/doc/01-understand-the-etl/item-types">
Item types
5 changes: 5 additions & 0 deletions docs/assets/css/custom.css
@@ -1,4 +1,9 @@
@import "code.css";

img {
width: 100%;
}

#main-div {
width: 100%
}
Binary file added docs/assets/images/concept-flows/flow-1.png
Binary file added docs/assets/images/concept-flows/flow-2.png
Binary file added docs/assets/images/concept-flows/flow-3.png
Binary file added docs/assets/images/concept-flows/flow-4.png
71 changes: 69 additions & 2 deletions docs/doc/01-understand-the-etl/execution-context.md
@@ -1,5 +1,72 @@
---
layout: base
title: PHP-ETL - Understand the ETL
title: PHP-ETL - Understand the ETL
subTitle: Execution Context - Why to have an execution context & what it does
---
width: large
---

## Execution Context - Why to have an execution context & what it does

In most of our examples the chain had access to the whole file system.
This makes it impossible to run multiple chains together, or to list the files each execution has generated.

Both the 🎵 Symfony Bundle (and therefore the 🦢 Sylius integration) and the Magento2 Module use contextual chains.
This means the "main" operations only have access to a particular directory created for the execution of the chain.

Additional operations such as the ExternalFileFinderOperation and ExternalFileProcessor are used to
process files that are either in a remote location (SFTP, S3 bucket...) or elsewhere on the local file system.
Operations such as the CsvLoader cannot access those files until they are copied into the contextual directory of the current execution.
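
The copy step described above can be illustrated without the library. The sketch below is plain PHP, not PHP-ETL's API: `importIntoContext` is a hypothetical helper that copies an external file into the directory reserved for one execution, so that later steps only read from that directory.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP-ETL): copy an external file into the
 * directory reserved for one execution and return the path of the copy.
 */
function importIntoContext(string $externalFile, string $contextDir): string
{
    if (!is_file($externalFile)) {
        throw new InvalidArgumentException("File not found: $externalFile");
    }

    // Create the contextual directory on first use.
    if (!is_dir($contextDir) && !mkdir($contextDir, 0777, true) && !is_dir($contextDir)) {
        throw new RuntimeException("Cannot create directory: $contextDir");
    }

    $target = rtrim($contextDir, '/') . '/' . basename($externalFile);
    if (!copy($externalFile, $target)) {
        throw new RuntimeException("Cannot copy $externalFile to $target");
    }

    return $target;
}
```

The original file is left untouched; only the copy inside the contextual directory is handed to the rest of the chain.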

Let's start with a simple example.

### Write the result of an API to a CSV File.

{% capture description %}
For this we will first create a new ContextFactory using PerExecutionContextFactory.
This context factory creates a unique context for each execution: a unique directory to run the ETL
in, and a unique logger.

This is only needed if you are running the ETL in **🐘 standalone**. With any integration this should be
handled for you automatically. This is the last chapter in which we mention standalone usage.


{% endcapture %}
{% capture code %}
```php
<?php
$workdir = __DIR__ . "/var/";
$dirManager = new ChainWorkDirManager($workdir);
$loggerFactory = new NullLoggerFactory();
$fileFactory = new LocalFileSystemFactory($dirManager);

return new PerExecutionContextFactory(
    $dirManager,
    $fileFactory,
    $loggerFactory
);
```
{% endcapture %}
{% include block/etl-step.html code=code description=description %}

{% capture description %}
The execution is identified with objects of type ExecutionInterface set on the processor:
{% endcapture %}
{% capture code %}
```php
$options = [
    'etl' => [
        'execution' => new PockExecution(new DateTime())
    ]
];

$chainProcessor->process(
    new ArrayIterator([[]]),
    $options
);
```
{% endcapture %}
{% include block/etl-step.html code=code description=description %}

Executing this will create a directory in `var/` containing the output. Every time you execute the chain a new
directory will be created.
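
To make the "one directory per execution" idea concrete, here is a minimal sketch in plain PHP. It is not how PHP-ETL names or creates its directories: `createExecutionDir` and its naming scheme (start time plus a random suffix) are assumptions made for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: create a directory unique to one execution.
 * The naming scheme (start time + random suffix) is an assumption of this
 * sketch, not the scheme PHP-ETL uses.
 */
function createExecutionDir(string $workDir, DateTimeInterface $startedAt): string
{
    $name = $startedAt->format('Ymd-His') . '-' . bin2hex(random_bytes(4));
    $path = rtrim($workDir, '/') . '/' . $name;

    if (!mkdir($path, 0777, true) && !is_dir($path)) {
        throw new RuntimeException("Cannot create directory: $path");
    }

    return $path;
}
```

Two executions started in the same second still get distinct directories because of the random suffix.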

57 changes: 56 additions & 1 deletion docs/doc/01-understand-the-etl/the-concept.md
@@ -37,6 +37,61 @@ so a GroupedItem can not be in the input of an operation, they can only be the o

You can find the list of all native item types [here](doc/01-understand-the-etl/item-types.html).


### How does it work

We will have more detailed real use cases with sample data a bit further in the document.

{% capture column1 %}
In the simplest case the chain receives an iterator containing 2 items as input, and both items are processed by each operation of the chain.
This could be, for example, a list of customers. Each operation changes the items.
{% endcapture %}
{% capture column2 %}
![rr](/assets/images/concept-flows/flow-1.png)
{% endcapture %}
{% include block/2column.html column1=column1 column2=column2 %}

{% include block/divider.html %}

{% capture column1 %}
In the following example the iterator sends a single item. The first operation then returns a GroupedItem containing 2 items.
The first item could be a customer, for which operation1 fetches each order.
{% endcapture %}
{% capture column2 %}
![rr](/assets/images/concept-flows/flow-2.png)
{% endcapture %}
{% include block/2column.html column1=column1 column2=column2 %}

{% include block/divider.html %}

{% capture column1 %}
We can also group items to make aggregations. The chain receives an iterator containing 2 items, and the first operation processes both.
It breaks the chain for the first item, and returns an aggregation of item 1 & item 2.
This can be used to count the number of customers. This kind of grouping can use more memory and should therefore be used with care.
{% endcapture %}
{% capture column2 %}
![rr](/assets/images/concept-flows/flow-3.png)
{% endcapture %}
{% include block/2column.html column1=column1 column2=column2 %}

{% include block/divider.html %}

{% capture column1 %}
Chains can also be split, which allows 2 different operations to be executed on the same item.
{% endcapture %}
{% capture column2 %}
![rrr](/assets/images/concept-flows/flow-4.png)
{% endcapture %}
{% include block/2column.html column1=column1 column2=column2 %}

{% include block/divider.html %}

The split operation is among the building blocks of complex executions. There are additional operations to merge
multiple branches or to repeat a part of the chain.
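
The first three flows can be mimicked with plain PHP generators. This is only a conceptual sketch and not PHP-ETL's API: `mapItems`, `expandItems` and `aggregateItems` are hypothetical helpers standing in for operations that change items, return grouped items, and aggregate items respectively.

```php
<?php
declare(strict_types=1);

/** One item in, one item out: every item goes through $operation. */
function mapItems(iterable $items, callable $operation): Generator
{
    foreach ($items as $item) {
        yield $operation($item);
    }
}

/** One item in, several items out (similar in spirit to a GroupedItem). */
function expandItems(iterable $items, callable $operation): Generator
{
    foreach ($items as $item) {
        foreach ($operation($item) as $child) {
            yield $child;
        }
    }
}

/** Several items in, one aggregate out; nothing is emitted until the input ends. */
function aggregateItems(iterable $items, callable $reducer, $initial): Generator
{
    $result = $initial;
    foreach ($items as $item) {
        $result = $reducer($result, $item);
    }
    yield $result;
}

// Usage: upper-case two customers, then count them.
$customers = new ArrayIterator([['name' => 'alice'], ['name' => 'bob']]);
$upper = mapItems($customers, fn (array $c) => ['name' => strtoupper($c['name'])]);
$count = aggregateItems($upper, fn (int $n, array $c) => $n + 1, 0);
```

Because generators are lazy, nothing runs until `$count` is iterated, and items are pulled through the steps one at a time.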



## Example: Simple CSV Transformation

To demonstrate PHP-ETL’s capabilities, let’s walk through a basic example where we read a CSV file,
@@ -148,7 +203,7 @@ $chainProcessor->process(
#### 🎵 Symfony
For instance, the following command will process two input files and merge their output:
```bash
./bin/console etl:execute myetl.yaml "['./customers1.csv', './customers2.csv']"
./bin/console etl:execute myetl "['./customers1.csv', './customers2.csv']"
```
{% endcapture %}

82 changes: 80 additions & 2 deletions docs/index.md
@@ -7,7 +7,7 @@ subTitle:
## What is PHP-ETL

PHP-ETL is the go-to library for executing complex data import, export, and transformation tasks within PHP applications.
It offers seamless integrations with the [Symfony Framework](https://symfony.com/), [Sylius](https://sylius.com/fr/) , and can easily be extended to
It offers seamless integrations with the [🎵 Symfony Framework](https://symfony.com/) and [🦢 Sylius](https://sylius.com/fr/), and can easily be integrated with
other CMSs and frameworks, making it ideal for handling intricate data workflows with ease.

## Why PHP-ETL
@@ -29,6 +29,84 @@ PHP-ETL handles asynchronous operations—such as API calls—natively, allowing
like loading data into the database while making API calls. The library also supports visualizing data flows
through auto-generated diagrams, making complex workflows easier to understand and manage.

## A screenshot
## An execution tree

{% capture mermaid %}
flowchart TD

subgraph Execution
%% Nodes
0B(Extract Get Article API Params Data<br/><br/>2<i class="sign in alternate icon"></i> / 2<i class="sign out alternate icon"></i><br/>00:00.064<i class="hourglass half icon"></i>)
style 0B fill:#EEE;
1B(Get products/articles until api stop's<br/><br/>2<i class="sign in alternate icon"></i> / 2<i class="sign out alternate icon"></i><br/>00:00.000<i class="hourglass half icon"></i>)@{ shape: hex}
subgraph 1S[Get articles until api stop's]
100B(Make get Article API call<br/><br/>4<i class="sign in alternate icon"></i> / 1<i class="clock icon"></i> / 0<i class="sign out alternate icon"></i><br/>00:05.243<i class="hourglass half icon"></i>)
style 100B fill:#ffe294;
end
style 1B fill:#EEE;
2B(Write api response to file to keep history<br/><br/>4<i class="sign in alternate icon"></i> / 4<i class="sign out alternate icon"></i><br/>00:00.057<i class="hourglass half icon"></i>)
style 2B fill:#EEE;
3B(Split response<br/><br/>5<i class="sign in alternate icon"></i> / 5<i class="sign out alternate icon"></i><br/>00:00.008<i class="hourglass half icon"></i>)
style 3B fill:#EEE;
4B(Map Api fields with Sylius attributes code<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:01.482<i class="hourglass half icon"></i>)
style 4B fill:#EEE;
5B(Branch to handle attribute option values & product imports<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>04:28.817<i class="hourglass half icon"></i>)@{ shape: hex}
subgraph 5S[Branch to handle attribute option values & product imports]
500B(Split each attribute items<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.248<i class="hourglass half icon"></i>)
style 500B fill:#EEE;
501B(Load Attribute from database<br/><br/>89571<i class="sign in alternate icon"></i> / 89571<i class="sign out alternate icon"></i><br/>00:46.995<i class="hourglass half icon"></i>)
style 501B fill:#EEE;
502B(Add new choices to select attributes<br/><br/>89571<i class="sign in alternate icon"></i> / 2<i class="sign out alternate icon"></i><br/>00:09.363<i class="hourglass half icon"></i>)
style 502B fill:#EEE;
503B(Persist attribute<br/><br/>2<i class="sign in alternate icon"></i> / 2<i class="sign out alternate icon"></i><br/>00:00.001<i class="hourglass half icon"></i>)
style 503B fill:#EEE;
510B(Flush Doctrine before importing products<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.961<i class="hourglass half icon"></i>)
style 510B fill:#EEE;
511B(Load Product from database<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.904<i class="hourglass half icon"></i>)
style 511B fill:#EEE;
512B(Create or Update product<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:27.247<i class="hourglass half icon"></i>)
style 512B fill:#EEE;
513B(Add price to product<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:01.651<i class="hourglass half icon"></i>)
style 513B fill:#EEE;
514B(Persist entities<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.338<i class="hourglass half icon"></i>)
style 514B fill:#EEE;
515B(Flush entities<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:02.117<i class="hourglass half icon"></i>)
style 515B fill:#EEE;
516B(Clear doctrine<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.213<i class="hourglass half icon"></i>)
style 516B fill:#EEE;
517B(Prepare data for Set association product API<br/><br/>2085<i class="sign in alternate icon"></i> / 2085<i class="sign out alternate icon"></i><br/>00:00.201<i class="hourglass half icon"></i>)
style 517B fill:#EEE;
518B(Set Sylius Product ID association - API call<br/><br/>2085<i class="sign in alternate icon"></i> / 2<i class="sign out alternate icon"></i><br/>00:00.687<i class="hourglass half icon"></i>)
style 518B fill:#EEE;
519B(Log association response<br/><br/>2085<i class="sign in alternate icon"></i> / 4168<i class="sign out alternate icon"></i><br/>00:00.012<i class="hourglass half icon"></i>)
style 519B fill:#EEE;
end
style 5B fill:#EEE;
%% Links
0B --> 1B
1B --> 100B
1B --> 2B
1S ~~~ 2B
2B --> 3B
3B --> 4B
4B --> 5B
5B --> 500B
500B --> 501B
501B --> 502B
502B --> 503B
5B --> 510B
510B --> 511B
511B --> 512B
512B --> 513B
513B --> 514B
514B --> 515B
515B --> 516B
516B --> 517B
517B --> 518B
518B --> 519B
end
{% endcapture %}

{% include block/mermaid.html mermaid=mermaid %}

