Fixing explanation about model loading phase #15

Merged: 1 commit, Jun 20, 2017
8 changes: 4 additions & 4 deletions README.md
@@ -125,7 +125,7 @@ object FlinkJpmmlExample {
 Some useful insights from the code:
 - in order to load the PMML model, you need to specify only the PMML source path
 - `ModelReader` is a lazy reader and it provides the right reading abstraction to TaskManagers
-- The resulting `PMMLModel` will be **loaded once** for each TaskManager running on your architecture at
+- The resulting `PMMLModel` will be **loaded by once** factory for each TaskManager running on your architecture at
 _construction time_
 - the `PmmlModel.predict` method expects Flink Vectors as input event and, if you want to manage NaNs, an optional
 replace value;
@@ -155,9 +155,9 @@ the library will take care how to load the model in full compliance of the under
 (e.g. HDFS, Alluxio)
 - `ModelReader` is the object implementing the previous behavior; it will provide the loading methods but will read it
 _lazily_, _i.e._ only when the transformation will be applied
-- The `PMMLModel` will be loaded once for each TaskManager running on your architecture; that means if you have an
-active TaskManager _A_ made up of 4 TaskSlots, your TM will load **one and only one** shared copy of the model; this is
-crucial in order to let the system scale (still simple PMML models can grow to several hundreds of MegaBytes
+- The `PMMLModel` will be loaded by a singleton model factory for each TaskManager running on your architecture; that means if you have an
+active TaskManager _A_ made up of 4 TaskSlots, your TM will load the model from a single loader entity; this is
+crucial in order to let the system scale in thread-safety (still simple PMML models can grow to several hundreds of MegaBytes
 proportionally to the model size, meaning a big load in memory terms)
 - the `PmmlModel.predict` method expects Flink Vectors as input events; this choice let us to leverage the underlying
 Breeze implementation and **no reflection** will be applied at all; moreover, the user don't have to specify any
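
The revised bullets describe a singleton model factory per TaskManager that serves every TaskSlot. That idea can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not flink-jpmml's implementation: `SketchModel`, `SketchModelFactory`, `SingletonLoadingSketch`, and the model path are hypothetical names, and plain threads stand in for TaskSlots, which Flink runs as threads inside one TaskManager JVM.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical stand-in for a parsed PMML model (not the library's PmmlModel).
final case class SketchModel(sourcePath: String)

// A Scala `object` is initialised lazily and once per classloader. In this
// sketch that plays the role of "one loader entity per TaskManager".
object SketchModelFactory {
  // Counts how many times a model was actually loaded (for the demo only).
  val loadCount = new AtomicInteger(0)

  private val cache = mutable.HashMap.empty[String, SketchModel]

  // All access goes through one lock, so concurrent callers cannot
  // trigger two loads of the same path.
  def getOrLoad(path: String): SketchModel = synchronized {
    cache.getOrElseUpdate(path, {
      loadCount.incrementAndGet()
      SketchModel(path) // a real loader would read and parse the PMML here
    })
  }
}

object SingletonLoadingSketch {
  def main(args: Array[String]): Unit = {
    // Four threads stand in for the 4 TaskSlots of TaskManager A.
    val slots = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          SketchModelFactory.getOrLoad("/path/to/model.pmml")
          ()
        }
      })
    }
    slots.foreach(_.start())
    slots.foreach(_.join())
    println(s"loads performed: ${SketchModelFactory.loadCount.get}") // prints "loads performed: 1"
  }
}
```

The lock makes the first caller pay the loading cost while the other three reuse the cached instance, which is the property the bullets call scaling "in thread-safety". Whether the real library keys its cache by path, or holds one model per factory, is not stated in this diff.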