diff --git a/docs/index.html b/docs/index.html
index 438f935..327fb08 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -254,7 +254,7 @@
Tutorials
#uuid "d7a602a1-247c-401f-b5d3-6951f2953a9e" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "c6712401-c327-4f88-b609-1d1861c44d09", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
+:metamorph/mode :fit
#uuid "ea0466aa-637c-4cab-9be7-546134ce18d0" {:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "ba2e722d-58fe-4717-89cd-59f38c6c786f", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
keys ctx-after-train) (
:metamorph/data
(:metamorph/mode
- "d7a602a1-247c-401f-b5d3-6951f2953a9e") #uuid
This context map has the “data”, the “mode” and a UUID for each operation (we had only one in this pipeline).
:fit
-
{:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)},
+
{:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)},
:options {:model-type :metamorph.ml/dummy-classifier},
- :id #uuid "c6712401-c327-4f88-b609-1d1861c44d09",
+ :id #uuid "ba2e722d-58fe-4717-89cd-59f38c6c786f",
:feature-columns [:sex :pclass :embarked],
:target-columns [:survived],
:target-categorical-maps
@@ -701,70 +701,70 @@
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
...
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
-1.0
+0.0
@@ -782,7 +782,7 @@
-#uuid "d7a602a1-247c-401f-b5d3-6951f2953a9e"
+#uuid "ea0466aa-637c-4cab-9be7-546134ce18d0"
@@ -823,29 +823,29 @@
-1.0
-3.0
0.0
+1.0
+1.0
1.0
+1.0
2.0
-0.0
-0.0
+1.0
3.0
-0.0
+1.0
-1.0
+0.0
3.0
-1.0
+0.0
0.0
-1.0
-0.0
+2.0
+2.0
0.0
@@ -854,17 +854,17 @@
0.0
-3.0
+1.0
0.0
-1.0
-1.0
+0.0
+3.0
0.0
-0.0
-3.0
+1.0
+1.0
0.0
@@ -879,27 +879,27 @@
0.0
-3.0
+1.0
0.0
-1.0
-2.0
0.0
+3.0
+2.0
-1.0
-3.0
+0.0
+2.0
0.0
0.0
3.0
-2.0
+0.0
+0.0
1.0
-3.0
0.0
@@ -913,23 +913,23 @@ 2.0
+1.0
+1.0
0.0
-2.0
-2.0
-1.0
-3.0
+0.0
2.0
+0.0
0.0
-3.0
+2.0
0.0
0.0
-2.0
+3.0
0.0
@@ -940,10 +940,10 @@ :model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}
+:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}
-:id #uuid "c6712401-c327-4f88-b609-1d1861c44d09"
+:id #uuid "ba2e722d-58fe-4717-89cd-59f38c6c786f"
@@ -966,7 +966,7 @@
-1.0
+0.0
1.0
@@ -987,13 +987,13 @@ 0.0
-1.0
+0.0
-0.0
+1.0
-0.0
+1.0
...
@@ -1002,7 +1002,7 @@ 0.0
-1.0
+0.0
0.0
@@ -1017,16 +1017,16 @@ 1.0
-0.0
+1.0
-0.0
+1.0
1.0
-0.0
+1.0
0.0
@@ -1057,7 +1057,7 @@
178]
#tech.v3.dataset.column<float64>[:survived
-1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...] [
+0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]
[
This works as long as all operations of the pipeline follow the metamorph convention (we can create such compliant functions out of normal dataset->dataset functions, as we will see).
my-pipeline
therefore represents a model training / prediction that has not been executed yet. It can be freely moved around and applied to a dataset when needed.
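The convention described above can be sketched in plain Clojure, without calling the metamorph library at all. This is an illustration under assumptions, not the library's implementation: `lift-sketch`, `pipeline-sketch` and `majority-model-sketch` are hypothetical names, the "dataset" is just a vector of row maps, and the step id is passed in by hand as `:model` instead of being a generated UUID.

```clojure
;; Illustrative sketch only: plain maps and functions, no metamorph calls.
;; All names ending in -sketch are hypothetical helpers, not library API.

(defn lift-sketch
  "Wrap a data->data function so that it takes and returns a context map."
  [f]
  (fn [ctx]
    (update ctx :metamorph/data f)))

(defn pipeline-sketch
  "Compose context->context steps, applied left to right."
  [& steps]
  (fn [ctx]
    (reduce (fn [c step] (step c)) ctx steps)))

(defn majority-model-sketch
  "In :fit mode, store the most frequent :survived label under `id`.
  In :transform mode, replace the data with that label for every row."
  [id]
  (fn [ctx]
    (let [data (:metamorph/data ctx)]
      (case (:metamorph/mode ctx)
        :fit
        (assoc ctx id {:majority-class
                       (key (apply max-key val
                                   (frequencies (map :survived data))))})
        :transform
        (let [label (get-in ctx [id :majority-class])]
          (assoc ctx :metamorph/data
                 (mapv (fn [_] {:survived label}) data)))))))

;; A normal data->data function, made compliant by lifting it.
(def drop-missing
  (lift-sketch (fn [rows] (filterv #(some? (:survived %)) rows))))

(def my-pipe
  (pipeline-sketch drop-missing (majority-model-sketch :model)))

(def train-rows
  [{:sex 0.0 :survived 0.0}
   {:sex 1.0 :survived 1.0}
   {:sex 0.0 :survived 0.0}
   {:sex 1.0 :survived nil}])

(def test-rows
  [{:sex 1.0 :survived 1.0}
   {:sex 0.0 :survived 0.0}])

;; Fit: the returned context carries the learned state under :model.
(def ctx-fit
  (my-pipe {:metamorph/data train-rows
            :metamorph/mode :fit}))

;; Transform: reuse the fitted context, swapping in mode and data.
(def ctx-pred
  (my-pipe (assoc ctx-fit
                  :metamorph/mode :transform
                  :metamorph/data test-rows)))
```

The pipeline itself is only a function value. Nothing is trained until it is applied to a context map, and the same value is used for both the `:fit` and the `:transform` pass.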
@@ -1240,7 +1240,7 @@ :metamorph/mode :fit#uuid "ec3c3517-ecaa-42f9-bd58-c7c5fcbde760" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "5f1f1d58-a4f3-461e-aa3e-8a7615dfaa90", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
+:metamorph/mode :fit
#uuid "0991b673-9ecc-43fd-ae5e-7f4ae54f4725" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "062ff539-8cfb-434b-9db6-655a4b676e5b", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
To show the power of pipelines, I start with the simplest possible pipeline and then expand on it.
We can already chain train and test with the usual functions:
@@ -1252,7 +1252,7 @@
178]
#tech.v3.dataset.column<float64>[:survived
-1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...] [
+0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...] [
The same with pipelines:
@@ -1268,7 +1268,7 @@
178]
#tech.v3.dataset.column<float64>[:survived
-1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...] [
+0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...] [
diff --git a/docs/noj_book.interactions_ols.html b/docs/noj_book.interactions_ols.html
index d8fe8b7..dbab3b2 100644
--- a/docs/noj_book.interactions_ols.html
+++ b/docs/noj_book.interactions_ols.html
@@ -373,24 +373,24 @@
:sales
-8.23509959
-3.53966439
-1.65867320E-13
-0.42982654
+7.62542540
+3.55542341
+4.52926585E-12
+0.46625902
:youtube
-26.95976843
-0.04718231
+24.86578321
+0.04559160
0.00000000E+00
-0.00175010
+0.00183351
:facebook
-17.70842019
-0.17790322
+17.85573663
+0.18413870
0.00000000E+00
-0.01004625
+0.01031258
@@ -401,14 +401,14 @@ -> evaluations flatten first :test-transform :metric) (
-1.942297012809736
+1.8638458183671391
\(R^2\)
-> evaluations flatten first :test-transform :other-metrices first :metric) (
-0.8938273864418694
+0.9280020874640866
@@ -439,11 +439,11 @@ _unnamed [4 5]:
-
-
-
-
-
+
+
+
+
+
@@ -457,31 +457,31 @@
:sales
-18.77159194
-8.08558750
-0.00000000E+00
-0.43073531
+24.98053874
+7.68473206
+0.00000000
+0.30762876
:youtube
-8.77310479
-0.01901125
-8.88178420E-15
-0.00216699
+13.66590205
+0.02072158
+0.00000000
+0.00151630
:facebook
-2.49145705
-0.03114653
-1.39910231E-02
-0.01250133
+5.83295846
+0.05643198
+0.00000004
+0.00967468
:youtube*facebook
-14.83474088
-0.00090064
-0.00000000E+00
-0.00006071
+17.35234974
+0.00079634
+0.00000000
+0.00004589
@@ -492,14 +492,14 @@ -> evaluations flatten first :test-transform :metric) (
-0.9540779100788728
+1.4887075967183183
\(R^2\)
-> evaluations flatten first :test-transform :other-metrices first :metric) (
-0.9716904782821529
+0.950824042478244
\(RMSE\) and \(R^2\) of the interaction model are slightly better.
These results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction term.
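For reference, the comparison can be written out as follows. The additive equation is inferred from the model terms (:youtube, :facebook) and is an assumption, as are the textbook definitions of RMSE and R^2 used here; the numbers are the updated values from this diff, rounded to four decimals.

```latex
% Additive model (assumed form) and interaction model
\[ sales = b0 + b1 * youtube + b2 * facebook \]
\[ sales = b0 + b1 * youtube + b2 * facebook + b3 * (youtube * facebook) \]

% Assumed standard definitions on the holdout set
\[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
   \qquad
   R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2} \]

% Differences between the two models (additive vs. interaction)
\[ \Delta RMSE = 1.8638 - 1.4887 = 0.3751
   \quad (\approx 20\%\ \text{lower}),
   \qquad
   \Delta R^2 = 0.9508 - 0.9280 = 0.0228 \]
```

Both metrics come from a single holdout split, and the diff shows that they shift between runs, so the size of the gap should be read as indicative only.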
diff --git a/docs/noj_book.known_issues.html b/docs/noj_book.known_issues.html
index a44e566..3fd25f8 100644
--- a/docs/noj_book.known_issues.html
+++ b/docs/noj_book.known_issues.html
@@ -233,7 +233,7 @@
Table of contents
@@ -269,8 +269,8 @@ 4
-
-4.0.1 scicloj.ml.tribuo
+
+4.1 scicloj.ml.tribuo
Due to a current bug regarding cyclic dependencies, when using scicloj.ml.tribuo for machine learning, it is necessary to include it explicitly in your project dependencies (even though it is a dependency of Noj itself):
:git/url "https://github.com/scicloj/scicloj.ml.tribuo"
diff --git a/docs/noj_book.visualizing_correlation_matrices.html b/docs/noj_book.visualizing_correlation_matrices.html
index da6c08b..7062f98 100644
--- a/docs/noj_book.visualizing_correlation_matrices.html
+++ b/docs/noj_book.visualizing_correlation_matrices.html
@@ -541,7 +541,7 @@ scicloj/scicloj.ml.tribuo {
Note the slider control and the tooltips.
Here is an example with an actual correlation matrix.
diff --git a/docs/search.json b/docs/search.json
index 8257acf..9497a9c 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -14,7 +14,7 @@
"href": "index.html#existing-chapters-in-this-book",
"title": "Scinojure Documentation",
"section": "1.1 Existing chapters in this book:",
- "text": "1.1 Existing chapters in this book:\n\nOverview\n\nUnderlying libraries\nRecommended libraries\nknown_issues\n\nTutorials\n\nMachine learning\nMachine learning specific functionality in tech.ml.dataset\nAutoML using metamorph pipelines\nOrdinary least squares with interactions\nDatasets\nVisualizing correlation matrices (experimental 🛠) - DRAFT\n\n\n\nsource: notebooks/index.clj",
+ "text": "1.1 Existing chapters in this book:\n\nOverview\n\nUnderlying libraries\nRecommended libraries\nKnown issues ❗\n\nTutorials\n\nMachine learning\nMachine learning specific functionality in tech.ml.dataset\nAutoML using metamorph pipelines\nOrdinary least squares with interactions\nDatasets\nVisualizing correlation matrices (experimental 🛠) - DRAFT\n\n\n\nsource: notebooks/index.clj",
"crumbs": [
"1 Preface"
]
@@ -46,7 +46,18 @@
"href": "noj_book.known_issues.html",
"title": "4 Known issues ❗",
"section": "",
- "text": "4.0.1 scicloj.ml.tribuo\nDue to a current bug regarding cyclic depdendencies, when using scicloj.ml.tribuo for machine learning, it is necessary to include it explicitly in your project dependencies (even though it is a depedency of Noj itself):\n\nscicloj/scicloj.ml.tribuo {:git/url \"https://github.com/scicloj/scicloj.ml.tribuo\"\n :git/sha \"f4ebf1e1bb78eb99dd35ca886d75b9f65d800e8d\"}\n\n\nsource: notebooks/noj_book/known_issues.clj",
+ "text": "4.1 scicloj.ml.tribuo\nDue to a current bug regarding cyclic depdendencies, when using scicloj.ml.tribuo for machine learning, it is necessary to include it explicitly in your project dependencies (even though it is a depedency of Noj itself):",
+ "crumbs": [
+ "Overview",
+ "4 Known issues ❗"
+ ]
+ },
+ {
+ "objectID": "noj_book.known_issues.html#scicloj.ml.tribuo",
+ "href": "noj_book.known_issues.html#scicloj.ml.tribuo",
+ "title": "4 Known issues ❗",
+ "section": "",
+ "text": "scicloj/scicloj.ml.tribuo {:git/url \"https://github.com/scicloj/scicloj.ml.tribuo\"\n :git/sha \"f4ebf1e1bb78eb99dd35ca886d75b9f65d800e8d\"}\n\n\nsource: notebooks/noj_book/known_issues.clj",
"crumbs": [
"Overview",
"4 Known issues ❗"
@@ -189,7 +200,7 @@
"href": "noj_book.automl.html#the-metamorph-pipeline-abstraction",
"title": "7 AutoML using metamorph pipelines",
"section": "",
- "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#function[clojure.core/partial/fn--5925]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"d7a602a1-247c-401f-b5d3-6951f2953a9e\" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"c6712401-c327-4f88-b609-1d1861c44d09\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? 
nil}}\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"d7a602a1-247c-401f-b5d3-6951f2953a9e\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n\n:fit\n{:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"c6712401-c327-4f88-b609-1d1861c44d09\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\nctx-after-predict\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [178 1]:\n\n\n\n:survived\n\n\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n...\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n\n\n\n\n\n\n:metamorph/mode :transform\n\n\n\n\n\n\n\n\n#uuid \"d7a602a1-247c-401f-b5d3-6951f2953a9e\"\n\n\n\n{\n\n\n:feature-columns [:sex :pclass :embarked]\n\n\n:target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}\n\n\n:target-columns [:survived]\n\n\n:scicloj.metamorph.ml/unsupervised? nil\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/feature-ds\n\n\n\nGroup: 0 [178 3]:\n\n\n\n:sex\n:pclass\n:embarked\n\n\n\n\n1.0\n3.0\n0.0\n\n\n1.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n2.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n0.0\n2.0\n2.0\n\n\n1.0\n3.0\n2.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n\n\n\n\n\n\n\n:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}\n\n\n:id #uuid \"c6712401-c327-4f88-b609-1d1861c44d09\"\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/target-ds\n\n\n\nGroup: 0 [178 1]:\n\n\n\n:survived\n\n\n\n\n1.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n...\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n\n\n\n\n\n\n\n:options {:model-type :metamorph.ml/dummy-classifier}\n\n\n}\n\n\n\n\n\n}\n\n\n\n(-> ctx-after-predict :metamorph/data 
:survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]",
+ "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#function[clojure.core/partial/fn--5925]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"ea0466aa-637c-4cab-9be7-546134ce18d0\" {:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"ba2e722d-58fe-4717-89cd-59f38c6c786f\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? 
nil}}\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"ea0466aa-637c-4cab-9be7-546134ce18d0\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n\n:fit\n{:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"ba2e722d-58fe-4717-89cd-59f38c6c786f\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\nctx-after-predict\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [178 1]:\n\n\n\n:survived\n\n\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n...\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :transform\n\n\n\n\n\n\n\n\n#uuid \"ea0466aa-637c-4cab-9be7-546134ce18d0\"\n\n\n\n{\n\n\n:feature-columns [:sex :pclass :embarked]\n\n\n:target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}\n\n\n:target-columns [:survived]\n\n\n:scicloj.metamorph.ml/unsupervised? nil\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/feature-ds\n\n\n\nGroup: 0 [178 3]:\n\n\n\n:sex\n:pclass\n:embarked\n\n\n\n\n0.0\n1.0\n1.0\n\n\n1.0\n1.0\n2.0\n\n\n1.0\n3.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n2.0\n2.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n...\n...\n...\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n\n:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}\n\n\n:id #uuid \"ba2e722d-58fe-4717-89cd-59f38c6c786f\"\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/target-ds\n\n\n\nGroup: 0 [178 1]:\n\n\n\n:survived\n\n\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n1.0\n\n\n...\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n0.0\n\n\n\n\n\n\n\n\n\n:options {:model-type :metamorph.ml/dummy-classifier}\n\n\n}\n\n\n\n\n\n}\n\n\n\n(-> ctx-after-predict :metamorph/data 
:survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]",
"crumbs": [
"Tutorials",
"7 AutoML using metamorph pipelines"
@@ -200,7 +211,7 @@
"href": "noj_book.automl.html#use-metamorph-pipelines-to-do-model-training-with-higher-level-api",
"title": "7 AutoML using metamorph pipelines",
"section": "7.2 Use metamorph pipelines to do model training with higher level API",
- "text": "7.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\ntrain-ctx\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n2.0\n1.0\n\n\n...\n...\n...\n...\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"ec3c3517-ecaa-42f9-bd58-c7c5fcbde760\" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"5f1f1d58-a4f3-461e-aa3e-8a7615dfaa90\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, 
:src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}}\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nwe can already chain train and test with usual functions:\n\n(->>\n (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]",
+ "text": "7.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\ntrain-ctx\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n2.0\n1.0\n\n\n...\n...\n...\n...\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"0991b673-9ecc-43fd-ae5e-7f4ae54f4725\" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"062ff539-8cfb-434b-9db6-655a4b676e5b\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, 
:src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}}\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nwe can already chain train and test with usual functions:\n\n(->>\n (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]",
"crumbs": [
"Tutorials",
"7 AutoML using metamorph pipelines"
@@ -266,7 +277,7 @@
"href": "noj_book.interactions_ols.html#additive-model",
"title": "8 Ordinary least squares with interactions",
"section": "",
- "text": "(def linear-model-config {:model-type :fastmath/ols})\n\n\n(def additive-pipeline\n (mm/pipeline\n {:metamorph/id :model}\n (ml/model linear-model-config)))\n\n\n\n(def evaluations\n (ml/evaluate-pipelines\n [additive-pipeline]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\n\n\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [3 5]:\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n8.23509959\n3.53966439\n1.65867320E-13\n0.42982654\n\n\n:youtube\n26.95976843\n0.04718231\n0.00000000E+00\n0.00175010\n\n\n:facebook\n17.70842019\n0.17790322\n0.00000000E+00\n0.01004625\n\n\n\n\n\n\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.942297012809736\n\n\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.8938273864418694",
+ "text": "(def linear-model-config {:model-type :fastmath/ols})\n\n\n(def additive-pipeline\n (mm/pipeline\n {:metamorph/id :model}\n (ml/model linear-model-config)))\n\n\n\n(def evaluations\n (ml/evaluate-pipelines\n [additive-pipeline]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\n\n\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [3 5]:\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n7.62542540\n3.55542341\n4.52926585E-12\n0.46625902\n\n\n:youtube\n24.86578321\n0.04559160\n0.00000000E+00\n0.00183351\n\n\n:facebook\n17.85573663\n0.18413870\n0.00000000E+00\n0.01031258\n\n\n\n\n\n\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.8638458183671391\n\n\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.9280020874640866",
"crumbs": [
"Tutorials",
"8 Ordinary least squares with interactions"
@@ -277,7 +288,7 @@
"href": "noj_book.interactions_ols.html#interaction-effects",
"title": "8 Ordinary least squares with interactions",
"section": "8.2 Interaction effects",
- "text": "8.2 Interaction effects\nNow we add interaction effects to it, resulting in this model equation: \\[sales = b0 + b1 * youtube + b2 * facebook + b3 * (youtube * facebook)\\]\n\n(def pipe-interaction\n (mm/pipeline\n (tcpipe/add-column :youtube*facebook (fn [ds] (tcc/* (ds :youtube) (ds :facebook))))\n {:metamorph/id :model} (ml/model linear-model-config)))\n\nAgain we evaluate the model,\n\n(def evaluations\n (ml/evaluate-pipelines\n [pipe-interaction]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\nand print it and the performance metrics:\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [4 5]:\n\n\n\n\n\n\n\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n18.77159194\n8.08558750\n0.00000000E+00\n0.43073531\n\n\n:youtube\n8.77310479\n0.01901125\n8.88178420E-15\n0.00216699\n\n\n:facebook\n2.49145705\n0.03114653\n1.39910231E-02\n0.01250133\n\n\n:youtube*facebook\n14.83474088\n0.00090064\n0.00000000E+00\n0.00006071\n\n\n\n\nAs the multiplcation of youtube*facebook is as well statistically relevant, it suggests that there is indeed an interaction between these 2 predictor variables youtube and facebook.\n\\(RMSE\\)\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n0.9540779100788728\n\n\\(R^2\\)\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.9716904782821529\n\n\\(RMSE\\) and \\(R^2\\) of the intercation model are sligtly better.\nThese results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction model.\n\nsource: notebooks/noj_book/interactions_ols.clj",
+ "text": "8.2 Interaction effects\nNow we add interaction effects to it, resulting in this model equation: \\[sales = b0 + b1 * youtube + b2 * facebook + b3 * (youtube * facebook)\\]\n\n(def pipe-interaction\n (mm/pipeline\n (tcpipe/add-column :youtube*facebook (fn [ds] (tcc/* (ds :youtube) (ds :facebook))))\n {:metamorph/id :model} (ml/model linear-model-config)))\n\nAgain we evaluate the model,\n\n(def evaluations\n (ml/evaluate-pipelines\n [pipe-interaction]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\nand print it and the performance metrics:\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [4 5]:\n\n\n\n\n\n\n\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n24.98053874\n7.68473206\n0.00000000\n0.30762876\n\n\n:youtube\n13.66590205\n0.02072158\n0.00000000\n0.00151630\n\n\n:facebook\n5.83295846\n0.05643198\n0.00000004\n0.00967468\n\n\n:youtube*facebook\n17.35234974\n0.00079634\n0.00000000\n0.00004589\n\n\n\n\nAs the multiplcation of youtube*facebook is as well statistically relevant, it suggests that there is indeed an interaction between these 2 predictor variables youtube and facebook.\n\\(RMSE\\)\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.4887075967183183\n\n\\(R^2\\)\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.950824042478244\n\n\\(RMSE\\) and \\(R^2\\) of the intercation model are sligtly better.\nThese results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction model.\n\nsource: notebooks/noj_book/interactions_ols.clj",
"crumbs": [
"Tutorials",
"8 Ordinary least squares with interactions"