Skip to content

Commit

Permalink
docs: glossary for important terms (#292)
Browse files Browse the repository at this point in the history
Closes #188.

### Summary of Changes

Configured Snippets extension in mkdocs.yml
Created includes/glossary.md
Filled Glossary with various terms

---------

Co-authored-by: guempelp0 <[email protected]>
Co-authored-by: Lars Reimann <[email protected]>
  • Loading branch information
3 people authored Jun 23, 2023
1 parent ccd1c25 commit 2979f24
Show file tree
Hide file tree
Showing 2 changed files with 141 additions and 0 deletions.
139 changes: 139 additions & 0 deletions docs/glossary.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
# Glossary

## Accuracy
The fraction of predictions a [classification](#classification) model has correctly identified. Formula:

$$
\text{accuracy} = \frac{\text{True Positives + True Negatives}}{\text{Total amount of data points}}
$$

See here for respective definitions:
[True Positives](#true-positive-tp)
[True Negatives](#true-negative-tn)

## Application Programming Interface (API)
An API allows independent applications to communicate with each other and exchange data.

## Classification
Classification refers to dividing a data set into multiple chunks, which are then considered "classes".

## Confusion Matrix
A confusion matrix is a table that is used to define the performance of a [classification](#classification) algorithm.
It classifies the predictions to be either be [true positive](#true-positive-tp), [true negative](#true-negative-tn),
[false positive](#false-positive-fp) or [false negative](#false-negative-fn).

## Decision Tree
A Decision Tree represents the process of conditional evaluation in a tree diagram.

Implemented in Safe-DS as [Decision Tree][safeds.ml.classical.classification.DecisionTree].

## F1-Score
The harmonic mean of [precision](#precision) and [recall](#recall). Formula:

$$
f_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

## False Negative (FN)
An outcome is considered to be a false negative, if the data model has mistakenly predicted a value of negative class.

## False Positive (FP)
An outcome is considered to be a false positive, if the data model has mistakenly predicted a value of positive class.

## Feature
Each feature represents a measurable piece of data that can be used for analysis.
It is analogous to a column within a table.

## Linear Regression
Linear Regression is the supervised Machine Learning model in which the model finds the best fit linear line between the independent and dependent variable
i.e. it finds the linear relationship between the dependent and independent variable.

Implemented in Safe-DS as [LinearRegression][safeds.ml.classical.regression.LinearRegression].

## Machine Learning (ML)
Machine Learning is a generic term for artificially generating knowledge through experience.
To achieve this, one can choose between a variety of model options.

## Metric
A data metric is an aggregated calculation within a raw dataset.

## One Hot Encoder
If a column's entries consist of a non-numerical data type, using a One Hot Encoder will create
a new column for each different entry, filling it with a '1' in the respective places, '0' otherwise.

Implemented in Safe-DS as [OneHotEncoder][safeds.data.tabular.transformation.OneHotEncoder].

## Overfitting
Overfitting is a scenario in which a data model is unable to capture the relationship between the input and output variables accurately,
due to not generalizing enough.

## Positive Class
The "Positive Class" consists of all attributes to be considered positive. Consequently, every attribute to not be in this class is considered to be negative class.

## Precision
The ability of a [classification](#classification) model to identify only the relevant data points. Formula:

$$
\text{precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}
$$

See here for respective references:
[True Positives](#true-positive-tp)
[False Positives](#false-positive-fp)

## Random Forest
Random Forest is an ML model that works by generating decision trees at random.

Implemented in Safe-DS as [RandomForest][safeds.ml.classical.regression.RandomForest].

## Recall
The ability of a [classification](#classification) model to identify all the relevant data points. Formula:

$$
\text{recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}
$$

See here for respective references:
[True Positives](#true-positive-tp)
[False Negatives](#false-negative-fn)

## Regression
Regression refers to the estimation of continuous dependent variables.

## Regularization
Regularization refers to techniques that are used to calibrate machine learning models
in order to minimize the adjusted loss function and prevent [overfitting](#overfitting) or [underfitting](#underfitting).

## Sample
A sample is a subset of the whole data set.
It is analyzed to uncover the meaningful information in the larger data set.

## Supervised Learning
Supervised Learning is a subcategory of ML. This approach uses algorithms to learn given data.
Those Algorithms might be able to find hidden meaning in data - without being told where to look.

## Tagged Table
In addition to a regular table, a Tagged Table will mark one column as tagged, meaning that
an applied algorithm will train to predict its entries. The marked column is referred to as ["target"](#target).

## Target
The target variable of a dataset is the feature of a dataset about which you want to gain a deeper understanding.

## Test Set
A set of examples used only to assess the performance of a fully-specified [classifier](#classification).

## Training Set
A set of examples used for learning, that is to fit the parameters of the [classifier](#classification).

## True Negative (TN)
An outcome is considered to be a true negative, if the data model has correctly predicted a value of negative class.

## True Positive (TP)
An outcome is considered to be a true positive, if the data model has correctly predicted a value of positive class.

## Underfitting
Underfitting is a scenario in which a data model is unable to capture the relationship between the input and output variables accurately,
due to generalizing too much.

## Validation Set
A set of examples used to the parameters of a [classifier](#classification).
2 changes: 2 additions & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ nav:
- Data Visualization: tutorials/data_visualization.ipynb
- Machine Learning: tutorials/machine_learning.ipynb
- API Reference: reference/
- Glossary: glossary.md
- Development:
- Environment: development/environment.md
- Guidelines: development/guidelines.md
Expand Down Expand Up @@ -88,6 +89,7 @@ markdown_extensions:
- pymdownx.inlinehilite
- pymdownx.snippets


# Diagrams
- pymdownx.superfences:
custom_fences:
Expand Down

0 comments on commit 2979f24

Please sign in to comment.