A helper library to interact with Arize AI APIs.
Arize is an end-to-end ML observability and model monitoring platform. The platform is designed to help ML engineers and data science practitioners surface and fix issues with ML models in production faster with:
- Automated ML monitoring and model monitoring
- Workflows to troubleshoot model performance
- Real-time visualizations for model performance monitoring, data quality monitoring, and drift monitoring
- Model prediction cohort analysis
- Pre-deployment model validation
- Integrated model explainability
You can install the development version of arize from GitHub with:
install.packages("devtools")
devtools::install_github("Arize-ai/client_r")
In this example, we log modelling results for a click-through rate (CTR) prediction model. For background on the data and model, please refer to the original documentation (TODO: LINK). Here we download the training, validation, and production datasets to show how to use the arize package to log modelling results to Arize AI from R.
Logging to Arize AI takes three steps:
- Perform your analysis and store the results you wish to log (in this example, we start with already prepared datasets)
- Create a logging schema. This is a named list of parameters that describes the dataset you are logging
- Initiate an arize::Client logger instance to send data to the Arize AI API (the dataset and schema are validated internally)
For this vignette, we download datasets that already contain model results to log. In your own usage, this step would be replaced by your modelling procedure.
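# Base URL shared by the three prepared click-through rate datasets
data_url <-
  "https://storage.googleapis.com/arize-assets/fixtures/Click-Through%20Rate%20Use-Case/click_through_rate_categorical_"
dataset_names <- c("training", "validation", "production")
# Read each CSV directly from its URL into a data frame
datasets <- lapply(dataset_names,
                   function(x) read.csv(paste0(data_url, x, ".csv")))
# Name the list elements so each dataset can be retrieved by stage
datasets <- setNames(datasets, dataset_names)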
For clarity, we can separate the datasets object into three data frames for training, validation, and production. We also make sure the results timestamp is a vector of class Date, as required by the Arize AI API.
# This is the data which we will be logging
df_train <- datasets[["training"]]
df_train$model_date <- as.Date(df_train$model_date)
df_valid <- datasets[["validation"]]
df_valid$model_date <- as.Date(df_valid$model_date)
df_prod <- datasets[["production"]]
df_prod$model_date <- as.Date(df_prod$model_date)
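The three conversions above repeat the same step. A more compact alternative is to apply one helper to every data frame in the named datasets list. The helpers below (convert_date_column and convert_all) are hypothetical and not part of arize. They assume the timestamps are stored as strings in a format as.Date() recognises by default, such as "YYYY-MM-DD".

```r
# Hypothetical helper (not part of arize): convert one column of a data frame
# to class Date, assuming it holds strings such as "2021-05-01".
convert_date_column <- function(df, column = "model_date") {
  df[[column]] <- as.Date(df[[column]])
  df
}

# Apply the helper to every data frame in a named list; lapply() keeps names.
convert_all <- function(data_frames, column = "model_date") {
  lapply(data_frames, convert_date_column, column = column)
}

# Usage with the `datasets` list downloaded above:
# datasets <- convert_all(datasets)
# stopifnot(all(vapply(datasets,
#                      function(d) inherits(d$model_date, "Date"),
#                      logical(1))))
```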
The schema is a simple named list describing the data you wish to log. You can see the possible schema fields by looking up the schema_fields data object exported by the arize package. To create a schema, we'll use the arize::create_schema function. We first declare our model ID and model version, which are used later when logging. We also create a vector of feature names, which we then pass to create_schema together with the other column names.
model_id <- "click_through_rate_categorical_vignette" # This is the model name that will show up in Arize
model_version <- "v1.0" # Version of model - can be any string
features <- c(
"position",
"domain",
"category",
"device",
"keywords"
)
# Define a schema so Arize picks up data from the correct columns for logging
library(arize)
schema <- create_schema(
prediction_id_column_name = "id",
prediction_label_column_name = "predictions",
prediction_score_column_name = "CTR_predicted",
actual_label_column_name = "actuals",
actual_score_column_name = "CTR",
feature_column_names = features,
timestamp_column_name = "model_date"
)
schema
#> $prediction_id_column_name
#> [1] "id"
#>
#> $prediction_label_column_name
#> [1] "predictions"
#>
#> $prediction_score_column_name
#> [1] "CTR_predicted"
#>
#> $actual_label_column_name
#> [1] "actuals"
#>
#> $actual_score_column_name
#> [1] "CTR"
#>
#> $feature_column_names
#> [1] "position" "domain" "category" "device" "keywords"
#>
#> $timestamp_column_name
#> [1] "model_date"
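Before logging, it can help to confirm that every column named in the schema actually exists in each data frame. The helper below (missing_schema_columns) is hypothetical and not part of arize. As the printed schema above suggests, it assumes every schema entry is a column name or a vector of column names.

```r
# Hypothetical helper: return the schema columns that are absent from `df`.
# Assumes every schema entry is a character vector of column names.
missing_schema_columns <- function(df, schema) {
  wanted <- unique(unlist(schema, use.names = FALSE))
  setdiff(wanted, names(df))
}

# Usage: an empty character vector for a dataset means nothing is missing.
# lapply(list(training = df_train, validation = df_valid, production = df_prod),
#        missing_schema_columns, schema = schema)
```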
We begin by creating a new instance of the arize logger (see ?arize::Client). For this, we need our authentication credentials, which we can look up in the Space settings on our Arize AI dashboard. We can pass these keys to the logger directly in our script, or store them as environment variables in an .Renviron file. In the latter case, we can initiate a new arize::Client instance without supplying these parameters. Below, we show both approaches:
# Keys added in the code
ORGANIZATION_KEY <- "your organization key"
API_KEY <- "your api key"
arize_client <- Client$new(organization_key = ORGANIZATION_KEY, api_key = API_KEY)
--- or ---
# Keys stored in `.Renviron`
arize_client <- Client$new()
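If you prefer to keep keys out of your script but still pass them explicitly, you can read them from environment variables yourself. The sketch below uses the hypothetical variable names MY_ARIZE_ORG_KEY and MY_ARIZE_API_KEY. These are placeholders, not the names arize itself looks up (see ?arize::Client for those). The helper read_arize_keys is also hypothetical, and it fails early with a clear message if a key is unset.

```r
# Hypothetical helper (not part of arize): read both keys from environment
# variables. The default variable names are placeholders; use your own.
read_arize_keys <- function(org_var = "MY_ARIZE_ORG_KEY",
                            api_var = "MY_ARIZE_API_KEY") {
  vars <- c(org_var, api_var)
  keys <- Sys.getenv(vars, unset = "")
  # Treat unset or empty variables as missing and report them by name
  if (any(keys == "")) {
    stop("Missing environment variable(s): ",
         paste(vars[keys == ""], collapse = ", "))
  }
  list(organization_key = keys[[1]], api_key = keys[[2]])
}

# Usage:
# keys <- read_arize_keys()
# arize_client <- Client$new(organization_key = keys$organization_key,
#                            api_key = keys$api_key)
```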
With our Arize logger instantiated, we can proceed to log data to the server. We'll need the metadata (model ID and version), the schema, and the model results data frame. We also need to specify the model type and environment that match the results data. To see the available model types and environments, look up the arize::model_types and arize::environments data objects.
model_types
#> $BINARY
#> [1] 1
#>
#> $NUMERIC
#> [1] 2
#>
#> $CATEGORICAL
#> [1] 3
#>
#> $SCORE_CATEGORICAL
#> [1] 4
environments
#> $PRODUCTION
#> [1] 1
#>
#> $VALIDATION
#> [1] 2
#>
#> $TRAINING
#> [1] 3
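Both objects are named lists of small integer codes, and the same numbers appear in each. For example, 3 is CATEGORICAL in model_types but TRAINING in environments. A code taken from the wrong list is therefore still a valid-looking number. The hypothetical helper below (code_name, not part of arize) does a reverse lookup, which can be handy for checking which code you are about to pass. It uses local copies of the values printed above so that it runs without the arize package.

```r
# Local copies of the printed objects, so this sketch runs without arize.
model_types_copy <- list(BINARY = 1, NUMERIC = 2, CATEGORICAL = 3,
                         SCORE_CATEGORICAL = 4)
environments_copy <- list(PRODUCTION = 1, VALIDATION = 2, TRAINING = 3)

# Hypothetical helper: return the name whose code equals `value`, or NA.
code_name <- function(codes, value) {
  hit <- names(codes)[unlist(codes) == value]
  if (length(hit) == 0) NA_character_ else hit
}

code_name(model_types_copy, 4)   # "SCORE_CATEGORICAL"
code_name(environments_copy, 3)  # "TRAINING"
```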
We can now log the training, validation, and production results data.
Training data:
arize_client$log(
.data_frame = df_train,
.model_id = model_id,
.model_version = model_version,
.model_type = model_types$SCORE_CATEGORICAL,
.environment = environments$TRAINING,
.schema = schema
)
#> [1] ""
For the validation data, we also supply a batch id to the logger:
arize_client$log(
.data_frame = df_valid,
.model_id = model_id,
.model_version = model_version,
.batch_id = "validation",
.model_type = model_types$SCORE_CATEGORICAL,
.environment = environments$VALIDATION,
.schema = schema
)
#> [1] ""
And production data:
arize_client$log(
.data_frame = df_prod,
.model_id = model_id,
.model_version = model_version,
.model_type = model_types$SCORE_CATEGORICAL,
.environment = environments$PRODUCTION,
.schema = schema
)
#> [1] ""
Note that we changed the environment passed to the logger to match the dataset being logged, and supplied a batch ID for the validation data. Otherwise, the calls to arize_client$log are identical for the three modelling stages.
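Because only the environment and the optional batch ID change between the three calls, they can be folded into a loop. The sketch below is a hypothetical convenience wrapper (log_stages), not part of arize. It forwards the same arguments used above to client$log() via do.call() and returns the responses in a named list.

```r
# Hypothetical wrapper (not part of arize): log several stages in one call.
# `data_frames` and `environment_codes` are named lists keyed by stage name;
# `batch_ids` optionally supplies a .batch_id for some stages.
log_stages <- function(client, data_frames, environment_codes, model_id,
                       model_version, model_type, schema, batch_ids = list()) {
  stopifnot(setequal(names(data_frames), names(environment_codes)))
  results <- lapply(names(data_frames), function(stage) {
    args <- list(
      .data_frame = data_frames[[stage]],
      .model_id = model_id,
      .model_version = model_version,
      .model_type = model_type,
      .environment = environment_codes[[stage]],
      .schema = schema
    )
    # Only add a batch ID when one is given for this stage
    if (!is.null(batch_ids[[stage]])) args$.batch_id <- batch_ids[[stage]]
    do.call(client$log, args)
  })
  setNames(results, names(data_frames))
}

# Usage (same objects as above):
# log_stages(
#   arize_client,
#   data_frames = list(training = df_train, validation = df_valid,
#                      production = df_prod),
#   environment_codes = list(training = environments$TRAINING,
#                            validation = environments$VALIDATION,
#                            production = environments$PRODUCTION),
#   model_id = model_id, model_version = model_version,
#   model_type = model_types$SCORE_CATEGORICAL, schema = schema,
#   batch_ids = list(validation = "validation")
# )
```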
The logger function returns the result of an httr::POST() call that sends our data to Arize AI. If everything went well, you should see an empty string ("") printed to the R console. If you receive some other response from the API, something may have gone wrong. In such cases, please consult the API documentation for more information.
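To make that check explicit in a script, you could wrap each call in a small helper. The sketch below assumes that arize_client$log() returns a character string that is empty on success, as the printed output above suggests. The helper check_log_result is hypothetical and not part of arize.

```r
# Hypothetical helper: warn if a log() response is anything other than "".
# Returns TRUE (invisibly) on success and FALSE otherwise.
check_log_result <- function(result, label = "log call") {
  ok <- identical(result, "")
  if (!ok) {
    warning(label, " returned an unexpected response: ",
            paste(format(result), collapse = " "))
  }
  invisible(ok)
}

# Usage:
# check_log_result(arize_client$log(...), label = "training")
```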
sessionInfo()