Pet project to explore the "Model-as-a-Service" concept via API creation. Docker image available here.
Many of the underlying packages have changed considerably since this proof of concept was originally created, so modifications may be needed, particularly given `plumber`'s growth in the last couple of years.
Pet project for creating a simple Naïve Bayes classifier with the Titanic data set (as an example) and deploying it as an API through Docker and a container-hosting service.
This project depends heavily on two R packages:
- `plumber`, for creating the API with R (see more here)
- `mlr`, to create and use the model (see more here)
HTTPS can be configured on the delivery platform (tested with Microsoft Azure) instead of burdening the Docker container with a web server. However, it is straightforward to integrate an Apache server with custom certificates. See the security note below for more information.
It is assumed that the model is created locally by sourcing `createModel.R`. The API itself needs only the final model and one row of example data (which contains all relevant metadata), both stored as `.rds` files inside the `model` folder, along with the parameter validation function in `valParams.R`, which is already inside the `model` folder.
To test the API locally (without the Docker container), assuming that both the `plumber` and `mlr` packages are installed, all that is needed is to source the `runAPI.R` script, which will automatically plumb `mainAPI.R`.
You should be able to find the basic introductory page at http://localhost:8000.
The `/titanic` endpoint handles the actual prediction. The parameters to be passed into the model are included in the request itself for simplicity. The pattern is `/titanic?param1=value1&param2=value2`.
- Example query: `http://localhost:8000/titanic?pClass=2&pSex=male&pAge=70&pFare=125&pFamily=0`
- Example response: `[{"prob.FALSE":0.0479,"prob.TRUE":0.9521,"response":"TRUE"}]`
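As a rough illustration of the request pattern above, here is a minimal client sketch in Python (the project itself is written in R, so this is not part of the repository). The function names `build_query_url`, `parse_prediction` and `predict` are hypothetical; the parameter names and the response shape are taken from the example query and response above, and the base URL assumes the API is running locally.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: API running locally


def build_query_url(p_class, p_sex, p_age, p_fare, p_family, base_url=BASE_URL):
    """Return the /titanic URL with the five model parameters as a query string."""
    params = {
        "pClass": p_class,
        "pSex": p_sex,
        "pAge": p_age,
        "pFare": p_fare,
        "pFamily": p_family,
    }
    return f"{base_url}/titanic?{urlencode(params)}"


def parse_prediction(body):
    """Parse the JSON body, which is a one-element list holding one record."""
    record = json.loads(body)[0]
    return record["response"], record["prob.TRUE"]


def predict(**kwargs):
    """Send the request and parse the answer (needs the API to be running)."""
    with urlopen(build_query_url(**kwargs)) as resp:
        return parse_prediction(resp.read().decode("utf-8"))


# Build the example query and parse the example response without a server.
url = build_query_url(2, "male", 70, 125, 0)
label, prob_true = parse_prediction(
    '[{"prob.FALSE":0.0479,"prob.TRUE":0.9521,"response":"TRUE"}]'
)
print(url)
print(label, prob_true)
```

With the API running, `predict(p_class=2, p_sex="male", p_age=70, p_fare=125, p_family=0)` would send the same request over HTTP.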
If you've built the Docker container (or pulled it from here), then you don't need an R installation: the container already provides R with all the necessary packages installed.
Simply:
- run locally with `docker run --rm --user docker -p 8000:8000 titanic_api`
- and see the results at http://localhost:8000/
For an example query and response, see the previous section.
Simply navigate to the URL for the container. That should load the introductory page. Appending `/titanic?...` as above should produce the expected behaviour (see the running locally section), as the API maps to the endpoints directly, with no further manual configuration or routing.
Data was acquired from Stanford's CS109 publicly accessible page here.
It is assumed that the container would be online behind other security measures such as user authentication and HTTPS. The container itself validates the parameters passed to it (thus avoiding the most obvious security breach) but does not implement other security features. However, such measures are easily implemented and usually already in place. Container hosting services may also offer solutions (as mentioned above, tested with Microsoft Azure).
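To give a feel for what parameter validation of this kind might involve, here is a hedged sketch in Python. It is not the project's `valParams.R`, whose contents are not shown here. The function name `validate_params`, the allowed values (classes 1 to 3 and the two sexes, read off the model tables below) and the non-negativity checks are assumptions made for illustration.

```python
import math

ALLOWED_CLASSES = {1, 2, 3}          # assumption: levels seen in the Class table
ALLOWED_SEXES = {"female", "male"}   # assumption: levels seen in the Sex table
REQUIRED = ("pClass", "pSex", "pAge", "pFare", "pFamily")


def validate_params(raw):
    """Turn raw query-string values into typed values; raise ValueError if invalid."""
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")

    p_class = int(raw["pClass"])  # int() raises ValueError on non-integer text
    if p_class not in ALLOWED_CLASSES:
        raise ValueError("pClass must be 1, 2 or 3")

    p_sex = raw["pSex"]
    if p_sex not in ALLOWED_SEXES:
        raise ValueError("pSex must be 'female' or 'male'")

    p_age = float(raw["pAge"])
    p_fare = float(raw["pFare"])
    for name, value in (("pAge", p_age), ("pFare", p_fare)):
        # Reject nan/inf and negative values (hypothetical range rule).
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name} must be a finite, non-negative number")

    p_family = int(raw["pFamily"])  # family size must be a whole number
    if p_family < 0:
        raise ValueError("pFamily must be a non-negative integer")

    return {"pClass": p_class, "pSex": p_sex, "pAge": p_age,
            "pFare": p_fare, "pFamily": p_family}


print(validate_params({"pClass": "2", "pSex": "male", "pAge": "70",
                       "pFare": "125", "pFamily": "0"}))
```

Rejecting anything that does not parse cleanly, rather than trying to repair it, keeps unexpected strings away from the model call.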
If needed, HTTPS can be implemented via the container by including an Apache server and the necessary certificates. For an example of such an implementation, see T-Mobile's repository.
FALSE | TRUE |
---|---|
0.614 | 0.386 |
Class | 1 | 2 | 3 |
---|---|---|---|
FALSE | 0.147 | 0.178 | 0.675 |
TRUE | 0.398 | 0.254 | 0.348 |
Sex | female | male |
---|---|---|
FALSE | 0.149 | 0.851 |
TRUE | 0.681 | 0.319 |
Naïve Bayes assumes a Gaussian distribution for non-categorical features.
Age | mean | std. deviation |
---|---|---|
FALSE | 30.139 | 13.898 |
TRUE | 28.408 | 14.428 |
Fare | mean | std. deviation |
---|---|---|
FALSE | 22.209 | 31.484 |
TRUE | 48.395 | 66.597 |
Although family size is clearly an ordinal feature (equally spaced, one person at a time), it has been left as numerical so that the model can predict previously unseen combinations. Input validation ensures that an integer is passed to the model.
Family | mean | std. deviation |
---|---|---|
FALSE | 0.89 | 1.836 |
TRUE | 0.939 | 1.186 |
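The tables above are enough to reproduce a prediction by hand. The sketch below, in Python rather than the project's R, applies the standard Naïve Bayes rule: multiply the prior by each conditional probability (or Gaussian density for numerical features), then normalise. It assumes that the FALSE and TRUE rows are the two outcome classes and that the first table holds the prior probabilities; the names `posterior` and `gaussian_pdf` are hypothetical, and the calculation may differ in small details from what `mlr` does internally.

```python
import math

# Values copied from the tables above (rounded as published).
PRIOR = {"FALSE": 0.614, "TRUE": 0.386}
P_CLASS = {
    "FALSE": {1: 0.147, 2: 0.178, 3: 0.675},
    "TRUE": {1: 0.398, 2: 0.254, 3: 0.348},
}
P_SEX = {
    "FALSE": {"female": 0.149, "male": 0.851},
    "TRUE": {"female": 0.681, "male": 0.319},
}
# (mean, standard deviation) per class for each numerical feature
GAUSS = {
    "age": {"FALSE": (30.139, 13.898), "TRUE": (28.408, 14.428)},
    "fare": {"FALSE": (22.209, 31.484), "TRUE": (48.395, 66.597)},
    "family": {"FALSE": (0.89, 1.836), "TRUE": (0.939, 1.186)},
}


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and std. deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(p_class, p_sex, age, fare, family):
    """Return normalised class probabilities under the Naive Bayes assumption."""
    joint = {}
    for label in PRIOR:
        score = PRIOR[label] * P_CLASS[label][p_class] * P_SEX[label][p_sex]
        for name, value in (("age", age), ("fare", fare), ("family", family)):
            mean, sd = GAUSS[name][label]
            score *= gaussian_pdf(value, mean, sd)
        joint[label] = score
    total = sum(joint.values())
    return {label: score / total for label, score in joint.items()}


# Same inputs as the example query: pClass=2, pSex=male, pAge=70, pFare=125, pFamily=0
probs = posterior(2, "male", 70, 125, 0)
print({label: round(p, 3) for label, p in probs.items()})
```

For the example query, the TRUE probability should land close to the `prob.TRUE` value in the example response (about 0.95); small differences are expected because the table values are rounded.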