MLOps: Integrate DevOps in ML systems

Valerie Lim
May 19, 2021


MLOps lifecycle

In line with the recent MOOC on MLOps released by DeepLearning.AI, this article provides a starter kit for ML deployment, focusing on how to integrate an ML API with DevOps using Jenkins and Docker. The Continuous Integration (CI), Continuous Delivery (CD), and Continuous Training (CT) pipeline is implemented on AWS. It involves writing tests and automating ML deployment so that updated ML solutions can be delivered quickly and reliably. The full end-to-end implementation can be found on my GitHub.

The tools used to create the examples for this post are:

  • GitHub for source code management and version control
  • Flask to deploy the ML model as an application (a minimal sketch follows this list)
  • Docker to containerize the app, so that the model/API can be deployed in any environment
  • Jenkins to orchestrate the CI/CD pipeline flow so as to seamlessly incorporate future improvements
  • AWS EC2 instance
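
For orientation, here is a minimal sketch of what the Flask serving layer could look like. The model.pkl artifact, the /predict route, and the payload shape are illustrative assumptions, not the exact code in the repo.

```python
# app.py -- minimal Flask wrapper around a trained model (illustrative sketch)
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical artifact produced by the training job
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload such as {"features": [[...], ...]}
    features = np.array(request.get_json()["features"])
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the endpoint is reachable from outside the Docker container
    app.run(host="0.0.0.0", port=5000)
```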

The current pipeline includes these jobs:

  1. Train and validate the model (a minimal training sketch follows the screenshot caption below)
  2. Create a Docker image for the model. Each run spins up a fresh, clean environment, and the microservice is served in a container

A screenshot of the Flask app. A demo can be found here (https://youtu.be/eAlhw-Nu_CQ)
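
To make job 1 concrete, here is a minimal train-and-validate sketch. The scikit-learn digits dataset, logistic-regression classifier, accuracy threshold, and model.pkl file name are stand-ins for illustration, not the repo's actual training code.

```python
# train.py -- illustrative train-and-validate job (not the exact repo code)
import pickle

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in image-classification data: 8x8 digit images flattened to 64 features
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Validate: fail the pipeline run if accuracy drops below a chosen threshold
accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {accuracy:.3f}")
assert accuracy > 0.9, "Model did not meet the validation threshold"

# Persist the artifact that the serving layer will load
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```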

3. Create a Jenkinsfile that contains all the instructions needed to orchestrate the entire pipeline automatically. The current Jenkinsfile performs the following: when developers push commits to GitHub, Jenkins pulls the repo, builds the virtual environment, and runs the ML model and its unit tests (a sample unit test is sketched below). If the tests pass, a new Docker image is built and pushed to Docker Hub, and older versions of the Docker image are removed.

A screenshot of a successful orchestration
  • This pipeline ensures that all developers’ working copies are merged into a shared mainline (Continuous Integration) and that the model still functions as intended after improvements are made (Continuous Delivery). As a result, end users can always obtain the latest build of the model.
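
As an illustration of the unit-test stage that Jenkins runs, here is a minimal pytest sketch against the hypothetical Flask app above; the endpoint name and payload shape are assumptions rather than the repo's actual tests.

```python
# test_app.py -- illustrative unit test executed in the Jenkins test stage
from app import app  # the hypothetical Flask app sketched earlier

def test_predict_endpoint_returns_predictions():
    client = app.test_client()
    # One sample with 64 features, matching the digits-style model above
    payload = {"features": [[0.0] * 64]}
    response = client.post("/predict", json=payload)
    assert response.status_code == 200
    assert "predictions" in response.get_json()
```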

Outlook: Production perspectives

There are still many more techniques and best practices that could be added to the current pipeline, for example:

  1. Techniques to create a lightweight model (a short PyTorch sketch of both follows this list):
  • Quantization: approximating a neural network that uses floating-point numbers with one that uses low-bit-width numbers
  • Weight pruning: eliminating specific connections between neurons to reduce the size of the neural network
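
Both techniques can be prototyped in a few lines with PyTorch; the toy model below, dynamic int8 quantization, and 30% L1 unstructured pruning are illustrative choices, not the only options.

```python
# Illustrative sketch of dynamic quantization and weight pruning with PyTorch
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy classifier standing in for the image-classification model
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Quantization: replace float32 Linear weights with int8 weights at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Weight pruning: zero out the 30% smallest-magnitude weights in each Linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

print(quantized_model)
```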

2. Perform quality analysis:

  • Data drift: the input distribution changes over time, e.g. rotated, cropped, blurred, or blocked images
  • Model drift: model performance degrades over time, e.g. accuracy, recall, or a downstream business KPI such as click-through rate (a simple monitoring sketch follows)
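
As a rough illustration of how such checks could be automated, here is a small sketch; the Kolmogorov–Smirnov test for data drift and the fixed accuracy-drop threshold for model drift are assumptions chosen for simplicity.

```python
# Illustrative drift checks; thresholds and stand-in data are assumptions
import numpy as np
from scipy.stats import ks_2samp

def data_drift_detected(reference_features, live_features, p_threshold=0.01):
    """Flag drift if any feature's live distribution differs from the reference
    distribution according to a two-sample Kolmogorov-Smirnov test."""
    for i in range(reference_features.shape[1]):
        _, p_value = ks_2samp(reference_features[:, i], live_features[:, i])
        if p_value < p_threshold:
            return True
    return False

def model_drift_detected(baseline_accuracy, live_accuracy, max_drop=0.05):
    """Flag drift if live accuracy falls more than max_drop below the baseline."""
    return live_accuracy < baseline_accuracy - max_drop

# Usage sketch with random stand-in data
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 64))
live = rng.normal(loc=0.5, size=(500, 64))  # shifted distribution
print(data_drift_detected(reference, live))   # True: distributions differ
print(model_drift_detected(0.95, 0.88))       # True: accuracy dropped by more than 5 points
```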

3. Use FastAPI for API development, instead of Flask

  • It is faster because it is built on ASGI rather than WSGI (as in Flask) and can handle multiple requests concurrently, so the framework scales well. A minimal FastAPI version of the prediction endpoint is sketched below.
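
For comparison with the Flask sketch earlier, here is a minimal FastAPI version of the same hypothetical endpoint; the model.pkl artifact and payload schema remain assumptions.

```python
# main.py -- illustrative FastAPI equivalent of the Flask sketch above
import pickle
from typing import List

import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Same hypothetical artifact loaded by the Flask version
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictRequest(BaseModel):
    features: List[List[float]]

@app.post("/predict")
def predict(request: PredictRequest):
    predictions = model.predict(np.array(request.features)).tolist()
    return {"predictions": predictions}

# Run with an ASGI server, e.g.: uvicorn main:app --host 0.0.0.0 --port 5000
```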

4. Develop integration tests to ensure the components work together seamlessly (e.g. that the input data is consumable by the model); a sketch follows below
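
A minimal sketch of such an integration test, assuming the hypothetical model.pkl artifact and 64-feature inputs used in the earlier sketches:

```python
# test_integration.py -- illustrative integration test (file and shapes are assumptions)
import pickle

import numpy as np

def test_model_consumes_preprocessed_input():
    # Load the trained artifact exactly as the serving layer would
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    # Simulate the batch the API would pass to the model after preprocessing
    batch = np.zeros((4, 64))
    predictions = model.predict(batch)

    # The model must produce one prediction per input row
    assert len(predictions) == batch.shape[0]
```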

5. The current ML model is a simple image-classification solution, chosen as an example for implementing the MLOps cycle. There are more lessons to learn and share when designing an end-to-end ML production system that involves project scoping, modeling strategies, deployment requirements, etc. Perhaps in a follow-up post!

6. There are challenges post-deployment, and here’s a helpful guide to addressing them.

There is some work left before your application is production-ready. Let me know how it goes in the comments below!

Take care and stay safe!
