Deploying a small ML model on AWS ECS (Part 1)

A few notes, lessons and observations from deploying a small ML model on AWS ECS
deploy ml model
aws ecs
aws
fastapi
finetuned vietnamese accent model
Author

Peter Hoang

Published

August 21, 2024

In a previous post, I discussed a small ML model finetuned to automatically insert Vietnamese accent marks, which I’ve shared on the Hugging Face model hub.

In this post, I’ll share my notes and observations from deploying this model on AWS ECS (Elastic Container Service).

Context

Some traditional VPS hosting services also support Docker, but to get “auto-scaling”, we’d need to use a big cloud such as AWS, GCP, or Azure.

In this post, we’ll zoom into AWS ECS, the AWS service for deploying containerized applications.

Key steps (end to end)

Below are the key steps to deploy an ML model that’s available locally (or on the HF Hub) to AWS ECS:

  1. Putting the ML model behind an API. In Python, we can use Django REST Framework (DRF), Flask, or FastAPI. In this experiment, I used FastAPI because this ML model doesn’t need any database. (A minimal sketch follows this list.)

  2. Once the server works locally, we can “containerize” it using Docker. Quick instructions for FastAPI can be found here. (A Dockerfile sketch also follows this list.)

  3. To use AWS ECS, we’d host our Docker image in AWS ECR. (This isn’t a strict requirement, but the ECR fee is low and it’s part of AWS, so it’s convenient.)
    In the AWS Console -> ECR, I created a new private ECR repo and followed the few setup steps (commands) that AWS shows automatically.
    I had no problem pushing to ECR just by copy-pasting these commands (of course, we first need to install the AWS CLI and the like, but that’s all covered in the instructions). A sketch of these commands follows this list.

  4. Follow the official AWS ECS instructions here to set up and start our container instances.
    This is the main topic of this post, so we’ll focus on it in the next sections.
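
For step 1, here’s a minimal sketch of what the FastAPI wrapper could look like. The file name, endpoint, and model id are illustrative placeholders, not the exact code I used:

```python
# app.py: a minimal FastAPI wrapper around a Hugging Face model.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup rather than on every request.
# The model id below is a placeholder; substitute the actual HF Hub repo.
accent_model = pipeline(
    "token-classification",
    model="your-username/vietnamese-accent-model",
)

class TextIn(BaseModel):
    text: str

@app.post("/predict")
def predict(payload: TextIn):
    # Run inference and return the raw predictions as JSON
    return {"predictions": accent_model(payload.text)}
```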
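
For step 2, a Dockerfile along these lines is enough (adapted from the FastAPI docs’ Docker guide; the file names are assumptions):

```dockerfile
FROM python:3.11-slim
WORKDIR /code

# requirements.txt should pin fastapi, uvicorn, transformers, torch, etc.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Serve on port 8000; this is the container port we'll map in the ECS Task definition
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```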
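
For step 3, the commands AWS shows are roughly the following (region, account id, and repo name are placeholders):

```bash
# Authenticate Docker against ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build, tag, and push the image
docker build -t accent-api .
docker tag accent-api:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/accent-api:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/accent-api:latest
```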

Overview of ECS concepts

It seems to me that ECS has 3 main concepts:

  1. Cluster: like a placeholder for all our stuff, probably just for grouping things.
  2. Service: this is what actually gets deployed. It configures the desired infrastructure and declares which “Tasks” to run (the next concept).
  3. Task: a collection of (related) containers that run together and get deployed together. In practice, we declare our Task definitions first, then create Services based on them. In a Task definition, we explicitly declare which containers to use (the ones we pushed to ECR above).

So the sequence looks like this: Create cluster -> Create Task definition -> Create Service & run it.
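
I did all of this through the AWS Console, but for reference, the same sequence maps roughly onto these AWS CLI calls (all names are placeholders):

```bash
# 1. Create the cluster
aws ecs create-cluster --cluster-name accent-cluster

# 2. Register the Task definition from a JSON file (an example follows below)
aws ecs register-task-definition --cli-input-json file://task-definition.json

# 3. Create the Service and run it (sketched at the end of this post)
```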

The first 2 steps shouldn’t cause any problems, except that we need to take note of the port mapping for our Docker container in the Task definition.
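
For instance, a trimmed-down Fargate Task definition might look like the one below. All names, sizes, and account details are placeholders, and a real definition also needs an executionRoleArn so ECS can pull the image from ECR. The containerPort must match the port the container actually serves on (8000 in the Dockerfile sketch above):

```json
{
  "family": "accent-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "accent-api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/accent-api:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 8000, "protocol": "tcp" }
      ]
    }
  ]
}
```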

The most interesting learning points are about the ECS Service

When creating the ECS Service, we can choose the underlying infrastructure to be EC2 or Fargate; the latter can be thought of as a serverless version of EC2 for containers.

But note that as long as the service is running, which is always in the case of an API server, we get charged for the running time, not just when some user actually uses the service. In this sense, Fargate is a serverless abstraction over EC2: it frees us from managing the underlying EC2 instances, but it doesn’t mean pay-per-request. I chose Fargate for all of my experiments.

When choosing Fargate, there’s a small option, enabled by default, to assign a public IP to our running tasks.

When I kept this option enabled, everything went smoothly and I had something running right away.
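
In CLI terms, the choices above correspond roughly to a call like this (cluster, service, subnet, and security group names are placeholders). Note assignPublicIp=ENABLED, which mirrors the console default:

```bash
aws ecs create-service \
  --cluster accent-cluster \
  --service-name accent-service \
  --task-definition accent-api \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration \
    "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=ENABLED}"
```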

But unexpected “challenges” arose when this option was not enabled. I’ll leave that to Part 2 of this post. Stay tuned.