Testing Mastodon for Scale in AWS

November 13, 2022

Preface

Originally I was going to list the sizes of the AWS instances used for each service, but every Mastodon instance is going to have different needs, so instead I am going to list the services used and some thoughts on each. This is a review, not a how-to.

Before someone replies to my post about this: yes, I know there are more cost-effective ways to run this infrastructure in AWS, but the idea was to use as many native AWS services as possible so the setup could scale quickly without too much of a headache.

Services Used

Load Balancer: AWS ALB
Database: Postgres RDS
Redis: ElastiCache Redis
Asset Storage: S3, Served by CloudFront
E-mail: SES
Compute: ECS using EC2 with an ASG (AutoScaling Group)
Network: VPC with NAT Gateway*
SSL: AWS Certificate Manager (ACM)
Secret Manager: AWS Secrets Manager

Configuration (High Level)

*if someone wants to get into the weeds about configuration, please message me on the fediverse/Mastodon*

The initial setup of the services was actually pretty painless. Even more so if you take the time to put your configuration into Terraform and manage your setup as IaC. I did not do this since it was a quick and dirty build, but if I were building this out for myself or others, I definitely would.

I started by setting up my supporting services, which I define as anything that isn’t the web application itself.

Setting up your VPC (your network) correctly is going to save you a lot of problems, but as I will get into later, doing it “correctly” costs money.

I typically set up my VPC with 4 subnets in 2 or more AZs: Public, Private, Intra, and Database. You could probably get away without the Intra subnet, as it serves a very select use case.

Public: Public-facing services that can have public IPs assigned directly to them. This subnet routes through a regular Internet Gateway.

Private: Services that should not be exposed to the open internet, but that can reach the internet through a NAT Gateway. Pointing ALBs at services inside this subnet ensures you do not expose something to the open internet that you do not intend to.

Intra: This is where you put services that should be accessed only by systems inside your VPC and that *SHOULD NOT* have any access to the internet. This is just an extra layer of security.

Database: AWS recommends setting up a dedicated subnet where your database services reside.

So with that, we have a VPC with at least 8 Subnets (12 if you used 3 zones), an Internet Gateway, and a NAT Gateway (or multiple depending on fault tolerance). That is before we even deploy the first service.
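Since the post mentions Terraform as the way it would do this properly, here is a minimal sketch of that subnet layout in HCL. The CIDRs, AZ, and resource names are placeholders, not values from the actual build, and only one AZ is shown:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

# One subnet per tier per AZ; a single AZ is shown here.
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.0.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

# The Intra and Database subnets follow the same pattern as "private",
# just with their own CIDR blocks and route tables.

# The NAT Gateway lives in a public subnet; the private subnets'
# route tables point their default route at it.
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}
```

Doubling the subnet resources for a second AZ (and adding a second NAT Gateway if you want fault tolerance) gets you to the 8-subnet count described above.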

RDS, ElastiCache, SES, S3, and CloudFront are all pretty easy to set up. Note down the credentials you get from the setup process, as you will need them later. If you were using Terraform, you could pass those values directly in code to the task configuration in ECS, but since I did this through the web UI, I had a Notepad++ note going with the addresses and logins for all my services.
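To illustrate the Terraform alternative to the notepad: service endpoints can flow straight from the resources into the ECS task environment. A sketch, assuming resources with these placeholder names exist elsewhere in the config:

```hcl
# Endpoints come from the resources themselves, so there is nothing to
# copy by hand and nothing goes stale when a service is rebuilt.
locals {
  mastodon_env = [
    { name = "DB_HOST",    value = aws_db_instance.mastodon.address },
    { name = "REDIS_HOST", value = aws_elasticache_cluster.mastodon.cache_nodes[0].address },
    { name = "S3_BUCKET",  value = aws_s3_bucket.mastodon_assets.bucket },
  ]
}
```

`local.mastodon_env` can then be fed into the container definition of the ECS task.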

For ECS, you will want to start by defining your cluster. Since Mastodon should always be running, set the cluster up on Linux EC2. Fargate is convenient, but again, it is super expensive.

For sizing the cluster, do what you can afford, but for scalability and reliability I would recommend going with at least two nodes, one in each of the AZs you created for your VPC.
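An EC2-backed cluster like this is usually wired to the ASG through a capacity provider. A minimal Terraform sketch, assuming the ASG and its launch template are defined elsewhere (all names are placeholders):

```hcl
resource "aws_ecs_cluster" "mastodon" {
  name = "mastodon"
}

# The capacity provider lets ECS manage the ASG's size to fit the tasks.
resource "aws_ecs_capacity_provider" "ec2" {
  name = "mastodon-ec2"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.ecs_nodes.arn

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 100
    }
  }
}

resource "aws_ecs_cluster_capacity_providers" "mastodon" {
  cluster_name       = aws_ecs_cluster.mastodon.name
  capacity_providers = [aws_ecs_capacity_provider.ec2.name]
}
```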

Once you create the cluster, you can define the task, which is really just defining the type, variables, and number of containers you wish to run. For anyone who has ever used Portainer, the setup should be familiar, and for everyone else, it is pretty self-explanatory. You will need to define your environment variables, and for your login details I recommend using AWS Secrets Manager to securely inject the values.
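The Secrets Manager injection looks like this in a task definition. A sketch with placeholder secret resources and an example image tag, not the exact task from this build:

```hcl
resource "aws_ecs_task_definition" "web" {
  family                   = "mastodon-web"
  requires_compatibilities = ["EC2"]
  memory                   = "1024"

  container_definitions = jsonencode([{
    name  = "web"
    image = "tootsuite/mastodon:v4.0.2" # tag is an example
    # Non-sensitive config goes in plain environment variables...
    environment = [
      { name = "LOCAL_DOMAIN", value = "anti-social.online" }
    ]
    # ...while login details are pulled from Secrets Manager at launch,
    # so they never sit in the task definition in plaintext.
    secrets = [
      { name = "DB_PASS",         valueFrom = aws_secretsmanager_secret.db_pass.arn },
      { name = "SECRET_KEY_BASE", valueFrom = aws_secretsmanager_secret.key_base.arn }
    ]
    portMappings = [{ containerPort = 3000 }]
  }])
}
```

The task execution role also needs permission to read those secrets, or the containers will fail to start.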

While defining your task, I recommend also setting up the load balancer and the scaling policy.
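A target-tracking scaling policy on the ECS service is one straightforward way to do the scaling piece. A sketch, with cluster and service names as placeholders:

```hcl
resource "aws_appautoscaling_target" "web" {
  service_namespace  = "ecs"
  resource_id        = "service/mastodon/mastodon-web"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

# Scale the web service so average CPU stays around 60%.
resource "aws_appautoscaling_policy" "web_cpu" {
  name               = "mastodon-web-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.web.service_namespace
  resource_id        = aws_appautoscaling_target.web.resource_id
  scalable_dimension = aws_appautoscaling_target.web.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```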

Once the initial setup is done, check that you get a response on the AWS domain for the ALB before going too much further ahead.

P.S. If you are using an apex domain as the Mastodon domain, Route53 (via ALIAS records) and Cloudflare (via CNAME Flattening) both support pointing it at the ALB.
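For the Route53 case, the apex record is an ALIAS `A` record pointing at the ALB. A sketch, assuming the hosted zone and an `aws_lb.mastodon` resource already exist in the config:

```hcl
resource "aws_route53_record" "apex" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "anti-social.online"
  type    = "A"

  # ALIAS records behave like CNAME flattening but are allowed at the apex.
  alias {
    name                   = aws_lb.mastodon.dns_name
    zone_id                = aws_lb.mastodon.zone_id
    evaluate_target_health = true
  }
}
```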

Results

Well, it worked. I generated some load against the instance and forced the compute to scale both up and down without suffering an outage. Scaling RDS was somewhat out of scope, but I did see some smart people mention using Aurora Read Replicas (serverless read replicas) to let RDS scale easily as well.

I didn’t test what happens with Sidekiq if the services were scaled down while it was performing a job, but I suspect that it would just restart on another server.

The Elephant (Mastodon?) in the Room – Cost

If you thought moving services to the “Cloud” (read, someone else’s computer) was going to be cheaper, this is going to be a blow.

Setting up the network as I described, which is considered “Best Practice”, is super expensive for what it is (search for ‘AWS NAT gateway expensive‘), but you also do not want your entire network out on the public internet, and setting up your own NAT through an EC2 instance is potentially asking for problems if that instance goes offline.

Also, having a dedicated RDS instance is pretty expensive, especially at larger sizes.

For most people, especially those running small to medium instances, this is not a cost effective way to run Mastodon.

Why would anyone want to do this?

One of the advantages of building an environment this way is the ability to scale without having to do a large migration. I could see it making sense for instances with many thousands of users. It would also help servers that see a huge influx of users and then need to scale back down.

Some of the services are also a cost (and frustration) savings a lot of the time: SES for sending email instead of trying to manage a mail server (especially staying off spam lists), and S3 with CloudFront to save on disk space and bandwidth for assets.

Neither of those services requires you to run your environment in AWS, and you can get immediate benefits from them.

If you have questions or comments, feel free to reach out to me @GoTakeAKnapp@anti-social.online

Quick and Dirty Blog

November 12, 2022

Normally I try to put a little more effort into the website layout, but since this is simply a place to write stuff about the Mastodon instance anti-social.online, I figured it would just be easier to delete all the stuff I don’t need and make it live.