AWS: How to migrate Database to CloudFormation
Infrastructure as code (IaC) is gaining more and more popularity as the supporting tools advance. Running in the cloud is yet another level of abstraction that makes our engineering life easier. Having said that, many companies started their business with servers on-premises and are gradually migrating to the cloud. In this process some parts of the infrastructure are relatively easy to move, while others might take years. A good example of the latter is databases. The complexity here arises from the huge amount of data and the potential need to stop production operations for some period of time.
In this tutorial I'll guide you through all the necessary steps to migrate a production database to the cloud while minimizing downtime of the running system.
Prerequisites
This tutorial will cover a specific set of technologies:
- AWS as a cloud provider
- Postgres as a database to be migrated
- Jenkins as an automation tool
You might be running something completely different, and that's alright. The concepts are the same; only the implementation will differ.
As always, the project sources are available on GitHub; find the link at the end of the page.
There are 3 parts in this tutorial: CloudFormation template creation, Automation and Data migration.
CloudFormation template creation
One of the main goals of IaC is to be able to recover from failures quickly. In other words, you should be able to recreate your entire production stack with minimal effort. There are various ways to achieve that in AWS; I'll be using the CloudFormation template approach.
Note! You should read through the CloudFormation RDS user guide and familiarize yourself with all the important notes.
The RDS resource type supports many different properties. Depending on the use case, you should be able to find everything needed to configure the newly created database instance. The following template was made:
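The full version is available with the project sources linked at the end of the page; the trimmed-down sketch below (the secret name, instance identifier, sizing and engine version are illustrative placeholders) shows the general shape of the AWS::RDS::DBInstance resource:

AWSTemplateFormatVersion: "2010-09-09"
Description: Production Postgres RDS instance (illustrative sketch)

Parameters:
  Environment:
    Type: String
    AllowedValues: [staging, production]
    Default: staging

Resources:
  PostgresInstance:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot                 # keep a final snapshot if the stack is ever deleted
    Properties:
      # DBName is deliberately omitted, so the default "postgres" database gets created
      DBInstanceIdentifier: !Sub "my-service-db-${Environment}"
      Engine: postgres
      EngineVersion: "13.7"                  # placeholder -- pin the version you actually run
      DBInstanceClass: db.m6g.large          # Graviton-based instance type
      StorageType: io1
      AllocatedStorage: "100"                # minimal value for provisioned IOPS storage, adjust to your needs
      Iops: 1000                             # minimal value for provisioned IOPS storage, adjust to your needs
      AllowMajorVersionUpgrade: false
      AutoMinorVersionUpgrade: false
      PubliclyAccessible: false
      # Credentials are resolved from Secrets Manager, never stored in plain text
      MasterUsername: "{{resolve:secretsmanager:my-db-secret:SecretString:DB_MASTER_USER}}"
      MasterUserPassword: "{{resolve:secretsmanager:my-db-secret:SecretString:DB_MASTER_PASSWORD}}"
      # Networking (DBSubnetGroupName, VPCSecurityGroups) is left out here for brevity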
I’d like to highlight a few important properties here:
- The DBName property was deliberately left unspecified. In this case a default database called postgres will be created. Keeping a default value might be a potential security hole, so take that into account.
- MasterUsername and MasterUserPassword are stored securely in Secrets Manager. Never store them in plain text!
- AllocatedStorage and Iops were both set to the minimal values allowed. You will most likely need to adjust those.
- AllowMajorVersionUpgrade and AutoMinorVersionUpgrade were disabled to minimize unexpected surprises in the production environment. Doing database updates in a controlled manner is always preferred.
- DBInstanceClass in use is a Graviton-based instance type. It is recommended due to its better price and performance. You don't need to configure anything extra because of the different processor; AWS covers everything.
- PubliclyAccessible should always be disabled. Running your RDS instance in publicly accessible mode is a huge security issue.
Following best practices, you would first try out the changes in a safe environment before running them in production. To achieve that, the template was parameterized accordingly.
Automation
Now that we have prepared the template, let's create a job to deploy it. As mentioned previously, I'll be using Jenkins pipelines written in Groovy.
If you are unfamiliar with Jenkins pipelines, and the AWS steps in particular, check out the documentation page.
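The actual pipeline lives with the project sources; a minimal sketch built on the Pipeline: AWS Steps plugin (the stack name, template file name and region below are assumptions) could look like this:

pipeline {
    agent any

    parameters {
        choice(name: 'ENVIRONMENT', choices: ['staging', 'production'], description: 'Target environment')
        string(name: 'REGION', defaultValue: 'eu-west-1', description: 'AWS region to deploy to')
    }

    stages {
        stage('Deploy RDS stack') {
            steps {
                // withAWS and cfnUpdate come from the Pipeline: AWS Steps plugin;
                // this assumes the agent already has AWS credentials available,
                // and the template file name is a placeholder
                withAWS(region: params.REGION) {
                    cfnUpdate(
                        stack: "postgres-db-${params.ENVIRONMENT}",
                        file: 'rds-postgres.yaml',
                        params: ["Environment=${params.ENVIRONMENT}"],
                        timeoutInMinutes: 60
                    )
                }
            }
        }
    }
}

cfnUpdate creates the stack if it doesn't exist yet and updates it otherwise, so the same job can be reused for both the first deployment and later changes.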
The job is rather self-explanatory. You can of course omit this completely and apply the template using the AWS Console UI. Doing so, though, would not only violate IaC principles but also create a potential issue in the future if you decide to create another database in a different region.
Data migration
Now that an RDS instance is deployed in the cloud, we can proceed with the last step: data migration. In the case of Postgres there are various tools available to achieve that. For bigger database instances with lots of databases in them you will likely prefer pg_dumpall. Unfortunately, the rds_superuser role doesn't have permission on the pg_authid table when you are running in AWS. You can still run pg_dumpall, but you have to keep in mind two aspects:
- Due to the limitation described earlier you should use the --no-role-passwords option, which means that all the recreated roles will have a null password
- In order to dump users and tablespaces you should use the --globals-only option (both options are shown in the example below)
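For illustration, a globals-only dump combining the two options could look something like this (the host and user name are placeholders):

# dump roles and tablespaces only, omitting role passwords
pg_dumpall --globals-only --no-role-passwords -U user_name -h db_host -p 5432 -f globals.sql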
For this tutorial I'll stick to the simpler utility called pg_dump. It allows you to back up a single database and doesn't work with globals.
Note! You might be thinking that I took a simpler route. In fact, that’s exactly the migration I was performing recently — one database with one user and a bunch of tables.
In order to get the dump of the production DB you can use the following command:
pg_dump -Z0 -j 10 -Fd db_name -U user_name -h db_host -p 5432 -f db_name
There are various options you can supply to the pg_dump command, and also many discussions around them. Apart from the basic ones (-U, -h, -p) you can see a few more used to speed up the execution:
- -Z0 means that no compression is applied
- -j 10 is used to dump 10 tables simultaneously
- -Fd is used to dump into a directory, arguably the fastest way to back up and restore the data
Before we proceed to the restoration part, there are a few SQL commands to execute due to the use of pg_dump. We will need to create a user and a database manually. Connect to the RDS instance (you can use the DB_MASTER_USER/DB_MASTER_PASSWORD values from Secrets Manager) and run the following:
create user "user_name" createdb CREATEROLE PASSWORD '<same as current>' noinherit;
create database "db_name";
alter database "db_name" owner to "user_name";
Additionally, you can already add the default privileges for the newly created user:
ALTER DEFAULT PRIVILEGES FOR ROLE "user_name" GRANT ALL ON TABLES TO "db_name";
ALTER DEFAULT PRIVILEGES FOR ROLE "user_name" GRANT ALL ON SEQUENCES TO "db_name";
Now we can proceed to the restoration of the previously taken dump. The pg_restore tool will be used for it:
pg_restore -j 10 -Fd -O -U user_name -h db_host_name.region.rds.amazonaws.com -p 5432 -d db_name db_name
Most of the options here are the same as for pg_dump, apart from -O. That one is used so that any user name can be used for the initial connection, and this user will own all the created objects.
Lastly, we should give all the privileges for tables and sequences to our user. Once again connect to the RDS instance and execute the following:
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "user_name";
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO "user_name";
Note! The execution time of both the dump and restore operations will vary depending on the size of your database. Be patient!
Afterwards
A database migration is undoubtedly quite an intense and time-consuming process. Nevertheless, with the tools available today it is manageable, and if you want to keep your infrastructure nice and tidy while moving to the cloud, don't leave the database out of it.
Should you have any questions please leave them in the comments section below.
The project source files can be found at this GitHub repository.
Support
If you like the content you read and want to support the author — thank you very much!
Here is my Ethereum wallet for tips:
0xB34C2BcE674104a7ca1ECEbF76d21fE1099132F0