This article is now 2 years old! It is highly likely that this information is out of date and the author will have completely forgotten about it. Please take care when following any guidance to ensure you have up-to-date recommendations.
As more services go live on my Kubernetes clusters and more people start relying on them, I get nervous. For the most part, I try to keep my applications and configurations stateless - relying on ConfigMaps, for example, to store application configuration. This means that with a handful of YAML files in my Git repository I can restore everything to working order. Sometimes, though, there’s no choice but to use a PersistentVolume to provide some data persistence where you can’t capture it in a config file. This is where a backup of the cluster - and specifically the PersistentVolume - is really important.
Enter Velero - the artist formerly known as Heptio Ark.
Velero is an open source tool to safely backup and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
Velero uses plugins to integrate with various cloud providers, allowing you to back up to different targets - my aim is to back up my vSphere-based (CSI) Persistent Volumes to AWS S3.
Set up AWS
You can set up all the required components using the AWS console, but my preference is to use the AWS CLI.
Create a new Access Key
To use the AWS CLI you’ll need an Access Key. Log onto your AWS console, go to “My Security Credentials” and create an Access Key (if you’ve not already got one).
Keep the details safe (I store mine in my password manager).
Install AWS CLI
I’m using Homebrew to install the AWS CLI, and other packages, because I’m on a Mac - check out the official install docs for other OSes.
brew install awscli
Configure a new profile
Note: I’m using a named profile as I’ve got a few accounts - you can omit this if you are just setting up the one.
Let’s set up some variables first:
BUCKET=prod-cluster-backup # The name of your S3 bucket to create
REGION=us-west-1 # AWS Region in which to create the S3 bucket
PROFILE=my-profile # Only needed if you're creating a named profile for AWS CLI
Configure your AWS profile (omit --profile $PROFILE if you’re using the default profile)
aws configure --profile $PROFILE
> AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
> AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
> Default region name [None]: us-west-2
> Default output format [None]: ENTER
I’m creating a user with the same name as my bucket - since this user and bucket will be used to back up a single cluster, it makes sense for me to be able to identify and link the two by name.
aws iam create-user --user-name $BUCKET --profile $PROFILE
Create a JSON file with a policy definition of the permissions Velero needs. Note that it’s scoped to the specific bucket using the $BUCKET variable.
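A minimal sketch of such a policy, assuming only the S3 object and bucket permissions listed in the Velero AWS plugin documentation are needed. The EC2 snapshot permissions are left out on the assumption that the volumes live on vSphere rather than EBS, and the file name velero-policy.json is illustrative.

```shell
# Assumption: BUCKET was set earlier; the fallback only lets this snippet run on its own.
BUCKET="${BUCKET:-prod-cluster-backup}"

# Write the policy document, scoped to the one bucket.
# The heredoc delimiter is unquoted so ${BUCKET} is expanded into the ARNs.
cat > velero-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": ["arn:aws:s3:::${BUCKET}/*"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${BUCKET}"]
    }
  ]
}
EOF
```

The first statement covers the objects inside the bucket and the second covers listing the bucket itself, which is why the two resource ARNs differ. The file can then be attached to the user with aws iam put-user-policy, passing it as --policy-document file://velero-policy.json.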
At this point you could use velero backup create to start backing things up, but Velero won’t automatically back up your persistent volumes - you need to tell it what to back up using an annotation. Without annotating the pods, the backup will complete and look successful, but it won’t include your data!
Annotate deployments, stateful sets or pods
Let’s take my Vault deployment, for example. It consists of a stateful set of three pods, and each pod has a persistent volume called “data”. Prior to deployment I can add the backup.velero.io/backup-volumes: <volume name> annotation to the template metadata in my YAML configuration.
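As a rough illustration of where the annotation sits, here is a trimmed-down fragment of a StatefulSet spec, written to a file so it can be inspected. It assumes the volume is named data as in the Vault example. The file name vault-annotation.yaml is hypothetical and this is not the full manifest.

```shell
# Illustrative fragment only: "data" is the volume name from the Vault example,
# and vault-annotation.yaml is a hypothetical file name.
# The delimiter is quoted because nothing in the YAML needs shell expansion.
cat > vault-annotation.yaml <<'EOF'
spec:
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: data
EOF
```

Because the annotation lives under the pod template, every pod the stateful set creates carries it. For a pod that is already running, the same annotation can be added with kubectl annotate pod <pod name> backup.velero.io/backup-volumes=data, and several volumes can be given as a comma-separated list.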
You can create a backup of your entire cluster using velero backup create whole-cluster-backup, or you can create scheduled backups using a cron-like schedule.
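A sketch of what a scheduled backup could look like, using velero schedule create with a standard five-field cron expression. The schedule name and the 01:00 daily timing are assumptions for illustration, and the guard simply skips the commands on a machine without the velero CLI.

```shell
# Illustrative values: the schedule name and cron expression are assumptions.
SCHEDULE_NAME="daily-cluster-backup"
CRON="0 1 * * *"   # minute hour day-of-month month day-of-week: daily at 01:00

if command -v velero >/dev/null 2>&1; then
  # Create the schedule, then list schedules to confirm it was registered.
  velero schedule create "$SCHEDULE_NAME" --schedule="$CRON"
  velero schedule get
else
  echo "velero CLI not found; would run: velero schedule create $SCHEDULE_NAME --schedule=\"$CRON\""
fi
```

Backups created by a schedule only include persistent volume data for pods that carry the backup-volumes annotation, just like a one-off backup.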