Welcome to Day 5 of the "How to AWS" series: Base operations
If you missed the introduction and drivers behind this series, check it out here.
In the previous blog post in this series we set up the groundwork for operating in AWS. We:
- Set up IAM for our AWS Account
- Defined and implemented our AWS Networking
- Set up Route53 for DNS
Today, we will focus on establishing some common operational requirements in preparation for deploying our workloads, such as:
- AMI management
- Log management
- Secrets management
- Emailing on AWS
All EC2 instances are based upon an Amazon Machine Image (or AMI for short).
While AWS and other providers make AMIs readily available for use, directly relying on these AMIs may cause you to come unstuck later on. One example: when AWS releases new AMIs (e.g. Amazon Linux) with recent patches applied, they often remove the previously registered AMI, which means a new AMI ID. This can wreak havoc on your automated deployment pipelines and/or launch configurations.
To address this issue, you could dynamically look up the latest AMI (via AWS APIs - example here), but be aware of the additional operational risk you are taking on. By making this dynamic you are essentially deploying and building upon an unknown/untested base image. What happens to your application if a recent patch or upstream change isn’t compatible with your app? Hopefully, you catch this before it gets to production!
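If you do go down the dynamic route, the lookup logic itself is simple enough to sketch. Here is a minimal Python example; the selection logic is a plain function, while the actual API call (assuming boto3 and valid credentials) is shown in the comment, and the AMI name filter is purely illustrative:

```python
def latest_ami(images):
    """Pick the most recently created AMI from a describe-images result.

    `images` is a list of dicts shaped like the `Images` entries returned
    by EC2's DescribeImages API (each with 'ImageId' and 'CreationDate').
    """
    if not images:
        raise ValueError("no images matched the filters")
    # CreationDate is an ISO 8601 timestamp, so a string comparison suffices.
    return max(images, key=lambda img: img["CreationDate"])["ImageId"]

# With boto3 (assumed available and credentialed), the lookup might be:
#   ec2 = boto3.client("ec2")
#   resp = ec2.describe_images(
#       Owners=["amazon"],
#       Filters=[{"Name": "name", "Values": ["amzn2-ami-hvm-*-x86_64-gp2"]}],
#   )
#   ami_id = latest_ami(resp["Images"])
```

Pinning the selection logic in your own code (rather than trusting "first result wins") at least makes the lookup deterministic and easy to test.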
To combat this and provide a consistent, known base image, best practices recommend the creation and maintenance of a “golden AMI”. Often based upon one of the above-mentioned AMIs, this is a subsequent AMI which your business has created. As a result, you know it will exist within your environment and updates are within your control. Ideally, your golden AMI will contain common dependencies, your baseline server hardening and any agents used for operations. For more on what to include in your AMI, take a look at the AWS Answers page on AWS AMI Design.
The ongoing maintenance, creation and sharing of these AMIs introduces some additional operational overhead. As a result, I recommend you automate the creation process as much as you can. We will refer to this as an “AMI pipeline”. AWS has released a sample pipeline that should assist you in spinning one up to suit your needs. Read more about that here.
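To give a feel for the "bake" step at the heart of such a pipeline, here's a hedged Python sketch. It assumes a boto3-style EC2 client (a stub works for testing); the `GoldenAMI` tag key, names and description are my own illustrative choices, not part of the AWS sample:

```python
def bake_golden_ami(ec2, instance_id, name):
    """Create and tag a golden AMI from a hardened source instance.

    `ec2` is an EC2 client (e.g. boto3.client("ec2")). Creates the image,
    then tags it so downstream pipelines can target it deterministically.
    """
    resp = ec2.create_image(
        InstanceId=instance_id,
        Name=name,
        Description="Golden AMI - hardened base with ops agents installed",
    )
    ami_id = resp["ImageId"]
    # Tag the new AMI; illustrative tag key, pick whatever suits your org.
    ec2.create_tags(
        Resources=[ami_id],
        Tags=[{"Key": "GoldenAMI", "Value": "true"}],
    )
    return ami_id
```

In a real pipeline this step would run after your hardening and agent-install steps have completed on the source instance.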
You need backups - this is a universal truth. For EC2 instances, EBS Snapshots are the easiest, lowest-level form of backup you will want in place. An EBS Snapshot takes an incremental, block-level copy of the data currently on an EBS volume, which AWS then persists durably on S3.
To set up snapshots in our operational base environment, we’re going to use Data Lifecycle Manager (DLM) to automate and provide this capability. DLM uses tags and an associated lifecycle policy to manage the targeting, creation and subsequent deletion (e.g. after 30 days) of your EBS Snapshots. Take a look at the documentation and set it up now!
Log management is often overlooked, but forms a critical component of any incident response and is crucial for proactive monitoring of your environment(s). You need to decide what type of log management is right for your business and your specific requirements.
Logs should no longer simply reside on your servers and be reviewed manually when there’s an incident. Push them to CloudWatch Logs at a minimum using the CloudWatch agent. At the time of writing, there are two versions of this agent (older vs newer) - you should opt to use the newer “unified” CloudWatch agent!
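As a sketch, a minimal configuration for the unified agent to ship a single log file to CloudWatch Logs might look like the following (the file path and log group name are illustrative; the config is normally written as JSON on the instance, shown here built in Python for clarity):

```python
import json

# Minimal unified CloudWatch agent config: ship /var/log/messages to a
# dedicated log group, one stream per instance. Names are illustrative.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [{
                    "file_path": "/var/log/messages",
                    "log_group_name": "/myapp/var/log/messages",
                    "log_stream_name": "{instance_id}",
                }]
            }
        }
    }
}

# On the instance this JSON would be saved as the agent's config file
# and the agent restarted to pick it up.
print(json.dumps(agent_config, indent=2))
```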
Once your logs are off your server you can:
- Create metrics from them
- Alert on them based on rules
- Trigger automated remediation of issues
- Correlate logs across multiple sources/systems
If you need other options or capabilities, AWS also offers a hosted Elasticsearch cluster with Logstash and Kibana (the well-known ELK stack). Alternatively, you can run your own solution on EC2 or look to leverage one of the many AWS Technology Partners which specialise in log management (such as Sumo Logic).
Secrets management isn’t a new topic, but it is one that often seems to be overlooked when getting started with AWS. Just as everyone should use a password manager to secure their personal and business passwords (e.g. LastPass, Dashlane or Enpass), systems running on AWS should leverage the available services for securing sensitive information.
You should look to leverage AWS Secrets Manager or Secure Strings within the Systems Manager Parameter Store for storage of sensitive information such as database connection details and passwords. Access to this information will be controlled through IAM policies and the AWS Key Management Service (AWS KMS) key policies on each encryption key.
Once set up, your systems should access these sensitive details at runtime, as granted by their runtime IAM permissions. This enables you to remove environment-specific credentials and configuration from your application code and configuration files. Designed this way, credentials and config do not need to exist in your source code repository! Additionally, any access or updates to the configuration and credentials are logged and auditable via CloudTrail (remember, we set this up on Day 3).
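As a sketch of that runtime access, here's a small Python helper that fetches a JSON-formatted secret from Secrets Manager. It assumes a boto3-style client (a stub works for testing), and the secret's name and JSON shape are my own illustrative assumptions:

```python
import json

def get_db_credentials(secrets_client, secret_id):
    """Fetch a JSON secret (e.g. database credentials) at runtime.

    `secrets_client` is a Secrets Manager client, e.g.
    boto3.client("secretsmanager"). The calling instance/role needs
    IAM permission for secretsmanager:GetSecretValue plus access to
    the KMS key protecting the secret.
    """
    resp = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(resp["SecretString"])

# Typical runtime usage (client and secret name are assumptions):
#   creds = get_db_credentials(boto3.client("secretsmanager"),
#                              "prod/myapp/db")
#   connect(user=creds["username"], password=creds["password"])
```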
Many workloads will send out emails. It’s worth discussing how you’ll approach outbound email for your workloads.
My recommendation for sending emails from AWS is to leverage Amazon Simple Email Service (SES). By using this service you simplify your email processing requirements and can leverage the functionality of SES (i.e. bounce notifications). Just make sure to verify your sending (aka “From”) email address or domain name in order to send emails via SES. While you’re at it, make a point of reviewing the AWS SES best practices.
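As a sketch, sending a simple plain-text notification via SES from Python might look like this (boto3-style client assumed, stubbed here for testing; the addresses are placeholders and the sender must already be verified in SES):

```python
def send_notification(ses_client, sender, recipient, subject, body):
    """Send a plain-text email via SES.

    `ses_client` is an SES client, e.g. boto3.client("ses").
    `sender` must be an address (or on a domain) verified in SES.
    """
    resp = ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
    # SES returns a MessageId you can log for traceability.
    return resp["MessageId"]
```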
When sending emails directly from EC2 instances, there is a default throttle placed on outbound email traffic (TCP port 25). If you do want to send from EC2 instances, follow this guide to get the throttle removed. Beware that you will then also need to manage IP blacklisting, bounces, inbound email and so on.
In this blog post, we have covered common base operational concerns such as AMI management, backup automation, log and secrets management, and how to send email from AWS.
In the next blog post, we will start working on deploying our first workload into AWS, leveraging the steps we’ve completed throughout this series. See you there!