EC2, Auto Scaling, Load Balancers, VPC, Security Groups, NACLs, WAF, VPN, Route 53, S3, RDS, CloudWatch, ECS, Lambda, CloudFormation — all in simple terms.
01☁️
Introduction to AWS
Cloud Computing for DevOps
AWS is the world's largest cloud platform with 200+ services. As a DevOps engineer, you don't need to know all 200 — you need to master about 15-20 core services that form the backbone of every production environment. This guide covers exactly those services in simple terms.
Core AWS Concepts — Think of It Like This
🌍
Regions & AZs
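AWS runs data centers in 30+ geographic regions worldwide. Each region contains several physically separate Availability Zones (typically 3 or more), and each AZ is one or more data centers. If you spread your app across AZs and one AZ loses power, your app keeps running in the others.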
💰
Pay-As-You-Go
Like electricity: you only pay for what you use. Stop a server at night? You stop paying for its compute time (attached EBS storage is still billed). No upfront cost.
🔐
Shared Responsibility
AWS secures the buildings, power, and network cables. YOU secure what you put inside — your data, your passwords, your firewall rules.
02🔐
IAM: Identity and Access Management
IAM controls access to your entire AWS account. Think of it as the security guard at the building entrance: checking IDs, giving visitor passes, and making sure nobody enters restricted areas without permission.
IAM Building Blocks
👤
Users
Individual people. Each gets their own username and password. Suresh, Priya, Rahul — each has their own IAM user.
👥
Groups
Collections of users. Create a group called "Developers" and put Priya and Rahul in it. Attach permissions to the group — all members get those permissions.
🎭
Roles
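Temporary credentials, mostly for SERVICES rather than people. An EC2 instance needs to access S3? Give it an IAM Role. No passwords or long-lived access keys: AWS hands the instance short-lived credentials and rotates them automatically. (People can assume roles too, for example through SSO or cross-account access.)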
📜
Policies
JSON documents that say "allow this action on this resource." Policies attach to users, groups, or roles.
IAM Policy Example
JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    },
    {
      "Effect": "Deny",
      "Action": "s3:DeleteBucket",
      "Resource": "*"
    }
  ]
}
// Translation: You CAN read and upload files to my-app-bucket.
// You CANNOT delete any bucket. An explicit Deny always wins over any Allow.
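The decision order behind that policy (an explicit Deny beats any Allow, and anything never allowed is implicitly denied) fits in a few lines of Python. This is a simplified sketch, not AWS's evaluator: it handles a single identity-based policy only, ignores conditions, resource policies, permission boundaries, and SCPs, and the function name `is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase

# The policy above, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-app-bucket/*",
        },
        {"Effect": "Deny", "Action": "s3:DeleteBucket", "Resource": "*"},
    ],
}


def _as_list(value):
    """IAM accepts either a single string or a list for Action/Resource."""
    return [value] if isinstance(value, str) else list(value)


def _matches(patterns, value):
    # IAM-style wildcards: '*' matches any run of characters, '?' one character.
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def is_allowed(policy, action, resource):
    """Simplified IAM decision: explicit Deny > explicit Allow > implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        if _matches(stmt["Action"], action) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return False  # an explicit Deny always wins
            allowed = True
    return allowed  # False here means "implicitly denied"
```

Note the loop can't stop at the first Allow: a later Deny must still be able to override it.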
✓Never use the root account for daily tasks — create an IAM admin user
✓Enable MFA (Multi-Factor Authentication) on every IAM user
✓Use IAM Roles for EC2/Lambda/ECS — never put access keys inside code
✓Follow least privilege: give minimum permissions needed
✓Rotate access keys every 90 days
⚠️ Root Account
Root account has UNLIMITED power — can delete everything, change billing, close the account. Lock it away, enable MFA, and create an IAM admin user for daily work.
03🖥️
EC2 — Elastic Compute
Virtual Servers in the Cloud
EC2 gives you virtual servers (called instances) that you can launch in minutes. It's like renting a computer in Amazon's data center. You choose the size (CPU, RAM) and the operating system (Amazon Linux, Ubuntu), and the instance is typically up and running within a minute or two.
Instance Families — Which One to Pick
Family | CPU/RAM | Best For | Example
t3/t4g | Balanced, burstable | Web servers, small apps, dev/test | t3.micro (2 vCPU, 1 GB)
m5/m6i | Balanced, steady | App servers, backend APIs | m5.xlarge (4 vCPU, 16 GB)
c5/c6i | CPU-heavy | Batch processing, CI/CD agents | c5.2xlarge (8 vCPU, 16 GB)
r5/r6i | Memory-heavy | Databases, caches, in-memory apps | r5.xlarge (4 vCPU, 32 GB)
g4dn | GPU | Machine learning, video encoding | g4dn.xlarge (4 vCPU, 16 GB, 1 GPU)
Pricing Models — Save Up to 90%
💵
On-Demand
Pay per hour/second. No commitment. Most expensive but most flexible. Use for dev/test and unpredictable workloads.
💰
Reserved Instances
Commit for 1 or 3 years → save up to 72%. Best for production servers that run 24/7. You KNOW you need this server for a year.
🏷️
Spot Instances
Use AWS's spare capacity at the current Spot price (no bidding required) → save up to 90%. BUT AWS can reclaim it with a 2-minute warning. Perfect for batch jobs, CI/CD builds, data processing.
📋
Savings Plans
Commit to spending $X per hour → flexible across instance types. Simpler than Reserved. Recommended for most companies.
Key Pair & SSH
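TERMINAL
# Create key pair (do this ONCE)
aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > my-key.pem
chmod 400 my-key.pem
# Launch instance (AMI IDs are region-specific: replace with a current AMI for your region)
aws ec2 run-instances --image-id ami-0c55b159cbfafe1f0 --instance-type t3.micro --key-name my-key \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'
# SSH into your server (ec2-user on Amazon Linux, ubuntu on Ubuntu; port 22 must be open in the security group)
ssh -i my-key.pem ec2-user@<public-ip>
# Check instance status (finds the instance by the Name tag set at launch)
aws ec2 describe-instances --filters "Name=tag:Name,Values=web-server"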
💡 AMI
Amazon Machine Image = a template of a server (OS, installed software, configuration), like a disk image. Create an AMI of your configured server → launch 100 identical copies from it. Auto Scaling uses an AMI (through a launch template) to launch identical instances.
04💾
EBS — Elastic Block Store
Hard Drives for Your EC2 Servers
EBS volumes are the hard drives attached to your EC2 instances. When you stop and start an EC2 instance, your data on EBS survives (unlike instance store volumes, which are temporary). Think of EBS as a USB external hard drive that you can unplug and attach to another server in the same Availability Zone.
EBS Volume Types
Type | Speed | Cost | Best For
gp3 (General Purpose SSD) | 3,000-16,000 IOPS | Low | Default choice: boot volumes, apps, databases
io2 (Provisioned IOPS SSD) | Up to 64,000 IOPS (up to 256,000 with io2 Block Express) | High | Mission-critical databases (Oracle, SQL Server)
st1 (Throughput Optimized HDD) | 500 MB/s max | Very Low | Big data, log processing, data warehouses
sc1 (Cold HDD) | 250 MB/s max | Cheapest | Archival, infrequent access
EBS Snapshots
A snapshot is a point-in-time, incremental backup of your EBS volume, stored in S3. You can create new volumes from snapshots, and you can copy snapshots to other regions to create volumes there. This is how you do disaster recovery and migration.
EBS volumes exist in ONE Availability Zone. If that AZ goes down, the volume is unavailable. Solution: take regular snapshots (stored in S3 across AZs) and recreate volumes from snapshots if needed.
05📈
Auto Scaling Groups
Automatically Add/Remove Servers
Auto Scaling Groups (ASG) automatically add more EC2 instances when traffic increases and remove them when traffic drops. Imagine a restaurant that automatically hires more waiters during dinner rush and sends them home when it's quiet. You stop paying for servers you don't need (down to the minimum you set).
How ASG Works — Step by Step
✓Step 1: Create a Launch Template — defines what type of instance to create (AMI, instance type, key pair, security group)
✓Step 2: Create Auto Scaling Group — set minimum (2), desired (2), maximum (10) instance counts
✓Step 3: Attach to Load Balancer — new instances automatically register with ALB
✓Step 4: Set Scaling Policies — rules that say "when CPU > 70% for 5 minutes, add 2 instances"
Policy | How It Works | Example
Target Tracking | Keep a metric at a target value | Keep average CPU at 60%: ASG adds/removes instances automatically
Step Scaling | Add X instances when a metric crosses a threshold | CPU > 70% → add 2, CPU > 90% → add 4
Scheduled | Scale at specific times | Every Monday 9 AM → set to 10 instances, Friday 6 PM → set to 2
Predictive | ML-based forecasting | AWS learns your traffic patterns and scales BEFORE the traffic arrives
ASG + ALB Together
ARCHITECTURE
ASG Configuration:
  Min: 2 (always have at least 2 servers running)
  Desired: 4 (normally run 4 servers)
  Max: 10 (never exceed 10 servers)
Launch Template:
  AMI: ami-0c55b (your custom app image)
  Instance Type: t3.medium
  Security Group: sg-web
  User Data:
    #!/bin/bash
    systemctl start nginx
Scaling Policy:
  Target: Average CPU Utilization = 60%
  Instance warmup: 300 seconds (new instances don't count toward the CPU average for 5 min)
Load Balancer: arn:aws:elasticloadbalancing:.../my-alb
Health Check: HTTP /health on port 8080
Result:
  Normal day: 4 instances running
  Sale event: ASG scales to 8 instances automatically
  Night time: ASG scales down to 2 instances
  Server crash: ASG replaces unhealthy instance in minutes
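Target tracking behaves roughly like a proportional controller: if 4 instances average 120% of the load they can handle at a 60% CPU target, the fleet needs about 4 × 120 / 60 = 8 instances. The sketch below models that intuition and clamps the result to the group's min and max. It's an approximation for building intuition only (AWS doesn't publish its exact algorithm, and real scale-in is more conservative); the function name is hypothetical.

```python
import math


def estimate_desired_capacity(current, avg_cpu, target_cpu, min_size, max_size):
    """Rough target-tracking estimate (not AWS's exact algorithm).

    Scales the fleet in proportion to how far the metric is from the target,
    rounds up so capacity is never under-provisioned, then clamps to min/max.
    """
    if current <= 0:
        return min_size
    needed = math.ceil(current * avg_cpu / target_cpu)
    return max(min_size, min(max_size, needed))


# With the configuration above: min 2, max 10, target 60% CPU.
print(estimate_desired_capacity(4, 120, 60, 2, 10))  # sale event
print(estimate_desired_capacity(4, 15, 60, 2, 10))   # night time
```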
💡 Cooldown Period
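After a simple scaling policy adds or removes instances, ASG waits out a cooldown period (default 300 seconds) before scaling again. This prevents flip-flopping: adding servers, immediately removing them, adding again. Target tracking and step scaling policies use instance warmup instead, which keeps brand-new instances out of the metric until they're ready. Either way, give your new servers time to warm up.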
06⚖️
Load Balancers
Split Traffic & Stay Available
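A Load Balancer distributes incoming traffic across multiple EC2 instances. If one server dies, the load balancer stops sending traffic to it. Users never notice. AWS offers four types of load balancers; the three below are the ones you'll compare most often (the fourth, Gateway Load Balancer, routes traffic through third-party network appliances).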
3 Main Types of Load Balancers
Type | Layer | Best For | Key Feature
ALB (Application) | Layer 7 (HTTP/HTTPS) | Web apps, APIs, microservices | Path-based routing: /api → backend, /images → CDN
NLB (Network) | Layer 4 (TCP/UDP) | Gaming, IoT, extreme performance | Millions of requests/sec, static IP, ultra-low latency
CLB (Classic) | Layer 4+7 | Legacy apps only | Previous generation: don't use for new projects
ALB — Application Load Balancer (Most Common)
🎯
Target Groups
A group of EC2 instances, IPs, or Lambda functions. ALB sends traffic to targets in the group. You can have multiple target groups.
👂
Listeners
Each listener checks for connections on one port and protocol. Listener on port 443 (HTTPS) → forward to target group. Listener on port 80 → redirect to 443.
📋
Rules
Conditions that decide where traffic goes. IF path = /api/* THEN forward to api-target-group. IF host = admin.site.com THEN forward to admin-target-group.
🏃
Actions
What to do with matched traffic: forward to target group, redirect to another URL, return fixed response, or authenticate with Cognito.
ALB Routing Examples
ALB RULES
Listener: Port 443 (HTTPS)
  Rule 1: IF path = /api/*
          THEN forward to → api-target-group (port 8080)
  Rule 2: IF path = /admin/*
          THEN forward to → admin-target-group (port 3000)
  Rule 3: IF host = images.mysite.com
          THEN forward to → cdn-target-group
  Rule 4: IF path = /old-page
          THEN redirect to → https://mysite.com/new-page (301)
  Rule 5: IF path = /health
          THEN return fixed response → 200 OK "healthy"
  Default: Forward to → web-target-group (port 80)
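ALB evaluates listener rules in priority order and applies the first one whose condition matches; if nothing matches, the default action runs. Here's a small Python model of the rules above. It's a sketch, not ALB's matcher: it uses `fnmatchcase` as a stand-in for ALB path patterns (both support `*` and `?`), treats host matching as case-sensitive for simplicity, and the action strings are just labels.

```python
from fnmatch import fnmatchcase

# (priority, condition type, pattern, action) for the listener on port 443.
RULES = [
    (1, "path", "/api/*", "forward:api-target-group"),
    (2, "path", "/admin/*", "forward:admin-target-group"),
    (3, "host", "images.mysite.com", "forward:cdn-target-group"),
    (4, "path", "/old-page", "redirect:https://mysite.com/new-page (301)"),
    (5, "path", "/health", "fixed-response:200 healthy"),
]
DEFAULT_ACTION = "forward:web-target-group"


def route(host, path):
    """Return the action of the first matching rule, else the default action."""
    for _priority, kind, pattern, action in sorted(RULES):
        value = path if kind == "path" else host
        if fnmatchcase(value, pattern):
            return action
    return DEFAULT_ACTION


print(route("mysite.com", "/api/orders"))
print(route("images.mysite.com", "/api/orders"))  # priority 1 still wins
```

Priority matters: a request to images.mysite.com/api/... goes to the API target group because rule 1 is checked before the host rule.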
NLB — Network Load Balancer
NLB works at Layer 4 (TCP/UDP): it doesn't look at HTTP headers or URLs, it just forwards connections. This makes it incredibly fast (millions of requests per second) with ultra-low latency. Use NLB for gaming servers, real-time streaming, and when you need a static IP (one per AZ, or your own Elastic IPs).
Health Checks
HEALTH CHECK
Health Check Configuration:
  Protocol: HTTP
  Path: /health
  Port: 8080
  Healthy threshold: 3 (pass 3 checks = healthy)
  Unhealthy threshold: 2 (fail 2 checks = unhealthy)
  Interval: 30 seconds (check every 30s)
  Timeout: 5 seconds (wait max 5s for response)
What happens when a server fails health check:
  1. ALB marks it unhealthy
  2. ALB stops sending new traffic to it
  3. Existing connections drain gracefully
  4. ASG detects unhealthy instance (requires the ASG health check type to be ELB, not just EC2)
  5. ASG terminates it and launches a replacement
  6. New instance registers with ALB
  7. Health checks pass → traffic resumes
All automatic. Zero human intervention.
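The thresholds work on consecutive results: with the settings above, a target flips to unhealthy after 2 failures in a row (roughly a minute at a 30-second interval) and back to healthy only after 3 passes in a row. A simplified state-machine sketch, assuming the target starts healthy (real ALB targets start in an "initial" state) and using a hypothetical class name:

```python
class TargetHealth:
    """Tracks one target using consecutive-result thresholds (simplified model)."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, passed):
        """Feed one health-check result; return the (possibly new) state."""
        if passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= limit:
            self.healthy = passed
            self._streak = 0
        return self.healthy


target = TargetHealth()
print([target.record(ok) for ok in (False, False, True, True, True)])
```

A single failed check between passes never marks the target unhealthy, which is why thresholds help ignore one-off blips.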
07🌐
VPC — Virtual Private Cloud
Your Private Network in AWS
A VPC is your own private, isolated network inside AWS. Think of it like your own office building — you control who enters, which rooms connect to which, and who can access the internet. Every resource (EC2, RDS, Lambda) runs inside a VPC.
VPC Components — The Building Analogy
🏢
VPC
The entire building. You define the total address space: 10.0.0.0/16 = 65,536 IP addresses. One VPC per environment (dev VPC, prod VPC).
🚪
Subnets
Rooms inside the building. Public subnet = room with window to the street (internet access). Private subnet = internal room (no direct internet). You create subnets in different AZs for high availability.
🌍
Internet Gateway (IGW)
The main entrance door connecting your building to the internet. Attach to VPC → public subnets can reach the internet.
📡
NAT Gateway
A one-way mirror. Private subnet instances can ACCESS the internet (download updates) but the internet CANNOT reach them. Like making phone calls but having an unlisted number.
🗺️
Route Tables
Direction signs inside the building. They tell traffic where to go. Public route table: 0.0.0.0/0 → Internet Gateway. Private route table: 0.0.0.0/0 → NAT Gateway.
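When several routes match a destination, the route table picks the most specific one (longest prefix match). That's why traffic to another server inside the VPC uses the `local` route even though `0.0.0.0/0` also matches. A minimal sketch with Python's `ipaddress` module; the targets "igw" and "nat-gateway" are placeholders for real IDs like igw-... and nat-....

```python
import ipaddress

# Route tables from above: destination CIDR -> target (placeholder names).
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}


def next_hop(routes, destination_ip):
    """Pick the most specific (longest-prefix) route that contains the IP."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        ipaddress.ip_network(cidr) for cidr in routes
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic is dropped
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


print(next_hop(PRIVATE_ROUTES, "10.0.20.15"))   # another server in the VPC
print(next_hop(PRIVATE_ROUTES, "142.250.1.1"))  # somewhere on the internet
```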
Typical Production VPC Architecture
ARCHITECTURE
VPC: 10.0.0.0/16 (65,536 IPs)
│
├── Public Subnet A: 10.0.1.0/24 (AZ: ap-south-1a)
│ ├── ALB (Application Load Balancer)
│ ├── NAT Gateway
│ └── Bastion Host (jump server for SSH)
│
├── Public Subnet B: 10.0.2.0/24 (AZ: ap-south-1b)
│ └── ALB (second AZ for HA)
│
├── Private Subnet A: 10.0.10.0/24 (AZ: ap-south-1a)
│ └── EC2 App Servers (order-service, user-service)
│
├── Private Subnet B: 10.0.20.0/24 (AZ: ap-south-1b)
│ └── EC2 App Servers (replicas)
│
├── DB Subnet A: 10.0.100.0/24 (AZ: ap-south-1a)
│ └── RDS Primary
│
└── DB Subnet B: 10.0.200.0/24 (AZ: ap-south-1b)
└── RDS Standby (Multi-AZ)
Traffic Flow:
User → ALB (public) → App Server (private) → RDS (db subnet)
App Server → NAT Gateway → Internet (for updates)
Internet ✗→ App Server (blocked — private subnet)
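Before creating a layout like this, it's worth checking that every subnet fits inside the VPC and that no two subnets overlap. The sketch below does that with Python's `ipaddress` module and also shows why a /24 gives you 251 usable addresses, not 256: AWS reserves 5 addresses in every subnet (network address, VPC router, DNS, one for future use, and broadcast). The subnet names are just labels for the diagram above.

```python
import ipaddress

VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public-a": "10.0.1.0/24",
    "public-b": "10.0.2.0/24",
    "private-a": "10.0.10.0/24",
    "private-b": "10.0.20.0/24",
    "db-a": "10.0.100.0/24",
    "db-b": "10.0.200.0/24",
}
AWS_RESERVED_PER_SUBNET = 5  # network, VPC router, DNS, future use, broadcast


def check_layout(vpc, subnets):
    """Validate that every subnet fits in the VPC and none overlap.

    Returns {name: usable_ip_count}; raises ValueError on a bad layout.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside {vpc}")
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                raise ValueError(f"{a} overlaps {b}")
    return {name: net.num_addresses - AWS_RESERVED_PER_SUBNET for name, net in nets.items()}


print(VPC.num_addresses, check_layout(VPC, SUBNETS)["private-a"])
```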
08🛡️
Security Groups & NACLs
Two Layers of Firewall
AWS gives you TWO levels of firewall: Security Groups (instance-level) and NACLs (subnet-level). Think of Security Groups as locks on each room door, and NACLs as security guards at each floor entrance. Security Groups are your primary firewall; NACLs add an optional second layer for subnet-wide rules.
Security Groups — Room Door Locks
✅
Stateful
If you allow inbound traffic on port 80, the RESPONSE is automatically allowed out. You don't need a separate outbound rule. Smart enough to track connections.
🟢
Allow Only
Security Groups can only ALLOW traffic. There is no "deny" rule. Everything not explicitly allowed is automatically denied.
🔗
Instance Level
Attached directly to EC2 instances, RDS, ALB. Multiple instances can share the same SG.
🔄
Reference Other SGs
Rule: "Allow traffic FROM sg-web-servers" — instead of hardcoding IPs. When new servers join sg-web-servers, they automatically get access.
Security Group Examples
SECURITY GROUPS
# sg-web: for ALB (public-facing)
Inbound:
  Port 80 (HTTP) → 0.0.0.0/0 (anyone on internet)
  Port 443 (HTTPS) → 0.0.0.0/0 (anyone on internet)
Outbound:
  All traffic → 0.0.0.0/0 (allow all outbound)

# sg-app: for application servers (private)
Inbound:
  Port 8080 → sg-web (only from ALB security group)
  Port 22 → sg-bastion (only from bastion/jump server)
Outbound:
  All traffic → 0.0.0.0/0

# sg-db: for RDS database (most restricted)
Inbound:
  Port 3306 → sg-app (only from app servers)
Outbound:
  None needed (stateful: responses are automatically allowed)
NACLs — Floor Security Guards
Feature | Security Group | NACL
Level | Instance (EC2, RDS) | Subnet (entire floor)
Stateful/Stateless | Stateful (auto-allows response) | Stateless (must allow inbound AND outbound separately)
Rules | Allow only | Allow AND Deny
Rule Order | All rules evaluated | Evaluated in NUMBER order (100, 200, 300...); first match wins
Default | New SG: deny all inbound, allow all outbound | Default NACL: allow all; a new custom NACL denies all until you add rules
Use Case | Primary firewall for every resource | Extra layer: block specific IPs, compliance requirements
NACL Example
NACL
# NACL for Public Subnet
Inbound Rules (evaluated lowest number first; first match wins):
  Rule 50:  DENY ALL from 203.0.113.50/32 (block specific attacker IP; must come BEFORE the allows)
  Rule 100: Allow TCP 443 from 0.0.0.0/0 (HTTPS in)
  Rule 200: Allow TCP 80 from 0.0.0.0/0 (HTTP in)
  Rule 300: Allow TCP 22 from 10.0.0.0/16 (SSH from VPC only)
  Rule 400: Allow TCP 1024-65535 from 0.0.0.0/0 (return traffic for outbound requests; NACLs are stateless)
  Rule * :  DENY ALL from 0.0.0.0/0 (default deny everything else)
Outbound Rules:
  Rule 100: Allow TCP 1024-65535 to 0.0.0.0/0 (ephemeral ports for responses)
  Rule 200: Allow TCP 443 to 0.0.0.0/0 (outbound HTTPS)
  Rule * :  DENY ALL to 0.0.0.0/0 (default deny)
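Because NACL rules are checked lowest number first and the first match wins, a DENY only takes effect if its number is lower than any ALLOW that would also match. The sketch below models that evaluation and compares the attacker-block rule placed at 900 versus 50. It's a simplified model (inbound only, TCP/"all" protocols, SSH assumed allowed only from the 10.0.0.0/16 VPC) with a hypothetical function name.

```python
import ipaddress


def evaluate(rules, protocol, port, source_ip):
    """Return 'allow' or 'deny' using NACL semantics: lowest rule number first,
    first match wins, and the implicit '*' rule denies everything else."""
    ip = ipaddress.ip_address(source_ip)
    for _number, action, proto, ports, cidr in sorted(rules):
        if proto not in ("all", protocol):
            continue
        if ports is not None and not (ports[0] <= port <= ports[1]):
            continue
        if ip in ipaddress.ip_network(cidr):
            return action
    return "deny"  # rule '*'


# (rule number, action, protocol, (from_port, to_port) or None, source CIDR)
ALLOWS = [
    (100, "allow", "tcp", (443, 443), "0.0.0.0/0"),
    (200, "allow", "tcp", (80, 80), "0.0.0.0/0"),
    (300, "allow", "tcp", (22, 22), "10.0.0.0/16"),
]
DENY_LATE = ALLOWS + [(900, "deny", "all", None, "203.0.113.50/32")]
DENY_EARLY = ALLOWS + [(50, "deny", "all", None, "203.0.113.50/32")]

print(evaluate(DENY_LATE, "tcp", 443, "203.0.113.50"))   # rule 100 matches first
print(evaluate(DENY_EARLY, "tcp", 443, "203.0.113.50"))  # rule 50 matches first
```

With the deny at 900, the attacker's HTTPS request is allowed by rule 100 and never reaches the deny rule.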
💡 Interview Answer
"Security Groups are like room door locks: stateful, allow-only, attached to individual resources. NACLs are like floor security guards: stateless, allow+deny, applied to entire subnets. Use Security Groups as your primary firewall and NACLs as an additional security layer."
09🔗
VPN & VPC Peering
Connect Networks Securely
VPN creates an encrypted tunnel between your office/data center and AWS. VPC Peering connects two VPCs directly. These are how enterprises connect their on-premises networks to the cloud.
VPN Types
Type | What It Connects | Use Case | How It Works
Site-to-Site VPN | Your office network → AWS VPC | Connect an entire office to AWS; all office users can reach AWS resources. | Encrypted IPsec tunnels between your router and an AWS Virtual Private Gateway (or Transit Gateway).
Client VPN | Individual laptop → AWS VPC | A remote employee needs to access private AWS resources from home. | OpenVPN-based. The user installs a VPN client, authenticates, and gets access to the VPC.
AWS Direct Connect | Your data center → AWS (dedicated) | High-bandwidth, low-latency connection that doesn't use the public internet. | Dedicated fiber from your DC to an AWS Direct Connect location, with ports such as 1, 10, or 100 Gbps. Most expensive, most reliable.
VPC Peering creates a direct network connection between two VPCs. Traffic stays on the AWS backbone (no internet, no VPN gateways or appliances), and both VPCs can talk to each other using private IPs as if they were on the same network.
VPC PEERING
VPC-A (Production): 10.0.0.0/16
        ↕ (VPC Peering Connection)
VPC-B (Staging): 172.16.0.0/16
Rules:
  ✅ VPC-A can talk to VPC-B and vice versa
  ❌ VPC Peering is NOT transitive:
     If A↔B and B↔C, that does NOT mean A↔C
     You need a separate peering for A↔C
  ❌ CIDR ranges must NOT overlap
     (both can't use 10.0.0.0/16)
For connecting 10+ VPCs: Use Transit Gateway instead
  Transit Gateway = central hub, all VPCs connect to it
  Much simpler than managing 45 peering connections (10 VPCs in a full mesh)
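The 45 comes from full-mesh arithmetic: connecting every pair of n VPCs takes n(n-1)/2 peerings, while a Transit Gateway needs just one attachment per VPC. A small sketch (hypothetical function names) that also checks the non-overlapping-CIDR requirement, one of the conditions for peering:

```python
import ipaddress


def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC can reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def transit_gateway_attachments(vpc_count):
    """With a Transit Gateway hub, each VPC needs one attachment."""
    return vpc_count


def cidrs_compatible(cidr_a, cidr_b):
    """VPC peering requires non-overlapping CIDR blocks."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


print(full_mesh_peerings(10), transit_gateway_attachments(10))
print(cidrs_compatible("10.0.0.0/16", "172.16.0.0/16"))
```

The gap grows quickly: 20 VPCs would need 190 peerings but only 20 attachments.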
VPC Endpoint Types
Endpoint Type | What It Is | Use Case
Gateway Endpoint | Free, for S3 and DynamoDB only | EC2 in a private subnet accessing S3 without going through NAT (saves NAT costs)
Interface Endpoint | Creates an ENI in your subnet | Access 80+ AWS services privately (CloudWatch, SQS, ECR, Secrets Manager) without internet
Gateway Load Balancer Endpoint | For third-party appliances | Route traffic through firewall appliances (Palo Alto, Fortinet) before it reaches your app
💡 Save Money
Gateway Endpoints for S3 are FREE. If your app servers in private subnets access S3 heavily, use a Gateway Endpoint instead of routing through NAT Gateway (which charges per GB).
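To see how much this can matter, estimate the NAT Gateway data-processing charge your S3 traffic would otherwise incur. The rate below is an assumption (roughly the us-east-1 list price; check current pricing for your region), and the NAT Gateway's hourly charge is left out because you usually still need the NAT Gateway for other traffic.

```python
# ASSUMPTION: NAT Gateway data-processing rate in USD per GB; verify for your region.
NAT_PROCESSING_USD_PER_GB = 0.045
GATEWAY_ENDPOINT_USD_PER_GB = 0.0  # S3/DynamoDB gateway endpoints have no charge


def monthly_s3_savings(gb_per_month, nat_rate=NAT_PROCESSING_USD_PER_GB):
    """NAT data-processing cost avoided by sending S3 traffic via a gateway endpoint."""
    return gb_per_month * (nat_rate - GATEWAY_ENDPOINT_USD_PER_GB)


print(f"${monthly_s3_savings(10_000):,.2f} per month for 10 TB of S3 traffic")
```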
10🔥
WAF — Web Application Firewall
Protect Your Apps from Attacks
WAF sits in front of your ALB or CloudFront and inspects every HTTP request before it reaches your application. It blocks SQL injection, cross-site scripting (XSS), bot traffic, and other common web attacks. Think of it as a smart security guard who reads every letter before delivering it.
WAF Components
📋
Web ACL
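A collection of rules attached to an ALB or CloudFront distribution. Rules are checked in priority order until one with a terminating action (Allow or Block) matches; if none does, the Web ACL's default action applies.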
📜
Rules
Individual checks: "block if request contains SQL injection", "allow if IP is in whitelist", "rate limit to 1000 requests per 5 minutes per IP".
📦
Rule Groups
Pre-packaged sets of rules. AWS Managed Rule Groups cover OWASP Top 10, known bad IPs, bots, and more. Enable with one click.
WAF Rule Types
Rule Type | What It Does | Example
IP Set | Allow/block specific IPs | Block known-bad IPs or allow only office IPs
Rate-based | Block IPs sending too many requests | Block if > 2000 requests in 5 minutes (HTTP flood / DDoS protection)
SQL Injection | Detect SQL injection in request body/URL | Block: /api?id=1; DROP TABLE users
XSS | Detect cross-site scripting attempts | Block: <script>alert("hack")</script>
Geo Match | Allow/block by country | Block traffic from countries you don't serve
AWS Managed | Pre-built rule groups by AWS | AWSManagedRulesCommonRuleSet covers OWASP Top 10 risks
✓Attach WAF to ALB or CloudFront (not directly to EC2)
✓Enable AWS Managed Rules: Common Rule Set + Known Bad Inputs + IP Reputation
✓Add rate limiting: 2000 requests/5 min per IP for APIs
✓Log all blocked requests to S3 for security analysis
✓Use WAF + Shield Standard (free DDoS protection) together
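A rate-based rule counts requests per client IP over a trailing time window and blocks the IP while it stays over the limit. The sketch below is a simplified in-memory model of that idea using the 2000-requests-per-5-minutes limit from above. The class name is hypothetical, and real WAF counting is approximate and distributed, not computed like this.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window request counter per IP (simplified model of a WAF rate rule)."""

    def __init__(self, limit=2000, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if it is blocked."""
        hits = self._hits[ip]
        while hits and hits[0] <= now - self.window:
            hits.popleft()  # forget requests that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = RateLimiter(limit=3, window_seconds=10)  # tiny limits for the demo
print([limiter.allow("203.0.113.50", t) for t in (0, 1, 2, 3)])
```

Each IP is tracked separately, so one noisy client gets blocked without affecting everyone else.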
11🌍
Route 53 — DNS
Domain Names & Traffic Routing
Route 53 is AWS's DNS service. It translates human-readable domain names (myapp.com) into IP addresses (52.66.123.45) that computers understand. It also does health checks and intelligent traffic routing.
Routing Policies
Policy | How It Works | Use Case
Simple | One domain → one IP/resource | Basic website, single server
Weighted | Split traffic by percentage | 80% to v2, 20% to v3 (canary deployment)
Latency | Route to the region with the lowest latency for the user | Global app: users in India → Mumbai, users in US → Virginia
Failover | Primary → backup if primary is down | Primary in Mumbai, backup in Singapore. Auto-failover on health check failure.
Geolocation | Route by user's country | Indian users → Indian servers, US users → US servers
Use Alias (not CNAME) for AWS resources like ALB, CloudFront, S3. Alias is free (no query charges), works at zone apex (myapp.com without www), and resolves faster. CNAME doesn't work at zone apex.
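With weighted routing, Route 53 answers each DNS query with a record chosen in proportion to its weight, so weights of 80 and 20 send roughly 80% and 20% of lookups to each target. Real traffic splits are only approximate because resolvers cache answers. The simulation below uses hypothetical record names.

```python
import random


def pick_weighted(records, rng):
    """Choose one record name with probability weight / sum(weights)."""
    names = list(records)
    weights = [records[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(records, queries, seed=42):
    """Count how many of `queries` DNS lookups each record would answer."""
    rng = random.Random(seed)
    counts = {name: 0 for name in records}
    for _ in range(queries):
        counts[pick_weighted(records, rng)] += 1
    return counts


RECORDS = {"v2.myapp.com": 80, "v3.myapp.com": 20}  # hypothetical targets
print(simulate(RECORDS, 10_000))
```

Shifting a canary is then just changing the weights (for example 50/50, then 0/100), with no DNS name changes for users.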
12📦
S3 — Object Storage
Store Anything, Scale Infinitely
S3 stores unlimited files (called objects) with 99.999999999% durability (11 nines — that means if you store 10 million files, you'd lose 1 file every 10,000 years). Used for backups, static websites, Docker images, Terraform state, logs, and data lakes.
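The 10,000-year figure follows directly from the 11 nines: a 0.000000001% (1e-11) chance of losing any given object per year, times 10 million objects, gives 0.0001 expected losses per year, i.e. one loss every 10,000 years on average. A quick check using exact fractions, assuming object losses are independent:

```python
from fractions import Fraction

ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999%


def years_per_expected_loss(object_count, durability=ANNUAL_DURABILITY):
    """Average years between single-object losses, assuming independent losses."""
    annual_loss_rate = 1 - durability  # 1e-11 per object per year
    expected_losses_per_year = annual_loss_rate * object_count
    return 1 / expected_losses_per_year


print(years_per_expected_loss(10_000_000))
```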
Storage Classes — Pay Only for What You Need
Class | Access Pattern | Storage Cost | Example
Standard | Frequently accessed | Highest | App assets, active logs, Docker layers
Standard-IA | Infrequent (1-2 times/month) | ~40% less than Standard (plus per-GB retrieval fees) | Backups, disaster recovery files
Glacier Instant Retrieval | Archive, millisecond access | Up to ~68% less than Standard-IA | Compliance docs you rarely access
Glacier Deep Archive | Archive, retrieval within 12 hours | Cheapest | 7-year audit logs, old backups
Lifecycle Rules — Automate Cost Savings
LIFECYCLE
# Move objects automatically based on age:
  Day 0-30:    S3 Standard (frequently accessed)
  Day 30-90:   S3 Standard-IA (still needed but rarely)
  Day 90-365:  S3 Glacier Instant (archive, occasional access)
  Day 365+:    S3 Glacier Deep (long-term archive)
  Day 730:     DELETE automatically (no longer needed)
# This can save 60-80% on storage costs!
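The same schedule can be expressed as a lifecycle configuration in the shape the S3 API expects (Rules with Transitions and an Expiration). The sketch builds that dict and adds a helper showing which class an object of a given age sits in. The rule ID is hypothetical, and real transitions run asynchronously, so objects move at approximately these ages, not to the second.

```python
# Lifecycle schedule from above: (age in days when the transition happens, storage class).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER_IR"), (365, "DEEP_ARCHIVE")]
EXPIRE_AFTER_DAYS = 730


def lifecycle_configuration(prefix=""):
    """Build a LifecycleConfiguration dict in the shape the S3 API expects."""
    return {
        "Rules": [
            {
                "ID": "age-based-tiering",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": cls} for days, cls in TRANSITIONS
                ],
                "Expiration": {"Days": EXPIRE_AFTER_DAYS},
            }
        ]
    }


def storage_class_for_age(age_days):
    """Which class an object of this age sits in under the rule (None = deleted)."""
    if age_days >= EXPIRE_AFTER_DAYS:
        return None
    current = "STANDARD"
    for days, cls in TRANSITIONS:
        if age_days >= days:
            current = cls
    return current


print([storage_class_for_age(age) for age in (0, 45, 200, 400, 800)])
```

With boto3, the dict could be applied via `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration())`, where the bucket name is a placeholder.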
13🗄️
RDS: Relational Database Service
RDS manages your database infrastructure: AWS handles patches, backups, replication, and failover. You focus on your data and queries. It's like having a DBA team included in the price.
Supported Engines
Engine | Best For | Special Feature
MySQL | General purpose, most popular | Compatible with existing MySQL apps
PostgreSQL | Complex queries, GIS data | Advanced data types, extensions
Aurora (MySQL/PostgreSQL-compatible) | High performance, auto-scaling | Up to 5x the throughput of standard MySQL (AWS's figure); storage grows automatically
SQL Server | Microsoft/.NET shops | Windows auth, SSIS support
MariaDB | MySQL alternative | Open source, community-driven
Multi-AZ — High Availability
Multi-AZ creates a standby replica in a different Availability Zone. If the primary database crashes, AWS automatically fails over to the standby, typically within 60-120 seconds. Your application doesn't need to change connection strings: the same DNS endpoint now points to the new primary.
Read Replicas — Handle More Traffic
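Read Replicas are copies of your database that serve read queries. The primary handles all writes; replicas handle reads. If 80% of your queries are reads, moving them to replicas leaves the primary handling only the 20% that are writes, and you can add more replicas as read traffic grows. Replication is asynchronous, so a replica can lag slightly behind the primary.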
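Replicas only help if the application actually sends reads to them. A minimal routing sketch, assuming hypothetical endpoint names and a naive prefix check for read statements (it would misroute something like `SELECT ... FOR UPDATE`, and queries that must see a just-written row should go to the primary because of replication lag):

```python
import itertools

READ_PREFIXES = ("select", "show", "explain")


class QueryRouter:
    """Send writes to the primary endpoint and spread reads across replicas.

    Endpoint names are hypothetical; a real app would hold DB connections.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith(READ_PREFIXES)
        if is_read and self._replicas is not None:
            return next(self._replicas)  # round-robin across read replicas
        return self.primary  # writes (and reads when no replicas exist)


router = QueryRouter("primary.db", ["replica-1.db", "replica-2.db"])
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
print(router.endpoint_for("SELECT * FROM orders"))
```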
✓Enable Multi-AZ for production (automatic failover, typically in 1-2 minutes)
✓Create Read Replicas for read-heavy applications
✓Enable automated backups with 7+ day retention
✓Place RDS in PRIVATE subnets (never public!)
✓Use IAM authentication instead of password-only
✓Set up CloudWatch alarms for CPU, connections, and free storage
14📊
CloudWatch — Monitoring
The Eyes of Your AWS Infrastructure
CloudWatch collects metrics (numbers) and logs (text) from your AWS services and raises alarms (alerts) when something crosses a threshold. It's like having security cameras and sensors throughout your building. Without CloudWatch, you're flying blind.
Three Pillars of CloudWatch
📈
Metrics
Numbers over time: CPU 73%, Memory 4.2 GB, Request count 1500/sec. Most AWS services send metrics automatically. EC2 sends CPU, disk I/O, and network; memory and disk-space metrics need the CloudWatch agent. ALB sends request count, latency, error rate.
📝
Logs
Text output from your applications. Store, search, and analyze logs centrally. The CloudWatch agent on EC2 ships /var/log/syslog and app logs to CloudWatch Logs. No more SSH-ing into servers to grep.
🔔
Alarms
Automatic alerts when metrics cross thresholds. CPU > 85% for 5 minutes → send SNS notification to Slack/email/PagerDuty. Can also trigger Auto Scaling actions.
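"CPU > 85% for 5 minutes" translates to an alarm that needs M breaching datapoints out of the last N evaluation periods; with 1-minute datapoints (EC2 detailed monitoring) that's 5 out of 5. A simplified sketch of that evaluation with a hypothetical function name; it ignores missing-data handling and the INSUFFICIENT_DATA state.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return 'ALARM' if at least M of the last N datapoints breach the threshold."""
    recent = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


cpu_per_minute = [40, 88, 90, 91, 87, 95]
print(alarm_state(cpu_per_minute, threshold=85, evaluation_periods=5, datapoints_to_alarm=5))
```

Setting M lower than N (for example 4 out of 5) lets one normal reading in the middle of a spike still trigger the alarm.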
CloudWatch is AWS-native — zero setup, works immediately. Prometheus/Grafana is open-source — more powerful queries, better dashboards, works across clouds. Many companies use BOTH: CloudWatch for AWS infrastructure, Prometheus/Grafana for application metrics.
15🐳
ECS & EKS — Containers on AWS
Run Docker in Production
ECS (Elastic Container Service) and EKS (Elastic Kubernetes Service) run your Docker containers on AWS. ECS is simpler and AWS-native. EKS gives you full Kubernetes.
Feature | ECS | EKS
Complexity | Simple: AWS proprietary | Complex: full Kubernetes
Learning Curve | Low (if you know Docker) | High (need K8s knowledge)
Portability | AWS only | Multi-cloud, on-prem
Control Plane Cost | Free | $0.10/hour (~$73/month)
Best For | AWS-only teams, simpler apps | K8s teams, multi-cloud, complex apps
ECS with Fargate (Serverless Containers)
Fargate runs your containers WITHOUT managing servers. You define CPU and memory, upload your Docker image, and Fargate handles the rest. No EC2 instances to patch, no clusters to manage. Just containers.
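$ aws ecr get-login-password --region ap-south-1 | docker login --username AWS --password-stdin 123456.dkr.ecr.ap-south-1.amazonaws.com   # Log Docker in to ECR
$ docker tag myapp:latest 123456.dkr.ecr.ap-south-1.amazonaws.com/myapp:latest   # Tag image for ECR
$ docker push 123456.dkr.ecr.ap-south-1.amazonaws.com/myapp:latest   # Push to ECR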
16⚡
Lambda — Serverless
Run Code Without Servers
Lambda runs your code in response to events. Someone uploads a file to S3? Lambda processes it. API request comes in? Lambda handles it. You pay only for the milliseconds your code runs. Zero servers to manage.
Lambda Limits
Resource | Limit
Timeout | 15 minutes max
Memory | 128 MB to 10 GB
Package Size | 50 MB zipped (direct upload), 250 MB unzipped; up to 10 GB as a container image
Concurrent Executions | 1000 per region (default, can be increased)
/tmp Storage | 512 MB to 10 GB
Common Lambda Triggers for DevOps
📦
S3 Event
File uploaded → Lambda processes it (resize image, scan for malware, parse CSV)
🌐
API Gateway
HTTP request → Lambda handles it (serverless API, webhook handler)
⏰
EventBridge Schedule / Cron (formerly CloudWatch Events)
Every 5 minutes → Lambda runs (cleanup old snapshots, check SSL expiry)
📨
SQS Queue
Message arrives → Lambda processes it (order processing, email sending)
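Here's what the S3 trigger looks like from the function's side: a minimal Python handler, assuming the standard S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`). S3 URL-encodes object keys in the event, so the handler decodes them first. The processing step is a hypothetical placeholder, and `lambda_handler` is just the conventional handler name.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Handle an S3 'object created' notification.

    Keys arrive URL-encoded (a space becomes '+'), so decode them before
    passing the key to other APIs.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        # Hypothetical processing step: resize, scan, or parse the file here.
        print(json.dumps({"bucket": bucket, "key": key}))
        processed.append(f"s3://{bucket}/{key}")
    return {"processed": processed}
```

One invocation can carry several records, which is why the handler loops instead of reading only the first one.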
💡 Use Lambda For DevOps
Auto-cleanup old AMIs, rotate secrets, notify Slack on CloudWatch alarms, process CloudTrail logs, auto-tag untagged resources, backup DynamoDB tables. These small automation tasks are perfect for Lambda.
17🏗️
CloudFormation — IaC
Define Infrastructure in Code
CloudFormation lets you write your entire AWS infrastructure in YAML/JSON files. Instead of clicking 50 buttons in the AWS Console to create a VPC, subnets, EC2, ALB, and RDS, you write one YAML file and CloudFormation creates everything automatically. Delete the stack → everything it created is cleaned up (except resources marked with DeletionPolicy: Retain).
CloudFormation Template
CLOUDFORMATION
AWSTemplateFormatVersion: '2010-09-09'
Description: Web server with a security group allowing HTTP
Parameters:
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues: [t3.micro, t3.small, t3.medium]
  Environment:
    Type: String
    Default: staging
  # Resolves to the latest Amazon Linux 2023 AMI in whichever region you deploy to
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64
Resources:
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: !Ref LatestAmiId
      SecurityGroupIds:
        - !GetAtt WebSecurityGroup.GroupId   # the group ID, not its name
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-web-server'
        - Key: Environment
          Value: !Ref Environment
Outputs:
  ServerIP:
    Value: !GetAtt WebServer.PublicIp
    Description: Public IP of web server
$ aws cloudformation delete-stack --stack-name my-stack   # Delete stack (removes ALL resources)
$ aws cloudformation describe-stacks --stack-name my-stack   # Check stack status
💡 CloudFormation vs Terraform
CloudFormation: AWS-only, deeply integrated, free, no state file to manage. Terraform: multi-cloud (AWS + Azure + GCP), needs state management, more flexible. For AWS-only teams, CloudFormation is simpler. For multi-cloud, use Terraform.
18💼
Interview Questions
40+ AWS DevOps Q&A
The most asked AWS questions in DevOps interviews — from freshers to experienced.
Networking & Security
❓
Security Group vs NACL?
SG: instance-level, stateful, allow-only. NACL: subnet-level, stateless, allow+deny. SG = room door lock, NACL = floor security guard. Use SGs as primary firewall.
❓
Public vs Private Subnet?
Public: has route to Internet Gateway (resources get public IPs). Private: no direct internet route (use NAT Gateway for outbound). Put app servers in private, ALB in public.
❓
NAT Gateway vs Internet Gateway?
IGW: two-way door (internet can reach you). NAT: one-way mirror (you can reach internet, but internet cannot reach you). Private subnets use NAT for updates.
❓
Site-to-Site VPN vs Direct Connect?
VPN: encrypted tunnel over internet, cheap, quick to set up, variable latency. Direct Connect: dedicated fiber cable, expensive, 1-10 Gbps, consistent low latency.
Compute & Scaling
❓
What is Auto Scaling?
Automatically adds EC2 instances when traffic increases, removes when it drops. Min/desired/max capacity. Uses Launch Templates to create identical instances.
❓
ALB vs NLB?
ALB: Layer 7, HTTP/HTTPS, path-based routing, supports WebSocket. NLB: Layer 4, TCP/UDP, static IP, millions of requests/sec. ALB for web apps, NLB for gaming/TCP.
❓
What are ALB Listeners and Rules?
Listener: port+protocol ALB listens on (443 HTTPS). Rules: conditions that route traffic (IF path=/api THEN forward to api-target-group).
❓
Spot vs Reserved vs On-Demand?
On-Demand: pay per hour/second, no commitment. Reserved: 1-3 year commitment, up to 72% savings. Spot: spare capacity, up to 90% savings, can be reclaimed with 2 minutes' notice. Use Reserved for steady production, Spot for CI/CD.
Storage & Database
❓
EBS vs S3?
EBS: block storage, attached to EC2, like a hard drive. S3: object storage, accessed via HTTP, unlimited size. EBS for OS/databases, S3 for files/backups/static content.
❓
S3 Storage Classes?
Standard (frequent), Standard-IA (infrequent), Glacier Instant (archive), Glacier Deep (long-term). Use Lifecycle Rules to auto-move objects between classes.
❓
RDS Multi-AZ vs Read Replica?
Multi-AZ: standby in another AZ for failover (high availability). Read Replica: copy for read queries (performance). Multi-AZ for HA, Read Replicas for scale.
❓
What is Route 53?
AWS DNS service. Translates domain names to IPs. Supports routing policies: simple, weighted (canary), latency (nearest region), failover (disaster recovery).
Monitoring & IaC
❓
What is CloudWatch?
AWS monitoring: Metrics (CPU, network; memory via the CloudWatch agent), Logs (centralized log search), Alarms (alert when thresholds are crossed). Most AWS services send metrics to CloudWatch automatically.
❓
CloudFormation vs Terraform?
CloudFormation: AWS-only, free, no state file. Terraform: multi-cloud, needs state management, more flexible. CloudFormation for AWS-only, Terraform for multi-cloud.
❓
What is WAF?
Web Application Firewall. Sits in front of ALB/CloudFront. Blocks SQL injection, XSS, bot traffic, rate-limits IPs. Use AWS Managed Rules for OWASP Top 10 protection.
❓
VPC Peering vs Transit Gateway?
Peering: direct 1-to-1 VPC connection, not transitive. Transit Gateway: central hub connecting many VPCs (star topology). Use TGW for 5+ VPCs.
❓
Gateway vs Interface Endpoint?
Gateway: free, S3 and DynamoDB only. Interface: creates ENI in subnet, 80+ AWS services, costs per hour+GB. Use Gateway for S3 (saves NAT costs).
❓
ECS vs EKS?
ECS: simple, AWS-native, free control plane. EKS: full Kubernetes, portable, $73/month control plane. ECS for simplicity, EKS for K8s teams.