☁️ Complete AWS for DevOps

AWS DevOps Guide

EC2, Auto Scaling, Load Balancers, VPC, Security Groups, NACLs, WAF, VPN, Route 53, S3, RDS, CloudWatch, ECS, Lambda, CloudFormation — all in simple terms.

18 Chapters | 40+ Services | 100% Free
01☁️

Introduction to AWS

Cloud Computing for DevOps

AWS is the world's largest cloud platform with 200+ services. As a DevOps engineer, you don't need to know all 200 — you need to master about 15-20 core services that form the backbone of every production environment. This guide covers exactly those services in simple terms.
Core AWS Concepts — Think of It Like This
🌍
Regions & AZs
AWS runs 30+ Regions worldwide, each a separate geographic area. Each Region has at least three isolated Availability Zones (groups of one or more data centers), so if one zone loses power, your app keeps running in another.
💰
Pay-As-You-Go
Like electricity: you only pay for what you use. Stop a server at night and you stop paying for its compute time (attached EBS storage is still billed). No upfront cost.
🔐
Shared Responsibility
AWS secures the buildings, power, and network cables. YOU secure what you put inside — your data, your passwords, your firewall rules.
AWS CLI Setup
TERMINAL
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscli.zip
unzip awscli.zip && sudo ./aws/install

# Configure your credentials
aws configure
# Access Key ID: AKIA...
# Secret Key: ...
# Region: ap-south-1 (Mumbai)
# Output: json

# Test: who am I?
aws sts get-caller-identity
02🔐

IAM — Identity & Access

Who Can Do What in Your AWS Account

IAM controls access to your entire AWS account. Think of it as the security guard at the building entrance — checking IDs, giving visitor passes, and making sure nobody enters restricted areas without permission.
IAM Building Blocks
👤
Users
Individual people. Each gets their own username and password. Suresh, Priya, Rahul — each has their own IAM user.
👥
Groups
Collections of users. Create a group called "Developers" and put Priya and Rahul in it. Attach permissions to the group — all members get those permissions.
🎭
Roles
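Temporary credentials that something assumes, most often a SERVICE rather than a person. An EC2 instance needs to access S3? Give it an IAM Role. No passwords or access keys to store: AWS hands the instance short-lived credentials automatically.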
📜
Policies
JSON documents that say "allow this action on this resource." Policies attach to users, groups, or roles.
IAM Policy Example
JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    },
    {
      "Effect": "Deny",
      "Action": "s3:DeleteBucket",
      "Resource": "*"
    }
  ]
}
Translation: You CAN read and upload files in my-app-bucket. You CANNOT delete any bucket. An explicit Deny always wins over an Allow. (JSON itself does not allow comments, so keep notes like this outside the policy document.)
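To make the "Deny wins" rule concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is a simplification, not AWS's actual evaluation engine: it ignores conditions, principals, and case-insensitive action matching, and the helper names (`is_allowed`, `_matches`) are made up for illustration.

```python
from fnmatch import fnmatchcase

# The example policy from above, as a Python dict.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::my-app-bucket/*"},
        {"Effect": "Deny", "Action": "s3:DeleteBucket", "Resource": "*"},
    ],
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(statement, action, resource):
    """True if the statement's Action and Resource patterns both match."""
    action_ok = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
    return action_ok and resource_ok


def is_allowed(policy, action, resource):
    """Explicit Deny beats Allow; no matching Allow means implicit deny."""
    matching = [s for s in policy["Statement"] if _matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)
```

For example, `is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::my-app-bucket/logo.png")` is allowed, while `s3:DeleteObject` on the same object is refused simply because nothing allows it.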
Never use the root account for daily tasks — create an IAM admin user
Enable MFA (Multi-Factor Authentication) on every IAM user
Use IAM Roles for EC2/Lambda/ECS — never put access keys inside code
Follow least privilege: give minimum permissions needed
Rotate access keys every 90 days
⚠️ Root Account

Root account has UNLIMITED power — can delete everything, change billing, close the account. Lock it away, enable MFA, and create an IAM admin user for daily work.

03🖥️

EC2 — Elastic Compute

Virtual Servers in the Cloud

EC2 gives you virtual servers (called instances) that you can launch in minutes. It's like renting a computer in Amazon's data center. You choose the size (CPU, RAM) and the operating system (Amazon Linux, Ubuntu), and the instance is usually running within a minute or two.
Instance Families — Which One to Pick
Family | CPU/RAM | Best For | Example
t3/t4g | Balanced, burstable | Web servers, small apps, dev/test | t3.micro (2 vCPU, 1 GiB)
m5/m6i | Balanced, steady | App servers, backend APIs | m5.xlarge (4 vCPU, 16 GiB)
c5/c6i | CPU-heavy | Batch processing, CI/CD agents | c5.2xlarge (8 vCPU, 16 GiB)
r5/r6i | Memory-heavy | Databases, caches, in-memory apps | r5.xlarge (4 vCPU, 32 GiB)
g4dn | GPU | Machine learning, video encoding | g4dn.xlarge (4 vCPU, GPU)
Pricing Models — Save Up to 90%
💵
On-Demand
Pay per hour/second. No commitment. Most expensive but most flexible. Use for dev/test and unpredictable workloads.
💰
Reserved Instances
Commit for 1 or 3 years → save up to 72%. Best for production servers that run 24/7. You KNOW you need this server for a year.
🏷️
Spot Instances
Use spare AWS capacity at the current Spot price (no bidding needed) and save up to 90%. BUT AWS can reclaim the instance with a 2-minute notice. Perfect for batch jobs, CI/CD builds, data processing.
📋
Savings Plans
Commit to spending $X per hour → flexible across instance types. Simpler than Reserved. Recommended for most companies.
Key Pair & SSH
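TERMINAL
# Create key pair (do this ONCE)
aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > my-key.pem
chmod 400 my-key.pem

# Launch instance (the AMI ID is a placeholder; AMI IDs differ per region)
# The Name tag lets the describe-instances filter below find this instance
aws ec2 run-instances --image-id ami-0c55b159cbfafe1f0 --instance-type t3.micro --key-name my-key \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'

# SSH into your server (ec2-user is the default user on Amazon Linux;
# the instance's security group must allow port 22 from your IP)
ssh -i my-key.pem ec2-user@<public-ip>

# Check instance status
aws ec2 describe-instances --filters "Name=tag:Name,Values=web-server"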
💡 AMI

Amazon Machine Image = a snapshot/template of a server. Like a ghost image. Create an AMI of your configured server → launch 100 identical copies from it. This is how auto-scaling works.

04💾

EBS — Elastic Block Store

Hard Drives for Your EC2 Servers

EBS volumes are the hard drives attached to your EC2 instances. When you stop and start an EC2 instance, your data on EBS survives (unlike the instance store, which is temporary). Think of EBS as an external USB hard drive that you can detach and plug into another server in the same Availability Zone.
EBS Volume Types
Type | Speed | Cost | Best For
gp3 (General Purpose SSD) | 3000-16000 IOPS | Low | Default choice: boot volumes, apps, databases
io2 (Provisioned IOPS SSD) | Up to 64000 IOPS | High | Mission-critical databases (Oracle, SQL Server)
st1 (Throughput HDD) | 500 MB/s max | Very Low | Big data, log processing, data warehouses
sc1 (Cold HDD) | 250 MB/s max | Cheapest | Archival, infrequent access
EBS Snapshots
A snapshot is a backup of your EBS volume stored in S3. You can create new volumes from snapshots — even in different regions. This is how you do disaster recovery and migration.
$ aws ec2 create-snapshot --volume-id vol-12345 --description "Daily backup"    # Create snapshot
$ aws ec2 create-volume --snapshot-id snap-12345 --availability-zone ap-south-1a    # Create volume from snapshot
💡 Interview Fact

EBS volumes exist in ONE Availability Zone. If that AZ goes down, the volume is unavailable. Solution: take regular snapshots (stored in S3 across AZs) and recreate volumes from snapshots if needed.

05📈

Auto Scaling Groups

Automatically Add/Remove Servers

Auto Scaling Groups (ASG) automatically add more EC2 instances when traffic increases and remove them when traffic drops. Imagine a restaurant that automatically hires more waiters during dinner rush and sends them home when it's quiet. You pay for far fewer idle servers.
How ASG Works — Step by Step
Step 1: Create a Launch Template — defines what type of instance to create (AMI, instance type, key pair, security group)
Step 2: Create Auto Scaling Group — set minimum (2), desired (2), maximum (10) instance counts
Step 3: Attach to Load Balancer — new instances automatically register with ALB
Step 4: Set Scaling Policies — rules that say "when CPU > 70% for 5 minutes, add 2 instances"
Step 5: Done! ASG monitors, adds, removes, and replaces instances automatically
Scaling Policies
Policy Type | How It Works | Example
Target Tracking | Keep a metric at a target value | Keep average CPU at 60%: ASG adds/removes instances automatically
Step Scaling | Add X instances when metric crosses threshold | CPU > 70% → add 2, CPU > 90% → add 4
Scheduled | Scale at specific times | Every Monday 9 AM → set to 10 instances, Friday 6 PM → set to 2
Predictive | ML-based prediction | AWS learns your traffic patterns and scales BEFORE the traffic comes
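The idea behind target tracking can be sketched as simple proportional arithmetic. This is a rough model for intuition only, not the algorithm AWS actually runs; the function name and parameters are assumptions made for this example.

```python
import math


def desired_capacity(current_instances, current_metric, target_metric,
                     min_size, max_size):
    """Estimate how many instances bring the average metric back to target.

    Assumes load spreads evenly, so the metric scales inversely with the
    instance count. The result is clamped to the ASG's min/max bounds.
    """
    raw = math.ceil(current_instances * current_metric / target_metric)
    return max(min_size, min(max_size, raw))
```

With 4 instances averaging 90% CPU and a 60% target, the sketch suggests 6 instances; at 30% CPU it suggests shrinking to the minimum of 2.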
ASG + ALB Together
ARCHITECTURE
ASG Configuration:
  Min: 2 (always have at least 2 servers running)
  Desired: 4 (normally run 4 servers)
  Max: 10 (never exceed 10 servers)

Launch Template:
  AMI: ami-0c55b (your custom app image)
  Instance Type: t3.medium
  Security Group: sg-web
  User Data:
    #!/bin/bash
    systemctl start nginx

Scaling Policy:
  Target: Average CPU Utilization = 60%
  Cooldown: 300 seconds (wait 5 min between scaling actions)

Load Balancer: arn:aws:elasticloadbalancing:.../my-alb
Health Check: HTTP /health on port 8080

Result:
  Normal day: 4 instances running
  Sale event: ASG scales to 8 instances automatically
  Night time: ASG scales down to 2 instances
  Server crash: ASG replaces unhealthy instance in minutes
💡 Cooldown Period

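After a scaling action, the ASG waits before acting again. This prevents flip-flopping: adding servers, immediately removing them, adding again. The default cooldown is 300 seconds and applies to simple scaling policies; target tracking and step scaling use an instance warm-up time instead. Either way, give your new servers time to warm up.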

06⚖️

Load Balancers

Split Traffic & Stay Available

A Load Balancer distributes incoming traffic across multiple EC2 instances. If one server dies, the load balancer stops sending traffic to it. Users never notice. AWS has 3 types of load balancers for application traffic, each for different use cases (a fourth, the Gateway Load Balancer, fronts third-party network appliances).
3 Types of Load Balancers
Type | Layer | Best For | Key Feature
ALB (Application) | Layer 7 (HTTP/HTTPS) | Web apps, APIs, microservices | Path-based routing: /api → backend, /images → image servers
NLB (Network) | Layer 4 (TCP/UDP) | Gaming, IoT, extreme performance | Millions of requests/sec, static IP, ultra-low latency
CLB (Classic) | Layer 4+7 | Legacy apps only | Previous generation: don't use for new projects
ALB — Application Load Balancer (Most Common)
🎯
Target Groups
A group of EC2 instances, IPs, or Lambda functions. ALB sends traffic to targets in the group. You can have multiple target groups.
👂
Listeners
The port and protocol the ALB listens on. Listener on port 443 (HTTPS) → forward to target group. Listener on port 80 → redirect to 443.
📋
Rules
Conditions that decide where traffic goes. IF path = /api/* THEN forward to api-target-group. IF host = admin.site.com THEN forward to admin-target-group.
🏃
Actions
What to do with matched traffic: forward to target group, redirect to another URL, return fixed response, or authenticate with Cognito.
ALB Routing Examples
ALB RULES
Listener: Port 443 (HTTPS)
  Rule 1: IF path = /api/*             THEN forward to → api-target-group (port 8080)
  Rule 2: IF path = /admin/*           THEN forward to → admin-target-group (port 3000)
  Rule 3: IF host = images.mysite.com  THEN forward to → cdn-target-group
  Rule 4: IF path = /old-page          THEN redirect to → https://mysite.com/new-page (301)
  Rule 5: IF path = /health            THEN return fixed response → 200 OK "healthy"
  Default: Forward to → web-target-group (port 80)
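The rule list above behaves like a first-match lookup by priority. The Python sketch below mimics that behaviour for intuition; it is not the ALB's implementation, it only handles one condition per rule, and the `route` function and action strings are invented for this example.

```python
from fnmatch import fnmatchcase

# Each rule has a priority, the request field it inspects, a wildcard
# pattern, and an action label. These mirror the five rules listed above.
RULES = [
    {"priority": 1, "field": "path", "pattern": "/api/*",
     "action": "forward:api-target-group"},
    {"priority": 2, "field": "path", "pattern": "/admin/*",
     "action": "forward:admin-target-group"},
    {"priority": 3, "field": "host", "pattern": "images.mysite.com",
     "action": "forward:cdn-target-group"},
    {"priority": 4, "field": "path", "pattern": "/old-page",
     "action": "redirect:https://mysite.com/new-page"},
    {"priority": 5, "field": "path", "pattern": "/health",
     "action": "fixed-response:200"},
]
DEFAULT_ACTION = "forward:web-target-group"


def route(host, path):
    """Return the action of the lowest-priority-number rule that matches."""
    request = {"host": host, "path": path}
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if fnmatchcase(request[rule["field"]], rule["pattern"]):
            return rule["action"]
    return DEFAULT_ACTION
```

A request for `/api/orders` lands on the API target group, while anything that matches no rule falls through to the default action.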
NLB — Network Load Balancer
NLB works at the transport level (Layer 4), so it doesn't look at HTTP headers or URLs. It forwards TCP or UDP traffic without inspecting it. This makes it incredibly fast (millions of requests per second) with ultra-low latency. Use NLB for gaming servers, real-time streaming, and when you need a static IP.
Health Checks
HEALTH CHECK
Health Check Configuration:
  Protocol: HTTP
  Path: /health
  Port: 8080
  Healthy threshold: 3 (pass 3 checks in a row = healthy)
  Unhealthy threshold: 2 (fail 2 checks in a row = unhealthy)
  Interval: 30 seconds (check every 30s)
  Timeout: 5 seconds (wait max 5s for response)

What happens when a server fails health check:
  1. ALB marks it unhealthy
  2. ALB stops sending new traffic to it
  3. Existing connections drain gracefully
  4. ASG detects unhealthy instance (when the ASG is set to use ELB health checks)
  5. ASG terminates it and launches a replacement
  6. New instance registers with ALB
  7. Health checks pass → traffic resumes

All automatic. Zero human intervention.
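The healthy/unhealthy thresholds work as counters of consecutive results. Here is a small Python sketch of that logic under simplifying assumptions: the target starts out healthy (real targets start in an initial state), and the class name `TargetHealth` is made up for illustration.

```python
class TargetHealth:
    """Track one target's state from a stream of health check results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "healthy"
        self._streak = 0  # consecutive results that contradict the state

    def record(self, check_passed):
        """Feed one check result and return the resulting state."""
        agrees = check_passed == (self.state == "healthy")
        if agrees:
            self._streak = 0
            return self.state
        self._streak += 1
        if self.state == "healthy":
            needed = self.unhealthy_threshold
        else:
            needed = self.healthy_threshold
        if self._streak >= needed:
            self.state = "unhealthy" if self.state == "healthy" else "healthy"
            self._streak = 0
        return self.state
```

With the defaults above, two failed checks in a row flip the target to unhealthy, and it takes three consecutive passes to bring it back.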
07🌐

VPC — Virtual Private Cloud

Your Private Network in AWS

A VPC is your own private, isolated network inside AWS. Think of it like your own office building: you control who enters, which rooms connect to which, and who can access the internet. Most resources you launch (EC2, RDS, ECS tasks) run inside a VPC; Lambda functions can optionally be attached to one.
VPC Components — The Building Analogy
🏢
VPC
The entire building. You define the total address space: 10.0.0.0/16 = 65,536 IP addresses. One VPC per environment (dev VPC, prod VPC).
🚪
Subnets
Rooms inside the building. Public subnet = room with window to the street (internet access). Private subnet = internal room (no direct internet). You create subnets in different AZs for high availability.
🌍
Internet Gateway (IGW)
The main entrance door connecting your building to the internet. Attach to VPC → public subnets can reach the internet.
📡
NAT Gateway
A one-way mirror. Private subnet instances can ACCESS the internet (download updates) but the internet CANNOT reach them. Like making phone calls but having an unlisted number.
🗺️
Route Tables
Direction signs inside the building. They tell traffic where to go. Public route table: 0.0.0.0/0 → Internet Gateway. Private route table: 0.0.0.0/0 → NAT Gateway.
Typical Production VPC Architecture
ARCHITECTURE
VPC: 10.0.0.0/16 (65,536 IPs)
│
├── Public Subnet A: 10.0.1.0/24 (AZ: ap-south-1a)
│   ├── ALB (Application Load Balancer)
│   ├── NAT Gateway
│   └── Bastion Host (jump server for SSH)
│
├── Public Subnet B: 10.0.2.0/24 (AZ: ap-south-1b)
│   └── ALB (second AZ for HA)
│
├── Private Subnet A: 10.0.10.0/24 (AZ: ap-south-1a)
│   └── EC2 App Servers (order-service, user-service)
│
├── Private Subnet B: 10.0.20.0/24 (AZ: ap-south-1b)
│   └── EC2 App Servers (replicas)
│
├── DB Subnet A: 10.0.100.0/24 (AZ: ap-south-1a)
│   └── RDS Primary
│
└── DB Subnet B: 10.0.200.0/24 (AZ: ap-south-1b)
    └── RDS Standby (Multi-AZ)

Traffic Flow:
  User → ALB (public) → App Server (private) → RDS (db subnet)
  App Server → NAT Gateway → Internet (for updates)
  Internet ✗→ App Server (blocked: private subnet)
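The address arithmetic in this layout can be checked with Python's standard `ipaddress` module. The sketch below is only a planning aid; the helper names are invented here, and the one AWS-specific fact it relies on is that AWS reserves 5 addresses in every subnet (the first four and the last).

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one

VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = [
    "10.0.1.0/24", "10.0.2.0/24",      # public
    "10.0.10.0/24", "10.0.20.0/24",    # private (app)
    "10.0.100.0/24", "10.0.200.0/24",  # database
]


def describe(cidr):
    """Return total and usable address counts for one subnet CIDR."""
    network = ipaddress.ip_network(cidr)
    return {
        "cidr": cidr,
        "total": network.num_addresses,
        "usable": network.num_addresses - AWS_RESERVED_PER_SUBNET,
    }


def all_inside_vpc(vpc, subnets):
    """True if every subnet CIDR falls within the VPC's address range."""
    return all(ipaddress.ip_network(s).subnet_of(vpc) for s in subnets)
```

A /24 subnet has 256 addresses, so 251 remain for your own resources, and every subnet in the diagram fits inside the 10.0.0.0/16 VPC.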
08🛡️

Security Groups & NACLs

Two Layers of Firewall

AWS gives you TWO levels of firewall: Security Groups (instance-level) and NACLs (subnet-level). Think of Security Groups as locks on each room door, and NACLs as security guards at each floor entrance. You need both for proper security.
Security Groups — Room Door Locks
Stateful
If you allow inbound traffic on port 80, the RESPONSE is automatically allowed out. You don't need a separate outbound rule. Smart enough to track connections.
🟢
Allow Only
Security Groups can only ALLOW traffic. There is no "deny" rule. Everything not explicitly allowed is automatically denied.
🔗
Instance Level
Attached directly to EC2 instances, RDS, ALB. Multiple instances can share the same SG.
🔄
Reference Other SGs
Rule: "Allow traffic FROM sg-web-servers" — instead of hardcoding IPs. When new servers join sg-web-servers, they automatically get access.
Security Group Examples
SECURITY GROUPS
# sg-web: for ALB (public-facing)
Inbound:
  Port 80 (HTTP)   → 0.0.0.0/0 (anyone on internet)
  Port 443 (HTTPS) → 0.0.0.0/0 (anyone on internet)
Outbound:
  All traffic → 0.0.0.0/0 (allow all outbound)

# sg-app: for application servers (private)
Inbound:
  Port 8080 → sg-web (only from ALB security group)
  Port 22   → sg-bastion (only from bastion/jump server)
Outbound:
  All traffic → 0.0.0.0/0

# sg-db: for RDS database (most restricted)
Inbound:
  Port 3306 → sg-app (only from app servers)
Outbound:
  None needed (stateful: responses auto-allowed)
NACLs — Floor Security Guards
Feature | Security Group | NACL
Level | Instance (EC2, RDS) | Subnet (entire floor)
Stateful/Stateless | Stateful (auto-allows response) | Stateless (must allow inbound AND outbound separately)
Rules | Allow only | Allow AND Deny
Rule Order | All rules evaluated | Rules evaluated in NUMBER order, lowest first; the first match wins (100, 200, 300...)
Default | Deny all inbound, allow all outbound | Default NACL allows ALL; a custom NACL denies everything until you add rules
Use Case | Primary firewall for every resource | Extra layer: block specific IPs, compliance requirements
NACL Example
NACL
# NACL for Public Subnet
Inbound Rules (evaluated lowest number first; first match wins):
  Rule 50:  DENY ALL from 203.0.113.50/32 (block specific attacker IP; must sit BEFORE the allow rules)
  Rule 100: Allow TCP 443 from 0.0.0.0/0 (HTTPS in)
  Rule 200: Allow TCP 80 from 0.0.0.0/0 (HTTP in)
  Rule 300: Allow TCP 22 from 10.0.0.0/16 (SSH from VPC only)
  Rule 400: Allow TCP 1024-65535 from 0.0.0.0/0 (return traffic for outbound connections)
  Rule *:   DENY ALL from 0.0.0.0/0 (default deny everything else)

Outbound Rules:
  Rule 100: Allow TCP 1024-65535 to 0.0.0.0/0 (ephemeral ports for responses)
  Rule 200: Allow TCP 443 to 0.0.0.0/0 (outbound HTTPS)
  Rule *:   DENY ALL to 0.0.0.0/0 (default deny)
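Because NACL rules are checked in ascending number order and the first match decides, where a deny rule sits matters a great deal. The Python sketch below illustrates that with a simplified evaluator (TCP only, inbound only); the rule tuples and the `evaluate` function are assumptions made for this example, not an AWS API.

```python
import ipaddress

# (rule number, action, source CIDR, port range or None for all ports)
GOOD_ORDER = [
    (50, "deny", "203.0.113.50/32", None),
    (100, "allow", "0.0.0.0/0", (443, 443)),
    (200, "allow", "0.0.0.0/0", (80, 80)),
    (300, "allow", "10.0.0.0/16", (22, 22)),
]

# Same rules, but the deny is numbered after the allows.
BAD_ORDER = [
    (100, "allow", "0.0.0.0/0", (443, 443)),
    (200, "allow", "0.0.0.0/0", (80, 80)),
    (300, "allow", "10.0.0.0/16", (22, 22)),
    (900, "deny", "203.0.113.50/32", None),
]


def evaluate(rules, source_ip, port):
    """Return the action of the first matching rule, else the default deny."""
    source = ipaddress.ip_address(source_ip)
    for _number, action, cidr, port_range in sorted(rules, key=lambda r: r[0]):
        in_cidr = source in ipaddress.ip_network(cidr)
        in_ports = port_range is None or port_range[0] <= port <= port_range[1]
        if in_cidr and in_ports:
            return action
    return "deny"  # the implicit "*" rule
```

With `GOOD_ORDER` the attacker at 203.0.113.50 is denied on port 443; with `BAD_ORDER` the same request is allowed, because rule 100 matches before rule 900 is ever reached.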
💡 Interview Answer

"Security Groups are like room door locks: stateful, allow-only, attached to individual resources. NACLs are like floor security guards: stateless, allow+deny, applied to entire subnets. Use Security Groups as primary firewall, NACLs as an additional security layer."

09🔗

VPN & VPC Peering

Connect Networks Securely

VPN creates an encrypted tunnel between your office/data center and AWS. VPC Peering connects two VPCs directly. These are how enterprises connect their on-premises networks to the cloud.
VPN Types
Type | What It Connects | Use Case | How It Works
Site-to-Site VPN | Your office network → AWS VPC | Connect entire office to AWS. All office users access AWS resources. | Encrypted IPSec tunnel between your router and AWS Virtual Private Gateway.
Client VPN | Individual laptop → AWS VPC | Remote employee needs to access private AWS resources from home. | OpenVPN-based. User installs VPN client, authenticates, gets access to VPC.
AWS Direct Connect | Your data center → AWS (dedicated) | High-bandwidth, low-latency connection. Not internet-based. | Physical fiber link from your DC to an AWS Direct Connect location. Dedicated ports of 1, 10, or 100 Gbps. Most expensive, most reliable.
Site-to-Site VPN — Most Common
ARCHITECTURE
Your Office Router                              AWS
┌──────────┐      Encrypted Tunnel      ┌──────────────┐
│ Customer │ ══════════════════════════ │ Virtual      │
│ Gateway  │        (IPSec VPN)         │ Private      │
│ Device   │                            │ Gateway      │
└──────────┘                            └──────┬───────┘
192.168.0.0/16                                 │
(your office network)                   VPC: 10.0.0.0/16
                                        (your AWS network)

Result: Office computers (192.168.x.x) can reach AWS servers (10.0.x.x) through encrypted tunnel.
VPC Peering — Connect Two VPCs
VPC Peering creates a direct network connection between two VPCs. Traffic flows directly over the AWS backbone: no internet, no VPN gateway in the path. Once both sides add routes (and security groups allow the traffic), the VPCs can talk to each other as if they were on the same network.
VPC PEERING
VPC-A (Production): 10.0.0.0/16
        ↕ (VPC Peering Connection)
VPC-B (Staging): 172.16.0.0/16

Rules:
  ✅ VPC-A can talk to VPC-B and vice versa
  ❌ VPC Peering is NOT transitive: if A↔B and B↔C, that does NOT mean A↔C. You need a separate peering for A↔C
  ❌ CIDR ranges must NOT overlap (both can't use 10.0.0.0/16)

For connecting 10+ VPCs: use Transit Gateway instead
  Transit Gateway = central hub, all VPCs connect to it
  Much simpler than managing 45 peering connections (a full mesh of 10 VPCs)
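The "45 peering connections" figure comes from the full-mesh formula n(n-1)/2, and the no-overlap rule is easy to check before you request a peering. The Python sketch below shows both; the function names and the sample VPC names are hypothetical.

```python
import ipaddress
from itertools import combinations


def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC can reach every other one."""
    return vpc_count * (vpc_count - 1) // 2


def overlapping_pairs(cidrs_by_vpc):
    """Return pairs of VPC names whose CIDR ranges overlap (cannot be peered)."""
    networks = {name: ipaddress.ip_network(cidr)
                for name, cidr in cidrs_by_vpc.items()}
    return [(a, b) for a, b in combinations(sorted(networks), 2)
            if networks[a].overlaps(networks[b])]
```

`full_mesh_peerings(10)` gives 45, which is why a hub such as Transit Gateway becomes attractive as the VPC count grows.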
VPC Endpoint Types
Endpoint Type | What It Is | Use Case
Gateway Endpoint | Free, for S3 and DynamoDB only | EC2 in private subnet accessing S3 without going through NAT (saves NAT costs)
Interface Endpoint | Creates ENI in your subnet | Access 80+ AWS services privately (CloudWatch, SQS, ECR, Secrets Manager) without internet
Gateway Load Balancer Endpoint | For third-party appliances | Route traffic through firewall appliances (Palo Alto, Fortinet) before reaching your app
💡 Save Money

Gateway Endpoints for S3 are FREE. If your app servers in private subnets access S3 heavily, use a Gateway Endpoint instead of routing through NAT Gateway (which charges per GB).

10🔥

WAF — Web Application Firewall

Protect Your Apps from Attacks

WAF sits in front of your ALB or CloudFront and inspects every HTTP request before it reaches your application. It blocks SQL injection, cross-site scripting (XSS), bot traffic, and other common web attacks. Think of it as a smart security guard who reads every letter before delivering it.
WAF Components
📋
Web ACL
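A collection of rules attached to an ALB, CloudFront distribution, or API Gateway stage. Each request is checked against the rules in priority order until one matches with a terminating action (allow or block); if none does, the Web ACL's default action applies.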
📜
Rules
Individual checks: "block if request contains SQL injection", "allow if IP is in whitelist", "rate limit to 1000 requests per 5 minutes per IP".
📦
Rule Groups
Pre-packaged sets of rules. AWS Managed Rule Groups cover OWASP Top 10, known bad IPs, bots, and more. Enable with one click.
WAF Rule Types
Rule Type | What It Does | Example
IP Set | Allow/block specific IPs | Block known-bad IP ranges or allow only office IPs
Rate-based | Block IPs sending too many requests | Block if > 2000 requests in 5 minutes (helps against HTTP floods)
SQL Injection | Detect SQL injection in request body/URL | Block: /api?id=1; DROP TABLE users
XSS | Detect cross-site scripting attempts | Block: <script>alert("hack")</script>
Geo Match | Allow/block by country | Block traffic from countries you don't serve
AWS Managed | Pre-built rule groups by AWS | AWSManagedRulesCommonRuleSet covers many OWASP Top 10 risks
Attach WAF to ALB or CloudFront (not directly to EC2)
Enable AWS Managed Rules: Common Rule Set + Known Bad Inputs + IP Reputation
Add rate limiting: 2000 requests/5 min per IP for APIs
Log all blocked requests to S3 for security analysis
Use WAF + Shield Standard (free DDoS protection) together
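A rate-based rule boils down to counting requests per IP over a time window. The Python sketch below shows one way to do that with a sliding window; it illustrates the concept only and is not how AWS WAF is implemented internally. The class name `RateLimiter` is made up, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per IP in any `window_seconds` span."""

    def __init__(self, limit=2000, window_seconds=300):
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds) and say if it may pass."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Each IP is tracked independently, so one noisy client being blocked does not affect anyone else.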
11🌍

Route 53 — DNS

Domain Names & Traffic Routing

Route 53 is AWS's DNS service. It translates human-readable domain names (myapp.com) into IP addresses (52.66.123.45) that computers understand. It also does health checks and intelligent traffic routing.
Routing Policies
Policy | How It Works | Use Case
Simple | One domain → one IP/resource | Basic website, single server
Weighted | Split traffic by percentage | 80% to v2, 20% to v3 (canary deployment)
Latency | Route to the region with the lowest latency for the user | Global app: users in India → Mumbai, users in US → Virginia
Failover | Primary → Backup if primary is down | Primary in Mumbai, backup in Singapore. Auto-failover on health check failure.
Geolocation | Route by user's country | Indian users → Indian servers, US users → US servers
Multi-value | Return multiple healthy IPs | Simple load balancing at DNS level
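Weighted routing can be pictured as picking a record in proportion to its weight. The Python sketch below shows the arithmetic with the 80/20 split from the table; it is an illustration, not Route 53's code, and `pick_weighted` is a hypothetical helper. The random roll is passed in so the result is predictable.

```python
def pick_weighted(records, roll):
    """Pick a record name given (name, weight) pairs and a roll in [0, 1).

    Each record owns a slice of the range proportional to its weight;
    the roll selects whichever slice it falls into.
    """
    total = sum(weight for _name, weight in records)
    threshold = roll * total
    running = 0
    for name, weight in records:
        running += weight
        if threshold < running:
            return name
    return records[-1][0]  # guard against rounding at the top edge


CANARY = [("v2", 80), ("v3", 20)]
```

In real use the roll would come from a random source, so over many DNS queries roughly 80% of answers point at v2 and 20% at v3.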
Common DNS Record Types
Record | Points To | Example
A | IPv4 address | myapp.com → 52.66.123.45
AAAA | IPv6 address | myapp.com → 2001:db8::1
CNAME | Another domain name | www.myapp.com → myapp.com
Alias | AWS resource (free, faster) | myapp.com → d111.cloudfront.net (ALB, CloudFront, S3)
MX | Mail server | myapp.com → aspmx.l.google.com (for email via Google)
TXT | Text verification | Used for SSL verification, SPF, DKIM
💡 Alias vs CNAME

Use Alias (not CNAME) for AWS resources like ALB, CloudFront, S3. Alias is free (no query charges), works at zone apex (myapp.com without www), and resolves faster. CNAME doesn't work at zone apex.

12📦

S3 — Object Storage

Store Anything, Scale Infinitely

S3 stores a practically unlimited number of files (called objects, up to 5 TB each) with 99.999999999% durability (11 nines: if you store 10 million objects, you can expect to lose one about every 10,000 years on average). Used for backups, static websites, Docker images, Terraform state, logs, and data lakes.
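The "1 file every 10,000 years" figure follows directly from the durability number. The short Python sketch below works through the arithmetic with exact fractions, treating the durability figure as an annual per-object survival probability (a simplifying assumption for this example).

```python
from fractions import Fraction

# 99.999999999% durability leaves a 0.000000001% (1e-11) chance of loss.
ANNUAL_LOSS_PROBABILITY = 1 - Fraction("99.999999999") / 100


def expected_losses_per_year(object_count):
    """Average number of objects lost per year at this durability."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(object_count)
```

For 10 million objects, the expected loss is 0.0001 objects per year, which is the same as one object every 10,000 years.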
Storage Classes — Pay Only for What You Need
Class | Access Pattern | Cost | Example
Standard | Frequently accessed | Highest | App assets, active logs, Docker layers
Standard-IA | Infrequent (1-2 times/month) | ~40% less | Backups, disaster recovery files
Glacier Instant | Archive, millisecond access | ~68% less | Compliance docs you rarely access
Glacier Deep Archive | Archive, 12-hour retrieval | Cheapest | 7-year audit logs, old backups
Lifecycle Rules — Automate Cost Savings
LIFECYCLE
# Move objects automatically based on age:
Day 0-30:   S3 Standard (frequently accessed)
Day 30-90:  S3 Standard-IA (still needed but rarely)
Day 90-365: S3 Glacier Instant (archive, occasional access)
Day 365+:   S3 Glacier Deep (long-term archive)
Day 730:    DELETE automatically (no longer needed)

# This can save 60-80% on storage costs!
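The schedule above is just a lookup from object age to storage class. The Python sketch below encodes it so you can sanity-check the boundaries; it models the rule only and does not call S3. The class labels follow the S3 API's storage class names, and the function name is an assumption for this example.

```python
# (age in days at which the transition happens, storage class)
TRANSITIONS = [
    (0, "STANDARD"),
    (30, "STANDARD_IA"),
    (90, "GLACIER_IR"),
    (365, "DEEP_ARCHIVE"),
]
EXPIRATION_DAYS = 730


def storage_class_for_age(age_days):
    """Return the storage class for an object of this age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None  # the lifecycle rule has deleted the object
    current = None
    for start_day, storage_class in TRANSITIONS:
        if age_days >= start_day:
            current = storage_class
    return current
```

An object that is 100 days old sits in Glacier Instant Retrieval, and anything 730 days or older is gone.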
S3 CLI Commands
$ aws s3 mb s3://my-app-bucket    # Create bucket
$ aws s3 cp app.jar s3://my-app-bucket/releases/    # Upload file
$ aws s3 sync ./build s3://frontend-bucket --delete    # Sync folder (deploy React app)
$ aws s3 ls s3://my-app-bucket/ --recursive    # List all objects
$ aws s3 presign s3://bucket/file.pdf --expires-in 3600    # Generate temp download URL (1 hour)
13🗄️

RDS — Managed Databases

MySQL, PostgreSQL, Aurora

RDS manages your database infrastructure: AWS handles patches, backups, replication, and failover. You focus on your data and queries. It's like having part of a DBA team built into the price.
Supported Engines
Engine | Best For | Special Feature
MySQL | General purpose, most popular | Compatible with existing MySQL apps
PostgreSQL | Complex queries, GIS data | Advanced data types, extensions
Aurora | High performance, auto-scaling | Up to 5x MySQL throughput (AWS's claim), auto-grows storage
SQL Server | Microsoft/.NET shops | Windows auth, SSIS support
MariaDB | MySQL alternative | Open source, community-driven
Oracle | Existing Oracle workloads | Bring your own license or license included
Multi-AZ — High Availability
Multi-AZ creates a standby replica in a different Availability Zone. If the primary database crashes, AWS automatically fails over to the standby, typically within 60-120 seconds. Your application doesn't need to change connection strings, because the database endpoint's DNS name is repointed to the new primary.
Read Replicas — Handle More Traffic
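Read Replicas are asynchronously updated copies of your database that serve read queries. Your main database handles writes, replicas handle reads. If your app does 80% reads and 20% writes and you move all reads to replicas, the primary only sees the 20% that are writes, so in principle it can sit behind about 5x the total traffic. Because replication is asynchronous, replica reads can be slightly stale.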
Enable Multi-AZ for production (automatic failover, typically in 1-2 minutes)
Create Read Replicas for read-heavy applications
Enable automated backups with 7+ day retention
Place RDS in PRIVATE subnets (never public!)
Use IAM authentication instead of password-only
Set up CloudWatch alarms for CPU, connections, and free storage
14📊

CloudWatch — Monitoring

The Eyes of Your AWS Infrastructure

CloudWatch collects metrics (numbers), logs (text), and alarms (alerts) from every AWS service. It's like having security cameras and sensors throughout your building. Without CloudWatch, you're flying blind.
Three Pillars of CloudWatch
📈
Metrics
Numbers over time: CPU 73%, Memory 4.2 GB, Request count 1500/sec. Most AWS services send metrics automatically. EC2 sends CPU, disk, and network by default; memory and disk-space usage need the CloudWatch agent. ALB sends request count, latency, error rate.
📝
Logs
Text output from your applications. Store, search, and analyze logs centrally. The CloudWatch agent on EC2 ships system logs (/var/log/messages on Amazon Linux, /var/log/syslog on Ubuntu/Debian) and app logs to CloudWatch. Replaces the need to SSH and grep.
🔔
Alarms
Automatic alerts when metrics cross thresholds. CPU > 85% for 5 minutes → send SNS notification to Slack/email/PagerDuty. Can also trigger Auto Scaling actions.
Important Default EC2 Metrics
Metric | What It Measures | Alert Threshold
CPUUtilization | Percentage of CPU used | > 85% for 5 min
NetworkIn/Out | Bytes transferred | Unusual spike (DDoS indicator)
StatusCheckFailed | Hardware/software health | > 0 (something is wrong)
DiskReadOps/WriteOps | Disk I/O operations | High IOPS = disk bottleneck
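A threshold such as "> 85% for 5 min" means every datapoint in the evaluation window must breach before the alarm fires. The Python sketch below models that with one datapoint per minute. It is a simplification of CloudWatch's behaviour (no "M out of N" datapoints, no missing-data handling), and the function name is an assumption; the state names match CloudWatch's.

```python
def alarm_state(datapoints, threshold=85.0, evaluation_periods=5):
    """Return the alarm state from the most recent per-minute datapoints.

    ALARM only when all of the last `evaluation_periods` values exceed
    the threshold; INSUFFICIENT_DATA until enough datapoints exist.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"
```

A single dip below the threshold inside the window keeps the alarm in OK, which is what stops brief CPU spikes from paging anyone.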
CloudWatch Logs — Search All Logs in One Place
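JSON
# Install CloudWatch Agent on EC2 (Amazon Linux)
sudo yum install -y amazon-cloudwatch-agent

# Configure which log files to send
# /opt/aws/amazon-cloudwatch-agent/etc/config.json
# (/var/log/messages is the system log on Amazon Linux; Ubuntu/Debian use /var/log/syslog)
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          { "file_path": "/var/log/messages", "log_group_name": "production/syslog" },
          { "file_path": "/var/log/myapp/*.log", "log_group_name": "production/myapp" }
        ]
      }
    }
  }
}

# Load the config and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json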
💡 CloudWatch vs Prometheus/Grafana

CloudWatch is AWS-native — zero setup, works immediately. Prometheus/Grafana is open-source — more powerful queries, better dashboards, works across clouds. Many companies use BOTH: CloudWatch for AWS infrastructure, Prometheus/Grafana for application metrics.

15🐳

ECS & EKS — Containers on AWS

Run Docker in Production

ECS (Elastic Container Service) and EKS (Elastic Kubernetes Service) run your Docker containers on AWS. ECS is simpler and AWS-native. EKS gives you full Kubernetes.
Feature | ECS | EKS
Complexity | Simple, AWS proprietary | Complex, full Kubernetes
Learning Curve | Low (if you know Docker) | High (need K8s knowledge)
Portability | AWS only | Multi-cloud, on-prem
Control Plane Cost | Free | $0.10/hour (~$73/month)
Best For | AWS-only teams, simpler apps | K8s teams, multi-cloud, complex apps
ECS with Fargate (Serverless Containers)
Fargate runs your containers WITHOUT managing servers. You define CPU and memory, point it at your Docker image, and Fargate handles the rest. No EC2 instances to patch, no cluster capacity to manage. Just containers.
ECR — Elastic Container Registry
$ aws ecr get-login-password | docker login --username AWS --password-stdin 123456.dkr.ecr.ap-south-1.amazonaws.com    # Login to ECR
$ docker tag myapp:latest 123456.dkr.ecr.ap-south-1.amazonaws.com/myapp:latest    # Tag image for ECR
$ docker push 123456.dkr.ecr.ap-south-1.amazonaws.com/myapp:latest    # Push to ECR
16

Lambda — Serverless

Run Code Without Servers

Lambda runs your code in response to events. Someone uploads a file to S3? Lambda processes it. API request comes in? Lambda handles it. You pay only for the milliseconds your code runs. Zero servers to manage.
Lambda Limits
Resource | Limit
Timeout | 15 minutes max
Memory | 128 MB to 10 GB
Package Size | 50 MB zipped, 250 MB unzipped
Concurrent Executions | 1000 (default, can increase)
/tmp Storage | 512 MB to 10 GB
Common Lambda Triggers for DevOps
📦
S3 Event
File uploaded → Lambda processes it (resize image, scan for malware, parse CSV)
🌐
API Gateway
HTTP request → Lambda handles it (serverless API, webhook handler)
EventBridge (formerly CloudWatch Events) / Cron
Every 5 minutes → Lambda runs (cleanup old snapshots, check SSL expiry)
📨
SQS Queue
Message arrives → Lambda processes it (order processing, email sending)
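As an illustration of the S3 trigger, here is a minimal Python Lambda handler sketch. It only extracts the bucket and key from each record and returns them; what you do with the object (resize, scan, parse) is left out. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format; the return shape is an arbitrary choice for this example.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Entry point Lambda calls with the S3 event notification payload."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    return {"processed": len(uploaded), "objects": uploaded}
```

You can exercise the handler locally by passing a hand-written event dict and `None` for the context before wiring it to a real bucket.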
💡 Use Lambda For DevOps

Auto-cleanup old AMIs, rotate secrets, notify Slack on CloudWatch alarms, process CloudTrail logs, auto-tag untagged resources, backup DynamoDB tables. These small automation tasks are perfect for Lambda.

17🏗️

CloudFormation — IaC

Define Infrastructure in Code

CloudFormation lets you write your entire AWS infrastructure in YAML/JSON files. Instead of clicking 50 buttons in the AWS Console to create a VPC, subnets, EC2, ALB, and RDS — you write one YAML file and CloudFormation creates everything automatically. Delete the stack → everything is cleaned up.
CloudFormation Template
CLOUDFORMATION
AWSTemplateFormatVersion: '2010-09-09'
Description: Single web server with a security group that allows HTTP

Parameters:
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues: [t3.micro, t3.small, t3.medium]
  Environment:
    Type: String
    Default: staging

Resources:
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: ami-0c55b159cbfafe1f0  # placeholder; AMI IDs differ per region
      SecurityGroupIds:
        - !Ref WebSecurityGroup
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-web-server'
        - Key: Environment
          Value: !Ref Environment

Outputs:
  ServerIP:
    Value: !GetAtt WebServer.PublicIp
    Description: Public IP of web server
CloudFormation CLI
$ aws cloudformation create-stack --stack-name my-stack --template-body file://template.yml    # Create stack
$ aws cloudformation update-stack --stack-name my-stack --template-body file://template.yml    # Update stack
$ aws cloudformation delete-stack --stack-name my-stack    # Delete stack (removes ALL resources)
$ aws cloudformation describe-stacks --stack-name my-stack    # Check stack status
💡 CloudFormation vs Terraform

CloudFormation: AWS-only, deeply integrated, free, no state file to manage. Terraform: multi-cloud (AWS + Azure + GCP), needs state management, more flexible. For AWS-only teams, CloudFormation is simpler. For multi-cloud, use Terraform.

18💼

Interview Questions

40+ AWS DevOps Q&A

The most asked AWS questions in DevOps interviews — from freshers to experienced.
Networking & Security
Security Group vs NACL?
SG: instance-level, stateful, allow-only. NACL: subnet-level, stateless, allow+deny. SG = room door lock, NACL = floor security guard. Use SGs as primary firewall.
Public vs Private Subnet?
Public: has a route to an Internet Gateway (resources can have public IPs). Private: no direct internet route (use NAT Gateway for outbound). Put app servers in private, ALB in public.
NAT Gateway vs Internet Gateway?
IGW: two-way door (internet can reach you). NAT: one-way mirror (you can reach internet, but internet cannot reach you). Private subnets use NAT for updates.
Site-to-Site VPN vs Direct Connect?
VPN: encrypted tunnel over internet, cheap, quick to set up, variable latency. Direct Connect: dedicated fiber link, expensive, 1-100 Gbps, consistent low latency.
Compute & Scaling
What is Auto Scaling?
Automatically adds EC2 instances when traffic increases, removes when it drops. Min/desired/max capacity. Uses Launch Templates to create identical instances.
ALB vs NLB?
ALB: Layer 7, HTTP/HTTPS, path-based routing, supports WebSocket. NLB: Layer 4, TCP/UDP, static IP, millions of requests/sec. ALB for web apps, NLB for gaming/TCP.
What are ALB Listeners and Rules?
Listener: port+protocol ALB listens on (443 HTTPS). Rules: conditions that route traffic (IF path=/api THEN forward to api-target-group).
Spot vs Reserved vs On-Demand?
On-Demand: pay per hour or second, no commitment. Reserved: 1- or 3-year commitment, up to 72% savings. Spot: spare capacity, up to 90% savings, can be reclaimed with 2 minutes' notice. Use Reserved for production, Spot for CI/CD.
Storage & Database
EBS vs S3?
EBS: block storage, attached to EC2, like a hard drive. S3: object storage, accessed via HTTP, unlimited size. EBS for OS/databases, S3 for files/backups/static content.
S3 Storage Classes?
Standard (frequent), Standard-IA (infrequent), Glacier Instant (archive), Glacier Deep (long-term). Use Lifecycle Rules to auto-move objects between classes.
RDS Multi-AZ vs Read Replica?
Multi-AZ: standby in another AZ for failover (high availability). Read Replica: copy for read queries (performance). Multi-AZ for HA, Read Replicas for scale.
What is Route 53?
AWS DNS service. Translates domain names to IPs. Supports routing policies: simple, weighted (canary), latency (nearest region), failover (disaster recovery).
Monitoring & IaC
What is CloudWatch?
AWS monitoring: Metrics (CPU, network; memory via the CloudWatch agent), Logs (centralized log search), Alarms (alert when thresholds crossed). Most AWS services send metrics to CloudWatch automatically.
CloudFormation vs Terraform?
CloudFormation: AWS-only, free, no state file. Terraform: multi-cloud, needs state management, more flexible. CloudFormation for AWS-only, Terraform for multi-cloud.
What is WAF?
Web Application Firewall. Sits in front of ALB/CloudFront. Blocks SQL injection, XSS, bot traffic, rate-limits IPs. Use AWS Managed Rules for OWASP Top 10 protection.
VPC Peering vs Transit Gateway?
Peering: direct 1-to-1 VPC connection, not transitive. Transit Gateway: central hub connecting many VPCs (star topology). Use TGW once a full mesh of peerings gets unwieldy (roughly 10+ VPCs).
Gateway vs Interface Endpoint?
Gateway: free, S3 and DynamoDB only. Interface: creates ENI in subnet, 80+ AWS services, costs per hour+GB. Use Gateway for S3 (saves NAT costs).
ECS vs EKS?
ECS: simple, AWS-native, free control plane. EKS: full Kubernetes, portable, $73/month control plane. ECS for simplicity, EKS for K8s teams.