Every command a DevOps engineer needs, from basic navigation to advanced system administration, user management, networking, and shell scripting, with real examples and outputs.
18 Chapters · 120+ Commands · 100% Free
01
Why Linux for DevOps?
96% of Servers Run Linux
Linux powers 96% of the world's servers, 100% of the top 500 supercomputers, and the majority of cloud instances on AWS, Azure, and GCP. As a DevOps engineer, you will spend most of your time in Linux terminals: deploying apps, debugging issues, writing scripts, and managing infrastructure. Linux is not optional. It IS your daily work environment.
Getting Help: The man Command
TERMINAL
# man = manual. Shows documentation for ANY command
man ls # Full manual for ls command
man grep # Full manual for grep
man -k "copy" # Search all manuals for keyword "copy"
# Quick help (shorter than man)
ls --help # Brief usage info
curl --help # Flags and options
# Common man sections:
# man 1 ls - section 1: user commands
# man 5 passwd - section 5: config file formats
# man 8 useradd - section 8: admin commands
# Pro tip: press / to search inside man, q to quit
💡 When Stuck
ALWAYS try man <command> or <command> --help first. 90% of your questions are answered right there. This is how experienced engineers learn new commands: they read the manual, not Google.
02
Linux Directory Structure
Where Everything Lives
Linux organizes everything in a single tree starting from / (root). Unlike Windows with C: D: drives, Linux has ONE tree. Every file, device, process, and configuration has a specific place. Understanding this structure is essential for troubleshooting.
Directory | What Lives Here | DevOps Examples
/ | Root, the top of everything | Starting point of the entire filesystem
/home | User home directories | ~/ or /home/suresh: your personal files, SSH keys, .bashrc
In Linux, EVERYTHING is treated as a file: regular files, directories, devices (/dev/sda), processes (/proc/1234), even network sockets. This is a core Linux design principle and frequently asked in interviews.
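You can see this for yourself by reading process and device data with ordinary file tools. The sketch below assumes a Linux system with /proc mounted; the variable names are illustrative only.

```shell
#!/bin/bash
# The current shell's process info is exposed as files under /proc/<PID>
my_pid=$$
proc_name=$(cat "/proc/$my_pid/comm")   # process name, read like any text file
echo "PID $my_pid is running: $proc_name"

# Devices are files too: anything written to /dev/null is discarded
echo "this goes nowhere" > /dev/null

# test -c checks for a character device, -d for a directory
dev_type="unknown"
proc_type="unknown"
if [ -c /dev/null ]; then dev_type="character device"; fi
if [ -d "/proc/$my_pid" ]; then proc_type="directory"; fi
echo "/dev/null is a $dev_type, /proc/$my_pid is a $proc_type"
```

The same cat, ls, and redirection you use on regular files work on kernel data and devices, which is why so many Linux tools compose cleanly.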
03
Navigation & Listing
Move Around Like a Pro
These are the commands you'll type 100 times a day. Master them until they're muscle memory.
TERMINAL
# Where am I right now?
$ pwd
/home/suresh/projects
# Go to a directory
$ cd /var/log # Go to absolute path
$ cd .. # Go up one level (parent directory)
$ cd ~ # Go to home directory (/home/suresh)
$ cd - # Go to PREVIOUS directory (like browser back button)
$ cd # Same as cd ~ (go home)
# List files
$ ls # Basic listing
app.js node_modules package.json README.md
$ ls -la # ALL files (including hidden) with details
total 48
drwxr-xr-x 5 suresh suresh 4096 Jun 1 10:30 .
drwxr-xr-x 8 suresh suresh 4096 May 28 09:15 ..
-rw-r--r-- 1 suresh suresh 245 Jun 1 10:30 .env
-rw-r--r-- 1 suresh suresh 1024 Jun 1 10:15 app.js
drwxr-xr-x 50 suresh suresh 4096 Jun 1 10:00 node_modules
-rw-r--r-- 1 suresh suresh 532 Jun 1 10:00 package.json
# Breakdown of ls -la output:
# drwxr-xr-x = permissions (d=directory, rwx=owner, r-x=group, r-x=others)
# 5 = number of links
# suresh = owner
# suresh = group
# 4096 = size in bytes
# Jun 1 = last modified date
# . = current directory name
$ ls -lh # Human-readable sizes (1.5K, 4.2M, 1G)
$ ls -lt # Sort by modification time (newest first)
$ ls -lS # Sort by size (largest first)
$ ls -R # Recursive: list subdirectories too
# Tree view (install: sudo apt install tree)
$ tree -L 2 # Show 2 levels deep
.
├── src
│   ├── app.js
│   └── routes
├── package.json
└── Dockerfile
04
File Operations
Create, Copy, Move, View & Count
Create files, copy them, move them, view their contents, and count things. These operations form the backbone of every DevOps task.
TERMINAL
# Print entire file
$ cat config.yml
server_port: 8080
debug: true
# View with line numbers
$ cat -n app.js
1 const express = require('express');
2 const app = express();
3 app.listen(3000);
# View long files (scrollable)
$ less /var/log/syslog # Scroll up/down, search with /keyword, quit with q
$ more /var/log/syslog # Older, simpler version of less
# First/Last lines
$ head -20 access.log # First 20 lines
$ tail -20 access.log # Last 20 lines
$ tail -f /var/log/syslog # FOLLOW: shows new lines as they appear in real time
# This is THE most used DevOps command for debugging!
# Ctrl+C to stop tail -f
Finding files, searching text, and processing output are the most valuable DevOps skills. These commands save hours of manual work.
find: Search for Files
TERMINAL
# Find by name
$ find / -name "nginx.conf" # Find file named nginx.conf
/etc/nginx/nginx.conf
$ find /var/log -name "*.log" # Find all .log files
/var/log/syslog.log
/var/log/nginx/access.log
/var/log/nginx/error.log
# Find by modification time
$ find /tmp -mtime +7 # Modified MORE than 7 days ago
$ find /tmp -mtime -1 # Modified within the last 24 hours
$ find /var/log -mtime +30 -name "*.log" # Logs older than 30 days
# Find by change time (metadata: permissions, ownership)
$ find /etc -ctime -1 # Config files changed in last 24 hours
# Find by size
$ find / -size +100M # Files larger than 100 MB
$ find /var/log -size +50M -name "*.log" # Large log files
# Find and DO SOMETHING
$ find /tmp -mtime +7 -delete # Delete files older than 7 days
$ find . -name "*.bak" -exec rm {} \; # Delete all .bak files
$ find /var/log -name "*.log" -exec gzip {} \; # Compress all log files
# Find by type
$ find /opt -type d # Directories only
$ find /opt -type f # Files only
$ find /opt -type l # Symlinks only
# Find by permissions
$ find / -perm 777 # Files with 777 (security risk!)
$ find / -perm /u+s # Files with setuid bit
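To try these find options without touching real files, here is a minimal sketch that builds a throwaway directory first. It assumes GNU find and GNU touch (for -d "10 days ago"); the file names are made up for the demo.

```shell
#!/bin/bash
# Build a sandbox so nothing real is touched
sandbox=$(mktemp -d)
mkdir -p "$sandbox/logs/nginx"
touch "$sandbox/logs/app.log" "$sandbox/logs/nginx/access.log" "$sandbox/notes.bak"
# Backdate one file by 10 days so -mtime +7 has something to match
touch -d "10 days ago" "$sandbox/logs/old.log"

# All regular files ending in .log, anywhere under the sandbox
log_count=$(find "$sandbox" -type f -name "*.log" | wc -l)

# Only the .log files older than 7 days
old_logs=$(find "$sandbox" -type f -name "*.log" -mtime +7)

# Preview with -print first, then switch to -delete once the list looks right
find "$sandbox" -type f -name "*.bak" -print
find "$sandbox" -type f -name "*.bak" -delete
bak_left=$(find "$sandbox" -name "*.bak" | wc -l)

echo "log files: $log_count, older than 7 days: $old_logs, .bak remaining: $bak_left"
rm -rf "$sandbox"
```

Running the -print version before -delete or -exec rm is a cheap habit that prevents deleting more than you intended.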
grep: Search INSIDE Files
TERMINAL
# Basic search
$ grep "ERROR" /var/log/syslog
Jun 1 10:30:15 web1 myapp: ERROR database connection timeout
# Recursive search in all files
$ grep -r "password" /etc/
/etc/myapp/config.yml:db_password: secret123
# Case-insensitive
$ grep -i "error" app.log
# Show line numbers
$ grep -n "TODO" app.js
45: // TODO: add input validation
128: // TODO: implement caching
# Count matches
$ grep -c "404" access.log
234
# Show lines BEFORE and AFTER match (context)
$ grep -B 3 -A 3 "OutOfMemory" app.log
# Shows 3 lines before and 3 after the match, essential for debugging!
# Invert match: show lines WITHOUT the pattern
$ grep -v "health-check" access.log # Filter out noisy health checks
# Multiple patterns
$ grep -E "ERROR|FATAL|CRITICAL" app.log # Any of these patterns
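The flags above are easier to remember once you have seen them against the same input. This sketch runs them on a small made-up log written to a temp file; the log lines are invented for illustration.

```shell
#!/bin/bash
# Sample log (contents are made up for the demo)
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 INFO  GET /health-check 200
10:00:02 ERROR database connection timeout
10:00:03 INFO  GET /orders 200
10:00:04 FATAL OutOfMemory: heap exhausted
10:00:05 error retrying connection
EOF

errors_exact=$(grep -c "ERROR" "$log")            # case-sensitive count
errors_any_case=$(grep -ci "error" "$log")        # -i ignores case
severe=$(grep -cE "ERROR|FATAL" "$log")           # either pattern
without_health=$(grep -vc "health-check" "$log")  # lines NOT matching

echo "ERROR=$errors_exact any-case=$errors_any_case severe=$severe non-health=$without_health"
# One line of context on each side of the match, with line numbers
grep -n -B 1 -A 1 "OutOfMemory" "$log"
rm -f "$log"
```

Note that -c counts matching lines, not total occurrences, so a line containing the pattern twice still counts once.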
Pipes: Chain Commands Together
TERMINAL
# Pipe sends output of one command as input to the next
# Think of it as an assembly line: each station does one job
# Find the 10 largest items in /var/log
$ du -sh /var/log/* | sort -rh | head -10
2.1G /var/log/journal
450M /var/log/nginx
120M /var/log/syslog
# Top 10 client IPs by request count in the access log
$ awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
4521 192.168.1.100
3890 10.0.1.50
1234 203.0.113.10
# Find which processes are using the most memory (%MEM is column 4)
$ ps aux | sort -k4,4 -rn | head -5
# Search log for errors, extract timestamp, count per hour
# (assumes each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp)
$ grep "ERROR" app.log | awk '{print $1" "$2}' | cut -d: -f1 | uniq -c
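Here is the assembly-line idea as a small runnable sketch. The access log is made up, and its layout (client IP first, status code last) is an assumption of this example, not a standard you can rely on everywhere.

```shell
#!/bin/bash
# Made-up access log: first field is the client IP, last field is the status
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.100 - - "GET / HTTP/1.1" 200
10.0.1.50 - - "GET /api HTTP/1.1" 200
192.168.1.100 - - "GET /login HTTP/1.1" 404
192.168.1.100 - - "GET /api HTTP/1.1" 200
10.0.1.50 - - "GET /missing HTTP/1.1" 404
EOF

# Stage by stage: extract IPs -> sort -> count duplicates -> rank -> take the top one
top_ip=$(awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -1 | awk '{print $2}')

# uniq only collapses ADJACENT duplicates, which is why sort comes first
unique_ips=$(awk '{print $1}' "$log" | sort -u | wc -l)

# Count 404 responses ($NF is the last field on each line)
not_found=$(awk '$NF == 404' "$log" | wc -l)

echo "busiest client: $top_ip, unique clients: $unique_ips, 404s: $not_found"
rm -f "$log"
```

Build pipelines one stage at a time: run the first command, check the output, then add the next pipe.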
06
Compression & Archives
tar, gzip, zip: Pack and Unpack
Compressing files saves storage and speeds up transfers. In DevOps, you'll compress log files, create backups, and package applications for deployment.
tar: The Swiss Army Knife
TERMINAL
# CREATE archive (tar = tape archive)
$ tar -cvf backup.tar /opt/myapp/ # Create tar (no compression)
# c=create, v=verbose, f=filename
$ tar -czvf backup.tar.gz /opt/myapp/ # Create tar + gzip compression
# z=gzip compression
$ tar -cjvf backup.tar.bz2 /opt/myapp/ # Create tar + bzip2 (smaller but slower)
# EXTRACT archive
$ tar -xvf backup.tar # Extract tar
$ tar -xzvf backup.tar.gz # Extract tar.gz
$ tar -xzvf backup.tar.gz -C /opt/restore/ # Extract to specific directory
# LIST contents without extracting
$ tar -tzvf backup.tar.gz # See what's inside
# Memory trick: tar -czvf = Create Ze Vucking File
# tar -xzvf = eXtract Ze Vucking File
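A backup is only useful if it restores. The sketch below does a full create, list, extract, and compare round trip in a temp directory; the myapp directory and its files are stand-ins created just for the demo.

```shell
#!/bin/bash
work=$(mktemp -d)
mkdir -p "$work/myapp/config" "$work/restore"
echo "server_port: 8080" > "$work/myapp/config/app.yml"
echo "console.log('hi')" > "$work/myapp/app.js"

# -C switches directory first, so the archive stores relative paths (myapp/...)
tar -czf "$work/backup.tar.gz" -C "$work" myapp

# List before extracting; entries ending in / are directories, so skip them
file_count=$(tar -tzf "$work/backup.tar.gz" | grep -vc '/$')

# Extract into a separate directory and compare against the source
tar -xzf "$work/backup.tar.gz" -C "$work/restore"
if diff -r "$work/myapp" "$work/restore/myapp" > /dev/null; then
  verify="match"
else
  verify="differ"
fi
echo "archived files: $file_count, restore check: $verify"
rm -rf "$work"
```

Using -C on creation keeps absolute paths out of the archive, which makes it safe to extract anywhere.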
gzip, gunzip, zip, unzip
TERMINAL
# gzip: compress single files (replaces original)
$ gzip access.log # Creates access.log.gz, removes original
$ gunzip access.log.gz # Decompress back to access.log
$ gzip -k access.log # Keep original file (-k = keep)
$ gzip -9 large-file.log # Maximum compression (-9)
# zip: create ZIP archives (Windows-compatible)
$ zip backup.zip file1.txt file2.txt # Zip specific files
$ zip -r project.zip myproject/ # Zip entire directory recursively
$ unzip project.zip # Extract
$ unzip project.zip -d /opt/restore/ # Extract to specific directory
$ unzip -l project.zip # List contents without extracting
💡 When to Use What
tar.gz for Linux backups and deployments (most common). zip for cross-platform sharing (Windows compatibility). gzip for compressing single files (log rotation). bzip2 for maximum compression (large archives).
07
Permissions & Ownership
Who Can Read, Write, and Execute
Every file in Linux has an owner, a group, and three sets of permissions: read (r), write (w), execute (x). Understanding permissions is the difference between a secure server and a hacked server.
TERMINAL
# Numeric method (most common)
$ chmod 755 deploy.sh # rwxr-xr-x (owner:all, group+others:read+exec)
$ chmod 644 config.yml # rw-r--r-- (owner:rw, rest:read-only)
$ chmod 600 secrets.env # rw------- (owner ONLY, for passwords and keys)
$ chmod 700 .ssh/ # rwx------ (private directory)
$ chmod 400 my-key.pem # r-------- (SSH key, read-only by owner)
# Symbolic method
$ chmod +x script.sh # Add execute permission for everyone
$ chmod u+w file.txt # Add write for owner (u=user/owner)
$ chmod g-w file.txt # Remove write from group
$ chmod o-rwx secret.txt # Remove all permissions from others
# Recursive: apply to everything under a directory
# (careful: 755 also marks every regular file executable)
$ chmod -R 755 /opt/myapp/
Octal | Permission | Use For
755 | rwxr-xr-x | Scripts, executables, directories
644 | rw-r--r-- | Config files, HTML, regular files
600 | rw------- | Secrets, .env files, private keys
700 | rwx------ | Private directories, .ssh/
400 | r-------- | SSH .pem key files
777 | rwxrwxrwx | NEVER use in production!
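Each octal digit is just a sum: r=4, w=2, x=1, added up once for the owner, once for the group, and once for others. The sketch below sets modes on temp files and reads them back; it assumes GNU stat (the -c format flag), and the file names are placeholders.

```shell
#!/bin/bash
# r=4, w=2, x=1, summed per role (owner, group, others)
#   7 = 4+2+1 = rwx    5 = 4+1 = r-x    6 = 4+2 = rw-    0 = ---
tmp=$(mktemp -d)
touch "$tmp/deploy.sh" "$tmp/secrets.env"

chmod 755 "$tmp/deploy.sh"
chmod 600 "$tmp/secrets.env"

# GNU stat: %a prints the octal mode, %A the symbolic form
script_octal=$(stat -c '%a' "$tmp/deploy.sh")
script_symbolic=$(stat -c '%A' "$tmp/deploy.sh")
secret_symbolic=$(stat -c '%A' "$tmp/secrets.env")

# Symbolic changes adjust one role without touching the others
chmod g-x,o-rx "$tmp/deploy.sh"
after_symbolic=$(stat -c '%a' "$tmp/deploy.sh")

echo "deploy.sh: $script_octal $script_symbolic -> $after_symbolic"
echo "secrets.env: $secret_symbolic"
rm -rf "$tmp"
```

Reading a mode back with stat after changing it is a quick way to confirm you set what you meant to set.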
chown: Change Ownership
TERMINAL
# Change owner
$ sudo chown suresh file.txt
# Change owner AND group
$ sudo chown suresh:devops file.txt
# Change group only
$ sudo chgrp docker /var/run/docker.sock
# Recursive (entire directory)
$ sudo chown -R www-data:www-data /var/www/
# Common DevOps scenario:
$ sudo chown -R deploy:deploy /opt/myapp/ # App user owns the app directory
⚠️ 777 Permissions
chmod 777 gives EVERYONE full access to read, write, and execute. This is a massive security risk. If an interviewer hears you suggest 777, the interview is over. Use specific permissions like 755 for executables, 644 for files.
08
User & Group Management
Create Users, Manage Access
Every process, every file, every service runs as a specific user. As a DevOps engineer, you create users for applications, add team members, manage group permissions, and control who can run Docker or access specific servers.
User Management Commands
TERMINAL
# Create a new user
$ sudo useradd -m -s /bin/bash deploy
# -m = create home directory (/home/deploy)
# -s = set shell to bash
# Set password
$ sudo passwd deploy
New password: ****
# Create user with specific UID and home
$ sudo useradd -m -u 1500 -s /bin/bash -d /opt/jenkins jenkins
# Modify existing user
$ sudo usermod -aG docker suresh # Add suresh to docker group
$ sudo usermod -aG sudo suresh # Add to sudo group (admin access)
$ sudo usermod -s /bin/zsh suresh # Change shell to zsh
$ sudo usermod -L suresh # Lock account (disable login)
$ sudo usermod -U suresh # Unlock account
# Delete user
$ sudo userdel deploy # Delete user (keep home dir)
$ sudo userdel -r deploy # Delete user AND home directory
# Switch user
$ su - deploy # Switch to deploy user
$ sudo -u deploy whoami # Run command as deploy user
# Who am I?
$ whoami # Current username
$ id # UID, GID, groups
uid=1000(suresh) gid=1000(suresh) groups=1000(suresh),27(sudo),999(docker)
Group Management
TERMINAL
# Create a group
$ sudo groupadd devops
# Add user to group (IMPORTANT: use -aG, not just -G)
$ sudo usermod -aG devops suresh
# -a = APPEND to groups (without -a, it REPLACES all groups!)
# -G = supplementary group
# Real-world: Add user to Docker group
$ sudo usermod -aG docker suresh
# Now suresh can run docker commands without sudo
# MUST log out and back in for group change to take effect!
# List groups for a user
$ groups suresh
suresh : suresh sudo docker devops
# See all members of a group
$ getent group docker
docker:x:999:suresh,deploy
# Remove user from group
$ sudo gpasswd -d suresh docker
What Happens When You Install a Service?
TERMINAL
# When you install nginx, mysql, jenkins, etc.:
$ sudo apt install nginx
# The package typically:
# 1. Creates (or reuses) a system user with no login shell
#    On Debian/Ubuntu, nginx runs as the existing www-data user;
#    RHEL-family packages create a dedicated nginx user instead
$ grep www-data /etc/passwd
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
# /usr/sbin/nologin = this user CANNOT log in (security!)
# 2. Creates (or reuses) a system group
$ grep www-data /etc/group
www-data:x:33:
# 3. Sets file ownership (output abbreviated)
$ ls -la /var/log/nginx/
-rw-r----- 1 www-data adm access.log
-rw-r----- 1 www-data adm error.log
# Why? Security principle of least privilege:
# Nginx worker processes run as the www-data user
# If someone hacks nginx, they only have that user's limited permissions
# They can NOT read other users' private files or run privileged commands
Important User Files
File | What It Contains | Example Line
/etc/passwd | All user accounts | suresh:x:1000:1000:Suresh:/home/suresh:/bin/bash
/etc/shadow | Hashed passwords | suresh:$6$xyz...:19200:0:99999:7:::
/etc/group | All groups and members | docker:x:999:suresh,deploy
/etc/sudoers | Who can use sudo | suresh ALL=(ALL:ALL) ALL
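These files are plain colon-separated text, so cut and awk can pull out any field. A minimal sketch, using the sample /etc/passwd line from the table; treating UIDs below 1000 as system accounts is a convention on most current distros, not a guarantee.

```shell
#!/bin/bash
# /etc/passwd fields: name:password:UID:GID:comment:home:shell
line='suresh:x:1000:1000:Suresh:/home/suresh:/bin/bash'
name=$(echo "$line" | cut -d: -f1)
uid=$(echo "$line" | cut -d: -f3)
shell=$(echo "$line" | cut -d: -f7)
echo "$name has UID $uid and logs in with $shell"

# Count accounts whose shell ends in nologin or false (no interactive login)
nologin_count=$(awk -F: '$7 ~ /(nologin|false)$/' /etc/passwd | wc -l)
echo "accounts without a login shell: $nologin_count"

# By convention, UIDs below 1000 belong to system/service accounts
awk -F: '$3 < 1000 { print $1 " (UID " $3 ")" }' /etc/passwd | head -5
```

The x in the second field means the real password hash lives in /etc/shadow, which only root can read.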
💡 Interview Classic
"When you install a service like nginx or MySQL, the package creates (or reuses) a dedicated system user with the /usr/sbin/nologin shell. This is for security: if the service is compromised, the attacker only gets that user's limited permissions, not root access."
09
Process Management
Monitor, Kill & Control Running Programs
Every running program is a process with a unique PID (Process ID). DevOps engineers need to find runaway processes, kill hung services, monitor CPU/memory usage, and run background tasks.
View Processes
TERMINAL
# List all processes
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 168940 11420 ? Ss Jun01 0:05 /sbin/init
nginx 1234 0.1 0.5 52340 41200 ? S 10:30 0:12 nginx: worker
suresh 5678 2.3 1.2 743200 98432 ? Sl 10:45 1:30 java -jar app.jar
# Find a specific process
$ ps aux | grep nginx
$ ps aux | grep java
# Process tree (who started whom)
$ pstree -p
systemd(1)─┬─nginx(1230)─┬─nginx(1231)
           │             └─nginx(1232)
           ├─sshd(800)───sshd(1500)───bash(1501)
           └─java(5678)
# Real-time monitoring
$ top # Real-time process viewer
$ htop # Better version (install: apt install htop)
# In top/htop: press M to sort by memory, P by CPU, q to quit
Kill Processes
TERMINAL
# Graceful kill (SIGTERM: asks nicely)
$ kill 5678 # Send SIGTERM to PID 5678
# Force kill (SIGKILL: no mercy)
$ kill -9 5678 # Forcefully terminate
# Kill by name
$ pkill nginx # Kill all nginx processes
$ pkill -f "java -jar app.jar" # Kill by full command line match
# Kill all processes of a user
$ pkill -u deploy # Kill all processes owned by deploy user
# What signal numbers mean:
# kill -15 = SIGTERM (graceful, default)
# kill -9 = SIGKILL (force, last resort)
# kill -1 = SIGHUP (reload config, like nginx reload)
Background & Foreground
TERMINAL
# Run in background
$ ./long-running-script.sh & # & puts it in background
[1] 12345 # Job number and PID
# Run and survive logout
$ nohup ./script.sh > output.log 2>&1 &
# nohup = don't stop when terminal closes
# > output.log = redirect stdout
# 2>&1 = redirect stderr to same place
# & = run in background
# Job control
$ jobs # List background jobs
$ fg %1 # Bring job 1 to foreground
$ bg %1 # Send job 1 to background
# Ctrl+Z = pause current foreground job
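To see why you should try a plain kill before kill -9, here is a small sketch with a stand-in background worker. The worker traps SIGTERM and cleans up; SIGKILL gives it no such chance. The worker loop and temp file are invented for the demo.

```shell
#!/bin/bash
# A stand-in "service": a background subshell that cleans up when asked to stop
result=$(mktemp)
(
  trap 'echo "cleaned up" > "$result"; exit 0' TERM
  while true; do sleep 0.1; done
) &
worker_pid=$!            # $! = PID of the most recent background job
sleep 0.5                # give the worker time to install its trap

kill "$worker_pid"       # no signal given, so SIGTERM (15) is sent
term_status=0
wait "$worker_pid" || term_status=$?
cleanup_msg=$(cat "$result")

# SIGKILL cannot be trapped, so the process gets no chance to clean up
sleep 60 &
victim_pid=$!
kill -9 "$victim_pid"
kill_status=0
wait "$victim_pid" 2>/dev/null || kill_status=$?   # 128 + 9 = 137

echo "SIGTERM: exit $term_status, worker said '$cleanup_msg'"
echo "SIGKILL: exit $kill_status"
rm -f "$result"
```

An exit status above 128 means the process died from a signal; subtract 128 to get the signal number.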
10
Service Management: systemd
Start, Stop, Enable & Create Services
systemd manages all services on modern Linux. Every time you install nginx, docker, or jenkins, systemd controls it. You use systemctl to start, stop, enable, and check services.
Essential systemctl Commands
TERMINAL
$ sudo systemctl start nginx # Start service NOW
$ sudo systemctl stop nginx # Stop service NOW
$ sudo systemctl restart nginx # Stop + Start (brief downtime)
$ sudo systemctl reload nginx # Reload config without stopping (zero downtime!)
$ sudo systemctl status nginx # Check if running, see recent logs
● nginx.service - A high performance web server
Active: active (running) since Mon 2024-06-01 10:30:00 IST; 2h ago
Main PID: 1234 (nginx)
Tasks: 3 (limit: 4677)
Memory: 8.5M
$ sudo systemctl enable nginx # Start automatically on boot
$ sudo systemctl disable nginx # Don't start on boot
$ sudo systemctl is-active nginx # Just check: active or inactive
$ sudo systemctl is-enabled nginx # Check: enabled or disabled
# List all services
$ systemctl list-units --type=service --state=running
Create Your Own Service
SERVICE FILE
# /etc/systemd/system/myapp.service
# Note: systemd only treats # as a comment at the start of a line,
# so comments go on their own lines, not after a setting
[Unit]
Description=My Node.js Application
# Start after network is ready
After=network.target
# Prefer PostgreSQL to be running
Wants=postgresql.service
[Service]
Type=simple
# Run as deploy user (not root!)
User=deploy
Group=deploy
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node server.js
# Auto-restart if it crashes, waiting 5 seconds before each restart
Restart=always
RestartSec=5
# Load environment variables
EnvironmentFile=/opt/myapp/.env
# Send logs to journalctl
StandardOutput=journal
StandardError=journal
[Install]
# Start when system boots normally
WantedBy=multi-user.target
# After creating the file:
$ sudo systemctl daemon-reload # Tell systemd about new service
$ sudo systemctl start myapp # Start it
$ sudo systemctl enable myapp # Start on boot
$ journalctl -u myapp -f # Watch logs in real-time
💡 Restart=always
This is the most important line in a service file for DevOps. If your app crashes at 3 AM, systemd automatically restarts it in 5 seconds. No pager alert, no manual intervention. Production apps should ALWAYS have Restart=always.
11
Package Management
Install, Update & Remove Software
Package managers download, install, update, and remove software with all dependencies handled automatically. Know BOTH apt (Ubuntu/Debian) and yum/dnf (RHEL/CentOS): you'll encounter both in the field.
Always install specific versions in production: apt install nginx=1.24.0-1. Without pinning, apt upgrade might update nginx to a version with breaking changes. Your CI/CD pipeline should pin every dependency.
12
Disk & Storage Management
Check Space, Mount Drives, Monitor I/O
Running out of disk space is one of the most common production incidents. Know how to check usage, find large files, mount external storage, and monitor disk I/O.
TERMINAL
# --- Check Disk Space ---
$ df -h # Disk usage of all mounted filesystems
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 32G 16G 67% /
/dev/sdb1 100G 45G 55G 45% /data
tmpfs 3.9G 0 3.9G 0% /dev/shm
# --- Check Directory Sizes ---
$ du -sh /var/log/* # Size of each item in /var/log
2.1G /var/log/journal
450M /var/log/nginx
120M /var/log/syslog
$ du -sh /opt/myapp # Total size of a directory
1.2G /opt/myapp
# --- Find Largest Files ---
$ du -ah / | sort -rh | head -20 # Top 20 largest files/dirs
$ find / -type f -size +100M -exec ls -lh {} \; # Files over 100MB
# --- Disk I/O Statistics ---
$ iostat -x 1 5 # Disk I/O stats every 1 sec, 5 times
# Look for: %util (near 100% = bottleneck), await (high = slow disk)
# --- List Block Devices ---
$ lsblk # Show all disks and partitions
NAME SIZE TYPE MOUNTPOINT
sda 50G disk
├─sda1 49G part /
└─sda2 1G part [SWAP]
sdb 100G disk
└─sdb1 100G part /data
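Since a full disk is such a common incident, it is worth scripting the check. A minimal sketch, assuming GNU df (for --output=pcent); the 85% threshold is an arbitrary example value, not a recommendation from any standard.

```shell
#!/bin/bash
threshold=85          # arbitrary example value, tune per environment
mount_point="/"

# GNU df: --output=pcent prints only the Use% column; keep just the digits
used_pct=$(df --output=pcent "$mount_point" | tail -n 1 | tr -dc '0-9')

if [ "$used_pct" -ge "$threshold" ]; then
  disk_state="WARNING"
  # Point at the drill-down command rather than running a slow scan here
  echo "Next step: du -xh --max-depth=1 $mount_point | sort -rh | head"
else
  disk_state="OK"
fi
echo "$disk_state: $mount_point is ${used_pct}% full (threshold ${threshold}%)"
```

Dropped into cron, a check like this turns a 3 AM outage into a daytime cleanup task.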
Mount & Unmount
TERMINAL
# Mount a new EBS volume (AWS)
$ sudo mkdir /data # Create mount point
$ sudo mount /dev/xvdf /data # Mount the volume
$ df -h /data # Verify
# Make it permanent (survives reboot)
$ sudo blkid /dev/xvdf # Get UUID
/dev/xvdf: UUID="abc-123" TYPE="ext4"
$ sudo nano /etc/fstab # Add this line:
UUID=abc-123 /data ext4 defaults,nofail 0 2
# Unmount
$ sudo umount /data # Unmount (must not be in use)
$ sudo umount -l /data # Lazy unmount: detach now, finish cleanup once nothing is using it
13
Networking
IPs, Ports, DNS & Troubleshooting
Network troubleshooting is 50% of DevOps debugging. When your app can't connect to the database, or users can't reach your website, these commands tell you exactly what's wrong.
Check IPs & Interfaces
TERMINAL
$ ip addr show # Show all network interfaces and IPs
$ ip addr show eth0 # Specific interface
$ ip route show # Show routing table
default via 10.0.0.1 dev eth0 # Default gateway
$ hostname -I # Quick way to get your IP
Check Open Ports: ss and netstat
TERMINAL
# ss = modern replacement for netstat (faster, part of iproute2 on most modern distros)
$ sudo ss -tlnp # TCP listening ports with process names (sudo shows other users' processes)
State Local Address:Port Process
LISTEN 0.0.0.0:22 sshd
LISTEN 0.0.0.0:80 nginx
LISTEN 0.0.0.0:8080 java
LISTEN 127.0.0.1:3306 mysqld
# t=TCP, l=listening, n=numeric (don't resolve names), p=process
# netstat (older but still used)
$ netstat -tlnp # Similar output to ss -tlnp
$ netstat -an | grep ESTABLISHED # Active connections
$ netstat -an | grep :8080 # Who's connected to port 8080?
# What process is using a specific port?
$ sudo lsof -i :8080 # Show process on port 8080
$ sudo fuser 8080/tcp # PID using port 8080
Test Connectivity: ping, nc, curl
TERMINAL
# Ping: is the host reachable?
$ ping -c 4 google.com # Send 4 packets
$ ping -c 4 10.0.1.50 # Ping internal server
# nc (netcat): is the PORT open?
$ nc -zv 10.0.1.50 3306 # Test if MySQL port is open
Connection to 10.0.1.50 3306 port [tcp/mysql] succeeded!
$ nc -zv db.server.com 5432 # Test PostgreSQL port
# Test a range of ports
$ nc -zv 10.0.1.50 8080-8090 # Scan ports 8080 to 8090
# curl: test HTTP endpoints
$ curl -v https://api.myapp.com/health
$ curl -I https://api.myapp.com # Headers only (check status code)
HTTP/2 200
$ curl -o /dev/null -s -w "%{http_code}" https://myapp.com
200 # Just the status code
# wget: download files
$ wget https://releases.app.com/v2.1/app.tar.gz
# DNS lookups
$ dig myapp.com # Full DNS query
$ nslookup myapp.com # Simple DNS lookup
$ dig +short myapp.com # Just the IP
52.66.123.45
# Trace network path
$ traceroute google.com # Show every hop to destination
💡 Troubleshooting Order
1) ping: can I reach the host? 2) nc -zv host port: is the port open? 3) curl: does the HTTP endpoint respond? 4) ss -tlnp on the server: is the service actually listening? This sequence solves 90% of connectivity issues.
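The port check in step 2 can be wrapped in a small function for scripts. This is a sketch, not a replacement for nc: it uses bash's built-in /dev/tcp redirection plus timeout from coreutils, and the function names port_open and check_service are made up for this example.

```shell
#!/bin/bash
# Returns 0 if a TCP connection to host:port succeeds within 2 seconds.
# /dev/tcp is a bash feature (not a real file); nc -zv does the same job.
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

check_service() {
  local host=$1 port=$2
  if port_open "$host" "$port"; then
    echo "OK: $host:$port accepts connections"
  else
    echo "FAIL: $host:$port unreachable (firewall, wrong IP, or service not listening)"
    return 1
  fi
}

# Port 1 on localhost is almost never in use, so this normally reports FAIL;
# "|| true" keeps the script going after the expected failure
check_service 127.0.0.1 1 || true
```

In a deployment script you would call check_service against the database or upstream API before starting the app, and abort early if it fails.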
14
Firewall: UFW & iptables
Control Network Access
Firewalls block unwanted traffic. UFW is the simple frontend, iptables is the powerful backend. In cloud environments, you usually use Security Groups, but knowing Linux firewalls is essential for on-premises servers.
TERMINAL
# --- UFW (Ubuntu, Simple) ---
$ sudo ufw allow 22/tcp # Allow SSH FIRST so enabling the firewall doesn't lock you out
$ sudo ufw enable # Turn on firewall
$ sudo ufw status verbose # Check status and rules
$ sudo ufw allow 80/tcp # Allow HTTP
$ sudo ufw allow 443/tcp # Allow HTTPS
$ sudo ufw allow from 10.0.0.0/8 # Allow entire internal network
$ sudo ufw deny from 203.0.113.50 # Block specific IP
$ sudo ufw delete allow 80/tcp # Remove a rule
# --- iptables (Advanced, Low Level) ---
$ sudo iptables -L -n # List all rules
$ sudo iptables -A INPUT -i lo -j ACCEPT # Allow loopback traffic
$ sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT # Allow replies to existing connections
$ sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT # Allow HTTP
$ sudo iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/8 -j ACCEPT # SSH from internal only
$ sudo iptables -A INPUT -j DROP # Drop everything else (add LAST!)
15
Shell Scripting
Automate Everything with Bash
Every DevOps engineer writes bash scripts for deployments, backups, health checks, cleanup, and monitoring. A well-written script replaces hours of manual work.
BASH SCRIPT
#!/bin/bash
# deploy.sh: Production deployment script
set -euo pipefail # Exit on error, undefined vars, pipe fails
# set -e = stop if ANY command fails
# set -u = stop if you use an undefined variable
# set -o pipefail = catch errors in pipes
APP_DIR="/opt/myapp"
BACKUP_DIR="/opt/backups"
DATE=$(date +%Y%m%d_%H%M%S)
echo "[$(date)] Starting deployment..."
# Step 1: Backup current version
mkdir -p "$BACKUP_DIR"
if [ -d "$APP_DIR" ]; then
    echo "Creating backup..."
    # tar stores paths without the leading /, so the rollback extracts with -C /
    tar -czf "$BACKUP_DIR/app_$DATE.tar.gz" "$APP_DIR"
fi
# Step 2: Pull latest code
cd "$APP_DIR"
git pull origin main
# Step 3: Build
npm ci
npm run build
# Step 4: Restart service
sudo systemctl restart myapp
# Step 5: Health check
sleep 5
if curl -sf http://localhost:3000/health > /dev/null; then
    echo "[$(date)] Deployment successful!"
else
    echo "[$(date)] HEALTH CHECK FAILED! Rolling back..."
    tar -xzf "$BACKUP_DIR/app_$DATE.tar.gz" -C /
    sudo systemctl restart myapp
    exit 1
fi
Key Bash Concepts
BASH
# Variables
NAME="suresh"
echo "Hello $NAME" # Output: Hello suresh
echo "Path is ${HOME}/projects" # Use {} for clarity
# Conditionals
if [ -f "/opt/app/config.yml" ]; then
    echo "Config file exists"
elif [ -d "/opt/app" ]; then
    echo "Directory exists but no config"
else
    echo "Nothing exists"
fi
# Tests: -f file exists, -d directory exists, -z string is empty
# -eq equal, -ne not equal, -gt greater than
# Loops
for server in web1 web2 web3; do
    echo "Deploying to $server..."
    ssh "deploy@$server" 'cd /opt/app && git pull && systemctl restart app'
done
# While loop
while ! curl -sf http://localhost:8080/health; do
    echo "Waiting for app to start..."
    sleep 2
done
echo "App is ready!"
# Functions
function deploy() {
    # Quote "$1" so names with spaces or globs are passed through intact
    local app_name="$1"
    echo "Deploying $app_name"
    ssh deploy@server "systemctl restart $app_name"
}
deploy "order-service"
deploy "user-service"
⚠️ set -euo pipefail
ALWAYS start scripts with this. Without it, errors are silently ignored: a failing command doesn't stop the script, and your deployment continues with corrupt state. This one line has prevented countless production incidents.
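To see what each flag actually changes, the sketch below runs the same tiny snippets with and without them. Each case runs in its own bash -c so the options do not leak into the calling shell; UNDEFINED_DIR is a deliberately unset placeholder variable.

```shell
#!/bin/bash
# Without -e: the failing command is ignored and the script keeps going
no_e=$(bash -c 'false; echo "still running"')

# With -e: the script stops at the first failure, so echo never runs
with_e=$(bash -c 'set -e; false; echo "still running"' || true)

# By default a pipeline's exit status is that of the LAST command only
plain_pipe=0
bash -c 'false | true' || plain_pipe=$?
strict_pipe=0
bash -c 'set -o pipefail; false | true' || strict_pipe=$?

# With -u: an unset variable is an error instead of an empty string
unset_var=0
bash -c 'set -u; echo "dir: $UNDEFINED_DIR"' 2>/dev/null || unset_var=$?

echo "no -e: '$no_e' | with -e: '$with_e'"
echo "pipe status: $plain_pipe (default) vs $strict_pipe (pipefail)"
echo "unset variable exit status: $unset_var"
```

The -u case is the one that protects you from commands like rm -rf "$APP_DIR/" running with an empty variable.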
16
Log Management
Find Problems Before Users Complain
Logs are the first thing you check when something breaks. Linux has a standard location for all logs, and systemd has journalctl for structured log queries.
Important Log Locations
Log File | What It Contains | When to Check
/var/log/syslog | General system messages | System issues, service failures
/var/log/auth.log | Authentication attempts | SSH logins, sudo usage, failed logins
/var/log/kern.log | Kernel messages | Hardware errors, driver issues, OOM kills
/var/log/nginx/access.log | Nginx HTTP requests | Traffic analysis, 404s, slow requests
/var/log/nginx/error.log | Nginx errors | Config errors, upstream failures
/var/log/apt/history.log | Package installations | What was installed/updated and when
journalctl: systemd Log Viewer
TERMINAL
# View logs for a specific service
$ journalctl -u nginx # All nginx logs
$ journalctl -u nginx --since today # Today's nginx logs
$ journalctl -u nginx --since "1 hour ago" # Last hour
$ journalctl -u nginx -f # Follow in real-time (like tail -f)
$ journalctl -u nginx -n 50 # Last 50 lines
# View system-wide
$ journalctl -b # Logs since last boot
$ journalctl -p err # Only errors and above
$ journalctl --since "2024-06-01 10:00" --until "2024-06-01 11:00"
# Search across all logs
$ journalctl | grep "Out of memory" # Find OOM kills
$ journalctl -u myapp --no-pager | grep ERROR
Log Rotation: logrotate
LOGROTATE
# /etc/logrotate.d/myapp
# Comments sit on their own lines (first non-whitespace character is #)
/var/log/myapp/*.log {
    # Rotate every day and keep 14 rotated files
    daily
    rotate 14
    # Compress old logs with gzip, but leave yesterday's uncompressed (in case needed)
    compress
    delaycompress
    # Don't error if the log file is missing; don't rotate empty files
    missingok
    notifempty
    # Run after rotation (assumes the service supports reload)
    postrotate
        systemctl reload myapp
    endscript
}
💡 Always tail -f First
When debugging a live issue, your first command should be: tail -f /var/log/myapp/error.log. Watch the errors flow in real-time while you reproduce the problem. This is the #1 debugging technique for DevOps engineers.
17
System Administration
Cron Jobs, System Info & Performance
Daily sysadmin tasks: schedule automated jobs, check system performance, monitor resources, and manage hostnames/time.
Cron: Schedule Automated Tasks
CRONTAB
# Edit crontab (task scheduler)
$ crontab -e
# Format: minute hour day month weekday command
# ┌───────────── minute (0-59)
# │ ┌───────────── hour (0-23)
# │ │ ┌───────────── day of month (1-31)
# │ │ │ ┌───────────── month (1-12)
# │ │ │ │ ┌───────────── day of week (0-7, Sun=0 or 7)
# * * * * * command
# Examples:
0 2 * * * /opt/scripts/backup.sh # Every day at 2:00 AM
*/5 * * * * curl -sf http://localhost:8080/health # Every 5 minutes
0 0 * * 0 /opt/scripts/weekly-cleanup.sh # Every Sunday at midnight
0 9 * * 1-5 /opt/scripts/daily-report.sh # Mon-Fri at 9 AM
# List scheduled jobs
$ crontab -l
# View cron logs
$ grep CRON /var/log/syslog
System Information
TERMINAL
# System overview
$ uname -a # Kernel version, architecture
Linux web1 5.15.0-1043-aws #48-Ubuntu SMP x86_64 GNU/Linux
$ uptime # How long running + load average
10:30:15 up 45 days, 3:21, 2 users, load average: 0.75, 0.82, 0.68
# Load average: 1-min, 5-min, 15-min
# On 4-core: load 4.0 = fully used, >4.0 = overloaded
$ free -h # Memory usage
total used free shared buff/cache available
Mem: 7.7G 3.2G 1.1G 120M 3.4G 4.1G
Swap: 2.0G 100M 1.9G
# "available" = actual usable memory (includes reclaimable cache)
# CPU info
$ nproc # Number of CPU cores
4
$ lscpu # Detailed CPU information
# System performance snapshot
$ vmstat 1 5 # Virtual memory stats every 1 sec, 5 times
# Watch: r (run queue), si/so (swap in/out, should be 0)
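The load-average rule of thumb (load equal to the core count means fully used) is easy to turn into a quick check. A minimal sketch, assuming a Linux system with /proc/loadavg; the "overloaded above 1.0 per core" cutoff simply restates the rule above and is not a hard limit.

```shell
#!/bin/bash
# /proc/loadavg starts with the 1-, 5- and 15-minute load averages
read -r load1 load5 load15 _ < /proc/loadavg
cores=$(nproc)

# Shell arithmetic is integer-only, so let awk do the division
per_core=$(awk -v l="$load1" -v c="$cores" 'BEGIN { printf "%.2f", l / c }')

# A per-core load above 1.0 means more runnable work than CPUs
state=$(awk -v p="$per_core" 'BEGIN { print (p > 1.0) ? "overloaded" : "ok" }')
echo "load $load1 (5m: $load5, 15m: $load15) on $cores cores = $per_core per core ($state)"
```

Compare the 1-minute and 15-minute values: a high 1-minute number with a low 15-minute number is a spike, the reverse is a problem that is easing off.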
18
Interview Questions
40+ Linux Q&A for DevOps
These Linux questions are asked in every DevOps interview, from freshers to senior positions.
Commands & Files
Difference between find and grep?
find searches for FILES by name, size, date, permissions. grep searches INSIDE files for text patterns. find locates the file, grep reads its content.
find -mtime vs -ctime?
mtime = content modification time (file was edited). ctime = inode change time (permissions or ownership changed; it also updates when the content changes). -mtime -7 = modified in the last 7 days.
How to find large files?
find / -type f -size +100M lists files larger than 100MB. Combine with du -sh /var/log/* | sort -rh | head to find largest directories.
head vs tail?
head -20 shows first 20 lines. tail -20 shows last 20 lines. tail -f follows a file in real-time (essential for debugging live logs).
Users & Permissions
chmod 755 vs 644?
755 (rwxr-xr-x): for scripts and directories. 644 (rw-r--r--): for config files and regular files. 600 (rw-------): for secrets and private keys.
How to add user to docker group?
sudo usermod -aG docker username. The -a flag APPENDS to groups. Without -a, it REPLACES all supplementary groups (dangerous!). Must log out and back in.
What happens when you install a service?
The package creates (or reuses) a system user with /usr/sbin/nologin shell, creates a group, sets file ownership. Security: if service is hacked, attacker only gets limited permissions.
What is /etc/passwd vs /etc/shadow?
passwd: user accounts (username, UID, shell). shadow: hashed passwords (only readable by root). Separated for security.
Processes & Services
How to find which process uses a port?
ss -tlnp | grep :8080 or sudo lsof -i :8080 or sudo fuser 8080/tcp. Shows the PID and process name.
kill vs kill -9?
kill (SIGTERM): asks process to shut down gracefully (save data, close connections). kill -9 (SIGKILL): force kills immediately (no cleanup). Always try kill first.
nohup purpose?
nohup command & runs a process that survives when you close the terminal. Without nohup, background processes usually receive SIGHUP and die when your SSH session disconnects.
systemctl reload vs restart?
reload: reads new config without stopping service (zero downtime). restart: stops and starts (brief downtime). Prefer reload for Nginx config changes in production.
Networking & Troubleshooting
ss vs netstat?
ss is modern and faster (uses kernel netlink). netstat is older, being deprecated. Use ss -tlnp to show listening TCP ports with process names.
nc (netcat) usage?
nc -zv host port tests if a port is open. Essential for checking: can my app server reach the database on port 3306? Faster than telnet.
How to troubleshoot connectivity?
1) ping host (reachable?), 2) nc -zv host port (port open?), 3) curl endpoint (HTTP working?), 4) ss -tlnp on server (service listening?).
What is /etc/hosts?
Local DNS override file. Maps hostnames to IPs. Checked BEFORE DNS servers. Use for testing: add 10.0.1.50 api.myapp.com to test against a specific server.
Storage & Disk
df vs du?
df shows filesystem-level usage (how full is /dev/sda1). du shows directory-level usage (how big is /var/log). df for overview, du for drilling down.
How to mount a volume?
mkdir /data, mount /dev/xvdf /data, add to /etc/fstab for permanent mount. In AWS, after attaching EBS volume, you must format (mkfs) and mount it.
iostat purpose?
Shows disk I/O statistics. %util near 100% = disk bottleneck. High await = slow disk. Essential for diagnosing slow database or application performance.
What is swap?
Virtual memory on disk. When RAM is full, Linux moves inactive pages to swap. High swap usage = need more RAM. Check with free -h and swapon --show.
Master cd, ls, grep, find, cat, tail -f: they're 80% of your daily work