🐧 Complete Linux for DevOps

Linux Complete Guide

Every command a DevOps engineer needs — from basic navigation to advanced system administration, user management, networking, and shell scripting with real examples and outputs.

120+ Commands · 100% Free

01🐧

Why Linux for DevOps?

96% of Servers Run Linux

By commonly cited estimates, Linux runs on roughly 96% of the top one million web servers, on 100% of the top 500 supercomputers, and on the large majority of cloud instances on AWS, Azure, and GCP. As a DevOps engineer, you will spend most of your time in Linux terminals: deploying apps, debugging issues, writing scripts, and managing infrastructure. Linux is not optional. It IS your daily work environment.

Getting Help — The man Command

TERMINAL
# man = manual. Shows documentation for ANY command
man ls              # Full manual for ls command
man grep            # Full manual for grep
man -k "copy"       # Search all manuals for keyword "copy"

# Quick help (shorter than man)
ls --help           # Brief usage info
curl --help         # Flags and options

# Common man sections:
#   man 1 command → user commands
#   man 5 config  → config file formats
#   man 8 admin   → admin commands

# Pro tip: press / to search inside man, q to quit

💡 When Stuck

ALWAYS try man <command> or <command> --help first. 90% of your questions are answered right there. This is how experienced engineers learn new commands — they read the manual, not Google.

02📁

Linux Directory Structure

Where Everything Lives

Linux organizes everything in a single tree starting from / (root). Unlike Windows with C: D: drives, Linux has ONE tree. Every file, device, process, and configuration has a specific place. Understanding this structure is essential for troubleshooting.

Directory	What Lives Here	DevOps Examples
/	Root — the top of everything	Starting point of the entire filesystem
/home	User home directories	~/ or /home/suresh — your personal files, SSH keys, .bashrc
/root	Root user's home directory	Only accessible by root. NOT the same as /
/etc	Configuration files	nginx.conf, sshd_config, hosts, resolv.conf, crontab
/var	Variable data — things that change	Logs (/var/log), databases, mail, cache, PID files
/var/log	All system and app logs	syslog, auth.log, nginx/access.log, journal/
/tmp	Temporary files, usually cleared on reboot (distribution-dependent)	Build artifacts, temp downloads, session data
/opt	Third-party software	Jenkins, Prometheus, custom apps installed outside package manager
/usr	User programs and libraries	/usr/bin (commands), /usr/lib (libraries), /usr/share (docs)
/usr/local	Manually compiled/installed software	Software you built from source goes here
/bin	Essential user commands	ls, cp, mv, cat, grep — always available even in recovery
/sbin	System admin commands	iptables, fdisk, mkfs, fsck (most need root)
/proc	Virtual filesystem — running processes	/proc/cpuinfo (CPU), /proc/meminfo (RAM), /proc/PID/
/dev	Device files	sda (disk), null, random, tty (terminals)
/mnt	Temporary mount points	Mount external drives, NFS shares here
/srv	Service data	Web server files, FTP data

💡 Everything is a File

In Linux, EVERYTHING is treated as a file — regular files, directories, devices (/dev/sda), processes (/proc/1234), even network sockets. This is a core Linux design principle and frequently asked in interviews.
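As a small illustration of the "everything is a file" idea, the sketch below reads kernel state and talks to a device using nothing but ordinary file operations. It assumes a Linux system with /proc mounted; the variable names are made up for the example.

```shell
# Kernel state and devices are read and written with ordinary file tools.

# /proc/uptime is a virtual file; its first field is seconds since boot.
read -r uptime_seconds _ < /proc/uptime
echo "Up for ${uptime_seconds}s"

# /dev/null is a character device that discards whatever is written to it.
echo "this text is thrown away" > /dev/null
null_type="unknown"
if [ -c /dev/null ]; then
  null_type="character device"
fi
echo "/dev/null is a ${null_type}"

# The current shell's own process shows up as a directory under /proc.
ls -d "/proc/$$"
```

The same `read`, `echo`, and `ls` commands would work on a regular file, which is the point of the design.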

03🧭

Navigation & Listing

Move Around Like a Pro

These are the commands you'll type 100 times a day. Master them until they're muscle memory.

TERMINAL
# Where am I right now?
$ pwd
/home/suresh/projects

# Go to a directory
$ cd /var/log       # Go to absolute path
$ cd ..             # Go up one level (parent directory)
$ cd ~              # Go to home directory (/home/suresh)
$ cd -              # Go to PREVIOUS directory (like browser back button)
$ cd                # Same as cd ~ (go home)

# List files
$ ls                # Basic listing
app.js  node_modules  package.json  README.md

$ ls -la            # ALL files (including hidden) with details
total 48
drwxr-xr-x  5 suresh suresh 4096 Jun  1 10:30 .
drwxr-xr-x  8 suresh suresh 4096 May 28 09:15 ..
-rw-r--r--  1 suresh suresh  245 Jun  1 10:30 .env
-rw-r--r--  1 suresh suresh 1024 Jun  1 10:15 app.js
drwxr-xr-x 50 suresh suresh 4096 Jun  1 10:00 node_modules
-rw-r--r--  1 suresh suresh  532 Jun  1 10:00 package.json

# Breakdown of ls -la output:
#   drwxr-xr-x = permissions (d=directory, rwx=owner, r-x=group, r-x=others)
#   5          = number of links
#   suresh     = owner
#   suresh     = group
#   4096       = size in bytes
#   Jun 1      = last modified date
#   .          = name (here "." is the current directory)

$ ls -lh            # Human-readable sizes (1.5K, 4.2M, 1G)
$ ls -lt            # Sort by modification time (newest first)
$ ls -lS            # Sort by size (largest first)
$ ls -R             # Recursive: list subdirectories too

# Tree view (install: sudo apt install tree)
$ tree -L 2         # Show 2 levels deep
.
├── src
│   ├── app.js
│   └── routes
├── package.json
└── Dockerfile

04📄

File Operations

Create, Copy, Move, View & Count

Create files, copy them, move them, view their contents, and count things. These operations form the backbone of every DevOps task.

Create & Delete

TERMINAL
# Create empty file
$ touch config.yml

# Create file with content
$ echo "server_port: 8080" > config.yml    # Overwrite (>)
$ echo "debug: true" >> config.yml         # Append (>>)

# Create directories
$ mkdir logs                               # Single directory
$ mkdir -p deploy/staging/configs          # Create nested (parent + child)

# Delete
$ rm file.txt                              # Delete file
$ rm -r old-folder/                        # Delete directory recursively
$ rm -rf build/                            # Force delete (no confirmation)
# WARNING: rm -rf has NO undo. Double-check before running!

Copy & Move

TERMINAL
# Copy
$ cp app.conf app.conf.backup      # Copy file
$ cp -r src/ src-backup/           # Copy directory recursively
$ cp -p file.txt /backup/          # Preserve permissions and timestamps

# Move / Rename
$ mv old-name.txt new-name.txt     # Rename file
$ mv config.yml /etc/myapp/        # Move to another directory
$ mv logs/*.log /archive/          # Move all .log files

View File Contents

TERMINAL
# Print entire file
$ cat config.yml
server_port: 8080
debug: true

# View with line numbers
$ cat -n app.js
     1  const express = require('express');
     2  const app = express();
     3  app.listen(3000);

# View long files (scrollable)
$ less /var/log/syslog       # Scroll up/down, search with /keyword, quit with q
$ more /var/log/syslog       # Older, simpler version of less

# First/Last lines
$ head -20 access.log        # First 20 lines
$ tail -20 access.log        # Last 20 lines
$ tail -f /var/log/syslog    # FOLLOW: shows new lines as they appear in real-time
# This is THE most used DevOps command for debugging!
# Ctrl+C to stop tail -f

Count — wc (word count)

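TERMINAL
$ wc -l access.log        # Count LINES
15234 access.log
$ wc -w README.md         # Count WORDS
342 README.md
$ wc -c app.jar           # Count BYTES (file size)
45678901 app.jar

# Count specific patterns
$ grep -c "ERROR" app.log               # Count lines containing ERROR
47
$ awk '$9 == 500' access.log | wc -l    # Count HTTP 500 responses (status is field 9 in the common/combined log format)
12
# A plain grep "500" would also count lines where 500 appears in a byte count, URL, or timestamp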
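To see why matching on a field is safer than matching on text when counting errors, here is a minimal sketch against a made-up four-line log. The log lines, the temp file, and the variable names are all invented for illustration, and the field position assumes the common/combined access log layout.

```shell
# Build a tiny sample access log (hypothetical data).
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
10.0.0.1 - - [01/Jun/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 1500
10.0.0.2 - - [01/Jun/2024:10:00:02 +0000] "GET /api HTTP/1.1" 500 312
10.0.0.1 - - [01/Jun/2024:10:00:03 +0000] "GET /img HTTP/1.1" 200 5004
10.0.0.3 - - [01/Jun/2024:10:00:04 +0000] "POST /api HTTP/1.1" 500 87
EOF

# Text match: counts every line containing "500", including byte counts 1500 and 5004.
naive_count=$(grep -c "500" "$log_file")

# Field match: count only lines whose status code (field 9) equals 500.
status_count=$(awk '$9 == 500 { n++ } END { print n + 0 }' "$log_file")

echo "naive=$naive_count status=$status_count"
rm -f "$log_file"
```

On this sample the text match reports 4 while the field match reports 2, the true number of 500 responses.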

05🔍

Search & Text Processing

Find Anything on Your Server

Finding files, searching text, and processing output are the most valuable DevOps skills. These commands save hours of manual work.

find — Search for Files

TERMINAL
# Find by name
$ find / -name "nginx.conf"                  # Find file named nginx.conf
/etc/nginx/nginx.conf
$ find /var/log -name "*.log"                # Find all .log files
/var/log/syslog.log
/var/log/nginx/access.log
/var/log/nginx/error.log

# Find by modification time
$ find /tmp -mtime +7                        # Modified MORE than 7 days ago
$ find /tmp -mtime -1                        # Modified within the last 24 hours
$ find /var/log -mtime +30 -name "*.log"     # Logs older than 30 days

# Find by change time (metadata: permissions, ownership)
$ find /etc -ctime -1                        # Config files changed in last 24 hours

# Find by size
$ find / -size +100M                         # Files larger than 100 MB
$ find /var/log -size +50M -name "*.log"     # Large log files

# Find and DO SOMETHING
# (run the same expression without -delete or -exec first to preview the matches)
$ find /tmp -mtime +7 -delete                     # Delete files older than 7 days
$ find . -name "*.bak" -exec rm {} \;             # Delete all .bak files
$ find /var/log -name "*.log" -exec gzip {} \;    # Compress all log files

# Find by type
$ find /opt -type d                          # Directories only
$ find /opt -type f                          # Files only
$ find /opt -type l                          # Symlinks only

# Find by permissions
$ find / -perm 777                           # Files with exactly 777 (security risk!)
$ find / -perm /u+s                          # Files with setuid bit
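Because `-delete` is irreversible, a common habit is to run the expression once to preview and only then add the action. The sketch below shows that two-step pattern in a throwaway directory; the file names are invented, and backdating with `touch -d` assumes GNU coreutils.

```shell
# Work in a temporary directory so nothing real is touched.
work_dir=$(mktemp -d)
touch "$work_dir/fresh.log"
# GNU touch: backdate a file so it looks 10 days old.
touch -d "10 days ago" "$work_dir/stale.log"

# Step 1: preview what the expression matches.
find "$work_dir" -type f -name "*.log" -mtime +7 -print

# Step 2: same expression, now with the destructive action.
find "$work_dir" -type f -name "*.log" -mtime +7 -delete

remaining=$(ls "$work_dir")
echo "Remaining: $remaining"
rm -rf "$work_dir"
```

Only the backdated file matches `-mtime +7`, so the fresh file is the one left behind.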

grep — Search INSIDE Files

TERMINAL
# Basic search
$ grep "ERROR" /var/log/syslog
Jun 1 10:30:15 web1 myapp: ERROR database connection timeout

# Recursive search in all files
$ grep -r "password" /etc/
/etc/myapp/config.yml:db_password: secret123

# Case-insensitive
$ grep -i "error" app.log

# Show line numbers
$ grep -n "TODO" app.js
45: // TODO: add input validation
128: // TODO: implement caching

# Count matches
$ grep -c "404" access.log
234

# Show lines BEFORE and AFTER match (context)
$ grep -B 3 -A 3 "OutOfMemory" app.log
# Shows 3 lines before and 3 after the match. Essential for debugging!

# Invert match: show lines WITHOUT the pattern
$ grep -v "health-check" access.log      # Filter out noisy health checks

# Multiple patterns
$ grep -E "ERROR|FATAL|CRITICAL" app.log # Any of these patterns

Pipes — Chain Commands Together

TERMINAL
# Pipe sends output of one command as input to the next
# Think of it as an assembly line: each station does one job

# Find the 10 largest items under /var/log
$ du -sh /var/log/* | sort -rh | head -10
2.1G /var/log/journal
450M /var/log/nginx
120M /var/log/syslog

# Top 10 client IPs by request count
$ awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
   4521 192.168.1.100
   3890 10.0.1.50
   1234 203.0.113.10

# Find which processes are using the most memory (column 4 = %MEM)
$ ps aux | sort -k 4 -rn | head -5

# Search log for errors, extract timestamp, count per hour
# (assumes each line starts with "YYYY-MM-DD HH:MM:SS"; uniq -c only merges
#  adjacent lines, which works here because logs are in time order)
$ grep "ERROR" app.log | awk '{print $1" "$2}' | cut -d: -f1 | uniq -c
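The errors-per-hour pipeline is easier to follow with concrete input. This sketch runs it on a four-line sample log whose lines start with a `YYYY-MM-DD HH:MM:SS` timestamp; that format, the log contents, and the variable names are assumptions for the example. A `sort` is added before `uniq -c` so the count stays correct even if the input is not in time order.

```shell
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
2024-06-01 10:05:11 ERROR database connection timeout
2024-06-01 10:17:42 INFO request served
2024-06-01 10:48:03 ERROR upstream unavailable
2024-06-01 11:02:56 ERROR database connection timeout
EOF

# grep keeps ERROR lines, awk keeps "date time", cut trims to "date hour",
# sort groups identical hours together, uniq -c counts each group.
errors_per_hour=$(grep "ERROR" "$log_file" \
  | awk '{print $1" "$2}' \
  | cut -d: -f1 \
  | sort \
  | uniq -c)

echo "$errors_per_hour"
rm -f "$log_file"
```

Each stage does one small job, so you can debug the chain by running it one pipe at a time and watching the output change.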

06🗜️

Compression & Archives

tar, gzip, zip — Pack and Unpack

Compressing files saves storage and speeds up transfers. In DevOps, you'll compress log files, create backups, and package applications for deployment.

tar — The Swiss Army Knife

TERMINAL
# CREATE archive (tar = tape archive)
$ tar -cvf backup.tar /opt/myapp/         # Create tar (no compression)
# c=create, v=verbose, f=filename
$ tar -czvf backup.tar.gz /opt/myapp/     # Create tar + gzip compression
# z=gzip compression
$ tar -cjvf backup.tar.bz2 /opt/myapp/    # Create tar + bzip2 (smaller but slower)

# EXTRACT archive
$ tar -xvf backup.tar                         # Extract tar
$ tar -xzvf backup.tar.gz                     # Extract tar.gz
$ tar -xzvf backup.tar.gz -C /opt/restore/    # Extract to specific directory

# LIST contents without extracting
$ tar -tzvf backup.tar.gz                     # See what's inside

# Memory trick: tar -czvf = Create Ze Vucking File
#               tar -xzvf = eXtract Ze Vucking File
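A backup is only useful if it restores cleanly, so it can help to practise the full create, list, extract, compare cycle. The sketch below does that in a temporary directory; the `myapp` tree and file contents are stand-ins invented for the example.

```shell
work_dir=$(mktemp -d)
mkdir -p "$work_dir/myapp/config"
echo "server_port: 8080" > "$work_dir/myapp/config/app.yml"
echo "console.log('hi')" > "$work_dir/myapp/server.js"

# -C switches directory first, so the archive stores relative paths (myapp/...).
tar -czf "$work_dir/backup.tar.gz" -C "$work_dir" myapp

# List the contents before extracting anything.
tar -tzf "$work_dir/backup.tar.gz"

# Extract into a separate directory.
mkdir "$work_dir/restore"
tar -xzf "$work_dir/backup.tar.gz" -C "$work_dir/restore"

# diff -r exits 0 only when both trees are identical.
if diff -r "$work_dir/myapp" "$work_dir/restore/myapp"; then
  restore_status="identical"
else
  restore_status="different"
fi
echo "Restore check: $restore_status"
rm -rf "$work_dir"
```

Archiving with `-C` and a relative name keeps the archive portable: it can be extracted anywhere without recreating the original absolute path.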

gzip, gunzip, zip, unzip

TERMINAL
# gzip: compress single files (replaces original)
$ gzip access.log                       # Creates access.log.gz, removes original
$ gunzip access.log.gz                  # Decompress back to access.log
$ gzip -k access.log                    # Keep original file (-k = keep)
$ gzip -9 large-file.log                # Maximum compression (-9)

# zip: create ZIP archives (Windows-compatible)
$ zip backup.zip file1.txt file2.txt    # Zip specific files
$ zip -r project.zip myproject/         # Zip entire directory recursively
$ unzip project.zip                     # Extract
$ unzip project.zip -d /opt/restore/    # Extract to specific directory
$ unzip -l project.zip                  # List contents without extracting

💡 When to Use What

tar.gz for Linux backups and deployments (most common). zip for cross-platform sharing (Windows compatibility). gzip for compressing single files (log rotation). bzip2 when a smaller archive matters more than speed (large archives).

07🔒

Permissions & Ownership

Who Can Read, Write, and Execute

Every file in Linux has an owner, a group, and three sets of permissions: read (r), write (w), execute (x). Understanding permissions is the difference between a secure server and a hacked server.

TERMINAL
# Permission format explained:
$ ls -l app.sh
-rwxr-xr-- 1 suresh devops 1024 Jun 1 10:30 app.sh
 │││││││││
 ││││││└┴┴── Others: r-- (read only)
 │││└┴┴───── Group:  r-x (read + execute)
 └┴┴──────── Owner:  rwx (read + write + execute)

# Numeric (octal) representation:
#   r=4, w=2, x=1 → add them up
#   rwx = 4+2+1 = 7
#   r-x = 4+0+1 = 5
#   r-- = 4+0+0 = 4
#   So rwxr-xr-- = 754
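The r=4, w=2, x=1 arithmetic can be written out as a small helper. This is a sketch for practice, not a standard tool: `sym_to_octal` is a hypothetical function name, and it expects exactly the nine permission characters (without the leading file-type character).

```shell
# Convert a 9-character permission string (e.g. rwxr-xr--) to octal.
sym_to_octal() {
  perms=$1
  result=""
  # Process the owner, group, and others triplets in turn.
  for start in 1 4 7; do
    triplet=$(printf '%s' "$perms" | cut -c "${start}-$((start + 2))")
    digit=0
    case "$triplet" in r??) digit=$((digit + 4)) ;; esac   # read
    case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac   # write
    case "$triplet" in ??x) digit=$((digit + 1)) ;; esac   # execute
    result="${result}${digit}"
  done
  printf '%s\n' "$result"
}

sym_to_octal rwxr-xr--
sym_to_octal rw-r--r--
sym_to_octal rw-------
```

The three calls print 754, 644, and 600, matching the hand calculation above.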

chmod — Change Permissions

TERMINAL
# Numeric method (most common)
$ chmod 755 deploy.sh        # rwxr-xr-x (owner:all, group+others:read+exec)
$ chmod 644 config.yml       # rw-r--r-- (owner:rw, rest:read-only)
$ chmod 600 secrets.env      # rw------- (owner ONLY, for passwords, keys)
$ chmod 700 .ssh/            # rwx------ (private directory)
$ chmod 400 my-key.pem       # r-------- (SSH key, read-only by owner)

# Symbolic method
$ chmod +x script.sh         # Add execute permission for everyone
$ chmod u+w file.txt         # Add write for owner (u=user/owner)
$ chmod g-w file.txt         # Remove write from group
$ chmod o-rwx secret.txt     # Remove all permissions from others

# Recursive: apply to everything in a directory
$ chmod -R 755 /opt/myapp/   # Note: this also marks every regular file as executable

Octal	Permission	Use For
755	rwxr-xr-x	Scripts, executables, directories
644	rw-r--r--	Config files, HTML, regular files
600	rw-------	Secrets, .env files, private keys
700	rwx------	Private directories, .ssh/
400	r--------	SSH .pem key files
777	rwxrwxrwx	NEVER use in production!
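Since directories need the execute bit to be entered but most regular files do not, one common alternative to a blanket `chmod -R 755` is to set the two separately with `find`. The sketch below tries this in a temporary tree; the paths are invented, and reading the mode back with `stat -c` assumes GNU coreutils.

```shell
work_dir=$(mktemp -d)
mkdir -p "$work_dir/app/static"
touch "$work_dir/app/config.yml" "$work_dir/app/static/index.html"

# Directories: 755 so they can be listed and traversed.
find "$work_dir/app" -type d -exec chmod 755 {} +
# Regular files: 644, readable by all but not executable.
find "$work_dir/app" -type f -exec chmod 644 {} +

# GNU stat: %a prints the permission bits in octal.
dir_mode=$(stat -c '%a' "$work_dir/app/static")
file_mode=$(stat -c '%a' "$work_dir/app/config.yml")
echo "dir=$dir_mode file=$file_mode"
rm -rf "$work_dir"
```

Scripts that really must be executable can then be given `chmod +x` individually.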

chown — Change Ownership

TERMINAL
# Change owner
$ sudo chown suresh file.txt

# Change owner AND group
$ sudo chown suresh:devops file.txt

# Change group only
$ sudo chgrp docker /var/run/docker.sock

# Recursive (entire directory)
$ sudo chown -R www-data:www-data /var/www/

# Common DevOps scenario:
$ sudo chown -R deploy:deploy /opt/myapp/    # App user owns the app directory

⚠️ 777 Permissions

chmod 777 gives EVERYONE full access to read, write, and execute. This is a massive security risk. If an interviewer hears you suggest 777, the interview is over. Use specific permissions like 755 for executables, 644 for files.

08👥

User & Group Management

Create Users, Manage Access

Every process, every file, every service runs as a specific user. As a DevOps engineer, you create users for applications, add team members, manage group permissions, and control who can run Docker or access specific servers.

User Management Commands

TERMINAL
# Create a new user
$ sudo useradd -m -s /bin/bash deploy
# -m = create home directory (/home/deploy)
# -s = set shell to bash

# Set password
$ sudo passwd deploy
New password: ****

# Create user with specific UID and home
$ sudo useradd -m -u 1500 -s /bin/bash -d /opt/jenkins jenkins

# Modify existing user
$ sudo usermod -aG docker suresh     # Add suresh to docker group
$ sudo usermod -aG sudo suresh       # Add to sudo group (admin access)
$ sudo usermod -s /bin/zsh suresh    # Change shell to zsh
$ sudo usermod -L suresh             # Lock the password (blocks password login; SSH key login still works)
$ sudo usermod -U suresh             # Unlock the password

# Delete user
$ sudo userdel deploy                # Delete user (keep home dir)
$ sudo userdel -r deploy             # Delete user AND home directory

# Switch user
$ su - deploy                        # Switch to deploy user
$ sudo -u deploy whoami              # Run command as deploy user

# Who am I?
$ whoami                             # Current username
$ id                                 # UID, GID, groups
uid=1000(suresh) gid=1000(suresh) groups=1000(suresh),27(sudo),999(docker)

Group Management

TERMINAL
# Create a group
$ sudo groupadd devops

# Add user to group (IMPORTANT: use -aG, not just -G)
$ sudo usermod -aG devops suresh
# -a = APPEND to groups (without -a, it REPLACES all groups!)
# -G = supplementary group

# Real-world: Add user to Docker group
$ sudo usermod -aG docker suresh
# Now suresh can run docker commands without sudo
# MUST log out and back in for group change to take effect!

# List groups for a user
$ groups suresh
suresh : suresh sudo docker devops

# See all members of a group
$ getent group docker
docker:x:999:suresh,deploy

# Remove user from group
$ sudo gpasswd -d suresh docker

What Happens When You Install a Service?

TERMINAL
# When you install nginx, mysql, jenkins, etc.:
$ sudo apt install nginx

# The package's install scripts automatically:
# 1. Create (or reuse) a system user with no login shell
#    (Debian/Ubuntu run nginx as www-data; RHEL-family packages use a user named nginx)
$ grep www-data /etc/passwd
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
# /usr/sbin/nologin = this user CANNOT log in (security!)

# 2. Create a system group
$ grep www-data /etc/group
www-data:x:33:

# 3. Set file ownership
$ ls -la /var/log/nginx/
-rw-r----- 1 www-data adm access.log
-rw-r----- 1 www-data adm error.log

# Why? Security principle of least privilege:
# Nginx workers run as the www-data (or nginx) user
# If someone hacks nginx, they only have that user's limited permissions
# They can NOT access other users' private files or run root-only commands

Important User Files

File	What It Contains	Example Line
/etc/passwd	All user accounts	suresh:x:1000:1000:Suresh:/home/suresh:/bin/bash
/etc/shadow	Hashed passwords and password-aging settings	suresh:$6$xyz...:19200:0:99999:7:::
/etc/group	All groups and members	docker:x:999:suresh,deploy
/etc/sudoers	Who can use sudo	suresh ALL=(ALL:ALL) ALL
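The account files above are plain colon-separated text, so standard shell tools can read them. As a sketch, the snippet below splits the sample /etc/passwd line from the table into its seven fields; the variable names are chosen for the example and are not standard.

```shell
# Sample line copied from the table above.
passwd_line='suresh:x:1000:1000:Suresh:/home/suresh:/bin/bash'

# /etc/passwd fields: name, password marker, UID, GID, comment, home, shell.
IFS=: read -r username password_marker uid gid comment home_dir login_shell <<EOF
$passwd_line
EOF

echo "user=$username uid=$uid gid=$gid home=$home_dir shell=$login_shell"

# The same idea on a live system: list accounts whose shell allows login.
# awk -F: '$7 !~ /(nologin|false)$/ {print $1}' /etc/passwd
```

The `x` in the second field means the real password hash lives in /etc/shadow, which only root can read.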

💡 Interview Classic

"When you install a service like nginx or MySQL, Linux creates a dedicated system user with /usr/sbin/nologin shell. This is for security: if the service is compromised, the attacker only gets that user's limited permissions, not root access."

09⚡

Process Management

Monitor, Kill & Control Running Programs

Every running program is a process with a unique PID (Process ID). DevOps engineers need to find runaway processes, kill hung services, monitor CPU/memory usage, and run background tasks.

View Processes

TERMINAL
# List all processes
$ ps aux
USER     PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
root       1  0.0  0.1 168940 11420 ?   Ss   Jun01 0:05 /sbin/init
nginx   1234  0.1  0.5  52340 41200 ?   S    10:30 0:12 nginx: worker
suresh  5678  2.3  1.2 743200 98432 ?   Sl   10:45 1:30 java -jar app.jar

# Find a specific process
$ ps aux | grep nginx
$ ps aux | grep java

# Process tree (who started whom)
$ pstree -p
systemd(1)─┬─nginx(1230)─┬─nginx(1231)
           │             └─nginx(1232)
           ├─sshd(800)───sshd(1500)───bash(1501)
           └─java(5678)

# Real-time monitoring
$ top      # Real-time process viewer
$ htop     # Better version (install: apt install htop)
# In top/htop: press M to sort by memory, P by CPU, q to quit

Kill Processes

TERMINAL
# Graceful kill (SIGTERM: asks nicely)
$ kill 5678                       # Send SIGTERM to PID 5678

# Force kill (SIGKILL: no mercy)
$ kill -9 5678                    # Forcefully terminate

# Kill by name
$ pkill nginx                     # Kill all nginx processes
$ pkill -f "java -jar app.jar"    # Kill by full command line match

# Kill all processes of a user
$ pkill -u deploy                 # Kill all processes owned by deploy user

# What signal numbers mean:
#   kill -15 = SIGTERM (graceful, default)
#   kill -9  = SIGKILL (force, last resort)
#   kill -1  = SIGHUP  (reload config, like nginx reload)
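The reason SIGTERM counts as "graceful" is that a process can catch it and clean up, whereas SIGKILL cannot be caught. The sketch below shows this with a stand-in worker written as a shell function; `worker`, the marker file, and the timings are all invented for the example, and fractional `sleep` assumes GNU coreutils.

```shell
marker_file=$(mktemp)

# A stand-in worker that cleans up when it receives SIGTERM.
worker() {
  trap 'echo "shutdown complete" > "$marker_file"; exit 0' TERM
  while true; do
    sleep 0.1
  done
}

worker &
worker_pid=$!

sleep 0.5               # give the worker time to install its trap
kill "$worker_pid"      # plain kill sends SIGTERM (signal 15)
wait "$worker_pid"      # returns once the trap handler has finished
worker_exit=$?

shutdown_message=$(cat "$marker_file")
echo "exit=$worker_exit message=$shutdown_message"
rm -f "$marker_file"
```

Had the worker been stopped with `kill -9`, the trap would never run and the marker file would stay empty, which is why SIGKILL is kept as a last resort.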

Background & Foreground

TERMINAL
# Run in background
$ ./long-running-script.sh &      # & puts it in background
[1] 12345                         # Job number and PID

# Run and survive logout
$ nohup ./script.sh > output.log 2>&1 &
# nohup        = don't stop when terminal closes
# > output.log = redirect stdout
# 2>&1         = redirect stderr to same place
# &            = run in background

# Job control
$ jobs       # List background jobs
$ fg %1      # Bring job 1 to foreground
$ bg %1      # Resume stopped job 1 in the background
# Ctrl+Z = suspend (pause) the current foreground job

10🔄

Service Management — systemd

Start, Stop, Enable & Create Services

systemd manages all services on modern Linux. Every time you install nginx, docker, or jenkins, systemd controls it. You use systemctl to start, stop, enable, and check services.

Essential systemctl Commands

TERMINAL
$ sudo systemctl start nginx       # Start service NOW
$ sudo systemctl stop nginx        # Stop service NOW
$ sudo systemctl restart nginx     # Stop + Start (brief downtime)
$ sudo systemctl reload nginx      # Reload config without stopping (zero downtime!)

$ sudo systemctl status nginx      # Check if running, see recent logs
● nginx.service - A high performance web server
     Active: active (running) since Sat 2024-06-01 10:30:00 IST; 2h ago
   Main PID: 1234 (nginx)
      Tasks: 3 (limit: 4677)
     Memory: 8.5M

$ sudo systemctl enable nginx      # Start automatically on boot
$ sudo systemctl disable nginx     # Don't start on boot
$ sudo systemctl is-active nginx   # Just check: active or inactive
$ sudo systemctl is-enabled nginx  # Check: enabled or disabled

# List all services
$ systemctl list-units --type=service --state=running

Create Your Own Service

SERVICE FILE
# /etc/systemd/system/myapp.service
# Note: systemd only treats # as a comment at the start of a line,
# so each comment sits on its own line rather than after a value.

[Unit]
Description=My Node.js Application
# Start after network is ready
After=network.target
# Prefer PostgreSQL to be running
Wants=postgresql.service

[Service]
Type=simple
# Run as deploy user (not root!)
User=deploy
Group=deploy
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/node server.js
# Auto-restart if it crashes
Restart=always
# Wait 5 seconds before restart
RestartSec=5
# Load environment variables
EnvironmentFile=/opt/myapp/.env
# Send logs to journalctl
StandardOutput=journal
StandardError=journal

[Install]
# Start when system boots normally
WantedBy=multi-user.target

# After creating the file:
$ sudo systemctl daemon-reload     # Tell systemd about new service
$ sudo systemctl start myapp       # Start it
$ sudo systemctl enable myapp      # Start on boot
$ journalctl -u myapp -f           # Watch logs in real-time

💡 Restart=always

This is the most important line in a service file for DevOps. If your app crashes at 3 AM, systemd automatically restarts it in 5 seconds. No pager alert, no manual intervention. Production apps should ALWAYS have Restart=always.

11📦

Package Management

Install, Update & Remove Software

Package managers download, install, update, and remove software with all dependencies handled automatically. Know BOTH apt (Ubuntu/Debian) and yum/dnf (RHEL/CentOS) — you'll encounter both in the field.

TERMINAL
# ═══ APT (Ubuntu / Debian) ═══
$ sudo apt update                    # Refresh package list (always do first!)
$ sudo apt install nginx             # Install
$ sudo apt install nginx=1.24.0-1    # Install a specific version
$ sudo apt-mark hold nginx           # Hold that version so apt upgrade skips it
$ sudo apt remove nginx              # Remove (keep config files)
$ sudo apt purge nginx               # Remove + delete config files
$ sudo apt autoremove                # Remove unused dependencies
$ sudo apt upgrade                   # Upgrade ALL packages (except held ones)
$ sudo apt search redis              # Search for packages
$ apt list --installed               # List what's installed
$ apt show nginx                     # Show package details

# ═══ YUM/DNF (RHEL / CentOS / Amazon Linux) ═══
$ sudo yum update                    # Update all packages
$ sudo yum install nginx             # Install
$ sudo yum remove nginx              # Remove
$ sudo yum list installed            # List installed
$ sudo yum search redis              # Search
$ sudo dnf install nginx             # dnf = modern replacement for yum

💡 Pin Versions in Production

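Always install specific versions in production: apt install nginx=1.24.0-1. Installing a specific version alone does not stop a later apt upgrade from replacing it with a newer release that has breaking changes, so also hold the package (sudo apt-mark hold nginx) or pin it in apt preferences. Your CI/CD pipeline should pin every dependency.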

12💾

Disk & Storage Management

Check Space, Mount Drives, Monitor I/O

Running out of disk space is one of the most common production incidents. Know how to check usage, find large files, mount external storage, and monitor disk I/O.

TERMINAL
# ═══ Check Disk Space ═══
$ df -h                      # Disk usage of all mounted filesystems
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda1    50G   32G    16G   67%  /
/dev/sdb1   100G   45G    55G   45%  /data
tmpfs       3.9G     0   3.9G    0%  /dev/shm

# ═══ Check Directory Sizes ═══
$ du -sh /var/log/*          # Size of each item in /var/log
2.1G /var/log/journal
450M /var/log/nginx
120M /var/log/syslog
$ du -sh /opt/myapp          # Total size of a directory
1.2G /opt/myapp

# ═══ Find Largest Files ═══
$ du -ah / | sort -rh | head -20                     # Top 20 largest files/dirs
$ find / -type f -size +100M -exec ls -lh {} \;      # Files over 100MB

# ═══ Disk I/O Statistics ═══
$ iostat -x 1 5              # Disk I/O stats every 1 sec, 5 times
# Look for: %util (near 100% = bottleneck), await (high = slow disk)

# ═══ List Block Devices ═══
$ lsblk                      # Show all disks and partitions
NAME    SIZE  TYPE  MOUNTPOINT
sda      50G  disk
├─sda1   49G  part  /
└─sda2    1G  part  [SWAP]
sdb     100G  disk
└─sdb1  100G  part  /data
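Since full disks are such a common incident, the `df` check is often wrapped in a small script that cron or a monitoring agent can run. The following is a minimal sketch: `disk_usage_percent` is a hypothetical helper name and the 80% threshold is an arbitrary value chosen for the example.

```shell
# Print the Use% of the filesystem that holds the given path, as a bare integer.
# df -P forces the POSIX single-line-per-filesystem format, which is safer to parse.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

threshold=80     # hypothetical alert threshold, in percent
usage=$(disk_usage_percent /)

if [ "$usage" -ge "$threshold" ]; then
  echo "WARNING: / is ${usage}% full (threshold ${threshold}%)"
else
  echo "OK: / is ${usage}% full"
fi
```

In a real setup the warning branch would typically send an alert rather than just print a message.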

Mount & Unmount

TERMINAL
# Mount a new EBS volume (AWS)
# (assumes the volume already has a filesystem; a brand-new volume must be formatted first)
$ sudo mkdir /data                # Create mount point
$ sudo mount /dev/xvdf /data      # Mount the volume
$ df -h /data                     # Verify

# Make it permanent (survives reboot)
$ sudo blkid /dev/xvdf            # Get UUID
/dev/xvdf: UUID="abc-123" TYPE="ext4"
$ sudo nano /etc/fstab
# Add this line:
UUID=abc-123 /data ext4 defaults,nofail 0 2
# Test the new entry before rebooting: sudo mount -a

# Unmount
$ sudo umount /data               # Unmount (must not be in use)
$ sudo umount -l /data            # Lazy unmount: detach now, clean up once nothing is using it

13🌐

Networking

IPs, Ports, DNS & Troubleshooting

Network troubleshooting is 50% of DevOps debugging. When your app can't connect to the database, or users can't reach your website, these commands tell you exactly what's wrong.

Check IPs & Interfaces

TERMINAL
$ ip addr show           # Show all network interfaces and IPs
$ ip addr show eth0      # Specific interface
$ ip route show          # Show routing table
default via 10.0.0.1 dev eth0    # Default gateway
$ hostname -I            # Quick way to get your IP

Check Open Ports — ss and netstat

TERMINAL
# ss = modern replacement for netstat (faster; part of iproute2, installed by default on most distributions)
$ ss -tlnp               # TCP listening ports with process names (use sudo to see other users' processes)
State   Local Address:Port   Process
LISTEN  0.0.0.0:22           sshd
LISTEN  0.0.0.0:80           nginx
LISTEN  0.0.0.0:8080         java
LISTEN  127.0.0.1:3306       mysqld
# t=TCP, l=listening, n=numeric (don't resolve names), p=process

# netstat (older but still used; may need the net-tools package)
$ netstat -tlnp                      # Same information as ss -tlnp
$ netstat -an | grep ESTABLISHED     # Active connections
$ netstat -an | grep :8080           # Who's connected to port 8080?

# What process is using a specific port?
$ sudo lsof -i :8080                 # Show process on port 8080
$ sudo fuser 8080/tcp                # PID using port 8080

Test Connectivity — ping, nc, curl

TERMINAL
# Ping: is the host reachable?
$ ping -c 4 google.com               # Send 4 packets
$ ping -c 4 10.0.1.50                # Ping internal server

# nc (netcat): is the PORT open?
$ nc -zv 10.0.1.50 3306              # Test if MySQL port is open
Connection to 10.0.1.50 3306 port [tcp/mysql] succeeded!
$ nc -zv db.server.com 5432          # Test PostgreSQL port

# Test a range of ports
$ nc -zv 10.0.1.50 8080-8090         # Scan ports 8080 to 8090

# curl: test HTTP endpoints
$ curl -v https://api.myapp.com/health
$ curl -I https://api.myapp.com      # Headers only (check status code)
HTTP/2 200
$ curl -o /dev/null -s -w "%{http_code}" https://myapp.com
200                                  # Just the status code

# wget: download files
$ wget https://releases.app.com/v2.1/app.tar.gz

# DNS lookups
$ dig myapp.com                      # Full DNS query
$ nslookup myapp.com                 # Simple DNS lookup
$ dig +short myapp.com               # Just the IP
52.66.123.45

# Trace network path
$ traceroute google.com              # Show every hop to destination

💡 Troubleshooting Order

1) ping — can I reach the host? 2) nc -zv host port — is the port open? 3) curl — does the HTTP endpoint respond? 4) ss -tlnp on the server — is the service actually listening? This sequence solves 90% of connectivity issues.
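The first three checks in that sequence can be scripted so they run in the same order every time. This is a rough sketch under several assumptions: `check_port` and `diagnose` are hypothetical helper names, the port test uses bash's built-in `/dev/tcp` path together with coreutils `timeout` instead of `nc` (so it works where netcat is not installed), and the HTTP step assumes plain `http://` on the given port. Step 4, `ss -tlnp`, still has to be run on the server itself.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-path in place of nc.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Walk the checks in order and stop at the first hard failure.
diagnose() {
  host=$1
  port=$2

  # Step 1: ping. ICMP is often filtered, so a failure here is only a hint.
  if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
    echo "ping: $host is reachable"
  else
    echo "ping: no reply from $host (host down, or ICMP blocked)"
  fi

  # Step 2: is the TCP port open?
  if ! check_port "$host" "$port"; then
    echo "port: $host:$port is closed or filtered"
    return 1
  fi
  echo "port: $host:$port is open"

  # Step 3: does the HTTP endpoint answer successfully?
  if curl -sf -o /dev/null --max-time 5 "http://$host:$port/"; then
    echo "http: endpoint responded"
  else
    echo "http: port is open but there was no successful HTTP response"
    return 1
  fi
}
```

A call such as `diagnose 10.0.1.50 8080` prints one line per step, which makes it clear at which layer the connection breaks.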

14🔥

Firewall — UFW & iptables

Control Network Access

Firewalls block unwanted traffic. UFW is the simple frontend, iptables is the powerful backend. In cloud environments, you usually use Security Groups, but knowing Linux firewalls is essential for on-premises servers.

TERMINAL
# ═══ UFW (Ubuntu, simple) ═══
$ sudo ufw allow 22/tcp              # Allow SSH FIRST, or enabling the firewall can lock you out
$ sudo ufw allow 80/tcp              # Allow HTTP
$ sudo ufw allow 443/tcp             # Allow HTTPS
$ sudo ufw enable                    # Turn on firewall
$ sudo ufw status verbose            # Check status and rules
$ sudo ufw allow from 10.0.0.0/8     # Allow entire internal network
$ sudo ufw deny from 203.0.113.50    # Block specific IP
$ sudo ufw delete allow 80/tcp       # Remove a rule

# ═══ iptables (advanced, low level) ═══
$ sudo iptables -L -n                                                  # List all rules
$ sudo iptables -A INPUT -i lo -j ACCEPT                               # Allow loopback traffic
$ sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT   # Allow replies to existing connections
$ sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT                   # Allow HTTP
$ sudo iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/8 -j ACCEPT     # SSH from internal only
$ sudo iptables -A INPUT -j DROP                                       # Drop everything else (add LAST!)

15📜

Shell Scripting

Automate Everything with Bash

Every DevOps engineer writes bash scripts for deployments, backups, health checks, cleanup, and monitoring. A well-written script replaces hours of manual work.

BASH SCRIPT
#!/bin/bash
# deploy.sh: Production deployment script
set -euo pipefail   # Exit on error, undefined vars, pipe fails
# set -e          = stop if ANY command fails
# set -u          = stop if you use an undefined variable
# set -o pipefail = catch errors in pipes

APP_DIR="/opt/myapp"
BACKUP_DIR="/opt/backups"
DATE=$(date +%Y%m%d_%H%M%S)

echo "[$(date)] Starting deployment..."

# Step 1: Backup current version
mkdir -p "$BACKUP_DIR"              # Make sure the backup directory exists
if [ -d "$APP_DIR" ]; then
    echo "Creating backup..."
    tar -czf "$BACKUP_DIR/app_$DATE.tar.gz" "$APP_DIR"
fi

# Step 2: Pull latest code
cd "$APP_DIR"
git pull origin main

# Step 3: Build
npm ci
npm run build

# Step 4: Restart service
sudo systemctl restart myapp

# Step 5: Health check
sleep 5
if curl -sf http://localhost:3000/health > /dev/null; then
    echo "[$(date)] Deployment successful!"
else
    echo "[$(date)] HEALTH CHECK FAILED! Rolling back..."
    # tar stored the path without its leading /, so extracting at / restores /opt/myapp
    tar -xzf "$BACKUP_DIR/app_$DATE.tar.gz" -C /
    sudo systemctl restart myapp
    exit 1
fi

Key Bash Concepts

BASH
# Variables
NAME="suresh"
echo "Hello $NAME"                   # Output: Hello suresh
echo "Path is ${HOME}/projects"      # Use {} for clarity

# Conditionals
if [ -f "/opt/app/config.yml" ]; then
    echo "Config file exists"
elif [ -d "/opt/app" ]; then
    echo "Directory exists but no config"
else
    echo "Nothing exists"
fi
# Tests: -f file exists, -d directory exists, -z string is empty
#        -eq equal, -ne not equal, -gt greater than

# Loops
for server in web1 web2 web3; do
    echo "Deploying to $server..."
    # Restarting a service needs root, hence sudo on the remote side
    ssh "deploy@$server" 'cd /opt/app && git pull && sudo systemctl restart app'
done

# While loop
while ! curl -sf http://localhost:8080/health > /dev/null; do
    echo "Waiting for app to start..."
    sleep 2
done
echo "App is ready!"

# Functions
function deploy() {
    local app_name="$1"
    echo "Deploying $app_name"
    ssh deploy@server "sudo systemctl restart $app_name"
}
deploy "order-service"
deploy "user-service"
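A wait loop like the one above runs forever if the app never comes up, so in deployment scripts it is usually given an attempt limit. The sketch below combines a loop and a function into a reusable retry helper; `wait_for` and `flaky_check` are hypothetical names, and `flaky_check` is a stand-in that succeeds on its third call so the example runs without a real service.

```shell
# Retry a command until it succeeds or the attempt limit is reached.
# Usage: wait_for <max_attempts> <delay_seconds> <command...>
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Succeeded on attempt $attempt"
}

# Stand-in for a health check: fails until it has been called 3 times.
counter_file=$(mktemp)
flaky_check() {
  echo "try" >> "$counter_file"
  [ "$(wc -l < "$counter_file")" -ge 3 ]
}

result=$(wait_for 5 0.1 flaky_check)
echo "$result"
rm -f "$counter_file"
```

With a real service the call might look like `wait_for 30 2 curl -sf -o /dev/null http://localhost:8080/health`, which gives up after about a minute instead of hanging the pipeline.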

⚠️ set -euo pipefail

ALWAYS start scripts with this. Without it, errors are silently ignored: a failing command doesn't stop the script, and your deployment continues in a half-applied, possibly corrupt state. These three options turn many silent failures into immediate, visible ones.
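The effect of each option is easiest to see side by side. The sketch below runs tiny one-line scripts in child `bash` processes, with and without each option, and records what happens; the variable names are invented for the example.

```shell
# Without pipefail, a pipeline's exit status is that of its LAST command.
status_default=$(bash -c 'false | true; echo $?')

# With pipefail, any failing stage makes the whole pipeline fail.
status_pipefail=$(bash -c 'set -o pipefail; false | true; echo $?')

# With -e, the script stops at the first failing command, so "after" is never printed.
output_errexit=$(bash -c 'set -e; echo "before"; false; echo "after"' || true)

# With -u, using an unset variable is an error instead of an empty string.
if bash -c 'set -u; echo "$undefined_variable"' 2>/dev/null; then
  nounset_result="continued"
else
  nounset_result="stopped"
fi

echo "default=$status_default pipefail=$status_pipefail"
echo "errexit_output=$output_errexit nounset=$nounset_result"
```

The first line shows 0 versus 1 for the same failing pipeline, which is exactly the kind of hidden failure `pipefail` exposes.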

16📝

Log Management

Find Problems Before Users Complain

Logs are the first thing you check when something breaks. Linux has a standard location for all logs, and systemd has journalctl for structured log queries.

Important Log Locations

Log File	What It Contains	When to Check
/var/log/syslog	General system messages	System issues, service failures
/var/log/auth.log	Authentication attempts	SSH logins, sudo usage, failed logins
/var/log/kern.log	Kernel messages	Hardware errors, driver issues, OOM kills
/var/log/nginx/access.log	Nginx HTTP requests	Traffic analysis, 404s, slow requests
/var/log/nginx/error.log	Nginx errors	Config errors, upstream failures
/var/log/apt/history.log	Package installations	What was installed/updated and when

journalctl — systemd Log Viewer

TERMINAL
# View logs for a specific service
$ journalctl -u nginx                        # All nginx logs
$ journalctl -u nginx --since today          # Today's nginx logs
$ journalctl -u nginx --since "1 hour ago"   # Last hour
$ journalctl -u nginx -f                     # Follow in real-time (like tail -f)
$ journalctl -u nginx -n 50                  # Last 50 lines

# View system-wide
$ journalctl -b                              # Logs since last boot
$ journalctl -p err                          # Only errors and above
$ journalctl --since "2024-06-01 10:00" --until "2024-06-01 11:00"

# Search across all logs
$ journalctl | grep "Out of memory"          # Find OOM kills
$ journalctl -u myapp --no-pager | grep ERROR

Log Rotation — logrotate

LOGROTATE
# /etc/logrotate.d/myapp
# Comments are kept on their own lines, which is the form logrotate documents.
/var/log/myapp/*.log {
    # Rotate every day
    daily
    # Keep 14 rotated files
    rotate 14
    # Compress old logs with gzip
    compress
    # Leave the most recent rotated file uncompressed (in case it is still needed)
    delaycompress
    # Don't error if log file is missing
    missingok
    # Don't rotate empty files
    notifempty
    # Run after rotation
    postrotate
        systemctl reload myapp
    endscript
}

💡 Always tail -f First

When debugging a live issue, your first command should be: tail -f /var/log/myapp/error.log. Watch the errors flow in real-time while you reproduce the problem. This is the #1 debugging technique for DevOps engineers.

17🔧

System Administration

Cron Jobs, System Info & Performance

Daily sysadmin tasks: schedule automated jobs, check system performance, monitor resources, and manage hostnames/time.

Cron — Schedule Automated Tasks

CRONTAB
# Edit crontab (task scheduler)
$ crontab -e

# Format: minute hour day month weekday command
# ┌───────────── minute (0-59)
# │ ┌─────────── hour (0-23)
# │ │ ┌───────── day of month (1-31)
# │ │ │ ┌─────── month (1-12)
# │ │ │ │ ┌───── day of week (0-7, Sun=0 or 7)
# * * * * * command

# Examples:
0 2 * * * /opt/scripts/backup.sh                    # Every day at 2:00 AM
*/5 * * * * curl -sf http://localhost:8080/health   # Every 5 minutes
0 0 * * 0 /opt/scripts/weekly-cleanup.sh            # Every Sunday at midnight
0 9 * * 1-5 /opt/scripts/daily-report.sh            # Mon-Fri at 9 AM

# List scheduled jobs
$ crontab -l

# View cron logs
$ grep CRON /var/log/syslog

System Information

TERMINAL
# System overview
$ uname -a        # Kernel version, architecture
Linux web1 5.15.0-1043-aws #48-Ubuntu SMP x86_64 GNU/Linux

$ uptime          # How long running + load average
 10:30:15 up 45 days, 3:21, 2 users, load average: 0.75, 0.82, 0.68
# Load average: 1-min, 5-min, 15-min
# On 4-core: load 4.0 = fully used, >4.0 = overloaded

$ free -h         # Memory usage
              total        used        free      shared  buff/cache   available
Mem:           7.7G        3.2G        1.1G        120M        3.4G        4.1G
Swap:          2.0G        100M        1.9G
# "available" = actual usable memory (includes reclaimable cache)

# CPU info
$ nproc           # Number of CPU cores
4
$ lscpu           # Detailed CPU information

# System performance snapshot
$ vmstat 1 5      # Virtual memory stats every 1 sec, 5 times
# Watch: r (run queue), si/so (swap in/out, should be 0)
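The "load versus core count" rule can be checked by a script instead of by eye. A small sketch that reads the 1-minute load from /proc/loadavg and divides it by nproc; the 1.0 warning threshold is an assumption taken from the rule of thumb above:

```shell
#!/usr/bin/env bash
set -euo pipefail

cores=$(nproc)

# /proc/loadavg starts with the 1-, 5- and 15-minute load averages
read -r load1 load5 load15 _ < /proc/loadavg

# Load per core: about 1.0 means fully used, above 1.0 means work is queuing
per_core=$(awk -v l="$load1" -v c="$cores" 'BEGIN { printf "%.2f", l / c }')

echo "cores=$cores load1=$load1 per_core=$per_core"

# awk does the floating-point comparison; bash arithmetic is integer-only
if awk -v p="$per_core" 'BEGIN { exit !(p > 1.0) }'; then
  echo "WARNING: 1-minute load exceeds core count"
fi
```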

18💼

Interview Questions

40+ Linux Q&A for DevOps

These Linux questions come up again and again in DevOps interviews, from fresher to senior positions.

Commands & Files

Difference between find and grep?

find searches for FILES by name, size, date, permissions. grep searches INSIDE files for text patterns. find locates the file, grep reads its content.

find -mtime vs -ctime?

mtime = content modification time (the file was edited). ctime = inode change time: it updates when permissions or ownership change, and also whenever the content changes. ctime is NOT creation time. -mtime -7 = modified in the last 7 days.
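The difference is easy to see on a scratch file. A sketch assuming GNU touch and find; the file name app.conf is made up for the demo:

```shell
#!/usr/bin/env bash
set -euo pipefail

demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/app.conf"

# Backdate the modification time; the kernel still stamps ctime with "now"
touch -d "10 days ago" "$demo_dir/app.conf"

# Empty: the content was last modified 10 days ago
recent_mtime=$(find "$demo_dir" -type f -mtime -7)
# Matches: the inode changed just now, when touch rewrote the timestamps
recent_ctime=$(find "$demo_dir" -type f -ctime -7)

echo "mtime -7: ${recent_mtime:-<none>}"
echo "ctime -7: ${recent_ctime:-<none>}"
```

The same file is "old" by mtime and "new" by ctime, which is why ctime is the more useful of the two when you suspect someone changed permissions or tampered with timestamps.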

How to find large files?

find / -type f -size +100M lists files larger than 100MB. Combine with du -sh /var/log/* | sort -rh | head to find largest directories.
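To try the size filter without scanning a real filesystem, here is a sketch on a temporary directory. It assumes GNU find (for -printf) and truncate; the file names and the 1 MiB threshold are invented for the demo:

```shell
#!/usr/bin/env bash
set -euo pipefail

scan_dir=$(mktemp -d)

# truncate creates files of a given apparent size without writing data
truncate -s 3M  "$scan_dir/big.log"
truncate -s 2M  "$scan_dir/medium.log"
truncate -s 10K "$scan_dir/small.log"

# Files over 1 MiB, largest first: size in bytes, then path
large_files=$(find "$scan_dir" -type f -size +1M -printf '%s %p\n' | sort -rn)
echo "$large_files"

# Path of the single largest file
largest=$(echo "$large_files" | head -n 1 | cut -d' ' -f2-)
echo "largest: $largest"
```

Note that find -size compares the file's apparent size, while du reports blocks actually used on disk, so the two can disagree for sparse files like the ones created here.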

head vs tail?

head -20 shows first 20 lines. tail -20 shows last 20 lines. tail -f follows a file in real-time (essential for debugging live logs).

Users & Permissions

chmod 755 vs 644?

755 (rwxr-xr-x): for scripts and directories. 644 (rw-r--r--): for config files and regular files. 600 (rw-------): for secrets and private keys.
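A quick way to confirm what each number means is to set it and read it back. A sketch using GNU stat; the three file names are placeholders:

```shell
#!/usr/bin/env bash
set -euo pipefail

perm_dir=$(mktemp -d)
touch "$perm_dir/deploy.sh" "$perm_dir/app.conf" "$perm_dir/id_demo_key"

chmod 755 "$perm_dir/deploy.sh"     # owner rwx, group r-x, others r-x
chmod 644 "$perm_dir/app.conf"      # owner rw-, group r--, others r--
chmod 600 "$perm_dir/id_demo_key"   # owner rw-, nobody else

# %a = octal mode, %A = symbolic mode, %n = file name
stat -c '%a %A %n' "$perm_dir"/*

key_mode=$(stat -c '%a' "$perm_dir/id_demo_key")
script_mode=$(stat -c '%A' "$perm_dir/deploy.sh")
```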

How to add user to docker group?

sudo usermod -aG docker username. The -a flag APPENDS to the user's supplementary groups. Without -a, -G REPLACES the whole supplementary group list (dangerous!). The user must log out and back in for the new group to take effect.

What happens when you install a service?

Many packages create a dedicated system user with a /usr/sbin/nologin shell and a matching group, then give that user ownership of the service's files. Security: if the service is compromised, the attacker only gets that account's limited permissions.

What is /etc/passwd vs /etc/shadow?

passwd: user accounts (username, UID, GID, home directory, shell), readable by everyone. shadow: hashed passwords and password-aging settings, readable only by root. Separated for security.
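The colon-separated layout of a passwd entry is easiest to remember after splitting one by hand. A sketch that looks up root with getent and splits the fields; the variable names are just labels for this example:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Entry layout: name:password-placeholder:UID:GID:comment:home:shell
root_entry=$(getent passwd root)
IFS=: read -r name pw uid gid comment home shell <<< "$root_entry"

echo "user=$name uid=$uid gid=$gid home=$home shell=$shell"
# The second field normally holds a placeholder, not the hash;
# the hash itself lives in /etc/shadow
echo "password field: $pw"

# Compare who may read each file (shadow may be absent in minimal containers)
ls -l /etc/passwd /etc/shadow 2>/dev/null || true
```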

Processes & Services

How to find which process uses a port?

ss -tlnp | grep :8080 or sudo lsof -i :8080 or sudo fuser 8080/tcp. Shows the PID and process name.

kill vs kill -9?

kill (SIGTERM): asks process to shut down gracefully (save data, close connections). kill -9 (SIGKILL): force kills immediately (no cleanup). Always try kill first.
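The difference shows up as soon as a process has cleanup to do. In this sketch a hypothetical worker function traps SIGTERM and writes a marker file; the marker appears after a plain kill and does not appear after kill -9:

```shell
#!/usr/bin/env bash
set -uo pipefail

workdir=$(mktemp -d)
marker="$workdir/cleaned-up"

# Hypothetical worker: on SIGTERM it records that cleanup ran, then exits
worker() {
  trap 'echo cleaned > "$marker"; exit 0' TERM
  while true; do sleep 0.2; done
}

# 1) Plain kill sends SIGTERM: the trap gets a chance to run
worker &
pid=$!
sleep 0.5
kill "$pid"
wait "$pid"
if [ -f "$marker" ]; then term_cleanup=yes; else term_cleanup=no; fi

# 2) kill -9 sends SIGKILL: the process is removed before any handler runs
rm -f "$marker"
worker &
pid=$!
sleep 0.5
kill -9 "$pid"
wait "$pid"
kill9_status=$?   # 128 + signal number 9
if [ -f "$marker" ]; then kill9_cleanup=yes; else kill9_cleanup=no; fi

echo "SIGTERM cleanup ran: $term_cleanup"
echo "SIGKILL cleanup ran: $kill9_cleanup (exit status $kill9_status)"
```

Real services use the same mechanism to flush buffers and close connections, which is why SIGKILL is the last resort.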

nohup purpose?

nohup command & runs a process that keeps running after you close the terminal. Without nohup, background processes usually receive SIGHUP and die when your SSH session disconnects.

systemctl reload vs restart?

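reload: asks the service to re-read its config without stopping (no dropped connections, for services that support it, such as Nginx). restart: stops and starts (brief downtime). Prefer reload for Nginx config changes in production, after validating the config with nginx -t.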

Networking & Troubleshooting

ss vs netstat?

ss is modern and faster (uses kernel netlink). netstat is older, being deprecated. Use ss -tlnp to show listening TCP ports with process names.

nc (netcat) usage?

nc -zv host port tests if a port is open. Essential for checking: can my app server reach the database on port 3306? Faster than telnet.

How to troubleshoot connectivity?

1) ping host (reachable?), 2) nc -zv host port (port open?), 3) curl endpoint (HTTP working?), 4) ss -tlnp on server (service listening?).
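Step 2 of that checklist can be scripted even on a box where nc is not installed. A sketch built on bash's /dev/tcp redirection; port_open and check are hypothetical helper names, and the 2-second timeout is an arbitrary choice:

```shell
#!/usr/bin/env bash
set -uo pipefail

# port_open HOST PORT
# Succeeds if a TCP connection can be opened within 2 seconds.
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# check HOST PORT: print a one-line verdict, return non-zero on failure
check() {
  local host=$1 port=$2
  if port_open "$host" "$port"; then
    echo "OK   $host:$port is accepting connections"
  else
    echo "FAIL $host:$port is closed, filtered, or unreachable"
    return 1
  fi
}

# Port 1 on localhost is almost never in use, so this normally reports FAIL
result=$(check 127.0.0.1 1) || true
echo "$result"
```

In practice you would call check with the real target, for example the database host and port 3306, from the app server that cannot connect.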

What is /etc/hosts?

Local hostname override file. Maps hostnames to IPs. With the default resolver order it is checked BEFORE DNS servers. Use for testing: add 10.0.1.50 api.myapp.com to test against a specific server.

Storage & Disk

df vs du?

df shows filesystem-level usage (how full is /dev/sda1). du shows directory-level usage (how big is /var/log). df for overview, du for drilling down.
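The "df for overview, du for drilling down" workflow, sketched as a script. The 80% threshold and the choice of /var/log as the place to drill into are assumptions for the example:

```shell
#!/usr/bin/env bash
set -euo pipefail

threshold=80   # assumed alert threshold, in percent

# df -P prints one stable line per filesystem; column 5 is Use%
usage=$(df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }')
echo "/ is ${usage}% full"

if [ "$usage" -ge "$threshold" ]; then
  # Drill down with du: largest entries under /var/log first
  du -sh /var/log/* 2>/dev/null | sort -rh | head -n 5 || true
fi
```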

How to mount a volume?

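mkdir /data, mount /dev/xvdf /data, then add an entry to /etc/fstab to make the mount permanent. In AWS, after attaching a new, empty EBS volume you must format it (mkfs) before mounting. Never run mkfs on a volume that already holds data (for example, one restored from a snapshot): formatting erases it.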

iostat purpose?

Shows disk I/O statistics. %util near 100% = disk bottleneck. High await = slow disk. Essential for diagnosing slow database or application performance.

What is swap?

Virtual memory on disk. When RAM is full, Linux moves inactive pages to swap. High swap usage = need more RAM. Check with free -h and swapon --show.
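The numbers behind free -h come from /proc/meminfo, so swap usage can also be read directly in a script. A minimal sketch; the output wording is made up:

```shell
#!/usr/bin/env bash
set -euo pipefail

# /proc/meminfo reports these values in kB
swap_total=$(awk '/^SwapTotal:/ { print $2 }' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ { print $2 }' /proc/meminfo)
swap_used=$((swap_total - swap_free))

if [ "$swap_total" -eq 0 ]; then
  echo "No swap configured"
else
  pct=$((100 * swap_used / swap_total))
  echo "Swap used: $((swap_used / 1024)) MiB of $((swap_total / 1024)) MiB (${pct}%)"
fi
```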

✓Master cd, ls, grep, find, cat, tail -f — they're 80% of your daily work

✓Know chmod numbers: 755 (scripts), 644 (files), 600 (secrets)

✓Always use set -euo pipefail in bash scripts

✓Use ss -tlnp (not netstat) for port checking

✓Know systemctl start/stop/restart/reload/enable/status

✓Know user management: useradd, usermod -aG, groups, /etc/passwd

✓Know journalctl -u service -f for live log monitoring

✓Know cron syntax: minute hour day month weekday command

✓Know tar -czvf (create) and tar -xzvf (extract)

✓Understand Linux directory structure (/etc, /var/log, /opt, /proc)