Production-Ready Kubernetes Deployment: From Zero to Hero
A comprehensive guide to deploying, managing, and scaling containerized applications in production Kubernetes clusters
Introduction
Kubernetes has become the de facto standard for container orchestration in modern cloud-native applications. However, moving from development to production requires understanding numerous concepts, best practices, and potential pitfalls that aren't immediately obvious.
This tutorial walks you through building a production-grade Kubernetes deployment from scratch. We'll cover cluster architecture, deployment strategies, monitoring, security, and disaster recovery - everything you need to confidently run applications in production.
By the end of this guide, you'll have hands-on experience deploying a complete microservices application with proper configuration, monitoring, and operational practices.
Prerequisites
Before starting, you should have:
- Basic Docker knowledge (containers, images, Dockerfiles)
- Understanding of YAML syntax
- Familiarity with command-line tools
- Access to a cloud provider (AWS, GCP, or Azure) or local Kubernetes cluster
- kubectl installed on your machine
Understanding Kubernetes Architecture
Kubernetes is a distributed system with several key components:
Control Plane Components:
- API Server: Central management point providing REST API for all operations
- etcd: Distributed key-value store for cluster state storage
- Scheduler: Assigns pods to nodes and optimizes resource allocation
- Controller Manager: Maintains desired state and manages various controllers
Worker Node Components:
- kubelet: Agent running on each node managing pods
- kube-proxy: Network proxy for service discovery and load balancing
- Container Runtime: Docker, containerd, or CRI-O for running containers
Core Concepts
- Pods: Smallest deployable units; contain one or more containers
- Deployments: Manage replica sets and rolling updates
- Services: Stable networking endpoints for pods
- ConfigMaps/Secrets: Configuration and sensitive data
- Ingress: HTTP/HTTPS routing to services
- Namespaces: Virtual clusters for resource isolation
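You can explore any of these objects and their documented fields directly from the API server, for example:
# List the resource kinds used in this guide
kubectl api-resources | grep -Ei 'deployments|services|configmaps|secrets|ingresses|namespaces'
# Show the documented fields for a Deployment's update strategy and container resources
kubectl explain deployment.spec.strategy
kubectl explain pod.spec.containers.resources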
Setting Up Your First Cluster
Let's create a production-ready cluster using a managed Kubernetes service.
Using AWS EKS
# Install eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
# Create cluster with production configuration
eksctl create cluster \
--name production-cluster \
--version 1.28 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 3 \
--nodes-max 6 \
--managed \
--vpc-nat-mode Single
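eksctl normally writes the new cluster's credentials into your kubeconfig automatically. If you are working from another machine, or the context is missing, you can generate it with the AWS CLI:
aws eks update-kubeconfig --region us-west-2 --name production-cluster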
Verify Cluster Access
# Get cluster information
kubectl cluster-info
# Check nodes
kubectl get nodes
# Expected output:
# NAME                                          STATUS   ROLES    AGE   VERSION
# ip-192-168-10-23.us-west-2.compute.internal   Ready    <none>   5m    v1.28.0
# ip-192-168-45-67.us-west-2.compute.internal   Ready    <none>   5m    v1.28.0
# ip-192-168-78-90.us-west-2.compute.internal   Ready    <none>   5m    v1.28.0
Deploying Your First Application
Let's deploy a microservices application with frontend, backend, and database components.
Application Architecture
Ingress (nginx-ingress)
 |
 +-- /api/*  -->  Backend Service   -->  Backend Pods (REST API)
 |                                         |
 |                                         +-->  Database Service  -->  PostgreSQL Pod (StatefulSet)
 |
 +-- /*      -->  Frontend Service  -->  Frontend Pods (React App)
Backend Deployment
Create backend-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: production
  labels:
    app: backend
    tier: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
        tier: api
        version: v1.2.0
    spec:
      # Security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      # Service account
      serviceAccountName: backend-sa
      containers:
      - name: api
        image: myregistry.io/backend:v1.2.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        # Environment variables from ConfigMap and Secret
        env:
        - name: NODE_ENV
          value: production
        - name: PORT
          value: "8080"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: url
        - name: REDIS_HOST
          valueFrom:
            configMapKeyRef:
              name: backend-config
              key: redis.host
        # Resource requests and limits
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        # Health checks
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
        # Security
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        # Volume mounts
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      # Affinity rules
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - backend
              topologyKey: kubernetes.io/hostname
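The Deployment references serviceAccountName: backend-sa, which is not defined anywhere else in this guide. A minimal manifest for it might look like the sketch below; append it to backend-deployment.yaml above the Deployment (separated by ---) so the apply commands later pick it up. Disabling token automount is an assumption that the API container never needs to call the Kubernetes API.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backend-sa
  namespace: production
# Assumption: the backend does not call the Kubernetes API, so skip mounting a token
automountServiceAccountToken: false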
Backend Service
Create backend-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: production
  labels:
    app: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  sessionAffinity: None
ConfigMap and Secrets
Create backend-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config
  namespace: production
data:
  redis.host: "redis-service.production.svc.cluster.local"
  redis.port: "6379"
  log.level: "info"
  cache.ttl: "3600"
---
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: production
type: Opaque
stringData:
  url: "postgresql://appuser:strong-random-password-here@postgres-service:5432/appdb?sslmode=require"
  username: "appuser"
  password: "strong-random-password-here"
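Checking real credentials into Git as stringData defeats the purpose of a Secret. One option, sketched here (in practice you may prefer an external secret manager or Sealed Secrets), is to generate the password and create the Secret imperatively once the production namespace exists:
# Generate a random password and create the Secret without writing it to disk
DB_PASSWORD="$(openssl rand -hex 20)"
kubectl create secret generic database-credentials \
  --namespace production \
  --from-literal=username=appuser \
  --from-literal=password="$DB_PASSWORD" \
  --from-literal=url="postgresql://appuser:${DB_PASSWORD}@postgres-service:5432/appdb?sslmode=require"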
Frontend Deployment
Create frontend-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        tier: web
    spec:
      containers:
      - name: nginx
        image: myregistry.io/frontend:v1.2.0
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 80
Database StatefulSet
Create postgres-statefulset.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  serviceName: postgres-service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
          name: postgres
        env:
        - name: POSTGRES_DB
          value: appdb
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - appuser
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - appuser
          initialDelaySeconds: 10
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp3
      resources:
        requests:
          storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: production
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
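The volumeClaimTemplate asks for a gp3 storage class. EKS does not create one by default (the pre-installed class is gp2), and gp3 volumes require the AWS EBS CSI driver add-on, so you may need a StorageClass along these lines, applied before the StatefulSet:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
# Assumes the AWS EBS CSI driver add-on is installed in the cluster
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true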
Ingress Configuration
Install NGINX Ingress Controller:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.1/deploy/static/provider/aws/deploy.yaml
Create ingress.yaml. Note that the often-copied rewrite-target: / annotation is deliberately omitted: with these Prefix paths it would rewrite every matched request to /, breaking both the /api routes and the frontend's asset paths.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-cert
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
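The cert-manager.io/cluster-issuer annotation assumes cert-manager is already installed in the cluster and that a ClusterIssuer named letsencrypt-prod exists. A typical issuer for Let's Encrypt with HTTP-01 validation through the NGINX ingress looks roughly like this sketch (the email address is a placeholder):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    # Placeholder - use a real contact address for expiry notices
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx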
Deploy Everything
# Create namespace
kubectl create namespace production
# Apply configurations
kubectl apply -f backend-config.yaml
kubectl apply -f backend-deployment.yaml
kubectl apply -f backend-service.yaml
kubectl apply -f frontend-deployment.yaml
kubectl apply -f postgres-statefulset.yaml
kubectl apply -f ingress.yaml
# Check deployment status
kubectl get all -n production
# Watch rollout
kubectl rollout status deployment/backend -n production
kubectl rollout status deployment/frontend -n production
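Once both rollouts report success, a quick smoke test confirms the services answer before you rely on the Ingress. This assumes the health endpoint defined in the backend probes above:
# Frontend should return the app's index page
kubectl port-forward -n production svc/frontend 8080:80 &
curl -I http://localhost:8080/
# Backend health endpoint through its ClusterIP service
kubectl port-forward -n production svc/backend 8081:80 &
curl http://localhost:8081/health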
Monitoring with Prometheus and Grafana
Install Prometheus Stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
Access Grafana:
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
Default credentials: admin / prom-operator
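To have Prometheus scrape the backend, add a ServiceMonitor. This sketch assumes the API exposes metrics at /metrics (not shown in this guide) and that the Helm release is named prometheus, which the operator's default selector matches on:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend
  namespace: production
  labels:
    # Must match the kube-prometheus-stack release name so the operator picks it up
    release: prometheus
spec:
  selector:
    matchLabels:
      app: backend
  endpoints:
  - port: http        # named port on the backend Service
    path: /metrics    # assumption: the API exposes Prometheus metrics here
    interval: 30s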
Logging with EFK Stack
Deploy Elasticsearch, Fluentd, and Kibana:
# Add Elastic Helm repo
helm repo add elastic https://helm.elastic.co
helm repo update
# Install Elasticsearch
helm install elasticsearch elastic/elasticsearch \
--namespace logging \
--create-namespace \
--set replicas=3 \
--set volumeClaimTemplate.resources.requests.storage=30Gi
# Install Kibana
helm install kibana elastic/kibana \
--namespace logging \
--set service.type=LoadBalancer
# Install Fluentd
kubectl apply -f https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/fluentd-daemonset-elasticsearch.yaml
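Note that the stock Fluentd manifest assumes an Elasticsearch service named elasticsearch-logging, while the Helm chart above creates elasticsearch-master in the logging namespace, so in practice you will likely need to download and adjust the manifest rather than applying it directly (a sketch):
# Fetch the manifest, point it at the Helm-installed Elasticsearch, then apply
curl -LO https://raw.githubusercontent.com/fluent/fluentd-kubernetes-daemonset/master/fluentd-daemonset-elasticsearch.yaml
# Edit FLUENT_ELASTICSEARCH_HOST to elasticsearch-master.logging.svc.cluster.local, then:
kubectl apply -f fluentd-daemonset-elasticsearch.yaml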
Autoscaling
Horizontal Pod Autoscaler
Create hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 60
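Resource-based HPAs read CPU and memory through the metrics API, and EKS does not install metrics-server by default, so install it before applying the autoscaler and then check that the targets resolve:
# Metrics API backend required for CPU/memory utilization targets
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Apply and inspect the autoscaler
kubectl apply -f hpa.yaml
kubectl get hpa backend-hpa -n production
kubectl describe hpa backend-hpa -n production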
Cluster Autoscaler
Enable cluster autoscaling:
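# IAM Roles for Service Accounts needs an IAM OIDC provider on the cluster;
# the eksctl create cluster command above did not associate one, so do that first
eksctl utils associate-iam-oidc-provider --cluster production-cluster --approve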
eksctl create iamserviceaccount \
--name cluster-autoscaler \
--namespace kube-system \
--cluster production-cluster \
--attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
--approve
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
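The auto-discover example manifest ships with a placeholder cluster name, so after applying it, point auto-discovery at this cluster's node group tags and check the logs (the tag values below follow the tutorial's cluster name; verify them on your Auto Scaling groups):
kubectl -n kube-system edit deployment cluster-autoscaler
# In the container args, set:
#   --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production-cluster
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20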
Backup and Disaster Recovery
Install Velero for backup:
# Install Velero CLI
wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.0/velero-v1.12.0-linux-amd64.tar.gz
tar -xvf velero-v1.12.0-linux-amd64.tar.gz
sudo mv velero-v1.12.0-linux-amd64/velero /usr/local/bin/
# Install Velero in cluster
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket velero-backups \
--backup-location-config region=us-west-2 \
--snapshot-location-config region=us-west-2 \
--secret-file ./credentials-velero
# Create backup schedule
velero schedule create daily-backup \
--schedule="0 2 * * *" \
--include-namespaces production \
--ttl 720h
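To confirm the schedule is running and to rehearse a restore (the backup name below is an example; take the real one from velero backup get):
velero schedule get
velero backup get
# Restore the production namespace from a chosen backup
velero restore create --from-backup daily-backup-20240101020000 --include-namespaces production
velero restore get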
Security Best Practices
Network Policies
Create network-policy.yaml:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    - namespaceSelector:
        matchLabels:
          # Automatically set on every namespace since Kubernetes 1.21
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
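The policy above only governs backend pods. A common complement, sketched here (make sure your CNI actually enforces NetworkPolicy before relying on it), is a namespace-wide default deny so that any pod without an explicit policy can neither send nor receive traffic:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  # Empty selector matches every pod in the namespace
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress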
Pod Security Standards
PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes 1.25, so it cannot be applied to the 1.28 cluster created earlier. Enforce the built-in Pod Security Standards by labeling the namespace instead. Here enforce is set to baseline (the frontend and database manifests above do not yet satisfy the restricted profile), while warn and audit flag anything that falls short of restricted:
kubectl label namespace production \
  --overwrite \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted
Troubleshooting
Common debugging commands:
# Check pod logs
kubectl logs -f deployment/backend -n production
# Describe pod for events
kubectl describe pod <pod-name> -n production
# Execute command in pod
kubectl exec -it <pod-name> -n production -- /bin/sh
# Check resource usage
kubectl top pods -n production
kubectl top nodes
# Debug networking
kubectl run debug --image=nicolaka/netshoot -it --rm -- /bin/bash
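Two more checks that often save time:
# Recent events, newest last - quickest way to spot scheduling or image-pull failures
kubectl get events -n production --sort-by=.lastTimestamp
# Confirm a Service actually has ready endpoints behind it
kubectl get endpoints backend -n production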
Conclusion
You now have a production-ready Kubernetes deployment with:
- Multi-tier application architecture
- Proper resource management and limits
- Health checks and graceful shutdown
- Horizontal and cluster autoscaling
- Monitoring with Prometheus/Grafana
- Centralized logging with EFK
- Automated backups with Velero
- Security policies and network isolation
- SSL/TLS termination
- Rolling updates with zero downtime
Next Steps
- Implement GitOps with ArgoCD or Flux
- Add service mesh (Istio/Linkerd) for advanced traffic management
- Set up multi-cluster deployments
- Implement chaos engineering with Chaos Mesh
- Add advanced observability with distributed tracing
This tutorial is part of the PlayHve DevOps & Cloud series. Master cloud-native technologies with our comprehensive guides.