Ever found yourself in a situation where adding a single client configuration to your nginx reverse proxy requires a significant maintenance window? I encountered this scenario in production, and what I discovered reveals a fundamental flaw in how many teams architect multi-client nginx deployments in Kubernetes.
The Setup
We were running a multi-tenant nginx reverse proxy in Kubernetes, serving dozens of client websites through a single nginx instance:
- Multiple client domains (client1.example.com, client2.example.com, etc.)
- Single Kubernetes pod containing nginx + logging pipeline
- All client configurations managed through ConfigMaps
- Persistent volume for log storage
The workflow looked simple on paper: update the ConfigMap with the new client config, restart the pod, and you're done. Except you're never actually done.
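To make the coupling concrete, the original setup kept every client's server block in a single ConfigMap, roughly like the sketch below (names and backends are illustrative, not the actual production config). Adding client N+1 means editing and re-rolling the one config object that serves clients 1 through N as well:

```yaml
# Hypothetical sketch of the monolithic config (illustrative names)
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  clients.conf: |
    server {
      listen 80;
      server_name client1.example.com;
      location / { proxy_pass http://client1-backend; }
    }
    server {
      listen 80;
      server_name client2.example.com;
      location / { proxy_pass http://client2-backend; }
    }
    # ...one server block per client, dozens in total
```

Because nginx only reads this file at startup (or on reload), any change to any client's block touches the shared pod that serves everyone.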
┌─────────────────────────────────────────────────────────────────┐
│ Internet │
│ client1.example.com ──┐ │
│ client2.example.com ──┼─→ DNS Points to AWS Elastic IP │
│ client3.example.com ──┘ (Static Public IP: 203.0.113.10) │
└─────────────────────────────────┬───────────────────────────────┘
│
┌─────────────────▼────────────────┐
│ AWS EC2 Instance │
│ (Elastic IP: 203.0.113.10) │
│ │
│ ┌─────────────────────────────┐ │
│ │ Kubernetes Cluster │ │
│ │ │ │
│ │ ┌─────────────────────────┐│ │
│ │ │ Single Pod ││ │
│ │ │ ┌─────────┐ ┌────────┐ ││ │
│ │ │ │ nginx │ │ MQ │ ││ │
│ │ │ │(reverse │ │ │ ││ │
│ │ │ │ proxy) │ │ │ ││ │
│ │ │ └─────────┘ └────────┘ ││ │
│ │ │ │ ┌────────┐ ││ │
│ │ │ │ │Filter │ ││ │
│ │ │ │ │ API │ ││ │
│ │ │ ▼ └────────┘ ││ │
│ │ │ [PVC Logs] ││ │
│ │ └─────────────────────────┘│ │
│ └─────────────────────────────┘ │
└──────────────────────────────────┘
The Hidden Problems
Issue 1: Full Service Downtime
Every ConfigMap change requires a complete pod restart:
kubectl apply -f updated-nginx-configmap.yaml
kubectl rollout restart deployment/nginx-deployment
This triggers a cascade of problems that turns a 2-minute configuration change into a 30-minute production incident.
Issue 2: All-or-Nothing Availability
With a single nginx pod handling all client traffic, any restart takes every client site down simultaneously. There's no graceful failover and no partial service - it's a complete outage for everyone.
Issue 3: Scaling Creates Log Chaos
The obvious solution seems to be "add more replicas for high availability," but this creates different problems:
- Multiple nginx pods writing the same logs to different locations
- Logging pipeline gets overwhelmed with duplicate data
- Storage costs multiply unnecessarily
- Debugging becomes impossible with redundant log entries
Issue 4: Resource Multiplication
Each nginx replica requires its own full container set (nginx + MQ + filtering API). Three replicas means nine containers total, tripling your resource usage for what should be a simple availability improvement.
The Root Cause
The fundamental issue isn't with ConfigMaps or pod restarts - it's architectural. We've tightly coupled:
- Network identity: The static IP clients depend on
- Compute identity: The specific node where our pod runs
- Processing identity: The containers handling logs
When any component needs updates, everything must restart together.
The Solution Architecture
The fix requires separating these concerns and implementing true rolling updates. Here's the corrected architecture:
Improved Architecture - Separated Concerns
Multi-pod nginx with Network Load Balancer and centralized logging
┌─────────────────────────────────────────────────────────────────┐
│ Internet │
│ client1.example.com ──┐ │
│ client2.example.com ──┼─→ DNS Points to Static Elastic IPs │
│ client3.example.com ──┘ │
└─────────────────────────────────┬───────────────────────────────┘
│
┌─────────────────▼────────────────┐
│ Network Load Balancer │
│ (Static Elastic IPs) │
└─────────────────┬────────────────┘
│
┌─────────────────────▼─────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │nginx-pod1│ │nginx-pod2│ │nginx-pod3│ │
│ │(nginx │ │(nginx │ │(nginx │ │
│ │ only) │ │ only) │ │ only) │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
│ │ │ │ │
│ └──── logs ────┼──── logs ────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Shared EFS │ │
│ │ (Single PVC) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Centralized │ │
│ │ Logging Pod │ │
│ │ ┌──────────┐ │ │
│ │ │ MQ │ │ │
│ │ └──────────┘ │ │
│ │ ┌──────────┐ │ │
│ │ │Filter API│ │ │
│ │ └──────────┘ │ │
│ └─────────────────┘ │
└──────────────────────────────────────────────┘
Implementation Details
1. Shared Storage Foundation
# EFS StorageClass for shared access
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-xxxxxxxxx
  directoryPerms: "0755"
---
# Single PVC shared by all nginx pods
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nginx-logs
spec:
  accessModes:
    - ReadWriteMany  # Critical: supports multiple writers
  storageClassName: efs-sc
  resources:
    requests:
      storage: 100Gi
2. Network Load Balancer for Static IPs
apiVersion: v1
kind: Service
metadata:
  name: nginx-nlb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-eip-allocations: "eipalloc-12345678,eipalloc-87654321"
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/health"
spec:
  type: LoadBalancer
  selector:
    app: nginx-reverse-proxy
  ports:
    - name: http
      port: 80
      targetPort: 80
3. Fast Rolling Update Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-reverse-proxy
spec:
  replicas: 3  # Multiple replicas for rolling updates
  selector:
    matchLabels:
      app: nginx-reverse-proxy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Always keep at least 2 pods running
      maxSurge: 1        # Add 1 extra pod during updates
  template:
    metadata:
      labels:
        app: nginx-reverse-proxy  # Must match the NLB Service selector
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: nginx
          image: nginx:1.25-alpine
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
            - name: shared-logs
              mountPath: /var/log/nginx
          # Aggressive health checks for fast updates
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 2
          lifecycle:
            preStop:
              exec:
                # Give the NLB time to deregister the pod before nginx exits
                command: ["/bin/sh", "-c", "sleep 10"]
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-config
        - name: shared-logs
          persistentVolumeClaim:
            claimName: shared-nginx-logs
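One detail worth calling out: the readinessProbe above expects nginx to answer on /health, which a stock nginx image does not do. A minimal sketch of that endpoint, assuming the conf.d layout mounted above (the file name is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  health.conf: |
    server {
      listen 80 default_server;
      location /health {
        access_log off;   # keep probe traffic out of client logs
        return 200 "ok\n";
      }
    }
```

Both the kubelet probe and the NLB health check (via the healthcheck-path annotation) can then hit the same endpoint.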
4. Centralized Logging Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: centralized-logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: centralized-logging
  template:
    metadata:
      labels:
        app: centralized-logging
    spec:
      containers:
        - name: mq
          image: rabbitmq:3.12-alpine
          ports:
            - containerPort: 5672
        - name: filter-api
          image: your-registry/log-filter-api:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: shared-logs
              mountPath: /var/log/nginx
              readOnly: true  # Logging pod only reads; nginx pods write
      volumes:
        - name: shared-logs
          persistentVolumeClaim:
            claimName: shared-nginx-logs
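One caveat with multiple writers on a single PVC: several nginx pods appending to the same file can interleave writes. A simple way around this, sketched below under the assumption that file names are free to choose, is to give each pod its own log file. nginx allows variables in access_log paths, and $hostname resolves to the pod name inside a container:

```nginx
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer"';

# $hostname is the pod name, so no two pods share a log file
access_log /var/log/nginx/access-$hostname.log main;

# With a variable in the path nginx reopens the file per request;
# cache the descriptors to keep that cheap
open_log_file_cache max=16 inactive=60s;
```

The centralized logging pod then tails the whole directory instead of a single file, and each request still appears exactly once.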
Why This Architecture Works
Separation of Concerns
- Nginx pods handle only reverse proxy duties
- Logging pod handles only log processing
- Shared storage layer enables communication between them
Fast Rolling Updates
Configuration updates now take 15-30 seconds instead of 30 minutes:
kubectl apply -f updated-nginx-config.yaml
kubectl rollout restart deployment/nginx-reverse-proxy
kubectl rollout status deployment/nginx-reverse-proxy
# Full rollout completes in ~15-30 seconds, with at least 2 pods serving throughout
No Log Redundancy
- Single shared PVC eliminates duplicate log entries
- Centralized logging processes each request exactly once
- Resource usage scales predictably
Zero Manual Intervention
- NLB handles all IP routing automatically
- Kubernetes manages pod lifecycle
- Health checks prevent traffic to unhealthy pods
The Results
Before this architecture:
- 30-minute maintenance windows for config changes
- All client sites down during updates
- Manual intervention required for every change
- Unpredictable resource usage with replicas
After implementation:
- 15-30 second rolling updates
- Zero simultaneous client downtime
- Fully automated configuration deployment
- Predictable resource scaling
The transformation is dramatic. What used to be emergency maintenance windows requiring coordination with clients became routine deployments that happen transparently during business hours.
Limitations and Trade-offs
AWS Constraints:
- Security group rule limits can be reached with many NLBs
- Target limits of 500 per Availability Zone per load balancer
- Roughly 55,000 simultaneous connections per target from a single source when client IP preservation is enabled
- Cannot share NLB across multiple services
Cost Considerations:
- NLB costs ~$16/month per load balancer in hourly charges, plus usage-based LCU charges
- EFS storage costs scale with usage
- Additional complexity in monitoring
Operational Complexity:
- More moving parts to monitor
- Shared storage introduces potential bottlenecks
- Requires AWS-specific knowledge
Despite these limitations, the operational benefits far outweigh the constraints for most multi-client deployments.
Key Lessons Learned
- Separate scaling concerns: Don't force nginx scaling and log processing to scale together
- Decouple network identity: Static IPs should live with load balancers, not pods
- Design for rolling updates: Health checks and graceful shutdown are critical
- Shared storage patterns: EFS with ReadWriteMany enables elegant solutions
- Monitor the right metrics: Focus on rolling update success rates and health check latency
The Bottom Line
The key insight isn't about any specific technology - it's about architectural separation. By decoupling nginx scaling from logging scaling, and separating network identity from pod identity, we eliminate the constraints that forced us into all-or-nothing deployments.
This approach proves that sometimes the best solution isn't adding more complexity, but removing the artificial constraints that create complexity in the first place. Your nginx reverse proxy can be both highly available and operationally simple - you just need to architect it correctly from the start.
For teams running multi-client nginx deployments in Kubernetes, this architecture provides a clear path from maintenance-heavy operations to truly cloud-native, automated deployments. The difference between 30-minute outages and 30-second updates isn't just technical - it's the difference between infrastructure that fights your business growth and infrastructure that enables it.
Related Reading
This article builds on concepts from our previous post about The Hidden Manual Work Behind Kubernetes Pod Restarts. If you haven't read that yet, it provides essential background on why manual intervention becomes necessary and how Network Load Balancers solve fundamental architectural problems in Kubernetes deployments.