Troubleshooting

Common Proxmox VE infrastructure problems and the commands used to diagnose and fix them.

🔧 System-Level Issues

Boot and Startup Problems

System Won’t Boot:

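# Boot from rescue mode or a live USB

# Check filesystem integrity while the root filesystem is still unmounted
fsck /dev/mapper/pve-root

# Mount the root filesystem and the virtual filesystems the chroot needs
mount /dev/mapper/pve-root /mnt
for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done

# Regenerate the GRUB configuration and reinstall the boot loader
# (/dev/sda is an example; use the actual boot disk)
chroot /mnt
update-grub
grub-install /dev/sda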

Kernel Panic Issues:

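# Boot with previous kernel from GRUB menu
# Remove the problematic kernel (substitute the real package name;
# list installed kernels with: dpkg -l | grep -E 'pve-kernel|proxmox-kernel')
apt remove pve-kernel-problematic-version

# Check the kernel log for hardware or driver errors
dmesg | grep -i error

# Blacklist a problematic module, then rebuild the initramfs so it applies at boot
echo "blacklist module_name" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u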

Service Startup Failures:

# Check service status
systemctl status pveproxy pvedaemon pve-cluster

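# Check recent service logs from the current boot
journalctl -b -u pveproxy -u pvedaemon -u pve-cluster --no-pager -n 50

# Follow all three units live (Ctrl+C to stop)
journalctl -u pveproxy -u pvedaemon -u pve-cluster -f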

# Restart services in order
systemctl restart pve-cluster
systemctl restart pvedaemon
systemctl restart pveproxy
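
The order matters because pvedaemon and pveproxy read their configuration and certificates from /etc/pve, which pve-cluster provides. A minimal sketch of a wrapper that stops at the first failed restart; the function name `restart_in_order` and its `--dry-run` flag are illustrative and not part of Proxmox:

```shell
#!/bin/sh
# Hypothetical helper: restart the Proxmox services in dependency order.
# With --dry-run it only prints the commands it would run.
restart_in_order() {
    for svc in pve-cluster pvedaemon pveproxy; do
        if [ "${1:-}" = "--dry-run" ]; then
            echo "systemctl restart $svc"
        else
            # Stop at the first failure so later services are not restarted
            # on top of a broken dependency.
            systemctl restart "$svc" || {
                echo "restart of $svc failed" >&2
                return 1
            }
        fi
    done
}
```

Call `restart_in_order --dry-run` first to confirm the sequence, then `restart_in_order` as root.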

Network Connectivity Issues

No Network Access:

# Check interface status
ip link show
ip addr show

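# Check bridge configuration (brctl needs the bridge-utils package;
# "bridge link show" from iproute2 is an alternative)
brctl show

# Restart networking (on hosts with ifupdown2, "ifreload -a" applies
# /etc/network/interfaces without a full restart)
systemctl restart networking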

# Check routing table
ip route show

# Test connectivity
ping -c 3 8.8.8.8

Bridge Configuration Problems:

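# Recreate bridge (this drops every connection that runs over vmbr0;
# run it from the local console or IPMI, not over SSH)
ip link del vmbr0
systemctl restart networking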

# Check interface configuration
cat /etc/network/interfaces

# Verify bridge creation
brctl show vmbr0
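
When a bridge comes up without its uplink, it helps to see which ports the configuration file assigns to each bridge. A small sketch, assuming the usual ifupdown layout where a `bridge-ports` line follows its `iface` stanza; `bridge_ports` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read an ifupdown interfaces file on stdin and print
# each bridge together with the ports assigned to it.
bridge_ports() {
    awk '
        $1 == "iface" { name = $2 }      # remember the current stanza
        $1 == "bridge-ports" || $1 == "bridge_ports" {
            $1 = ""                      # drop the keyword, keep the port list
            sub(/^ +/, "")
            print name ": " $0
        }
    '
}
```

Usage: `bridge_ports < /etc/network/interfaces`, then compare the result with the live state from `brctl show`.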

DNS Resolution Issues:

# Check DNS configuration
cat /etc/resolv.conf

# Test DNS resolution
nslookup google.com
dig google.com

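# Flush DNS cache (only applies if systemd-resolved is running; a default
# Proxmox install typically uses a plain /etc/resolv.conf with no local cache)
systemctl restart systemd-resolved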

🗄️ Storage Issues

ZFS Problems

Pool Import Failures:

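# List pools that are available for import
zpool import

# Check status of pools that are already imported
zpool status

# Force import pool (only if it is not in use on another host)
zpool import -f poolname

# Import pool with different name
zpool import poolname newpoolname

# Check for pool corruption: start a scrub, then review progress and errors
zpool scrub poolname
zpool status -v poolname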

Dataset Mount Issues:

# Check mount status
zfs list
mount | grep zfs

# Mount all datasets, or a single dataset
zfs mount -a
zfs mount poolname/dataset

# Check mount properties
zfs get mountpoint poolname/dataset

Performance Issues:

# Check I/O statistics
zpool iostat -v 1

# Check ARC statistics
cat /proc/spl/kstat/zfs/arcstats

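# Adjust ARC size at runtime (4294967296 bytes = 4 GiB; reset on reboot)
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
# To persist: add "options zfs zfs_arc_max=4294967296" to /etc/modprobe.d/zfs.conf,
# then run update-initramfs -u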
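
The zfs_arc_max parameter is given in bytes, so 4294967296 corresponds to 4 GiB (4 × 1024³). A tiny sketch for deriving other values; `gib_to_bytes` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to the byte count zfs_arc_max expects.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 4    # prints 4294967296
```

For example, `echo "$(gib_to_bytes 8)" > /sys/module/zfs/parameters/zfs_arc_max` would set an 8 GiB limit.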

Disk and Filesystem Issues

Disk Full Errors:

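# Check disk usage (errors from unreadable paths such as /proc are suppressed)
df -h
du -xsh /* 2>/dev/null | sort -hr

# Find large files on the root filesystem only
find / -xdev -type f -size +1G -exec ls -lh {} \;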

# Clean up logs
journalctl --vacuum-time=7d

# Clean package cache
apt clean
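
To spot nearly full filesystems without reading the whole df table, the output can be filtered by usage. A minimal sketch, assuming POSIX `df -P` column order and mount points without spaces; `full_filesystems` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the mount points
# whose usage is at or above the given percentage (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6 " " $5
    }'
}
```

Usage: `df -P | full_filesystems 85` lists every mount point at 85% usage or more.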

Filesystem Corruption:

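# Check filesystem (run from rescue mode; the filesystem must not be mounted)
fsck -f /dev/mapper/pve-root

# Force check on next boot (systemd-based releases may ignore /forcefsck;
# the kernel parameter fsck.mode=force is the alternative)
touch /forcefsck

# Check for bad blocks (read-only scan; /dev/sda is an example device)
badblocks -v /dev/sda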

LVM Issues:

# Scan for LVM volumes
pvscan
vgscan
lvscan

# Activate volume group
vgchange -ay pve

# Check LVM status
pvdisplay
vgdisplay
lvdisplay

🖥️ Virtualization Issues

VM Problems

VM Won’t Start:

# Check VM configuration
qm config 100

# Check VM status
qm status 100

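# Clear a stale lock (shown as a "lock:" line in qm config), but only once
# no backup, migration, or snapshot task is still running
qm unlock 100

# Check for lock files; remove one by hand only as a last resort
ls -la /var/lock/qemu-server/
rm /var/lock/qemu-server/lock-100.conf

# Print the full KVM command line to see how the VM would be started
qm showcmd 100 --pretty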
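
Proxmox records a VM lock as a `lock:` line in the VM configuration, so the output of `qm config` can be checked before anything is removed. A minimal sketch under that assumption; `vm_lock_type` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read `qm config <vmid>` output on stdin and print the
# lock type if the VM is locked; prints nothing otherwise.
vm_lock_type() {
    sed -n 's/^lock:[[:space:]]*//p'
}
```

Usage: `qm config 100 | vm_lock_type`. An empty result means no lock is set in the configuration.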

VM Performance Issues:

# Check VM resource usage from the QEMU monitor
# (type the info commands at the qm> prompt, then "quit" to leave)
qm monitor 100
info cpus
info balloon
info block

# Check host resources
htop
iotop

# Adjust VM resources
qm set 100 --memory 4096
qm set 100 --cores 4

VM Network Issues:

# Check VM network configuration
qm config 100 | grep net

# Check bridge connectivity
brctl show

# Test network from VM (requires the QEMU guest agent installed and enabled)
qm guest exec 100 -- ping -c 3 8.8.8.8

Container Problems

Container Won’t Start:

# Check container configuration
pct config 200

# Check container status
pct status 200

# Check for errors
journalctl -u pve-container@200

# Force stop and start
pct stop 200 --force
pct start 200

Container Resource Issues:

# Check container resources
pct exec 200 -- free -h
pct exec 200 -- df -h

# Adjust container limits (memory in MiB; root disk grown to 16 GiB, it cannot be shrunk)
pct set 200 --memory 2048
pct resize 200 rootfs 16G

Privileged Container Issues:

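# Enable privileged mode: the unprivileged flag is fixed at creation, so back up
# the container and restore it with the desired setting
# (/path/to/vzdump-lxc-200.tar.zst is a placeholder for the backup archive)
vzdump 200 --mode stop
pct restore 200 /path/to/vzdump-lxc-200.tar.zst --unprivileged 0 --force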

# Enable nesting for Docker
pct set 200 --features nesting=1,keyctl=1

# Check AppArmor issues (aa-complain is part of the apparmor-utils package)
aa-status
aa-complain /usr/bin/lxc-start

🐳 Docker Issues

Docker Service Problems

Docker Won’t Start:

# Check Docker status
systemctl status docker

# Check Docker logs
journalctl -u docker -f

# Restart Docker
systemctl restart docker

# Check Docker daemon configuration
cat /etc/docker/daemon.json

Container Issues:

# Check container status
docker ps -a

# Check container logs
docker logs container-name

# Restart container
docker restart container-name

# Execute into container
docker exec -it container-name bash

Docker Compose Issues:

# Check compose file syntax
# (with the Compose v2 plugin, use "docker compose" in place of "docker-compose")
docker-compose config

# View service logs
docker-compose logs service-name

# Recreate services
docker-compose down
docker-compose up -d

# Force recreate
docker-compose up -d --force-recreate

Storage and Volume Issues

# Check Docker storage usage
docker system df

# Clean up unused resources (-a also deletes every image not used by a container)
docker system prune -a

# Check volume mounts
docker volume ls
docker volume inspect volume-name

# Fix permission issues (1002:1002 is an example; use the UID:GID the container runs as)
chown -R 1002:1002 /docker/service-name/

🌐 Web Interface Issues

Proxmox Web GUI Problems

Can’t Access Web Interface:

# Check pveproxy service
systemctl status pveproxy
systemctl restart pveproxy

# Check firewall
pve-firewall status
iptables -L -n

# Check listening ports (ss replaces netstat, which needs the net-tools package)
ss -tlnp | grep :8006

# Check SSL certificates
ls -la /etc/pve/local/pve-ssl.*

Authentication Issues:

# Reset root password
passwd root

# Check PAM configuration
cat /etc/pam.d/proxmox-ve-auth

# Clear browser cache and cookies
# Try incognito/private browsing mode

Performance Issues:

# Check system load
uptime
htop

# Check memory usage
free -h

# Restart web services
systemctl restart pveproxy pvedaemon

🔍 Diagnostic Tools

System Diagnostics

cat > /usr/local/bin/system-diagnostics.sh << 'EOF'
#!/bin/bash

echo "=== Proxmox System Diagnostics ==="
echo "Generated: $(date)"
echo

echo "=== System Information ==="
pveversion
uname -a
uptime
echo

echo "=== CPU Information ==="
lscpu | head -20
echo

echo "=== Memory Usage ==="
free -h
echo

echo "=== Disk Usage ==="
df -h
echo

echo "=== Network Interfaces ==="
ip addr show
echo

echo "=== Bridge Status ==="
brctl show
echo

echo "=== Service Status ==="
systemctl status pveproxy pvedaemon pve-cluster --no-pager
echo

echo "=== VM Status ==="
qm list
echo

echo "=== Container Status ==="
pct list
echo

echo "=== Storage Status ==="
zpool status 2>/dev/null || echo "No ZFS pools found"
echo

echo "=== Recent Errors ==="
journalctl --since "1 hour ago" --priority=err --no-pager
EOF

chmod +x /usr/local/bin/system-diagnostics.sh

Network Diagnostics

cat > /usr/local/bin/network-diagnostics.sh << 'EOF'
#!/bin/bash

echo "=== Network Diagnostics ==="
echo "Generated: $(date)"
echo

echo "=== Interface Status ==="
ip link show
echo

echo "=== IP Configuration ==="
ip addr show
echo

echo "=== Routing Table ==="
ip route show
echo

echo "=== DNS Configuration ==="
cat /etc/resolv.conf
echo

echo "=== Connectivity Tests ==="
ping -c 3 8.8.8.8
echo

echo "=== DNS Resolution Test ==="
nslookup google.com
echo

echo "=== Port Listening ==="
ss -tlnp
echo

echo "=== Bridge Configuration ==="
brctl show
echo

echo "=== Firewall Rules ==="
iptables -L -n
EOF

chmod +x /usr/local/bin/network-diagnostics.sh

Performance Diagnostics

cat > /usr/local/bin/performance-diagnostics.sh << 'EOF'
#!/bin/bash

echo "=== Performance Diagnostics ==="
echo "Generated: $(date)"
echo

echo "=== System Load ==="
uptime
echo

echo "=== CPU Usage ==="
top -bn1 | head -20
echo

echo "=== Memory Usage ==="
free -h
head -10 /proc/meminfo
echo

echo "=== Disk I/O ==="
if command -v iostat >/dev/null; then iostat -x 1 3; else echo "iostat not found (install sysstat)"; fi
echo

echo "=== Network I/O ==="
cat /proc/net/dev
echo

echo "=== Process List ==="
ps aux --sort=-%cpu | head -20
echo

echo "=== Disk Usage ==="
df -h
echo

echo "=== ZFS ARC Stats ==="
if [ -f /proc/spl/kstat/zfs/arcstats ]; then
    # The trailing space keeps "c" and "size" from matching c_min, c_max, and similar keys
    grep -E "^(hits|misses|c|size) " /proc/spl/kstat/zfs/arcstats
fi
EOF

chmod +x /usr/local/bin/performance-diagnostics.sh

📋 Emergency Procedures

System Recovery

Boot from Rescue Mode:

  1. Boot from Proxmox installation media

  2. Select “Rescue Boot” or “Debug Mode”

  3. Mount root filesystem: mount /dev/mapper/pve-root /mnt

  4. Bind-mount /dev, /proc, and /sys under /mnt, then chroot into system: chroot /mnt

  5. Fix issues and update GRUB: update-grub

Reset to Factory Defaults:

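# Backup important data first!
# Reset network configuration (only works if an interfaces.orig copy was saved earlier)
cp /etc/network/interfaces.orig /etc/network/interfaces

# Reset Proxmox configuration (DANGEROUS: deletes all VM, container, storage,
# and user definitions held in /etc/pve)
rm -rf /etc/pve/*

# Create a new cluster named "proxmox" (skip this on a standalone node)
pvecm create proxmox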

Data Recovery:

# Mount backup storage
mount /dev/sdb1 /mnt/backup

# Restore from backup
qmrestore /mnt/backup/vzdump-qemu-100.vma.zst 100

# Restore configuration
tar -xzf /mnt/backup/pve-config.tar.gz -C /

🚨 Critical Issue Response

Service Outage Response

  1. Immediate Assessment:
     - Check system status: systemctl status
     - Check resource usage: htop, df -h
     - Check network connectivity: ping 8.8.8.8

  2. Service Recovery:
     - Restart critical services: systemctl restart pveproxy pvedaemon
     - Check VM/container status: qm list, pct list
     - Verify storage access: zpool status

  3. Communication:
     - Notify stakeholders of outage
     - Provide regular status updates
     - Document incident for post-mortem
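
The assessment and recovery checks above can be gathered into one pass so that the output is ready for the incident record. A minimal sketch; `run_check` and `triage` are hypothetical names, and a tool that is not installed is reported rather than treated as an error:

```shell
#!/bin/sh
# Hypothetical triage helper: run a command if it is installed and say so
# when it is not, instead of aborting on the first missing tool.
run_check() {
    label=$1; shift
    echo "=== $label ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "skipped: $1 not found"
    fi
}

# Collect the checks from the outage steps in one place.
triage() {
    run_check "Failed units" systemctl --failed --no-pager
    run_check "Disk usage"   df -h
    run_check "Connectivity" ping -c 3 8.8.8.8
    run_check "VMs"          qm list
    run_check "Containers"   pct list
    run_check "ZFS pools"    zpool status
}
```

Running `triage > /tmp/triage.txt` keeps a timestampable copy for the post-mortem.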

Data Loss Prevention

# Immediate backup of critical data
tar -czf /tmp/emergency-backup-$(date +%Y%m%d_%H%M%S).tar.gz /etc/pve/

# Stop services to prevent further damage
systemctl stop pveproxy pvedaemon

# Create a recursive ZFS snapshot if possible (-r includes all child datasets)
zfs snapshot -r rpool@emergency-$(date +%Y%m%d_%H%M%S)

# Copy critical files to safe location
rsync -av /etc/pve/ /mnt/backup/emergency-pve-config/

📞 Support Resources

Getting Help

Proxmox Community:
  - Forum: https://forum.proxmox.com/
  - Documentation: https://pve.proxmox.com/pve-docs/
  - Wiki: https://pve.proxmox.com/wiki/

Log Collection for Support:

# Generate a system report for support
pvereport > /tmp/pvereport.txt

# Collect system information
/usr/local/bin/system-diagnostics.sh > /tmp/system-info.txt

# Collect relevant logs
journalctl --since "24 hours ago" > /tmp/system-logs.txt
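
Support requests are easier to handle when the collected files travel as one archive. A minimal sketch that packs whichever report files exist; `bundle_reports` and the archive name are illustrative choices:

```shell
#!/bin/sh
# Hypothetical helper: pack the given report files into one timestamped
# archive inside OUTDIR and print the archive path.
# Usage: bundle_reports OUTDIR FILE...
bundle_reports() {
    outdir=$1; shift
    archive="$outdir/support-bundle-$(date +%Y%m%d_%H%M%S).tar.gz"
    # Keep only the files that exist so tar does not fail on a missing report.
    for f in "$@"; do
        if [ -f "$f" ]; then set -- "$@" "$f"; fi
        shift
    done
    if [ "$#" -eq 0 ]; then
        echo "no report files found" >&2
        return 1
    fi
    tar -czf "$archive" "$@" && echo "$archive"
}
```

Usage: `bundle_reports /tmp /tmp/system-info.txt /tmp/system-logs.txt`. Review the archive for hostnames, addresses, and credentials before sharing it.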

Professional Support:
  - Proxmox Subscription: https://www.proxmox.com/en/proxmox-ve/pricing
  - Enterprise Support: Available with subscription

📚 Additional Resources