Troubleshooting¶
Common issues and solutions for Proxmox infrastructure troubleshooting.
🔧 System-Level Issues¶
Boot and Startup Problems¶
System Won’t Boot:
# Boot from rescue mode or live USB
# Mount root filesystem
mount /dev/mapper/pve-root /mnt
# Check filesystem integrity
fsck /dev/mapper/pve-root
# Check GRUB configuration
chroot /mnt
update-grub
grub-install /dev/sda
Kernel Panic Issues:
# Boot with previous kernel from GRUB menu
# Remove problematic kernel
apt remove pve-kernel-problematic-version
# Check hardware compatibility
dmesg | grep -i error
# Disable problematic modules
echo "blacklist module_name" >> /etc/modprobe.d/blacklist.conf
Service Startup Failures:
# Check service status
systemctl status pveproxy pvedaemon pve-cluster
# Check service logs
journalctl -u pveproxy -f
journalctl -u pvedaemon -f
journalctl -u pve-cluster -f
# Restart services in order
systemctl restart pve-cluster
systemctl restart pvedaemon
systemctl restart pveproxy
Network Connectivity Issues¶
No Network Access:
# Check interface status
ip link show
ip addr show
# Check bridge configuration
brctl show
# Restart networking
systemctl restart networking
# Check routing table
ip route show
# Test connectivity
ping -c 3 8.8.8.8
Bridge Configuration Problems:
# Recreate bridge
ip link del vmbr0
systemctl restart networking
# Check interface configuration
cat /etc/network/interfaces
# Verify bridge creation
brctl show vmbr0
DNS Resolution Issues:
# Check DNS configuration
cat /etc/resolv.conf
# Test DNS resolution
nslookup google.com
dig google.com
# Flush DNS cache
systemctl restart systemd-resolved
🗄️ Storage Issues¶
ZFS Problems¶
Pool Import Failures:
# Check pool status
zpool status
# Force import pool
zpool import -f poolname
# Import pool with different name
zpool import poolname newpoolname
# Check for pool corruption
zpool scrub poolname
Dataset Mount Issues:
# Check mount status
zfs list
mount | grep zfs
# Force mount dataset
zfs mount -a
zfs mount poolname/dataset
# Check mount properties
zfs get mountpoint poolname/dataset
Performance Issues:
# Check I/O statistics
zpool iostat -v 1
# Check ARC statistics
cat /proc/spl/kstat/zfs/arcstats
# Adjust ARC size
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
Disk and Filesystem Issues¶
Disk Full Errors:
# Check disk usage
df -h
du -sh /* | sort -hr
# Find large files
find / -type f -size +1G -exec ls -lh {} \;
# Clean up logs
journalctl --vacuum-time=7d
# Clean package cache
apt clean
Filesystem Corruption:
# Check filesystem
fsck -f /dev/mapper/pve-root
# Force check on next boot
touch /forcefsck
# Check for bad blocks
badblocks -v /dev/sda
LVM Issues:
# Scan for LVM volumes
pvscan
vgscan
lvscan
# Activate volume group
vgchange -ay pve
# Check LVM status
pvdisplay
vgdisplay
lvdisplay
🖥️ Virtualization Issues¶
VM Problems¶
VM Won’t Start:
# Check VM configuration
qm config 100
# Check VM status
qm status 100
# Check for lock files
ls -la /var/lock/qemu-server/
rm /var/lock/qemu-server/lock-100.conf
# Start VM with debug
qm start 100 --debug
VM Performance Issues:
# Check VM resource usage
qm monitor 100
info cpus
info memory
info block
# Check host resources
htop
iotop
# Adjust VM resources
qm set 100 --memory 4096
qm set 100 --cores 4
VM Network Issues:
# Check VM network configuration
qm config 100 | grep net
# Check bridge connectivity
brctl show
# Test network from VM
qm guest exec 100 -- ping -c 3 8.8.8.8
Container Problems¶
Container Won’t Start:
# Check container configuration
pct config 200
# Check container status
pct status 200
# Check for errors
journalctl -u pve-container@200
# Force stop and start
pct stop 200 --force
pct start 200
Container Resource Issues:
# Check container resources
pct exec 200 -- free -h
pct exec 200 -- df -h
# Adjust container limits
pct set 200 --memory 2048
pct set 200 --rootfs local-lvm:16
Privileged Container Issues:
# Enable privileged mode
pct set 200 --unprivileged 0
# Enable nesting for Docker
pct set 200 --features nesting=1,keyctl=1
# Check AppArmor issues
aa-status
aa-complain /usr/bin/lxc-start
🐳 Docker Issues¶
Docker Service Problems¶
Docker Won’t Start:
# Check Docker status
systemctl status docker
# Check Docker logs
journalctl -u docker -f
# Restart Docker
systemctl restart docker
# Check Docker daemon configuration
cat /etc/docker/daemon.json
Container Issues:
# Check container status
docker ps -a
# Check container logs
docker logs container-name
# Restart container
docker restart container-name
# Execute into container
docker exec -it container-name bash
Docker Compose Issues:
# Check compose file syntax
docker-compose config
# View service logs
docker-compose logs service-name
# Recreate services
docker-compose down
docker-compose up -d
# Force recreate
docker-compose up -d --force-recreate
Storage and Volume Issues¶
# Check Docker storage usage
docker system df
# Clean up unused resources
docker system prune -a
# Check volume mounts
docker volume ls
docker volume inspect volume-name
# Fix permission issues
chown -R 1002:1002 /docker/service-name/
🌐 Web Interface Issues¶
Proxmox Web GUI Problems¶
Can’t Access Web Interface:
# Check pveproxy service
systemctl status pveproxy
systemctl restart pveproxy
# Check firewall
iptables -L
# Check listening ports
netstat -tlnp | grep :8006
# Check SSL certificates
ls -la /etc/pve/local/pve-ssl.*
Authentication Issues:
# Reset root password
passwd root
# Check PAM configuration
cat /etc/pam.d/proxmox-ve-auth
# Clear browser cache and cookies
# Try incognito/private browsing mode
Performance Issues:
# Check system load
uptime
htop
# Check memory usage
free -h
# Restart web services
systemctl restart pveproxy pvedaemon
🔍 Diagnostic Tools¶
System Diagnostics¶
cat > /usr/local/bin/system-diagnostics.sh << 'EOF'
#!/bin/bash
echo "=== Proxmox System Diagnostics ==="
echo "Generated: $(date)"
echo
echo "=== System Information ==="
pveversion
uname -a
uptime
echo
echo "=== CPU Information ==="
lscpu | head -20
echo
echo "=== Memory Usage ==="
free -h
echo
echo "=== Disk Usage ==="
df -h
echo
echo "=== Network Interfaces ==="
ip addr show
echo
echo "=== Bridge Status ==="
brctl show
echo
echo "=== Service Status ==="
systemctl status pveproxy pvedaemon pve-cluster --no-pager
echo
echo "=== VM Status ==="
qm list
echo
echo "=== Container Status ==="
pct list
echo
echo "=== Storage Status ==="
zpool status 2>/dev/null || echo "No ZFS pools found"
echo
echo "=== Recent Errors ==="
journalctl --since "1 hour ago" --priority=err --no-pager
EOF
chmod +x /usr/local/bin/system-diagnostics.sh
Network Diagnostics¶
cat > /usr/local/bin/network-diagnostics.sh << 'EOF'
#!/bin/bash
echo "=== Network Diagnostics ==="
echo "Generated: $(date)"
echo
echo "=== Interface Status ==="
ip link show
echo
echo "=== IP Configuration ==="
ip addr show
echo
echo "=== Routing Table ==="
ip route show
echo
echo "=== DNS Configuration ==="
cat /etc/resolv.conf
echo
echo "=== Connectivity Tests ==="
ping -c 3 8.8.8.8
echo
echo "=== DNS Resolution Test ==="
nslookup google.com
echo
echo "=== Port Listening ==="
netstat -tlnp
echo
echo "=== Bridge Configuration ==="
brctl show
echo
echo "=== Firewall Rules ==="
iptables -L -n
EOF
chmod +x /usr/local/bin/network-diagnostics.sh
Performance Diagnostics¶
cat > /usr/local/bin/performance-diagnostics.sh << 'EOF'
#!/bin/bash
echo "=== Performance Diagnostics ==="
echo "Generated: $(date)"
echo
echo "=== System Load ==="
uptime
echo
echo "=== CPU Usage ==="
top -bn1 | head -20
echo
echo "=== Memory Usage ==="
free -h
cat /proc/meminfo | head -10
echo
echo "=== Disk I/O ==="
iostat -x 1 3
echo
echo "=== Network I/O ==="
cat /proc/net/dev
echo
echo "=== Process List ==="
ps aux --sort=-%cpu | head -20
echo
echo "=== Disk Usage ==="
df -h
echo
echo "=== ZFS ARC Stats ==="
if [ -f /proc/spl/kstat/zfs/arcstats ]; then
grep -E "^(hits|misses|c|size)" /proc/spl/kstat/zfs/arcstats
fi
EOF
chmod +x /usr/local/bin/performance-diagnostics.sh
📋 Emergency Procedures¶
System Recovery¶
Boot from Rescue Mode:
Boot from Proxmox installation media
Select “Rescue Boot” or “Debug Mode”
Mount root filesystem: mount /dev/mapper/pve-root /mnt
Chroot into system: chroot /mnt
Fix issues and update GRUB: update-grub
Reset to Factory Defaults:
# Backup important data first!
# Reset network configuration
cp /etc/network/interfaces.orig /etc/network/interfaces
# Reset Proxmox configuration (DANGEROUS!)
rm -rf /etc/pve/*
# Reinitialize cluster
pvecm create proxmox
Data Recovery:
# Mount backup storage
mount /dev/sdb1 /mnt/backup
# Restore from backup
qmrestore /mnt/backup/vzdump-qemu-100.vma.zst 100
# Restore configuration
tar -xzf /mnt/backup/pve-config.tar.gz -C /
🚨 Critical Issue Response¶
Service Outage Response¶
Immediate Assessment: - Check system status: systemctl status - Check resource usage: htop, df -h - Check network connectivity: ping 8.8.8.8
Service Recovery: - Restart critical services: systemctl restart pveproxy pvedaemon - Check VM/container status: qm list, pct list - Verify storage access: zpool status
Communication: - Notify stakeholders of outage - Provide regular status updates - Document incident for post-mortem
Data Loss Prevention¶
# Immediate backup of critical data
tar -czf /tmp/emergency-backup-$(date +%Y%m%d_%H%M%S).tar.gz /etc/pve/
# Stop services to prevent further damage
systemctl stop pveproxy pvedaemon
# Create ZFS snapshot if possible
zfs snapshot rpool@emergency-$(date +%Y%m%d_%H%M%S)
# Copy critical files to safe location
rsync -av /etc/pve/ /mnt/backup/emergency-pve-config/
📞 Support Resources¶
Getting Help¶
Proxmox Community: - Forum: https://forum.proxmox.com/ - Documentation: https://pve.proxmox.com/pve-docs/ - Wiki: https://pve.proxmox.com/wiki/
Log Collection for Support:
# Generate support bundle
proxmox-backup-debug inspect datastore
# Collect system information
/usr/local/bin/system-diagnostics.sh > /tmp/system-info.txt
# Collect relevant logs
journalctl --since "24 hours ago" > /tmp/system-logs.txt
Professional Support: - Proxmox Subscription: https://www.proxmox.com/en/proxmox-ve/pricing - Enterprise Support: Available with subscription