Recovery
This guide explains how to restore your system to a previous state using backup data. KIOPS refers to the record of restore operations as Recovery History, and that label is used throughout the codebase.
Recovery is typically needed in the following situations:
- System failure: Data loss due to server failure, disk corruption, etc.
- Accidental deletion: Important resources or data were mistakenly deleted.
- Update rollback: Rolling back a problematic update to a previous state.
- Environment replication: Using backups to set up an identical environment elsewhere.
Recovery Types
KIOPS supports the following recovery flows.
- Control plane recovery: Restores the cluster from a control plane backup that includes an etcd snapshot and PKI certificates.
- Docker/Podman recovery: Restores volumes, configuration, and containers.
Before starting any recovery, keep the following in mind:
- Restoration may overwrite current data. Create a backup of the current state if needed.
- If possible, perform recovery during low-traffic hours.
- For production environments, validate recovery in a test environment first.
Control Plane Recovery
Restore your cluster from a control plane backup that includes an etcd snapshot and PKI certificates.
Control plane recovery completely reverts the cluster to the selected backup point. All changes made after the backup (resource creation, modification, or deletion) will be lost. Make sure to back up the current state first.
Step 1: Select a Backup and Click the Restore Icon
- Click [Backup Management] in the left menu.
- In the Backup List tab, set the K8s backup segmented control to Control Plane, then find the row of the backup you want to restore.
- Click the Restore icon in that row.
Step 2: Enter SSH Credentials in ControlplaneRestoreModal
The restore is performed by connecting directly to the master node via SSH.
- SSH username: SSH account for the master node
- SSH password: SSH password for the master node
- Sudo password: Enter the sudo password required to stop/restore etcd and apply the PKI.
Step 3: Enter the Confirmation Text and Start the Restore
The restore button becomes enabled only after you type the backup name exactly for confirmation. Enter the backup name as-is, then click Start Restore. (This operation completely replaces the cluster with the selected backup point in time.)
Step 4: Monitor Progress and Confirm Completion
- The backend downloads the backup bundle from object storage.
- etcd restore (etcdctl snapshot restore) and PKI certificate restoration are performed sequentially.
- After completion, check the result status in the Recovery History tab.
Step 5: Verify Cluster Status
Once restoration completes, verify that the cluster is operating normally.
```shell
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get pods -n kube-system
```
- Are all nodes in the Ready state?
- Are all kube-system Pods Running?
- Are application Pods running normally?
- Can you reach services normally?
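The first two checks above lend themselves to a small script. The following is a minimal sketch and not part of KIOPS: the function names are illustrative, and it assumes the default column layout of kubectl get nodes --no-headers (NAME, STATUS, ...) and kubectl get pods -n kube-system --no-headers (NAME, READY, STATUS, ...).

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Note: a cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Print the names of Pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods -n <namespace> --no-headers` output on stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example usage (requires kubectl access to the restored cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pods -n kube-system --no-headers | unhealthy_pods
```

Empty output from both commands means every node and kube-system Pod passed the check. Application Pods and service reachability still need to be verified separately.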
Docker/Podman Recovery
Restore volumes, configuration, and containers from a Docker/Podman backup. Restore options mirror the backend DockerRestore model fields.
Restore Options (UI)
| UI Field Label | Option Key | Description |
|---|---|---|
| Restore Volume Data | restoreVolumes | Restores the backed-up volume data. |
| Restore Configuration Files | restoreConfig | Restores configuration files such as docker-compose. |
| Redeploy Service (Rollback) | redeployCompose | Redeploys the compose stack with the restored configuration. |
| Stop Existing Service | stopExisting | Stops currently running containers before restoring. |
The "original location / new location" options from earlier versions of the guide do not exist in the backend model. There is also no standalone "image restore (docker load)" flow because the DockerRestore model has no image field.
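To make the four option keys concrete, the sketch below prints them as a JSON object. Only the key names come from the table above. Treating each option as a boolean, the JSON shape, and the function name docker_restore_options are assumptions for illustration; this guide does not document the actual request format.

```shell
# Print a JSON object containing the four restore option keys.
# Arguments (each "true" or "false"), in order:
#   restoreVolumes restoreConfig redeployCompose stopExisting
# The JSON shape is hypothetical; only the key names come from the guide.
docker_restore_options() {
  if [ "$#" -ne 4 ]; then
    echo "expected 4 arguments, got $#" >&2
    return 1
  fi
  for value in "$@"; do
    case "$value" in
      true|false) ;;
      *) echo "expected true or false, got: $value" >&2; return 1 ;;
    esac
  done
  printf '{"restoreVolumes":%s,"restoreConfig":%s,"redeployCompose":%s,"stopExisting":%s}\n' \
    "$1" "$2" "$3" "$4"
}

# Example: restore volumes and configuration, stop running containers first,
# but do not redeploy the compose stack.
#   docker_restore_options true true false true
```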
Restore Procedure
- Open [Backup Management] > the Backup List tab and find the target Docker/Podman backup row. The Recovery History tab is for viewing results only and has no "start restore" action, so restores must always be started from a backup row in the Backup List tab.
- Click the Restore icon on the backup row to open the restore modal.
- Configure the options above (Restore Volume Data / Restore Configuration Files / Redeploy Service (Rollback) / Stop Existing Service).
- Enter SSH credentials in the RemoteConnectionModal when prompted.
- Click Start Restore.
DockerRestoreDetailModal
Once a restore starts, DockerRestoreDetailModal shows:
- Restore start/finish times
- Summary of selected options
- Per-step progress logs
- Final status (pending, in_progress, completed, failed)
Recovery History
The [Backup Management] > Recovery History tab tracks every restore operation.
- Status values: pending, in_progress, completed, failed
- Detail view: Clicking a row opens either DockerRestoreDetailModal or the control plane restore detail.
- Compose Project Filter: Filter the Docker/Podman recovery history by Docker Compose project name. Useful on Docker hosts that run multiple projects when you want to view only one project's restore history.
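When scripting around a restore, the four status values split naturally into two groups. The sketch below assumes that pending and in_progress mean the restore is still running, while completed and failed mean it has finished; that reading is inferred from the names. How a script obtains the status is not covered by this guide, so the caller supplies a command (the hypothetical get_restore_status in the usage comment) that prints the current status.

```shell
# Return 0 if the status means the restore has finished, successfully or not.
is_terminal_status() {
  case "$1" in
    completed|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a status-printing command until it reports a terminal status.
# Usage: wait_for_restore <max_attempts> <interval_seconds> <command> [args...]
# Returns 0 on completed, 1 on failed, 2 if the command fails or attempts run out.
wait_for_restore() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    restore_status=$("$@") || return 2
    if is_terminal_status "$restore_status"; then
      echo "$restore_status"
      if [ "$restore_status" = "completed" ]; then return 0; else return 1; fi
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}

# Example (get_restore_status is a hypothetical command you provide):
#   wait_for_restore 60 5 get_restore_status
```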
Recovery Verification
After recovery, always verify that the system is operating correctly.
Verification Checklist
- Service status: Check Pod/container status. Expected state: Running.
- Data integrity: Verify application data. Expected state: Data exists from backup point.
- Network: Test service access. Expected state: Normal response.
- Logs: Check application logs. Expected state: No errors.
For critical systems, prepare scripts that automatically perform health checks after recovery to reduce verification time.
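One possible starting point for such a script is a small runner that executes each check, prints PASS or FAIL, and counts failures. This is a sketch under assumptions: the function name run_check, the check names, and the URL in the usage comment are placeholders to replace with your own.

```shell
# Number of failed checks so far.
failures=0

# Run one named check. Any command that exits 0 counts as a pass.
# Usage: run_check <name> <command> [args...]
run_check() {
  check_name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $check_name"
  else
    echo "FAIL $check_name"
    failures=$((failures + 1))
  fi
}

# Example usage after a restore (the URL is a placeholder):
#   run_check "cluster API reachable" kubectl get nodes
#   run_check "service responds" curl -fsS --max-time 5 http://service.example.internal/healthz
#   [ "$failures" -eq 0 ] || echo "$failures check(s) failed" >&2
```

Because run_check updates the failures counter, call it directly rather than inside a command substitution, which would run it in a subshell and discard the count.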
Troubleshooting
Restore Failure: "snapshot file corrupted"
```
restore failed: snapshot file corrupted
```
Why does this happen?
The backup file is corrupted and cannot be used for restoration.
Resolution
- Use a different backup file: Pick a valid backup from a different date.
- Verify backup file integrity: Check the checksum on the storage side.
- Check backup copies: If backups were replicated elsewhere, use that file.
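If you have a local copy of the backup file and a known-good digest for it, the integrity check can be done from the command line. The sketch below assumes the digest is SHA-256 and that sha256sum (GNU coreutils) is available. This guide does not state which checksum algorithm the storage side uses, so adjust accordingly; the function name verify_sha256 is illustrative.

```shell
# Compare a file's SHA-256 digest with an expected lowercase hex digest.
# Returns 0 on match, 1 on mismatch or if the file cannot be read.
verify_sha256() {
  file=$1
  expected=$2
  actual=$(sha256sum "$file" 2>/dev/null | awk '{ print $1 }')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Example (the file name and digest are placeholders):
#   if verify_sha256 ./backup-bundle.tar.gz "<expected-digest>"; then
#     echo "checksum OK"
#   else
#     echo "checksum mismatch: pick a different backup" >&2
#   fi
```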
SSH Authentication Failure
The master node SSH credentials are incorrect or the user lacks permissions. Re-check that the SSH username, password, and sudo password you entered are correct.
Data Mismatch
This occurs when restored data differs from expectations or applications produce errors.
How to check and resolve
- Verify backup point: Confirm the restored data matches the backup point state.
- Check schema compatibility: Verify the application version is compatible with the data schema.
- Sync external systems: Check if databases or other external systems also need restoration.
Restoring only part of a system can cause data inconsistencies. When possible, restore all related components together.