Recovery

This guide explains how to restore your system to a previous state using backup data. Throughout KIOPS, restore operations are labeled Recovery History (restoration history).

When do you need recovery?
  • System failure: Data loss due to server failure, disk corruption, etc.
  • Accidental deletion: Important resources or data were mistakenly deleted.
  • Update rollback: Rolling back a problematic update to a previous state.
  • Environment replication: Using backups to set up an identical environment elsewhere.

Recovery Types

KIOPS supports the following recovery flows.

  • Control plane recovery: Restores the cluster from a control plane backup that includes an etcd snapshot and PKI certificates.
  • Docker/Podman recovery: Restores volumes, configuration, and containers.

Before you restore
  • Restoration may overwrite current data. Create a backup of the current state if needed.
  • If possible, perform recovery during low-traffic hours.
  • For production environments, validate recovery in a test environment first.

Control Plane Recovery

Restore your cluster from a control plane backup that includes an etcd snapshot and PKI certificates.

Warning: Full cluster replacement

Control plane recovery completely replaces the cluster's state with its state at the selected backup point. All changes made after the backup (resource creation, modification, and deletion) will be lost. Make sure to back up the current state first.

Step 1: Select a Backup and Click the Restore Icon

  1. Click [Backup Management] in the left menu.
  2. In the Backup List tab, set the K8s backup segmented control to Control Plane, then find the backup row you want to restore.
  3. Click the Restore icon in that row.

Step 2: Enter SSH Credentials in ControlplaneRestoreModal

The restore is performed by connecting directly to the master node via SSH.

  • SSH username: SSH account for the master node
  • SSH password: SSH password for the master node
  • Sudo password: The sudo password required to stop and restore etcd and to apply the PKI certificates.

Step 3: Enter the Confirmation Text and Start the Restore

To confirm, type the backup name exactly as shown; the Start Restore button stays disabled until the text matches. Then click Start Restore. This operation completely replaces the cluster with its state at the selected backup point.

Step 4: Monitor Progress and Confirm Completion

  1. The backend downloads the backup bundle from object storage.
  2. etcd is restored from the snapshot (etcdctl snapshot restore), and then the PKI certificates are restored.
  3. After completion, check the result status in the Recovery History tab.
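As an illustration of what Step 4 involves on the node, the sketch below prints (and never runs) a rough manual equivalent of restoring an etcd snapshot and then the PKI certificates. It assumes a single kubeadm-style control plane node with kubeadm's default paths, etcd 3.5 or later (where etcdutl provides snapshot restore), and a PKI archive whose top-level directory is pki/. The helper name print_restore_plan, its file arguments, and the holding directory are hypothetical, and the KIOPS backend's actual commands may differ.

```bash
# Hypothetical helper: prints, but never executes, a rough manual equivalent
# of the control plane restore. Assumes a single kubeadm-style master node;
# KIOPS's own implementation may differ.
print_restore_plan() {
  local snapshot="$1"     # local copy of the etcd snapshot (assumed file)
  local pki_archive="$2"  # tar.gz whose top-level directory is pki/ (assumed)
  cat <<EOF
# 1. Stop the static control plane Pods (etcd, kube-apiserver, ...) by moving their manifests away
sudo mkdir -p /etc/kubernetes/manifests.hold
sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /etc/kubernetes/manifests.hold/'
# 2. Restore the etcd data into a fresh directory, then swap it in
sudo etcdutl snapshot restore ${snapshot} --data-dir /var/lib/etcd-restored
sudo mv /var/lib/etcd /var/lib/etcd.pre-restore
sudo mv /var/lib/etcd-restored /var/lib/etcd
# 3. Restore the PKI certificates into /etc/kubernetes/pki
sudo tar -xzf ${pki_archive} -C /etc/kubernetes
# 4. Move the manifests back so the kubelet restarts the control plane
sudo sh -c 'mv /etc/kubernetes/manifests.hold/*.yaml /etc/kubernetes/manifests/'
EOF
}
```

Moving the static Pod manifests out of /etc/kubernetes/manifests is what stops etcd and kube-apiserver on a kubeadm node; restoring into a fresh data directory and swapping it in keeps the previous data in /var/lib/etcd.pre-restore in case you need to back out.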

Step 5: Verify Cluster Status

Once restoration completes, verify that the cluster is operating normally.

kubectl get nodes
kubectl get pods --all-namespaces
kubectl get pods -n kube-system

Post-restore checklist
  1. Are all nodes in Ready state?
  2. Are all kube-system Pods Running?
  3. Are application Pods running normally?
  4. Can you reach services normally?
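The first two checklist items can be partly scripted. This is a minimal sketch, assuming kubectl's default table output with --no-headers (column positions as in current kubectl releases); the helper names not_ready_nodes and not_running_pods are hypothetical. Each helper prints only problems, so empty output means the check passed.

```bash
# Usage:
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl get pods -A --no-headers | not_running_pods
#   kubectl get pods -A --no-headers | awk '$1 == "kube-system"' | not_running_pods

# Nodes: column 2 is STATUS. Anything other than exactly "Ready" is reported,
# including cordoned nodes ("Ready,SchedulingDisabled").
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Pods (with -A): columns are NAMESPACE NAME READY STATUS ...
# Completed Pods (finished Jobs) are not treated as problems. Only STATUS is
# checked, so a Running Pod whose READY column is 0/1 is not reported.
not_running_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}
```

The remaining items (application Pods and service reachability) depend on your workloads and still need a manual or application-specific check.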

Docker/Podman Recovery

Restore volumes, configuration, and containers from a Docker/Podman backup. Restore options mirror the backend DockerRestore model fields.

Restore Options (UI)

Each option in the restore modal maps to a DockerRestore field:

  • Restore Volume Data (option key restoreVolumes): Restores the backed-up volume data.
  • Restore Configuration Files (option key restoreConfig): Restores configuration files such as docker-compose.
  • Redeploy Service (Rollback) (option key redeployCompose): Redeploys the compose stack with the restored configuration.
  • Stop Existing Service (option key stopExisting): Stops currently running containers before restoring.

Note on location options

The "original location / new location" options from earlier versions of the guide do not exist in the backend model. There is also no standalone "image restore (docker load)" flow because the DockerRestore model has no image field.
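The option descriptions imply an order: stopExisting acts before anything is restored, and redeployCompose uses the restored configuration, so it comes after restoreConfig. The sketch below prints the steps a given combination of options implies. It is illustrative only: restore_steps is a hypothetical name, and placing volume restore before configuration restore is an assumption, not documented backend behavior.

```bash
# Hypothetical helper: prints the steps implied by a combination of the four
# DockerRestore options, passed as true/false in the order
# restoreVolumes, restoreConfig, redeployCompose, stopExisting.
# Stop-first and redeploy-last follow the option descriptions; volume restore
# before configuration restore is an assumption.
restore_steps() {
  local restoreVolumes="$1" restoreConfig="$2" redeployCompose="$3" stopExisting="$4"
  [ "$stopExisting" = true ] && echo "stop running containers"
  [ "$restoreVolumes" = true ] && echo "restore volume data"
  [ "$restoreConfig" = true ] && echo "restore configuration files"
  [ "$redeployCompose" = true ] && echo "redeploy the compose stack"
  return 0
}
```

For example, restore_steps true true true true prints all four steps, starting with stopping the running containers, while disabling stopExisting means the restore runs while the existing containers keep running.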

Restore Procedure

  1. Open [Backup Management] > the Backup List tab and find the target Docker/Podman backup row. Restores always start from a backup row in the Backup List tab; the Recovery History tab only shows results and has no action for starting a restore.
  2. Click the Restore icon on the backup row to open the restore modal.
  3. Configure the options above (Restore Volume Data / Restore Configuration Files / Redeploy Service (Rollback) / Stop Existing Service).
  4. Enter SSH credentials in the RemoteConnectionModal when prompted.
  5. Click Start Restore.

DockerRestoreDetailModal

Once a restore starts, DockerRestoreDetailModal shows:

  • Restore start/finish times
  • Summary of selected options
  • Per-step progress logs
  • Current status (pending, in_progress, completed, failed)
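These status values fall into two in-flight states (pending, in_progress) and two terminal states (completed, failed). If you script around restores, a polling loop can treat them that way. This is a sketch: wait_for_restore is a hypothetical helper, and get_status stands for a command you supply that prints the current status of one restore; this guide does not document an API for reading it.

```bash
# Hypothetical helper: polls until a restore reaches a terminal status.
#   $1: name of a command (supplied by you) that prints one of
#       pending, in_progress, completed, failed
#   $2: seconds between polls (default 10)
#   $3: maximum number of polls (default 60)
# Returns 0 on completed, 1 on failed, 2 on timeout,
# 3 on an unexpected status or if the status command fails.
wait_for_restore() {
  local get_status="$1" interval="${2:-10}" max_polls="${3:-60}" status i=0
  while [ "$i" -lt "$max_polls" ]; do
    status="$("$get_status")" || return 3
    case "$status" in
      completed) return 0 ;;
      failed) return 1 ;;
      pending | in_progress) sleep "$interval" ;;
      *) echo "unexpected status: $status" >&2; return 3 ;;
    esac
    i=$((i + 1))
  done
  return 2
}
```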

Recovery History

The [Backup Management] > Recovery History tab tracks every restore operation.

  • Status values: pending, in_progress, completed, failed
  • Detail view: Clicking a row opens DockerRestoreDetailModal for a Docker/Podman restore, or the control plane restore detail view for a control plane restore.
  • Compose Project Filter: Filter the Docker/Podman recovery history by Docker Compose project name. Useful on Docker hosts that run multiple projects when you want to view only one project's restore history.

Recovery Verification

After recovery, always verify that the system is operating correctly.

Verification Checklist

  • Service status: Check Pod/container status. Expected state: Running.
  • Data integrity: Verify application data. Expected state: Data exists from backup point.
  • Network: Test service access. Expected state: Normal response.
  • Logs: Check application logs. Expected state: No errors.

Automate recovery verification

For critical systems, prepare scripts that automatically perform health checks after recovery to reduce verification time.
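A minimal sketch of two building blocks for such a script, assuming a POSIX shell with grep: a retry wrapper (services often need time to come back after a restore, so a single failed check is not conclusive) and a log scan for the "No errors" item. The helper names retry and count_error_lines, the error pattern, and the URL and container name in the example are all hypothetical.

```bash
# Hypothetical helper: runs a check command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 if every
# attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Hypothetical helper: counts log lines on stdin that look like errors.
# The pattern is an assumption; adjust it to your applications' log format.
count_error_lines() {
  grep -ciE 'error|exception|fatal' || true
}

# Example (the URL and container name are placeholders):
#   retry 10 6 curl -fsS -o /dev/null http://my-app.example.internal/healthz
#   docker logs --since 10m my-app 2>&1 | count_error_lines
```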


Troubleshooting

Restore Failure: "snapshot file corrupted"

restore failed: snapshot file corrupted

Why does this happen? The backup file is corrupted and cannot be used for restoration.

Resolution

  1. Use a different backup file: Pick a valid backup from a different date.
  2. Verify backup file integrity: Check the checksum on the storage side.
  3. Check backup copies: If backups were replicated to another location, restore from the replicated copy instead.
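For step 2, here is a minimal sketch that compares a downloaded backup file with a SHA-256 checksum, assuming your storage records one and that GNU coreutils sha256sum is available (both are assumptions; the helper name verify_backup_checksum is hypothetical). For an etcd snapshot file, etcdutl snapshot status also reads the file and reports its hash, revision, and key count, which is a quick way to see whether it is readable at all.

```bash
# Hypothetical helper: succeeds only if the file's SHA-256 digest matches the
# expected hex string (compared case-insensitively).
# Returns 1 on mismatch, 2 if the file cannot be read.
verify_backup_checksum() {
  local file="$1" expected actual
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  [ -r "$file" ] || return 2
  actual="$(sha256sum "$file" | awk '{ print $1 }')"
  [ "$actual" = "$expected" ]
}
```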

SSH Authentication Failure

The SSH credentials for the master node are incorrect, or the account lacks the required permissions (for example, sudo rights to stop and restore etcd). Re-check the SSH username, password, and sudo password you entered.

Data Mismatch

This occurs when restored data differs from expectations or applications produce errors.

How to check and resolve

  1. Verify backup point: Confirm the restored data matches the backup point state.
  2. Check schema compatibility: Verify the application version is compatible with the data schema.
  3. Sync external systems: Check if databases or other external systems also need restoration.

Caution with partial restoration

Restoring only part of a system can cause data inconsistencies. When possible, restore all related components together.