Backups and Disaster Recovery
As of v0.1.7, you can configure an RKE cluster to automatically take snapshots of etcd. In a disaster scenario, you can restore these snapshots, which are stored on other nodes in the cluster.
One-Time Snapshots
RKE can take a one-time snapshot of a running etcd node in an RKE cluster. The snapshot is automatically saved in /opt/rke/etcd-snapshots.
$ rke etcd snapshot-save --config cluster.yml
WARN Name of the snapshot is not specified using
INFO Starting saving snapshot on etcd hosts
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Finished saving snapshot on all etcd hosts
The command saves a snapshot of etcd from each etcd node defined in the cluster config file and writes it to /opt/rke/etcd-snapshots. When running the command, an additional container is created to take the snapshot; once the snapshot is completed, the container is automatically removed.
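As the WARN line above indicates, if you do not specify a name, RKE generates one automatically. You can set an explicit name with the --name flag, for example:

$ rke etcd snapshot-save --name mysnapshot --config cluster.yml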
Etcd Recurring Snapshots
To schedule recurring automatic etcd snapshots, you can enable the etcd-snapshot service. etcd-snapshot runs in a service container alongside the etcd container, periodically takes a snapshot of etcd, and stores it on the node's local disk in /opt/rke/etcd-snapshots.
In the cluster.yml, you need to enable snapshot as part of the etcd service. Additionally, you can specify creation and retention intervals for the snapshot service.
services:
  etcd:
    snapshot: true
    creation: 5m0s
    retention: 24h
When a cluster is launched with the etcd snapshot service enabled, you can view the etcd-snapshot logs to confirm backups are being created automatically.
$ docker logs etcd-snapshot
time="2018-05-04T18:39:16Z" level=info msg="Initializing Rolling Backups" creation=1m0s retention=24h0m0s
time="2018-05-04T18:40:16Z" level=info msg="Created backup" name="2018-05-04T18:40:16Z_etcd" runtime=108.332814ms
time="2018-05-04T18:41:16Z" level=info msg="Created backup" name="2018-05-04T18:41:16Z_etcd" runtime=92.880112ms
time="2018-05-04T18:42:16Z" level=info msg="Created backup" name="2018-05-04T18:42:16Z_etcd" runtime=83.67642ms
time="2018-05-04T18:43:16Z" level=info msg="Created backup" name="2018-05-04T18:43:16Z_etcd" runtime=86.298499ms
For every node that has the etcd role, these backups are saved to /opt/rke/etcd-snapshots/.
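To confirm that the snapshot files are being written, you can list the snapshot directory on any node with the etcd role (a plain filesystem check, not an RKE command):

# On a node with the etcd role
$ ls /opt/rke/etcd-snapshots/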
Snapshot Options
Snapshot
By default, the recurring snapshot service is disabled. To enable the service, define snapshot as part of the etcd service and set it to true.
Creation
By default, the snapshot service will take snapshots every 5 minutes (5m0s). You can change the time between snapshots as part of the creation directive for the etcd service.
Retention
By default, all snapshots are saved for 24 hours (24h) before being deleted and purged. You can change how long to store a snapshot as part of the retention directive for the etcd service.
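For example, to take a snapshot every 30 minutes and keep each snapshot for 72 hours, the etcd service could be configured as follows (the interval values here are illustrative, not defaults):

services:
  etcd:
    snapshot: true
    creation: 30m0s
    retention: 72h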
Etcd Disaster Recovery
If there is a disaster with your Kubernetes cluster, you can use rke etcd snapshot-restore to recover your etcd. This command will revert to a specific snapshot stored in /opt/rke/etcd-snapshots that you explicitly define. During the restore process, RKE also removes the old etcd container before creating a new etcd cluster using the snapshot that you have chosen.
Warning: Restoring an etcd snapshot deletes your current etcd cluster and replaces it with a new one. Before you run the rke etcd snapshot-restore command, you should back up any important data in your cluster.
$ rke etcd snapshot-restore --name mysnapshot --config cluster.yml
INFO Starting restore on etcd hosts
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Building up etcd plane..
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started etcd plane..
INFO Finished restoring on all etcd hosts
Example
In this example, the Kubernetes cluster was deployed on two AWS nodes.
| Name  | IP       | Role                 |
|-------|----------|----------------------|
| node1 | 10.0.0.1 | controlplane, worker |
| node2 | 10.0.0.2 | etcd                 |
Back up the etcd cluster
Take a snapshot of the Kubernetes cluster.
$ rke etcd snapshot-save --name snapshot.db --config cluster.yml
![]({{< baseurl >}}/img/rke/rke-etcd-backup.png)
Store the snapshot externally
After taking the etcd snapshot on node2, we recommend saving this backup in a persistent location, such as an S3 bucket or tape backup.
# If you're using an AWS host and have the ability to connect to S3
root@node2:~# s3cmd mb s3://rke-etcd-backup
root@node2:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-backup/
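To verify that the upload succeeded, you can list the bucket contents with s3cmd's ls subcommand (bucket name as in the example above):

root@node2:~# s3cmd ls s3://rke-etcd-backup/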
Place the backup on a new node
To simulate the failure, let's power down node2.
root@node2:~# poweroff
Before restoring etcd and running rke up, we need to retrieve the backup saved on S3 to a new node, e.g. node3.
| Name  | IP       | Role                 |
|-------|----------|----------------------|
| node1 | 10.0.0.1 | controlplane, worker |
| node2 | 10.0.0.2 | etcd                 |
| node3 | 10.0.0.3 | etcd                 |
# Make a directory
root@node3:~# mkdir -p /opt/rke/etcd-snapshots
# Get the backup from S3
root@node3:~# s3cmd get s3://rke-etcd-backup/snapshot.db /opt/rke/etcd-snapshots/snapshot.db
Restore etcd on the new node from the backup
Before updating and restoring etcd, you will need to add the new node into the Kubernetes cluster with the etcd role. In the cluster.yml, comment out the old node and add the new node.
nodes:
  - address: 10.0.0.1
    hostname_override: node1
    user: ubuntu
    role:
      - controlplane
      - worker
  # - address: 10.0.0.2
  #   hostname_override: node2
  #   user: ubuntu
  #   role:
  #     - etcd
  - address: 10.0.0.3
    hostname_override: node3
    user: ubuntu
    role:
      - etcd
After the new node is added to the cluster.yml, run rke etcd snapshot-restore to launch etcd from the backup.
$ rke etcd snapshot-restore --name snapshot.db --config cluster.yml
Finally, we need to restore operations on the cluster by making the Kubernetes API point to the new etcd. To do this, run rke up again using the new cluster.yml.
$ rke up --config cluster.yml
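rke up writes the cluster's kubeconfig next to the config file (named kube_config_cluster.yml when the config file is cluster.yml). Assuming that file name, you can point kubectl at it to verify the nodes before checking workloads:

$ kubectl --kubeconfig kube_config_cluster.yml get nodes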
Confirm that your Kubernetes cluster is functional by checking the pods on your cluster.
> kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-65899c769f-kcdpr 1/1 Running 0 17s
nginx-65899c769f-pc45c 1/1 Running 0 17s
nginx-65899c769f-qkhml 1/1 Running 0 17s