Backups and Disaster Recovery
As of v0.1.7, you can configure an RKE cluster to automatically take snapshots of etcd. In a disaster scenario, you can restore these snapshots, which are stored on other nodes in the cluster.
One-Time Snapshots
RKE can take a one-time snapshot of a running etcd node in an RKE cluster. The snapshot is automatically saved in /opt/rke/etcd-snapshots.
$ rke etcd snapshot-save --config cluster.yml
WARN Name of the snapshot is not specified using
INFO Starting saving snapshot on etcd hosts
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Saving snapshot on host
INFO Successfully started container on host
INFO Finished saving snapshot on all etcd hosts
The command saves a snapshot of etcd from each etcd node defined in the cluster config file and writes it to /opt/rke/etcd-snapshots. When running the command, an additional container is created to take the snapshot; once the snapshot is completed, the container is automatically removed.
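As the WARN line above indicates, if you do not specify a name, RKE generates one automatically. You can set an explicit name with the --name flag, for example:

$ rke etcd snapshot-save --name mysnapshot --config cluster.yml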
Etcd Recurring Snapshots
To schedule recurring automatic etcd snapshots, you can enable the etcd-snapshot service. etcd-snapshot runs in a service container alongside the etcd container, periodically takes a snapshot of etcd, and stores it on the node's local disk in /opt/rke/etcd-snapshots.
In the cluster.yml, you need to enable snapshot as part of the etcd service. Additionally, you can specify creation and retention intervals for the snapshot service.
services:
  etcd:
    snapshot: true
    creation: 5m0s
    retention: 24h
When a cluster is launched with the etcd snapshot service enabled, you can view the etcd-snapshot logs to confirm backups are being created automatically.
$ docker logs etcd-snapshot
time="2018-05-04T18:39:16Z" level=info msg="Initializing Rolling Backups" creation=1m0s retention=24h0m0s
time="2018-05-04T18:40:16Z" level=info msg="Created backup" name="2018-05-04T18:40:16Z_etcd" runtime=108.332814ms
time="2018-05-04T18:41:16Z" level=info msg="Created backup" name="2018-05-04T18:41:16Z_etcd" runtime=92.880112ms
time="2018-05-04T18:42:16Z" level=info msg="Created backup" name="2018-05-04T18:42:16Z_etcd" runtime=83.67642ms
time="2018-05-04T18:43:16Z" level=info msg="Created backup" name="2018-05-04T18:43:16Z_etcd" runtime=86.298499ms
For every node that has the etcd role, these backups are saved to /opt/rke/etcd-snapshots/.
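To confirm that the snapshot files are being written, you can list the snapshot directory on any node with the etcd role (a plain filesystem check, not an RKE command):

# On a node with the etcd role
$ ls /opt/rke/etcd-snapshots/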
Snapshot Options
Snapshot
By default, the recurring snapshot service is disabled. To enable the service, define snapshot as part of the etcd service and set it to true.
Creation
By default, the snapshot service will take snapshots every 5 minutes (5m0s). You can change the time between snapshots as part of the creation directive for the etcd service.
Retention
By default, all snapshots are saved for 24 hours (24h) before being deleted and purged. You can change how long to store a snapshot as part of the retention directive for the etcd service.
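For example, to take a snapshot every 30 minutes and keep each snapshot for 72 hours, the etcd service could be configured as follows (the interval values here are illustrative, not defaults):

services:
  etcd:
    snapshot: true
    creation: 30m0s
    retention: 72h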
Etcd Disaster Recovery
If there is a disaster with your Kubernetes cluster, you can use rke etcd snapshot-restore to recover your etcd. This command will revert to a specific snapshot stored in /opt/rke/etcd-snapshots that you explicitly define. During the restore process, RKE also removes the old etcd container before creating a new etcd cluster using the snapshot that you have chosen.
Warning: Restoring an etcd snapshot deletes your current etcd cluster and replaces it with a new one. Before you run the rke etcd snapshot-restore command, you should back up any important data in your cluster.
$ rke etcd snapshot-restore --name mysnapshot --config cluster.yml
INFO Starting restore on etcd hosts
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Setup tunnel for host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Cleaning up host
INFO Running cleaner container on host
INFO Successfully started container on host
INFO Removing cleaner container on host
INFO Successfully cleaned up host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Restoring snapshot on etcd host
INFO Successfully started container on host
INFO Building up etcd plane..
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started container on host
INFO Successfully started container on host
INFO Successfully removed container on host
INFO Successfully started etcd plane..
INFO Finished restoring on all etcd hosts
Example
In this example, the Kubernetes cluster was deployed on two AWS nodes.
| Name  | IP       | Role                 |
|-------|----------|----------------------|
| node1 | 10.0.0.1 | controlplane, worker |
| node2 | 10.0.0.2 | etcd                 |
Back up the etcd cluster
Take a snapshot of the Kubernetes cluster.
$ rke etcd snapshot-save --name snapshot.db --config cluster.yml
![]({{< baseurl >}}/img/rke/rke-etcd-backup.png)
Store the snapshot externally
After taking the etcd snapshot on node2, we recommend saving this backup in a persistent location, such as an S3 bucket or tape backup.
# If you're using an AWS host and have the ability to connect to S3
root@node2:~# s3cmd mb s3://rke-etcd-backup
root@node2:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-backup/
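To verify that the upload succeeded, you can list the bucket contents with s3cmd's ls subcommand (bucket name as in the example above):

root@node2:~# s3cmd ls s3://rke-etcd-backup/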
Place the backup on a new node
To simulate the failure, let's power down node2.
root@node2:~# poweroff
Before restoring etcd and running rke up, we need to retrieve the backup saved on S3 to a new node, e.g. node3.
| Name  | IP       | Role                 |
|-------|----------|----------------------|
| node1 | 10.0.0.1 | controlplane, worker |
| node2 | 10.0.0.2 | etcd                 |
| node3 | 10.0.0.3 | etcd                 |
# Make a directory
root@node3:~# mkdir -p /opt/rke/etcd-snapshots
# Get the backup from S3
root@node3:~# s3cmd get s3://rke-etcd-backup/snapshot.db /opt/rke/etcd-snapshots/snapshot.db
Restore etcd on the new node from the backup
Before updating and restoring etcd, you will need to add the new node into the Kubernetes cluster with the etcd role. In the cluster.yml, comment out the old node and add the new node.
nodes:
  - address: 10.0.0.1
    hostname_override: node1
    user: ubuntu
    role:
      - controlplane
      - worker
  # - address: 10.0.0.2
  #   hostname_override: node2
  #   user: ubuntu
  #   role:
  #     - etcd
  - address: 10.0.0.3
    hostname_override: node3
    user: ubuntu
    role:
      - etcd
After the new node is added to the cluster.yml, run rke etcd snapshot-restore to launch etcd from the backup.
$ rke etcd snapshot-restore --name snapshot.db --config cluster.yml
Finally, we need to restore operations on the cluster by making the Kubernetes API point to the new etcd. To do this, run rke up again using the new cluster.yml.
$ rke up --config cluster.yml
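rke up writes the cluster's kubeconfig next to the config file (named kube_config_cluster.yml when the config file is cluster.yml). Assuming that file name, you can point kubectl at it to verify the nodes before checking workloads:

$ kubectl --kubeconfig kube_config_cluster.yml get nodes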
Confirm that your Kubernetes cluster is functional by checking the pods on your cluster.
> kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-65899c769f-kcdpr 1/1 Running 0 17s
nginx-65899c769f-pc45c 1/1 Running 0 17s
nginx-65899c769f-qkhml 1/1 Running 0 17s