1. GlusterFS Terminology
- GNFS and KNFS: the former is the NFS server built into GlusterFS, while the latter is the kernel NFS service provided by the operating system. You can use either one, or configure both to serve at the same time.
- Brick: the basic storage unit in GlusterFS; a dedicated partition contributed for storage by a host in the trusted storage pool.
- Volume: a logical volume, which is a collection of bricks.
- Subvolume: a brick after it has been processed by at least one translator.
- Volfile: the collective name for GlusterFS configuration files, which define the translators used by the server and client as well as the volume and brick configuration.
GlusterFS consists of three parts: the server, the client, and the management daemon. Each has its own configuration files. The server and client vol files are placed under /var/lib/glusterd/vols/VOLNAME, while the management daemon's configuration files live under /etc/glusterfs/.
- Glusterd: the management daemon, which must run on every node in the storage cluster.
- Extended Attributes: a filesystem feature that lets users or applications associate additional metadata with files and directories.
- FUSE: Filesystem in Userspace.
- GFID: a 128-bit identifier assigned to every file and directory in GlusterFS.
- Quorum: sets the maximum number of host nodes that may fail in a trusted storage pool; beyond that number, the pool is considered unavailable.
- Rebalance: when a brick is added or removed, a repair process recomputes and optimizes the data distribution.
- RRDNS: round-robin DNS; by resolving one domain name to multiple IPs, read load can be spread across servers.
- Split-brain: bricks in a replica relationship end up with inconsistent data or metadata, and neither side can be determined to be correct.
2. GlusterFS's Two Access Control Modes
Every GlusterFS volume's configuration directory contains two similar files, named trusted-<VOLNAME>-fuse.vol and <VOLNAME>-fuse.vol. They correspond to GlusterFS's two access control policies. GlusterFS's trusted-storage-pool design lets host nodes that have already joined the trusted pool mount GlusterFS volumes directly; those mounts use the configuration file with the trusted- prefix. Host nodes outside the trusted pool must use the non-trusted volfile when mounting a volume, and pass access control with a username/password.
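As a concrete illustration of the naming above, the sketch below constructs the two fuse volfile paths for a hypothetical volume named test-volume; the directory layout under /var/lib/glusterd/vols/ follows the description in section 1.

```shell
# Sketch: the two fuse volfiles generated for a volume named "test-volume".
# The paths are assembled from the layout described above, not read from a live system.
VOLNAME="test-volume"
VOLDIR="/var/lib/glusterd/vols/${VOLNAME}"
TRUSTED_VOLFILE="${VOLDIR}/trusted-${VOLNAME}-fuse.vol"   # used by trusted-pool members
PUBLIC_VOLFILE="${VOLDIR}/${VOLNAME}-fuse.vol"            # used by clients outside the pool
echo "$TRUSTED_VOLFILE"
echo "$PUBLIC_VOLFILE"
```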
3. Creating and Managing the Trusted Storage Pool
First, do not probe the first node from itself. Second, set up hostname resolution for all hosts.
3.1 Create a trusted pool containing four nodes
# gluster peer probe server2
Probe successful
# gluster peer probe server3
Probe successful
# gluster peer probe server4
Probe successful
3.2 Check the trusted pool status on the first node
# gluster peer status
Number of Peers: 3

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

Hostname: server4
Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
3.3 Probe the first node from another node that has already joined the trusted pool
server2# gluster peer probe server1
Probe successful
3.4 Check the trusted pool status again, this time on server2
server2# gluster peer status
Number of Peers: 3

Hostname: server1
Uuid: ceed91d5-e8d1-434d-9d47-63e914c93424
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

Hostname: server4
Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
3.5 Remove a node from the trusted storage pool
# gluster peer detach server4
Detach successful
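The probe steps above repeat the same command per peer, so they are easy to script. The sketch below only prints the commands for review rather than running them; the hostnames are the example servers from this section.

```shell
# Sketch: print (not run) the probe commands for a list of peers,
# one per line, to be executed on the first node of the pool.
PEERS="server2 server3 server4"
CMDS=""
for p in $PEERS; do
    CMDS="${CMDS}gluster peer probe ${p}\n"
done
printf "%b" "$CMDS"
```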
4. Managing the glusterd Service
Start and stop:
# /etc/init.d/glusterd start
# /etc/init.d/glusterd stop
Enable it to start at boot:
# chkconfig glusterd on
5. Using the Gluster CLI
You can run Gluster CLI commands on any node in the cluster, and the resulting configuration is automatically synchronized across the whole cluster. Volumes can be managed with the CLI even while they are mounted and in use.
Run a CLI command directly, for example:
# gluster peer status
Use the CLI in interactive mode:
# gluster
gluster> peer status
6. Managing File Permissions with POSIX ACLs
POSIX Access Control Lists (ACLs) allow finer-grained control over file access permissions.
6.1 Enabling POSIX ACLs on the server
To use POSIX ACLs, the backing filesystem must be mounted on the server with the acl option, as shown below:
# mount -o acl /dev/sda1 /export1
In /etc/fstab this can be configured as:
LABEL=/work /export1 ext3 rw,acl 1 4
6.2 Enabling POSIX Access Control Lists (ACLs) on the client
# mount -t glusterfs -o acl 198.192.198.234:glustervolume /mnt/gluster
6.3 Setting POSIX ACLs
Generally, you can set two types of POSIX ACLs: access ACLs and default ACLs. The former sets an access policy on a specific file or directory, while the latter provides a default access policy for a directory and the files created within it. ACLs can be set per user, per group, and for users outside the file's owning group.
Command format for setting an access ACL:
# setfacl -m <entry> <file>
The entry types are listed below; <perms> must be a combination of r (read), w (write), and x (execute).
- u:<uid>:<perms>  Sets the access ACL for a user. You can specify a user name or UID.
- g:<gid>:<perms>  Sets the access ACL for a group. You can specify a group name or GID.
- m:<perms>  Sets the effective rights mask: the combination of all access permissions of the owning group and all of the user and group entries.
- o:<perms>  Sets the access ACL for users other than those in the file's group.
The target can be a file or a directory. For example, grant user antony read and write access to testfile:
# setfacl -m u:antony:rw /mnt/gluster/data/testfile
Command format for setting a default ACL:
# setfacl -d -m <entry> <directory>
For example, give all users outside the owning group read-only access to the /data directory by default:
# setfacl -d -m o::r /mnt/gluster/data
Note: if both a default ACL and an access ACL are set, the access ACL takes precedence.
6.4 Viewing the POSIX ACLs that have been set
View a file's access ACL: # getfacl targetfile
View a directory's default ACL: # getfacl /mnt/gluster/data/doc
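The getfacl output is line-oriented and easy to inspect mechanically. The sketch below extracts one user's permissions from sample output; the sample text is fabricated to match the antony example above, not captured from a real volume.

```shell
# Sketch: pull user antony's permissions out of sample `getfacl` output.
# Entries have the form type:qualifier:perms, so awk can split on ':'.
SAMPLE='# file: testfile
# owner: root
# group: root
user::rw-
user:antony:rw-
group::r--
mask::rw-
other::r--'
ANTONY_PERMS=$(printf '%s\n' "$SAMPLE" | awk -F: '$1 == "user" && $2 == "antony" { print $3 }')
echo "$ANTONY_PERMS"
```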
6.5 Removing POSIX ACLs
For example, remove all of user antony's access to test-file:
# setfacl -x u:antony /mnt/gluster/data/test-file
6.6 Samba and ACLs
If you access a GlusterFS FUSE-mounted volume through Samba, POSIX ACLs are enabled by default.
7. Configuring GlusterFS Clients
Gluster volumes can be accessed in several ways.
On Linux hosts, the Gluster native client provides higher concurrency and transparent failover.
On Linux/Unix hosts, volumes can be accessed over NFSv3.
On Windows hosts, volumes can be accessed over CIFS.
7.1 Accessing volumes with the Gluster native client
The Gluster native client is a FUSE-based program that runs in user space on the client, and it is the recommended way to access volume data under high concurrency.
1) Load the FUSE kernel module:
# modprobe fuse
Verify that it is loaded:
# dmesg | grep -i fuse
fuse init (API version 7.13)
2) Install the client software and its dependencies (RedHat/CentOS example):
$ sudo yum -y install openssh-server wget fuse fuse-libs openib libibverbs
3) Open TCP/UDP ports on the servers
Open TCP and UDP ports 24007 and 24008 on all Gluster servers.
In addition, open one port per enabled brick on every Gluster server. Brick service ports are assigned starting at 49152, increasing by one for each additional brick on the host. For example, with 5 bricks enabled on a host:
$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24008 -j ACCEPT
$ sudo iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 49152:49156 -j ACCEPT
Note: the commands above are for reference only; adjust them as needed for your RedHat/CentOS system.
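The brick-port arithmetic above (sequential ports starting at 49152, one per brick) can be sketched directly; the iptables invocation printed at the end is only echoed for review, not executed.

```shell
# Sketch: compute the brick port range to open for a host, assuming ports
# are allocated sequentially from 49152, one per brick (as described above).
BRICKS=5
BASE_PORT=49152
LAST_PORT=$((BASE_PORT + BRICKS - 1))
RANGE="${BASE_PORT}:${LAST_PORT}"
echo "iptables -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport ${RANGE} -j ACCEPT"
```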
4) Download the latest glusterfs, glusterfs-fuse, and glusterfs-rdma RPM packages on the client
The glusterfs package contains the Gluster native client. The glusterfs-fuse package contains the FUSE translator needed to mount glusterfs volumes on the client. The glusterfs-rdma package provides the drivers needed when the client uses an InfiniBand network.
On RedHat/CentOS, yum should be able to download and install these packages. On other Linux distributions, the RPMs can be downloaded from http://www.gluster.org/download/ .
$ sudo rpm -i glusterfs-3.3.0qa30-1.x86_64.rpm
$ sudo rpm -i glusterfs-fuse-3.3.0qa30-1.x86_64.rpm
$ sudo rpm -i glusterfs-rdma-3.3.0qa30-1.x86_64.rpm
Note: the glusterfs-rdma package is not needed if you are not using an InfiniBand network.
5) Installing the Gluster native client from source
# mkdir glusterfs
# cd glusterfs
Download the source into this directory and unpack it:
# tar -xvzf SOURCE-FILE
# ./configure
# make
# make install
Verify the installation:
# glusterfs --version
6) Mounting a volume locally
Manual mount:
# mount -t glusterfs server1:/test-volume /mnt/glusterfs
Note: server1 in the mount command is only used to fetch the necessary configuration; after the mount, the client communicates not only with server1 but directly with the hosts of all the other bricks in the volume.
The optional mount settings have the following format, shown with an example:
# mount -t glusterfs -o backupvolfile-server=volfile_server2,use-readdirp=no,volfile-max-fetch-attempts=2,log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs
Option descriptions:
backupvolfile-server=server-name
volfile-max-fetch-attempts=number of attempts
log-level=loglevel
log-file=logfile
transport=transport-type
direct-io-mode=[enable|disable]
use-readdirp=[yes|no]
The volfile-max-fetch-attempts=X option is useful with RRDNS, or when multiple server IPs are specified for the mount.
The backupvolfile-server option designates another server from which to fetch the volume configuration if the first volfile server fails.
Automatic mounting:
Edit /etc/fstab and add the following:
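To keep long option strings readable, the mount command can be assembled from individual settings before it is run. The sketch below only builds and prints the command; the server, backup server, and volume names are the examples used in this section.

```shell
# Sketch: assemble a native-client mount command from individual options.
SERVER="server1"
BACKUP="server2"
VOLUME="test-volume"
OPTS="backupvolfile-server=${BACKUP},log-level=WARNING,log-file=/var/log/gluster.log"
CMD="mount -t glusterfs -o ${OPTS} ${SERVER}:/${VOLUME} /mnt/glusterfs"
echo "$CMD"
```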
server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
The same useful options can also be set here:
server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev,log-level=WARNING,log-file=/var/log/gluster.log 0 0
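A malformed fstab entry is easy to introduce when adding options by hand; a valid entry always has exactly six whitespace-separated fields. The sketch below checks the line from this section before it would be installed.

```shell
# Sketch: verify that an fstab entry has the expected six fields
# (device, mount point, fstype, options, dump, pass) before using it.
LINE="server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0"
FIELDS=$(printf '%s\n' "$LINE" | awk '{ print NF }')
echo "$FIELDS"
```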
7.2 Accessing volumes over NFS
The NFS utilities package must be installed on both the server and client hosts; its name varies across Linux distributions, but it is usually nfs-common or nfs-utils.
Manual mount:
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
Note: the Gluster NFS server does not support UDP. If your NFS client reports a wrong NFS version or protocol, explicitly request TCP in the mount options:
# mount -o mountproto=tcp -t nfs server1:/test-volume /mnt/glusterfs
Mounting the volume on a Solaris client:
# mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
Mounting the volume automatically at boot:
Edit /etc/fstab and add one of the following lines:
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,vers=3 0 0
or
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0
7.3 Accessing volumes over CIFS
1) On the server, mount the volume locally.
2) Configure Samba to export the mounted directory.
Edit smb.conf and add the following:
[glustertest]
comment = For testing a Gluster volume exported through CIFS
path = /mnt/glusterfs
read only = no
guest ok = yes
Save the file and restart the smb service: # service smb restart
You need to apply this configuration on every node in the Gluster cluster.
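Since the same stanza must be added on every node, it helps to generate it once into a fragment file and distribute that. The sketch below writes the stanza from this section to a temporary file; deploying it into smb.conf on each node is left out.

```shell
# Sketch: generate the Samba share stanza from this section into a file,
# ready to be appended to smb.conf on every node.
SHARE_FILE=$(mktemp)
cat > "$SHARE_FILE" <<'EOF'
[glustertest]
comment = For testing a Gluster volume exported through CIFS
path = /mnt/glusterfs
read only = no
guest ok = yes
EOF
cat "$SHARE_FILE"
```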
3) Mounting the volume manually on Windows
In Windows Explorer, choose Tools > Map Network Drive.
Choose the drive letter using the Drive drop-down list.
Click Browse, select the volume to map to the network drive, and click OK.
Automatic mounting over CIFS:
In Windows Explorer, choose Tools > Map Network Drive.
Choose the drive letter using the Drive drop-down list.
Click Browse, select the volume to map to the network drive, and click OK.
Click the Reconnect at logon checkbox.
8. Managing Volumes on the GlusterFS Servers
8.1 Tuning volume options
Command format: # gluster volume set <VOLNAME> <OPTION> <VALUE>
For example, set the volume's cache size to 256MB:
# gluster volume set test-volume performance.cache-size 256MB
Set volume successful
Check the result: # gluster volume info
The available GlusterFS volume options are:
- auth.allow: IP addresses of the clients which should be allowed to access the volume. Default: * (allow all). Options: valid IP addresses, including wildcard patterns such as 192.168.1.*.
- auth.reject: IP addresses of the clients which should be denied access to the volume. Default: NONE (reject none). Options: valid IP addresses, including wildcard patterns such as 192.168.2.*.
- client.grace-timeout: Specifies the duration for the lock state to be maintained on the client after a network disconnection. Default: 10. Options: 10-1800 secs.
- cluster.self-heal-window-size: Specifies the maximum number of blocks per file on which self-heal happens simultaneously. Default: 16. Options: 0-1025 blocks.
- cluster.data-self-heal-algorithm: Specifies the type of self-heal. With "full", the entire file is copied from source to destinations. With "diff", only the file blocks that are out of sync are copied. "reset" uses a heuristic model: if the file does not exist on one of the subvolumes, or a zero-byte file exists (created by entry self-heal), the entire content must be copied anyway, so "diff" brings no benefit; and if the file size is about the same as the page size, the entire file can be read and written in a few operations, which is faster than "diff", which has to read checksums and then read and write. Default: reset. Options: full/diff/reset.
- cluster.min-free-disk: Specifies the percentage of disk space that must be kept free. May be useful for non-uniform bricks. Default: 10%. Options: percentage of required minimum free disk space.
- cluster.stripe-block-size: Specifies the size of the stripe unit that is read from or written to. Default: 128 KB (for all files). Options: size in bytes.
- cluster.self-heal-daemon: Allows you to turn off proactive self-heal on replicated volumes. Default: On. Options: On/Off.
- cluster.ensure-durability: Makes sure data/metadata is durable across an abrupt shutdown of the brick. Default: On. Options: On/Off.
- diagnostics.brick-log-level: Changes the log level of the bricks. Default: INFO. Options: DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE.
- diagnostics.client-log-level: Changes the log level of the clients. Default: INFO. Options: DEBUG/WARNING/ERROR/CRITICAL/NONE/TRACE.
- diagnostics.latency-measurement: Tracks statistics related to the latency of each operation. Default: Off. Options: On/Off.
- diagnostics.dump-fd-stats: Tracks statistics related to file operations. Default: Off. Options: On/Off.
- features.read-only: Mounts the entire volume as read-only for all clients accessing it (including NFS clients). Default: Off. Options: On/Off.
- features.lock-heal: Enables self-healing of locks when the network disconnects. Default: On. Options: On/Off.
- features.quota-timeout: For performance reasons, quota caches directory sizes on the client. This timeout sets how long cached directory sizes are considered valid, from the time they are populated. Default: 0. Options: 0-3600 secs.
- geo-replication.indexing: Automatically syncs filesystem changes from Master to Slave. Default: Off. Options: On/Off.
- network.frame-timeout: The time frame after which an operation is declared dead if the server does not respond to it. Default: 1800 (30 mins). Options: 1800 secs.
- network.ping-timeout: How long the client waits to check whether the server is responsive. When a ping timeout happens, there is a network disconnect between client and server, and all resources held by the server on behalf of the client are cleaned up. On reconnection, all resources must be re-acquired before the client can resume operations, and the locks re-acquired and lock tables updated. This reconnect is a very expensive operation and should be avoided. Default: 42 secs. Options: 42 secs.
- nfs.enable-ino32: For 32-bit NFS clients or applications that do not support 64-bit inode numbers or large files, makes Gluster NFS return 32-bit inode numbers instead of 64-bit ones. Default: Off. Options: On/Off.
- nfs.volume-access: Sets the access type for the specified subvolume. Default: read-write. Options: read-write/read-only.
- nfs.trusted-write: On an UNSTABLE write from the client, the STABLE flag is returned to force the client not to send a COMMIT request. In some environments, combined with a replicated GlusterFS setup, this can improve write performance. Users trust the Gluster replication logic to sync data to the disks and recover when required. COMMIT requests, if received, are handled in the default manner by fsyncing; STABLE writes are still handled synchronously. Default: Off. Options: On/Off.
- nfs.trusted-sync: All writes and COMMIT requests are treated as async, so no write is guaranteed to be on server disks when the write reply reaches the NFS client. Includes nfs.trusted-write behavior. Default: Off. Options: On/Off.
- nfs.export-dir: Exports the specified comma-separated subdirectories of the volume. The path must be absolute. A list of allowed IPs/hostnames can be associated with each subdirectory; if provided, connections are allowed only from those IPs. Format: <dir>[(hostspec[|hostspec...])][,...], where hostspec can be an IP address, a hostname, or an IP range in CIDR notation. Note: configure this option with care, as invalid entries or unreachable DNS servers can delay all mount calls. Default: no subdirectory exported. Options: absolute path with an allowed list of IPs/hostnames.
- nfs.export-volumes: Enables/disables exporting entire volumes; if disabled and used together with nfs.export-dir, only subdirectories can be set up as exports. Default: On. Options: On/Off.
- nfs.rpc-auth-unix: Enables/disables the AUTH_UNIX authentication type. Enabled by default for better interoperability, but it can be disabled if required. Default: On. Options: On/Off.
- nfs.rpc-auth-null: Enables/disables the AUTH_NULL authentication type. Changing the default is not recommended. Default: On. Options: On/Off.
- nfs.rpc-auth-allow <IP-ADDRESSES>: Allows a comma-separated list of addresses and/or hostnames to connect to the server. By default, all clients are disallowed; this defines a general rule for all exported volumes. Default: Reject All. Options: IP address or hostname.
- nfs.rpc-auth-reject <IP-ADDRESSES>: Rejects a comma-separated list of addresses and/or hostnames from connecting to the server. By default, all connections are disallowed; this defines a general rule for all exported volumes. Default: Reject All. Options: IP address or hostname.
- nfs.ports-insecure: Allows client connections from unprivileged ports (by default only privileged ports are allowed). A global setting for enabling insecure ports for all exports with a single option. Default: Off. Options: On/Off.
- nfs.addr-namelookup: Turns name lookup for incoming client connections on or off. In some setups the name server can take too long to reply to DNS queries, causing mount-request timeouts; turn lookups off to avoid this during address authentication. Note that turning this off prevents using hostnames in rpc-auth.addr.* filters. Default: On. Options: On/Off.
- nfs.register-with-portmap: On systems that need to run multiple NFS servers, you must prevent more than one from registering with the portmap service; use this option to turn off portmap registration for Gluster NFS. Default: On. Options: On/Off.
- nfs.port <PORT-NUMBER>: Associates Gluster NFS with a non-default port number. Default: NA. Options: 38465-38467.
- nfs.disable: Turns off exporting the volume via NFS. Default: Off. Options: On/Off.
- performance.write-behind-window-size: Size of the per-file write-behind buffer. Default: 1MB. Options: write-behind cache size.
- performance.io-thread-count: The number of threads in the IO-threads translator. Default: 16. Options: 0-65.
- performance.flush-behind: If On, the write-behind translator performs flushes in the background by returning success (or any errors from previously failed writes) to the application even before the flush is sent to the backend filesystem. Default: On. Options: On/Off.
- performance.cache-max-file-size: Sets the maximum file size cached by the io-cache translator. The usual size descriptors KB, MB, GB, TB, and PB can be used (for example, 6GB); the maximum is uint64. Default: 2^64 - 1 bytes. Options: size in bytes.
- performance.cache-min-file-size: Sets the minimum file size cached by the io-cache translator. Values are the same as for the "max" option above. Default: 0B. Options: size in bytes.
- performance.cache-refresh-timeout: Cached data for a file is retained for this many seconds, after which it is re-validated. Default: 1s. Options: 0-61.
- performance.cache-size: Size of the read cache. Default: 32 MB. Options: size in bytes.
- server.allow-insecure: Allows client connections from unprivileged ports (by default only privileged ports are allowed). A global setting for enabling insecure ports for all exports with a single option. Default: On. Options: On/Off.
- server.grace-timeout: Specifies the duration for the lock state to be maintained on the server after a network disconnection. Default: 10. Options: 10-1800 secs.
- server.statedump-path: Location of the state dump file. Default: the brick's tmp directory. Options: a new directory path.
- storage.health-check-interval: Number of seconds between health checks on the filesystem used for the brick(s); set to 0 to disable. Default: 30.
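When tuning several of the options above at once, it is convenient to review the whole batch of `gluster volume set` invocations before running any of them. The sketch below only prints the commands; the option values are illustrative, not recommendations.

```shell
# Sketch: print the `gluster volume set` commands for a batch of options.
VOLNAME="test-volume"
set -- "performance.cache-size 256MB" "nfs.disable on" "auth.allow 192.168.1.*"
OUT=""
for pair in "$@"; do
    OUT="${OUT}gluster volume set ${VOLNAME} ${pair}\n"
done
printf '%b' "$OUT"
```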
8.2 Configuring a volume's transport type
GlusterFS supports three transport configurations: TCP, RDMA, or both combined.
1) Before changing the transport type, unmount the volume on all clients: # umount mount-point
2) Stop the volume: # gluster volume stop volname
3) Change the transport type: # gluster volume set volname config.transport tcp,rdma OR tcp OR rdma
4) Mount the volume on all clients with the chosen transport option: # mount -t glusterfs -o transport=rdma server1:/test-volume /mnt/glusterfs
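The ordering of these steps matters, so it can help to print the full sequence for review before touching the cluster. The sketch below does only that; note it also prints a `volume start` step, on the assumption that a stopped volume must be started again before clients can remount it.

```shell
# Sketch: print (not run) the ordered commands for switching a volume's
# transport type; volume and mount point are the examples from this section.
VOLNAME="test-volume"
STEPS="umount /mnt/glusterfs
gluster volume stop ${VOLNAME}
gluster volume set ${VOLNAME} config.transport tcp,rdma
gluster volume start ${VOLNAME}
mount -t glusterfs -o transport=rdma server1:/${VOLNAME} /mnt/glusterfs"
printf '%s\n' "$STEPS"
```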
8.3 Growing a volume online
GlusterFS can grow a volume while it is in use by adding more bricks, but the number of bricks added must satisfy the volume's data distribution rules.
1) On the first node of the cluster, grow the trusted pool:
# gluster peer probe server4   (here, adding a fourth node to a three-node cluster)
Probe successful
2) Add the new bricks to the volume:
# gluster volume add-brick test-volume server4:/exp4
Add Brick successful
3) Check the volume status:
# gluster volume info
4) Rebalance the data across all bricks:
# gluster volume rebalance test-volume start
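The four steps above can be collected into one reviewable script. The sketch below prints the sequence rather than executing it; the server and brick names are the examples from this section.

```shell
# Sketch: the grow-volume sequence (probe, add-brick, rebalance) printed
# as one script for review before running it on the cluster.
VOLNAME="test-volume"
NEW_BRICK="server4:/exp4"
SEQ="gluster peer probe server4
gluster volume add-brick ${VOLNAME} ${NEW_BRICK}
gluster volume rebalance ${VOLNAME} start"
printf '%s\n' "$SEQ"
```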
8.4 Shrinking a volume online
When removing bricks from a volume, the bricks chosen and their number must match the volume's data redundancy scheme. Although removed from the volume, a brick's data can still be reached by accessing the brick directly.
1) Remove the specified brick
Command format: # gluster volume remove-brick <VOLNAME> <BRICK> start
For example:
# gluster volume remove-brick test-volume server2:/exp2 force
Remove Brick successful
2) Check the status of the removed brick
# gluster volume remove-brick test-volume server2:/exp2 status
Node Rebalanced-files size scanned status
--------- ---------------- ---- ------- -----------
617c923e-6450-4065-8e33-865e28d9428f 34 340 162 in progress
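The status output above is column-oriented, so the completion state can be picked out mechanically when polling in a script. The sample line below is copied from this section's example output.

```shell
# Sketch: extract the status column from a sample `remove-brick ... status`
# row (node UUID, rebalanced files, size, scanned, then the status text).
SAMPLE='617c923e-6450-4065-8e33-865e28d9428f 34 340 162 in progress'
STATUS=$(printf '%s\n' "$SAMPLE" | awk '{ print $5, $6 }')
echo "$STATUS"
```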
3) Check the volume status
# gluster volume info
4) Rebalance the data
# gluster volume rebalance test-volume start
8.5 Replacing a failed brick
How to replace a failed brick depends on the volume's data redundancy scheme.
For a distributed volume, remove the failed brick, add a new brick, and then rebalance the data.
For a replicated or distributed-replicated volume, use the replace-brick procedure.
1) Example: maintaining a distributed volume
Current volume layout:
Bricks:
Brick1: Server1:/home/gfs/r2_0
Brick2: Server1:/home/gfs/r2_1
Add the new brick first:
gluster volume add-brick r2 Server1:/home/gfs/r2_2
Then start removing the failed brick:
gluster volume remove-brick r2 Server1:/home/gfs/r2_1 start
When the previous step completes, commit the change:
gluster volume remove-brick r2 Server1:/home/gfs/r2_1 commit
The volume layout is now:
Bricks:
Brick1: Server1:/home/gfs/r2_0
Brick2: Server1:/home/gfs/r2_2
2) Example: maintaining a replicated or distributed-replicated volume
We replace the failed brick Server1:/home/gfs/r2_0 of volume r2, which has a replica count of 2, with the new brick Server1:/home/gfs/r2_5.
Current volume info:
Volume Name: r2
Type: Distributed-Replicate
Volume ID: 24a0437a-daa0-4044-8acf-7aa82efd76fd
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: Server1:/home/gfs/r2_0
Brick2: Server2:/home/gfs/r2_1
Brick3: Server1:/home/gfs/r2_2
Brick4: Server2:/home/gfs/r2_3
Confirm the following: the newly added brick is empty; every brick other than the failed one is in the OK state; and if the brick being replaced is still online, take it offline manually.
# gluster volume status
Status of volume: r2
Gluster process                      Port   Online  Pid
Brick Server1:/home/gfs/r2_0         49152  Y       5342
Brick Server2:/home/gfs/r2_1         49153  Y       5354
Brick Server1:/home/gfs/r2_2         49154  Y       5365
Brick Server2:/home/gfs/r2_3         49155  Y       5376
As shown above, each brick is served by its own process, and Pid is that process's ID.
① Log in to Server1 and kill process 5342 manually: # kill 5342
② Using a FUSE mount of the volume (/mnt/r2 in this example), create a metadata marker so that data is synced to the new brick (from Server2:/home/gfs/r2_1 to Server1:/home/gfs/r2_5).
Creating and then removing a directory under the /mnt/r2 mount point, and setting and then clearing an extended attribute on it, triggers the GlusterFS self-heal process to perform the heal from Server2:/home/gfs/r2_1 to Server1:/home/gfs/r2_5:
mkdir /mnt/r2/<name-of-nonexistent-dir>
rmdir /mnt/r2/<name-of-nonexistent-dir>
setfattr -n trusted.non-existent-key -v abc /mnt/r2
setfattr -x trusted.non-existent-key /mnt/r2
Check the heal metadata on the replica partner of the brick being replaced:
# getfattr -d -m. -e hex /home/gfs/r2_1
# file: home/gfs/r2_1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.r2-client-0=0x000000000000000300000002