janneyabc posted on 2019-2-2 11:02:39

pg inconsistent

  The Ceph cluster status suddenly went to error:
  # ceph health detail
  HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
  pg 2.37c is active+clean+inconsistent, acting [75,6,35]
  1 scrub errors
  Error summary:
  Problem PG: 2.37c
  OSDs involved: 75, 6, 35
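  (Optional, and not in the original post: on Jewel and later releases you can list exactly which object the scrub flagged before attempting any repair. A minimal check from an admin node:)
  rados list-inconsistent-obj 2.37c --format=json-pretty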
  Run the routine repair:
  ceph pg repair 2.37c
  After this, individual OSD daemons may restart and the PG gets remapped; after a short wait the status usually comes back OK.
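  (My addition, not from the original post: you can follow the peering/remap settle in real time from the cluster log tail:)
  ceph -w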
  If the result after the repair is still abnormal:
  # ceph health detail
  HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
  pg 2.37c is active+clean+inconsistent, acting [75,6,35]
  1 scrub errors
  The problem persists; the abnormal PG has not been repaired.
  Next, scrub the PG and then retry the repair:
  ceph pg scrub 2.37c
  ceph pg deep-scrub 2.37c
  ceph pg repair 2.37c
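  (A sanity check I'm adding, not in the original: scrub and repair requests are queued asynchronously, so it is worth confirming the deep scrub actually ran by checking its timestamp in the PG query output:)
  ceph pg 2.37c query | grep -E 'last_scrub_stamp|last_deep_scrub_stamp'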
  Neither the scrubs nor the repair fixed it; the same error kept appearing. The relevant OSD log shows:
  2017-07-24 17:31:10.585305 7f72893c4700 0 log_channel(cluster) log : 2.37c repair starts
  2017-07-24 17:31:10.710517 7f72893c4700 -1 log_channel(cluster) log : 2.37c repair 1 errors, 0 fixed
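  (Where to find these lines, assuming a stock package install with default log paths: on the node hosting osd.75, grep its log:)
  grep 2.37c /var/log/ceph/ceph-osd.75.log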
  I then decided to repair the three OSDs the PG maps to:
  ceph osd repair 75
  ceph osd repair 6
  ceph osd repair 35
  Finally I went for the bluntest approach: stop the primary OSD (osd.75) serving the problem PG.
  Query the PG's primary OSD:
  ceph pg 2.37c query | grep primary

            "blocked_by": [],
"up_primary": 75,
"acting_primary": 75
  Execute the following:
  systemctl stop ceph-osd@75
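  (Two helpers I'm adding, not in the original post: the systemctl command must run on the node that hosts osd.75, and ceph osd find tells you which host that is; and if you don't want to wait for the down-out timer before backfill starts, you can mark the OSD out explicitly:)
  ceph osd find 75
  ceph osd out 75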
  Ceph now begins recovery, rebuilding the data that was on osd.75 onto other nodes. After waiting a while for the recovery to complete, check the cluster status.
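  (Recovery progress shows up in the status summary; repeat until the recovery counters disappear:)
  ceph -s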
  # ceph health detail
  HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
  pg 2.37c is active+clean+inconsistent, acting
  1 scrub errors
  # ceph pg repair 2.37c
  instructing pg 2.37c on osd.8 to repair
  Then check the cluster status again (note the repair was instructed on osd.8: with osd.75 down, the PG's acting primary has changed):
  # ceph health detail
  HEALTH_OK
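  (The original post stops here; presumably osd.75 would then be brought back into service. A minimal sketch:)
  systemctl start ceph-osd@75
  ceph osd in 75     # only needed if it was marked out earlier
  ceph osd tree      # confirm osd.75 is up and in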


