设为首页 收藏本站
查看: 1042|回复: 0

[经验分享] Oracle 11g Data Guard Error 16143 Heartbeat failed to connect to standby 处理方法

[复制链接]

尚未签到

发表于 2015-11-21 10:33:24 | 显示全部楼层 |阅读模式
  

  
  
  
一.背景说明
  
  前段时间一朋友在生产库上误操作,本来他是打算重启一下DG环境,结果在备库命令执行错误。
  
  本应该执行
  SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  
  结果朋友执行成了如下命令:
  SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
  
  中断了DG主备库的通信环境,这个finish是用来做Failover时用的。 当时让朋友在主库重新生成了一份standby controlfiles,然后copy到备库,在按正常模式启动就可以了。
  
  因为数据库识别主备库就是通过控制文件来的,所以理论上,只需要重新生成一份standby 控制文件就可以了。  后来朋友测试了一下,正常的拉起来了。
  
  今天看到了当时的记录,就顺便模拟一下整个操作,顺便练练手。
  
  
  
二. 演示过程
  
2.1 DG 环境说明
  OS: Oracle Linux6.3
  DB: 11.2.0.3
  
  SQL> select * from v$version;
  
  BANNER
  --------------------------------------------------------------------------------
  Oracle Database 11g Enterprise EditionRelease 11.2.0.3.0 - 64bit Production
  PL/SQL Release 11.2.0.3.0 - Production
  CORE   11.2.0.3.0      Production
  TNS for Linux: Version 11.2.0.3.0 -Production
  NLSRTL Version 11.2.0.3.0 - Production
  
  
  主库:
  
  SQL> select open_mode from v$database;
  
  OPEN_MODE
  --------------------
  READ WRITE
  
  SQL>
  SQL> set pagesize 200
  SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  14 YES
  14 NO
  13 YES
  13 NO
  12 NO
  12 YES
  11 YES
  11 NO
  10 NO
  10 YES
  9 YES
  9 NO
  8 NO
  8 YES
  7 YES
  7 NO
  6 YES
  6 NO
  5 NO
  4 NO
  
  20 rows selected.
  
  
  备库:
  SQL> select open_mode from v$database;
  
  OPEN_MODE
  --------------------
  MOUNTED
  
  SQL>
  
  
  SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  14 YES
  13 YES
  12 YES
  11 YES
  10 YES
  9 YES
  8 YES
  7 YES
  6 YES
  
  9 rows selected.
  
  
2.2 模拟故障
  
  在备库执行如下命令:
  SQL> ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH;
  Database altered.
  
  
2.3 查看主库 alert log
  
  [oracle@dg1 trace]$ pwd
  /u01/app/oracle/diag/rdbms/dave_pd/dave/trace
  
  [oracle@dg1 trace]$ tail -30 alert_dave.log
  Thread 1 advanced to log sequence 14 (LGWRswitch)
  Current log# 2 seq# 14 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
  Fri Mar 29 03:30:12 2013
  Archived Log entry 17 added for thread 1sequence 13 ID 0x3312f7c4 dest 1:
  Fri Mar 29 03:30:13 2013
  LNS: Standby redo logfile selected forthread 1 sequence 14 for destination LOG_ARCHIVE_DEST_2
  Fri Mar 29 03:43:10 2013
  Time drift detected. Please check VKTMtrace file for more details.
  Fri Mar 29 04:45:31 2013
  Time drift detected. Please check VKTMtrace file for more details.
  Fri Mar 29 06:28:35 2013
  Time drift detected. Please check VKTMtrace file for more details.
  Fri Mar 29 07:08:14 2013
  Thread 1 advanced to log sequence 15 (LGWRswitch)
  Current log# 3 seq# 15 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
  Fri Mar 29 07:08:16 2013
  Archived Log entry 20 added for thread 1sequence 14 ID 0x3312f7c4 dest 1:
  Fri Mar 29 07:08:17 2013
  LNS: Standby redo logfile selected forthread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2
  Fri Mar 29 07:34:48 2013
  Time drift detected. Please check VKTMtrace file for more details.
  Fri Mar 29 07:48:55 2013
  LNS: Attempting destinationLOG_ARCHIVE_DEST_2 network reconnect (3135)
  LNS: Destination LOG_ARCHIVE_DEST_2 networkreconnect abandoned
  Error 3135 for archive log file 3 to'dave_st'
  Errors in file/u01/app/oracle/diag/rdbms/dave_pd/dave/trace/dave_nsa2_3181.trc:
  ORA-03135: connection lost contact
  LNS: Failed to archive log 3 thread 1sequence 15 (3135)
  Fri Mar 29 07:51:45 2013
  PING[ARC1]: Heartbeatfailed to connect to standby 'dave_st'. Error is 16143.
  
  因为我们在备库执行的Finish命令,导致心跳中断了。
  
2.4 查看备库alert log
  
  [oracle@dg2 trace]$ pwd
  /u01/app/oracle/diag/rdbms/dave_st/dave/trace
  
  [oracle@dg2 trace]$  tail -20 alert_dave.log
  Terminal Recovery: thread 1 seq# 15 redorequired
  Terminal Recovery:
  Recovery of Online Redo Log: Thread 1 Group5 Seq 15 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/dave/stdbyredo02.log
  Identified End-Of-Redo (failover) forthread 1 sequence 15 at SCN 0xffff.ffffffff
  Incomplete Recovery applied until change1082890 time 03/29/2013 07:48:53
  MRP0: Media Recovery Complete (dave)
  Terminal Recovery: successful completion
  Fri Mar 29 07:48:49 2013
  ARCH: Archival stopped, error occurred.Will continue retrying
  ORACLE Instance dave - Archival Error
  Forcing ARSCN to IRSCN for TR 0:1082890
  Attempt to set limbo arscn 0:1082890 irscn0:1082890
  Resetting standby activation ID 856881092(0x3312f7c4)
  ORA-16014: log 5 sequence# 15 not archived,no available destinations
  ORA-00312: online log 5 thread 1:'/u01/app/oracle/oradata/dave/stdbyredo02.log'
  MRP0: Background Media Recovery processshutdown (dave)
  Fri Mar 29 07:48:50 2013
  Terminal Recovery: completion detected(dave)
  Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH
  [oracle@dg2 trace]$
  
2.5 在主库切换归档
  
  SQL> alter system switch logfile;
  System altered.
  
  SQL> alter system switch logfile;
  System altered.
  
  SQL> select sequence#,applied fromv$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  16 NO
  15 NO
  14 NO
  14 YES
  13 YES
  13 NO
  12 YES
  12 NO
  11 NO
  11 YES
  10 NO
  10 YES
  9 YES
  9 NO
  8 YES
  8 NO
  7 YES
  7 NO
  6 NO
  6 YES
  5 NO
  4 NO
  
  22 rows selected.
  
  SQL>
  
2.6 再次查看主备库日志
  
  主库日志:
  
  Fri Mar 29 07:52:46 2013
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
  Fri Mar 29 07:53:47 2013
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
  Fri Mar 29 07:53:49 2013
  Thread 1 advanced to log sequence 16 (LGWRswitch)
  Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/dave/redo01.log
  Fri Mar 29 07:53:49 2013
  Archived Log entry 21 added for thread 1sequence 15 ID 0x3312f7c4 dest 1:
  Fri Mar 29 07:53:50 2013
  FAL[server, ARC2]: Error 16143 creatingremote archivelog file 'dave_st'
  FAL[server, ARC2]: FAL archive failed, seetrace file.
  ARCH: FAL archive failed. Archivercontinuing
  ORACLE Instance dave - Archival Error.Archiver continuing.
  Thread 1 advanced to log sequence 17 (LGWRswitch)
  Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
  Fri Mar 29 07:53:57 2013
  Archived Log entry 22 added for thread 1sequence 16 ID 0x3312f7c4 dest 1:
  
  
  备库日志:
  
  Fri Mar 29 07:48:50 2013
  Terminal Recovery: completion detected(dave)
  Completed: ALTER DATABASE RECOVER MANAGEDSTANDBY DATABASE FINISH
  Fri Mar 29 07:51:34 2013
  RFS[5]: Assigned to RFS process 9336
  RFS[5]: No connections allowed during/afterterminal recovery.
  Fri Mar 29 07:52:35 2013
  RFS[6]: Assigned to RFS process 9340
  RFS[6]: No connections allowed during/afterterminal recovery.
  Fri Mar 29 07:53:36 2013
  RFS[7]: Assigned to RFS process 9343
  RFS[7]: No connections allowed during/afterterminal recovery.
  Fri Mar 29 07:53:39 2013
  RFS[8]: Assigned to RFS process 9345
  RFS[8]: No connectionsallowed during/after terminal recovery.
  
  
2.7 在主库重建standby control file
  
  先在备库查看一下控制文件名称,等会创建完后直接覆盖过去:
  SQL> show parameter control
  
  NAME                                 TYPE        VALUE
  ----------------------------------------------- ------------------------------
  control_file_record_keep_time        integer     7
  control_files                        string      /u01/app/oracle/oradata/dave/c
  ontrol01.ctl, /u01/app/oracle/
  fast_recovery_area/dave/contro
  l02.ctl
  control_management_pack_access       string      DIAGNOSTIC+TUNING
  
  
  主库创建standby controlfile
  SQL> alter database create standbycontrolfile as '/u01/control01.ctl';
  Database altered.
  
  
  copy到备库的目录,在覆盖原来的控制文件:
  
  --先关闭备库:
  SQL> shutdown immediate
  ORA-01109: database not open
  
  Database dismounted.
  ORACLE instance shut down.
  SQL>
  
  
  --copy并覆盖:
  [oracle@dg2 trace]$ cd/u01/app/oracle/oradata/dave/
  [oracle@dg2 dave]$ ls
  control01.ctl    stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf
  stdbyredo01.log  stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf
  [oracle@dg2 dave]$ mv control01.ctlcontrol01.ctl.bak
  [oracle@dg2 dave]$ ls
  control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf
  stdbyredo01.log    stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf
  
  [oracle@dg2 dave]$ scp192.168.1.20:/u01/control01.ctl 192.168.1.30:/u01/app/oracle/oradata/dave/
  The authenticity of host '192.168.1.20(192.168.1.20)' can't be established.
  RSA key fingerprint is0d:6a:5f:78:53:a0:bf:54:a8:e3:7e:67:81:06:8d:75.
  Are you sure you want to continueconnecting (yes/no)? yes
  Warning: Permanently added '192.168.1.20'(RSA) to the list of known hosts.
  oracle@192.168.1.20's password:
  oracle@192.168.1.30's password:
  control01.ctl                                                                        100% 9520KB 865.5KB/s   00:11   
  Connection to 192.168.1.20 closed.
  [oracle@dg2 dave]$ ls
  control01.ctl      stdbyredo01.log  stdbyredo03.log  sysaux01.dbf temp01.dbf     users01.dbf
  control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf undotbs01.dbf
  [oracle@dg2 dave]$
  
  
  [oracle@dg2 archivelog]$ cd/u01/app/oracle/fast_recovery_area/dave/
  [oracle@dg2 dave]$ ls
  control02.ctl
  [oracle@dg2 dave]$ mv control02.ctlcontrol02.ctl.bak
  [oracle@dg2 dave]$ ls
  control02.ctl.bak
  [oracle@dg2 dave]$
  
  [oracle@dg2 dave]$ cp control01.ctl/u01/app/oracle/fast_recovery_area/dave/control02.ctl
  
2.8 在正常拉起备库
  
  SQL> startup nomount;
  ORACLE instance started.
  
  Total System Global Area  814227456 bytes
  Fixed Size                  2232760 bytes
  Variable Size             478154312 bytes
  Database Buffers          331350016 bytes
  Redo Buffers                2490368 bytes
  SQL> alter database mount standby database;
  
  Database altered.
  
  SQL> alter database recover managedstandby database disconnect from session;
  
  Database altered.
  
  SQL>
  
2.9 查看主备库日志
  
  主库日志:
  
  Fri Mar 29 08:00:51 2013
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16143.
  Fri Mar 29 08:01:52 2013
  Error 1034 received logging on to thestandby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:02:56 2013
  Error 1034 received logging on to thestandby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:03:57 2013
  Error 1034 received logging on to thestandby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:04:59 2013
  Error 1034 received logging on to the standby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:06:02 2013
  Error 1034 received logging on to thestandby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:07:05 2013
  Error 1034 received logging on to thestandby
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 1034.
  Fri Mar 29 08:08:08 2013
  PING[ARC1]: Heartbeat failed to connect tostandby 'dave_st'. Error is 16058.
  Fri Mar 29 08:08:34 2013
  ALTER SYSTEM SETlog_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
  Fri Mar 29 08:08:35 2013
  Thread 1 advanced to log sequence 18 (LGWRswitch)
  Current log# 3 seq# 18 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
  Fri Mar 29 08:08:36 2013
  ******************************************************************
  LGWR: Setting 'active'archival for destination LOG_ARCHIVE_DEST_2
  ******************************************************************
  Fri Mar 29 08:08:36 2013
  Archived Log entry 23added for thread 1 sequence 17 ID 0x3312f7c4 dest 1:
  
  
  备库日志:
  
  [oracle@dg2 trace]$ tail -20 alert_dave.log
  ORA-27037: unable to obtain file status
  Linux-x86_64 Error: 2: No such file ordirectory
  Additional information: 3
  Clearing online redo logfile 3 complete
  Media Recovery Waiting for thread 1sequence 15
  Fetching gap sequence in thread 1, gapsequence 15-16
  Fri Mar 29 08:08:48 2013
  RFS[3]: Assigned to RFS process 9707
  RFS[3]: Opened log for thread 1 sequence 16dbid 856896964 branch 794014730
  Fri Mar 29 08:08:49 2013
  RFS[4]: Assigned to RFS process 9705
  RFS[4]: Opened log for thread 1 sequence 15dbid 856896964 branch 794014730
  Archived Log entry 2 added for thread 1sequence 16 rlc 794014730 ID 0x3312f7c4 dest 2:
  Archived Log entry 3 added for thread 1sequence 15 rlc 794014730 ID 0x3312f7c4 dest 2:
  Fri Mar 29 08:08:55 2013
  Media Recovery Log/u01/archivelog/1_15_794014730.dbf
  Media Recovery Log/u01/archivelog/1_16_794014730.dbf
  Media Recovery Log/u01/archivelog/1_17_794014730.dbf
  Fri Mar 29 08:09:11 2013
  Media Recovery Waitingfor thread 1 sequence 18 (in transit)
  
  注意这里:
  我们把备库拉起来之后,就自动开始同步了。
  
  
2.10 切换归档测试
  
  主库:
  SQL> alter system switch logfile;
  
  System altered.
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  18 NO
  18 NO
  17 NO
  17 YES
  16 YES
  16 NO
  15 NO
  15 YES
  14 NO
  14 YES
  13 YES
  13 NO
  12 NO
  12 YES
  11 NO
  11 YES
  10 YES
  10 NO
  9 NO
  9 YES
  8 NO
  8 YES
  7 NO
  7 YES
  6 NO
  6 YES
  5 NO
  4 NO
  
  28 rows selected.
  
  
  备库:
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  SEQUENCE# APPLIED
  ---------- ---------
  18 YES
  17 YES
  16 YES
  15 YES
  
  注意这里,备库已经完全同步了。 之前在我们主库看,18的日志还没有应用,因为我们刚启动备库,应用需要一定的时间。 并且在我们重新配置之后,这里的数字就从15开始了。 是我们中断DG后的数字。 但我们主库还是从4开始计算的。
  
  
  小结:
  对于DG通信异常中断的处理,我们仅仅需要重新创建一份standby 的control file就可以了。
  
  
  
  
  
  
---------------------------------------------------------------------------------------

版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!

Skype:    tianlesoftware

QQ:       tianlesoftware@gmail.com

Email:    tianlesoftware@gmail.com

Blog:     http://blog.iyunv.com/tianlesoftware

Weibo:    http://weibo.com/tianlesoftware

Twitter:  http://twitter.com/tianlesoftware

Facebook: http://www.facebook.com/tianlesoftware

Linkedin: http://cn.linkedin.com/in/tianlesoftware

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-141754-1-1.html 上篇帖子: Heartbeat3.0.7安装配置 下篇帖子: CentOS6.0下安装heartbeat 3.x
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表