淡淡回忆 posted on 2015-9-5 13:38:17

Oracle 11g Data Guard: Handling Error 16143 "Heartbeat failed to connect to standby"

  

  
  
  
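1. Background
  
  Some time ago a friend made a mistake on a production database. He meant to restart the DG environment, but ran the wrong command on the standby.
  
  He should have run:
  SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  
  What he actually ran was:
  SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
  
  This broke communication between the primary and the standby, because FINISH is intended for failover. At the time I had him generate a new standby controlfile on the primary, copy it to the standby, and then start the standby in the usual way.
  
  The database determines whether it is a primary or a standby from its control file, so in theory regenerating a standby control file should be enough. My friend tested this afterwards and the standby came back up normally.
  
  I came across my notes from that incident today, so I am reproducing the whole procedure here for practice.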
  
  
  
2. Demonstration
  
2.1 DG environment
  OS: Oracle Linux 6.3
  DB: 11.2.0.3
  
  SQL> select * from v$version;
  
  BANNER
  --------------------------------------------------------------------------------
  Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
  PL/SQL Release 11.2.0.3.0 - Production
  CORE   11.2.0.3.0      Production
  TNS for Linux: Version 11.2.0.3.0 - Production
  NLSRTL Version 11.2.0.3.0 - Production
  
  
  Primary:
  
  SQL> select open_mode from v$database;
  
  OPEN_MODE
  --------------------
  READ WRITE
  
  SQL>
  SQL> set pagesize 200
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  14 YES
  14 NO
  13 YES
  13 NO
  12 NO
  12 YES
  11 YES
  11 NO
  10 NO
  10 YES
  9 YES
  9 NO
  8 NO
  8 YES
  7 YES
  7 NO
  6 YES
  6 NO
  5 NO
  4 NO
  
  20 rows selected.
  
  
  Standby:
  SQL> select open_mode from v$database;
  
  OPEN_MODE
  --------------------
  MOUNTED
  
  SQL>
  
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  14 YES
  13 YES
  12 YES
  11 YES
  10 YES
  9 YES
  8 YES
  7 YES
  6 YES
  
  9 rows selected.
  
  
2.2 Simulating the failure
  
  Run the following command on the standby:
  SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
  Database altered.
  
  
2.3 Check the primary alert log
  
  $ pwd
  /u01/app/oracle/diag/rdbms/dave_pd/dave/trace
  
  $ tail -30 alert_dave.log
  Thread 1 advanced to log sequence 14 (LGWR switch)
  Current log# 2 seq# 14 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
  Fri Mar 29 03:30:12 2013
  Archived Log entry 17 added for thread 1 sequence 13 ID 0x3312f7c4 dest 1:
  Fri Mar 29 03:30:13 2013
  LNS: Standby redo logfile selected for thread 1 sequence 14 for destination LOG_ARCHIVE_DEST_2
  Fri Mar 29 03:43:10 2013
  Time drift detected. Please check VKTM trace file for more details.
  Fri Mar 29 04:45:31 2013
  Time drift detected. Please check VKTM trace file for more details.
  Fri Mar 29 06:28:35 2013
  Time drift detected. Please check VKTM trace file for more details.
  Fri Mar 29 07:08:14 2013
  Thread 1 advanced to log sequence 15 (LGWR switch)
  Current log# 3 seq# 15 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
  Fri Mar 29 07:08:16 2013
  Archived Log entry 20 added for thread 1 sequence 14 ID 0x3312f7c4 dest 1:
  Fri Mar 29 07:08:17 2013
  LNS: Standby redo logfile selected for thread 1 sequence 15 for destination LOG_ARCHIVE_DEST_2
  Fri Mar 29 07:34:48 2013
  Time drift detected. Please check VKTM trace file for more details.
  Fri Mar 29 07:48:55 2013
  LNS: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)
  LNS: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
  Error 3135 for archive log file 3 to 'dave_st'
  Errors in file /u01/app/oracle/diag/rdbms/dave_pd/dave/trace/dave_nsa2_3181.trc:
  ORA-03135: connection lost contact
  LNS: Failed to archive log 3 thread 1 sequence 15 (3135)
  Fri Mar 29 07:51:45 2013
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 16143.
  
  The heartbeat was interrupted because of the FINISH command we ran on the standby.
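  
  When watching an incident like this, it can help to summarize which error codes the heartbeat is failing with, since the code changes as the standby goes through different states. The following is only a sketch: the function name count_heartbeat_errors is made up for illustration, and it assumes the alert log uses the same "PING: Heartbeat failed to connect to standby ... Error is N." wording shown above. As far as I know, error 16143 is ORA-16143 (RFS connections not allowed during or after terminal recovery), which fits the FINISH that was run on the standby.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): count heartbeat failures in an
# alert log, grouped by error code.
# Prints one "<error_code> <count>" pair per line, sorted by error code.
count_heartbeat_errors() {
    alert_log=$1
    grep 'Heartbeat failed to connect to standby' "$alert_log" |
        sed -n 's/.*Error is \([0-9][0-9]*\).*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

  For example, count_heartbeat_errors alert_dave.log run in the primary's trace directory would list each error code seen so far with the number of times it occurred.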
  
2.4 Check the standby alert log
  
  $ pwd
  /u01/app/oracle/diag/rdbms/dave_st/dave/trace
  
  $ tail -20 alert_dave.log
  Terminal Recovery: thread 1 seq# 15 redo required
  Terminal Recovery:
  Recovery of Online Redo Log: Thread 1 Group 5 Seq 15 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/dave/stdbyredo02.log
  Identified End-Of-Redo (failover) for thread 1 sequence 15 at SCN 0xffff.ffffffff
  Incomplete Recovery applied until change 1082890 time 03/29/2013 07:48:53
  MRP0: Media Recovery Complete (dave)
  Terminal Recovery: successful completion
  Fri Mar 29 07:48:49 2013
  ARCH: Archival stopped, error occurred. Will continue retrying
  ORACLE Instance dave - Archival Error
  Forcing ARSCN to IRSCN for TR 0:1082890
  Attempt to set limbo arscn 0:1082890 irscn 0:1082890
  Resetting standby activation ID 856881092 (0x3312f7c4)
  ORA-16014: log 5 sequence# 15 not archived, no available destinations
  ORA-00312: online log 5 thread 1: '/u01/app/oracle/oradata/dave/stdbyredo02.log'
  MRP0: Background Media Recovery process shutdown (dave)
  Fri Mar 29 07:48:50 2013
  Terminal Recovery: completion detected (dave)
  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
  $
  
2.5 Switch logs on the primary
  
  SQL> alter system switch logfile;
  System altered.
  
  SQL> alter system switch logfile;
  System altered.
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  16 NO
  15 NO
  14 NO
  14 YES
  13 YES
  13 NO
  12 YES
  12 NO
  11 NO
  11 YES
  10 NO
  10 YES
  9 YES
  9 NO
  8 YES
  8 NO
  7 YES
  7 NO
  6 NO
  6 YES
  5 NO
  4 NO
  
  22 rows selected.
  
  SQL>
  
2.6 Check the primary and standby logs again
  
  Primary log:
  
  Fri Mar 29 07:52:46 2013
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 16143.
  Fri Mar 29 07:53:47 2013
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 16143.
  Fri Mar 29 07:53:49 2013
  Thread 1 advanced to log sequence 16 (LGWR switch)
  Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/dave/redo01.log
  Fri Mar 29 07:53:49 2013
  Archived Log entry 21 added for thread 1 sequence 15 ID 0x3312f7c4 dest 1:
  Fri Mar 29 07:53:50 2013
  FAL: Error 16143 creating remote archivelog file 'dave_st'
  FAL: FAL archive failed, see trace file.
  ARCH: FAL archive failed. Archiver continuing
  ORACLE Instance dave - Archival Error. Archiver continuing.
  Thread 1 advanced to log sequence 17 (LGWR switch)
  Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/dave/redo02.log
  Fri Mar 29 07:53:57 2013
  Archived Log entry 22 added for thread 1 sequence 16 ID 0x3312f7c4 dest 1:
  
  
  Standby log:
  
  Fri Mar 29 07:48:50 2013
  Terminal Recovery: completion detected (dave)
  Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH
  Fri Mar 29 07:51:34 2013
  RFS: Assigned to RFS process 9336
  RFS: No connections allowed during/after terminal recovery.
  Fri Mar 29 07:52:35 2013
  RFS: Assigned to RFS process 9340
  RFS: No connections allowed during/after terminal recovery.
  Fri Mar 29 07:53:36 2013
  RFS: Assigned to RFS process 9343
  RFS: No connections allowed during/after terminal recovery.
  Fri Mar 29 07:53:39 2013
  RFS: Assigned to RFS process 9345
  RFS: No connections allowed during/after terminal recovery.
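  
  The standby log above shows the pattern that identifies this failure: a completed terminal recovery followed by refused RFS connections. A minimal sketch for spotting that pattern in a standby alert log might look like this. The function name check_terminal_recovery is hypothetical, and the two message strings are simply copied from the log excerpts in this post, so they are assumptions about the wording in other versions.

```shell
#!/bin/sh
# Hypothetical helper: report whether a standby alert log shows a completed
# terminal recovery, and how many RFS connections were refused.
# Returns 0 if a completed terminal recovery is found, 1 otherwise.
check_terminal_recovery() {
    alert_log=$1
    if grep -q 'Terminal Recovery: successful completion' "$alert_log"; then
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the count without failing the function.
        refused=$(grep -c 'RFS: No connections allowed during/after terminal recovery' "$alert_log" || true)
        echo "terminal recovery completed; refused RFS connections: $refused"
        return 0
    fi
    echo "no completed terminal recovery found"
    return 1
}
```

  If the function reports a completed terminal recovery on a standby that was not supposed to fail over, the situation is probably the one described in this post.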
  
  
2.7 Recreate the standby control file on the primary
  
  First check the control file names on the standby, so that the new file can be copied straight over them:
  SQL> show parameter control
  
  NAME                                 TYPE        VALUE
  ------------------------------------ ----------- ------------------------------
  control_file_record_keep_time        integer     7
  control_files                        string      /u01/app/oracle/oradata/dave/c
                                                   ontrol01.ctl, /u01/app/oracle/
                                                   fast_recovery_area/dave/contro
                                                   l02.ctl
  control_management_pack_access       string      DIAGNOSTIC+TUNING
  
  
  Create the standby controlfile on the primary:
  SQL> alter database create standby controlfile as '/u01/control01.ctl';
  Database altered.
  
  
  Copy it to the standby's directories, overwriting the original control files:
  
  -- Shut down the standby first:
  SQL> shutdown immediate
  ORA-01109: database not open
  
  Database dismounted.
  ORACLE instance shut down.
  SQL>
  
  
  -- Copy and overwrite:
  $ cd /u01/app/oracle/oradata/dave/
  $ ls
  control01.ctl    stdbyredo02.log  stdbyredo04.log  system01.dbf  undotbs01.dbf
  stdbyredo01.log  stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf
  $ mv control01.ctl control01.ctl.bak
  $ ls
  control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf  undotbs01.dbf
  stdbyredo01.log    stdbyredo03.log  sysaux01.dbf     temp01.dbf    users01.dbf
  
  $ scp 192.168.1.20:/u01/control01.ctl 192.168.1.30:/u01/app/oracle/oradata/dave/
  The authenticity of host '192.168.1.20 (192.168.1.20)' can't be established.
  RSA key fingerprint is 0d:6a:5f:78:53:a0:bf:54:a8:e3:7e:67:81:06:8d:75.
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added '192.168.1.20' (RSA) to the list of known hosts.
  oracle@192.168.1.20's password:
  oracle@192.168.1.30's password:
  control01.ctl                                 100% 9520KB 865.5KB/s   00:11
  Connection to 192.168.1.20 closed.
  $ ls
  control01.ctl      stdbyredo01.log  stdbyredo03.log  sysaux01.dbf  temp01.dbf     users01.dbf
  control01.ctl.bak  stdbyredo02.log  stdbyredo04.log  system01.dbf  undotbs01.dbf
  $
  
  
  $ cd /u01/app/oracle/fast_recovery_area/dave/
  $ ls
  control02.ctl
  $ mv control02.ctl control02.ctl.bak
  $ ls
  control02.ctl.bak
  $
  
  $ cp /u01/app/oracle/oradata/dave/control01.ctl /u01/app/oracle/fast_recovery_area/dave/control02.ctl
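  
  The back-up-then-overwrite steps above are easy to get wrong by hand when there are several control file copies, so they could be wrapped in a small function. This is only a sketch under assumptions: the name replace_controlfiles is invented here, the new standby control file is assumed to be staged somewhere outside the target locations (for example /tmp), and the standby instance is assumed to be shut down already.

```shell
#!/bin/sh
# Hypothetical helper: replace each control file copy with a new standby
# control file, keeping the old copy as TARGET.bak.
# Usage: replace_controlfiles NEW_FILE TARGET...
# Assumes NEW_FILE is not itself one of the targets and the instance is down.
replace_controlfiles() {
    new_file=$1
    shift
    [ -f "$new_file" ] || { echo "missing source: $new_file" >&2; return 1; }
    for target in "$@"; do
        # Keep the old control file as a backup, as in the manual steps.
        if [ -f "$target" ]; then
            mv "$target" "$target.bak" || return 1
        fi
        cp "$new_file" "$target" || return 1
        # cmp -s exits non-zero if the copy differs from the source.
        cmp -s "$new_file" "$target" || { echo "copy mismatch: $target" >&2; return 1; }
    done
}
```

  With the paths from this post, a call might look like replace_controlfiles /tmp/control01.ctl /u01/app/oracle/oradata/dave/control01.ctl /u01/app/oracle/fast_recovery_area/dave/control02.ctl, where /tmp/control01.ctl is a hypothetical staging location for the file copied from the primary.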
  
2.8 Start the standby normally
  
  SQL> startup nomount;
  ORACLE instance started.
  
  Total System Global Area  814227456 bytes
  Fixed Size                  2232760 bytes
  Variable Size             478154312 bytes
  Database Buffers          331350016 bytes
  Redo Buffers                2490368 bytes
  SQL> alter database mount standby database;
  
  Database altered.
  
  SQL> alter database recover managed standby database disconnect from session;
  
  Database altered.
  
  SQL>
  
2.9 Check the primary and standby logs
  
  Primary log:
  
  Fri Mar 29 08:00:51 2013
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 16143.
  Fri Mar 29 08:01:52 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:02:56 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:03:57 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:04:59 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:06:02 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:07:05 2013
  Error 1034 received logging on to the standby
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 1034.
  Fri Mar 29 08:08:08 2013
  PING: Heartbeat failed to connect to standby 'dave_st'. Error is 16058.
  Fri Mar 29 08:08:34 2013
  ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=MEMORY SID='*';
  Fri Mar 29 08:08:35 2013
  Thread 1 advanced to log sequence 18 (LGWR switch)
  Current log# 3 seq# 18 mem# 0: /u01/app/oracle/oradata/dave/redo03.log
  Fri Mar 29 08:08:36 2013
  ******************************************************************
  LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
  ******************************************************************
  Fri Mar 29 08:08:36 2013
  Archived Log entry 23 added for thread 1 sequence 17 ID 0x3312f7c4 dest 1:
  
  
  Standby log:
  
  $ tail -20 alert_dave.log
  ORA-27037: unable to obtain file status
  Linux-x86_64 Error: 2: No such file or directory
  Additional information: 3
  Clearing online redo logfile 3 complete
  Media Recovery Waiting for thread 1 sequence 15
  Fetching gap sequence in thread 1, gap sequence 15-16
  Fri Mar 29 08:08:48 2013
  RFS: Assigned to RFS process 9707
  RFS: Opened log for thread 1 sequence 16 dbid 856896964 branch 794014730
  Fri Mar 29 08:08:49 2013
  RFS: Assigned to RFS process 9705
  RFS: Opened log for thread 1 sequence 15 dbid 856896964 branch 794014730
  Archived Log entry 2 added for thread 1 sequence 16 rlc 794014730 ID 0x3312f7c4 dest 2:
  Archived Log entry 3 added for thread 1 sequence 15 rlc 794014730 ID 0x3312f7c4 dest 2:
  Fri Mar 29 08:08:55 2013
  Media Recovery Log /u01/archivelog/1_15_794014730.dbf
  Media Recovery Log /u01/archivelog/1_16_794014730.dbf
  Media Recovery Log /u01/archivelog/1_17_794014730.dbf
  Fri Mar 29 08:09:11 2013
  Media Recovery Waiting for thread 1 sequence 18 (in transit)
  
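  Note:
  Once the standby was brought back up, it started synchronizing again automatically. The logs show it fetching the gap (sequences 15-16) from the primary and then applying sequences 15 through 17.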
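  
  To confirm from the standby alert log which sequences were fetched as a gap after the restart, the gap message can be extracted with a one-line filter. A sketch, assuming the message keeps the "Fetching gap sequence in thread N, gap sequence A-B" wording seen in this log; the function name list_gap_ranges is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "<first> <last>" for every archive gap
# reported in a standby alert log, one gap per line.
list_gap_ranges() {
    sed -n 's/.*Fetching gap sequence in thread [0-9][0-9]*, gap sequence \([0-9][0-9]*\)-\([0-9][0-9]*\).*/\1 \2/p' "$1"
}
```

  Run against the standby alert log from this demonstration, it would be expected to print the range 15 16, which can then be compared with the "Media Recovery Log" lines to see that the gap was actually applied.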
  
  
2.10 Log switch test
  
  Primary:
  SQL> alter system switch logfile;
  
  System altered.
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  
  SEQUENCE# APPLIED
  ---------- ---------
  18 NO
  18 NO
  17 NO
  17 YES
  16 YES
  16 NO
  15 NO
  15 YES
  14 NO
  14 YES
  13 YES
  13 NO
  12 NO
  12 YES
  11 NO
  11 YES
  10 YES
  10 NO
  9 NO
  9 YES
  8 NO
  8 YES
  7 NO
  7 YES
  6 NO
  6 YES
  5 NO
  4 NO
  
  28 rows selected.
  
  
  Standby:
  
  SQL> select sequence#,applied from v$archived_log order by sequence# desc;
  SEQUENCE# APPLIED
  ---------- ---------
  18 YES
  17 YES
  16 YES
  15 YES
  
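  Note that the standby is now fully synchronized. When we queried the primary a moment earlier, log 18 was still shown as not applied, because the standby had only just been started and applying redo takes some time. Also, after the rebuild the standby's list starts at sequence 15, the sequence at which DG was interrupted, while the primary still counts from 4.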
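  
  Comparing the two query outputs by eye gets tedious as the list grows. One possible shortcut is to reduce each output to the highest sequence marked as applied and compare the two numbers. The sketch below is an assumption-laden illustration: max_applied_sequence is a made-up name, and it expects the plain two-column "SEQUENCE# APPLIED" output shown above on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "SEQUENCE# APPLIED" rows on stdin and print the
# highest sequence number whose APPLIED column is YES.
# Header, separator, prompt and "N rows selected." lines are ignored because
# they do not have an integer in column 1 together with YES in column 2.
max_applied_sequence() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "YES" && $1 + 0 > max { max = $1 + 0 }
         END { if (max != "") print max }'
}
```

  Feeding it the primary output from section 2.10 should give 17, and the standby output 18, which matches the observation that the primary's view of sequence 18 lags briefly behind the standby.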
  
  
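  Summary:
  For a DG communication interruption like this one, caused by running FINISH on the standby by mistake, recreating the standby control file from the primary was all that was needed.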
  
  
  
  
  
  
---------------------------------------------------------------------------------------
All rights reserved. This article may be reposted, but the source must be credited with a link; otherwise legal action may be taken.
Skype:    tianlesoftware
QQ:       tianlesoftware@gmail.com
Email:    tianlesoftware@gmail.com
Blog:   http://blog.iyunv.com/tianlesoftware
Weibo:    http://weibo.com/tianlesoftware
Twitter:http://twitter.com/tianlesoftware
Facebook: http://www.facebook.com/tianlesoftware
Linkedin: http://cn.linkedin.com/in/tianlesoftware