Oracle ADG + Keepalived Switchover Drill

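One of my customer's production environments runs on an Oracle ADG + Keepalived architecture. A switchover drill is coming up and I have been asked to support it. The ADG switchover itself is routine, but with keepalived in the mix I needed to study the architecture beforehand. After looking at the environment's configuration, the overall idea turns out to be very simple: keepalived provides a VIP, and the application only needs to be configured to connect to that VIP.
I built a test environment of my own that mirrors the current production architecture.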

1. Keepalived configuration

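For compiling, installing, and configuring keepalived, see the Keepalived installation and deployment section of my earlier post "MySQL Master-Master + Keepalived Architecture Installation and Deployment".
Besides the VIP that keepalived provides, there are some additional settings and a script. With sensitive details masked, they are as follows: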

--------------------------------------------------------
--Node 1 (192.168.1.124) keepalived.conf:
--------------------------------------------------------
[root@test04 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

vrrp_script chk_dg_stats {  
    script "/etc/keepalived/check_dataguard.sh" 
    interval 2
    weight -5
    fall 2  
    rise 1  
}

vrrp_instance VI_1 {
    state MASTER    
    interface eth0 
    mcast_src_ip 192.168.1.124
    virtual_router_id 131 
    priority 101 
    nopreempt
    advert_int 1         
    authentication {   
        auth_type PASS 
        auth_pass 888888   
    }
    virtual_ipaddress {    
        192.168.1.131
    }

    track_script {               
       chk_dg_stats             
    }
}

--------------------------------------------------------
--Node 2 (192.168.1.125) keepalived.conf:
--------------------------------------------------------
[root@test05 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

vrrp_script chk_dg_stats {  
    script "/etc/keepalived/check_dataguard.sh" 
    interval 2
    weight -5
    fall 2  
    rise 1  
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0 
    mcast_src_ip 192.168.1.125
    virtual_router_id 131 
    priority 99 
    nopreempt
    advert_int 1         
    authentication {   
        auth_type PASS 
        auth_pass 888888   
    }
    virtual_ipaddress {    
        192.168.1.131
    }

    track_script {               
       chk_dg_stats             
    }
}
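
Once keepalived is running on both nodes, it is useful to confirm which node currently holds the VIP. The following is only a minimal sketch, assuming the `ip` command from iproute2 is available on the hosts; `has_vip` is a hypothetical helper name, while the VIP and interface come from the configuration above.

```shell
#!/bin/bash
# has_vip (hypothetical helper): reads `ip -o -4 addr show` output on stdin and
# succeeds (exit 0) if the given address is configured on this host.
has_vip() {
    awk -v vip="$1" '
        $3 == "inet" {
            split($4, parts, "/")        # $4 looks like 192.168.1.131/32
            if (parts[1] == vip) found = 1
        }
        END { exit !found }
    '
}

# Typical use on either node (VIP and interface taken from the config above):
#   ip -o -4 addr show dev eth0 | has_vip 192.168.1.131 && echo "VIP is on this node"
```

Reading from stdin keeps the helper easy to try against captured output before running it on a live node.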

--------------------------------------------------------
--On all nodes, create the script check_dataguard.sh and make sure it is executable:
--------------------------------------------------------
# cat /etc/keepalived/check_dataguard.sh 
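#!/bin/bash
# Count the SMON process (instance is up) and the MRP process (managed recovery is running).
dbstats=$(ps -ef | grep ora_smon | grep -v grep | wc -l)
dgstats=$(ps -ef | grep ora_mrp | grep -v grep | wc -l)

if [ "${dbstats}" -eq 0 ]; then
    # No instance is running on this node: stop keepalived to release the VIP.
    /etc/init.d/keepalived stop
elif [ "${dbstats}" -gt 0 ] && [ "${dgstats}" -gt 0 ]; then
    # Instance is up and MRP is running, so this node is a standby: stop keepalived.
    /etc/init.d/keepalived stop
fi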

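Note: check_dataguard.sh monitors the ora_smon and ora_mrp processes to decide when the keepalived service should be stopped. Stopping keepalived on a node releases the VIP there, so the VIP ends up on the node that runs the primary:
Scenario 1: no ora_smon process exists (the database instance has crashed);
Scenario 2: both an ora_smon process and an ora_mrp process exist (a standby database with MRP started).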
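
The decision rule itself can be written as a small function that is easy to test without touching keepalived. This is a sketch rather than the script deployed on the nodes: `count_procs` and `keepalived_action` are hypothetical helper names, and only the process name prefixes come from the original script.

```shell
#!/bin/bash
# count_procs (hypothetical helper): number of processes whose command line
# starts with the given prefix, e.g. "ora_smon_" or "ora_mrp".
count_procs() {
    ps -eo args | grep -c "^$1"
}

# keepalived_action (hypothetical helper): prints "stop" or "keep" for the
# given SMON and MRP process counts, following the two scenarios above.
keepalived_action() {
    local smon_count=$1 mrp_count=$2
    if [ "$smon_count" -eq 0 ]; then
        echo stop      # scenario 1: no instance on this node
    elif [ "$mrp_count" -gt 0 ]; then
        echo stop      # scenario 2: instance is a standby running MRP
    else
        echo keep      # instance is up and MRP is absent: treated as primary
    fi
}

# Example: keepalived_action "$(count_procs ora_smon_)" "$(count_procs ora_mrp)"
```

One consequence of this rule is worth keeping in mind: a standby whose MRP process is not running looks the same as a primary, so keepalived would be left running on it.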

--Add execute permission:
chmod u+x /etc/keepalived/check_dataguard.sh
[root@test04 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 282 Jul 14 22:35 /etc/keepalived/check_dataguard.sh
[root@test05 ~]# ls -l /etc/keepalived/check_dataguard.sh
-rwxr--r--. 1 root root 281 Jul 14 22:36 /etc/keepalived/check_dataguard.sh

2. Manual ADG switchover steps

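1) Before the actual switchover, manually switch logs a few times on the primary and confirm that the DG standby is in sync:
--On the PRIMARY (192.168.1.124), switch logs a few times: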
SQL>
alter system switch logfile; 
alter system switch logfile; 
alter system switch logfile; 
--On the Standby (192.168.1.125), confirm that it is in sync with no lag:
SQL> 
select * from v$dataguard_stats; 
2) Switch the primary to standby
-- On the PRIMARY (192.168.1.124), query to confirm that it can be switched to standby:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database; 
-- On the PRIMARY (192.168.1.124), switch it to standby:
ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY WITH SESSION SHUTDOWN;
3) Switch the standby to primary
-- On the Standby (192.168.1.125), query to confirm that it can be switched to primary:
select OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS from v$database; 
-- On the Standby (192.168.1.125), switch it to primary. Pick the command by SWITCHOVER_STATUS: the first when it is TO PRIMARY, the second when it is SESSIONS ACTIVE:
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;

4) Open the new primary, then start the new standby and enable MRP
--On the NEW PRIMARY (192.168.1.125), bring the database from mount to open:
ALTER DATABASE OPEN;
--On the NEW STANDBY (192.168.1.124), start the database and enable DG log apply:
STARTUP
RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;
--On the NEW STANDBY (192.168.1.124), confirm that DG is in sync with no lag:
SQL> 
select * from v$dataguard_stats; 

5) Start the keepalived service on the new primary
--On the NEW PRIMARY (192.168.1.125), start keepalived as root at the OS level:
# /etc/init.d/keepalived start
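
The lag checks in steps 1) and 4) are done by eye against v$dataguard_stats. If they need to be scripted, the interval strings reported for the apply lag and transport lag rows can be converted to seconds. The sketch below assumes values in the usual `+DD HH:MI:SS` form; `lag_to_seconds` is a hypothetical helper name, and fetching the value from the database is left out.

```shell
#!/bin/bash
# lag_to_seconds (hypothetical helper): converts an interval string such as
# "+00 00:00:05" into seconds. Returns non-zero if the value does not have
# the expected form (for example when the lag is reported as empty/NULL).
lag_to_seconds() {
    local re='^\+?([0-9]+) ([0-9]+):([0-9]+):([0-9]+)(\.[0-9]+)?$'
    [[ $1 =~ $re ]] || return 1
    # 10# forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#${BASH_REMATCH[1]} * 86400
           + 10#${BASH_REMATCH[2]} * 3600
           + 10#${BASH_REMATCH[3]} * 60
           + 10#${BASH_REMATCH[4]} ))
}

# Example: lag_to_seconds "+00 00:00:05"
```

A value that cannot be parsed is better treated as "not confirmed" than as zero lag, which is why the function fails instead of printing 0.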

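Note: after the drill, if the primary and standby need to be switched back, just repeat the standard steps above (mind the reversed primary and standby roles).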
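
The choice between the two commands in step 3) depends only on the SWITCHOVER_STATUS value queried on the standby, so it can be captured in a small lookup. This is a sketch under the assumption that only TO PRIMARY and SESSIONS ACTIVE are acceptable states; `standby_switchover_sql` is a hypothetical helper name, and it prints the statement without running it.

```shell
#!/bin/bash
# standby_switchover_sql (hypothetical helper): prints the switchover statement
# that matches the standby's SWITCHOVER_STATUS, or fails for any other value.
standby_switchover_sql() {
    case "$1" in
        "TO PRIMARY")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;"
            ;;
        "SESSIONS ACTIVE")
            echo "ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;"
            ;;
        *)
            echo "-- SWITCHOVER_STATUS '$1' is not ready for switchover" >&2
            return 1
            ;;
    esac
}

# Example: standby_switchover_sql "SESSIONS ACTIVE"
```

Any other status makes the helper fail, which is the point at which to stop and investigate rather than proceed.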

3. The relationship between the VIP and the listener

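This goes back to one of my earliest job interviews: in a two-node RAC, the host of node 1 crashes. Can the application still connect to the database through node 1's VIP at that point? Why?
As we know, when the node 1 host crashes, its VIP automatically fails over to node 2 and the IP still responds to ping, but connecting to the database through it does not work. The attempt fails with a no-listener error (ORA-12541: TNS:no listener).
For details, see: "When a RAC node is unavailable, is its VIP still usable?"
So why, in this environment, which also relies on a VIP, is it possible to connect through the VIP (192.168.1.131)?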

[oracle@test03 ~]$ sqlplus sys/oracle@192.168.1.131/demo as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Jul 14 23:45:23 2020

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> 

Testing shows that this works because the listeners on both the primary and the standby are configured with the host name:

[oracle@test04 admin]$ cat listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = test04)(PORT = 1521))
    )
  )

ADR_BASE_LISTENER = /u01/app/oracle

[oracle@test05 admin]$ cat listener.ora 
# listener.ora Network Configuration File: /u01/app/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = test05)(PORT = 1521))
    )
  )

ADR_BASE_LISTENER = /u01/app/oracle

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = jingyus)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = jingyu)
    )
  )

If the host name is changed to a specific IP address, the same test fails with ORA-12541: TNS:no listener.
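
A likely explanation, which is an assumption here and was not checked in the test above, is that a listener configured with a host name binds to all local addresses, whereas one configured with an explicit IP binds only to that address and therefore does not serve the VIP. One way to check this on a node is to look at what port 1521 is bound to. The sketch below classifies `ss -lnt` output; `listener_bind_scope` is a hypothetical helper name.

```shell
#!/bin/bash
# listener_bind_scope (hypothetical helper): reads `ss -lnt` output on stdin and,
# for each socket listening on the given port, prints "wildcard" if it is bound
# to all addresses or "specific <address>" if it is bound to a single one.
listener_bind_scope() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            addr = $4                      # local address, e.g. *:1521
            suffix = ":" port
            cut = length(addr) - length(suffix)
            if (cut > 0 && substr(addr, cut + 1) == suffix) {
                host = substr(addr, 1, cut)
                if (host == "*" || host == "0.0.0.0" || host == "::" || host == "[::]")
                    print "wildcard"
                else
                    print "specific " host
            }
        }
    '
}

# Example on a database node: ss -lnt | listener_bind_scope 1521
```

If the output is "wildcard" with the host name configuration and "specific" with the IP configuration, that would be consistent with the behaviour observed through the VIP.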
