这篇文章主要介绍了MySQL高可用架构之MHA的原理分析,具有一定借鉴价值,感兴趣的朋友可以参考下,希望大家阅读完这篇文章之后大有收获,下面让小编带着大家一起了解一下。

MHA角色部署

MHA 服务有两种角色,MHA Manager(管理节点)和MHA Node(数据节点):

MHA Manager:通常单独部署在一台独立的机器上或者直接部署在其中一台slave上(不建议后者),管理多个master/slave集群,每个master/slave集群称作一个application;其作用有二:

(1)master自动切换及故障转移命令运行

(2)其他的帮助脚本运行:手动切换master;master/slave状态检测

MHA node:运行在每台MySQL服务器上(master/slave/manager),它通过监控具备解析和清理logs功能的脚本来加快故障转移。其作用有:

(1)复制主节点的binlog数据

(2)对比从节点的中继日志文件

(3)无需停止从节点的SQL线程,定时删除中继日志

目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。

我们自己使用其实也可以使用1主1从,但是master主机宕机后无法切换,以及无法补全binlog。master的mysqld进程crash后,还是可以切换成功,以及补全binlog的。

官方介绍:https://code.google.com/p/mysql-master-ha/

下图展示了如何通过MHA Manager管理多组主从复制。可以将MHA工作原理总结为如下:

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)应用差异的中继日志(relay log)到其他的slave;

(4)应用从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新的master;

(6)使其他的slave连接新的master进行复制;

MHA组件

(1)、 Manager工具:

– masterha_check_ssh : 检查MHA的SSH配置。

– masterha_check_repl : 检查MySQL复制。

– masterha_manager : 启动MHA。

– masterha_check_status : 检测当前MHA运行状态。

– masterha_master_monitor : 监测master是否宕机。

– masterha_master_switch : 控制故障转移(自动或手动)。

– masterha_conf_host : 添加或删除配置的server信息。

(2)、 Node工具(这些工具通常由MHAManager的脚本触发,无需人手操作)。

– save_binary_logs : 保存和复制master的二进制日志。

– apply_diff_relay_logs : 识别差异的中继日志事件并应用于其它slave。

– filter_mysqlbinlog : 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)。

– purge_relay_logs : 清除中继日志(不会阻塞SQL线程)。

(3)、自定义扩展:

-secondary_check_script:通过多条网络路由检测master的可用性;

-master_ip_failover_script:更新application使用的masterip;(需要修改)

-shutdown_script:强制关闭master节点;

-report_script:发送报告;

-init_conf_load_script:加载初始配置参数;

-master_ip_online_change:更新master节点ip地址;(需要修改)

MHA环境准备

OS:CentOS 6.8

MySQL :5.7.18

MHA 软件包: MHA 0.57

角色 ip地址 主机名 server_id "类型 "
Master 10.180.2.163 MHA-M1 13306 "写入"
S1 10.180.2.164 MHA-S1 23306 "读" (其实可以一起部署监控,一组MHA 可以多个监控节点)
S2 10.180.2.165 MHA-S2 33306 "读 ","监控复制组" (监控一般不能部署到master 节点,防止Master 宕机不能切换)

安装MHA Node 包

(1)在所有节点安装MHA node所需的perl模块(DBD:mysql),并下载MHA 软件包

?

12yum install perl-DBD-MySQL -y (可能需要epel源)https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw (MHA0.57)

(2)在所有的节点安装mha node(包括Manager 节点):

tarxfmha4mysql-node-0.57.tar.gzcdmha4mysql-node-0.57perlMakefile.PLmake&&makeinstall

安装完成将产生文件如下:


[root@MHA-S1bin]#lltotal48-r-xr-xr-x1rootroot16381Aug714:06apply_diff_relay_logs-r-xr-xr-x1rootroot4807Aug714:06filter_mysqlbinloglrwxrwxrwx1rootroot26Aug817:10mysql->/usr/local/mysql/bin/mysqllrwxrwxrwx1rootroot32Aug817:09mysqlbinlog->/usr/local/mysql/bin/mysqlbinlog-r-xr-xr-x1rootroot8261Aug714:06purge_relay_logs-rwxr-xr-x1rootroot314Aug816:21purge_relay.sh-r-xr-xr-x1rootroot7525Aug714:06save_binary_logs[root@MHA-S1bin]#pwd/usr/local/bin

增加系统环境变量:

echo"exportPATH=\$PATH:/usr/local/bin">>/etc/profilesource~/.bash_profile安装MHA Manager 包

tarxfmha4mysql-node-0.57.tar.gzcdmha4mysql-node-0.57perlMakefile.PLmake&&makeinstall

安装完成后会在/usr/local/bin目录下面生成以下脚本文件


[root@MHA-S2bin]#pwd/usr/local/bin[root@MHA-S2bin]#lltotal140-r-xr-xr-x1rootroot16381Aug714:07apply_diff_relay_logs-r-xr-xr-x1rootroot4807Aug714:07filter_mysqlbinlog-rwxr-xr-x1rootroot166Aug917:18manager.sh-r-xr-xr-x1rootroot1995Aug717:28masterha_check_repl-r-xr-xr-x1rootroot1779Aug717:28masterha_check_ssh-r-xr-xr-x1rootroot1865Aug717:28masterha_check_status-r-xr-xr-x1rootroot3201Aug717:28masterha_conf_host-r-xr-xr-x1rootroot2517Aug717:28masterha_manager-r-xr-xr-x1rootroot2165Aug717:28masterha_master_monitor-r-xr-xr-x1rootroot2373Aug717:28masterha_master_switch-r-xr-xr-x1rootroot5171Aug717:28masterha_secondary_check-r-xr-xr-x1rootroot1739Aug717:28masterha_stop-rwxr-xr-x1rootroot2169Aug910:49master_ip_failover-rwxr-xr-x1rootroot3648Aug717:30master_ip_failover.old-rwxr-xr-x1rootroot10369Aug1221:33master_ip_online_change-rwxr-xr-x1rootroot9870Aug717:30master_ip_online_change.oldlrwxrwxrwx1rootroot26Aug817:10mysql->/usr/local/mysql/bin/mysqllrwxrwxrwx1rootroot32Aug817:09mysqlbinlog->/usr/local/mysql/bin/mysqlbinlog-rw-------1rootroot0Aug1220:04nohup.out-rwxr-xr-x1rootroot11867Aug717:30power_manager-r-xr-xr-x1rootroot8261Aug714:07purge_relay_logs-rwxr-xr-x1rootroot314Aug816:20purge_relay.sh-r-xr-xr-x1rootroot7525Aug714:07save_binary_logs-rwxr-xr-x1rootroot1360Aug717:30send_report


复制相关脚本到/usr/local/bin目录(软件包解压缩后就有了,不是必须,因为这些脚本不完整,需要自己修改,这是软件开发着留给我们自己发挥的,如果开启下面的任何一个脚本对应的参数,而对应这里的脚本又没有修改,则会抛错,自己被坑的很惨)


[root@MHA-S2 scripts]# ll
total 32
-rwxr-xr-x 1 root root 3443 Jan 8 2012 master_ip_failover #自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移
-rwxr-xr-x 1 root root 9186 Jan 8 2012 master_ip_online_change #在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成
-rwxr-xr-x 1 root root 11867 Jan 8 2012 power_manager #故障发生后关闭主机的脚本,不是必须
-rwxr-xr-x 1 root root 1360 Jan 8 2012 send_report #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成。
[root@MHA-S2 scripts]# cp * /usr/local/bin/


配置SSH登录无密码验证

ssh-keygenssh-copy-idroot@xxx(XXX请包括自己,要不然后面check-ssh那步要杯具的)

搭建主从复制环境

详解之前双主复制环境搭建文档

保证两台Slave都搭建成功

Slave_IO_Running:YesSlave_SQL_Running:Yes

两台slave服务器设置read_only(从库对外提供读服务,只所以没有写进配置文件,是因为随时slave会提升为master)

root@localhost:mysql3306.sock[(none)]>setglobalread_only=1

创建监控用户(在master上执行)

grantallprivilegeson*.*toroot@'%'identifiedby'123456';

flushprivileges;

至此,复制搭建完毕,后面配置MHA

MHA环境配置

(1) 创建MHA 工作目录

mkdir-p/etc/mha

修改app1.cnf配置文件,修改后的文件内容如下:

[root@MHA-S2~]#/etc/mha/=/var/log/masterha/app1/=/var/log/masterha/=/data/mysql//=/usr/local/bin/=/usr/local/bin/===/===/usr/local/bin/=/usr/local/bin/masterha_secondary_check-sMHA-S1-sMHA-===MHA-==//设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slavecheck_repl_delay=//默认情况下如果一个slave落后master100M的relaylogs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master

=MHA-S1port=

=MHA-=

(2)设置relay log的清除方式(在每个slave节点上):

?

1'set global relay_log_purge=0'

注意:

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。

pure_relay_logs脚本参数如下所示:

--usermysql用户名--passwordmysql密码--port端口号--workdir指定创建relaylog的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除--disable_relay_log_purge默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relaylog之后,最后将参数设置为OFF。

(3)设置定期清理relay脚本(例如每天一次,所有服务器)

[root@MHA-S2bin]#purge_relay.!/bin/======[!-$log_dir---user=$user--password=$--disable_relay_log_purge--port=$port--workdir=$work_dir>>$log_dir/purge_relay_logs.log>&

添加到crontab[root@MHA-S2bin]#crontab-l04***/bin/bash/root/purge_relay_log.sh

可以手工执行以下是否会报错

检查SSH配置

[root@MHA-S2bin]#masterha_check_ssh--conf=/etc/mha/app1.cnfMonAug1418:07:022017-[warning]Globalconfigurationfile/etc/masterha_default.cnfnotfound.Skipping.MonAug1418:07:022017-[info]Readingapplicationdefaultconfigurationfrom/etc/mha/app1.cnf..MonAug1418:07:022017-[info]Readingserverconfigurationfrom/etc/mha/app1.cnf..MonAug1418:07:022017-[info]StartingSSHconnectiontests..MonAug1418:07:032017-[debug]MonAug1418:07:022017-[debug]ConnectingviaSSHfromroot@MHA-M1(10.180.2.163:22)toroot@MHA-S1(10.180.2.164:22)..MonAug1418:07:022017-[debug]ok.MonAug1418:07:022017-[debug]ConnectingviaSSHfromroot@MHA-M1(10.180.2.163:22)toroot@MHA-S2(10.180.2.165:22)..MonAug1418:07:032017-[debug]ok.MonAug1418:07:032017-[debug]MonAug1418:07:032017-[debug]ConnectingviaSSHfromroot@MHA-S1(10.180.2.164:22)toroot@MHA-M1(10.180.2.163:22)..MonAug1418:07:032017-[debug]ok.MonAug1418:07:032017-[debug]ConnectingviaSSHfromroot@MHA-S1(10.180.2.164:22)toroot@MHA-S2(10.180.2.165:22)..MonAug1418:07:032017-[debug]ok.MonAug1418:07:042017-[debug]MonAug1418:07:032017-[debug]ConnectingviaSSHfromroot@MHA-S2(10.180.2.165:22)toroot@MHA-M1(10.180.2.163:22)..MonAug1418:07:032017-[debug]ok.MonAug1418:07:042017-[debug]ConnectingviaSSHfromroot@MHA-S2(10.180.2.165:22)toroot@MHA-S1(10.180.2.164:22)..MonAug1418:07:042017-[debug]ok.MonAug1418:07:042017-[info]AllSSHconnectiontestspassedsuccessfully.

检查整个复制环境状况

发现有报错,

TueAug817:46:312017-[info]Checkingmaster_ip_failover_scriptstatus:TueAug817:46:312017-[info]/usr/local/bin/master_ip_failover--command=status--ssh_user=root--orig_master_host=MHA-M1--orig_master_ip=10.180.2.163--orig_master_port=3306Bareword"FIXME_xxx"notallowedwhile"strictsubs"inuseat/usr/local/bin/master_ip_failoverline93.Executionof/usr/local/bin/master_ip_failoverabortedduetocompilationerrors.TueAug817:46:312017-[error][/usr/local/share/perl5/MHA/MasterMonitor.pm,ln229]Failedtogetmaster_ip_failover_scriptstatuswithreturncode255:0.TueAug817:46:312017-[error][/usr/local/share/perl5/MHA/MasterMonitor.pm,ln427]Errorhappenedoncheckingconfigurations.at/usr/local/bin/masterha_check_replline48TueAug817:46:312017-[error][/usr/local/share/perl5/MHA/MasterMonitor.pm,ln525]Errorhappenedonmonitoringservers.TueAug817:46:312017-[info]Gotexitcode1(Notmasterdead).

原来Failover两种方式:一种是虚拟IP地址,一种是全局配置文件。MHA并没有限定使用哪一种方式,而是让用户自己选择,虚拟IP地址的方式会牵扯到其它的软件,比如keepalive软件,而且还要修改脚本master_ip_failover。这里先把app1.cnf 里面master_ip_failover_script= /usr/local/bin/master_ip_failover这个选项屏蔽才可以通过。

#master_ip_failover_script=/usr/local/bin/master_ip_failoverTueAug817:49:402017-[info]Gotexitcode0(Notmasterdead).MySQLReplicationHealthisOK.

检查MHA Manager的状态

[root@MHA-S2mha]#masterha_check_status--conf=/etc/mha/app1.cnfapp1isstopped(2:NOT_RUNNING).

手动启动

[root@MHA-S2mha]#nohupmasterha_manager--conf=/etc/mha/app1.cnf--remove_dead_master_conf--ignore_last_failover</dev/null>/var/log/masterha/app1/manager.log2>&1&[1]16774[root@MHA-S2mha]#ps-ef|grepmasterharoot1677415297417:52pts/300:00:00perl/usr/local/bin/masterha_manager--conf=/etc/mha/app1.cnf--remove_dead_master_conf--ignore_last_failover[root@MHA-S2mha]#masterha_check_status--conf=/etc/mha/app1.cnfapp1(pid:16774)isrunning(0:PING_OK),master:MHA-M1

--remove_dead_master_conf 该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。(如果发生异常切换之后修复了旧的master,要加进去新的MHA 的话,必须记得app1.cnf回补server1的信息)

--manger_log 日志存放位置

--ignore_last_failover在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。

检查启动日志

[root@MHA-S2app1]#vimanager.logTueAug817:52:372017-[warning]Globalconfigurationfile/etc/masterha_default.cnfnotfound.Skipping.TueAug817:52:372017-[info]Readingapplicationdefaultconfigurationfrom/etc/mha/app1.cnf..TueAug817:52:372017-[info]Readingserverconfigurationfrom/etc/mha/app1.cnf..TueAug817:52:372017-[info]MHA::MasterMonitorversion0.57.TueAug817:52:382017-[info]GTIDfailovermode=1TueAug817:52:382017-[info]DeadServers:TueAug817:52:382017-[info]AliveServers:TueAug817:52:382017-[info]MHA-M1(10.180.2.163:3306)TueAug817:52:382017-[info]MHA-S1(10.180.2.164:3306)TueAug817:52:382017-[info]MHA-S2(10.180.2.165:3306)TueAug817:52:382017-[info]AliveSlaves:TueAug817:52:382017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledTueAug817:52:382017-[info]GTIDONTueAug817:52:382017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)TueAug817:52:382017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)TueAug817:52:382017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledTueAug817:52:382017-[info]GTIDONTueAug817:52:382017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)TueAug817:52:382017-[info]CurrentAliveMaster:MHA-M1(10.180.2.163:3306)TueAug817:52:382017-[info]Checkingslaveconfigurations..TueAug817:52:382017-[info]Checkingreplicationfilteringsettings..TueAug817:52:382017-[info]binlog_do_db=,binlog_ignore_db=TueAug817:52:382017-[info]Replicationfilteringcheckok.TueAug817:52:382017-[info]GTID(withauto-pos)issupported.SkippingallSSHandNodepackagechecking.TueAug817:52:382017-[info]CheckingSSHpublickeyauthenticationsettingsonthecurrentmaster..TueAug817:52:382017-[info]HealthCheck:SSHtoMHA-M1isreachable.TueAug817:52:382017-[info]MHA-M1(10.180.2.163:3306)(currentmaster)+--MHA-S1(10.180.2.164:3306)+--MHA-S2(10.180.2.165:3306)TueAug817:52:382017-[warning]master_ip_failover_scriptisnotdefined.TueAug817:52:382017-[warning]shutdown_scriptisnotdefined.TueAug817:52:382017-[info]Setmasterpinginterval1seconds.TueAug817:52:382017-[info]Setsecondarycheckscript:/usr/local/bin/masterha_secondary_check-sMHA-S1-sMHA-S2TueAug817:52:382017-[info]StartingpinghealthcheckonMHA-M1(10.180.2.163:3306)..TueAug817:52:382017-[info]Ping(SELECT)succeeded,waitinguntilMySQLdoesn'trespond..

配置VIP

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式(即不需要keepalived或者heartbeat类似的软件)。

这里仅演示使用脚本管理VIP 的方式,修改master_ip_failover 脚本,使用脚本管理VIP

[root@MHA-M1~]#/sbin/ifconfigeth2:110.180.2.168/19

脚本:

[root@MHA-S2bin]#catmaster_ip_failover#!/usr/bin/envperlusestrict;usewarningsFATAL=>'all';useGetopt::Long;my($command,$ssh_user,$orig_master_host,$orig_master_ip,$orig_master_port,$new_master_host,$new_master_ip,$new_master_port);my$vip='10.180.2.168/19';my$key='1';my$ssh_start_vip="/sbin/ifconfigeth2:$key$vip";my$ssh_stop_vip="/sbin/ifconfigeth2:$keydown";GetOptions('command=s'=>\$command,'ssh_user=s'=>\$ssh_user,'orig_master_host=s'=>\$orig_master_host,'orig_master_ip=s'=>\$orig_master_ip,'orig_master_port=i'=>\$orig_master_port,'new_master_host=s'=>\$new_master_host,'new_master_ip=s'=>\$new_master_ip,'new_master_port=i'=>\$new_master_port,);exit&main();submain{print"\n\nINSCRIPTTEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if($commandeq"stop"||$commandeq"stopssh"){my$exit_code=1;eval{print"DisablingtheVIPonoldmaster:$orig_master_host\n";&stop_vip();$exit_code=0;};if($@){warn"GotError:$@\n";exit$exit_code;}exit$exit_code;}elsif($commandeq"start"){my$exit_code=10;eval{print"EnablingtheVIP-$viponthenewmaster-$new_master_host\n";&start_vip();$exit_code=0;};if($@){warn$@;exit$exit_code;}exit$exit_code;}elsif($commandeq"status"){print"CheckingtheStatusofthescript..OK\n";exit0;}else{&usage();exit1;}}substart_vip(){`ssh$ssh_user\@$new_master_host\"$ssh_start_vip\"`;}substop_vip(){return0unless($ssh_user);`ssh$ssh_user\@$orig_master_host\"$ssh_stop_vip\"`;}subusage{print"Usage:master_ip_failover--command=start|stop|stopssh|status--orig_master_host=host--orig_master_ip=ip--orig_master_port=port--new_master_host=host--new_master_ip=ip--new_master_port=port\n";}

在app1.cnf 文件中取消刚刚对master_ip_online_failover 的注释并测试:

再次检查MHAcheck[root@MHA-S2bin]#masterha_check_repl--conf=/etc/mha/app1.cnfWedAug910:49:422017-[warning]Globalconfigurationfile/etc/masterha_default.cnfnotfound.Skipping.WedAug910:49:422017-[info]Readingapplicationdefaultconfigurationfrom/etc/mha/app1.cnf..WedAug910:49:422017-[info]Readingserverconfigurationfrom/etc/mha/app1.cnf..WedAug910:49:422017-[info]MHA::MasterMonitorversion0.57.WedAug910:49:432017-[info]GTIDfailovermode=1WedAug910:49:432017-[info]DeadServers:WedAug910:49:432017-[info]AliveServers:WedAug910:49:432017-[info]MHA-M1(10.180.2.163:3306)WedAug910:49:432017-[info]MHA-S1(10.180.2.164:3306)WedAug910:49:432017-[info]MHA-S2(10.180.2.165:3306)WedAug910:49:432017-[info]AliveSlaves:WedAug910:49:432017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug910:49:432017-[info]GTIDONWedAug910:49:432017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug910:49:432017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug910:49:432017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug910:49:432017-[info]GTIDONWedAug910:49:432017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug910:49:432017-[info]CurrentAliveMaster:MHA-M1(10.180.2.163:3306)WedAug910:49:432017-[info]Checkingslaveconfigurations..WedAug910:49:432017-[info]Checkingreplicationfilteringsettings..WedAug910:49:432017-[info]binlog_do_db=,binlog_ignore_db=WedAug910:49:432017-[info]Replicationfilteringcheckok.WedAug910:49:432017-[info]GTID(withauto-pos)issupported.SkippingallSSHandNodepackagechecking.WedAug910:49:432017-[info]CheckingSSHpublickeyauthenticationsettingsonthecurrentmaster..WedAug910:49:432017-[info]HealthCheck:SSHtoMHA-M1isreachable.WedAug910:49:432017-[info]MHA-M1(10.180.2.163:3306)(currentmaster)+--MHA-S1(10.180.2.164:3306)+--MHA-S2(10.180.2.165:3306)WedAug910:49:432017-[info]CheckingreplicationhealthonMHA-S1..WedAug910:49:432017-[info]ok.WedAug910:49:432017-[info]CheckingreplicationhealthonMHA-S2..WedAug910:49:432017-[info]ok.WedAug910:49:432017-[info]Checkingmaster_ip_failover_scriptstatus:WedAug910:49:432017-[info]/usr/local/bin/master_ip_failover--command=status--ssh_user=root--orig_master_host=MHA-M1--orig_master_ip=10.180.2.163--orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/19===

Checking the Status of the script.. OK
Wed Aug 9 10:49:43 2017 - [info] OK.
Wed Aug 9 10:49:43 2017 - [warning] shutdown_script is not defined.
Wed Aug 9 10:49:43 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

以上就是MHA 安装配置的全过程,以下进行简单的测试。

(1)failover 测试

手动kill 了master 上面的mysqld 进程,查看切换状态

[root@MHA-S2tmp]#moremanager.logWedAug917:47:112017-[warning]GoterroronMySQLselectping:2006(MySQLserverhasgoneaway)WedAug917:47:112017-[info]Executingsecondarynetworkcheckscript:/usr/local/bin/masterha_secondary_check-sMHA-S1-sMHA-S2--user=root--master_host=MHA-M1--master_ip=10.180.2.163--master_port=3306--master_user=root--master_password=123456--ping_type=SELECTWedAug917:47:112017-[info]ExecutingSSHcheckscript:exit0WedAug917:47:112017-[info]HealthCheck:SSHtoMHA-M1isreachable.MonitoringserverMHA-S1isreachable,MasterisnotreachablefromMHA-S1.OK.MonitoringserverMHA-S2isreachable,MasterisnotreachablefromMHA-S2.OK.WedAug917:47:112017-[info]Masterisnotreachablefromallothermonitoringservers.Failovershouldstart.WedAug917:47:122017-[warning]GoterroronMySQLconnect:2013(LostconnectiontoMySQLserverat'readinginitialcommunicationpacket',systemerror:111)WedAug917:47:122017-[warning]Connectionfailed2time(s)..WedAug917:47:132017-[warning]GoterroronMySQLconnect:2013(LostconnectiontoMySQLserverat'readinginitialcommunicationpacket',systemerror:111)WedAug917:47:132017-[warning]Connectionfailed3time(s)..WedAug917:47:142017-[warning]GoterroronMySQLconnect:2013(LostconnectiontoMySQLserverat'readinginitialcommunicationpacket',systemerror:111)WedAug917:47:142017-[warning]Connectionfailed4time(s)..WedAug917:47:142017-[warning]Masterisnotreachablefromhealthchecker!WedAug917:47:142017-[warning]MasterMHA-M1(10.180.2.163:3306)isnotreachable!WedAug917:47:142017-[warning]SSHisreachable.WedAug917:47:142017-[info]Connectingtoamasterserverfailed.Readingconfigurationfile/etc/masterha_default.cnfand/etc/mha/app1.cnfagain,andtryingtoconnecttoallserverstocheckserverstatus..WedAug917:47:142017-[warning]Globalconfigurationfile/etc/masterha_default.cnfnotfound.Skipping.WedAug917:47:142017-[info]Readingapplicationdefaultconfigurationfrom/etc/mha/app1.cnf..WedAug917:47:142017-[info]Readingserverconfigurationfrom/etc/mha/app1.cnf..WedAug917:47:142017-[info]GTIDfailovermode=1WedAug917:47:142017-[info]DeadServers:WedAug917:47:142017-[info]MHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]AliveServers:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)WedAug917:47:142017-[info]AliveSlaves:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]Checkingslaveconfigurations..WedAug917:47:142017-[info]Checkingreplicationfilteringsettings..WedAug917:47:142017-[info]Replicationfilteringcheckok.WedAug917:47:142017-[info]Masterisdown!WedAug917:47:142017-[info]Terminatingmonitoringscript.WedAug917:47:142017-[info]Gotexitcode20(Masterdead).WedAug917:47:142017-[info]MHA::MasterFailoverversion0.57.WedAug917:47:142017-[info]Startingmasterfailover.WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase1:ConfigurationCheckPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]GTIDfailovermode=1WedAug917:47:142017-[info]DeadServers:WedAug917:47:142017-[info]MHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]CheckingmasterreachabilityviaMySQL(doublecheck)...WedAug917:47:142017-[info]ok.WedAug917:47:142017-[info]AliveServers:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)WedAug917:47:142017-[info]AliveSlaves:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]StartingGTIDbasedfailover.WedAug917:47:142017-[info]WedAug917:47:142017-[info]**Phase1:ConfigurationCheckPhasecompleted.WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase2:DeadMasterShutdownPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]Forcingshutdownsothatapplicationsneverconnecttothecurrentmaster..WedAug917:47:142017-[info]ExecutingmasterIPdeactivationscript:WedAug917:47:142017-[info]/usr/local/bin/master_ip_failover--orig_master_host=MHA-M1--orig_master_ip=10.180.2.163--orig_master_port=3306--command=stopssh--ssh_user=rootINSCRIPTTEST====/sbin/ifconfigeth2:1down==/sbin/ifconfigeth2:110.180.2.168/24===DisablingtheVIPonoldmaster:MHA-M1SIOCSIFFLAGS:CannotassignrequestedaddressWedAug917:47:142017-[info]done.WedAug917:47:142017-[warning]shutdown_scriptisnotset.Skippingexplicitshuttingdownofthedeadmaster.WedAug917:47:142017-[info]*Phase2:DeadMasterShutdownPhasecompleted.WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase3:MasterRecoveryPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase3.1:GettingLatestSlavesPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]Thelatestbinarylogfile/positiononallslavesis3306-binlog.000003:194WedAug917:47:142017-[info]RetrievedGtidSet:a5757eae-7981-11e7-82c7-005056b662d3:6-32210WedAug917:47:142017-[info]Latestslaves(Slavesthatreceivedrelaylogfilestothelatest):WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]Theoldestbinarylogfile/positiononallslavesis3306-binlog.000003:194WedAug917:47:142017-[info]RetrievedGtidSet:a5757eae-7981-11e7-82c7-005056b662d3:6-32210WedAug917:47:142017-[info]Oldestslaves:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug917:47:142017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase3.3:DeterminingNewMasterPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]Searchingnewmasterfromslaves..WedAug917:47:142017-[info]Candidatemastersfromtheconfigurationfile:WedAug917:47:142017-[info]MHA-S1(10.180.2.164:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledWedAug917:47:142017-[info]GTIDONWedAug917:47:142017-[info]ReplicatingfromMHA-M1(10.180.2.163:3306)WedAug917:47:142017-[info]PrimarycandidateforthenewMaster(candidate_masterisset)WedAug917:47:142017-[info]Non-candidatemasters:WedAug917:47:142017-[info]Searchingfromcandidate_masterslaveswhichhavereceivedthelatestrelaylogevents..WedAug917:47:142017-[info]NewmasterisMHA-S1(10.180.2.164:3306)WedAug917:47:142017-[info]Startingmasterfailover..WedAug917:47:142017-[info]From:MHA-M1(10.180.2.163:3306)(currentmaster)+--MHA-S1(10.180.2.164:3306)+--MHA-S2(10.180.2.165:3306)To:MHA-S1(10.180.2.164:3306)(newmaster)+--MHA-S2(10.180.2.165:3306)WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase3.3:NewMasterRecoveryPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]Waitingalllogstobeapplied..WedAug917:47:142017-[info]done.WedAug917:47:142017-[info]Gettingnewmaster'sbinlognameandposition..WedAug917:47:142017-[info]3306-binlog.000003:61944788WedAug917:47:142017-[info]Allotherslavesshouldstartreplicationfromhere.Statementshouldbe:CHANGEMASTERTOMASTER_HOST='MHA-S1or10.180.2.164',MASTER_PORT=3306,MASTER_AUTO_POSITION=1,MASTER_USER='repl',MASTER_PASSWORD='xxx';WedAug917:47:142017-[info]MasterRecoverysucceeded.File:Pos:Exec_Gtid_Set:3306-binlog.000003,61944788,1c2dc99f-7b57-11e7-a280-005056b665cb:1-2,a5757eae-7981-11e7-82c7-005056b662d3:1-32210WedAug917:47:142017-[info]ExecutingmasterIPactivatescript:WedAug917:47:142017-[info]/usr/local/bin/master_ip_failover--command=start--ssh_user=root--orig_master_host=MHA-M1--orig_master_ip=10.180.2.163--orig_master_port=3306--new_master_host=MHA-S1--new_master_ip=10.180.2.164--new_master_port=3306--new_master_user='root'--new_master_password=xxxUnknownoption:new_master_userUnknownoption:new_master_passwordINSCRIPTTEST====/sbin/ifconfigeth2:1down==/sbin/ifconfigeth2:110.180.2.168/24===EnablingtheVIP-10.180.2.168/24onthenewmaster-MHA-S1WedAug917:47:142017-[info]OK.WedAug917:47:142017-[info]Settingread_only=0onMHA-S1(10.180.2.164:3306)..WedAug917:47:142017-[info]ok.WedAug917:47:142017-[info]**Finishedmasterrecoverysuccessfully.WedAug917:47:142017-[info]*Phase3:MasterRecoveryPhasecompleted.WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase4:SlavesRecoveryPhase..WedAug917:47:142017-[info]WedAug917:47:142017-[info]WedAug917:47:142017-[info]*Phase4.1:StartingSlavesinparallel..WedAug917:47:142017-[info]WedAug917:47:142017-[info]--SlaverecoveryonhostMHA-S2(10.180.2.165:3306)started,pid:18757.Checktmplog/var/log/masterha/app1/MHA-S2_3306_20170809174714.logifittakestime..WedAug917:47:152017-[info]WedAug917:47:152017-[info]LogmessagesfromMHA-S2...WedAug917:47:152017-[info]WedAug917:47:142017-[info]ResettingslaveMHA-S2(10.180.2.165:3306)andstartingreplicationfromthenewmasterMHA-S1(10.180.2.164:3306)..WedAug917:47:142017-[info]ExecutedCHANGEMASTER.WedAug917:47:152017-[info]Slavestarted.WedAug917:47:152017-[info]gtid_wait(1c2dc99f-7b57-11e7-a280-005056b665cb:1-2,a5757eae-7981-11e7-82c7-005056b662d3:1-32210)completedonMHA-S2(10.180.2.165:3306).Executed0events.WedAug917:47:152017-[info]EndoflogmessagesfromMHA-S2.WedAug917:47:152017-[info]--SlaveonhostMHA-S2(10.180.2.165:3306)started.WedAug917:47:152017-[info]Allnewslaveserversrecoveredsuccessfully.WedAug917:47:152017-[info]WedAug917:47:152017-[info]*Phase5:Newmastercleanupphase..WedAug917:47:152017-[info]WedAug917:47:152017-[info]Resettingslaveinfoonthenewmaster..WedAug917:47:152017-[info]MHA-S1:Resettingslaveinfosucceeded.WedAug917:47:152017-[info]MasterfailovertoMHA-S1(10.180.2.164:3306)completedsuccessfully.WedAug917:47:152017-[info]Deletedserver1entryfrom/etc/mha/app1.cnf.WedAug917:47:152017-[info]-----FailoverReport-----app1:MySQLMasterfailoverMHA-M1(10.180.2.163:3306)toMHA-S1(10.180.2.164:3306)succeededMasterMHA-M1(10.180.2.163:3306)isdown!CheckMHAManagerlogsatMHA-S2:/var/log/masterha/app1/manager.logfordetails.Startedautomated(non-interactive)failover.InvalidatedmasterIPaddressonMHA-M1(10.180.2.163:3306)SelectedMHA-S1(10.180.2.164:3306)asanewmaster.MHA-S1(10.180.2.164:3306):OK:Applyingalllogssucceeded.MHA-S1(10.180.2.164:3306):OK:ActivatedmasterIPaddress.MHA-S2(10.180.2.165:3306):OK:Slavestarted,replicatingfromMHA-S1(10.180.2.164:3306)MHA-S1(10.180.2.164:3306):Resettingslaveinfosucceeded.MasterfailovertoMHA-S1(10.180.2.164:3306)completedsuccessfully.WedAug917:47:152017-[info]Sendingmail..Unknownoption:conf

以上是切换的全日志过程,我们可以看到MHA 切换主要经历以下步骤:

1.配置文件检查阶段,这个阶段会检查整个集群配置文件配置

2.宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作(这个我这里还没有实现,需要研究)

3.复制dead maste和最新slave相差的relay log,并保存到MHA Manger具体的目录下

4.识别含有最新更新的slave

5.应用从master保存的二进制日志事件(binlog events)

6.提升一个slave为新的master进行复制

7.使其他的slave连接新的master进行复制

注意:

1. 切换完之后你会发现MHA Manager 监控程序会自动死掉,官网有如下解释和解决方式:

RunningMHAManagerfromdaemontoolsCurrentlyMHAManagerprocessdoesnotrunasadaemon.Iffailovercompletedsuccessfullyorthemasterprocesswaskilledbyaccident,themanagerstopsworking.Torunasadaemon,daemontool.oranyexternaldaemonprogramcanbeused.Hereisanexampletorunfromdaemontools.

这里我们用shell 脚本的方式去执行就不会发生监控程序死掉的情况

[root@MHA-S2bin]#moremanager.sh#!/bin/shnohupmasterha_manager--conf=/etc/mha/app1.cnf--remove_dead_master_conf--ignore_last_failover</dev/null>/var/log/masterha/app1/manager.log2>&1&

2. 当你修复完死掉的master想重新加入先有的两节点MHA 也是可以的

旧Master :

root@localhost:mysql3306.sock[tt]>showmasterstatus\G***************************1.row***************************File:3306-binlog.000004Position:194Binlog_Do_DB:Binlog_Ignore_DB:Executed_Gtid_Set:a5757eae-7981-11e7-82c7-005056b662d3:1-322101rowinset(0.00sec)

现有master:

root@localhost:mysql3306.sock[tt]>showmasterstatus\G***************************1.row***************************File:3306-binlog.000003Position:61945043Binlog_Do_DB:Binlog_Ignore_DB:Executed_Gtid_Set:1c2dc99f-7b57-11e7-a280-005056b665cb:1-3,a5757eae-7981-11e7-82c7-005056b662d3:1-322101rowinset(0.00sec)

由于有GTID,我们可以直接就change master 切换过去,先对比一下数据:

旧master:

root@localhost:mysql3306.sock[tt]>select*fromt1;+----+------+|id|c1|+----+------+|1|a1||2|a2||3|a3||4|a4|+----+------+4rowsinset(0.02sec)

新master:

root@localhost:mysql3306.sock[tt]>select*fromt1;+----+------+|id|c1|+----+------+|1|a1||2|a2||3|a3||4|a4||5|a5|+----+------+

旧master 直接change master to:

changemastertomaster_host='MHA-S1',master_user='repl',master_password='123456',master_port=3306,master_auto_position=1;

start slave 看输出:

root@localhost:mysql3306.sock[tt]>showslavestatus\G***************************1.row***************************Slave_IO_State:WaitingformastertosendeventMaster_Host:MHA-S1Master_User:replMaster_Port:3306Connect_Retry:60Master_Log_File:3306-binlog.000003Read_Master_Log_Pos:61945043Relay_Log_File:MHA-M1-relay-bin.000004Relay_Log_Pos:715Relay_Master_Log_File:3306-binlog.000003Slave_IO_Running:YesSlave_SQL_Running:Yes

看是否会补全数据:

root@localhost:mysql3306.sock[tt]>select*fromt1;+----+------+|id|c1|+----+------+|1|a1||2|a2||3|a3||4|a4||5|a5|+----+------+

发现数据补全了,加入复制没问题。

最后还得修改app1.cnf 把server1 补上

[server1]hostname=MHA-M1port=3306

重启监控程序并查看MHA 状态

[root@MHA-S2tmp]#masterha_check_repl--conf=/etc/mha/app1.cnfSatAug1220:37:012017-[info]Replicationfilteringcheckok.SatAug1220:37:012017-[error][/usr/local/share/perl5/MHA/Server.pm,ln398]MHA-M1(10.180.2.163:3306):UserrepldoesnotexistordoesnothaveREPLICATIONSLAVEprivilege!Otherslavescannotstartreplicationfromthishost.SatAug1220:37:012017-[error][/usr/local/share/perl5/MHA/MasterMonitor.pm,ln427]Errorhappenedoncheckingconfigurations.at/usr/local/share/perl5/MHA/ServerManager.pmline1403SatAug1220:37:012017-[error][/usr/local/share/perl5/MHA/MasterMonitor.pm,ln525]Errorhappenedonmonitoringservers.SatAug1220:37:012017-[info]Gotexitcode1(Notmasterdead).

发现权限有问题,赶紧修复一下:

MHA-M1:

setsessionsql_log_bin=OFF;grantreplicationslaveon*.*torepl@'%'identifiedby'123456';setsessionsql_log_bin=ON;

再次执行MHA 状态检查:

masterha_check_repl--conf=/etc/mha/app1.cnfSatAug1220:41:142017-[info]CheckingreplicationhealthonMHA-M1..SatAug1220:41:142017-[info]ok.SatAug1220:41:142017-[info]CheckingreplicationhealthonMHA-S2..SatAug1220:41:142017-[info]ok.SatAug1220:41:142017-[info]Checkingmaster_ip_failover_scriptstatus:SatAug1220:41:142017-[info]/usr/local/bin/master_ip_failover--command=status--ssh_user=root--orig_master_host=MHA-S1--orig_master_ip=10.180.2.164--orig_master_port=3306INSCRIPTTEST====/sbin/ifconfigeth2:1down==/sbin/ifconfigeth2:110.180.2.168/24===CheckingtheStatusofthescript..OKSatAug1220:41:152017-[info]OK.SatAug1220:41:152017-[warning]shutdown_scriptisnotdefined.SatAug1220:41:152017-[info]Gotexitcode0(Notmasterdead).MySQLReplicationHealthisOK.

最后启动监控程序

[root@MHA-S2bin]#nohupmonitor.sh&[root@MHA-S2bin]#masterha_check_status--conf=/etc/mha/app1.cnfapp1(pid:32084)isrunning(0:PING_OK),master:MHA-S1

(2)手动在线切换测试

在许多情况下, 需要将现有的主服务器迁移到另外一台服务器上。 比如主服务器硬件故障,RAID 控制卡需要重建,将主服务器移到性能更好的服务器上等等。维护主服务器引起性能下降, 导致停机时间至少无法写入数据。 另外, 阻塞或杀掉当前运行的会话会导致主主之间数据不一致的问题发生。 MHA 提供快速切换和优雅的阻塞写入,这个切换过程只需要 0.5-2s 的时间,这段时间内数据是无法写入的。在很多情况下,0.5-2s 的阻塞写入是可以接受的。因此切换主服务器不需要计划分配维护时间窗口。

MHA在线切换的大概过程:
1.检测复制设置和确定当前主服务器
2.确定新的主服务器
3.阻塞写入到当前主服务器
4.等待所有从服务器赶上复制
5.授予写入到新的主服务器
6.重新设置从服务器

注意,在线切换的时候应用架构需要考虑以下两个问题:

1.自动识别master和slave的问题(master的机器可能会切换),如果采用了vip的方式,基本可以解决这个问题。

2.负载均衡的问题(可以定义大概的读写比例,每台机器可承担的负载比例,当有机器离开集群时,需要考虑这个问题)

为了保证数据完全一致性,在最快的时间内完成切换,MHA的在线切换必须满足以下条件才会切换成功,否则会切换失败。

1.所有slave的IO线程都在运行

2.所有slave的SQL线程都在运行

3.所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒,如果在切换过程中不指定running_updates_limit,那么默认情况下running_updates_limit为1秒。

4.在master端,通过show processlist输出,没有一个更新花费的时间大于running_updates_limit秒。

在线切换步骤如下:

先停止监控程序

[root@MHA-S2app1]#masterha_stop--conf=/etc/mha/app1.cnfStoppedapp1successfully.

修改master_ip_online_change脚本如下:

[root@MHA-S2bin]#moremaster_ip_online_change#!/usr/bin/envperl#Copyright(C)2011DeNACo.,Ltd.##Thisprogramisfreesoftware;youcanredistributeitand/ormodify#itunderthetermsoftheGNUGeneralPublicLicenseaspublishedby#theFreeSoftwareFoundation;eitherversion2oftheLicense,or#(atyouroption)anylaterversion.##Thisprogramisdistributedinthehopethatitwillbeuseful,#butWITHOUTANYWARRANTY;withouteventheimpliedwarrantyof#MERCHANTABILITYorFITNESSFORAPARTICULARPURPOSE.Seethe#GNUGeneralPublicLicenseformoredetails.##YoushouldhavereceivedacopyoftheGNUGeneralPublicLicense#alongwiththisprogram;ifnot,writetotheFreeSoftware#Foundation,Inc.,#51FranklinStreet,FifthFloor,Boston,MA02110-1301USA##Note:Thisisasamplescriptandisnotcomplete.Modifythescriptbasedonyourenvironment.usestrict;usewarningsFATAL=>'all';useGetopt::Long;useMHA::DBHelper;useMHA::NodeUtil;useTime::HiResqw(sleepgettimeofdaytv_interval);useData::Dumper;my$_tstart;my$_running_interval=0.1;my($command,$orig_master_host,$orig_master_ip,$orig_master_port,$orig_master_user,$new_master_host,$new_master_ip,$new_master_port,$new_master_user,);my$vip='10.180.2.168/19';#VirtualIPmy$key="1";my$ssh_start_vip="/sbin/ifconfigeth2:$key$vip";my$ssh_stop_vip="/sbin/ifconfigeth2:$keydown";my$ssh_user="root";my$new_master_password='123456';my$orig_master_password='123456';GetOptions('command=s'=>\$command,#'ssh_user=s'=>\$ssh_user,'orig_master_host=s'=>\$orig_master_host,'orig_master_ip=s'=>\$orig_master_ip,'orig_master_port=i'=>\$orig_master_port,'orig_master_user=s'=>\$orig_master_user,#'orig_master_password=s'=>\$orig_master_password,'new_master_host=s'=>\$new_master_host,'new_master_ip=s'=>\$new_master_ip,'new_master_port=i'=>\$new_master_port,'new_master_user=s'=>\$new_master_user,#'new_master_password=s'=>\$new_master_password,);exit&main();subcurrent_time_us{my($sec,$microsec)=gettimeofday();my$curdate=localtime($sec);return$curdate."".sprintf("%06d",$microsec);}subsleep_until{my$elapsed=tv_interval($_tstart);if($_running_interval>$elapsed){sleep($_running_interval-$elapsed);}}subget_threads_util{my$dbh=shift;my$my_connection_id=shift;my$running_time_threshold=shift;my$type=shift;$running_time_threshold=0unless($running_time_threshold);$type=0unless($type);my@threads;my$sth=$dbh->prepare("SHOWPROCESSLIST");$sth->execute();while(my$ref=$sth->fetchrow_hashref()){my$id=$ref->{Id};my$user=$ref->{User};my$host=$ref->{Host};my$command=$ref->{Command};my$state=$ref->{State};my$query_time=$ref->{Time};my$info=$ref->{Info};$info=~s/^\s*(.*?)\s*$/$1/ifdefined($info);nextif($my_connection_id==$id);nextif(defined($query_time)&&$query_time<$running_time_threshold);nextif(defined($command)&&$commandeq"BinlogDump");nextif(defined($user)&&$usereq"systemuser");nextif(defined($command)&&$commandeq"Sleep"&&defined($query_time)&&$query_time>=1);if($type>=1){nextif(defined($command)&&$commandeq"Sleep");nextif(defined($command)&&$commandeq"Connect");}if($type>=2){nextif(defined($info)&&$info=~m/^select/i);nextif(defined($info)&&$info=~m/^show/i);}push@threads,$ref;}return@threads;}submain{if($commandeq"stop"){##Gracefullykillingconnectionsonthecurrentmaster#1.Setread_only=1onthenewmaster#2.DROPUSERsothatnoappusercanestablishnewconnections#3.Setread_only=1onthecurrentmaster#4.Killcurrentqueries#*Anydatabaseaccessfailurewillresultinscriptdie.my$exit_code=1;eval{##Settingread_only=1onthenewmaster(toavoidaccident)my$new_master_handler=newMHA::DBHelper();#args:hostname,port,user,password,raise_error(die_on_error)_or_not$new_master_handler->connect($new_master_ip,$new_master_port,$new_master_user,$new_master_password,1);printcurrent_time_us()."Setread_onlyonthenewmaster..";$new_master_handler->enable_read_only();if($new_master_handler->is_read_only()){print"ok.\n";}else{die"Failed!\n";}$new_master_handler->disconnect();#Connectingtotheorigmaster,dieifanydatabaseerrorhappensmy$orig_master_handler=newMHA::DBHelper();$orig_master_handler->connect($orig_master_ip,$orig_master_port,$orig_master_user,$orig_master_password,1);##Dropapplicationusersothatnobodycanconnect.Disablingper-sessionbinlogbeforehand#$orig_master_handler->disable_log_bin_local();#printcurrent_time_us()."Drppingappuserontheorigmaster..\n";#FIXME_xxx_drop_app_user($orig_master_handler);##WaitingforN*100millisecondssothatcurrentconnectionscanexitmy$time_until_read_only=15;$_tstart=[gettimeofday];my@threads=get_threads_util($orig_master_handler->{dbh},$orig_master_handler->{connection_id});while($time_until_read_only>0&&$#threads>=0){if($time_until_read_only%5==0){printf"%sWaitingallrunning%dthreadsaredisconnected..(max%dmilliseconds)\n",current_time_us(),$#threads+1,$time_until_read_only*100;if($#threads<5){printData::Dumper->new([$_])->Indent(0)->Terse(1)->Dump."\n"foreach(@threads);}}sleep_until();$_tstart=[gettimeofday];$time_until_read_only--;@threads=get_threads_util($orig_master_handler->{dbh},$orig_master_handler->{connection_id});}##Settingread_only=1onthecurrentmastersothatnobody(exceptSUPER)canwriteprintcurrent_time_us()."Setread_only=1ontheorigmaster..";$orig_master_handler->enable_read_only();if($orig_master_handler->is_read_only()){print"ok.\n";}else{die"Failed!\n";}##WaitingforM*100millisecondssothatcurrentupdatequeriescancompletemy$time_until_kill_threads=5;@threads=get_threads_util($orig_master_handler->{dbh},$orig_master_handler->{connection_id});while($time_until_kill_threads>0&&$#threads>=0){if($time_until_kill_threads%5==0){printf"%sWaitingallrunning%dqueriesaredisconnected..(max%dmilliseconds)\n",current_time_us(),$#threads+1,$time_until_kill_threads*100;if($#threads<5){printData::Dumper->new([$_])->Indent(0)->Terse(1)->Dump."\n"foreach(@threads);}}sleep_until();$_tstart=[gettimeofday];$time_until_kill_threads--;@threads=get_threads_util($orig_master_handler->{dbh},$orig_master_handler->{connection_id});}print"DisablingtheVIPonoldmaster:$orig_master_host\n";&stop_vip();##Terminatingallthreadsprintcurrent_time_us()."Killingallapplicationthreads..\n";$orig_master_handler->kill_threads(@threads)if($#threads>=0);printcurrent_time_us()."done.\n";#$orig_master_handler->enable_log_bin_local();$orig_master_handler->disconnect();##Afterfinishingthescript,MHAexecutesFLUSHTABLESWITHREADLOCK$exit_code=0;};if($@){warn"GotError:$@\n";exit$exit_code;}exit$exit_code;}elsif($commandeq"start"){##Activatingmasteriponthenewmaster#1.Createappuserwithwriteprivileges#2.Movingbackupscriptifneeded#3.Registernewmaster'siptothecatalogdatabase#Wedon'treturnerroreventhoughactivatingupdatableaccounts/ipfailedsothatwedon'tinterruptslaves'recovery.#Ifexitcodeis0or10,MHAdoesnotabortmy$exit_code=10;eval{my$new_master_handler=newMHA::DBHelper();#args:hostname,port,user,password,raise_error_or_not$new_master_handler->connect($new_master_ip,$new_master_port,$new_master_user,$new_master_password,1);##Setread_only=0onthenewmaster#$new_master_handler->disable_log_bin_local();printcurrent_time_us()."Setread_only=0onthenewmaster.\n";$new_master_handler->disable_read_only();##Creatinganappuseronthenewmaster#printcurrent_time_us()."Creatingappuseronthenewmaster..\n";#FIXME_xxx_create_app_user($new_master_handler);#$new_master_handler->enable_log_bin_local();$new_master_handler->disconnect();##Updatemasteriponthecatalogdatabase,etcprint"EnablingtheVIP-$viponthenewmaster-$new_master_host\n";&start_vip();$exit_code=0;};if($@){warn"GotError:$@\n";exit$exit_code;}exit$exit_code;}elsif($commandeq"status"){#donothingexit0;}else{&usage();exit1;}}#AsimplesystemcallthatenabletheVIPonthenewmastersubstart_vip(){`ssh$ssh_user\@$new_master_host\"$ssh_start_vip\"`;}#AsimplesystemcallthatdisabletheVIPontheold_mastersubstop_vip(){`ssh$ssh_user\@$orig_master_host\"$ssh_stop_vip\"`;}subusage{print"Usage:master_ip_online_change--command=start|stop|status--orig_master_host=host--orig_master_ip=ip--orig_master_port=port--new_master_host=host--new_master_ip=ip--new_master_port=port\n";die;}

执行切换

[root@MHA-S2tmp]#masterha_master_switch--conf=/etc/mha/app1.cnf--master_state=alive--new_master_host=MHA-M1--new_master_port=3306--orig_master_is_new_slave--running_updates_limit=10000

其中参数的意思:

--orig_master_is_new_slave 切换时加上此参数是将原 master 变为 slave 节点,如果不加此参数,原来的 master 将不启动

--running_updates_limit=10000,故障切换时,候选master 如果有延迟的话, mha 切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover 时relay 日志的大小决定

查看切换后各机器的状态:

S2:

root@localhost:mysql3306.sock[tt]>showslavestatus\G***************************1.row***************************Slave_IO_State:WaitingformastertosendeventMaster_Host:MHA-M1Master_User:replMaster_Port:3306Connect_Retry:60Master_Log_File:3306-binlog.000004Read_Master_Log_Pos:748Relay_Log_File:MHA-S2-relay-bin.000002Relay_Log_Pos:420Relay_Master_Log_File:3306-binlog.000004Slave_IO_Running:YesSlave_SQL_Running:Yes

S1:

root@localhost:mysql3306.sock[tt]>showslavestatus\G***************************1.row***************************Slave_IO_State:WaitingformastertosendeventMaster_Host:MHA-M1Master_User:replMaster_Port:3306Connect_Retry:60Master_Log_File:3306-binlog.000004Read_Master_Log_Pos:748Relay_Log_File:MHA-S1-relay-bin.000002Relay_Log_Pos:420Relay_Master_Log_File:3306-binlog.000004Slave_IO_Running:YesSlave_SQL_Running:Yes

M1:

root@localhost:mysql3306.sock[tt]>showslavestatus\GEmptyset(0.00sec)

在线切换的日志:

[root@MHA-S2tmp]#moresw.log[root@MHA-S2bin]#masterha_master_switch--conf=/etc/mha/app1.cnf--master_state=alive--new_master_host=MHA-M1--new_master_port=3306--orig_master_is_new_slave--running_updates_limit=10000SatAug1221:34:542017-[info]MHA::MasterRotateversion0.57.SatAug1221:34:542017-[info]Startingonlinemasterswitch..SatAug1221:34:542017-[info]SatAug1221:34:542017-[info]*Phase1:ConfigurationCheckPhase..SatAug1221:34:542017-[info]SatAug1221:34:542017-[warning]Globalconfigurationfile/etc/masterha_default.cnfnotfound.Skipping.SatAug1221:34:542017-[info]Readingapplicationdefaultconfigurationfrom/etc/mha/app1.cnf..SatAug1221:34:542017-[info]Readingserverconfigurationfrom/etc/mha/app1.cnf..SatAug1221:34:542017-[info]GTIDfailovermode=1SatAug1221:34:542017-[info]CurrentAliveMaster:MHA-S1(10.180.2.164:3306)SatAug1221:34:542017-[info]AliveSlaves:SatAug1221:34:542017-[info]MHA-M1(10.180.2.163:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledSatAug1221:34:542017-[info]GTIDONSatAug1221:34:542017-[info]ReplicatingfromMHA-S1(10.180.2.164:3306)SatAug1221:34:542017-[info]MHA-S2(10.180.2.165:3306)Version=5.7.18-log(oldestmajorversionbetweenslaves)log-bin:enabledSatAug1221:34:542017-[info]GTIDONSatAug1221:34:542017-[info]ReplicatingfromMHA-S1(10.180.2.164:3306)ItisbettertoexecuteFLUSHNO_WRITE_TO_BINLOGTABLESonthemasterbeforeswitching.IsitoktoexecuteonMHA-S1(10.180.2.164:3306)?(YES/no):yesSatAug1221:35:072017-[info]ExecutingFLUSHNO_WRITE_TO_BINLOGTABLES.Thismaytakelongtime..SatAug1221:35:072017-[info]ok.SatAug1221:35:072017-[info]CheckingMHAisnotmonitoringordoingfailover..SatAug1221:35:072017-[info]CheckingreplicationhealthonMHA-M1..SatAug1221:35:072017-[info]ok.SatAug1221:35:072017-[info]CheckingreplicationhealthonMHA-S2..SatAug1221:35:072017-[info]ok.SatAug1221:35:072017-[info]MHA-M1canbenewmaster.SatAug1221:35:072017-[info]From:MHA-S1(10.180.2.164:3306)(currentmaster)+--MHA-M1(10.180.2.163:3306)+--MHA-S2(10.180.2.165:3306)To:MHA-M1(10.180.2.163:3306)(newmaster)+--MHA-S2(10.180.2.165:3306)+--MHA-S1(10.180.2.164:3306)StartingmasterswitchfromMHA-S1(10.180.2.164:3306)toMHA-M1(10.180.2.163:3306)?(yes/NO):yesSatAug1221:35:152017-[info]CheckingwhetherMHA-M1(10.180.2.163:3306)isokforthenewmaster..SatAug1221:35:152017-[info]ok.SatAug1221:35:152017-[info]MHA-S1(10.180.2.164:3306):SHOWSLAVESTATUSreturnedemptyresult.Tocheckreplicationfilteringrules,temporarilyexecutingCHANGEMASTERtoadummyhost.SatAug1221:35:152017-[info]MHA-S1(10.180.2.164:3306):Resettingslavepointingtothedummyhost.SatAug1221:35:152017-[info]**Phase1:ConfigurationCheckPhasecompleted.SatAug1221:35:152017-[info]SatAug1221:35:152017-[info]*Phase2:RejectingupdatesPhase..SatAug1221:35:152017-[info]SatAug1221:35:152017-[info]Executingmasteriponlinechangescripttodisablewriteonthecurrentmaster:SatAug1221:35:152017-[info]/usr/local/bin/master_ip_online_change--command=stop--orig_master_host=MHA-S1--orig_master_ip=10.180.2.164--orig_master_port=3306--orig_master_user='root'--new_master_host=MHA-M1--new_master_ip=10.180.2.163--new_master_port=3306--new_master_user='root'--orig_master_ssh_user=root--new_master_ssh_user=root--orig_master_is_new_slave--orig_master_password=xxx--new_master_password=xxxUnknownoption:orig_master_ssh_userUnknownoption:new_master_ssh_userUnknownoption:orig_master_is_new_slaveUnknownoption:orig_master_passwordUnknownoption:new_master_passwordSatAug1221:35:152017568580Setread_onlyonthenewmaster..ok.SatAug1221:35:152017573508Waitingallrunning2threadsaredisconnected..(max1500milliseconds){'Time'=>'272878','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'40','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-S2:46970'}{'Time'=>'3738','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'55','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-M1:51506'}SatAug1221:35:162017075020Waitingallrunning2threadsaredisconnected..(max1000milliseconds){'Time'=>'272879','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'40','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-S2:46970'}{'Time'=>'3739','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'55','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-M1:51506'}SatAug1221:35:162017576059Waitingallrunning2threadsaredisconnected..(max500milliseconds){'Time'=>'272879','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'40','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-S2:46970'}{'Time'=>'3739','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'55','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-M1:51506'}SatAug1221:35:172017076940Setread_only=1ontheorigmaster..ok.SatAug1221:35:172017079645Waitingallrunning2queriesaredisconnected..(max500milliseconds){'Time'=>'272880','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'40','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-S2:46970'}{'Time'=>'3740','Command'=>'BinlogDumpGTID','db'=>undef,'Id'=>'55','Info'=>undef,'User'=>'repl','State'=>'Masterhassentallbinlogtoslave;waitingformoreupdates','Host'=>'MHA-M1:51506'}DisablingtheVIPonoldmaster:MHA-S1SatAug1221:35:172017683769Killingallapplicationthreads..SatAug1221:35:172017686090done.SatAug1221:35:172017-[info]ok.SatAug1221:35:172017-[info]Lockingalltablesontheorigmastertorejectupdatesfromeverybody(includingroot):SatAug1221:35:172017-[info]ExecutingFLUSHTABLESWITHREADLOCK..SatAug1221:35:172017-[info]ok.SatAug1221:35:172017-[info]Origmasterbinlog:posis3306-binlog.000003:61945043.SatAug1221:35:172017-[info]WaitingtoexecuteallrelaylogsonMHA-M1(10.180.2.163:3306)..SatAug1221:35:172017-[info]master_pos_wait(3306-binlog.000003:61945043)completedonMHA-M1(10.180.2.163:3306).Executed0events.SatAug1221:35:172017-[info]done.SatAug1221:35:172017-[info]Gettingnewmaster'sbinlognameandposition..SatAug1221:35:172017-[info]3306-binlog.000004:748SatAug1221:35:172017-[info]Allotherslavesshouldstartreplicationfromhere.Statementshouldbe:CHANGEMASTERTOMASTER_HOST='MHA-M1or10.180.2.163',MASTER_PORT=3306,MASTER_AUTO_POSITION=1,MASTER_USER='repl',MASTER_PASSWORD='xxx';SatAug1221:35:172017-[info]Executingmasteriponlinechangescripttoallowwriteonthenewmaster:SatAug1221:35:172017-[info]/usr/local/bin/master_ip_online_change--command=start--orig_master_host=MHA-S1--orig_master_ip=10.180.2.164--orig_master_port=3306--orig_master_user='root'--new_master_host=MHA-M1--new_master_ip=10.180.2.163--new_master_port=3306--new_master_user='root'--orig_master_ssh_user=root--new_master_ssh_user=root--orig_master_is_new_slave--orig_master_password=xxx--new_master_password=xxxUnknownoption:orig_master_ssh_userUnknownoption:new_master_ssh_userUnknownoption:orig_master_is_new_slaveUnknownoption:orig_master_passwordUnknownoption:new_master_passwordSatAug1221:35:172017865209Setread_only=0onthenewmaster.EnablingtheVIP-10.180.2.168/19onthenewmaster-MHA-M1SatAug1221:35:172017-[info]ok.SatAug1221:35:172017-[info]SatAug1221:35:172017-[info]*Switchingslavesinparallel..SatAug1221:35:172017-[info]SatAug1221:35:172017-[info]--SlaveswitchonhostMHA-S2(10.180.2.165:3306)started,pid:2327SatAug1221:35:172017-[info]SatAug1221:35:182017-[info]LogmessagesfromMHA-S2...SatAug1221:35:182017-[info]SatAug1221:35:182017-[info]WaitingtoexecuteallrelaylogsonMHA-S2(10.180.2.165:3306)..SatAug1221:35:182017-[info]master_pos_wait(3306-binlog.000003:61945043)completedonMHA-S2(10.180.2.165:3306).Executed0events.SatAug1221:35:182017-[info]done.SatAug1221:35:182017-[info]ResettingslaveMHA-S2(10.180.2.165:3306)andstartingreplicationfromthenewmasterMHA-M1(10.180.2.163:3306)..SatAug1221:35:182017-[info]ExecutedCHANGEMASTER.SatAug1221:35:182017-[info]Slavestarted.SatAug1221:35:182017-[info]EndoflogmessagesfromMHA-S2...SatAug1221:35:182017-[info]SatAug1221:35:182017-[info]--SlaveswitchonhostMHA-S2(10.180.2.165:3306)succeeded.SatAug1221:35:182017-[info]Unlockingalltablesontheorigmaster:SatAug1221:35:182017-[info]ExecutingUNLOCKTABLES..SatAug1221:35:182017-[info]ok.SatAug1221:35:182017-[info]Startingorigmasterasanewslave..SatAug1221:35:182017-[info]ResettingslaveMHA-S1(10.180.2.164:3306)andstartingreplicationfromthenewmasterMHA-M1(10.180.2.163:3306)..SatAug1221:35:182017-[info]ExecutedCHANGEMASTER.SatAug1221:35:192017-[info]Slavestarted.SatAug1221:35:192017-[info]Allnewslaveserversswitchedsuccessfully.SatAug1221:35:192017-[info]SatAug1221:35:192017-[info]*Phase5:Newmastercleanupphase..SatAug1221:35:192017-[info]SatAug1221:35:192017-[info]MHA-M1:Resettingslaveinfosucceeded.SatAug1221:35:192017-[info]SwitchingmastertoMHA-M1(10.180.2.163:3306)completedsuccessfully.

感谢你能够认真阅读完这篇文章,希望小编分享的“MySQL高可用架构之MHA的原理分析”这篇文章对大家有帮助,同时也希望大家多多支持亿速云,关注亿速云行业资讯频道,更多相关知识等着你来学习!