手动注册binlog文件造成主从异常的示例分析
这篇文章给大家分享的是有关手动注册binlog文件造成主从异常的示例分析的内容。小编觉得挺实用的,因此分享给大家做个参考,一起跟随小编过来看看吧。
一、问题来源有一个朋友@水米田 问我,基于POSITION的主从。他做了如下的操作
将备份的一些binlog文件加入到了目录中
修改index文件,加入了这些binlog文件
flush binary logs
然后整个主从环境大量延迟。
二、朋友的测试下面是另外一个朋友@徐晨亮的测试:
master上:(root:db1@xucl:10:30:22)[(none)]>showbinarylogs;+---------------------+-----------+|Log_name|File_size|+---------------------+-----------+|mysql-binlog.000031|1019||mysql-binlog.000032|424||mysql-binlog.000033|244||mysql-binlog.000034|2332||mysql-binlog.000035|2134||mysql-binlog.000036|845915||mysql-binlog.000037|11735||mysql-binlog.000038|284||mysql-binlog.000039|284||mysql-binlog.000040|284||mysql-binlog.000041|284||mysql-binlog.000042|234|+---------------------+-----------+12rowsinset(0.00sec)(root:db1@xucl:10:30:34)[(none)]>purgebinarylogsto'mysql-binlog.000039';QueryOK,0rowsaffected(0.00sec)(root:db1@xucl:10:30:49)[(none)]>showbinarylogs;+---------------------+-----------+|Log_name|File_size|+---------------------+-----------+|mysql-binlog.000039|284||mysql-binlog.000040|284||mysql-binlog.000041|284||mysql-binlog.000042|234|+---------------------+-----------+4rowsinset(0.00sec)执行插入数据(root:db1@xucl:10:31:23)[xucl]>insertintotvalues(9,9);将备份的binlog拷贝回原先的目录并修改index文件进行注册[root@izbp12nspj47ypto9t6vyezlogs]#ll总用量884-rw-r-----1mysqlmysql10195月2022:03mysql-binlog.000031-rw-r-----1mysqlmysql4245月2022:03mysql-binlog.000032-rw-r-----1mysqlmysql2445月2022:03mysql-binlog.000033-rw-r-----1mysqlmysql23325月2022:03mysql-binlog.000034-rw-r-----1mysqlmysql21345月2022:03mysql-binlog.000035-rw-r-----1mysqlmysql8459155月2022:03mysql-binlog.000036-rw-r-----1mysqlmysql117355月2022:05mysql-binlog.000037-rw-r-----1mysqlmysql2845月2022:06mysql-binlog.000038-rw-r-----1mysqlmysql2845月2110:28mysql-binlog.000039-rw-r-----1mysqlmysql2845月2110:28mysql-binlog.000040-rw-r-----1mysqlmysql2845月2110:28mysql-binlog.000041-rw-r-----1mysqlmysql4915月2110:31mysql-binlog.000042-rw-r-----1mysqlmysql2045月2110:30mysql-binlog.index主库flushbinarylogs(root:db1@xucl:10:32:51)[(none)]>flushbinarylogs;QueryOK,0rowsaffected(0.01sec)(root:db1@xucl:10:32:57)[(none)]>showbinarylogs;+---------------------+-----------+|Log_name|File_size|+---------------------+-----------+|mysql-binlog.000031|1019||mysql-binlog.000032|424||mysql-binlog.000033|244||mysql-binlog.000034|2332||mysql-binlog.000035|2134||mysql-binlog.000036|845915||mysql-binlog.000037|11735||mysql-binlog.000038|284||mysql-binlog.000039|284||mysql-binlog.000040|284||mysql-binlog.000041|284||mysql-binlog.000042|541||mysql-binlog.000043|234|+---------------------+-----------+13rowsinset(0.00sec)此时,slave报错如下:(root:db1@xucl:10:31:05)[(none)]>showslavestatus\G***************************1.row***************************Slave_IO_State:Master_Host:127.0.0.1Master_User:replMaster_Port:3306Connect_Retry:60Master_Log_File:mysql-binlog.000035Read_Master_Log_Pos:194Relay_Log_File:izbp12nspj47ypto9t6vyez-relay-bin.000011Relay_Log_Pos:373Relay_Master_Log_File:mysql-binlog.000035Slave_IO_Running:NoSlave_SQL_Running:YesReplicate_Do_DB:Replicate_Ignore_DB:Replicate_Do_Table:Replicate_Ignore_Table:Replicate_Wild_Do_Table:Replicate_Wild_Ignore_Table:Last_Errno:0Last_Error:Skip_Counter:0Exec_Master_Log_Pos:194Relay_Log_Space:648Until_Condition:NoneUntil_Log_File:Until_Log_Pos:0Master_SSL_Allowed:NoMaster_SSL_CA_File:Master_SSL_CA_Path:Master_SSL_Cert:Master_SSL_Cipher:Master_SSL_Key:Seconds_Behind_Master:NULLMaster_SSL_Verify_Server_Cert:NoLast_IO_Errno:1236Last_IO_Error:Gotfatalerror1236frommasterwhenreadingdatafrombinarylog:'CannotreplicateGTID-transactionwhen@@GLOBAL.GTID_MODE=OFF,atfile/storage/single/mysql3306/logs/mysql-binlog.000035,position194.;thefirstevent'mysql-binlog.000039'at234,thelasteventreadfrom'/storage/single/mysql3306/logs/mysql-binlog.000035'at259,thelastbytereadfrom'/storage/single/mysql33'Last_SQL_Errno:0Last_SQL_Error:Replicate_Ignore_Server_Ids:Master_Server_Id:3306Master_UUID:e8bdf01a-c79b-11e8-82b3-00163e088352Master_Info_File:mysql.slave_master_infoSQL_Delay:0SQL_Remaining_Delay:NULLSlave_SQL_Running_State:Slavehasreadallrelaylog;waitingformoreupdatesMaster_Retry_Count:86400Master_Bind:Last_IO_Error_Timestamp:19052110:32:57Last_SQL_Error_Timestamp:Master_SSL_Crl:Master_SSL_Crlpath:Retrieved_Gtid_Set:Executed_Gtid_Set:4c423515-6661-11e9-b767-00163e088352:1-7,e8bdf01a-c79b-11e8-82b3-00163e088352:1-57192Auto_Position:0Replicate_Rewrite_DB:Channel_Name:Master_TLS_Version:1rowinset(0.00sec)从slave上的报错来看,在主库flushbinarylogs后,从库又读取注册的binlog并且又apply了三、现象说明
从整个测试来看,MySQL视乎将手动注册的文件进行了传输和应用。报错是因为这个库以前是GITD_MODE=ON的,但是测试的时候已经关闭了GTID_MODE,改为了POSITION的模式,这个报错是因为DUMP线程会进行检测:
image.png
这个图来自我新写的一个系列(暂时还没发布,大概年底才能写好)。不管怎么说,DUMP线程已经在开始传输老的binlog文件了。那么原因是什么呢?下面我们将进行解释。
四、flush binary logs对binlog的操作flush binary logs 将包含如下操作:
获取新的binlog文件名字
关闭旧的binlog
关闭index file
打开index file
打开新的binlog
将新的binlog加入到indexfile
具体可以参考函数MYSQL_BIN_LOG::new_file_impl。其中要说明一下获取新的binlog文件名字是通过函数find_uniq_filename实现的,其中包含如下代码:
file_info=dir_info->dir_entry;for(i=dir_info->number_off_files;i--;file_info++){if(strncmp(file_info->name,start,length)==0&&is_number(file_info->name+length,&number,0)){set_if_bigger(max_found,number);}}...*next=(need_next||max_found==0)?max_found+1:max_found;
大概意思就是扫描index file文件里面的binlog文件,获取其序号最高的一个,然后加1。栈帧如下:
#0find_uniq_filename(name=0x7fffec5ec6d0"/mysqldata/mysql3340/log/binlog",next=0x7fffec5ec678,need_next=true)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679#10x000000000187d208ingenerate_new_log_name(new_name=0x7fffec5ec6d0"/mysqldata/mysql3340/log/binlog",new_ext=0x0,log_name=0x7ffedc011520"/mysqldata/mysql3340/log/binlog",is_binlog=true)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767#20x0000000001884fdbinMYSQL_BIN_LOG::new_file_impl(this=0x2e83640,need_lock_log=false,extra_description_event=0x0)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175#30x0000000001884cbbinMYSQL_BIN_LOG::new_file_without_locking(this=0x2e83640,extra_description_event=0x0)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099#40x0000000001886b6binMYSQL_BIN_LOG::rotate(this=0x2e83640,force_rotate=true,check_purge=0x7fffec5ecbfb)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775#50x0000000001886d53inMYSQL_BIN_LOG::rotate_and_purge(this=0x2e83640,thd=0x7ffedc000b90,force_rotate=true)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846
因此即便我们手动修改了index file,flush binary logs却不会有问题,因为它实际扫描了index file文件。
同时我们也看到flush binary logs重新加载了index file,这个时候手动修改的index file就生效了,使用show binary logs就能查看到你加入的文件了。
binlog切换后,DUMP线程也需要读取下一个binlog文件。我们来看看它是怎么确认读取哪一个文件的。
在函数sender.run()中可以找到循环每个binlog文件的代码。下面一句是寻找下一个binlog文件:
interror=mysql_bin_log.find_next_log(&m_linfo,0);
mysql_bin_log.find_next_log 包含的代码:
my_b_seek(&index_file,linfo->index_file_offset);//进行偏移量偏移linfo->index_file_start_offset=linfo->index_file_offset;length=my_b_gets(&index_file,fname,FN_REFLEN));//读取文件名字...if(normalize_binlog_name(full_fname,fname,is_relay_log))...linfo->index_file_offset=my_b_tell(&index_file);//偏移量从新记录以备下一次使用
我们能够看到DUMP线程并没有实际扫描整个index文件,而是通过一个index文件的偏移量进行读取。如果手动修改index文件那么偏移量就出现了错乱。因此DUMP发送的下一个文件将是不确定的。因此出现了发送手动注册的binlog文件给从库的现象,这种情况下可能出现下面情况:
如果是GTID_MODE 关闭,使用POSITION那么这种情况一定报错,比如重复的行。
如果是GTID_MODE 和AUTO_POSITION=1,那么DUMP线程会进行GTID的过滤不发送,因为Event不发送所以延迟会持续一段时间为0。
如果是GTID_MODE 和AUTO_POSITION=0,那么SQL线程应用GITD_EVENT的时候会进行过滤,延迟可能出现很大的情况。
尽管GTID可能可以屏蔽这种问题,但是DUMP线程已经在考虑发送老的binlog文件了,这是不合适的。
六、purge binary logs能够维护这个偏移量为什么purge binary logs不会出现问题呢,因为在purge binary logs的语句下,会维护这个偏移量如下:
virtualvoidoperator()(THD*thd){LOG_INFO*linfo;mysql_mutex_lock(&thd->LOCK_thd_data);if((linfo=thd->current_linfo))//bbinlog.cc:2829{/*Indexfileoffsetcanbelessthatpurgeoffsetonlyifwejuststartedreadingtheindexfile.Inthatcasewehavenothingtoadjust.*/if(linfo->index_file_offset<m_purge_offset)linfo->fatal=(linfo->index_file_offset!=0);elselinfo->index_file_offset-=m_purge_offset;}mysql_mutex_unlock(&thd->LOCK_thd_data);
我们可以看到linfo->index_file_offset -= m_purge_offset;这样一个语句。下面是栈帧:
#0Adjust_offset::operator()(this=0x7fffec5ec720,thd=0x7ffedc000be0)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2831#10x0000000000eef320inDo_THD::operator()(this=0x7fffec5ec6a0,thd=0x7ffedc000be0)at/mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:46#20x0000000000eefa0finstd::for_each<THD**,Do_THD>(__first=0x3003358,__last=0x3003368,__f=...)at/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algo.h:4200#30x0000000000eeefc0inGlobal_THD_manager::do_for_all_thd(this=0x3003340,func=0x7fffec5ec720)at/mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:273#40x000000000187ae0ainadjust_linfo_offsets(purge_offset=0)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2873#50x0000000001883239inMYSQL_BIN_LOG::remove_logs_from_index(this=0x2e83640,log_info=0x7fffec5ec7d0,need_update_threads=true)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6352#60x0000000001883676inMYSQL_BIN_LOG::purge_logs(this=0x2e83640,to_log=0x7fffec5eca80"/mysqldata/mysql3340/log/binlog.000001",included=false,need_lock_index=true,need_update_threads=true,decrease_log_space=0x0,auto_purge=false)at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6469#70x000000000187b4b5inpurge_master_logs(thd=0x7ffee0000c00,to_log=0x7ffee0006600"binlog.000001")at/mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3127七、测试POSITION模式下的报错1、 环境
mysql>showbinarylogs;+---------------+-----------+|Log_name|File_size|+---------------+-----------+|binlog.000001|198||binlog.000002|154|+---------------+-----------+2rowsinset(0.00sec)mysql>showcreatetabletestcp\G;***************************1.row***************************Table:testcpCreateTable:CREATETABLE`testcp`(`id`int(11)NOTNULL,PRIMARYKEY(`id`))ENGINE=InnoDBDEFAULTCHARSET=utf81rowinset(0.00sec)ERROR:Noqueryspecified2、执行一个语句
主库:
mysql>insertintotestcpvalues(20);QueryOK,1rowaffected(0.43sec)mysql>select*fromtestcp;+----+|id|+----+|10||20|+----+2rowsinset(0.01sec)
从库:
mysql>showslavestatus\G;***************************1.row***************************Slave_IO_State:WaitingformastertosendeventMaster_Host:192.168.99.41Master_User:replMaster_Port:3340Connect_Retry:60Master_Log_File:binlog.000002Read_Master_Log_Pos:417Relay_Log_File:relay.000004Relay_Log_Pos:624Relay_Master_Log_File:binlog.000002Slave_IO_Running:YesSlave_SQL_Running:Yes...mysql>select*fromtestcp;+----+|id|+----+|10||20|+----+
这个时候DUMP线程index file偏移指针如下:
image.png
3、主库执行purge binary logs做之前拷贝原有binlog.000001为binlog.000001bak因为等会要拷贝回去
mysql>purgebinarylogsto'binlog.000002';QueryOK,0rowsaffected(3.14sec)mysql>showbinarylogs;+---------------+-----------+|Log_name|File_size|+---------------+-----------+|binlog.000002|417|+---------------+-----------+1rowinset(0.00sec)
因为purge binary logs命令会维护偏移指针,这个时候DUMP线程的index file偏移指针如下:
image.png
4、手动更改将binlog.000001bak拷贝为binlog.000001,然后修改index file将binlog.000001加入回去,然后flush binary logs如下:
mysql>flushbinarylogs;QueryOK,0rowsaffected(0.15sec)mysql>showbinarylogs;+---------------+-----------+|Log_name|File_size|+---------------+-----------+|binlog.000001|198||binlog.000002|461|+---------------+-----------+2rowsinset(0.00sec)
手动完成这个工作并不会维护DUMP线程的index file偏移指针,因此如下:
image.png
就这样DUMP线程重新发送了一次binlog.000002的内容,POSITION的从库只能报错了如下:
mysql>select*fromreplication_applier_status_by_worker\G***************************1.row***************************CHANNEL_NAME:WORKER_ID:1THREAD_ID:NULLSERVICE_STATE:OFFLAST_SEEN_TRANSACTION:ANONYMOUSLAST_ERROR_NUMBER:1062LAST_ERROR_MESSAGE:Worker1failedexecutingtransaction'ANONYMOUS'atmasterlogbinlog.000002,end_log_pos386;CouldnotexecuteWrite_rowseventontabletestmts.testcp;Duplicateentry'20'forkey'PRIMARY',Error_code:1062;handlererrorHA_ERR_FOUND_DUPP_KEY;theevent'smasterlogbinlog.000002,end_log_pos386LAST_ERROR_TIMESTAMP:2019-05-2114:46:26
感谢各位的阅读!关于“手动注册binlog文件造成主从异常的示例分析”这篇文章就分享到这里了,希望以上内容可以对大家有一定的帮助,让大家可以学到更多知识,如果觉得文章不错,可以把它分享出去让更多的人看到吧!
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。