本文共 4022 字,大约阅读时间需要 13 分钟。
1 查询slve的延迟情况,超过N秒延迟则等待或者返回失败,确保业务影响时间最短
2 登陆proxy节点,disable当前hproxy,使得后续通过proxy的业务连接失败
3 登陆proxy节点,shutdown当前通过proxy连接的会话(如果sql能快速完成,这步其实可用不做,可以做个时间阈值检测,当N秒以后还有业务层连接则kill)
4 记录当前主库的binlog和pos点
5 等待双主间的数据完全一致,即从库执行到步骤4记录的pos点后,可以利用MASTER_POS_WAIT(log_name,log_pos[,timeout])内置函数实现
6 执行切换,即enable M-M中的备用节点
7 返回切换成功
切换以后,show slave status出现大量的主键冲突报错
通过分析,发现通过haproxy来杀死会话实际上并不靠谱,当连接建立以后,即使通过haproxy杀死会话或者杀死mysql client所在的客户端进程,实际上没有效果,依旧会往下继续执行成功,如果此时主库insert了记录为10的自增id没有完成
当新的主库连接进来后,当时的慢的sql还没跑完,即自增id为10的记录还没同步过来
这时新主库再次insert一条记录,产生的自增id依旧是10
慢SQL这时在主库执行完成,传到新主库后主键冲突,报错
1 通过proxy登陆mysql,执行以下命令
mysql> select * from test_ha_switch;
Empty set (0.00 sec) mysql> insert into test_ha_switch(dd) values (sleep(100));此时SQL处于运行中
2 登陆proxy,通过socat杀死会话
echo "shutdown sessions server ha-proxy/10.9.188.208" | socat stdio /opt/udb/instance/haproxy/7378229c-7ada-4f90-b1ec-96d704979426/stats
3 此时再回去看步骤1中的会话,发现已经是处于ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> insert into test_ha_switch(dd) values (sleep(100));
ERROR 2013 (HY000): Lost connection to MySQL server during query4 再次查询该表,发现记录依旧写进去了
mysql> select * from test_ha_switch;
ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1536352 Current database: jiang +----+------+ | id | dd | +----+------+ | 1 | 0 | +----+------+ 1 row in set (0.01 sec)另外如果没有proxy,通过杀死客户端所在的进程的方式也一样很好模拟
# mysql -S /opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup jiang -e "show processlist;"
+---------+--------------+-------------------+-------+---------+------+-------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +---------+--------------+-------------------+-------+---------+------+-------+------------------+ | 1536517 | ucloudbackup | 10.9.125.20:62374 | jiang | Sleep | 7 | | NULL | | 1536566 | ucloudbackup | localhost | jiang | Query | 0 | init | show processlist | +---------+--------------+-------------------+-------+---------+------+-------+------------------+ # nohup mysql -S /opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup jiang -e "insert into test_ha_switch(dd) select dd from test_ha_switch;" & [1] 28752 # nohup: ignoring input and appending output to `nohup.out' # ps -ef|grep test_ha_switch root 28752 17668 0 18:10 pts/0 00:00:00 mysql -S /opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup -px xxxxxxxx jiang -e insert into test_ha_switch(dd) select dd from test_ha_switch; root 29117 17668 0 18:10 pts/0 00:00:00 grep test_ha_switch # kill -9 28752 [1]+ Killed nohup mysql -S /opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup jiang -e "insert into test_ha_switch(dd) select dd from test_ha_switch;" # mysql -S /opt/udb/instance/mysql-5.6/5770e236-fef6-4d61-bc61-beb6d69f7dbe/mysqld.sock -uucloudbackup -pJfKS9FFXvk jiang -e "show processlist;" +---------+--------------+-------------------+-------+---------+------+--------------+--------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +---------+--------------+-------------------+-------+---------+------+--------------+--------------------------------------------------------------+ | 1536517 | ucloudbackup | 10.9.125.20:62374 | jiang | Sleep | 29 | | NULL | | 1536570 | ucloudbackup | localhost | jiang | Query | 16 | Sending data | insert into test_ha_switch(dd) select dd from test_ha_switch | | 1536575 | ucloudbackup | localhost | jiang | Query | 0 | init | show processlist | +---------+--------------+-------------------+-------+---------+------+--------------+--------------------------------------------------------------+优化方法其实也比较简单,就是在切换逻辑的3和4之间,通过直接登陆主节点的方式(这时已经无法通过proxy登陆db了),如果当前还有业务连接,则通过mysql kill语法kill掉当前的业务连接即可;
值得一提的是,kill线程时要确保正常退出,可能会存在这种情况,查看的时候还有两个连接,你kill之前第一个连接已经断开了,那么你再去kill这个线程程序可能就会报错退出,从而第二个线程实际上都没有kill掉
转载地址:http://wcbgb.baihongyu.com/