[MySQL] Percona for MySQL ReplicaSet MHA를 적용하기

Dong yeong Kim·2023년 12월 2일
0

DBMS

목록 보기
15/15

OS : Rocky Linux 8.8 64bit

MySQL : Percona for MySQL 8.0.34

ReplicaSet : 1(Primary), 1(Secondary)

172.30.33.65(VIP)
node1(DB) : 172.30.33.61(MGMT), 100.100.100.61(Replica)
node2(DB) : 10.64.70.22(MGMT), 100.100.100.62(Replica)


안녕하세요, 금일 포스팅은 ReplicaSet에 MHA를 적용한 고가용성 구축을 진행하겠습니다.

단순하게 Replication 환경에서는 그저 MHA를 기동 시킨다면 문제가 없을 구성이지만, ReplicaSet의 환경에서는 MySQL Shell이 조작하지 않는 모든 복제 변경 세팅은 금하기에 이에 대한 적절한 조절이 필요합니다.

MHA 구축 가이드와 이어집니다.

위 MHA 구축 가이드를 보니 개인적으로 어느정도는 글 작성 능력이 좀 늘었다고 해야할까요.....


masterha_check_repl 실행 시 오류 발생

Rhel 8 환경에서는 masterha_check_repl가 실패합니다. 바로 MHA에서 제공하는 NodeUtil.pm의 문법이 약간 바뀌었는데요, 아래와 같습니다.

REHL 8버전 부터 master_check_repl 실행 시 오류 발생

[root@RPE1101200002 ~]# masterha_check_repl --conf=/etc/mha.cnf
Fri Dec  1 09:46:21 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Dec  1 09:46:21 2023 - [info] Reading application default configuration from /etc/mha.cnf..
Fri Dec  1 09:46:21 2023 - [info] Reading server configuration from /etc/mha.cnf..
Fri Dec  1 09:46:21 2023 - [info] MHA::MasterMonitor version 0.58.
Fri Dec  1 09:46:21 2023 - [debug] Connecting to servers..
Fri Dec  1 09:46:22 2023 - [debug]  Connected to: 100.100.100.2(100.100.100.2:3306), user=rsadmin
Fri Dec  1 09:46:22 2023 - [debug]  Number of slave worker threads on host 100.100.100.2(100.100.100.2:3306): 4
Fri Dec  1 09:46:22 2023 - [debug]  Connected to: 100.100.100.3(100.100.100.3:3306), user=rsadmin
Fri Dec  1 09:46:22 2023 - [debug]  Number of slave worker threads on host 100.100.100.3(100.100.100.3:3306): 4
Fri Dec  1 09:46:22 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/share/perl5/vendor_perl/MHA/NodeUtil.pm line 195.
Fri Dec  1 09:46:22 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Fri Dec  1 09:46:22 2023 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

변경사항

mha4mysql-node/lib/MHA/NodeUtil.pm

변경 전

...
193 sub parse_mysql_version($) {
194    my $str = shift;
195  $str =~ /(\d+)\.(\d+)/;
196  my $strmajor = "$1.$2";
197  my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
198  return $result;
199
200 }
201
202 sub parse_mysql_major_version($) {
203    my $str = shift;
204  $str =~ /(\d+)\.(\d+)/;
205  my $strmajor = "$1.$2";
206  my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
207  return $result;
208 }

...

변경 후

...
    193 sub parse_mysql_version($) {
    194   my $str = shift;
    195   $str =~ /(\d+)\.(\d+)/;
    196   my $strmajor = "$1.$2";
    197   my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
    198   return $result;
    199 }
    200
    201 sub parse_mysql_major_version($) {
    202   my $str = shift;
    203   $str =~ /(\d+)\.(\d+)/;
    204   my $strmajor = "$1.$2";
    205   my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
    206   return $result;
    207 }


...

이는 기본 Replication이든 ReplicaSet이든 동일합니다.


'!' 패스워드가 들어간 DB 계정 접근 불가 현상

MHA는 기본적으로 MHA가 DB에 접근하는데에 요구되는 계정의 비밀번호 특수문자가 사용 불가합니다.
MHA Manager 기동은 정상적으로 되지만, Failover와 Online Change 도중 비밀번호 오류가 발생하는데요. 원인은 아래와 같습니다.

mha4mysql-node/lib/MHA/NodeUtil.pm

...
my @shell_escape_chars = (
  '"', '!', '#', '&', ';', '`', '|',    '*',
  '?', '~', '<', '>', '^', '(', ')',    '[',
  ']', '{', '}', '$', ',', ' ', '\x0A', '\xFF'
);

sub unescape_for_shell {
  my $str = shift;
  if ( !index( $str, '\\\\' ) ) {
    return $str;
  }
  foreach my $c (@shell_escape_chars) {
    my $x       = quotemeta($c);
    my $pattern = "\\\\(" . $x . ")";
    $str =~ s/$pattern/$1/g;
  }
  return $str;
}

sub escape_for_shell {
  my $str = shift;
  my $ret = "";
  foreach my $c ( split //, $str ) {
    my $x      = $c;
    my $escape = 0;
    foreach my $e (@shell_escape_chars) {
      if ( $e eq $x ) {
        $escape = 1;
        last;
      }
    }
    if ( $x eq "'" ) {
      $x =~ s/'/'\\''/;
    }
    if ( $x eq "\\" ) {
      $x = "\\\\";
    }
    if ($escape) {
      $x = "\\" . $x;
    }
    $ret .= "$x";
  }
  $ret = "'" . $ret . "'";
  return $ret;
}
...

바로 위의 my @shell_escape_chars 배열에 명시된 특수문자들 앞에 '\' 를 넣어 이스케이프를 하게 의도되었는데요, 이로 인해 해당 배열에 포함된 특수문자는 모두 사용이 불가합니다.

예상컨대 개발자의 의도는 명령줄 라인이나, 자동으로 이스케이프 처리가 되지 않는 환경에서의 상황을 고려해둔 것 같습니다.

하지만 MHA는 Perl의 DBI 모듈로 DB에 접근하고, 자동으로 패스워드를 escape 처리하게 됩니다.
(위 배열에 '@' 문자가 없는것도...)

그리하여 위 escape_for_shell 등 메서드들을 그저 원래 인자를 return하게 만들거나 한다면 특수문자 또한 정상적으로 입력됨을 확인할 수 있습니다.
현재까지 테스트(Failover, Online Change 등) 시에는 별 다른 문제가 없었는데, 혹여라도 이에 대해 고려해야 할 사항이 있다면 알려주신다면 정말 감사하겠습니다.


Failover 시 mysqlsh dba.forcePrimaryInstance() 불가

MHA Manager에서 Failover 시 기존 Secondary에서 STOP REPLICA; 커맨드 입력하여 해당 오류 발생

위 소개에서 언급한 것처럼, ReplicaSet은 MySQL Shell 외의 조작되는 모든 커맨드를 경고로 표시합니다.

이를테면 MHA에서 Failover 후 Secondary를 승격시키기 위해 Read-only를 조작한다는 등 외부 요인이 개입된다면, MySQL Shell에서는 오류를 출력합니다.

You are connected to a member of replicaset 'replicaSet'.
* Connecting to replicaset instances
** Connecting to 100.100.100.3:3306

ERROR: Replication applier is OFF at instance 100.100.100.3:3306.
ERROR: Replication errors found for one or more SECONDARY instances. Use the 'invalidateErrorInstances' option to perform the failover anyway by skipping and invalidating instances with errors.
ReplicaSet.forcePrimaryInstance: One or more instances have replication applier errors. (MYSQLSH 51145)
 at (command line):1:27
in [x=dba.getReplicaSet(), x.forcePrimaryInstance('100.100.100.3:3306')]

하지만 위 경우는 MySQL Shell의 버그인데요, Secondary의 applier status가 OFF인 경우에 invalidateErrorIntances 오류를 출력하여 수행이 불가능한 현상입니다.

이는 MySQL Shell 8.2부터 Fix 되었다고 합니다.
(https://dev.mysql.com/doc/relnotes/mysql-shell/8.2/en/news-8-2-0.html)

하지만 아직 8.2를 Prod환경에 적용시키기엔 큰 도전이라고 생각하기에, 우리는 MHA 코드를 수정해야 합니다.

mha4mysql-manager/lib/MHA/MasterFailover.pm

변경 전

...
   1464 sub recover_master_gtid_internal($$$) {
   1465   my $target            = shift;
   1466   my $latest_slave      = shift;
   1467   my $binlog_server_ref = shift;
   1468   my $relay_master_log_file;
   1469   my $exec_master_log_pos;
   1470   $log->info();
   1471   $log->info("* Phase 3.3: New Master Recovery Phase..\n");
   1472   $log->info();
   1473   $log->info(" Waiting all logs to be applied.. ");
   1474   my $ret = $target->wait_until_relay_log_applied($log);
   1475
   1476   if ($ret) {
   1477     $log->error(" Failed with return code $ret");
   1478     return -1;
   1479   }
   1480   $log->info("  done.");
   1481   $target->stop_slave($log);

...

변경 후

...
   1464 sub recover_master_gtid_internal($$$) {
   1465   my $target            = shift;
   1466   my $latest_slave      = shift;
   1467   my $binlog_server_ref = shift;
   1468   my $relay_master_log_file;
   1469   my $exec_master_log_pos;
   1470   $log->info();
   1471   $log->info("* Phase 3.3: New Master Recovery Phase..\n");
   1472   $log->info();
   1473   $log->info(" Waiting all logs to be applied.. ");
   1474   my $ret = $target->wait_until_relay_log_applied($log);
   1475
   1476   if ($ret) {
   1477     $log->error(" Failed with return code $ret");
   1478     return -1;
   1479   }
   1480   $log->info("  done.");
   1481   #$target->stop_slave($log); 

...

mha4mysql-manager/lib/MHA/Server.pm

변경 전

...
    691 # Let the server to return nothing at SHOW SLAVE STATUS (Without this, the new master still points to the previous master)
    692 sub reset_slave_info($) {
    693   my $self     = shift;
    694   my $log      = $self->{logger};
    695   my $dbhelper = $self->{dbhelper};
    696   $log->debug(" Clearing slave info..");
    697   if ( $self->stop_slave() ) {
    698     $log->error(" Stopping slave failed!");
    699     return 1;
    700   }
    701   $self->reset_slave_all();
    702   my %status = $dbhelper->check_slave_status();
    703   if ( $status{Status} == 1 ) {
    704     $log->debug(
    705 " SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK."
    706     );
...

변경 후

...
    691 # Let the server to return nothing at SHOW SLAVE STATUS (Without this, the new master still points to the previous master)
    692 sub reset_slave_info($) {
    693   my $self     = shift;
    694   my $log      = $self->{logger};
    695   my $dbhelper = $self->{dbhelper};
    696   $log->debug(" Clearing slave info..");
    697   #if ( $self->stop_slave() ) {
    698   #  $log->error(" Stopping slave failed!");
    699   #  return 1;
    700   #}
    701   #$self->reset_slave_all();
    702   my %status = $dbhelper->check_slave_status();
    703   if ( $status{Status} == 1 ) {
    704     $log->debug(
    705 " SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK."
    706     );
...

MasterFailover.pm의 STOP SLAVE를 실행하는 과정만 주석처리를 하면 별 문제없이 실행이 되나, debugging 처리 도중 이를 확인하는 부분도 있기에, 해당 Server.pm 디버깅 부분 또한 주석 처리합니다.


Roll-Over 시 정상적인 결과 확인 불가

masterha_master_swtich 수행 도중에도 에러가 발생합니다.
동일하게 MHA의 로직과 MySQL Shell의 dba.setPrimaryInstance() 실행 도중 충돌 문제입니다.

[root@RPE1101200002 ~]# masterha_master_switch --conf=/etc/mha.cnf --master_state=alive --orig_master_is_new_slave --interactive=0

...

Fri Dec  1 14:55:54 2023 - [info]  ok.
Fri Dec  1 14:55:54 2023 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Fri Dec  1 14:55:54 2023 - [info] Executing FLUSH TABLES WITH READ LOCK..
Fri Dec  1 14:55:54 2023 - [info]  ok.
Fri Dec  1 14:55:54 2023 - [info] Orig master binlog:pos is mysql-bin.000004:237.
Fri Dec  1 14:55:54 2023 - [debug] Fetching current slave status..
Fri Dec  1 14:55:54 2023 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln937] Checking slave status failed.
Fri Dec  1 14:55:54 2023 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln178] Got ERROR:  at /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm line 637.
Fri Dec  1 14:55:54 2023 - [debug]  Disconnected from 100.100.100.2(100.100.100.2:3306)
Fri Dec  1 14:55:54 2023 - [debug]  Disconnected from 100.100.100.3(100.100.100.3:3306)

...

그리하여, MHA 소스 수정이 필요합니다.

mha4mysql-manager/lib/MHA/MasterRotate.pm

변경 전

...

    618 sub do_master_online_switch {
    619   my $error_code = 1;
    620   my $orig_master;
    621
    622   eval {
    623     $orig_master = identify_orig_master();
    624     my $new_master = identify_new_master($orig_master);
    625     $log->info("** Phase 1: Configuration Check Phase completed.\n");
    626     $log->info();
    627
    628     if ($g_check_only) {
    629       $log->info("--check_only is set. Exit.");
    630       $error_code = 0;
    631       return;
    632     }
    633
    634     my ( $orig_master_log_file, $orig_master_log_pos ) =
    635       reject_update( $orig_master, $new_master );
    636
    637     $_server_manager->read_slave_status();
    638
    639     my ( $master_log_file, $master_log_pos ) =
    640       switch_master( $orig_master, $new_master, $orig_master_log_file,
    641       $orig_master_log_pos );
    642
    643     $error_code =
    644       switch_slaves( $orig_master, $new_master, $orig_master_log_file,
    645       $orig_master_log_pos, $master_log_file, $master_log_pos );
    646
    647     if ( $g_remove_orig_master_conf
    648       && !$g_orig_master_is_new_slave
    649       && $error_code == 0 )
    650     {
    651       MHA::Config::delete_block_and_save( $g_config_file, $orig_master->{id},
    652         $log );
    653     }
    654
    655     $_server_manager->release_failover_advisory_lock();
    656     $_server_manager->disconnect_all();
    657   };
    658   if ($@) {
    659    if ( $orig_master && $orig_master->{not_error} ) {
    660       $log->info($@);
    661     }
    662     else {
    663       MHA::ManagerUtil::print_error( "Got ERROR: $@", $log );
    664     }
    665     $_server_manager->disconnect_all() if ($_server_manager);
    666     undef $@;
    667   }
    668
    669   return $error_code;
    670 }
...

변경 후

...

    618 sub do_master_online_switch {
    619   my $error_code = 1;
    620   my $orig_master;
    621
    622   #eval {
    623     $orig_master = identify_orig_master();
    624     my $new_master = identify_new_master($orig_master);
    625     $log->info("** Phase 1: Configuration Check Phase completed.\n");
    626     $log->info();
    627
    628     if ($g_check_only) {
    629       $log->info("--check_only is set. Exit.");
    630       $error_code = 0;
    631       return;
    632     }
    633
    634     my ( $orig_master_log_file, $orig_master_log_pos ) =
    635       reject_update( $orig_master, $new_master );
    636
    637     #$_server_manager->read_slave_status();
    638
    639     #my ( $master_log_file, $master_log_pos ) =
    640       switch_master( $orig_master, $new_master, $orig_master_log_file,
    641       $orig_master_log_pos );
    642
    643     #$error_code =
    644     #  switch_slaves( $orig_master, $new_master, $orig_master_log_file,
    645     #  $orig_master_log_pos, $master_log_file, $master_log_pos );
    646
    647     #if ( $g_remove_orig_master_conf
    648     #  && !$g_orig_master_is_new_slave
    649     #  && $error_code == 0 )
    650     #{
    651     #  MHA::Config::delete_block_and_save( $g_config_file, $orig_master->{id},
    652     #    $log );
    653     #}
    654
    655     #$_server_manager->release_failover_advisory_lock();
    656     #$_server_manager->disconnect_all();
    657   #};
    658   #if ($@) {
    659    #if ( $orig_master && $orig_master->{not_error} ) {
    660       #$log->info($@);
    661     #}
    662     #else {
    663       #MHA::ManagerUtil::print_error( "Got ERROR: $@", $log );
    664     #}
    665     #$_server_manager->disconnect_all() if ($_server_manager);
    666     #undef $@;
    667   #}
    668
    669   return $error_code;
    670 }
...

MasterRotate.pm

변경 전

    277 sub reject_update($$) {
    278   my $orig_master = shift;
    279   my $new_master  = shift;
    280   my $ret;
    281   $orig_master->release_monitor_advisory_lock();
    282   $orig_master->disconnect();
    283
    284   $log->info("* Phase 2: Rejecting updates Phase..\n");
    285   $log->info();
    286   if ( $new_master->{master_ip_online_change_script} ) {
    287     my $command =
    288 "$orig_master->{master_ip_online_change_script} --command=stop --orig_master_host=$orig_master->{hostname} --orig_master_ip=$o        rig_master->{ip} --orig_master_port=$orig_master->{port} --orig_master_user=$orig_master->{escaped_user} --new_master_host=$ne        w_master->{hostname} --new_master_ip=$new_master->{ip} --new_master_port=$new_master->{port} --new_master_user=$new_master->{e        scaped_user}";
    289     $command .= " --orig_master_ssh_user=$orig_master->{ssh_user}";
    290     $command .= " --new_master_ssh_user=$new_master->{ssh_user}";
    291     $command .= $orig_master->get_ssh_args_if( 1, "orig", 1 );
    292     $command .= $new_master->get_ssh_args_if( 2, "new", 1 );
    293     if ($g_orig_master_is_new_slave) {
    294       $command .= " --orig_master_is_new_slave";
    295     }
    296     $log->info(
    297 "Executing master ip online change script to disable write on the current master:"
    298     );
    299     $log->info(
    300       "  $command --orig_master_password=xxx --new_master_password=xxx");
    301     $command .=
    302 " --orig_master_password=$orig_master->{escaped_password} --new_master_password=$new_master->{escaped_password}";
    303     my ( $high, $low ) = MHA::ManagerUtil::exec_system($command);
    304
    305     if ( $high == 0 && $low == 0 ) {
    306       $log->info(" ok.");
    307     }
    308     else {
    309       if ( $high == 10 ) {
    310         $log->warning("Proceeding.");
    311       }
    312       else {
    313         croak;
    314       }
    315     }
    316   }
    317   elsif ($g_interactive) {
    318     print
    319 "master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep         writing on the current master. Is it ok to proceed? (yes/NO): ";
    320     $ret = <STDIN>;
    321     chomp($ret);
    322     if ( lc($ret) !~ /^y/ ) {
    323       $orig_master->{not_error} = 1;
    324       die "Not typed yes. Stopping.";
    325     }
    326   }
    327   else {
    328     $log->warning(
    329 "master_ip_online_change_script is not defined. Skipping disabling writes on the current master."
    330     );
    331   }
    332
    333   # It is necessary to keep connecting on the orig master to check
    334   # binary log is not proceeding. master write control script may kill
    335   # previous connections, so it is needed to establish connection again.
    336   unless ( $orig_master->reconnect() ) {
    337     $log->error(
    338       sprintf( "Failed to connect to the orig master %s!",
    339         $orig_master->get_hostinfo() )
    340     );
    341     croak;
    342   }
    343
    344   my ( $orig_master_log_file, $orig_master_log_pos );
    345   if ($g_skip_lock_all_tables) {
    346     $log->info("Skipping locking all tables.");
    347   }
    348   else {
    349     $log->info(
    350 #"Locking all tables on the orig master to reject updates from everybody (including root):"
    351     );
    352   }
    353   if ( $g_skip_lock_all_tables || $orig_master->lock_all_tables() ) {
    354
    355     # FLUSH TABLES WITH READ LOCK is skipped or failed.
    356     # So we need to verify binlog writes are stopped or not.
    357     # All slaves should complete replication until this position
    358     ( $orig_master_log_file, $orig_master_log_pos ) =
    359       $orig_master->check_binlog_stop();
    360   }
    361   else {
    362     ( $orig_master_log_file, $orig_master_log_pos ) =
    363       $orig_master->get_binlog_position();
    364   }
    365   if ( !$orig_master_log_file || !defined($orig_master_log_pos) ) {
    366     $log->error("Failed to get orig master's binlog and/or position!");
    367     croak;
    368   }
    369   $log->info(
    370     sprintf(
    371       "Orig master binlog:pos is %s:%d.",
    372       $orig_master_log_file, $orig_master_log_pos
    373     )
    374   );
    375   return ( $orig_master_log_file, $orig_master_log_pos );
    376 }
    377
    378 sub switch_master_internal($$$) {
    379   my $target               = shift;
    380   my $orig_master_log_file = shift;
    381   my $orig_master_log_pos  = shift;
    382   if ( $target->master_pos_wait( $orig_master_log_file, $orig_master_log_pos ) )
    383   {
    384     return;
    385   }
    386   return $_server_manager->get_new_master_binlog_position($target);
    387 }

변경 후

    277 sub reject_update($$) {
    278   my $orig_master = shift;
    279   my $new_master  = shift;
    280   my $ret;
    281   $orig_master->release_monitor_advisory_lock();
    282   $orig_master->disconnect();
    283
    284   $log->info("* Phase 2: Rejecting updates Phase..\n");
    285   $log->info();
    286   if ( $new_master->{master_ip_online_change_script} ) {
    287     my $command =
    288 "$orig_master->{master_ip_online_change_script} --command=stop --orig_master_host=$orig_master->{hostname} --orig_master_ip=$o        rig_master->{ip} --orig_master_port=$orig_master->{port} --orig_master_user=$orig_master->{escaped_user} --new_master_host=$ne        w_master->{hostname} --new_master_ip=$new_master->{ip} --new_master_port=$new_master->{port} --new_master_user=$new_master->{e        scaped_user}";
    289     $command .= " --orig_master_ssh_user=$orig_master->{ssh_user}";
    290     $command .= " --new_master_ssh_user=$new_master->{ssh_user}";
    291     $command .= $orig_master->get_ssh_args_if( 1, "orig", 1 );
    292     $command .= $new_master->get_ssh_args_if( 2, "new", 1 );
    293     if ($g_orig_master_is_new_slave) {
    294       $command .= " --orig_master_is_new_slave";
    295     }
    296     $log->info(
    297 "Executing master ip online change script to disable write on the current master:"
    298     );
    299     $log->info(
    300       "  $command --orig_master_password=xxx --new_master_password=xxx");
    301     $command .=
    302 " --orig_master_password=$orig_master->{escaped_password} --new_master_password=$new_master->{escaped_password}";
    303     my ( $high, $low ) = MHA::ManagerUtil::exec_system($command);
    304
    305     if ( $high == 0 && $low == 0 ) {
    306       $log->info(" ok.");
    307     }
    308     else {
    309       if ( $high == 10 ) {
    310         $log->warning("Proceeding.");
    311       }
    312       else {
    313         croak;
    314       }
    315     }
    316   }
    317   elsif ($g_interactive) {
    318     print
    319 "master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep         writing on the current master. Is it ok to proceed? (yes/NO): ";
    320     $ret = <STDIN>;
    321     chomp($ret);
    322     if ( lc($ret) !~ /^y/ ) {
    323       $orig_master->{not_error} = 1;
    324       die "Not typed yes. Stopping.";
    325     }
    326   }
    327   else {
    328     $log->warning(
    329 "master_ip_online_change_script is not defined. Skipping disabling writes on the current master."
    330     );
    331   }
    332
    333   # It is necessary to keep connecting on the orig master to check
    334   # binary log is not proceeding. master write control script may kill
    335   # previous connections, so it is needed to establish connection again.
    336   unless ( $orig_master->reconnect() ) {
    337     $log->error(
    338       sprintf( "Failed to connect to the orig master %s!",
    339         $orig_master->get_hostinfo() )
    340     );
    341     croak;
    342   }
    343
    344   my ( $orig_master_log_file, $orig_master_log_pos );
    345   #if ($g_skip_lock_all_tables) {
    346   #  $log->info("Skipping locking all tables.");
    347   #}
    348   #else {
    349   #  $log->info(
    350 #"Locking all tables on the orig master to reject updates from everybody (including root):"
    351   #  );
    352   #}
    353   #if ( $g_skip_lock_all_tables || $orig_master->lock_all_tables() ) {
    354
    355     # FLUSH TABLES WITH READ LOCK is skipped or failed.
    356     # So we need to verify binlog writes are stopped or not.
    357     # All slaves should complete replication until this position
    358     ( $orig_master_log_file, $orig_master_log_pos ) =
    359       $orig_master->check_binlog_stop();
    360   #}
    361   #else {
    362   #  ( $orig_master_log_file, $orig_master_log_pos ) =
    363   #    $orig_master->get_binlog_position();
    364   #}
    365   #if ( !$orig_master_log_file || !defined($orig_master_log_pos) ) {
    366   #  $log->error("Failed to get orig master's binlog and/or position!");
    367   #  croak;
    368   #}
    369   #$log->info(
    370   #  sprintf(
    371   #    "Orig master binlog:pos is %s:%d.",
    372   #    $orig_master_log_file, $orig_master_log_pos
    373   #  )
    374   #);
    375   return ( $orig_master_log_file, $orig_master_log_pos );
    376 }
    377
    378 sub switch_master_internal($$$) {
    379   my $target               = shift;
    380   my $orig_master_log_file = shift;
    381   my $orig_master_log_pos  = shift;
    382   if ( $target->master_pos_wait( $orig_master_log_file, $orig_master_log_pos ) )
    383   {
    384     return;
    385   }
    386   return $_server_manager->get_new_master_binlog_position($target);
    387 }

MasterRotate.pm

변경 전

sub switch_master($$$$) {
  my $orig_master          = shift;
  my $new_master           = shift;
  my $orig_master_log_file = shift;
  my $orig_master_log_pos  = shift;

  my ( $master_log_file, $master_log_pos ) =
    switch_master_internal( $new_master, $orig_master_log_file,
    $orig_master_log_pos );
  if ( !$master_log_file or !defined($master_log_pos) ) {
    $log->error("Failed to get new master's binlog and/or position!");
    croak;
  }

  # Allow write access on the new master
  if ( $new_master->{master_ip_online_change_script} ) {
    my $command =
"$new_master->{master_ip_online_change_script} --command=start --orig_master_host=$orig_master->{hostname} --orig_master_ip=$orig_master->{ip} --orig_master_port=$orig_master->{port} --orig_master_user=$orig_master->{escaped_user} --new_master_host=$new_master->{hostname} --new_master_ip=$new_master->{ip} --new_master_port=$new_master->{port} --new_master_user=$new_master->{escaped_user}";
    $command .= " --orig_master_ssh_user=$orig_master->{ssh_user}";
    $command .= " --new_master_ssh_user=$new_master->{ssh_user}";
    $command .= $orig_master->get_ssh_args_if( 1, "orig", 1 );
    $command .= $new_master->get_ssh_args_if( 2, "new", 1 );
    if ($g_orig_master_is_new_slave) {
      $command .= " --orig_master_is_new_slave";
    }
    $log->info(
"Executing master ip online change script to allow write on the new master:"
    );
    $log->info(
      "  $command --orig_master_password=xxx --new_master_password=xxx");
    $command .=
" --orig_master_password=$orig_master->{escaped_password} --new_master_password=$new_master->{escaped_password}";
    my ( $high, $low ) = MHA::ManagerUtil::exec_system($command);

    if ( $high == 0 && $low == 0 ) {
      $log->info(" ok.");
    }
    else {
      if ( $high == 10 ) {
        $log->warning("Proceeding.");
      }
      else {
        croak;
      }
    }
  }

  # Allow write access on master (if read_only==1)
  $new_master->disable_read_only();

  return ( $master_log_file, $master_log_pos );
}

변경 후

sub switch_master($$$$) {
  my $orig_master          = shift;
  my $new_master           = shift;
  my $orig_master_log_file = shift;
  my $orig_master_log_pos  = shift;

  my ( $master_log_file, $master_log_pos ) =
    switch_master_internal( $new_master, $orig_master_log_file,
    $orig_master_log_pos );
 # if ( !$master_log_file or !defined($master_log_pos) ) {
 #   $log->error("Failed to get new master's binlog and/or position!");
 #   croak;
 # }

  # Allow write access on the new master
  if ( $new_master->{master_ip_online_change_script} ) {
    my $command =
"$new_master->{master_ip_online_change_script} --command=start --orig_master_host=$orig_master->{hostname} --orig_master_ip=$orig_master->{ip} --orig_master_port=$orig_master->{port} --orig_master_user=$orig_master->{escaped_user} --new_master_host=$new_master->{hostname} --new_master_ip=$new_master->{ip} --new_master_port=$new_master->{port} --new_master_user=$new_master->{escaped_user}";
    $command .= " --orig_master_ssh_user=$orig_master->{ssh_user}";
    $command .= " --new_master_ssh_user=$new_master->{ssh_user}";
    $command .= $orig_master->get_ssh_args_if( 1, "orig", 1 );
    $command .= $new_master->get_ssh_args_if( 2, "new", 1 );
    if ($g_orig_master_is_new_slave) {
      $command .= " --orig_master_is_new_slave";
    }
    $log->info(
"Executing master ip online change script to allow write on the new master:"
    );
    $log->info(
      "  $command --orig_master_password=xxx --new_master_password=xxx");
    $command .=
" --orig_master_password=$orig_master->{escaped_password} --new_master_password=$new_master->{escaped_password}";
    my ( $high, $low ) = MHA::ManagerUtil::exec_system($command);

    if ( $high == 0 && $low == 0 ) {
      $log->info(" ok.");
    }
    else {
      if ( $high == 10 ) {
        $log->warning("Proceeding.");
      }
      else {
        croak;
      }
    }
  }

  # Allow write access on master (if read_only==1)
  $new_master->disable_read_only();

  return ( $master_log_file, $master_log_pos );
}

mha4mysql-manager/lib/MHA/Server.pm
변경 전

   1002 sub master_pos_wait_internal {
   1003   my ( $self, $binlog_file, $binlog_pos, $log ) = @_;
   1004    $log = $self->{logger} unless ($log);
   1005    my $dbhelper = $self->{dbhelper};
   1006    my $res = $dbhelper->master_pos_wait( $binlog_file, $binlog_pos );
   1007    if ( !defined($res) ) {
   1008      $log->error(
   1009        sprintf(
   1010  "master_pos_wait(%s:%d) returned NULL on %s. Maybe SQL thread was aborted?",
   1011          $binlog_file, $binlog_pos, $self->get_hostinfo()
   1012        )
   1013      );
   1014     return 1;
   1015    }
   1016    if ( $res >= 0 ) {
   1017      $log->info(
   1018        sprintf(
   1019          " master_pos_wait(%s:%d) completed on %s. Executed %d events.",
   1020          $binlog_file, $binlog_pos, $self->get_hostinfo(), $res
   1021        )
   1022      );
   1023      return 0;
   1024    }
   1025    else {
   1026      $log->error(
   1027        sprintf(
   1028          "master_pos_wait(%s:%d) got error on %s: $res",
   1029          $binlog_file, $binlog_pos, $self->get_hostinfo()
   1030        )
   1031      );
   1032      return 1;
   1033    }
   1034 }
   1035 # We do not reset slave here

변경 후 (빈 메서드)

   1002 sub master_pos_wait_internal {
   1003 my ( $self, $binlog_file, $binlog_pos, $log ) = @_;
   1003 # We do not reset slave here
   1004 }

위 코드들이 수정된 MHA manager/node는 Rebuild하여 재배포 하였습니다.
https://github.com/yeongdonge/mha4ReplicaSet

사실, MHA는 Primary 서버에 Ping 체크와 DB 커넥션 여부를 체크하여 문제 발생 시 VIP 플로팅만 담당하고 나머지 역할은 MySQL Shell의 역할입니다.

예를 들어 Primary <> Secondary의 Advisory Lock이 MHA에도 자체적으로 구현되어 있지만, MySQL Shell(ReplicaSet)에도 구현이 되어 있기에, 사실상 위험을 염두할 필요는 없다고 생각합니다.

최근 Orchestrator와 MHA를 Percona MySQL/ReplicaSet에 적용하기 위해 GoLang과 Perl 언어를 새롭게 접하는 계기가 되었는데요.

이로 인하여 해당 솔루션들을 좀 더 깊게 파악하는 계기가 되었던 것 같습니다.

다음 포스팅은 이전에 작성하지 못한, 서비스 환경에 맞는 Orchestrator 코드 수정하기를 포스팅하도록 하겠습니다.

profile
날 것의 기술 '불'로그

0개의 댓글