새로운 프로젝트 배포 처리중에
새로운 오류를 발견했다.
view events 에서 확인..
자세한 이유는 안알려주고,, codedeploy agent 로그를 까보라고 한다.(속상😳)
실패된 대체 인스턴스로 가서 ssh로 인스턴스 접속한 후에 로그를 확인해봤다.(아래는 로그 확인 명령)
cat /var/log/aws/codedeploy-agent/codedeploy-agent.log
위같이 쓰면 아래와 같은 로그 내용을 확인할 수 있었는데…
2022-10-06 00:17:50 ERROR [codedeploy-agent(2600)]: booting child: error during start or run: SystemExit - Stopping CodeDeploy agent due to SSL validation error. - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:65:in `abort'
/opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:65:in `validate'
/opt/codedeploy-agent/lib/instance_agent/agent/base.rb:11:in `runner'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:32:in `block in prepare_run'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:78:in `with_error_handling'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:31:in `prepare_run'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:64:in `block in prepare_run_with_error_handling'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:78:in `with_error_handling'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:63:in `prepare_run_with_error_handling'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:20:in `start'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:206:in `block in spawn_child'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:204:in `fork'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:204:in `spawn_child'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:283:in `block (2 levels) in replace_terminated_children'
/opt/codedeploy-agent/vendor/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2022-10-06 00:17:50 ERROR [codedeploy-agent(2600)]: booting child: error during start or run: SystemExit - exit - /opt/codedeploy-agent/lib/instance_agent/runner/child.rb:90:in `exit'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:90:in `rescue in with_error_handling'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:77:in `with_error_handling'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:31:in `prepare_run'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:64:in `block in prepare_run_with_error_handling'
/opt/codedeploy-agent/lib/instance_agent/runner/child.rb:78:in `with_error_handling'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:63:in `prepare_run_with_error_handling'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/child.rb:20:in `start'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:206:in `block in spawn_child'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:204:in `fork'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:204:in `spawn_child'
/opt/codedeploy-agent/vendor/gems/process_manager-0.0.13/lib/process_manager/master.rb:283:in `block (2 levels) in replace_terminated_children'
/opt/codedeploy-agent/vendor/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'
2022-10-06 00:17:50 INFO [codedeploy-agent(2269)]: master 2269: Received CHLD - cleaning dead child process
2022-10-06 00:17:50 INFO [codedeploy-agent(2269)]: master 2269: been told to replace child 2600
2022-10-06 00:17:50 INFO [codedeploy-agent(2269)]: master 2269: not enough child processes running - missing at least 1 - respawning
2022-10-06 00:17:55 INFO [codedeploy-agent(2269)]: master 2269: Spawned child 1/1
2022-10-06 00:17:56 INFO [codedeploy-agent(2649)]: On Premises config file does not exist or not readable
2022-10-06 00:17:56 INFO [codedeploy-agent(2649)]: CodeDeploy endpoint: https://codedeploy-commands.ap-northeast-2.amazonaws.com
2022-10-06 00:17:56 INFO [codedeploy-agent(2649)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandExecutor: Archives to retain is: 5}
2022-10-06 00:17:56 INFO [codedeploy-agent(2649)]: CodeDeploy endpoint: https://codedeploy-commands.ap-northeast-2.amazonaws.com
2022-10-06 00:17:56 INFO [codedeploy-agent(2649)]: CodeDeploy endpoint: https://codedeploy-commands.ap-northeast-2.amazonaws.com
2022-10-06 00:18:03 INFO [codedeploy-agent(2742)]: Checking first if any deployment lifecycle event is in progress master 2269
2022-10-06 00:18:03 INFO [codedeploy-agent(2742)]: Stopping master 2269
2022-10-06 00:18:04 INFO [codedeploy-agent(2269)]: master 2269: Received TERM - stopping children and shutting down
2022-10-06 00:18:04 INFO [codedeploy-agent(2649)]: booting child: Received TERM - setting internal shutting down flag and possibly finishing last run
2022-10-06 00:18:04 ERROR [codedeploy-agent(2649)]: InstanceAgent::Plugins::CodeDeployPlugin::CodeDeployControl: Error during certificate verification on codedeploy endpoint https://codedeploy-commands.ap-northeast-2.amazonaws.com
2022-10-06 00:18:04 ERROR [codedeploy-agent(2649)]: Error validating the SSL configuration: Invalid server certificate
2022-10-06 00:18:04 ERROR [codedeploy-agent(2649)]: booting child: error during start or run: SystemExit - Stopping CodeDeploy agent due to SSL validation error. - /opt/codedeploy-agent/lib/instance_agent/plugins/codedeploy/command_poller.rb:65:in `abort'
인증에 실패해서 child 프로세스가 뜨지 못했다고 하는 그런 느낌..
sudo service codedeploy-agent status
위처럼 나오면 실패한거고
두번째 사진처럼 나와야 codedeploy-agent가 잘 뜬것
구글링을 해본 결과
외부 인터넷 연결이 실패해서 인증을 하지 못한것이라고 한다.
찾아봤던 글에서는 인터넷 연결이 안된다고 해서 공인 ip로 만들고서 해결했다고 했는데 우리는 Private ip로 남겨둬야 해서 다른 해결방법이 필요했다
보안그룹 인바운드-아웃바운드 규칙 다 확인해보고
혹시 해서 ASG 에 있는 서브넷 네개 중 최근에 추가된 서브넷 두개에 대한 연결을 해제후 재배포를 해봤더니 성공했다!
성공은 했는데 이유를 알고 싶어서 까봤는데
기존 서브넷 2개는 라우팅테이블이 private용으로 NAT 연결을 할 수 있게 돼있었고
최근에 추가된 서브넷 2개는 라우팅테이블이 public 용으로 IGW 연결을 할 수 있게 돼있었다!!
프리환경이여서 ASG으로 새로운 인스턴스를 뜰 일이 많이는 없었어서 발견되지 못하고 있던 듯 싶다.
내가 배포했을때 ASG 에 있는 서브넷 4개중 퍼블릭 라우팅 테이블과 연결돼있는
최근에 추가된 서브넷에 연결된 인스턴스가 한대가 띄어졌고 private ip인데 igw로 연결하려니 외부 연결에 실패하여
codedeploy-agent Invalid server certificate 가 뜨면서 배포에 실패한 것 ㄸㄹㄹ
AWS 안에는 AWS 계정 전용 가상 네트워크(VPC)가 존재한다.
VPC 를 나누어 가진 영역이 서브넷이고 private, public 두가지가 존재한다.
서브넷 밖으로 나가는 트래픽에 대한 경로를 지정한다.
Private 서브넷에서 인터넷 또는 다른 AWS 서비스를 사용하기 위한 연결망
공인 IP 주소가 할당된 VPC 내 인스턴스에 대해 NAT를 수행한다.