
New Relic, Datadog, and Splunk are three representative APM (Application Performance Monitoring) and observability tools for monitoring AWS resources.
This document explains how to use New Relic to manage and analyze AWS cloud infrastructure, applications, and networks from a single dashboard.
Management console:
- https://one.newrelic.com/
Licensing
- Billed per Super User
<Two integration options>

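<Deployment procedure>
1) Enter the EKS cluster name in the Kubernetes integration wizard.
2) Deploy using the Helm chart that the wizard provides.
3) Allow about 1 to 3 minutes for data collection to begin.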

<Deployment procedure (Helm)>
helm repo add newrelic https://helm-charts.newrelic.com
helm repo update newrelic
helm upgrade --install newrelic-bundle newrelic/nri-bundle -n newrelic --create-namespace -f values.yaml


## Global values
global:
  # -- The cluster name for the Kubernetes cluster.
  cluster: "_YOUR_K8S_CLUSTER_NAME_"
  # -- The license key for your New Relic Account. This will be preferred configuration option if both `licenseKey` and `customSecret` are specified.
  licenseKey: "_YOUR_NEW_RELIC_LICENSE_KEY_"
  # -- (bool) In each integration it has different behavior. Enables operating system metric collection on each EC2 K8s node. Not applicable to Fargate nodes.
  # @default -- false
  # privileged: true
  # -- (bool) Must be set to `true` when deploying in an EKS Fargate environment
  # @default -- false
  fargate: true

## Enable nri-bundle sub-charts
newrelic-infra-operator:
  # Deploys the infrastructure operator, which injects the monitoring sidecar into Fargate pods
  enabled: true
  tolerations:
    - key: "eks.amazonaws.com/compute-type"
      operator: "Equal"
      value: "fargate"
      effect: "NoSchedule"
  config:
    ignoreMutationErrors: true
    infraAgentInjection:
      # Injection policies can be defined here. See [values file](https://github.com/newrelic/newrelic-infra-operator/blob/main/charts/newrelic-infra-operator/values.yaml#L114-L125) for more detail.
      policies:
        - namespaceName: namespace-a
        - namespaceName: namespace-b

newrelic-infrastructure:
  # Deploys the Infrastructure Daemonset to EC2 nodes. Disable for Fargate-only clusters.
  enabled: true

nri-metadata-injection:
  # Deploy our mutating admission webhook to link APM and Kubernetes entities
  enabled: true

kube-state-metrics:
  # Deploys Kube State Metrics. Disable if you are already running KSM in your cluster.
  enabled: true

nri-kube-events:
  # Deploy the Kubernetes events integration.
  enabled: true

newrelic-logging:
  # Deploys the New Relic's Fluent Bit daemonset to EC2 nodes. Disable for Fargate-only clusters.
  enabled: true

newrelic-prometheus-agent:
  # Deploys the Prometheus agent for scraping Prometheus endpoints.
  enabled: true
  config:
    kubernetes:
      integrations_filter:
        enabled: true
        source_labels: ["app.kubernetes.io/name", "app.newrelic.io/name", "k8s-app"]
        app_values: ["redis", "traefik", "calico", "nginx", "coredns", "kube-dns", "etcd", "cockroachdb", "velero", "harbor", "argocd", "istio"]

helm list -n newrelic   # list the releases installed in the namespace
helm delete -n newrelic newrelic-bundle   # uninstall; the release name must be given
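
As a minimal sketch, the global values could be rendered from shell variables so that the license key does not have to be stored in values.yaml. The function name `render_global_values`, the file name `values-global.yaml`, and the variable names `CLUSTER_NAME` and `NEW_RELIC_LICENSE_KEY` are assumptions made for illustration; only the `global.cluster`, `global.licenseKey`, and `global.fargate` keys come from the values file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a small override file containing only the
# global section. Arguments: cluster name, New Relic license key.
render_global_values() {
  local cluster="$1" license_key="$2"
  cat <<EOF
global:
  cluster: "${cluster}"
  licenseKey: "${license_key}"
  fargate: true
EOF
}

# Usage (needs helm and cluster access, so it is left commented out):
# render_global_values "$CLUSTER_NAME" "$NEW_RELIC_LICENSE_KEY" > values-global.yaml
# helm upgrade --install newrelic-bundle newrelic/nri-bundle \
#   -n newrelic --create-namespace -f values.yaml -f values-global.yaml
```

When several `-f` files are passed, Helm gives precedence to the last one, so the override file replaces the placeholder values without editing values.yaml itself.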

<Deployment procedure>
1) ClusterRole
- name: newrelic-newrelic-infrastructure-infra-agent
2) Add a sidecar container that uses the newrelic/infrastructure-k8s image.
- Insert the provided snippet into the YAML of each workload you want to monitor so that the sidecar container is deployed with it.
3) ClusterRoleBinding
- Create a ClusterRoleBinding whose subject is the service account of the Pod to be monitored.
4) Create a secret
- The secret contains the New Relic license key.
- Each namespace needs its own secret.
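
Steps 3 and 4 could be scripted roughly as follows. This is a sketch under assumptions: the binding name `newrelic-infra-<namespace>`, the secret name `newrelic-license`, and the secret key `license` are hypothetical and should be matched to whatever the provided sidecar snippet expects. The function only prints the `kubectl` commands so they can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Prints (does not execute) one ClusterRoleBinding command and one secret
# command per namespace. Arguments: service account name, then namespaces.
print_sidecar_prereqs() {
  local service_account="$1"
  shift
  local ns
  for ns in "$@"; do
    # Bind the existing ClusterRole to the workload's service account.
    printf 'kubectl create clusterrolebinding newrelic-infra-%s --clusterrole=newrelic-newrelic-infrastructure-infra-agent --serviceaccount=%s:%s\n' \
      "$ns" "$ns" "$service_account"
    # The key is referenced as a variable so it never appears in the output.
    printf 'kubectl create secret generic newrelic-license -n %s --from-literal=license="$NEW_RELIC_LICENSE_KEY"\n' "$ns"
  done
}

print_sidecar_prereqs default namespace-a namespace-b
```

After reviewing the printed commands, they can be piped to `sh` in a shell where `NEW_RELIC_LICENSE_KEY` is exported.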
This approach installs an agent on the OS (Windows or Linux) of each server to be monitored and collects its metrics and logs.
< Windows >
[Net.ServicePointManager]::SecurityProtocol = 'tls12, tls'; $WebClient = New-Object System.Net.WebClient; $WebClient.DownloadFile("https://download.newrelic.com/install/newrelic-cli/scripts/install.ps1", "$env:TEMP\install.ps1"); & PowerShell.exe -ExecutionPolicy Bypass -File $env:TEMP\install.ps1; $env:NEW_RELIC_API_KEY='NRAK-ZVOSZFFLO5UM6DD5ZZYXBOD7BH2'; $env:NEW_RELIC_ACCOUNT_ID='3985079'; & 'C:\Program Files\New Relic\New Relic CLI\newrelic.exe' install
< Linux >
curl -Ls https://download.newrelic.com/install/newrelic-cli/scripts/install.sh | bash && sudo NEW_RELIC_API_KEY=NRAK-ZVOSZFFLO5UM6DD5ZZYXBOD7BH2 NEW_RELIC_ACCOUNT_ID=3985079 /usr/local/bin/newrelic install
service newrelic-infra status
service newrelic-infra start
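The following describes how to collect metrics from MSSQL running on Windows.
<Configuration procedure>
0) Install the New Relic infrastructure agent
1) Create a SQL access account (newrelic)
2) Start the SQL Server Browser service
3) Install the New Relic SQL agent
4) Edit mssql-config
5) Restart the New Relic infrastructure service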
--
<Create the SQL access account (newrelic)>
USE master;
CREATE LOGIN newrelic WITH PASSWORD = 'My Password';
GRANT CONNECT SQL TO newrelic;
GRANT VIEW SERVER STATE TO newrelic;
GRANT VIEW ANY DEFINITION TO newrelic;
-- Grant read access privileges section
DECLARE @name SYSNAME
DECLARE db_cursor CURSOR
READ_ONLY FORWARD_ONLY
FOR
SELECT NAME
FROM master.sys.databases
WHERE NAME NOT IN ('master','msdb','tempdb','model','rdsadmin','distribution') and state != 6
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    EXECUTE('USE "' + @name + '"; CREATE USER newrelic FOR LOGIN newrelic;' );
    FETCH next FROM db_cursor INTO @name
END
CLOSE db_cursor
DEALLOCATE db_cursor
<Configuration procedure>
1) Declare a variable
2) Apply it in the chart's query
SELECT average(cpuPercent) FROM SystemSample FACET entityName SINCE 1 day ago LIMIT 20
SELECT average(memoryUsedBytes / memoryTotalBytes * 100) FROM SystemSample FACET entityName SINCE 1 day ago LIMIT 20
Show only a specific AWS account ID
SELECT average(cpuPercent) FROM SystemSample WHERE awsAccountId IN ('44444444444') FACET entityName SINCE 1 day ago LIMIT 20
* 'IN'
- Used in a WHERE clause to match an attribute ('awsAccountId' in the query above) against one or more listed values.
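
To illustrate how `IN` takes several values, the sketch below builds the same CPU query for any number of AWS account IDs. The function names and the two account IDs are hypothetical placeholders; the query itself is the one shown above. The resulting string could be pasted into the query builder or a chart definition.

```shell
#!/usr/bin/env bash
# Joins the arguments into a quoted, comma-separated list: 'a', 'b', ...
build_in_list() {
  local list="" id
  for id in "$@"; do
    list="${list:+$list, }'${id}'"
  done
  printf '%s' "$list"
}

# Prints the CPU query restricted to the given AWS account IDs.
nrql_cpu_by_account() {
  printf "SELECT average(cpuPercent) FROM SystemSample WHERE awsAccountId IN (%s) FACET entityName SINCE 1 day ago LIMIT 20\n" \
    "$(build_in_list "$@")"
}

# Placeholder account IDs, for illustration only.
nrql_cpu_by_account 111111111111 222222222222
```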
1) All Entities -> Services-APM -> select the app that is highlighted (alerting) -> Summary -> Service Level
2) APM & Services -> Events -> Issue & Activity -> Incidents -> review the triggered incident
3) APM & Services -> Monitor -> External services -> Called entities -> in the Response time chart, check whether any app shows a latency spike
4) APM & Services -> Monitor -> Distributed tracing -> review the traces of the top transactions -> track down the service with high latency
5) APM & Services -> Summary -> Web transactions time -> track down the segment that contributes most to the high latency
6) APM & Services -> Monitor -> External services -> Called entities -> Response time
7) APM & Services -> Summary -> Web transactions time -> confirm that the segment contributing most to the high latency is 'Go'
8) APM & Services -> Summary -> Transactions -> identify the top transactions contributing to the high latency
9) APM & Services -> Kubernetes -> Activity stream -> review the accumulated events -> find the event that matches the time the incident occurred
