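Telegraf includes a plugin that collects metrics from vSphere environments.
With this plugin you can collect a wide range of vSphere cluster metrics from vCenter; collection goes through the vCenter SDK API.
Perhaps for compatibility reasons, Telegraf's default configuration file leaves out some of the metrics that can be collected. If you manually add the missing metric labels to the lists yourself, Telegraf will collect those metrics as well.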
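As a rough illustration of adding metric labels by hand, the Python sketch below renders a `*_metric_include` array that could be pasted into the `[[inputs.vsphere]]` section of a Telegraf config. The option names (`vm_metric_include`, `host_metric_include`, and so on) come from the vSphere plugin, while the helper `render_metric_include` is a hypothetical name made up for this example.

```python
# Minimal sketch: turn a list of metric labels into a TOML array for the
# Telegraf vSphere input plugin. `render_metric_include` is a hypothetical
# helper for illustration, not part of Telegraf.

RESOURCE_TYPES = {"vm", "host", "cluster", "datastore", "datacenter"}


def render_metric_include(resource: str, metrics: list[str]) -> str:
    """Return a `<resource>_metric_include = [...]` TOML fragment."""
    if resource not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource!r}")
    # Strip whitespace, skip blank entries, drop duplicates, sort for stable output.
    unique = sorted({m.strip() for m in metrics if m.strip()})
    lines = [f"  {resource}_metric_include = ["]
    lines += [f'    "{name}",' for name in unique]
    lines.append("  ]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_metric_include("vm", [
        "cpu.readiness.average",
        "cpu.costop.summation",
        "cpu.readiness.average",  # duplicates are collapsed
    ]))
```

The output is only a fragment; it still has to sit inside an `[[inputs.vsphere]]` block that holds your own vCenter connection settings.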
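If, like me, you want to monitor a vSphere environment with a custom dashboard built on Telegraf + InfluxDB + Grafana, this information should come in handy.
The list below is based on vCenter 8.0.3. If you need a list that matches your own environment, you can check it yourself with GOVC.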
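One possible way to pull such a list yourself is to wrap `govc metric.ls` in a small script. The sketch below assumes `govc` is installed and on `PATH`, that the usual `GOVC_URL`, `GOVC_USERNAME`, and `GOVC_PASSWORD` environment variables are set, and that the command's output starts each line with a metric name. The function names and the inventory path in the usage comment are hypothetical.

```python
# Minimal sketch: collect metric names for one inventory object via govc.
# Assumes govc is installed and configured through GOVC_* environment variables.
import subprocess


def parse_metric_names(output: str) -> list[str]:
    """Extract metric names (group.name.rollup) from govc metric.ls output."""
    names = set()
    for line in output.splitlines():
        fields = line.split()
        # A metric label has at least two dots, e.g. cpu.usage.average.
        if fields and fields[0].count(".") >= 2:
            names.add(fields[0])
    return sorted(names)


def list_metrics(inventory_path: str) -> list[str]:
    """Run `govc metric.ls <path>` and return the parsed metric names."""
    result = subprocess.run(
        ["govc", "metric.ls", inventory_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if govc exits non-zero
    )
    return parse_metric_names(result.stdout)


# Usage (hypothetical inventory path, requires a reachable vCenter):
#   for name in list_metrics("/MyDatacenter/vm/my-vm"):
#       print(name)
```

Running it against a VM, a host, a cluster, a datacenter, and a datastore in turn should yield one list per resource type, comparable to the lists in this post.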
cpu.readiness.average
cpu.maxlimited.summation
cpu.wait.summation
cpu.idle.summation
cpu.ready.summation
cpu.costop.summation
cpu.usagemhz.average
cpu.run.summation
cpu.used.summation
cpu.usage.vcpus.average
cpu.overlap.summation
cpu.entitlement.latest
cpu.swapwait.summation
cpu.system.summation
cpu.latency.average
cpu.demand.average
cpu.demandEntitlementRatio.latest
cpu.usage.average
datastore.totalWriteLatency.average
datastore.write.average
datastore.numberReadAveraged.average
datastore.totalReadLatency.average
datastore.read.average
datastore.maxTotalLatency.latest
datastore.numberWriteAveraged.average
disk.numberWriteAveraged.average
disk.write.average
disk.numberRead.summation
disk.commands.summation
disk.busResets.summation
disk.usage.average
disk.commandsAborted.summation
disk.read.average
disk.commandsAveraged.average
disk.maxTotalLatency.latest
disk.numberWrite.summation
disk.numberReadAveraged.average
mem.usage.average
mem.active.average
mem.zipSaved.latest
mem.zero.average
mem.decompressionRate.average
mem.swapped.average
mem.granted.average
mem.latency.average
mem.llSwapOutRate.average
mem.overheadMax.average
mem.overheadTouched.average
mem.vmmemctl.average
mem.llSwapUsed.average
mem.consumed.average
mem.overhead.average
mem.swapin.average
mem.shared.average
mem.compressionRate.average
mem.swaptarget.average
mem.activewrite.average
mem.vmmemctltarget.average
mem.llSwapInRate.average
mem.zipped.latest
mem.swapout.average
mem.swapoutRate.average
mem.entitlement.average
mem.swapinRate.average
mem.compressed.average
net.bytesRx.average
net.droppedTx.summation
net.bytesTx.average
net.packetsTx.summation
net.received.average
net.packetsRx.summation
net.broadcastRx.summation
net.usage.average
net.transmitted.average
net.pnicBytesRx.average
net.multicastRx.summation
net.multicastTx.summation
net.droppedRx.summation
net.pnicBytesTx.average
net.broadcastTx.summation
power.capacity.usageStatic.average
power.energy.summation
power.power.average
rescpu.maxLimited5.latest
rescpu.actpk15.latest
rescpu.maxLimited15.latest
rescpu.runav1.latest
rescpu.maxLimited1.latest
rescpu.runpk5.latest
rescpu.runpk15.latest
rescpu.actpk5.latest
rescpu.runav5.latest
rescpu.runpk1.latest
rescpu.actpk1.latest
rescpu.actav1.latest
rescpu.actav15.latest
rescpu.samplePeriod.latest
rescpu.actav5.latest
rescpu.runav15.latest
rescpu.sampleCount.latest
sys.heartbeat.latest
sys.osUptime.latest
sys.uptime.latest
virtualDisk.totalWriteLatency.average
virtualDisk.numberWriteAveraged.average
virtualDisk.readLoadMetric.latest
virtualDisk.numberReadAveraged.average
virtualDisk.writeLoadMetric.latest
virtualDisk.write.average
virtualDisk.readIOSize.latest
virtualDisk.read.average
virtualDisk.smallSeeks.latest
virtualDisk.readLatencyUS.latest
virtualDisk.mediumSeeks.latest
virtualDisk.writeLatencyUS.latest
virtualDisk.totalReadLatency.average
virtualDisk.readOIO.latest
virtualDisk.writeIOSize.latest
virtualDisk.writeOIO.latest
virtualDisk.largeSeeks.latest
cpu.used.summation
cpu.coreUtilization.average
cpu.utilization.average
cpu.usage.average
cpu.idle.summation
cpu.usagemhz.average
cpu.readiness.average
cpu.costop.summation
cpu.latency.average
cpu.totalCapacity.average
cpu.reservedCapacity.average
cpu.swapwait.summation
cpu.ready.summation
cpu.demand.average
cpu.wait.summation
datastore.datastoreReadOIO.latest
datastore.datastoreWriteLoadMetric.latest
datastore.numberWriteAveraged.average
datastore.unmapIOs.summation
datastore.datastoreReadBytes.latest
datastore.datastoreReadLoadMetric.latest
datastore.siocActiveTimePercentage.average
datastore.datastoreNormalReadLatency.latest
datastore.datastoreWriteBytes.latest
datastore.datastoreIops.average
datastore.totalReadLatency.average
datastore.numberReadAveraged.average
datastore.datastoreReadIops.latest
datastore.datastoreVMObservedLatency.latest
datastore.write.average
datastore.datastoreWriteIops.latest
datastore.unmapSize.summation
datastore.read.average
datastore.totalWriteLatency.average
datastore.sizeNormalizedDatastoreLatency.average
datastore.maxTotalLatency.latest
datastore.datastoreWriteOIO.latest
datastore.datastoreNormalWriteLatency.latest
datastore.datastoreMaxQueueDepth.latest
disk.kernelLatency.average
disk.queueWriteLatency.average
disk.commandsAborted.summation
disk.numberReadAveraged.average
disk.usage.average
disk.numberWriteAveraged.average
disk.deviceReadLatency.average
disk.totalReadLatency.average
disk.numberWrite.summation
disk.busResets.summation
disk.kernelReadLatency.average
disk.queueLatency.average
disk.commandsAveraged.average
disk.maxTotalLatency.latest
disk.write.average
disk.commands.summation
disk.maxQueueDepth.average
disk.numberRead.summation
disk.read.average
disk.deviceWriteLatency.average
disk.totalWriteLatency.average
disk.kernelWriteLatency.average
disk.deviceLatency.average
disk.queueReadLatency.average
disk.totalLatency.average
hbr.hbrNumVms.average
hbr.hbrNetRx.average
hbr.hbrNetTx.average
hbr.hbrDiskReadLatency.average
hbr.hbrDiskStallLatency.average
mem.llSwapOutRate.average
mem.bandwidth.read.latest
mem.vmfs.pbc.capMissRatio.latest
mem.zero.average
mem.latency.average
mem.vmfs.pbc.overhead.latest
mem.compressed.average
mem.latency.read.latest
mem.swapinRate.average
mem.reservedCapacity.average
mem.active.average
mem.swapoutRate.average
mem.latency.write.latest
mem.usage.average
mem.vmfs.pbc.size.latest
mem.lowfreethreshold.average
mem.vmmemctl.average
mem.vmfs.pbc.workingSetMax.latest
mem.vmfs.pbc.workingSet.latest
mem.vmfs.pbc.sizeMax.latest
mem.state.latest
mem.heap.average
mem.sharedcommon.average
mem.shared.average
mem.llSwapInRate.average
mem.llSwapUsed.average
mem.swapout.average
mem.swapused.average
mem.overhead.average
mem.bandwidth.total.latest
mem.activewrite.average
mem.llSwapOut.average
mem.consumed.average
mem.totalCapacity.average
mem.bandwidth.write.latest
mem.heapfree.average
mem.compressionRate.average
mem.sysUsage.average
mem.swapin.average
mem.decompressionRate.average
mem.unreserved.average
mem.llSwapIn.average
mem.granted.average
net.bytesTx.average
net.droppedRx.summation
net.unknownProtos.summation
net.received.average
net.packetsRx.summation
net.transmitted.average
net.usage.average
net.errorsTx.summation
net.broadcastRx.summation
net.multicastRx.summation
net.droppedTx.summation
net.multicastTx.summation
net.packetsTx.summation
net.broadcastTx.summation
net.bytesRx.average
net.errorsRx.summation
power.power.average
power.capacity.usageVm.average
power.capacity.usageIdle.average
power.capacity.usageSystem.average
power.energy.summation
power.powerCap.average
rescpu.actpk1.latest
rescpu.samplePeriod.latest
rescpu.runpk5.latest
rescpu.sampleCount.latest
rescpu.maxLimited15.latest
rescpu.runav5.latest
rescpu.actav5.latest
rescpu.actav1.latest
rescpu.runpk15.latest
rescpu.actav15.latest
rescpu.runpk1.latest
rescpu.runav1.latest
rescpu.runav15.latest
rescpu.actpk15.latest
rescpu.maxLimited5.latest
rescpu.actpk5.latest
rescpu.maxLimited1.latest
storageAdapter.commandsAveraged.average
storageAdapter.read.average
storageAdapter.totalWriteLatency.average
storageAdapter.numberReadAveraged.average
storageAdapter.write.average
storageAdapter.totalReadLatency.average
storageAdapter.numberWriteAveraged.average
storageAdapter.maxTotalLatency.latest
storagePath.totalWriteLatency.average
storagePath.numberWriteAveraged.average
storagePath.numberReadAveraged.average
storagePath.read.average
storagePath.totalReadLatency.average
storagePath.write.average
storagePath.commandsAveraged.average
storagePath.maxTotalLatency.latest
sys.resourceMemMapped.latest
sys.resourceMemOverhead.latest
sys.resourceMemSwapped.latest
sys.resourceMemConsumed.latest
sys.resourceMemShared.latest
sys.resourceMemZero.latest
sys.resourceCpuUsage.average
sys.resourceMemTouched.latest
sys.resourceFdUsage.latest
sys.resourceCpuMaxLimited5.latest
sys.resourceMemCow.latest
sys.resourceCpuMaxLimited1.latest
sys.resourceMemAllocShares.latest
sys.resourceCpuRun5.latest
sys.resourceCpuRun1.latest
sys.resourceMemAllocMax.latest
sys.resourceCpuAct5.latest
sys.resourceCpuAct1.latest
sys.resourceCpuAllocShares.latest
sys.resourceCpuAllocMax.latest
sys.resourceMemAllocMin.latest
sys.resourceCpuAllocMin.latest
sys.uptime.latest
clusterServices.failover.latest
cpu.usagemhz.average
cpu.usage.average
gpu.power.used.latest
gpu.utilization.average
gpu.mem.used.average
gpu.mem.reserved.latest
gpu.temperature.average
gpu.mem.total.latest
mem.consumed.average
mem.vmmemctl.average
mem.overhead.average
mem.usage.average
vmop.numReset.latest
vmop.numRegister.latest
vmop.numSuspend.latest
vmop.numPoweron.latest
vmop.numRebootGuest.latest
vmop.numStandbyGuest.latest
vmop.numShutdownGuest.latest
vmop.numCreate.latest
vmop.numDestroy.latest
vmop.numPoweroff.latest
vmop.numUnregister.latest
vmop.numReconfigure.latest
vmop.numClone.latest
vmop.numDeploy.latest
vmop.numChangeHost.latest
vmop.numChangeDS.latest
vmop.numChangeHostDS.latest
vmop.numVMotion.latest
vmop.numSVMotion.latest
vmop.numXVMotion.latest
vmop.numPoweron.latest
vmop.numPoweroff.latest
vmop.numSuspend.latest
vmop.numReset.latest
vmop.numRebootGuest.latest
vmop.numStandbyGuest.latest
vmop.numShutdownGuest.latest
vmop.numCreate.latest
vmop.numDestroy.latest
vmop.numRegister.latest
vmop.numUnregister.latest
vmop.numReconfigure.latest
vmop.numClone.latest
vmop.numDeploy.latest
vmop.numChangeHost.latest
vmop.numChangeDS.latest
vmop.numChangeHostDS.latest
vmop.numVMotion.latest
vmop.numSVMotion.latest
vmop.numXVMotion.latest
datastore.numberReadAveraged.average
datastore.numberWriteAveraged.average
disk.used.latest
disk.provisioned.latest
disk.capacity.latest
disk.numberReadAveraged.average
disk.numberWriteAveraged.average
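It turns out that vCenter exposes far more metric data through its public API than I expected.
Put to good use, this data should let you build a monitoring dashboard whose granularity comes close to that of Aria Operations, even if it does not quite match it.
One disappointment is that vCenter has a chart showing the DRS score of individual VMs, but that score is not exposed through the public API. Monitoring DRS scores without Aria Ops will probably require a different approach.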