Data Prepper is an open-source server-side data collector.
It aims to support end-to-end lifecycle analysis, from raw log collection through interactive ad-hoc analyses.
Judging by its star count, it is still an early-stage project within OpenSearch (launched June 2021).
It is developed by the OpenSearch Project (Apache License 2.0).
OpenSearch (https://opensearch.org/)
Copyright OpenSearch Contributors
This product includes software developed by
Elasticsearch (http://www.elastic.co).
Copyright 2009-2018 Elasticsearch
This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).
This product includes software developed by
Joda.org (http://www.joda.org/).
A pipeline is divided into four core components: Source, Buffer, Processor, and Sink.
3-1 Minimal component configuration
sample-pipeline:
  source:
    file:
      path: path/to/input-file
  sink:
    - file:
        path: path/to/output-file
3-2 All components configuration
sample-pipeline:
  workers: 4 # number of workers
  delay: 100 # in milliseconds, how often the workers should run
  source:
    file:
      path: path/to/input-file
  buffer:
    bounded_blocking:
      buffer_size: 1024 # max number of records the buffer will accept
      batch_size: 256 # max number of records the buffer will drain for each read
  processor:
    - string_converter:
        upper_case: true
  sink:
    - file:
        path: path/to/output-file
3-3 Input - output 1 / output 2 configuration
input-pipeline:
  source:
    file:
      path: path/to/input-file
  sink:
    - pipeline:
        name: "output-pipeline-1"
    - pipeline:
        name: "output-pipeline-2"
output-pipeline-1:
  source:
    pipeline:
      name: "input-pipeline"
  processor:
    - string_converter:
        upper_case: true
  sink:
    - file:
        path: path/to/output-1-file
output-pipeline-2:
  source:
    pipeline:
      name: "input-pipeline"
  processor:
    - string_converter:
        upper_case: false
  sink:
    - file:
        path: path/to/output-2-file
3-4. Configuration using conditional routing
route:
  - application-logs: '/log_type == "application"'
  - http-logs: '/log_type == "apache"'
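The route definitions above only name the conditions; they take effect once a sink refers to them. The following is a minimal sketch of a complete pipeline, assuming events arrive as JSON with a log_type field. The pipeline name routing-pipeline and the output paths are hypothetical, and the per-sink routes list reflects my understanding of Data Prepper's conditional routing feature, so check it against the version in use.

```yaml
# Hypothetical pipeline name and paths, for illustration only.
routing-pipeline:
  source:
    http:                      # receives JSON events that carry a log_type field
  route:
    - application-logs: '/log_type == "application"'
    - http-logs: '/log_type == "apache"'
  sink:
    - file:
        path: path/to/application-logs-file
        routes: [application-logs]   # only events matching this route
    - file:
        path: path/to/http-logs-file
        routes: [http-logs]          # only events matching this route
    - file:
        path: path/to/all-logs-file  # no routes listed: assumed to receive every event
```

Under these assumptions, an event with log_type "apache" would be written to the http-logs file and to the all-logs file, but not to the application-logs file.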
4-1. Example YAML file
Data-prepper-fluentbit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-data-prepper.conf
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*my-app*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
  filter-kubernetes.conf: |
    [FILTER]
        Name                 kubernetes
        Match                kube.*
        Kube_URL             https://kubernetes.default.svc:443
        Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix      kube.var.log.containers.
        Merge_Log            On
        Merge_Log_Key        log_processed
        K8S-Logging.Parser   On
        K8S-Logging.Exclude  Off
  output-data-prepper.conf: |
    [OUTPUT]
        Name    http
        Match   kube.*
        Host    host.docker.internal
        Port    2021
        Format  json
        URI     /log/ingest
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On
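The [OUTPUT] section above posts JSON to port 2021 at /log/ingest, so Data Prepper needs a pipeline listening there. The following is a minimal sketch of that receiving side, assuming the http source is used and that /log/ingest is its default path. The pipeline name log-pipeline is hypothetical, and the stdout sink is only there to confirm that records arrive.

```yaml
# Hypothetical pipeline name; stdout sink used only to inspect incoming records.
log-pipeline:
  source:
    http:
      port: 2021   # must match Port in output-data-prepper.conf
  sink:
    - stdout:
```

Once records show up on standard output, the stdout sink can be swapped for a real destination.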