[Collecting Server Logs with EFK] Parsing and Shipping Logs with Fluentd

Hyunjun Kim · May 27, 2025

Data_Engineering

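3. Parsing and Shipping Logs with Fluentd

3.1 Installing Fluentd

Before installing, adjust the system settings as described in https://docs.fluentd.org/installation/before-install. Depending on your current values, a reboot may be required.

3.1.1 Installing directly on the server

Installation methods other than gem are also available; see the installation manual.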

sudo apt update
sudo apt install build-essential -y

Install Ruby gems

sudo apt install ruby-rubygems -y
sudo apt install ruby-dev -y

sudo gem install fluentd --no-doc

The install can fail because of the Ruby version. If it does, install the additional packages named in the error output and try again.

gem install yajl-ruby -v 1.4.1

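Installing in a Docker container (not covered in the lecture)

When the workload runs in Docker containers, one advantage of Fluentd is that no separate agent has to be installed in each container: container logs can be sent through Docker's fluentd logging driver.

https://docs.docker.com/config/containers/logging/fluentd/

syslog, described earlier, is also available as a logging driver.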
https://docs.docker.com/config/containers/logging/syslog/

3.1.2 Setting up the fluentd directory

fluentd --setup ./fluent

3.1.3 Testing fluentd

fluentd -c ./fluent/fluent.conf -vv &

echo '{"json":"message"}' | fluent-cat debug.test

3.1.4 Stopping the process

pkill -f fluentd

3.2 Log generator for the exercises

3.2.1 Installing the log generator

mkdir loggen && cd loggen
wget https://github.com/mingrammer/flog/releases/download/v0.4.3/flog_0.4.3_linux_amd64.tar.gz

tar -xvf flog_0.4.3_linux_amd64.tar.gz

./flog --help

3.2.2 Generating logs

JSON version

./flog -f json -t log -s 1m -n 1000 -o $filename -w &

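Replace $filename with the file name you want. The source configs in 3.3.2 read json-*.log and apache-*.log.
The -s option sets the time interval between the timestamps of consecutive log lines, so logs covering a span of time are generated in one run.

Apache version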

./flog -f apache_common -t log -s 1m -n 1000 -o $filename -w &

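For the later exercises, copy the generated files so that there are three data files of each kind, each with different dates.

3.3 Reading log files with Fluentd and shipping them

Complete 4.1 Installing OpenSearch before starting this section.

3.3.1 Configuration directives

Config file syntax manual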

source : directives determine the input sources
match : directives determine the output destinations
filter : directives determine the event processing pipelines
system : directives set system-wide configuration
label : directives group the output and filter for internal routing
worker : directives limit to the specific workers
@include : directives include other files
- e.g., include a conf.d directory and keep each log pipeline's configuration as a separate file under conf.d.

With Fluentd you need to read the manual carefully. Configuration syntax can differ between versions, and most functionality beyond the basics is provided by plugins, so you also have to read each plugin's manual in detail to get the behavior you want. Skimming the website is usually not enough. The recommended approach is to define your requirements first, work out which plugins, which key-value settings, and which ordering satisfy them, and only then write the Fluentd configuration file.

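3.3.2 Specifying the input

Use the log files generated in 3.2 as the input. They are read with the tail plugin; the input plugin manual lists the other plugins that can serve as an input.

In the source block, @type selects which plugin to use.

The settings that follow depend on the plugin, but tag plays the same role across sources.

tag : other directives (filter, match) attach to this source by matching its tag, so choose tag names carefully.

pos_file : the file in which Fluentd records how far it has read in each log file.

read_from_head : set to true for this exercise, because the same files are read repeatedly.
If it is false, Fluentd starts at the tail of the file and reads only newly appended lines.

follow_inodes : an inode is a unique value that identifies a file in the file system, and it does not change when the file is renamed, for example during log rotation. With follow_inodes true, tail tracks files by inode.

parse : once Fluentd knows what to read, the parse section defines how each line is interpreted.
parse manual

The timestamp is the main reference point in logging, so the time settings are specified explicitly, as shown below.

JSON version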

<source>
	@type tail
	tag log.json.*
	path /home/ubuntu/loggen/json-*.log
	pos_file positions-json.pos
	read_from_head true
	follow_inodes true

	<parse>
		@type json
		time_key datetime
		time_type string
		time_format %d/%b/%Y:%H:%M:%S %z
	</parse>
</source>
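To see what this parse section asks Fluentd to do, the following Python sketch applies the same steps to one line by hand: decode the JSON, take out the datetime field, and parse it with the same strptime-style format. The sample line is made up to resemble the generator's JSON output, and parse_json_line is a hypothetical helper for illustration, not part of Fluentd.

```python
import json
from datetime import datetime

# Same strptime-style pattern as time_format in the <parse> section.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_json_line(line, time_key="datetime"):
    """Return (event_time, record) for one JSON log line.

    The time field is removed from the record and becomes the event time,
    which is the idea behind time_key in the json parser.
    """
    record = json.loads(line)
    event_time = datetime.strptime(record.pop(time_key), TIME_FORMAT)
    return event_time, record


# Illustrative line (assumed shape, not real generator output).
sample = (
    '{"host": "10.0.0.1", "datetime": "27/May/2025:10:15:30 +0000", '
    '"method": "GET", "status": 200}'
)
event_time, record = parse_json_line(sample)
print(event_time.isoformat())
print(record)
```

If strptime raises ValueError here, the time_format does not fit the data, which is the same class of mistake that makes Fluentd fail to parse the time.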

Regex version

<source>
	@type tail
	tag log.apache.*
	path /home/ubuntu/loggen/apache-*.log
	pos_file positions-apache.pos
	read_from_head true
	follow_inodes true
	
	<parse>
		@type regexp
		expression /^(?<client>\S+) \S+ (?<userid>\S+) \[(?<datetime>[^\]]+)\] "(?<method>[A-Z]+) (?<request>[^ "]+)? (?<protocol>HTTP\/[0
		time_key datetime
		time_format %d/%b/%Y:%H:%M:%S %z
	</parse>
</source>
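A regular expression is easier to debug outside Fluentd first. Below is a minimal Python sketch that matches one Apache common log line with named groups and parses its timestamp. The pattern is an illustrative one written for this example, not the expression from the config above, and the sample line is invented. Note that Python spells named groups (?P<name>...) where Ruby, and therefore Fluentd, uses (?<name>...).

```python
import re
from datetime import datetime

# Illustrative pattern for Apache common log format (an assumption for this
# sketch; adapt it to the real log lines).
APACHE_COMMON = re.compile(
    r'^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<request>[^ "]+) (?P<protocol>HTTP/[0-9.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_line(line):
    """Return (event_time, record), or None when the line does not match."""
    m = APACHE_COMMON.match(line)
    if m is None:
        return None
    record = m.groupdict()
    event_time = datetime.strptime(record.pop("datetime"), TIME_FORMAT)
    return event_time, record


sample = (
    '192.168.0.7 - alice [27/May/2025:10:15:30 +0900] '
    '"GET /index.html HTTP/1.1" 200 5120'
)
parsed = parse_apache_line(sample)
unparsed = parse_apache_line("not an access log line")
print(parsed)
print(unparsed)
```

Checking a handful of real lines this way before editing the Fluentd config shortens the trial-and-error loop considerably.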

Print the events to stdout to confirm that the logs are read and parsed correctly.

<match log.json.**>
	@type stdout
</match>
<match log.apache.**>
	@type stdout
</match>

❗ Caution: to match every tag under a prefix, use two asterisks (**). A single * matches exactly one tag part.

The argument next to match must be a tag pattern.
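The difference between * and ** matters here because the tail source expands the * in its tag into the file path, with / replaced by dots, so the resulting tag has several parts after the prefix. The sketch below is a simplified Python model of the two wildcards, not Fluentd's actual matcher; tag_matches is a hypothetical helper and the file name json-1.log is assumed for illustration.

```python
def tag_matches(pattern, tag):
    """Simplified model of Fluentd tag patterns.

    '*' matches exactly one tag part, '**' matches zero or more parts.
    Other pattern syntax (such as {a,b}) is not handled here.
    """
    def match(p, t):
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Try consuming 0, 1, 2, ... tag parts.
            return any(match(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        return (head == "*" or head == t[0]) and match(rest, t[1:])

    return match(pattern.split("."), tag.split("."))


# Tag produced for /home/ubuntu/loggen/json-1.log with "tag log.json.*".
tag = "log.json.home.ubuntu.loggen.json-1.log"
print(tag_matches("log.json.*", tag))   # False
print(tag_matches("log.json.**", tag))  # True
```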

fluentd -c ./fluent.conf -vv

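If the logs cannot be parsed or no match pattern applies to the tag, Fluentd logs a warning such as the following. In that case, check the parser plugin, the rule (regex), and the time-related fields and formats. Getting the parsing rules right is the hardest part.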

no patterns matched tag ..

3.3.3 Specifying the output

Send the logs to the OpenSearch server set up in 4.1.

First, install the plugin.
https://docs.fluentd.org/output/opensearch

sudo fluent-gem install fluent-plugin-opensearch

Start by sending dummy data to verify that delivery to OpenSearch and index creation both work.

<source>
	@type dummy
	tag dummy
	dummy {"hello":"world"}
</source>

<match dummy>
	@type opensearch
	host $your_opensearch_host
	port 9200
	index_name fluentd-test
</match>

fluentd -c ./fluent.conf -vv

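Use curl to check that the index was created. The URL is quoted so that the shell does not treat ? as a glob character.

curl -XGET "http://$your_opensearch_host:9200/_cat/indices?v"

Now send the logs to OpenSearch indices.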

<match log.apache.**>
	@type opensearch
	host $your_opensearch_host
	port 9200
	index_name apache-log
</match>

<match log.json.**>
	@type opensearch
	host $your_opensearch_host
	port 9200
	index_name json-log
</match>

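Each tag is sent to a different index.

3.3.4 Choosing the index by time format

Because OpenSearch index lifecycles are managed by time, each log can be sent to the index that corresponds to the time value it carries. Reading the manual before configuring this is strongly recommended.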

<match log.json.**>
	@type opensearch
	hosts $your_opensearch_host:9200
	logstash_format true
	logstash_prefix json-timelog
	include_timestamp true
	time_key datetime
	time_key_format %d/%b/%Y:%H:%M:%S %z
</match>

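The endpoint must be set with hosts.
When the endpoint is set with host and port, the settings are not applied (a bug).
This naming pattern was mainly used with Logstash, which is why it is enabled with logstash_format.
The prefix can be changed with logstash_prefix.
include_timestamp true maps time_key directly to @timestamp, the time field used in Kibana (OpenSearch Dashboards).
The format of time_key must be given so that it can be parsed.
Use curl to check that indices matching the times in the logs were created.

curl -XGET "http://$your_opensearch_host:9200/_cat/indices?v"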
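To predict which index names should show up in the listing, the naming rule can be reproduced by hand. The Python sketch below assumes the plugin's usual defaults (a - separator, a %Y.%m.%d date suffix, and the date taken in UTC); index_for is a hypothetical helper, so check the plugin manual if your settings differ.

```python
from datetime import datetime, timezone

# Same pattern as time_key_format in the match block.
TIME_KEY_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def index_for(record_time, prefix="json-timelog", date_format="%Y.%m.%d"):
    """Build a prefix-YYYY.MM.DD index name from a log's time string.

    Assumes the default '-' separator and a date taken in UTC.
    """
    ts = datetime.strptime(record_time, TIME_KEY_FORMAT)
    ts = ts.astimezone(timezone.utc)
    return f"{prefix}-{ts.strftime(date_format)}"


print(index_for("27/May/2025:10:15:30 +0000"))  # json-timelog-2025.05.27
print(index_for("28/May/2025:03:00:00 +0900"))  # json-timelog-2025.05.27
```

The second call shows why a log stamped on the 28th in a +0900 zone can land in the index for the 27th when dates are taken in UTC.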

3.3.5 Filtering

A filter applies conditions to the events whose tag it matches.
Use @type grep to set the conditions. Only events for which the result is true move on to the next stage. See the manual.

<filter log.**>
	@type grep
	<exclude>
		key status
		pattern /^[2][0-9][0-9]/
	</exclude>
#	<regexp>
#		key status
#		pattern /^[1345][01235][0-9]/
#	</regexp>
</filter>

Because the regex has no inverted operation, use exclude to get the opposite of a condition (filterNot).
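The keep-or-drop rule can be modeled in a few lines of Python. This is a simplified sketch, not the plugin's implementation: grep_filter is a hypothetical helper, the records are invented, and the rule assumed is that a record is kept when every regexp rule matches and no exclude rule matches.

```python
import re


def rule_hits(record, rule):
    """True when the record's value for the rule's key matches its pattern."""
    key, pattern = rule
    return re.search(pattern, str(record.get(key, ""))) is not None


def grep_filter(records, regexp=(), exclude=()):
    """Keep records where all regexp rules match and no exclude rule matches."""
    return [
        rec for rec in records
        if all(rule_hits(rec, r) for r in regexp)
        and not any(rule_hits(rec, r) for r in exclude)
    ]


records = [{"status": 200}, {"status": 301}, {"status": 404}, {"status": 500}]

# <exclude> with /^[2][0-9][0-9]/ drops the 2xx responses.
by_exclude = grep_filter(records, exclude=[("status", r"^[2][0-9][0-9]")])
# The commented-out <regexp> rule keeps the same records for this sample.
by_regexp = grep_filter(records, regexp=[("status", r"^[1345][01235][0-9]")])
print(by_exclude)
print(by_regexp)
```

The exclude form is the safer of the two: the positive pattern has to enumerate every status it wants to keep, while exclude only names what should be dropped.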

Use @type record_transformer to add or remove fields. See the manual.

<filter log.**>
	@type record_transformer
	<record>
		sent_by fluentd
		ftag ${tag}
	</record>
	remove_keys host
</filter>

<record> adds the fields you want. Predefined variables such as ${tag} can be used in the values.
remove_keys deletes fields by name.
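As a rough picture of the effect on a single event, here is a simplified Python model of the filter above. record_transformer below is a hypothetical function that only handles the ${tag} placeholder, and the input record and tag are invented for illustration.

```python
def record_transformer(record, tag, add, remove_keys=()):
    """Simplified model: add fields, expanding ${tag}, then drop remove_keys."""
    out = dict(record)  # leave the input record untouched
    for key, value in add.items():
        out[key] = value.replace("${tag}", tag)
    for key in remove_keys:
        out.pop(key, None)
    return out


before = {"host": "10.0.0.1", "status": 200}
after = record_transformer(
    before,
    tag="log.json.sample",
    add={"sent_by": "fluentd", "ftag": "${tag}"},
    remove_keys=["host"],
)
print(after)
```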

❗ Filter order matters. A filter placed after a match is not applied to that match. Filters are also applied in the order in which they are declared.
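The ordering rule can be made concrete with a toy router. The Python sketch below walks a list of directives from top to bottom and stops at the first match; it is a simplified model in which plain prefix comparison stands in for real tag patterns, and route and add_sender are hypothetical names.

```python
def route(directives, tag, record):
    """Walk directives top to bottom, as a simplified model of routing.

    directives: list of ("filter", prefix, fn) or ("match", prefix, name).
    A filter fn returns a new record, or None to drop the event.
    Returns (output_name, record), or (None, None) if dropped or unmatched.
    """
    for kind, prefix, target in directives:
        if not tag.startswith(prefix):
            continue
        if kind == "filter":
            record = target(record)
            if record is None:
                return None, None
        else:
            # The first matching <match> consumes the event.
            return target, record
    return None, None


def add_sender(record):
    return {**record, "sent_by": "fluentd"}


filter_first = [("filter", "log.", add_sender), ("match", "log.", "opensearch")]
match_first = [("match", "log.", "opensearch"), ("filter", "log.", add_sender)]

print(route(filter_first, "log.json.a", {"status": 500}))
print(route(match_first, "log.json.a", {"status": 500}))
```

In the second layout the event reaches the output without the sent_by field, because the filter is declared after the match that consumed the event.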
