ETL (1)

์žญ์žญ์ดยท2021๋…„ 5์›” 10์ผ

ETL

๋ชฉ๋ก ๋ณด๊ธฐ
1/3
post-thumbnail

ETL (1)

๐ŸŽ Contents

0. Summary

ETL์ด๋ž€ ์ถ”์ถœ(Extraction), ๋ณ€ํ™˜(Transformation), ์ ์žฌ(Load)์˜ ์•ฝ์–ด์ด๋‹ค.
Database์— ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ถ”์ถœํ•˜์—ฌ, ์›ํ•˜๋Š” ํ˜•ํƒœ๋กœ ๋ณ€ํ™˜ํ•˜๊ณ  ๋‹ค๋ฅธ Database์— ์ ์žฌํ•˜๋Š” ์ž‘์—…์ด๋‹ค.

๋ฐ์ดํ„ฐ๋ถ„์„์—์„œ๋Š” ๋น ์ง€์ง€์•Š๋Š” ์ค‘์š”ํ•œ ๊ฐœ๋…์ด๋‹ค.

๋ณธ ์‹œ๋ฆฌ์ฆˆ์—์„œ๋Š” ETL์˜ ์ „์ฒด๋ฅผ ๋‹ค๋ฃจ๋ฉฐ ๋‹ค์Œ์„ ์‚ฌ์šฉํ•œ๋‹ค.

  • Database : MariaDB
  • Batch : Spring Batch
  • Scheduler : Jenkins

MariaDB์™€ Jenkins๋Š” docker๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์„œ๋น„์Šคํ•œ๋‹ค.

๋จผ์ € ETL์„ ์œ„ํ•œ ๊ธฐ๋ณธํ™˜๊ฒฝ๊ตฌ์„ฑ์„ ํ•œ๋‹ค.

1. Launch databases

mariadb container๋ฅผ ์ด 2๊ฐœ๋ฅผ ๋„์šด๋‹ค.
db-source์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ€ํ™˜ํ•˜์—ฌ db-target์— ์ ์žฌํ•  ๊ฒƒ์ด๋‹ค.

MariaDB๋Š” sudo su๋ฅผ ํ†ตํ•ด ์ ‘์†ํ•˜๊ฑฐ๋‚˜ mysql -u root -p๋กœ ๋ฃจํŠธ ํŒจ์Šค์›Œ๋“œ๋ฅผ ์ž…๋ ฅํ•˜๊ณ  ์ ‘์†ํ•ด์•ผํ•œ๋‹ค.
container๋กœ DB๋ฅผ ์˜ฌ๋ฆฌ๋ฉด MYSQL_ROOT_PASSWORD๋ผ๋Š” ํ™˜๊ฒฝ๋ณ€์ˆ˜๋ฅผ ์„ค์ •ํ•˜๊ณ  ์ด๋ฅผ ๋กœ๊ทธ์ธํ• ๋•Œ ์ž…๋ ฅํ•ด์•ผ ํ•œ๋‹ค.

์ดˆ๊ธฐ ๊ตฌ๋™ ํ›„ ํ•ด๋‹น ํ™˜๊ฒฝ๋ณ€์ˆ˜๋ฅผ ์ง€์šด ํ›„ ์žฌ๊ธฐ๋™ํ•œ๋‹ค.

1.1. Create docker-compose.yaml

mkdir -p ~/etl-sample
touch ~/etl-sample/docker-compose.yaml
  • docker-compose.yaml
version: '3.1'

services:
  db-source:
    container_name: db-source
    image: mariadb:latest
    restart: always
    ports:
      - 3306:3306
    volumes:
      - /data/db-source/data:/var/lib/mysql
      - /data/db-source/config:/etc/mysql/conf.d
    environment:
      TZ: Asia/Seoul
      # mysql์„ ์ฒ˜์Œ ์„œ๋น„์Šคํ• ๋•Œ์— root password๊ฐ€ ํ•„์š”ํ•˜๋‹ค.
      # ์•„๋ž˜์˜ ํ™˜๊ฒฝ๋ณ€์ˆ˜๋ฅผ ํฌํ•จํ•˜์—ฌ ์˜ฌ๋ฆฐ ํ›„ ์ง€์›Œ์ค€๋‹ค.
      # MYSQL_ROOT_PASSWORD: ${PASSWORD}
  db-target:
    container_name: db-target
    image: mariadb:latest
    restart: always
    ports:
      - 3307:3306
    volumes:
      - /data/db-target/data:/var/lib/mysql
      - /data/db-target/config:/etc/mysql/conf.d
    environment:
      TZ: Asia/Seoul
      # MYSQL_ROOT_PASSWORD: ${PASSWORD}
cd ~/etl-sample
# ์ดˆ๊ธฐ ๊ตฌ๋™
docker-compose up -d

# ์žฌ์‹œ์ž‘
## docker-compose downd์€ docker rm์„ ํฌํ•จํ•˜๊ณ  ์žˆ๊ธฐ์— volume์„ค์ •์„ ํ™•์ธํ•˜์ž.
docker-compose down
docker-compose up -d

1.2. db-source: Create table & Insert sample data

db-source์— ์ƒ˜ํ”Œ๋ฐ์ดํ„ฐ๋ฅผ ๋„ฃ๋Š” ์ž‘์—…์ด๋‹ค.

docker exec -it db-source bash

mysql -u root -p
# ํŒจ์Šค์›Œ๋“œ ์ž…๋ ฅ
CREATE DATABASE etl;
USE etl;

CREATE TABLE product  (
  id         bigint not null auto_increment,
  name     varchar(255),
  price     bigint,
  created datetime,
  primary key (id)
) engine = InnoDB;

INSERT INTO product (name, price, created) VALUES ('product1', 1000, '2021-05-04 00:00:00');
INSERT INTO product (name, price, created) VALUES ('product2', 2000, '2021-05-04 00:00:00');
INSERT INTO product (name, price, created) VALUES ('product3', 3000, '2021-05-04 00:00:00');
INSERT INTO product (name, price, created) VALUES ('product4', 4000, '2021-05-04 00:00:00');

1.3. db-source: Create Metadata table

Spring batch๊ฐ€ ์ž‘๋™ํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ํ…Œ์ด๋ธ”๋“ค์ด ์ •ํ•ด์ ธ์žˆ๋‹ค.
์ด๋ฅผ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ ํ…Œ์ด๋ธ”์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.
H2์™€ ๊ฐ™์€ ์ธ๋ฉ”๋ชจ๋ฆฌ DB๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด Spring-boot์—์„œ ์ž๋™์œผ๋กœ ๋งŒ๋“ค์–ด์ฃผ์ง€๋งŒ, MySQL๊ณผ ๊ฐ™์ด ๋ณ„๊ฐœ์˜ DB๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ์ˆ˜๋™์œผ๋กœ ๋งŒ๋“ค์–ด์•ผ ํ•œ๋‹ค.

์‚ฌ์šฉํ•˜๋Š” Database๋ณ„๋กœ ์Šคํ‚ค๋งˆ๊ฐ€ ์ •ํ•ด์ ธ ์žˆ๋‹ค. ์Šคํ‚ค๋งˆ ๋งํฌ

๋ณธ ๋ฌธ์„œ์—์„œ๋Š” MySQL์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ schema-mysql.sql์„ ์‚ฌ์šฉํ•œ๋‹ค.

docker exec -it db-source bash

mysql -u root -p
# Enter password
use etl;
<!-- Paste schema-mysql.sql -->
show tables;

1.4. db-source: Set the authority

etl ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์˜ ์‚ฌ์šฉ์ž๋ฅผ ๋งŒ๋“ค๊ณ  ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•œ๋‹ค.

docker exec -it db-source bash

mysql -u root -p
# ํŒจ์Šค์›Œ๋“œ ์ž…๋ ฅ

# $USER์—๊ฒŒ etl db์˜ ์ ‘๊ทผ ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•œ๋‹ค. 
# ๋งŒ์ผ, $USER๊ฐ€ ์—†๋‹ค๋ฉด ์ƒˆ๋กœ ์ƒ์„ฑํ•œ๋‹ค.
# grant all privileges on $DATABASE.* to $USER@'%' identified by '$PASSWORD';
grant all privileges on etl.* to rivernine@'%' identified by 'rivernine';
exit

# ํ™•์ธ
mysql -u rivernine -p
# rivernine
show databases;

1.5. db-target: Create table

CREATE DATABASE etl;
USE etl;

CREATE TABLE trans_product  (
  id         bigint not null auto_increment,
  name     varchar(255),
  price     bigint,
  created datetime,
  discount boolean,
  primary key (id)
) engine = InnoDB;

grant all privileges on etl.* to rivernine@'%' identified by 'rivernine';

1.6. db-target: Set the authority

etl ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์˜ ์‚ฌ์šฉ์ž๋ฅผ ๋งŒ๋“ค๊ณ  ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•œ๋‹ค.

docker exec -it db-target bash

mysql -u root -p
# ํŒจ์Šค์›Œ๋“œ ์ž…๋ ฅ

grant all privileges on etl.* to rivernine@'%' identified by 'rivernine';
exit

# ํ™•์ธ
mysql -u rivernine -p
# rivernine
show databases;

2. Configure Spring-boot

ETL์„ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด Spring-boot ํ™˜๊ฒฝ์„ค์ •์„ ํ•œ๋‹ค.
์šฐ์„ , build.gradle, application.yaml์„ ์ˆ˜์ •ํ•˜๊ณ  DTO๋ฅผ ์ •์˜ํ•œ ํ›„ Job์„ ์ƒ์„ฑํ•œ๋‹ค.

๋ณด๋‹ค ์ž์„ธํ•œ ๋‚ด์šฉ์€ ์ด์ „ ๊ธ€์—์„œ ๋‹ค๋ฃจ์—ˆ๋‹ค.

2.1. Add dependencies

build.gradle์— ETL์— ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.

  • build.gradle
dependencies {
	implementation (
		'org.springframework.boot:spring-boot-starter-batch',			// Batch
		'org.springframework.boot:spring-boot-starter-jdbc',			// Jdbc
		'org.projectlombok:lombok',																// Lombok
		'mysql:mysql-connector-java',															// Mysql
	)
	annotationProcessor 'org.projectlombok:lombok'

	// test
	// testImplementation 'org.springframework.boot:spring-boot-starter-test'
	// testImplementation 'org.springframework.batch:spring-batch-test'
}

2.2. Create DTO

  • Product.java
@ToString
@Getter
@Setter
@NoArgsConstructor
@Entity
public class Product {
  private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd hh:mm:ss");

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;
  private String name;
  private Long price;
  private LocalDateTime created;

  public Product(String name, Long price, String created) {
    this.price = price;
    this.name = name;
    this.created = LocalDateTime.parse(created, FORMATTER);
  }

  public Product(Long id, String name, Long price, String created) {
    this.id = id;
    this.price = price;
    this.name = name;
    this.created = LocalDateTime.parse(created, FORMATTER);
  }
}
  • TransProduct.java
@ToString
@Getter
@Setter
@NoArgsConstructor
public class TransProduct {
  private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd hh:mm:ss");

  private Long id;
  private String name;
  private Long price;
  private LocalDateTime created;
  private boolean discount;

  public TransProduct(String name, Long price, String created, boolean discount) {
    this.price = price;
    this.name = name;
    this.created = LocalDateTime.parse(created, FORMATTER);
    this.discount = discount;
  }

  public TransProduct(String name, Long price, LocalDateTime created, boolean discount) {
    this.price = price;
    this.name = name;
    this.created = created;
    this.discount = discount;
  }

  public TransProduct(Long id, String name, Long price, String created, boolean discount) {
    this.id = id;
    this.price = price;
    this.name = name;
    this.created = LocalDateTime.parse(created, FORMATTER);
    this.discount = discount;
  }
}

2.3. Multiple Database access configuration

db-source์™€ db-target์— ๋™์‹œ ์ ‘๊ทผํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” datasource์ˆ˜์ •์ด ํ•„์š”ํ•˜๋‹ค.

2.3.1 Update application.yml

๋จผ์ €, application.yml์— db-source์™€ db-target์˜ datasource ์ •๋ณด๋ฅผ ์ž…๋ ฅํ•œ๋‹ค.

  • application.yml
spring:
  profiles:
    active: mysql
  batch.job.names: ${job.name:NONE}
---
spring:
  profiles: mysql
  datasource-dbsource:
    hikari:
      # MYSQL_ADDR:PORT/DATABASE
      jdbc-url: jdbc:mysql://192.168.56.103:3306/etl
      username: rivernine
      password: rivernine
      # mysql์šฉ jdbc๋“œ๋ผ์ด๋ฒ„
      driver-class-name: com.mysql.jdbc.Driver
  datasource-dbtarget:
    hikari:
      jdbc-url: jdbc:mysql://192.168.56.103:3307/etl
      username: rivernine
      password: rivernine      
      driver-class-name: com.mysql.jdbc.Driver

2.3.2. Create DataSourceConfiguration.java

config/DataSourceConfiguration.java๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์ˆ˜์ •ํ•œ๋‹ค.

  • DataSourceConfiguration.java
@Configuration
public class DataSourceConfiguration {
  @Bean(name = "dataSource-dbsource")
  @Primary
  @ConfigurationProperties(prefix="spring.datasource-dbsource.hikari")
  public DataSource dataSourceDbSource() {
    return DataSourceBuilder.create().build();
  }

  @Bean(name = "dataSource-dbtarget")
  @ConfigurationProperties(prefix="spring.datasource-dbtarget.hikari")
  public DataSource dataSourceDbTarget() {
    return DataSourceBuilder.create().build();
  }
}

2.3.3. How to use

์ด์ œ ์‚ฌ์šฉํ•˜๊ณ ์ž ํ•˜๋Š” ํด๋ž˜์Šค ๋‚ด๋ถ€์— ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ ์–ธํ•˜๊ณ  ์‚ฌ์šฉํ•œ๋‹ค.

@Qualifier("dataSource-dbsource")
@Autowired
private DataSource dataSourceDbSource;

@Qualifier("dataSource-target")
@Autowired
private DataSource dataSourceDbTarget;

๋ชจ๋“  ์†Œ์Šค๋Š” Github์— ์˜ฌ๋ ค๋†“์•˜๋‹ค.

0๊ฐœ์˜ ๋Œ“๊ธ€