Embulk is a tool from Treasure Data that does much the same job as Sqoop: it reads data from a variety of sources and loads it into a destination.
Fluentd and MessagePack are other well-known open-source projects that came out of the Treasure Data team.
Embulk is a parallel bulk data loader that helps transfer data between various storage systems, databases, NoSQL stores, and cloud services.
Embulk supports plugins that add functionality. Sharing plugins keeps custom scripts readable, maintainable, and reusable.
GitHub - embulk/embulk: Embulk: Pluggable Bulk Data Loader.

% brew tap AdoptOpenJDK/openjdk
% brew install --cask adoptopenjdk8
% java -version
% brew tap AdoptOpenJDK/openjdk
==> Tapping adoptopenjdk/openjdk
Cloning into '/usr/local/Homebrew/Library/Taps/adoptopenjdk/homebrew-openjdk'...
remote: Enumerating objects: 1996, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 1996 (delta 44), reused 49 (delta 38), pack-reused 1936
Receiving objects: 100% (1996/1996), 372.27 KiB | 1.98 MiB/s, done.
Resolving deltas: 100% (1424/1424), done.
Tapped 47 casks (69 files, 521.9KB).
% brew install --cask adoptopenjdk8
==> Downloading https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u292-b10/OpenJDK8U-jdk_x64_mac_hotspot_8u292b10.pkg
==> Downloading from https://objects.githubusercontent.com/github-production-release-asset-2e65be/140418865/bbad4180-a2e0-11eb-8274-f16f6a90729c?X-Amz-
######################################################################## 100.0%
==> Installing Cask adoptopenjdk8
==> Running installer for adoptopenjdk8; your password may be necessary.
Package installers may write to any location; options such as `--appdir` are ignored.
Password:
Sorry, try again.
Password:
installer: Package name is AdoptOpenJDK
installer: Installing at base path /
installer: The install was successful.
package-id: net.adoptopenjdk.8.jdk
version: 1.8.0_292-b10
volume: /
location:
install-time: 1649222944
🍺 adoptopenjdk8 was successfully installed!
% java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)
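Since this walkthrough installs Java 8 specifically, it can help to check the major version in a script rather than by eye. The following is a minimal sketch; the function name `java_major_version` is made up for illustration, and it only handles the two common version formats ("1.8.0_292" and "17.0.2").

```shell
# Extract the major Java version from the first line of `java -version` output.
# Legacy versions look like "1.8.0_292" (major 8); newer ones like "17.0.2" (major 17).
java_major_version() {
  version_line="$1"
  # Pull out the quoted version string, e.g. 1.8.0_292
  version=$(printf '%s\n' "$version_line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;
  esac
}
```

Usage: `java_major_version "$(java -version 2>&1 | head -n 1)"`. The `2>&1` is needed because `java -version` writes to standard error.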
thor_mac@kim-youngsam-iMac sv % brew install --cask adoptopenjdk8
Error: Cask adoptopenjdk8 exists in multiple taps:
homebrew/cask-versions/adoptopenjdk8
adoptopenjdk/openjdk/adoptopenjdk8
This error most likely means a name collision: the same cask is defined in two taps.
How to fix it:
brew untap adoptopenjdk/openjdk
brew cleanup
brew untap adoptopenjdk/openjdk
Removes the tap from the tap list.
brew cleanup
Removes everything except the latest versions.
Or:
sudo rm /usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask-versions/Casks/adoptopenjdk8.rb
Deletes the duplicate cask file.
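Before removing anything, it may be worth confirming which taps actually define the cask. This is a rough sketch that simply searches a taps directory for files named after the cask; `find_cask_definitions` is a hypothetical helper, not a Homebrew command.

```shell
# List every tap file that defines the given cask, one path per line.
# $1: cask name, $2: taps directory to search.
find_cask_definitions() {
  cask="$1"
  taps_dir="$2"
  find "$taps_dir" -type f -name "$cask.rb" | sort
}
```

Usage: `find_cask_definitions adoptopenjdk8 "$(brew --repository)/Library/Taps"`. On the machine in the transcript above that directory is /usr/local/Homebrew/Library/Taps. Two or more paths in the output would match the "exists in multiple taps" error.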
% curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
% echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.zshrc
% source ~/.zshrc
% curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 654 100 654 0 0 1043 0 --:--:-- --:--:-- --:--:-- 7694
100 42.4M 100 42.4M 0 0 85491 0 0:08:40 0:08:40 --:--:-- 111k
% echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.zshrc
% source ~/.zshrc
% embulk -version
embulk 0.9.24
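The transcript above does not show a `chmod` step. Because the downloaded file is invoked directly as `embulk`, it has to be executable; if the shell reports "permission denied", that is the likely cause. Also, running the `echo ... >> ~/.zshrc` line more than once appends the same export repeatedly. The sketch below covers both points; `ensure_executable` and `append_once` are illustrative names, not part of Embulk.

```shell
# Make the downloaded Embulk file executable if it is not already.
ensure_executable() {
  target="$1"
  [ -f "$target" ] || { echo "not found: $target" >&2; return 1; }
  [ -x "$target" ] || chmod +x "$target"
}

# Append a line to a shell profile only if that exact line is not already there.
append_once() {
  line="$1"
  profile="$2"
  touch "$profile"
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Usage: `ensure_executable ~/.embulk/bin/embulk` followed by `append_once 'export PATH="$HOME/.embulk/bin:$PATH"' ~/.zshrc`.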
embulk gem list
% embulk gem list
2022-04-06 15:37:46.541 +0900: Embulk v0.9.24
Gem plugin path is: /Users/thor_mac/.embulk/lib/gems
*** LOCAL GEMS ***
bundler (1.16.0)
did_you_mean (default: 1.0.1)
embulk (0.9.24 java)
jar-dependencies (default: 0.3.10)
jruby-openssl (0.9.21 java)
jruby-readline (1.2.0 java)
json (1.8.3 java)
liquid (4.0.0)
minitest (default: 5.4.1)
msgpack (1.1.0 java)
net-telnet (default: 0.1.1)
power_assert (default: 0.2.3)
psych (2.2.4 java)
rake (default: 10.4.2)
rdoc (default: 4.2.0)
test-unit (default: 3.1.1)
embulk gem install jdbc
embulk gem install embulk-input-jdbc
embulk gem install embulk-output-mysql
embulk gem install embulk-input-mysql
% embulk gem install jdbc
2022-04-06 15:42:56.206 +0900: Embulk v0.9.24
Gem plugin path is: /Users/thor_mac/.embulk/lib/gems
Fetching: jdbc-0.1.1-java.gem (100%)
Successfully installed jdbc-0.1.1-java
....
% embulk gem list
2022-04-06 15:46:14.986 +0900: Embulk v0.9.24
Gem plugin path is: /Users/thor_mac/.embulk/lib/gems
*** LOCAL GEMS ***
bundler (1.16.0)
did_you_mean (default: 1.0.1)
embulk (0.9.24 java)
embulk-input-jdbc (0.12.3 java)
jar-dependencies (default: 0.3.10)
jdbc (0.1.1 java)
jruby-openssl (0.9.21 java)
jruby-readline (1.2.0 java)
json (1.8.3 java)
liquid (4.0.0)
minitest (default: 5.4.1)
msgpack (1.1.0 java)
net-telnet (default: 0.1.1)
power_assert (default: 0.2.3)
psych (2.2.4 java)
rake (default: 10.4.2)
rdoc (default: 4.2.0)
test-unit (default: 3.1.1)
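Reading the gem list by eye makes it easy to miss a plugin that failed to install. As a small sketch, the helper below takes the `embulk gem list` text on standard input and prints any wanted plugin that is not in it. `missing_plugins` is a hypothetical name, and the match assumes each gem line starts with the gem name followed by " (", as in the listing above.

```shell
# Print each wanted plugin that does not appear in `embulk gem list` output.
# stdin: the gem list text; arguments: plugin names to look for.
missing_plugins() {
  listing=$(cat)
  for plugin in "$@"; do
    # A gem line looks like "embulk-input-jdbc (0.12.3 java)"
    printf '%s\n' "$listing" | grep -q "^$plugin (" || printf '%s\n' "$plugin"
  done
}
```

Usage: `embulk gem list | missing_plugins embulk-input-jdbc embulk-input-mysql embulk-output-mysql`. An empty result means all of the named plugins were found.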
% cat embulk_prd_csv_to_mysql.yml
exec:
  max_threads: 1
  min_output_tasks: 1
in:
  type: file
  path_prefix: '/Users/thor_mac/sv/script/db1/mon/job/prd_user_external_ids.csv'
  parser:
    default_timezone: "Asia/Seoul"
    charset: UTF-8
    delimiter: ','
    columns:
    - {name: user_id, type: long }
    - {name: external_type, type: string }
    - {name: external_id, type: string }
    ## - {name: insert_time, type: timestamp, format: '%Y-%m-%d %H:%M:%S' }
    skip_header_lines: 1
    type: csv
out:
  type: mysql
  host: mydata-productiondb.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com
  port: 3306
  user: xxxxxx
  password: "xxxxxx"
  database: fcsdb
  auto_create_table: false
  table: user_external_ids
  mode: insert_direct
  column_options:
    user_id: {type: bigint(20) NOT NULL}
    external_type: {type: varchar(20) NOT NULL}
    external_id: {type: varchar(40) NOT NULL}
    ## insert_time: {type: date NOT NULL}
  default_timezone: "Asia/Seoul"
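The config skips one header line and then maps columns by position, so a CSV whose columns are in a different order would still load, just into the wrong fields. A quick pre-flight check can catch that. This is only a sketch: it assumes the CSV header uses the same names as the `columns` list (user_id, external_type, external_id), which the original does not state, and both function names are made up.

```shell
# Compare the first line of a CSV file with the expected comma-separated header.
check_csv_header() {
  file="$1"
  expected="$2"
  # Strip a trailing carriage return so files saved with CRLF endings still match.
  actual=$(head -n 1 "$file" | tr -d '\r')
  if [ "$actual" != "$expected" ]; then
    echo "header mismatch: got '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Count data rows, i.e. all lines after the single header line.
count_data_rows() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}
```

Usage: `check_csv_header prd_user_external_ids.csv 'user_id,external_type,external_id' && count_data_rows prd_user_external_ids.csv`. The row count can later be compared with the number of rows Embulk reports as loaded.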
% embulk preview /Users/thor_mac/sv/script/db1/mon/embulk_csv_to_mysql.yml
2022-04-06 17:24:38.017 +0900: Embulk v0.9.24
2022-04-06 17:24:38.574 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2022-04-06 17:24:39.911 +0900 [INFO] (main): Gem's home and path are set by default: "/Users/thor_mac/.embulk/lib/gems"
2022-04-06 17:24:40.270 +0900 [INFO] (main): Started Embulk v0.9.24
2022-04-06 17:24:40.429 +0900 [INFO] (0001:preview): Listing local files at directory '/Users/thor_mac/sv/script/db1/mon/job' filtering filename by prefix 'stg_user_external_ids.csv'
2022-04-06 17:24:40.430 +0900 [INFO] (0001:preview): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2022-04-06 17:24:40.432 +0900 [INFO] (0001:preview): Loading files [/Users/thor_mac/sv/script/db1/mon/job/stg_user_external_ids.csv]
2022-04-06 17:24:40.437 +0900 [INFO] (0001:preview): Try to read 32,768 bytes from input source
+--------------+----------------------+--------------------------------------+
| user_id:long | external_type:string |                   external_id:string |
+--------------+----------------------+--------------------------------------+
|            1 |           USER_TRACK |     b6435831-9fc9--b931-a825db5f5129 |
|            2 |           USER_TRACK |             7d71fb8d-19e4-46af-b358- |
|            3 |           USER_TRACK |     65f523fe-5442--a232-d22e0d7ad4ee |
|            4 |           USER_TRACK | 99ddfb61-a60a-42f0-aaa8-14394bc142d3 |
|            5 |           USER_TRACK |     93482d63-8a59-4743--55a2570ce80a |
|            6 |           USER_TRACK |         -5282-4dba-9b1e-c62ede7dfd1d |
|            7 |           USER_TRACK |                     -cd34-4dc5-9f80- |
|            8 |           USER_TRACK | f77b85f7-665f-41b7-8b2f-65bd82938af8 |
|            9 |           USER_TRACK |          -3c64-4b32-9e1e2678ba321506 |
|           10 |           USER_TRACK |             126b8ef3-a182-4f48-acbf- |
|           11 |           USER_TRACK |         -f3d5-4a49-b283-b95092ae7232 |
....
...
|          254 |           USER_TRACK |         -6045-447b-a7ad-17885fd51a1d |
|          255 |           USER_TRACK | a1bed5fa-7fc6-4406-9fdf-6838b8aca5ad |
|          256 |           USER_TRACK | 3578f74a-f5ee-4aaa-99c1-0614468fb7a7 |
|          257 |           USER_TRACK |     82dece73-00db-4a92--8bff7613f02e |
|          258 |           USER_TRACK |  0ee39069-da0e-4d17a3a5-e22b6738e0b8 |
|          259 |           USER_TRACK | f342ed0b-617e-44fb-912a-58fba5d17cbd |
|          260 |           USER_TRACK |                     -7131-4a6d-bf35- |
+--------------+----------------------+--------------------------------------+
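The walkthrough previews the config and then runs it by hand. One way to chain the two steps so that a failed preview blocks the load is sketched below. `preview_then_run` and the `EMBULK_BIN` override are hypothetical names introduced here; only the `embulk preview` and `embulk run` subcommands come from the text. Note that a preview only exercises the input side, so it does not prove the MySQL connection works.

```shell
# Run `embulk preview` first and start `embulk run` only if the preview succeeded.
# EMBULK_BIN is a hypothetical override so the function can be tried without Embulk.
preview_then_run() {
  config="$1"
  embulk_bin="${EMBULK_BIN:-embulk}"
  if ! "$embulk_bin" preview "$config"; then
    echo "preview failed, not running: $config" >&2
    return 1
  fi
  "$embulk_bin" run "$config"
}
```

Usage: `preview_then_run /Users/thor_mac/sv/script/db1/mon/embulk_csv_to_mysql.yml`.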
embulk run /Users/thor_mac/sv/script/db1/mon/embulk_csv_to_mysql.yml
% embulk run /Users/thor_mac/sv/script/db1/mon/embulk_csv_to_mysql.yml
2022-04-06 17:24:59.640 +0900: Embulk v0.9.24
2022-04-06 17:25:00.363 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2022-04-06 17:25:01.920 +0900 [INFO] (main): Gem's home and path are set by default: "/Users/thor_mac/.embulk/lib/gems"
2022-04-06 17:25:02.436 +0900 [INFO] (main): Started Embulk v0.9.24
2022-04-06 17:25:02.538 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-mysql (0.10.2)
2022-04-06 17:25:02.588 +0900 [INFO] (0001:transaction): Listing local files at directory '/Users/thor_mac/sv/script/db1/mon/job' filtering filename by prefix 'stg_user_external_ids.csv'
2022-04-06 17:25:02.589 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2022-04-06 17:25:02.591 +0900 [INFO] (0001:transaction): Loading files [/Users/thor_mac/sv/script/db1/mon/job/stg_user_external_ids.csv]
2022-04-06 17:25:02.614 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=12 / output tasks 6 = input tasks 1 * 6
2022-04-06 17:25:02.646 +0900 [INFO] (0001:transaction): JDBC Driver = /Users/thor_mac/.embulk/lib/gems/gems/embulk-output-mysql-0.10.2-java/default_jdbc_driver/mysql-connector-java-5.1.44.jar
2022-04-06 17:25:02.670 +0900 [INFO] (0001:transaction): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:02.971 +0900 [INFO] (0001:transaction): TransactionIsolation=repeatable_read
2022-04-06 17:25:02.972 +0900 [INFO] (0001:transaction): Using JDBC Driver mysql-connector-java-5.1.44 ( Revision: b3cda4f864902ffdde495b9df93937c3e20009be )
2022-04-06 17:25:02.972 +0900 [WARN] (0001:transaction): This plugin will update MySQL Connector/J version in the near future release.
2022-04-06 17:25:02.972 +0900 [WARN] (0001:transaction): It has some incompatibility changes.
2022-04-06 17:25:02.972 +0900 [WARN] (0001:transaction): For example, the 5.1.35 introduced `noTimezoneConversionForDateType` and `cacheDefaultTimezone` options.
2022-04-06 17:25:02.972 +0900 [WARN] (0001:transaction): Please read a document and make sure configuration carefully before updating the plugin.
2022-04-06 17:25:02.981 +0900 [WARN] (0001:transaction): The plugin will set `useLegacyDatetimeCode=false` by default in future.
2022-04-06 17:25:02.981 +0900 [INFO] (0001:transaction): Using insert_direct mode
2022-04-06 17:25:03.053 +0900 [INFO] (0001:transaction): {done: 0 / 1, running: 0}
2022-04-06 17:25:03.069 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.143 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.143 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:03.155 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.231 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.232 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:03.235 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.309 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.309 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:03.312 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.385 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.385 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:03.388 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.461 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.462 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:03.464 +0900 [INFO] (0015:task-0000): Connecting to jdbc:mysql://mydata-stage-db-temp.cqyhmqtd3iez.ap-northeast-2.rds.amazonaws.com:3306/fcsdb options {user=admin, password=***, tcpKeepAlive=true, useSSL=false, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2022-04-06 17:25:03.537 +0900 [INFO] (0015:task-0000): TransactionIsolation=repeatable_read
2022-04-06 17:25:03.538 +0900 [INFO] (0015:task-0000): Prepared SQL: INSERT INTO `user_external_ids` (`user_id`, `external_type`, `external_id`) VALUES (?, ?, ?)
2022-04-06 17:25:04.767 +0900 [INFO] (embulk-output-executor-2): Loading 110,500 rows
2022-04-06 17:25:04.767 +0900 [INFO] (embulk-output-executor-0): Loading 110,500 rows
2022-04-06 17:25:04.767 +0900 [INFO] (embulk-output-executor-3): Loading 110,500 rows
2022-04-06 17:25:04.767 +0900 [INFO] (embulk-output-executor-1): Loading 110,500 rows
2022-04-06 17:25:04.769 +0900 [INFO] (embulk-output-executor-4): Loading 110,500 rows
2022-04-06 17:25:04.770 +0900 [INFO] (embulk-output-executor-5): Loading 110,500 rows
2022-04-06 17:25:07.242 +0900 [INFO] (embulk-output-executor-5): > 2.47 seconds (loaded 110,500 rows in total)
2022-04-06 17:25:07.448 +0900 [INFO] (embulk-output-executor-4): > 2.68 seconds (loaded 110,500 rows in total)
2022-04-06 17:25:08.017 +0900 [INFO] (embulk-output-executor-1): > 3.25 seconds (loaded 110,500 rows in total)
2022-04-06 17:25:08.307 +0900 [INFO] (embulk-output-executor-0): > 3.54 seconds (loaded 110,500 rows in total)
2022-04-06 17:25:08.331 +0900 [INFO] (embulk-output-executor-2): > 3.56 seconds (loaded 110,500 rows in total)
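To total the rows from a saved run log, the "loaded N rows in total" lines can be summed. The sketch below assumes the log format shown above, with thread names of the form embulk-output-executor-N. Since the count appears to be cumulative per executor, it keeps only the last value seen for each executor. `sum_loaded_rows` is an illustrative name.

```shell
# Sum the final "loaded N rows in total" count of each output executor.
# stdin: Embulk log lines in the format shown above.
sum_loaded_rows() {
  sed -n 's/.*(\(embulk-output-executor-[0-9]*\)).*(loaded \([0-9,]*\) rows in total).*/\1 \2/p' |
    tr -d ',' |
    awk '{ last[$1] = $2 } END { for (e in last) total += last[e]; print total + 0 }'
}
```

Usage: `embulk run config.yml 2>&1 | tee run.log` and then `sum_loaded_rows < run.log`. For the five completed executors in the excerpt above this gives 552500 (5 × 110,500).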