Airbyte Scale out in Kube

문주은 · March 10, 2025

1. Update values.yaml

To scale out Airbyte using abctl, you'll need to focus on adjusting the resources and configuration of your Kubernetes deployment.

  1. Increase Concurrent Syncs:
    You can modify the values.yaml file to increase the number of concurrent syncs. Add or modify these settings:
global:
  edition: "community"  # or "enterprise", depending on your edition
  jobs:
    resources:
      limits:
        cpu: "5000m" # 5 CPU cores
        memory: "10Gi" # 10 GiB
  2. Increase Worker Replicas:
    To handle more syncs simultaneously, increase the number of worker replicas:
worker:
  replicaCount: 3
  extraEnvs: # We recommend setting both environment variables to a single, shared value.
    - name: MAX_SYNC_WORKERS
      value: "5" # environment variable values must be strings
    - name: MAX_CHECK_WORKERS
      value: "5"
      
server:
  replicaCount: 3 

Number of concurrent syncs

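parallel syncs = number of worker pods * min(MAX_SYNC_WORKERS, TEMPORAL_WORKER_PORTS / 4)
  • number of worker pods = worker.replicaCount
  • MAX_SYNC_WORKERS = value of MAX_SYNC_WORKERS set under worker.extraEnvs
  • TEMPORAL_WORKER_PORTS = number of Temporal worker ports; the default range is 40 ports (9001-9040)
  • Example with 3 worker pods and MAX_SYNC_WORKERS = 15: parallel syncs = 3 x min(15, 10) = 3 x 10 = 30
  • With the values shown above (MAX_SYNC_WORKERS = 5): parallel syncs = 3 x min(5, 10) = 3 x 5 = 15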
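The formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `parallel_syncs` is made up for illustration, and it assumes the division by 4 is integer division.

```python
def parallel_syncs(worker_pods: int, max_sync_workers: int,
                   temporal_worker_ports: int = 40) -> int:
    """Estimate concurrent syncs with the formula quoted in this post.

    temporal_worker_ports defaults to 40 (ports 9001-9040).
    Integer division by 4 is an assumption made for this sketch.
    """
    per_pod = min(max_sync_workers, temporal_worker_ports // 4)
    return worker_pods * per_pod


if __name__ == "__main__":
    print(parallel_syncs(3, 15))  # 3 x min(15, 10)
    print(parallel_syncs(3, 5))   # 3 x min(5, 10)
```

Once MAX_SYNC_WORKERS exceeds 10, the port range becomes the limiting factor, so raising it further does not add capacity per pod under this formula.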
  3. Adjust Resource Limits:
    If you're experiencing high CPU or memory usage, you can modify the resource limits for connector pods:
global:
  edition: "community"  # or "enterprise", depending on your edition
  jobs:
    resources:
      limits:
        cpu: "5000m" # 5 CPU cores
        memory: "10Gi" # 10 GiB
  4. Apply the changes:
    Adjust these values based on the CPU and memory usage you observe.
    After making these changes to your values.yaml file, apply the new configuration using abctl:
$ abctl local install --values ./values.yaml
  • This installs Airbyte, with the given configuration values and the built images, on the local Kubernetes cluster.
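The resource values in values.yaml use Kubernetes quantity notation: a trailing `m` on CPU means millicores, and `Gi` on memory means gibibytes (2^30 bytes). The sketch below converts such strings so the numbers can be sanity-checked before applying them. The helper names `cpu_cores` and `memory_bytes` are hypothetical, and only the common suffixes are handled.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes used by Kubernetes memory quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "5000m" or "2" to a number of cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "10Gi" or "5G" to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


if __name__ == "__main__":
    print(cpu_cores("5000m"))    # cores for the limit used in this post
    print(memory_bytes("10Gi"))  # bytes for the limit used in this post
```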

2. Update configuration values with configmap

airbyte-abctl-airbyte-env : Configmap

  • What is this ConfigMap? It holds the configuration variables that Airbyte runs with; the abctl tool generates it in the Kubernetes environment.
  1. Resource configuration environment variables
    JOB_MAIN_CONTAINER_CPU_REQUEST
    JOB_MAIN_CONTAINER_CPU_LIMIT
    JOB_MAIN_CONTAINER_MEMORY_REQUEST
    JOB_MAIN_CONTAINER_MEMORY_LIMIT
# Check configmap
$ kubectl get configmap airbyte-abctl-airbyte-env -n airbyte-abctl -o yaml
  2. Edit Variables
$ kubectl edit configmap airbyte-abctl-airbyte-env -n <your-airbyte-namespace>
  3. Restart Pods
$ kubectl delete pod -l app=airbyte -n <your-airbyte-namespace>
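To confirm which job resource values are currently set, the ConfigMap can also be exported as JSON (`kubectl get configmap ... -o json`) and filtered. The following is a minimal sketch, not part of abctl or Airbyte: the function name `job_resource_settings` and the sample values are made up for illustration, and it only relies on the standard ConfigMap layout, where variables live under the `data` key.

```python
import json

# The four variables listed in this section.
JOB_RESOURCE_KEYS = (
    "JOB_MAIN_CONTAINER_CPU_REQUEST",
    "JOB_MAIN_CONTAINER_CPU_LIMIT",
    "JOB_MAIN_CONTAINER_MEMORY_REQUEST",
    "JOB_MAIN_CONTAINER_MEMORY_LIMIT",
)


def job_resource_settings(configmap_json: str) -> dict:
    """Return the job resource variables from a ConfigMap exported as JSON.

    Keys that are missing from the ConfigMap are reported as empty strings.
    """
    data = json.loads(configmap_json).get("data") or {}
    return {key: data.get(key, "") for key in JOB_RESOURCE_KEYS}


if __name__ == "__main__":
    # Hypothetical sample; real values come from the kubectl output.
    sample = json.dumps({
        "kind": "ConfigMap",
        "data": {
            "JOB_MAIN_CONTAINER_CPU_LIMIT": "5000m",
            "JOB_MAIN_CONTAINER_MEMORY_LIMIT": "10Gi",
        },
    })
    for name, value in job_resource_settings(sample).items():
        print(f"{name}={value!r}")
```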

3. Network data bandwidth

Current data pipeline: MSSQL -> Airbyte (EC2) -> Snowflake
Purpose: keep MSSQL and Snowflake in sync
TODO: To estimate the bandwidth in use, collect network packet information (packet size) at each stage of MSSQL -> Airbyte -> Snowflake.

phase1) MSSQL -> Airbyte(EC2)

Checking the network packet size and bandwidth transmitted in MSSQL

  1. Checking the packet size
SELECT *
FROM sys.configurations
WHERE name like '%network packet size%';

/* result
... |          name           | value | minimum | maximum | value_in_use | ... 
... | network packet size (B) | 4096  |   512   |  32767  |     4096     | ... 
*/
  • value=4096 : configured packet size (4KB)
  • minimum=512 : minimum configurable size (512B)
  • maximum=32767 : maximum configurable size (about 32KB)
  • value_in_use=4096 : packet size currently in use (4KB)
[Network speed Basic concepts]
1 Byte = 8 bits
1 Megabit (Mb) = 1,000 Kb = 1,000,000 bits = 125,000 Bytes (125 KB)
1 Megabit per second (1 Mbps) = 125 KB/s = 125,000 Bytes/s
1 Gigabit per second (1 Gbps) = 125 MB/s = 125,000,000 Bytes/s

$$\text{Data per second (MB/s)} = \frac{\text{Network speed (Mbps)}}{8}$$

| Network speed | Data per second |
|---|---|
| 1 Mbps | 0.125 MB/s |
| 100 Mbps | 12.5 MB/s |
| 1 Gbps | 125 MB/s |
| 10 Gbps | 1.25 GB/s |

$$\text{Packets per second} = \frac{\text{Data per second (Bytes)}}{\text{Packet size (Bytes)}}$$

| Network speed | Data per second (MB/s) | Data per second (Bytes/s) | Packet size (4KB) | Packets per second |
|---|---|---|---|---|
| 1 Mbps | 0.125 MB/s | 125,000 B/s | 4096 B | 30.5 packets |
| 100 Mbps | 12.5 MB/s | 12,500,000 B/s | 4096 B | 3,051 packets |
| 1 Gbps | 125 MB/s | 125,000,000 B/s | 4096 B | 30,517 packets |
| 10 Gbps | 1.25 GB/s | 1,250,000,000 B/s | 4096 B | 305,175 packets |

If the packet size is increased, the number of packets per second becomes:

| Network speed | Packet size (4KB) | Packet size (8KB) | Packet size (16KB) |
|---|---|---|---|
| 1 Mbps | 30.5 packets | 15.2 packets | 7.6 packets |
| 100 Mbps | 3,051 packets | 1,525 packets | 763 packets |
| 1 Gbps | 30,517 packets | 15,259 packets | 7,629 packets |
| 10 Gbps | 305,175 packets | 152,587 packets | 76,294 packets |
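The two formulas above are simple enough to recompute for any link speed and packet size. The sketch below does so; the function names are made up for illustration, and the result is an upper bound because it ignores protocol overhead and assumes every packet is full.

```python
def bytes_per_second(mbps: float) -> float:
    """Convert a link speed in Mbps to bytes per second (1 Mbps = 125,000 B/s)."""
    return mbps * 1_000_000 / 8


def packets_per_second(mbps: float, packet_size_bytes: int = 4096) -> float:
    """Upper bound on full packets per second for a given packet size."""
    return bytes_per_second(mbps) / packet_size_bytes


if __name__ == "__main__":
    sizes = (4096, 8192, 16384)  # 4KB, 8KB, 16KB
    for mbps in (1, 100, 1_000, 10_000):
        row = " | ".join(f"{packets_per_second(mbps, s):>12,.1f}" for s in sizes)
        print(f"{mbps:>6} Mbps | {row}")
```

Doubling the packet size halves the number of packets needed for the same throughput, which is why larger packets are worth considering on faster links.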

[Recommended packet settings]

  • 100 Mbps or below: keep the default (4KB)
  • 1 Gbps or above: can be raised to 8KB to 16KB
  • 10 Gbps or above: consider 16KB to 32KB