What Is Database Sharding, and How Do You Implement It?

김상욱 · December 22, 2024

What?

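Sharding is a technique for splitting a large database into several smaller databases, called shards, and managing them as a group. It is used to improve database performance and scalability. Put simply, one large database is divided into multiple smaller ones so that each shard is responsible for only part of the total data.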

Why?

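As data grows, a single database eventually hits a processing limit. Sharding spreads the data across multiple databases that can work in parallel, which improves performance.
When data keeps growing, you can scale horizontally by adding new servers or databases.
If one shard fails, the other shards can keep running, which improves the stability of the system as a whole.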

Type?

Horizontal sharding: splits a table's data by row. For example, user data is distributed across several databases according to user ID.
Vertical sharding: splits a table's data by column. For example, frequently used data and rarely used data are stored in separate databases.

In most cases, horizontal sharding is the one used.
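
To make the difference concrete, here is a minimal in-memory sketch. It assumes rows are modeled as maps from column name to value; the class `SplitDemo` and the columns `id`, `name`, and `bio` are made up for illustration and do not come from any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a row is a map from column name to value.
class SplitDemo {

    // Horizontal split: every shard keeps all columns, but only some of the rows.
    static List<List<Map<String, Object>>> splitRows(List<Map<String, Object>> rows, int shardCount) {
        List<List<Map<String, Object>>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (Map<String, Object> row : rows) {
            long id = (Long) row.get("id");
            shards.get((int) Math.floorMod(id, (long) shardCount)).add(row);
        }
        return shards;
    }

    // Vertical split: keep every row, but only the key plus the listed columns.
    static List<Map<String, Object>> projectColumns(List<Map<String, Object>> rows, String... columns) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> part = new LinkedHashMap<>();
            part.put("id", row.get("id"));
            for (String column : columns) {
                part.put(column, row.get(column));
            }
            result.add(part);
        }
        return result;
    }

    // Helper that builds one sample row.
    static Map<String, Object> user(long id, String name, String bio) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("name", name);
        row.put("bio", bio);
        return row;
    }
}
```

Calling `splitRows(rows, 2)` sends each row to the shard selected by `id % 2`, while `projectColumns(rows, "name")` keeps all rows but drops the `bio` column, which could then live in a separate store.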

HOW?

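a. Choose a shard key
To shard, you first choose a shard key, which determines how the data is split. The shard key is the attribute used to distribute the data, for example a user ID, a region, or an email domain.

b. Set up shard mapping
Decide the rule that maps each shard key to the shard where its data is stored. The two common approaches are hash-based and range-based.

Hash-based sharding: applies a hash function to the shard key to pick the shard. Data tends to be distributed evenly, so load is less likely to pile up on one shard; in exchange, range queries on the key usually have to touch several shards.
Range-based sharding: defines key ranges in advance and places data by range. For example, user IDs 1-1000 go to shard 1, 1001-2000 go to shard 2, and so on.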
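
The two mapping rules can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class `ShardMapping`, the use of CRC32 as the hash function, and the shard names are all choices made for this example, and the ranges simply copy the 1-1000 / 1001-2000 split described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical helper showing both mapping rules side by side.
class ShardMapping {

    // Hash-based: hash the key, then take the remainder by the shard count.
    static int hashShard(String shardKey, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(shardKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is never negative, so the remainder is in [0, shardCount)
        return (int) (crc.getValue() % shardCount);
    }

    // Range-based: each entry maps the lowest ID of a range to its shard name.
    static final long MAX_USER_ID = 2000L;
    static final TreeMap<Long, String> RANGES = new TreeMap<>();

    static {
        RANGES.put(1L, "shard1");     // IDs 1-1000
        RANGES.put(1001L, "shard2");  // IDs 1001-2000
    }

    static String rangeShard(long userId) {
        // floorEntry finds the range whose lower bound is closest below the ID
        Map.Entry<Long, String> entry = RANGES.floorEntry(userId);
        if (entry == null || userId > MAX_USER_ID) {
            throw new IllegalArgumentException("No shard covers user ID " + userId);
        }
        return entry.getValue();
    }
}
```

With the range rule, adding a shard means adding one entry to the map. With the hash rule, changing `shardCount` changes the remainder for most keys, so existing data generally has to be moved.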

c. Implement the data access layer
To access the sharded databases, you need to implement a data access layer. In Spring, you can manage this with tools such as Spring Data or MyBatis.

ex) Selecting a different data source depending on the shard key

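import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

@Configuration
public class DataSourceConfig {

    // Bound from the datasource.shard1.* properties
    @Bean
    @ConfigurationProperties(prefix = "datasource.shard1")
    public DataSource shard1DataSource() {
        return DataSourceBuilder.create().build();
    }

    // Bound from the datasource.shard2.* properties
    @Bean
    @ConfigurationProperties(prefix = "datasource.shard2")
    public DataSource shard2DataSource() {
        return DataSourceBuilder.create().build();
    }

    // Shard selection logic. Marked @Primary so that repositories go through
    // the routing data source instead of being wired to a single shard.
    @Bean
    @Primary
    public DataSource routingDataSource() {
        AbstractRoutingDataSource routingDataSource = new AbstractRoutingDataSource() {
            @Override
            protected Object determineCurrentLookupKey() {
                // ShardContext is assumed to hold the shard name for the current thread
                return ShardContext.getCurrentShard();
            }
        };
        Map<Object, Object> targetDataSources = new HashMap<>();
        targetDataSources.put("shard1", shard1DataSource());
        targetDataSources.put("shard2", shard2DataSource());
        routingDataSource.setTargetDataSources(targetDataSources);
        // Fallback used when no shard has been selected yet
        routingDataSource.setDefaultTargetDataSource(shard1DataSource());
        return routingDataSource;
    }
}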
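
The configuration above calls `ShardContext.getCurrentShard()` but does not show that class. One common way to write such a holder is a `ThreadLocal`, sketched below. Everything here is an assumption for illustration: the method names other than `getCurrentShard()`, the helper `ShardContextDemo.withShard`, and the odd/even routing rule are not taken from the original code.

```java
import java.util.function.Supplier;

// Hypothetical holder for the shard selected on the current thread.
final class ShardContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private ShardContext() {
    }

    static void setCurrentShard(String shard) {
        CURRENT.set(shard);
    }

    static String getCurrentShard() {
        return CURRENT.get();
    }

    static void clear() {
        CURRENT.remove();
    }

    // Assumed routing rule: odd IDs go to "shard1", even IDs to "shard2".
    static String shardFor(long userId) {
        return Math.floorMod(userId, 2L) == 1 ? "shard1" : "shard2";
    }
}

class ShardContextDemo {

    // Runs the work with the shard chosen for userId, then always clears it
    // so that a pooled thread does not leak the choice into the next request.
    static <T> T withShard(long userId, Supplier<T> work) {
        ShardContext.setCurrentShard(ShardContext.shardFor(userId));
        try {
            return work.get();
        } finally {
            ShardContext.clear();
        }
    }
}
```

In a real application the `work` passed to `withShard` would be a repository call; the key has to be set before the connection is obtained, which usually means before the transaction starts.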

d. Traffic distribution and load balancing
You can set up a load balancer to distribute traffic across the shards efficiently. This keeps the load on each shard even and prevents any one shard from being overloaded.

Caution

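Choose the shard key carefully so that data is distributed evenly across all shards.
Introducing sharding can make data access logic more complex, so the design needs to account for that.
Transactions that span multiple shards can be complicated to handle.
Each shard needs its own backup and recovery strategy.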
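
For the first point, one rough way to judge a candidate shard key is to count how many keys land on each shard. The sketch below assumes hash-modulo routing on `String.hashCode()`; the class `ShardSkew` and its methods are hypothetical names for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for checking how evenly a shard key spreads data.
class ShardSkew {

    // Counts how many keys land on each shard under hash-modulo routing.
    static int[] countPerShard(List<String> keys, int shardCount) {
        int[] counts = new int[shardCount];
        for (String key : keys) {
            counts[Math.floorMod(key.hashCode(), shardCount)]++;
        }
        return counts;
    }

    // Ratio of the busiest shard to the ideal even share; 1.0 means perfectly even.
    static double skew(int[] counts) {
        int total = Arrays.stream(counts).sum();
        int max = Arrays.stream(counts).max().orElse(0);
        return total == 0 ? 1.0 : max / ((double) total / counts.length);
    }
}
```

A key with few distinct values, such as a region code shared by most users, puts those users on the same shard, so the skew approaches the shard count. A high-cardinality key such as a user ID tends to stay close to 1.0.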

Using a sharding tool

ShardingSphere: Apache ShardingSphere is a Java-based database middleware that supports sharding, distributed transactions, security, and more.

<!-- Add the Maven dependency (4.x starter, which matches the spring.shardingsphere.sharding.* settings below) -->
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
    <version>4.1.1</version>
</dependency>

# application.yml configuration example
spring:
  shardingsphere:
    datasource:
      names: shard1,shard2
      shard1:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/shard1
        username: user
        password: pass
      shard2:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/shard2
        username: user
        password: pass
    sharding:
      tables:
        user:
          # Data node names must match the data source names declared above
          actual-data-nodes: shard$->{1..2}.user
          # Route each row to a database by user_id (even -> shard1, odd -> shard2)
          database-strategy:
            inline:
              sharding-column: user_id
              # $->{...} avoids a clash with Spring's own ${...} placeholders
              algorithm-expression: shard$->{user_id % 2 + 1}

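When preparing for a job search, building backend skills with Java and Spring through hands-on practice matters a great deal. Practicing an advanced topic such as database sharding is especially useful preparation for real work. Below is a project idea suited to a junior developer, with a step-by-step guide.

Project idea: applying database sharding to a user management system

Goal: build a simple user management system with Spring Boot and implement database sharding with Apache ShardingSphere.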

1. Project preparation

Tools and technologies

Java 17 or later
Spring Boot
Maven or Gradle
MySQL (or another relational database)
Apache ShardingSphere
IDE (IntelliJ IDEA, Eclipse, etc.)
Postman or curl (for API testing)

Project structure

user-sharding-system/
├── src/
│   ├── main/
│   │   ├── java/com/example/sharding/
│   │   │   ├── controller/
│   │   │   ├── entity/
│   │   │   ├── repository/
│   │   │   ├── service/
│   │   │   └── ShardingApplication.java
│   │   └── resources/
│   │       ├── application.yml
│   │       └── schema.sql
│   └── test/
└── pom.xml

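2. Create the Spring Boot project

Create a new Spring Boot project with Spring Initializr.
- Dependencies: Spring Web, Spring Data JPA, MySQL Driver
Project settings:
- Add the basic database settings to application.yml. The shard1 and shard2 entries under spring.datasource are custom keys that Spring Boot does not auto-configure on its own; the connections ShardingSphere uses are declared in step 3.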

spring:
  datasource:
    shard1:
      url: jdbc:mysql://localhost:3306/shard1
      username: your_username
      password: your_password
      driver-class-name: com.mysql.cj.jdbc.Driver
    shard2:
      url: jdbc:mysql://localhost:3306/shard2
      username: your_username
      password: your_password
      driver-class-name: com.mysql.cj.jdbc.Driver
  jpa:
    hibernate:
      ddl-auto: update
    show-sql: true
    properties:
      hibernate:
        format_sql: true

3. Configure database sharding

Create the MySQL shard databases:
- Create two databases named shard1 and shard2.

CREATE DATABASE shard1;
CREATE DATABASE shard2;

Add the ShardingSphere dependency:
- Add the ShardingSphere dependency to pom.xml.

<dependencies>
    <!-- Existing dependencies -->

    <!-- ShardingSphere dependency (4.x starter, matching the spring.shardingsphere.sharding.* settings used in this guide) -->
    <dependency>
        <groupId>org.apache.shardingsphere</groupId>
        <artifactId>sharding-jdbc-spring-boot-starter</artifactId>
        <version>4.1.1</version>
    </dependency>
</dependencies>

Add the ShardingSphere configuration:
- Add the ShardingSphere settings to application.yml.

spring:
  shardingsphere:
    datasource:
      names: shard1,shard2
      shard1:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/shard1
        username: your_username
        password: your_password
      shard2:
        type: com.zaxxer.hikari.HikariDataSource
        jdbc-url: jdbc:mysql://localhost:3306/shard2
        username: your_username
        password: your_password
    sharding:
      tables:
        user:
          # Data node names must match the data source names declared above
          actual-data-nodes: shard$->{1..2}.user
          # Route each row to a database by id (even -> shard1, odd -> shard2)
          database-strategy:
            inline:
              sharding-column: id
              # $->{...} avoids a clash with Spring's own ${...} placeholders
              algorithm-expression: shard$->{id % 2 + 1}

4. Create the entity and repository

Create the User entity:

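package com.example.sharding.entity;

// javax.persistence matches Spring Boot 2.x; Spring Boot 3 uses jakarta.persistence instead
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "user")
public class User {

    // Assigned by the client, because the id also serves as the shard key
    @Id
    private Long id;

    private String name;
    private String email;

    // Getters and setters (needed by both JPA and JSON binding)
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}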

Create the UserRepository:

package com.example.sharding.repository;

import com.example.sharding.entity.User;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UserRepository extends JpaRepository<User, Long> {
}

5. Implement the service and controller

Create the UserService:

package com.example.sharding.service;

import com.example.sharding.entity.User;
import com.example.sharding.repository.UserRepository;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class UserService {

    private final UserRepository userRepository;

    // Constructor injection keeps the dependency explicit and final
    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public User saveUser(User user) {
        return userRepository.save(user);
    }

    public List<User> getAllUsers() {
        return userRepository.findAll();
    }
}

Create the UserController:

package com.example.sharding.controller;

import com.example.sharding.entity.User;
import com.example.sharding.service.UserService;
import org.springframework.web.bind.annotation.*;

import java.util.List;

@RestController
@RequestMapping("/users")
public class UserController {

    private final UserService userService;

    // Constructor injection keeps the dependency explicit and final
    public UserController(UserService userService) {
        this.userService = userService;
    }

    // POST /users: creates one user from the JSON request body
    @PostMapping
    public User createUser(@RequestBody User user) {
        return userService.saveUser(user);
    }

    // GET /users: returns every user
    @GetMapping
    public List<User> getUsers() {
        return userService.getAllUsers();
    }
}
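
It helps to picture what the two endpoints imply for a sharded table. A write carries the shard key and touches one shard, while a query without the shard key has to visit every shard and merge the results. The sketch below models that with in-memory maps; `InMemoryShardedUsers` is a hypothetical stand-in for this example, not how ShardingSphere is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for sharded "user" tables.
class InMemoryShardedUsers {

    private final List<Map<Long, String>> shards = new ArrayList<>();

    InMemoryShardedUsers(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new TreeMap<>());
        }
    }

    private Map<Long, String> shardOf(long id) {
        return shards.get((int) Math.floorMod(id, (long) shards.size()));
    }

    // A write carries the shard key, so it touches exactly one shard.
    void save(long id, String name) {
        shardOf(id).put(id, name);
    }

    // A lookup by shard key is also routed to a single shard.
    String findById(long id) {
        return shardOf(id).get(id);
    }

    // No shard key in the query: visit every shard and merge, ordered by id.
    List<String> findAll() {
        Map<Long, String> merged = new TreeMap<>();
        for (Map<Long, String> shard : shards) {
            merged.putAll(shard);
        }
        return new ArrayList<>(merged.values());
    }
}
```

This is why list-style queries tend to cost more as the number of shards grows, and why lookups that include the shard key are preferred where possible.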

6. Test the sharding

Run the application:
- Start the Spring Boot application.

Create user data:

Use Postman or curl to create several users.

curl -X POST http://localhost:8080/users \
-H "Content-Type: application/json" \
-d '{"id":1,"name":"Alice","email":"alice@example.com"}'

curl -X POST http://localhost:8080/users \
-H "Content-Type: application/json" \
-d '{"id":2,"name":"Bob","email":"bob@example.com"}'

Check the shard databases:
- Query the shard1.user and shard2.user tables to confirm that the data was distributed correctly.
```
SELECT * FROM shard1.user;
SELECT * FROM shard2.user;
```
- Confirm that each row landed where the sharding expression places it for its id value.

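7. Further practice ideas

Change the shard key:
- Switch the sharding strategy to a different shard key (for example, the email domain) and observe how the data is distributed.
Compare horizontal and vertical sharding:
- Apply vertical sharding by separating the table's columns, and compare it with horizontal sharding.
Test a shard failure scenario:
- Cause a failure on one shard and check whether the other shards keep working normally.
Performance test:
- Use a benchmarking tool to measure how sharding affects performance as the data grows.
Use ShardingSphere's advanced features:
- Study and apply ShardingSphere features such as distributed transactions and security.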

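8. Learning resources and reference links

Official ShardingSphere documentation: Apache ShardingSphere Documentation
Spring Boot and ShardingSphere integration tutorial:
- ShardingSphere Spring Boot Example
Online courses and blogs:
- Look for related courses on online learning platforms such as Inflearn and Fast Campus.
- Blog posts offer a variety of hands-on examples to refer to.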

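Conclusion

Working through this exercise gives you a grasp of the basic concepts of database sharding and hands-on experience applying it in a Java and Spring environment. Beyond sharding, keep studying other backend technologies and build your skills through small projects; it will help a great deal with your job search. Good luck!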
