Modern Java in Action (Session 6)

kimseungki · July 6, 2022


Overview

This chapter explains how the elements of a stream are collected into a result.

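What is a Collector?

The Collector interface specifies how the elements of a stream are turned into a result. The methods that a Collector implementation defines are what make the reduction performed by collect so flexible.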

What the predefined collectors provide

Reducing and summarizing stream elements to a single value

Grouping elements

Partitioning elements

Reducing and summarizing

counting

long dishes = menu.stream().collect(Collectors.counting());
// The same count, written more directly on the stream itself
long dishes = menu.stream().count();

Maximum and minimum (reducing)

Finding the maximum value in a stream

Comparator<Dish> dishCaloriesComparator = Comparator.comparingInt(Dish::getCalories);
Optional<Dish> most = menu.stream().collect(maxBy(dishCaloriesComparator));

Summarization

summingInt maps each object to an int and returns the sum of those values.
int totalCalories = menu.stream().collect(summingInt(Dish::getCalories));
To get the count, sum, minimum, average, and maximum in a single pass, use summarizingInt.
IntSummaryStatistics menuStatistics = menu.stream().collect(summarizingInt(Dish::getCalories));
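
As a rough illustration of what summarizingInt returns, the sketch below runs it over a small made-up menu. The SummarizingExample class, its simplified Dish, and the sample calories are assumptions for this example, not the book's code.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarizingExample {
    // Hypothetical stand-in for the book's Dish: only a name and a calorie count.
    static final class Dish {
        private final String name;
        private final int calories;
        Dish(String name, int calories) { this.name = name; this.calories = calories; }
        int getCalories() { return calories; }
    }

    // Made-up sample data, not the book's full menu.
    static final List<Dish> MENU = Arrays.asList(
        new Dish("pork", 800), new Dish("chicken", 400),
        new Dish("salmon", 450), new Dish("rice", 350));

    // One pass over the stream yields count, sum, min, max and average together.
    static IntSummaryStatistics calorieStats() {
        return MENU.stream().collect(Collectors.summarizingInt(Dish::getCalories));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = calorieStats();
        System.out.println(stats.getCount());   // 4
        System.out.println(stats.getSum());     // 2000
        System.out.println(stats.getMin());     // 350
        System.out.println(stats.getMax());     // 800
        System.out.println(stats.getAverage()); // 500.0
    }
}
```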

Joining strings

joining concatenates all the strings in the stream into a single string.

String shortMenu = menu.stream().map(Dish::getName).collect(joining());
String shortMenu = menu.stream().map(Dish::getName).collect(joining(", ")); // separates the names with ", "

Generalized summarization with reduction

Every collector shown so far can be expressed with the reducing factory method; the specialized collectors exist for readability.

// reducing(initial value, function that extracts an int, BinaryOperator that merges two values into one)
int totalCalories = menu.stream().collect(reducing(0, Dish::getCalories, (i, j) -> i + j));
// The one-argument form can pick the maximum by comparing two dishes.
Optional<Dish> mostCaloriesDish = menu.stream().collect(reducing((d1, d2)
  -> d1.getCalories() > d2.getCalories() ? d1 : d2));
// The collection framework is flexible: using Integer::sum simplifies the code a little.
int totalCalories = menu.stream().collect(reducing(0, Dish::getCalories, Integer::sum));
// Mapping to an IntStream and calling sum() gives the same result.
int totalCalories = menu.stream().mapToInt(Dish::getCalories).sum();
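
To check that these variants really agree, here is a minimal sketch that computes the same total three ways. ReducingExample, the stripped-down Dish, and the calorie values are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReducingExample {
    // Hypothetical Dish that only carries calories.
    static final class Dish {
        private final int calories;
        Dish(int calories) { this.calories = calories; }
        int getCalories() { return calories; }
    }

    // Made-up sample data.
    static final List<Dish> MENU = Arrays.asList(
        new Dish(800), new Dish(400), new Dish(450), new Dish(350));

    // General-purpose reducing collector: start value, mapper, merge function.
    static int withReducing() {
        return MENU.stream().collect(Collectors.reducing(0, Dish::getCalories, Integer::sum));
    }

    // Specialized collector for the same job.
    static int withSummingInt() {
        return MENU.stream().collect(Collectors.summingInt(Dish::getCalories));
    }

    // Primitive stream: no Integer objects are created along the way.
    static int withIntStream() {
        return MENU.stream().mapToInt(Dish::getCalories).sum();
    }

    public static void main(String[] args) {
        System.out.println(withReducing());   // 2000
        System.out.println(withSummingInt()); // 2000
        System.out.println(withIntStream());  // 2000
    }
}
```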

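Choosing the best approach for your situation

Write the code that fits your situation.
As the examples above show, collectors can be more complex than the methods the Stream interface offers directly. In exchange, they give a higher level of abstraction and generalization that makes the code more reusable and customizable.
For example, mapping to an IntStream avoids auto-unboxing, the implicit conversion from Integer to int, so it tends to perform better.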

Grouping

Collectors.groupingBy makes it easy to group the dishes in the menu.

Map<Dish.Type, List<Dish>> dishesByType = menu.stream().collect(groupingBy(Dish::getType));
// {FISH=[prawns, salmon], OTHER ...}

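Passing a filtering collector as the second argument of groupingBy keeps only the elements that match a condition in each group.

Map<Dish.Type, List<Dish>> caloricDishesByType = menu.stream()
  .collect(groupingBy(Dish::getType, filtering(dish -> dish.getCalories() > 500, toList())));
// {OTHER=[french fries, pizza]...FISH=[]}: FISH has no dish above 500 calories, so it maps to an empty list.
// Calling filter() on the stream before grouping would drop the FISH key from the map entirely.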
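
The difference between filtering before grouping and filtering inside groupingBy is easy to miss, so here is a small sketch of both. It assumes Java 9 or later (for Collectors.filtering); FilteringExample, its Dish, and the three sample dishes are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.filtering;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

class FilteringExample {
    enum Type { MEAT, FISH, OTHER }

    // Hypothetical simplified Dish; toString returns the name for readable output.
    static final class Dish {
        private final String name;
        private final int calories;
        private final Type type;
        Dish(String name, int calories, Type type) {
            this.name = name; this.calories = calories; this.type = type;
        }
        int getCalories() { return calories; }
        Type getType() { return type; }
        @Override public String toString() { return name; }
    }

    // Made-up sample data: the only FISH dish is below 500 calories.
    static final List<Dish> MENU = Arrays.asList(
        new Dish("pork", 800, Type.MEAT),
        new Dish("french fries", 530, Type.OTHER),
        new Dish("salmon", 450, Type.FISH));

    // filter() runs first, so groupingBy never sees a FISH dish and creates no FISH key.
    static Map<Type, List<Dish>> filterThenGroup() {
        return MENU.stream()
            .filter(d -> d.getCalories() > 500)
            .collect(groupingBy(Dish::getType));
    }

    // Grouping happens first, so the FISH key exists and maps to an empty list.
    static Map<Type, List<Dish>> groupThenFilter() {
        return MENU.stream()
            .collect(groupingBy(Dish::getType, filtering(d -> d.getCalories() > 500, toList())));
    }

    public static void main(String[] args) {
        System.out.println(filterThenGroup().containsKey(Type.FISH)); // false
        System.out.println(groupThenFilter().get(Type.FISH));         // []
    }
}
```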

The mapping collector transforms each element before it is collected into its group.

Map<Dish.Type, List<String>> dishNamesByTypes = menu.stream()
  .collect(groupingBy(Dish::getType, mapping(Dish::getName, toList())));

If each element maps to a list of strings, as with the tags of each dish below, use flatMapping, the collector counterpart of the flatMap seen earlier,
to flatten the two levels of lists into one.

Map<String, List<String>> dishTags = new HashMap<>();
dishTags.put("pork", asList("greasy", "salty"));
dishTags.put("beef", asList("salty", "roasted"));
dishTags.put("chicken", asList("fried", "crisp"));
dishTags.put("rice", asList("light", "natural"));
// Every dish in menu needs an entry here; otherwise dishTags.get(...) returns null.
Map<Dish.Type, Set<String>> dishNamesByType = menu.stream()
  .collect(groupingBy(Dish::getType,
    flatMapping(dish -> dishTags.get(dish.getName()).stream(),
    toSet())));

Multilevel grouping

A groupingBy can be nested inside another groupingBy.

Map<Dish.Type, Map<CaloricLevel, List<Dish>>> dishesByTypeCaloricLevel = menu.stream()
  .collect(groupingBy(Dish::getType,
    groupingBy(dish -> {
      if (dish.getCalories() <= 400) return CaloricLevel.DIET;
      else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
      else return CaloricLevel.FAT;
    })
  )
);
// {MEAT={DIET=[chicken], NORMAL=[beef]....
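
A runnable version of the two-level grouping might look like the sketch below. MultilevelGroupingExample, the levelOf helper, and the four sample dishes are assumptions for this example; the thresholds are the ones used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;

class MultilevelGroupingExample {
    enum Type { MEAT, FISH }
    enum CaloricLevel { DIET, NORMAL, FAT }

    // Hypothetical simplified Dish; toString returns the name for readable output.
    static final class Dish {
        private final String name;
        private final int calories;
        private final Type type;
        Dish(String name, int calories, Type type) {
            this.name = name; this.calories = calories; this.type = type;
        }
        int getCalories() { return calories; }
        Type getType() { return type; }
        @Override public String toString() { return name; }
    }

    // Made-up sample data.
    static final List<Dish> MENU = Arrays.asList(
        new Dish("pork", 800, Type.MEAT),
        new Dish("beef", 700, Type.MEAT),
        new Dish("chicken", 400, Type.MEAT),
        new Dish("salmon", 450, Type.FISH));

    // Second-level classifier, pulled out of the lambda so it has a name.
    static CaloricLevel levelOf(Dish dish) {
        if (dish.getCalories() <= 400) return CaloricLevel.DIET;
        else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
        else return CaloricLevel.FAT;
    }

    // Outer groupingBy keys by type; the inner one keys each group by caloric level.
    static Map<Type, Map<CaloricLevel, List<Dish>>> byTypeAndLevel() {
        return MENU.stream().collect(
            groupingBy(Dish::getType, groupingBy(MultilevelGroupingExample::levelOf)));
    }

    public static void main(String[] args) {
        Map<Type, Map<CaloricLevel, List<Dish>>> result = byTypeAndLevel();
        System.out.println(result.get(Type.MEAT).get(CaloricLevel.FAT));  // [pork]
        System.out.println(result.get(Type.MEAT).get(CaloricLevel.DIET)); // [chicken]
        System.out.println(result.get(Type.FISH));                        // {NORMAL=[salmon]}
    }
}
```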

Collecting data in subgroups

The second collector passed to groupingBy can be of any type, so any collector works as the second argument.

// Pass counting() as the second argument to count the dishes of each type
Map<Dish.Type, Long> typesCount = menu.stream().collect(groupingBy(Dish::getType, counting()));
// {MEAT=3, FISH=2...
// Pass maxBy as the second argument to get the highest-calorie dish of each type, wrapped in an Optional
Map<Dish.Type, Optional<Dish>> mostCaloricByType = menu.stream()
  .collect(groupingBy(Dish::getType, maxBy(comparingInt(Dish::getCalories))));
// {FISH=Optional[salmon] ....

Using groupingBy with mapping

Passing mapping as the second argument groups the mapped values instead of the original elements.

Map<Dish.Type, Set<CaloricLevel>> caloricLevelsByType = menu.stream()
  .collect(groupingBy(Dish::getType, mapping(dish -> {
    if (dish.getCalories() <= 400) return CaloricLevel.DIET;
    else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
    else return CaloricLevel.FAT;
  }, toSet() )));
// {OTHER=[DIET, NORMAL]...

Partitioning

partitioningBy splits the elements into two lists, keyed by true and false.

Map<Boolean, List<Dish>> partitionedMenu = menu.stream().collect(partitioningBy(Dish::isVegetarian));
// {false=[pork, beef...
// true=[french fries...

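Advantages of partitioning

Partitioning keeps both lists of stream elements: the ones for which the predicate is true and the ones for which it is false.
Further logic can therefore be applied to all the data, for example by passing a second collector.
Map<Boolean, Map<Dish.Type, List<Dish>>> vegetarianDishesByType = menu.stream().collect(partitioningBy(
  Dish::isVegetarian, groupingBy(Dish::getType)));
// {false={FISH=[prawns, salmon...MEAT=[...
// true={OTHER=[french fries...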
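
Here is a minimal runnable sketch of partitioningBy with a groupingBy downstream collector. PartitioningExample, the simplified Dish, and the sample dishes are assumptions for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.partitioningBy;

class PartitioningExample {
    enum Type { MEAT, FISH, OTHER }

    // Hypothetical simplified Dish; toString returns the name for readable output.
    static final class Dish {
        private final String name;
        private final boolean vegetarian;
        private final Type type;
        Dish(String name, boolean vegetarian, Type type) {
            this.name = name; this.vegetarian = vegetarian; this.type = type;
        }
        boolean isVegetarian() { return vegetarian; }
        Type getType() { return type; }
        @Override public String toString() { return name; }
    }

    // Made-up sample data.
    static final List<Dish> MENU = Arrays.asList(
        new Dish("pork", false, Type.MEAT),
        new Dish("salmon", false, Type.FISH),
        new Dish("rice", true, Type.OTHER),
        new Dish("pizza", true, Type.OTHER));

    // First split into vegetarian / non-vegetarian, then group each half by type.
    static Map<Boolean, Map<Type, List<Dish>>> vegetarianDishesByType() {
        return MENU.stream().collect(
            partitioningBy(Dish::isVegetarian, groupingBy(Dish::getType)));
    }

    public static void main(String[] args) {
        Map<Boolean, Map<Type, List<Dish>>> result = vegetarianDishesByType();
        System.out.println(result.get(true));                  // {OTHER=[rice, pizza]}
        System.out.println(result.get(false).get(Type.FISH));  // [salmon]
        System.out.println(result.get(false).get(Type.MEAT));  // [pork]
    }
}
```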

Partitioning numbers into prime and nonprime

// Returns true if candidate is not divisible by any integer in the stream
public boolean isPrime(int candidate) {
  return IntStream.range(2, candidate).noneMatch(i -> candidate % i == 0);
}
// Optimization: test only divisors up to the square root of the candidate
public boolean isPrime(int candidate) {
  int candidateRoot = (int) Math.sqrt((double)candidate);
  return IntStream.rangeClosed(2, candidateRoot).noneMatch(i -> candidate % i == 0);
}
// The partitioningBy collector splits the numbers into primes and nonprimes.
public Map<Boolean, List<Integer>> partitionPrimes(int n) {
  return IntStream.rangeClosed(2, n).boxed().collect(partitioningBy(candidate -> isPrime(candidate)));
}
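
Put together as a self-contained program, the prime partitioning above could be sketched as follows. The class name PrimePartitionExample and the choice of n = 10 are mine; the logic is the square-root version of isPrime shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimePartitionExample {
    // True when no integer in [2, sqrt(candidate)] divides candidate.
    static boolean isPrime(int candidate) {
        int candidateRoot = (int) Math.sqrt((double) candidate);
        return IntStream.rangeClosed(2, candidateRoot).noneMatch(i -> candidate % i == 0);
    }

    // Splits the integers 2..n into primes (key true) and nonprimes (key false).
    static Map<Boolean, List<Integer>> partitionPrimes(int n) {
        return IntStream.rangeClosed(2, n).boxed()
            .collect(Collectors.partitioningBy(PrimePartitionExample::isPrime));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> result = partitionPrimes(10);
        System.out.println(result.get(true));  // [2, 3, 5, 7]
        System.out.println(result.get(false)); // [4, 6, 8, 9, 10]
    }
}
```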

The Collector interface

The interface declares five methods, as follows.

// T: type of the stream elements, A: type of the accumulator, R: type of the result of the collect operation
public interface Collector<T, A, R> {
  Supplier<A> supplier();
  BiConsumer<A, T> accumulator();
  Function<A, R> finisher();
  BinaryOperator<A> combiner();
  Set<Characteristics> characteristics();
}

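The supplier method

Returns a function that creates a new, empty result container.

// Creates the container before any data is added.
public Supplier<List<T>> supplier() {
  return () -> new ArrayList<T>();
}
// The same with a constructor reference
public Supplier<List<T>> supplier() {
  return ArrayList::new;
}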

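The accumulator method

Returns the function that performs the reduction. It receives two arguments, the list accumulated so far and the current stream element, and adds the element to the list.
// Adds the current item to the accumulated list.
public BiConsumer<List<T>, T> accumulator() {
  return (list, item) -> list.add(item);
}
// The same with a method reference
public BiConsumer<List<T>, T> accumulator() {
  return List::add;
}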

The finisher method

Returns the function that is called once the stream has been traversed, to transform the accumulator object into the final result.
public Function<List<T>, List<T>> finisher() {
  return Function.identity();
}

Logical steps of the sequential reduction process

supplier (create the container) -> accumulator (add an element) -> repeat accumulator while elements remain -> call finisher -> done
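
The steps above can be sketched by calling a collector's functions by hand. collectSequentially is a hypothetical helper written for this post; it only mimics the logical order of a sequential collect and is not how the JDK implements it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SequentialReductionSketch {
    // Hypothetical helper: supplier -> accumulator (once per element) -> finisher.
    static <T, A, R> R collectSequentially(Collector<T, A, R> collector,
                                           Iterable<? extends T> items) {
        A container = collector.supplier().get();              // 1. create the container
        for (T item : items) {
            collector.accumulator().accept(container, item);   // 2. add each element
        }
        return collector.finisher().apply(container);          // 3. convert to the result
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("pork", "beef", "rice");
        // joining has a non-identity finisher: it turns its internal container into a String.
        System.out.println(collectSequentially(Collectors.joining(", "), names)); // pork, beef, rice
    }
}
```

Using joining here rather than toList makes the finisher step visible, because the accumulator type and the result type differ.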

Merging two result containers: the combiner method

1. Used when subparts of the stream are processed in parallel.
2. In a parallel run, it is called before finisher.
public BinaryOperator<List<T>> combiner() {
  return (list1, list2) -> {
    list1.addAll(list2);
    return list1;
  };
}
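
To make the role of combiner concrete, the sketch below splits a list in two, accumulates each half into its own container, and merges them before calling finisher. collectInTwoParts and accumulate are hypothetical helpers; they run on a single thread and only imitate the order of operations of a parallel reduction, which in the real stream library is driven by the fork/join framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CombinerSketch {
    // Hypothetical helper: accumulate two halves separately, then combine, then finish.
    static <T, A, R> R collectInTwoParts(Collector<T, A, R> collector, List<T> items) {
        int middle = items.size() / 2;
        A left = accumulate(collector, items.subList(0, middle));
        A right = accumulate(collector, items.subList(middle, items.size()));
        A merged = collector.combiner().apply(left, right);   // merge the partial containers
        return collector.finisher().apply(merged);            // finisher runs after combining
    }

    // Fills a fresh container with the elements of one subpart.
    private static <T, A> A accumulate(Collector<T, A, ?> collector, List<T> part) {
        A container = collector.supplier().get();
        part.forEach(item -> collector.accumulator().accept(container, item));
        return container;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(collectInTwoParts(Collectors.<Integer>toList(), numbers)); // [1, 2, 3, 4, 5]
    }
}
```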

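The characteristics method

Introduction
Returns an immutable set of Characteristics that defines the behavior of the collector.
It provides hints about whether the stream can be reduced in parallel and which optimizations are valid when doing so.

Values
UNORDERED: The result of the reduction is not affected by the order in which the stream elements are traversed and accumulated.
CONCURRENT: The accumulator function can be called concurrently from multiple threads, so the collector can perform a parallel reduction of the stream. If the collector is not also flagged as UNORDERED, it can perform a parallel reduction only when the data source is unordered.
IDENTITY_FINISH: The function returned by finisher is simply the identity, so applying it can be skipped. The accumulator object is used directly as the final result of the reduction, which also means the accumulator A can safely be cast to the result R.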

Putting it together

The methods above are enough to implement a ToListCollector.

import java.util.*;
import java.util.function.*;
import java.util.stream.Collector;
import static java.util.stream.Collector.Characteristics.*;

public class ToListCollector<T> implements Collector<T, List<T>, List<T>> {
  @Override
  public Supplier<List<T>> supplier() {
    return ArrayList::new; // creates the container
  }
  @Override
  public BiConsumer<List<T>, T> accumulator() {
    return List::add; // accumulates each traversed item
  }
  @Override
  public Function<List<T>, List<T>> finisher() {
    return Function.identity(); // identity function
  }
  @Override
  public BinaryOperator<List<T>> combiner() {
    return (list1, list2) -> {
      list1.addAll(list2); // merges a substream's result by adding the second list to the first accumulator
      return list1;
    };
  }
  @Override
  public Set<Characteristics> characteristics() {
    return Collections.unmodifiableSet(EnumSet.of(IDENTITY_FINISH, CONCURRENT)); // sets the collector's flags
  }
}

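Collecting without implementing Collector

The supplier, accumulator, and combiner functions can be passed directly as arguments to collect. The result is the same, but the code is less readable.
Implementing a custom Collector is somewhat better for reuse and readability.
List<Dish> dishes = menuStream.collect(
  ArrayList::new, // creation (supplier)
  List::add,  // accumulation (accumulator)
  List::addAll); // merging (combiner)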
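
As a quick check that the three-argument collect produces the same list as Collectors.toList(), here is a small sketch. ThreeArgCollectExample and the sample strings are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ThreeArgCollectExample {
    // Supplier, accumulator and combiner passed directly; no Collector object involved.
    static List<String> withThreeArgCollect() {
        return Stream.of("pork", "beef", "rice")
            .collect(ArrayList::new, List::add, List::addAll);
    }

    // The same collection expressed with the predefined collector.
    static List<String> withToList() {
        return Stream.of("pork", "beef", "rice").collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withThreeArgCollect());                      // [pork, beef, rice]
        System.out.println(withThreeArgCollect().equals(withToList())); // true
    }
}
```

Note that this overload takes no finisher and no characteristics, so it only fits cases where the container itself is the result.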

Improving performance with a custom collector

// The following partitions the numbers into primes and nonprimes and puts them in a Map.
public Map<Boolean, List<Integer>> partitionPrimes(int n) {
  return IntStream.rangeClosed(2, n).boxed().collect(partitioningBy(candidate -> isPrime(candidate)));
}
// takeWhile acts like a break: once the predicate fails, no further primes are tested, which improves performance.
public boolean isPrime(List<Integer> primes, int candidate) {
  int candidateRoot = (int) Math.sqrt((double)candidate);
  return primes.stream()
    .takeWhile(i -> i <= candidateRoot)
    .noneMatch(i -> candidate % i == 0);
}
// Stream.takeWhile is available only from Java 9, so on Java 8 you have to write it yourself.
public static <A> List<A> takeWhile(List<A> list, Predicate<A> p) {
  int i = 0;
  for (A item : list) {
    if (!p.test(item)) { // check whether the item satisfies the predicate
      return list.subList(0, i); // on the first failure, return the sublist of items before it
    }
    i++;
  }
  return list;
}
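
To see the early exit of the hand-written takeWhile, the sketch below counts how many times the predicate is evaluated. TakeWhileExample and countTests are names I made up for this check; the takeWhile method is the Java 8 helper shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

class TakeWhileExample {
    // Java 8 helper: returns the longest prefix whose items all satisfy the predicate.
    static <A> List<A> takeWhile(List<A> list, Predicate<A> p) {
        int i = 0;
        for (A item : list) {
            if (!p.test(item)) {
                return list.subList(0, i); // stop at the first item that fails
            }
            i++;
        }
        return list;
    }

    // Hypothetical helper: counts predicate evaluations to show where the scan stops.
    static int countTests(List<Integer> sortedPrimes, int limit) {
        AtomicInteger tests = new AtomicInteger();
        takeWhile(sortedPrimes, prime -> {
            tests.incrementAndGet();
            return prime <= limit;
        });
        return tests.get();
    }

    public static void main(String[] args) {
        List<Integer> primes = Arrays.asList(2, 3, 5, 7, 11);
        System.out.println(takeWhile(primes, prime -> prime <= 5)); // [2, 3, 5]
        System.out.println(countTests(primes, 5));                  // 4 (stops at 7, never tests 11)
    }
}
```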

Building the custom collector

Defining the Collector class signature

public class PrimeNumbersCollector implements Collector<Integer, Map<Boolean, List<Integer>>,
  Map<Boolean, List<Integer>>>

Implementing the reduction

// supplier implementation
public Supplier<Map<Boolean, List<Integer>>> supplier() {
  return () -> new HashMap<Boolean, List<Integer>>() {{
    put(true, new ArrayList<Integer>());
    put(false, new ArrayList<Integer>());
  }};
}
// accumulator implementation
public BiConsumer<Map<Boolean, List<Integer>>, Integer> accumulator() {
  return (Map<Boolean, List<Integer>> acc, Integer candidate) -> {
    acc.get( isPrime(acc.get(true), candidate) ) // pick the prime or nonprime list depending on the result of isPrime
      .add(candidate); // add the candidate to the matching list
  };
}

// combiner: only needed if the collector is run in parallel
public BinaryOperator<Map<Boolean, List<Integer>>> combiner() {
  return (Map<Boolean, List<Integer>> map1, Map<Boolean, List<Integer>> map2) -> {
    map1.get(true).addAll(map2.get(true));
    map1.get(false).addAll(map2.get(false));
    return map1;
  };
}

// No transformation is needed, so the finisher is just the identity function
public Function<Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> finisher() {
  return Function.identity();
}

// An example of the custom collection logic, passed directly to the three-argument collect.
public Map<Boolean, List<Integer>> partitionPrimesWithCustomCollector(int n) {
  return IntStream.rangeClosed(2, n).boxed().collect(
    () -> new HashMap<Boolean, List<Integer>>() {{ // creation (supplier)
      put(true, new ArrayList<Integer>());
      put(false, new ArrayList<Integer>());
    }},
    (acc, candidate) -> { // accumulation (accumulator)
      acc.get( isPrime(acc.get(true), candidate) )
        .add(candidate);
    },
    (map1, map2) -> { // merging (combiner)
      map1.get(true).addAll(map2.get(true));
      map1.get(false).addAll(map2.get(false));
    });
}
// The downside is reduced readability.
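
For comparison, the pieces above can be assembled into one class and run. This is a sketch under a few assumptions: it needs Java 9 or later for Stream.takeWhile, it is only used on a sequential stream, and the IDENTITY_FINISH flag and the static partitionPrimes helper are my choices for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class PrimeNumbersCollector
        implements Collector<Integer, Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> {

    // Tests the candidate only against primes found so far, up to its square root.
    static boolean isPrime(List<Integer> primes, int candidate) {
        int candidateRoot = (int) Math.sqrt((double) candidate);
        return primes.stream()
            .takeWhile(prime -> prime <= candidateRoot)
            .noneMatch(prime -> candidate % prime == 0);
    }

    @Override
    public Supplier<Map<Boolean, List<Integer>>> supplier() {
        return () -> {
            Map<Boolean, List<Integer>> map = new HashMap<>();
            map.put(true, new ArrayList<>());
            map.put(false, new ArrayList<>());
            return map;
        };
    }

    @Override
    public BiConsumer<Map<Boolean, List<Integer>>, Integer> accumulator() {
        // The list of primes collected so far is read from the accumulator itself.
        return (acc, candidate) -> acc.get(isPrime(acc.get(true), candidate)).add(candidate);
    }

    @Override
    public BinaryOperator<Map<Boolean, List<Integer>>> combiner() {
        // Not called on a sequential stream; kept to satisfy the interface.
        return (map1, map2) -> {
            map1.get(true).addAll(map2.get(true));
            map1.get(false).addAll(map2.get(false));
            return map1;
        };
    }

    @Override
    public Function<Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    // Hypothetical convenience method for this example.
    static Map<Boolean, List<Integer>> partitionPrimes(int n) {
        return IntStream.rangeClosed(2, n).boxed().collect(new PrimeNumbersCollector());
    }

    public static void main(String[] args) {
        System.out.println(partitionPrimes(20).get(true)); // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```

Because the accumulator depends on the primes found in earlier elements, the numbers have to arrive in ascending order, which is why this sketch sticks to a sequential stream.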

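Closing thoughts

In short, this chapter took a long time to read. I spent a whole day reading it, and another whole day writing this post while checking that I had really understood what I read. I learned that groupingBy and partitioningBy group and partition the elements of a stream; that collectors cover grouping, partitioning, reducing, and more; that there are collectors for computing the maximum, minimum, average, and so on; and finally that I can build the custom collector I need by implementing the methods defined in the Collector interface.
With so many options available, I should favor readability in team projects and, in personal projects, weigh readability against performance depending on the scale when using the custom collectors and grouping techniques covered here.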
