[Java] Stream Operations / Stream (2)

김태환 · November 3, 2024

What stream operations are there?

Intermediate operations

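An intermediate operation can be applied any number of times. Because every intermediate operation returns a Stream, other intermediate operations can be chained before and after it.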
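A small sketch of chaining. It also shows a related fact worth knowing: intermediate operations are lazy, so nothing runs until a terminal operation is invoked. The class name `LazyChainDemo` and the `log` list are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyChainDemo {
    // Records every element that reaches filter(), in the order it arrives
    static final List<Integer> log = new ArrayList<>();

    static Stream<Integer> buildPipeline() {
        // Each intermediate operation returns a new Stream, so they chain freely
        return Stream.of(5, 1, 4, 2, 3)
                .filter(n -> { log.add(n); return n > 1; })
                .map(n -> n * 10)
                .sorted();
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline();
        System.out.println(log);    // [] : nothing has been processed yet
        List<Integer> result = pipeline.collect(Collectors.toList()); // terminal operation
        System.out.println(log);    // [5, 1, 4, 2, 3]
        System.out.println(result); // [20, 30, 40, 50]
    }
}
```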

Stream<T> distinct()
Removes duplicate elements (as determined by equals()).

```
Stream.of(1, 2, 2, 3, 3, 3).distinct().forEach(System.out::print); // 123
```

Stream<T> filter(Predicate<T> predicate)
Keeps only the elements that satisfy the condition; the rest are excluded.

```
Stream.of(1, 2, 3, 4).filter(n -> n % 2 == 0).forEach(System.out::print); // 24
```

Stream<T> limit(long maxSize)
Truncates the stream to at most maxSize elements.

```
Stream.of(1, 2, 3, 4, 5).limit(3).forEach(System.out::print); // 123
```

Stream<T> skip(long n)
Skips the first n elements of the stream.

```
Stream.of(1, 2, 3, 4, 5).skip(2).forEach(System.out::print); // 345
```
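Combining skip() and limit() gives simple pagination, and limit() also makes an infinite stream finite. A minimal sketch, assuming zero-based page numbers; the `page` helper is hypothetical, not a JDK method.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PagingDemo {
    // Hypothetical helper: returns page `pageIndex` (zero-based) containing up to `pageSize` items
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize) // drop the items of earlier pages
                .limit(pageSize)                   // keep at most one page
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(page(items, 0, 3)); // [1, 2, 3]
        System.out.println(page(items, 2, 3)); // [7]

        // limit() turns an infinite stream into a finite one
        List<Integer> powersOfTwo = Stream.iterate(1, n -> n * 2)
                .limit(5)
                .collect(Collectors.toList());
        System.out.println(powersOfTwo); // [1, 2, 4, 8, 16]
    }
}
```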

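Stream<T> peek(Consumer<T> action)
Performs an action on each element as it passes through. Since it is an intermediate operation, it does not consume the elements (unlike forEach()).
It is handy for debugging, to look at intermediate results.

```
Stream.of(1, 2, 3).peek(System.out::println).collect(Collectors.toList()); // 1 2 3
```

Be careful when pairing peek() with count(): since Java 9, count() may compute the size directly from the source without running the pipeline, so Stream.of(1, 2, 3).peek(System.out::println).count() can print nothing.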

```
Stream<File> fileStream = Stream.of(new File("Ex1.java"), new File("Ex1"),
    new File("Ex1.bak"), new File("Ex2.java"), new File("Ex1.txt"));

fileStream.map(File::getName)                              // Stream<File> -> Stream<String>
        .filter(s -> s.indexOf('.') != -1)                 // exclude names without an extension
        .peek(s -> System.out.printf("filename=%s%n", s))  // print the file name
        .map(s -> s.substring(s.indexOf('.') + 1))         // extract only the extension
        .peek(s -> System.out.printf("extension=%s%n", s)) // print the extension
        .forEach(System.out::println);                     // terminal operation: consumes the stream
```
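If you run the file-name example, the output is interleaved (filename=Ex1.java, extension=java, java, filename=Ex1.bak, ...) rather than grouped by stage, because each element travels through the whole pipeline before the next one starts. The sketch below records the same steps in a list instead of printing them, so the order can be checked; the `trace` method and its `result=` log format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PeekOrderDemo {
    // Runs the file-name pipeline on plain strings and records every step in order
    static List<String> trace(String... names) {
        List<String> log = new ArrayList<>();
        Arrays.stream(names)
                .filter(s -> s.indexOf('.') != -1)         // keep names that have an extension
                .peek(s -> log.add("filename=" + s))
                .map(s -> s.substring(s.indexOf('.') + 1)) // extension only
                .peek(s -> log.add("extension=" + s))
                .forEach(s -> log.add("result=" + s));     // terminal operation
        return log;
    }

    public static void main(String[] args) {
        // Prints, one per line: filename=Ex1.java, extension=java, result=java,
        // filename=Ex1.bak, extension=bak, result=bak
        trace("Ex1.java", "Ex1", "Ex1.bak").forEach(System.out::println);
    }
}
```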

Stream<T> sorted()
Sorts the elements in their natural order.

```
Stream.of(3, 1, 2).sorted().forEach(System.out::print); // 123
```

Stream<T> sorted(Comparator<T> comparator)
Sorts the elements in the order defined by the given Comparator.

```
Stream.of(3, 1, 2).sorted(Comparator.reverseOrder()).forEach(System.out::print); // 321
```
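Comparators can be composed with Comparator.comparingInt() and thenComparing(). A sketch that sorts words by length and breaks ties alphabetically; the word list is arbitrary.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortDemo {
    static List<String> byLengthThenAlpha(Stream<String> words) {
        return words
                .sorted(Comparator.comparingInt(String::length)        // shorter words first
                        .thenComparing(Comparator.naturalOrder()))      // same length: alphabetical
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(byLengthThenAlpha(Stream.of("pear", "fig", "apple", "kiwi", "date")));
        // [fig, date, kiwi, pear, apple]
    }
}
```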

Stream<R> map(Function<T, R> mapper)
Converts each element from T to R.

```
Stream<File> fileStream = Stream.of(new File("Ex1.java"), new File("Ex1"),
    new File("Ex1.bak"), new File("Ex2.java"), new File("Ex1.txt"));

fileStream.map(File::getName)                      // Stream<File> -> Stream<String>
        .filter(s -> s.indexOf('.') != -1)         // exclude names without an extension
        .map(s -> s.substring(s.indexOf('.') + 1)) // extract only the extension, Stream<String> -> Stream<String>
        .map(String::toUpperCase)                  // convert to upper case, Stream<String> -> Stream<String>
        .distinct()                                // remove duplicates
        .forEach(System.out::print);               // JAVABAKTXT
```

DoubleStream mapToDouble(ToDoubleFunction<T> mapper)
Converts each element to a double, producing a DoubleStream.

```
Stream.of(1, 2, 3).mapToDouble(n -> n * 1.5).forEach(System.out::println); // 1.5 3.0 4.5
```

IntStream mapToInt(ToIntFunction<T> mapper)
Converts each element to an int, producing an IntStream.

```
Stream.of("1", "2", "3").mapToInt(Integer::parseInt).forEach(System.out::println); // 1 2 3
```

LongStream mapToLong(ToLongFunction<T> mapper)
Converts each element to a long, producing a LongStream.

```
Stream.of("100", "200").mapToLong(Long::parseLong).forEach(System.out::println); // 100 200
```
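One reason to convert to a primitive stream: IntStream, LongStream, and DoubleStream offer numeric terminal operations such as sum(), average(), and summaryStatistics(), which Stream<Integer> lacks, and they avoid boxing. A sketch; the helper names and input strings are arbitrary.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.OptionalDouble;

class PrimitiveStreamDemo {
    // Stream<String> -> IntStream, then sum() directly (no reduce() needed)
    static int total(String... nums) {
        return Arrays.stream(nums).mapToInt(Integer::parseInt).sum();
    }

    // average() returns OptionalDouble because the stream might be empty
    static OptionalDouble average(String... nums) {
        return Arrays.stream(nums).mapToInt(Integer::parseInt).average();
    }

    // One pass that gathers count, sum, min, average, and max together
    static IntSummaryStatistics stats(String... nums) {
        return Arrays.stream(nums).mapToInt(Integer::parseInt).summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println(total("10", "20", "30"));                 // 60
        System.out.println(average("10", "20", "30").getAsDouble()); // 20.0
        System.out.println(average().isPresent());                   // false
        IntSummaryStatistics s = stats("10", "20", "30");
        System.out.println(s.getMin() + " " + s.getMax() + " " + s.getCount()); // 10 30 3
    }
}
```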

Stream<R> flatMap(Function<T, Stream<R>> mapper)
Transforms each element into a stream and flattens the results into one stream.
With map(), the result is a stream whose elements are themselves streams: one for {1, 2} and one for {3, 4}. To flatten them into the single stream {1, 2, 3, 4}, use flatMap() instead.

```
Stream.of(Arrays.asList(1, 2), Arrays.asList(3, 4))
        .map(List::stream).forEach(System.out::println); // java.util.stream.ReferencePipeline$Head@1b6d3586 java.util.stream.ReferencePipeline$Head@4554617c (hash codes vary)
Stream.of(Arrays.asList(1, 2), Arrays.asList(3, 4))
        .flatMap(List::stream).forEach(System.out::print); // 1234
```
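A more typical use of flatMap() is splitting each line of text into words. A sketch; the sample lines and the `\\s+` separator regex are arbitrary choices.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // one line -> a stream of words
                .map(String::toLowerCase)
                .distinct()                                          // keep the first occurrence of each word
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Java Stream API", "stream map flatMap");
        System.out.println(words(lines)); // [java, stream, api, map, flatmap]
    }
}
```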

DoubleStream flatMapToDouble(Function<T, DoubleStream> mapper)
Transforms each element into a DoubleStream and flattens the results into one DoubleStream.

```
Stream.of(Arrays.asList(1.1, 2.2), Arrays.asList(3.3))
    .flatMapToDouble(list -> list.stream().mapToDouble(Double::doubleValue)).forEach(System.out::println); // 1.1 2.2 3.3
```

IntStream flatMapToInt(Function<T, IntStream> mapper)
Transforms each element into an IntStream and flattens the results into one IntStream.

```
Stream.of(Arrays.asList(1, 2), Arrays.asList(3))
    .flatMapToInt(list -> list.stream().mapToInt(Integer::intValue)).forEach(System.out::println); // 1 2 3
```

LongStream flatMapToLong(Function<T, LongStream> mapper)
Transforms each element into a LongStream and flattens the results into one LongStream.

```
Stream.of(Arrays.asList(100L, 200L), Arrays.asList(300L))
    .flatMapToLong(list -> list.stream().mapToLong(Long::longValue)).forEach(System.out::println); // 100 200 300
```

Terminal operations

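Once a terminal operation has been performed, the stream is closed. Its elements have been consumed, so invoking another operation on the same stream throws an IllegalStateException. A terminal operation returns something other than a Stream: for example a long (count()), a boolean (anyMatch()), an Optional (findFirst()), or nothing at all (forEach()).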
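A sketch of what happens when a closed stream is reused, plus one common workaround: keep a Supplier that builds a fresh stream for each terminal operation. The class and method names are made up for illustration.

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

class ReuseDemo {
    // Returns true if a second terminal operation on the same stream fails
    static boolean secondUseFails() {
        Stream<Integer> stream = Stream.of(1, 2, 3);
        stream.count();                          // first terminal operation closes the stream
        try {
            stream.forEach(System.out::println); // second use of the same stream
            return false;
        } catch (IllegalStateException e) {     // "stream has already been operated upon or closed"
            return true;
        }
    }

    // Workaround: a Supplier creates a new stream every time get() is called
    static long[] countAndSum() {
        Supplier<Stream<Integer>> source = () -> Stream.of(1, 2, 3);
        long count = source.get().count();
        long sum = source.get().mapToLong(Integer::longValue).sum();
        return new long[] {count, sum};
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails()); // true
        long[] r = countAndSum();
        System.out.println(r[0] + " " + r[1]); // 3 6
    }
}
```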

void forEach(Consumer<? super T> action)
Performs the given action on each element. On a parallel stream, the order is not guaranteed.
```
Stream.of(1, 2, 3).forEach(System.out::println); // 1 2 3
```
void forEachOrdered(Consumer<? super T> action)
Preserves encounter order even on a parallel stream; this is the difference from forEach().
```
Stream.of(1, 2, 3).parallel().forEachOrdered(System.out::println); // 1 2 3
```
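A sketch that makes the difference observable by collecting into a thread-safe list instead of printing. forEachOrdered() always yields encounter order; for forEach() on a parallel stream only the set of elements is guaranteed, so the sketch compares a sorted copy. The range 1..1000 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class OrderDemo {
    static List<Integer> withForEach() {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>()); // safe for parallel writes
        IntStream.rangeClosed(1, 1000).parallel().boxed().forEach(out::add);        // order not guaranteed
        return out;
    }

    static List<Integer> withForEachOrdered() {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.rangeClosed(1, 1000).parallel().boxed().forEachOrdered(out::add); // encounter order kept
        return out;
    }

    public static void main(String[] args) {
        List<Integer> expected = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) expected.add(i);

        System.out.println(withForEachOrdered().equals(expected)); // true
        List<Integer> unordered = new ArrayList<>(withForEach());   // order depends on thread scheduling
        Collections.sort(unordered);
        System.out.println(unordered.equals(expected));            // true: same elements
    }
}
```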

long count()
Returns the number of elements in the stream.

```
long count = Stream.of(1, 2, 3).count(); // 3
```

Optional<T> max(Comparator<? super T> comparator)
Returns the largest element according to the Comparator, as an Optional (empty if the stream is empty).
```
int max = Stream.of(1, 3, 2).max(Integer::compare).orElse(-1); // 3
```
Optional<T> min(Comparator<? super T> comparator)
Returns the smallest element according to the Comparator, as an Optional (empty if the stream is empty).
```
int min = Stream.of(1, 3, 2).min(Integer::compare).orElse(-1); // 1
```
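max() and min() work with any Comparator, and on an empty stream they return an empty Optional instead of throwing. A sketch finding the longest word; the `longest` helper is made up for illustration.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

class MaxMinDemo {
    // Longest word by length; an empty Optional if the stream has no elements
    static Optional<String> longest(Stream<String> words) {
        return words.max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(longest(Stream.of("a", "ccc", "bb")));    // Optional[ccc]
        System.out.println(longest(Stream.empty()));                  // Optional.empty
        System.out.println(longest(Stream.empty()).orElse("(none)")); // (none)
    }
}
```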

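Optional<T> findAny()
Returns some element of the stream. The result is an Optional because the stream may be empty (if the chosen element itself is null, a NullPointerException is thrown).

```
Stream.of(1, 2, 3).findAny().ifPresent(System.out::println); // usually 1 on a sequential stream, but any element may be returned
```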

Optional<T> findFirst()
Returns the first element in encounter order. The result is an Optional because the stream may be empty.
```
Stream.of(1, 2, 3).findFirst().ifPresent(System.out::println); // 1
```
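When should you use findAny() vs. findFirst()?
Use findFirst() when you need the first element in encounter order; this is the usual choice for a sequential stream. Use findAny() when any element will do: on a parallel stream, findAny() can return whichever element a thread finds first instead of coordinating to preserve order, so it can be faster.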
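A sketch of the difference on a parallel stream: findFirst() still respects encounter order, while findAny() only promises some matching element. The range and the "multiple of 7" condition are arbitrary.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class FindDemo {
    // First multiple of 7 in 1..10000, even though the search runs in parallel
    static OptionalInt firstMultipleOf7() {
        return IntStream.rangeClosed(1, 10_000).parallel()
                .filter(n -> n % 7 == 0)
                .findFirst(); // always 7: encounter order is respected
    }

    // Some multiple of 7: which one depends on how the work was split across threads
    static OptionalInt anyMultipleOf7() {
        return IntStream.rangeClosed(1, 10_000).parallel()
                .filter(n -> n % 7 == 0)
                .findAny();
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOf7().getAsInt()); // 7
        int any = anyMultipleOf7().getAsInt();
        System.out.println(any % 7 == 0);                  // true, though `any` itself may vary
    }
}
```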

boolean allMatch(Predicate<T> p)
Returns true if every element satisfies the condition.

```
boolean allEven = Stream.of(2, 4, 6).allMatch(n -> n % 2 == 0); // true
```

boolean anyMatch(Predicate<T> p)
Returns true if at least one element satisfies the condition.

```
boolean anyEven = Stream.of(1, 2, 3).anyMatch(n -> n % 2 == 0); // true
```

boolean noneMatch(Predicate<T> p)
Returns true if no element satisfies the condition.
```
boolean noneEven = Stream.of(1, 3, 5).noneMatch(n -> n % 2 == 0); // true
```
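Two properties of the matching operations that are easy to miss: they short-circuit (they stop as soon as the answer is known, so they even work on an infinite stream), and on an empty stream allMatch() and noneMatch() return true while anyMatch() returns false. A sketch; the helper names are made up.

```java
import java.util.stream.Stream;

class MatchDemo {
    // anyMatch stops at the first element > 10,
    // so it terminates even though Stream.iterate() is infinite
    static boolean anyAboveTen() {
        return Stream.iterate(1, n -> n + 1).anyMatch(n -> n > 10);
    }

    // Results on an empty stream, in the order {allMatch, anyMatch, noneMatch}
    static boolean[] onEmpty() {
        return new boolean[] {
                Stream.<Integer>empty().allMatch(n -> n > 0), // true: no element violates the condition
                Stream.<Integer>empty().anyMatch(n -> n > 0), // false: no element satisfies it
                Stream.<Integer>empty().noneMatch(n -> n > 0) // true: no element satisfies it
        };
    }

    public static void main(String[] args) {
        System.out.println(anyAboveTen()); // true
        boolean[] r = onEmpty();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false true
    }
}
```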
Object[] toArray()
Returns all elements of the stream as an Object[].
```
Object[] arr = Stream.of(1, 2, 3).toArray(); // [1, 2, 3]
```
<A> A[] toArray(IntFunction<A[]> generator)
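Returns the elements in an array created by the given generator (e.g. Integer[]::new), so the array has the desired element type.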
```
Integer[] arr = Stream.of(1, 2, 3).toArray(Integer[]::new); // [1, 2, 3]
```
Optional<T> reduce(BinaryOperator<T> accumulator)
Reduces the elements to a single value by combining them two at a time. The result is an Optional because the stream may be empty.
```
int sum = Stream.of(1, 2, 3).reduce((a, b) -> a + b).orElse(0); // 6
```

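T reduce(T identity, BinaryOperator<T> accumulator)
Reduces the elements starting from an initial value (identity). For an empty stream the identity itself is returned, so the result is a plain T rather than an Optional.

```
int sum = Stream.of(1, 2, 3).reduce(0, (a, b) -> a + b); // 6

// Conceptually, the accumulator (a, b) -> a + b is applied like this loop:
int a = 0;                       // starts at the identity; holds the running result
for (int b : List.of(1, 2, 3))   // each stream element in turn
    a = a + b;                   // a = accumulator(a, b); after the loop, a == 6 (same as sum)
```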

<U> U reduce(U identity, BiFunction<U,T,U> accumulator, BinaryOperator<U> combiner)
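Used when the result type U differs from the element type T: the accumulator folds each element into a U, and the combiner merges partial results when the stream is processed in parallel.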
```
int lengthSum = Stream.of("a", "bb", "ccc").reduce(0, (sum, str) -> sum + str.length(), Integer::sum); // 6
```
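A sketch showing that the three-argument reduce() gives the same answer sequentially and in parallel, as long as the combiner is consistent with the accumulator. The word list is arbitrary.

```java
import java.util.List;

class ReduceDemo {
    // Total length of all strings: U = Integer, T = String
    static int totalLength(List<String> words, boolean parallel) {
        return (parallel ? words.parallelStream() : words.stream())
                .reduce(0,
                        (sum, str) -> sum + str.length(), // accumulator: fold one String into a partial sum
                        Integer::sum);                    // combiner: merge two partial sums (parallel case)
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "ccc", "dddd");
        System.out.println(totalLength(words, false)); // 10
        System.out.println(totalLength(words, true));  // 10
        // Often simpler: convert to an IntStream first, then sum()
        System.out.println(words.stream().mapToInt(String::length).sum()); // 10
    }
}
```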

<R, A> R collect(Collector<? super T, A, R> collector)
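Collects the elements into a collection or converts them into some other form of result.
A Collector accumulates the elements (T) into an intermediate container (A) and then converts that container into the final result (R).

```
List<Integer> list = Stream.of(1, 2, 3).collect(Collectors.toList()); // [1, 2, 3]
```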
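Two other frequently used collectors from the Collectors class are joining() and toMap(). A sketch; the names are made up, and passing TreeMap::new is a choice made here only so the printed map has sorted keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class CollectorsDemo {
    // Join elements into one String with a delimiter, prefix, and suffix
    static String joined(List<String> names) {
        return names.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // Map each name to its length. Without the merge function, the duplicate
    // key "kim" would make toMap() throw IllegalStateException.
    static Map<String, Integer> lengths(List<String> names) {
        return names.stream().collect(Collectors.toMap(
                name -> name,   // key
                String::length, // value
                (a, b) -> a,    // on a duplicate key, keep the first value
                TreeMap::new)); // map implementation to create
    }

    public static void main(String[] args) {
        List<String> names = List.of("kim", "lee", "park", "kim");
        System.out.println(joined(names));  // [kim, lee, park, kim]
        System.out.println(lengths(names)); // {kim=3, lee=3, park=4}
    }
}
```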

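collect() can also partition the stream into two groups (by a true/false condition) and compute a result for each group.

```
// Each statement builds a fresh stream with Stream.of(stuArr), because a stream cannot be reused
// after a terminal operation. Assumes static imports of Collectors.* and Comparator.comparingInt.
Map<Boolean, Long> stuNumBySex = Stream.of(stuArr)
  .collect(Collectors.partitioningBy(Student::isMale, Collectors.counting())); // partition + statistics
System.out.println("Number of male students: " + stuNumBySex.get(true));    // Number of male students: 8
System.out.println("Number of female students: " + stuNumBySex.get(false)); // Number of female students: 10

Map<Boolean, Optional<Student>> topScoreBySex = Stream.of(stuArr)
  .collect(partitioningBy(Student::isMale, maxBy(comparingInt(Student::getScore)))); // partition + statistics
System.out.println("Top male student: " + topScoreBySex.get(true));    // Top male student: Optional[<male student info>]
System.out.println("Top female student: " + topScoreBySex.get(false)); // Top female student: Optional[<female student info>]

Map<Boolean, Map<Boolean, List<Student>>> failedStuBySex = Stream.of(stuArr)
  .collect(partitioningBy(Student::isMale,            // 1. partition by sex (male/female)
           partitioningBy(s -> s.getScore() < 150))); // 2. partition by score (fail/pass)
List<Student> failedMaleStu = failedStuBySex.get(true).get(true);    // male students who failed
List<Student> failedFemaleStu = failedStuBySex.get(false).get(true); // female students who failed
```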
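The partitioning snippets depend on the book's Student class and data, which are not shown here. A self-contained sketch of the same counting() and maxBy() patterns, using a hypothetical minimal Student class (name, sex flag, score) and made-up sample data:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class PartitionDemo {
    // Hypothetical minimal Student: only the fields this sketch needs
    static final class Student {
        private final String name;
        private final boolean male;
        private final int score;

        Student(String name, boolean male, int score) {
            this.name = name;
            this.male = male;
            this.score = score;
        }

        String getName() { return name; }
        boolean isMale() { return male; }
        int getScore() { return score; }
    }

    // Made-up sample data
    static final List<Student> STUDENTS = List.of(
            new Student("A", true, 120), new Student("B", false, 280),
            new Student("C", true, 210), new Student("D", false, 90),
            new Student("E", false, 160));

    // true -> male, false -> female; value = number of students in that partition
    static Map<Boolean, Long> countBySex() {
        return STUDENTS.stream()
                .collect(Collectors.partitioningBy(Student::isMale, Collectors.counting()));
    }

    // Highest-scoring student per partition (Optional, since a partition could be empty)
    static Map<Boolean, Optional<Student>> topBySex() {
        return STUDENTS.stream()
                .collect(Collectors.partitioningBy(Student::isMale,
                        Collectors.maxBy(Comparator.comparingInt(Student::getScore))));
    }

    public static void main(String[] args) {
        Map<Boolean, Long> counts = countBySex();
        System.out.println("male=" + counts.get(true) + ", female=" + counts.get(false)); // male=2, female=3
        System.out.println(topBySex().get(true).map(Student::getName).orElse("-"));      // C
        System.out.println(topBySex().get(false).map(Student::getName).orElse("-"));     // B
    }
}
```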

collect() can also group the elements by a key and compute a result for each group.

```
// Each statement builds a fresh stream from stuArr, since a stream cannot be reused.
Map<Integer, List<Student>> stuByBan = Stream.of(stuArr)
  .collect(groupingBy(Student::getBan, toList())); // group students by class (ban)

Map<Integer, Map<Integer, List<Student>>> stuByHakAndBan = Stream.of(stuArr)
  .collect(groupingBy(Student::getHak,              // 1. group by grade (hak)
           groupingBy(Student::getBan)));           // 2. then by class (ban)

Map<String, Set<Student3.Level>> stuByScoreGroup = Stream.of(stuArr)
  .collect(groupingBy(s -> s.getHak() + "-" + s.getBan(),
      mapping(s -> {
          if(s.getScore() >= 200) return Student3.Level.HIGH;
          else if(s.getScore() >= 100) return Student3.Level.MID;
          else return Student3.Level.LOW;
      }, toSet())
  ));
// [grade-class] -> set of score levels that occur in that class
// e.g. [1-1][HIGH], [1-2][HIGH, MID]
```
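A self-contained sketch of groupingBy() on plain strings, so it runs without the book's Student data. It groups words by length and by first letter, and uses mapping() in the same way as the score-level example; the word list is made up, and TreeMap::new is passed only to get sorted, predictable keys.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingDemo {
    // word length -> words of that length, in encounter order
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    // first letter -> how many words start with it
    static Map<Character, Long> countByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new, Collectors.counting()));
    }

    // first letter -> set of word lengths, transforming each element with mapping()
    static Map<Character, Set<Integer>> lengthsByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.charAt(0), TreeMap::new,
                        Collectors.mapping(String::length, Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry", "cherry", "fig");
        System.out.println(byLength(words));       // {3=[fig], 5=[apple], 6=[banana, cherry], 7=[avocado], 9=[blueberry]}
        System.out.println(countByInitial(words)); // {a=2, b=2, c=1, f=1}
    }
}
```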

<R> R collect(Supplier<R> supplier, BiConsumer<R,? super T> accumulator, BiConsumer<R,R> combiner)
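Performs a mutable reduction: supplier creates the result container, accumulator adds each element to it, and combiner merges two containers (needed when the stream runs in parallel).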
```
Set<Integer> set = Stream.of(1, 2, 3).collect(HashSet::new, HashSet::add, HashSet::addAll); // [1, 2, 3]
```
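The same three roles with other containers. The StringBuilder version follows the pattern shown in the Stream.collect() Javadoc; the helper names are made up. On an ordered stream, collect() keeps encounter order even when run in parallel.

```java
import java.util.ArrayList;
import java.util.stream.Stream;

class MutableCollectDemo {
    // Concatenate strings into one StringBuilder
    static String concat(Stream<String> strings) {
        return strings.collect(StringBuilder::new,    // supplier: a new empty container
                               StringBuilder::append, // accumulator: add one element
                               StringBuilder::append) // combiner: merge two partial builders
                      .toString();
    }

    // The same roles with an ArrayList
    static ArrayList<Integer> toArrayList(Stream<Integer> nums) {
        return nums.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        System.out.println(concat(Stream.of("a", "b", "c").parallel())); // abc
        System.out.println(toArrayList(Stream.of(3, 1, 2)));            // [3, 1, 2]
    }
}
```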
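Difference between reduce() and collect()
reduce() reduces the whole stream to a single value by repeatedly combining immutable values. collect() performs a mutable reduction: it accumulates the elements into a container such as a List or Map, which is what makes per-group reductions like groupingBy() and partitioningBy() possible.
Both can run in parallel (the three-argument reduce() also takes a combiner). Likewise, the Collector interface used by collect() defines a combiner() method that merges results computed in parallel. You rarely need to implement Collector yourself, because the Collectors class provides ready-made implementations.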

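Conclusion

Intermediate operations return a stream.

Terminal operations can return types other than Stream, and they consume the stream's elements to produce their result.

Therefore, intermediate operations can be chained, but a terminal operation can be performed only once, and nothing can be chained after it.

Also keep in mind whether a stream is parallel: operations such as forEach() and findAny() may behave differently on a parallel stream.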

References and sources
Java의 정석

남궁성의 정석코딩 YouTube <[자바의 정석 - 기초편] ch14-26~29 스트림의 중간연산(1)> https://www.youtube.com/watch?v=G2lPQB42GL8&t=5s

남궁성의 정석코딩 YouTube <[자바의 정석 - 기초편] ch14-30~34 스트림의 중간연산(2)> https://www.youtube.com/watch?v=sEa4RQGG0HU

남궁성의 정석코딩 YouTube <[자바의 정석 - 기초편] ch14-40~44 스트림의 최종연산에 대한 강의입니다.> https://www.youtube.com/watch?v=M_4a4tUCSPU&t=1208s

남궁성의 정석코딩 YouTube <[자바의 정석 - 기초편] ch14-45~49 collect()와 Collectors에 대한 강의입니다.> https://www.youtube.com/watch?v=u9KOajCP3D8

남궁성의 정석코딩 YouTube <[자바의 정석 - 기초편] ch14-50~55 스트림의 그룹화와 분할에 대한 강의입니다.> https://www.youtube.com/watch?v=VUh_t_j9qjE
