[Java] TreeSet과 HashSet Add 파헤치기

하비·2025년 7월 31일

HashSet TreeSet

Java

목록 보기

10/13

삼성 S/W 알고리즘 특강 사전문제를 풀다가 TreeSet을 써야만 하는 문제가 있었다.

"TreeSet을 add할 때, 같은 객체임을 구분하는게 무엇인지 아는가?"

답은 compareTo를 통해서 구분해서 넣어준다.

이 부분을 알고 있다면 이 글을 읽을 필요가 없다.
만약에, 습관적으로 equals로 구분하지 않나? 라고 생각한다면 이 글이 도움이 될 것이다.

TreeSet의 원소에 해당하는 객체에 equals와 hashcode를 오버라이딩 해놓고, 구분해줬는데 왜 add할 때 같은 원소 취급하지? 라는 실수가 있었다.

지금 생각해보면, 당연한걸 왜 헷갈렸을까 싶다.
아무래도 equals, hash code와 TreeSet이 어떻게 동작하는지 제대로 된 지식이 없었기 때문인 것 같다.

그래서 equals, hash code, TreeSet 동작 구조를 파헤쳐 보려고 한다.

1. 문제가 됐던 코드

class Node{
	...
		
	@Override
	public int hashCode() {
		return Objects.hash(genre, id);
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		Movie other = (Movie) obj;
		return genre == other.genre && id == other.id;
	}
	
	@Override
	public int compareTo(Movie o) {
		if(o.totalRating==this.totalRating) return Integer.compare(o.registerTime, this.registerTime);
		return Integer.compare(o.totalRating, this.totalRating);
	}
}

2. TreeSet

TreeSet의 원소를 다음과 같이 했다고 가정해보자

class Node{
	int id, genre, totalRating;

	public Node(int id, int genre, int totalRating) {
		super();
		this.id = id;
		this.genre = genre;
		this.totalRating = totalRating;
	}
}

이때, TreeSet에 Node(555, 4, 5), Node(111, 3, 5), Node(111, 3, 5) 객체를 넣는다고 해보자
이때는 Node들을 다 다른 객체로 인식하기 때문에 3개가 다 추가가 될 것이다.

내용이 같은 객체를 같게 인식해주게 하려면 어떻게 해야할까?

나는 버릇처럼 equals와 hashCode 함수를 오버라이딩해서 구분해주려고 했다.
이게 문제가 되었다.
만약에 코드를 다음과 같이 썼다면, 결과가 어떻게 될까?

class Node implements Comparable<Node>{
	...
		
	@Override
	public int hashCode() {
		return Objects.hash(genre, id);
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		Movie other = (Movie) obj;
		return genre == other.genre && id == other.id;
	}
	
	@Override
	public int compareTo(Movie o) {
		return Integer.compare(o.totalRating, this.totalRating);
	}
}

답은 Node(555, 4, 5) 1개만 TreeSet에 들어가고 나머지는 false를 출력하게 된다.
이 이유는 TreeSet이 어떻게 add 하는 지 방식을 보면 알 수 있다.

1) TreeSet의 add

TreeSet은 생성자를 불러올 때 아무 것도 인자로 안줬다면 TreeMap으로 객체를 생성한다.
그래서 결국 TreeSet.add는 TreeMap의 put 함수를 불러온다.

TreeSet 생성자

TreeSet(NavigableMap<E,Object> m) {
    this.m = m;
}

public TreeSet() {
    this(new TreeMap<>());
}

public TreeSet(Comparator<? super E> comparator) {
    this(new TreeMap<>(comparator));
}

TreeSet.add

public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}

TreeMap.put

public V put(K key, V value) {
    return put(key, value, true);
}

private V put(K key, V value, boolean replaceOld) {
    Entry<K,V> t = root;
    if (t == null) {
        addEntryToEmptyMap(key, value);
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else {
                V oldValue = t.value;
                if (replaceOld || oldValue == null) {
                    t.value = value;
                }
                return oldValue;
            }
        } while (t != null);
    } else {
        Objects.requireNonNull(key);
        @SuppressWarnings("unchecked")
        Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else {
                V oldValue = t.value;
                if (replaceOld || oldValue == null) {
                    t.value = value;
                }
                return oldValue;
            }
        } while (t != null);
    }
    addEntry(key, value, parent, cmp < 0);
    return null;
}

이런 식의 구조로 되어 있다.

public V put(K key, V value) {
    return put(key, value, true);
}

private V put(K key, V value, boolean replaceOld) {
    Entry<K,V> t = root;
    if (t == null) {
        addEntryToEmptyMap(key, value);
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else {
                V oldValue = t.value;
                if (replaceOld || oldValue == null) {
                    t.value = value;
                }
                return oldValue;
            }
        } while (t != null);
    } else {
        Objects.requireNonNull(key);
        @SuppressWarnings("unchecked")
        Comparable<? super K> k = (Comparable<? super K>) key;
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else {
                V oldValue = t.value;
                if (replaceOld || oldValue == null) {
                    t.value = value;
                }
                return oldValue;
            }
        } while (t != null);
    }
    addEntry(key, value, parent, cmp < 0);
    return null;
}

comparator에서 두 객체(추가할 값과 기존 값) 비교 후 1번째가 2번째보다 적으면 음수, 같으면 0, 크면 양수를 리턴한다. 그리고 그 리턴 값이 양수나 음수인 경우 추가할 값의 적절한 위치까지 가게 된다. 그리고 그 위치에 addEntry 함수를 통해 값을 추가하고, null을 반환한다.
만약 기존 값에 추가할 값과 같은 값이 있어서 comparator가 0을 리턴할 경우, 새로 추가할 값을 해당하는 키에 갱신해준다. 그리고 원래 있던 값을 반환한다.

public boolean add(E e) {
    return m.put(e, PRESENT)==null;
}

TreeSet add 함수에서는 null 값이 반환될 경우, add가 되었다는 뜻으로 true 값을 리턴하게 된다. 만약 null 값이 아닌 값이 반환될 경우, false 값을 반환한다.

TreeSet의 경우엔 value 값이 없기에 PRESENT를 넣어준다. PRESENT는 더미 Object다.
즉, 만약 같은 원소를 넣어줄 경우에 PRESENT를 반환하게 되고, 이것은 null이 아니기 때문에 false를 반환하는 것이다.
결국, 값을 넣어줄 때, Comparator만 관련 있다는 걸 생각하면 된다.

다시 문제의 코드로 넘어가보면,

class Node implements Comparable<Node>{
	...
		
	@Override
	public int hashCode() {
		return Objects.hash(genre, id);
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		Movie other = (Movie) obj;
		return genre == other.genre && id == other.id;
	}
	
	@Override
	public int compareTo(Movie o) {
		return Integer.compare(o.totalRating, this.totalRating);
	}
}

결국 compareTo를 통해서 totalRating 값만 비교해주기 때문에, totalRating이 같은 값은 다 같은 원소로 인식하게 된다. 그래서 TreeSet에 Node(555, 4, 5), Node(111, 3, 5), Node(111, 3, 5)를 넣었을 경우, 처음에 넣는 Node(555, 4, 5) 원소 1개만 추가가 되는 것이다.

고친 코드

class Node implements Comparable<Node>{
	...
	
	@Override
	public int compareTo(Node o) {
		if(o.totalRating==this.totalRating) return Integer.compare(o.id, this.id);
		return Integer.compare(o.totalRating, this.totalRating);
	}
}

totalRating 외에 id가 달라도 다른 원소로 구분하고 싶다면, 이렇게 코드를 고쳐야 한다.

remove도 마찬가지로 comparator의 compare을 통해서 비교 후 삭제한다.

그럼 equals와 HashCode는 언제 쓸까?

결론적으로 HashSet, HashMap 등의 자료구조를 쓸 때, 써줘야 한다.
TreeSet은 equals와 hashcode를 사용하는 부분이 없다.
그럼 이 기회에 HashSet도 알아보자

2. HashSet

HashSet도 TreeSet처럼 HashMap을 가져다가 쓴다.

HashSet 생성자

public HashSet() {
    map = new HashMap<>();
}

HashSet.add

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

HashMap.put
HashMap의 put을 보면 equals를 사용해서 비교하는걸 볼 수 있다.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

나는 HashMap도 같이 써야했기에 결론적으로 equals와 hashCode도 써줘야 하는 게 맞았다.
하지만 TreeSet도 원소를 넣을 때, equals로 구분한다고 단순하게 생각해, 원소를 추가하는 부분에서 틀리게 되었다.
이번 기회에 정리를 통해서 제대로 구분해서 알아보게 되었다. 나처럼 잠시 헷갈렸던 분들에게 도움이 되길.

하비

멋진 개발자가 될테야

이전 포스트

[Java] Java & Spring boot 버전 뭘 선택해야 할까?

다음 포스트