[Java] HashMap Method, 해싱 충돌 살펴보기

알고리즘/자료구조

[Java] HashMap Method, 해싱 충돌 살펴보기

leejinwoo1126 2023. 8. 19. 15:45

HashMap

- 내부적으로 hash 값을 인덱스로 하여 자료 구조 관리

- Key 중복 x , Value 중복 허용

- hash 값의 중복이 발생하는 경우 LinkedList, Tree 자료 구조 사용하여 해결하고 있음

Method 살펴보기

put()

- key 로 만든 hash 값에 해당하는 bucket이 없는 경우 신규 Node 저장, 있는 경우 value를 덮어씌움

- 동일한 hash 값이 있는 경우 ThreadHold 값에 따라 LinkedList, Tree 로 자료 연결되어 관리

참고. HashMap put() 살펴보기

예시

테스트 파일에서 공통으로 사용할 Map의 데이터를 초기화

public class HashMapTest {
    Map<String, Integer> map = new HashMap<>();

    @BeforeEach
    void setUp() {
        map.put("A", 65);
        map.put("B", 66);
        map.put("C", 67);
        map.put("D", 68);
        map.put("Z", 90);
    }
	
    // .. 
}

putIfAbsent()

key mapping 되는

- Node 없는 경우 value를 넣고 null을 반환한다 ( absent : 없는 )

- Node 있는 경우 기존 value를 반환한다

//java.util.Map 인터페이스
default V putIfAbsent(K key, V value) {
    V v = this.get(key);
    if (v == null) {
        v = this.put(key, value);
    }

    return v;
}


//java.util.HashMap 클래스 
public V putIfAbsent(K key, V value) {
    return this.putVal(hash(key), key, value, true, true); // 네번재 인자가 true 
}

예시

@DisplayName("key가 존재하지 않을 때만 value를 넣는다")
@Test
void putIfAbsent() { 
    //given //when
    map.putIfAbsent("A", 0);
    map.putIfAbsent("X", 88);

    //then
    assertThat(map.get("A")).isEqualTo(65);
    assertThat(map.get("X")).isEqualTo(88);
    assertThat(map).containsEntry("A", 65)
                   .containsEntry("B", 66)
                   .containsEntry("C", 67)
                   .containsEntry("D", 68)
                   .containsEntry("X", 88)
                   .containsEntry("Z", 90);
                   
    assertThat(map.putIfAbsent("Y", 89)).isNull(); // key 값이 존재하지 않는 경우
    assertThat(map.putIfAbsent("Y", 999)).isEqualTo(89); // key 값이 존재하는 경우
}

get()

key와 hash 값이 동일한 객체를 찾아 반환 (없는 경우 null)

참고. HashMap get() 살펴보기

getOrDefault()

key mapping 되는 value를 반환하거나, 없으면 defaultValue를 반환

// java.util.HashMap 구현체 
public V getOrDefault(Object key, V defaultValue) {
    Node e;
    return (e = this.getNode(hash(key), key)) == null ? defaultValue : e.value;
}

public V get(Object key) {
    Node e;
    return (e = this.getNode(hash(key), key)) == null ? null : e.value;
}

public boolean containsKey(Object key) {
    return this.getNode(hash(key), key) != null;
}

예시

@Test
void getOrDefault() {
    //given
    int value1 = 0;
    int value2 = 90;

    //when
    int result1 = map.getOrDefault("X", 0);
    int result2 = map.getOrDefault("Z", 0);

    //then
    Assertions.assertThat(result1).isEqualTo(value1);
    Assertions.assertThat(result2).isEqualTo(value2);
}

containsKey()

key 로 Node를 찾을 경우 true, 못 찾으면 false 반환

public boolean containsKey(Object key) {
    return this.getNode(hash(key), key) != null;
}

예시

@DisplayName("key에 해당하는 노드의 존재 여부를 확인한다")
@Test
void containsKey() {
    //given when then
    assertThat(map.containsKey("A")).isTrue();
    assertThat(map.containsKey("Y")).isFalse();
}

containsValue()

value와 동일한 Node<K, V> e 가 있는지 확인

의문) e.next 를 보고 LinkedList 자료 구조(java.util.HashMap.Node)인 건 알겠는데,
Tree 자료구조(java.util.HashMap.TreeNode)인 경우는 next 필드가 없는데 어떻게 되는거지??

// java.util.HashMap
public boolean containsValue(Object value) {
    Node[] tab;
    if ((tab = this.table) != null && this.size > 0) {
        Node[] var4 = tab;
        int var5 = tab.length;

        for(int var6 = 0; var6 < var5; ++var6) {
            for(Node<K, V> e = var4[var6]; e != null; e = e.next) { // LinkedList*
                Object v;
                if ((v = e.value) == value || value != null && value.equals(v)) {
                    return true;
                }
            }
        }
    }

    return false;
}

keySet(), values(), entrySet()

Map의 경우 Collection 인터페이스를 상속하지 않으므로
stream API 사용하고 싶은 경우 해당 method로 변환 후 사용하게 된다

- keySet() : Key 값을 Set에 담아 반환

- values() : Value 를 Collection에 담아 반환

- entrySet() : 각 Key, Value를 Map.Entry 객체로 Set에 담아 반환

public Set<K> keySet() {
    Set<K> ks = this.keySet;
    if (ks == null) {
        ks = new KeySet();
        this.keySet = (Set)ks;
    }

    return (Set)ks;
}

public Collection<V> values() {
    Collection<V> vs = this.values;
    if (vs == null) {
        vs = new Values();
        this.values = (Collection)vs;
    }

    return (Collection)vs;
}

public Set<Map.Entry<K, V>> entrySet() {
    Set es;
    return (es = this.entrySet) == null ? (this.entrySet = new EntrySet()) : es;
}

Set의 경우 Collection 인터페이스를 상속하고 있기 때문에 stream() 호출 가능하다

Stream<String> keyStream = map.keySet().stream();
Stream<Integer> valueStream = map.values().stream();
Stream<Map.Entry<String, Integer>> entryStream = map.entrySet().stream();

compute()

- key 존재 여부 상관없이 BiFunction 실행

- BiFunction 정의하지 않는 경우 NPE (Null Point Exception) 발생

@DisplayName("key가 없으면 BiFunction 결과 추가하고, 있으면 BiFunction 결과로 갱신한다")
@Test
void compute() {
    assertThatThrownBy(() -> map.compute("NPE", null)).isInstanceOf(NullPointerException.class);

    assertThat(map.compute("A", (key, value) -> value + 10)).isEqualTo(75);
    assertThat(map.compute("NPE", (key, value) -> 9999)).isEqualTo(9999);
}

computeIfAbsent()

- key 가 존재하면 기존 value를 반환

- key 가 없을 때 Function 실행 결과 value와 함께 저장

@DisplayName("key 존재하면 value를 반환하고, 없으면 함수 결과 value와 key를 저장ㅎ낟 ")
@Test
void computeIfAbsent() {
    assertThat(map.computeIfAbsent("A", key -> 999999)).isEqualTo(65);
    assertThat(map.computeIfAbsent("X", key -> 88)).isEqualTo(88);
    
    assertThatThrownBy(() -> map.computeIfAbsent("NPE", null))
                .isInstanceOf(NullPointerException.class);
}

putIfAbsent()와 computIfAbsent()의 차이점은 Optional의 orElse()와 orElseGet() 차이와 닮았다

putIfAbsent()와 computIfAbsent() 함수형 인자 둘 다 전달 가능한데
- 조건 확인 후 computIfAbsent() 의 경우 지연 실행하는 반면,
- putIfAbsent() 는 함수 인자가 실행되기 때문에 성능 이슈 야기할 수 있으므로 V 타입 인자를 바로 넘겨서 사용하는게 좋다

computeIfPresent()

- key 가 존재할 때 BiFunction 실행한 value 값으로 갱신하고 반환한다

- key 가 없을 때 null 반환

@DisplayName("key가 존재하면 함수 실행 결과값 갱신후 결과값 반환하고, 없으면 null 반환")
@Test
void computeIfPresent() {
    assertThat(map.computeIfPresent("X", (key, value) -> 88)).isNull();
    assertThat(map.computeIfPresent("A", (key, value) -> value + 10)).isEqualTo(75);
    
    assertThatThrownBy(() -> map.computeIfPresent("NPE", null))
                .isInstanceOf(NullPointerException.class);
}

merge()

- K key : key 값을 받음

- V value : key가 존재하지 않을 때 넣을 값, putIfAbsent() 처럼 동작

- BiFunction<? super V, ? super V, ? extends V> remappingFunction : key가 존재할 경우 remapping 할 BiFunction 실행하여 값 갱신, computeIfPresent() 처럼 동작(단, 함수 인자가 차이 있음)

computeIfPresent() 의 경우 BiFunction<? super K, ? super V, ? extend V> remappingFunction
merge의 경우 BiFunction<? super V, ? super V, ? extends V> remappingFunction

assertThat(map.merge("A", 1, (oldValue, putValue) -> oldValue * 10)).isEqualTo(650);
assertThat(map.merge("Y", 89, (oldValue, putValue) -> putValue * 10)).isEqualTo(89);
assertThat(map.merge("Y", 89, (oldValue, putValue) -> null)).isNull();

// 인자가 null 이면 NPE 발생
assertThatThrownBy(() -> map.merge("NPE", null, (oldValue, putValue) -> oldValue + putValue))
                .isInstanceOf(NullPointerException.class);

assertThatThrownBy(() -> map.merge("NPE", 0, null))
        .isInstanceOf(NullPointerException.class);

- 만약 세번째 인자가 null 인 경우 해당 key 는 remove 된다

참고 https://1-7171771.tistory.com/92

Map의 유용한 인터페이스

put put은 Map에 삽입해주는 메소드이다. Map에서는 key값의 중복이 허용되지 않는다. 중복된 key 값을 put할 경우 값이 대체된다. public void example1() { Map map = new HashMap(); map.put("사과", 1); map.put("포도", 1)

1-7171771.tistory.com

해싱 충돌(Hash Collision)

- 구현 클래스 살펴봤을 때, 비트 연산자와 수학적 기법도 포함되다보니 우선 포인트만 이해하는 걸로 넘어감
- 면접이나 실무에서 Map에 대해 묻는 경우가 없어, 후에 필요시 깊이 있게 공부하는게 현실적이라 생각
(2023.08.20)

Collection Framework 중 하나인 Map에서는 hash() 호출하여 반환 값을 인덱스로 하여 자료를 관리한다

아래는 자바 11버전의 HashMap 클래스의 hash() method 이다

static final int hash(Object key) {
    int h;
    return key == null ? 0 : (h = key.hashCode()) ^ h >>> 16; // ^ : XOR, >>> : bit shift operator
}

- XOR 연산과 bit shift operator로 하여 int 범위 내에 효율적으로 hash 값을 만들도록 한 것으로 보인다

hash()의 경우 int (4byte, 32bit) 값을 반환하는데, 과거 서로 다른 객체라도 해쉬 값이 동일한 경우가 나타나 해싱 충돌 문제가 나타 났었다. 하지만 현재는 해싱 충돌 기법과 재해싱 기법 등으로 이를 보완해왔다

해싱 충돌을 해결하기 위한 기법은 Separate Chaning과 Open Addressing 방식이 있다

Separate Chaining 방식

- Java HashMap에서는 Separate Chaining 사용

- 내부적으로 Node(=LinkedList), TreeNode(=Tree) 자료 구조로 데이터 연결하여 해싱 충돌 해결

- 다음 노드 가르키는 링크(포인터)로 인해 Open Adderssing 방식보다 메모리를 추가 사용한다

- 아래와 같이 경계값을 지정해두고, 8이상인 경우 Tree 로, 6이하인 경우 LinkedList로 변환하여 사용하는 것으로 보인다

// java.util.HashMap
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;

HashMap 클래스를 살펴 볼 경우 아래와 같이 Node<K, V>[] table 필드로 자료 관리하는 것을 확인할 수 있다

//java.util.HashMap
static final int DEFAULT_INITIAL_CAPACITY = 16; // 기본 배열 용량

static final int MAXIMUM_CAPACITY = 1073741824; // 최대 배열 용량

static final float DEFAULT_LOAD_FACTOR = 0.75F; // 배열 용량을 늘리기 위한 기준

transient Node<K, V>[] table; // 데이터 저장하는 배열

public HashMap(int initialCapacity) {
    this(initialCapacity, 0.75F);
}

public HashMap() {
    this.loadFactor = 0.75F; // loadFactor : 배열의 용량을 늘리는 기준
}

public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = 0.75F;
    this.putMapEntries(m, false);
}

//..

java.util.HashMap.Node

- HashMap 클래스 안에 정의된 static inner class

- LinkedList 자료구조 나타내기 위한 포인터 next가 있었다

static class Node<K, V> implements Map.Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next; // 다음 노드를 가르키는 포인터

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        // .. method 생략
    }

java.util.HashMap.TreeNode

- Tree 자료 구조 나타내기 위한 클래스

static final class TreeNode<K, V> extends LinkedHashMap.Entry<K, V> {
        TreeNode<K, V> parent;
        TreeNode<K, V> left;
        TreeNode<K, V> right;
        TreeNode<K, V> prev;
        boolean red;

        TreeNode(int hash, K key, V val, Node<K, V> next) {
            super(hash, key, val, next);
        }

        final TreeNode<K, V> root() {
            TreeNode r;
            TreeNode p;
            for(r = this; (p = r.parent) != null; r = p) {
            }

            return r;
        }
        
        // .. 이하 생략 
}

HashMap get() 호출시 TreeNode, Node 클래스 타입에 따라 분기가 나눠져 있는 것을 확인 가능했다

put() 을 사용할 때 일정 카운트를 넘어가면 TreeNode로 변환하는 로직이 있다. (생략)

간단 구현해본 Separate Chaing ( by Singly LinkedList)

public class SeparateChaining {
    Node[] hashTable;
    int size;

    public SeparateChaining(int capacity) {
        this.hashTable = new Node[capacity];
    }

    public class Node{
        String key;
        String value;
        Node next = null; // 다음 노드를 가르키는 포인터

        public Node(String key, String value){
            this.key = key;
            this.value = value;
        }
    }

    public int size() {
        return this.size;
    }
    
    public final int hash(String key) { // key의 첫 글자 활용하여 간단 구현
        return (int)(key.charAt(0)) % this.hashTable.length;
    }

    public String get(String key) {
        int hash = hash(key);

        if(this.hashTable[hash] != null) {
            Node node = this.hashTable[hash];
            if(node.key == key) return node.value;

            while(node.next != null) {
                if(node.next.key == key) {
                    return node.next.value;
                }

                node = node.next;
            }
        }

        return null;
    }

    public boolean put(String key, String value) {
        int hash = hash(key);

        if(this.hashTable[hash] == null) {
            this.hashTable[hash] = new Node(key, value);
            this.size += 1;
            return true;
        } else {
            Node node = this.hashTable[hash];
            while(node.next != null) {
                if(node.key == key) {
                    node.value = value;
                    return true;
                }

                node = node.next;
            }

            node.next = new Node(key, value);
            this.size += 1;
            return true;
        }
    }

    public static void main(String[] args) {
        SeparateChaining separateChaining = new SeparateChaining(26);
        separateChaining.put("DaveLee", "01022223333");
        separateChaining.put("fun-coding", "01033334444");
        separateChaining.put("David", "01044445555");
        separateChaining.put("Dave", "01055556666");

        System.out.println(separateChaining.get("DaveLee")); // 01022223333
        System.out.println(separateChaining.get("fun-coding")); // 01033334444
        System.out.println(separateChaining.get("David")); // 01044445555
        System.out.println(separateChaining.get("Dave")); // 01055556666

        System.out.println(separateChaining.hash("DaveLee")); // 16
        System.out.println(separateChaining.hash("fun-coding")); // 24
        System.out.println(separateChaining.hash("David")); // 16
        System.out.println(separateChaining.hash("Dave")); // 16
    }
}

Open Addressing 방식

- 해싱 충돌 발생시 재해싱을 통해 빈 Bucket을 찾아 자료 저장

- Chaining 방식처럼 추가 포인터 사용하지 않아, 메모리는 좀 더 효율적

- 저장할 데이터가 적은 경우 유리

- 선형 탐색(Linear Probing), 제곱 탐색(Quadratic Probing), 이중 해시(Double Hashing) 방식이 있음

간단 구현해본 Open Addressing (by Linear Probing)

선형 탐색(Linear Probing)으로 hash 와 key가 같은 경우 value 덮어씌우고, 아닌 경우 hash += 1 하면서 빈 bucket에 값을 넣음

public class OpenAddressing {

    Node[] hashTable = null;
    int size;

    public OpenAddressing(int size) {
        this.hashTable = new Node[size];
    }

    public class Node {
        String key;
        String value;

        public Node(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    public int size() {
        return this.size;
    }

    public int hash(String key) {
        return (int)(key.charAt(0)) % this.hashTable.length;
    }

    public boolean put(String key, String value) {
        int hash = hash(key);

        if(this.hashTable[hash] == null) {
            this.hashTable[hash] = new Node(key, value);
            return true;
        } else {
            while(this.hashTable[hash] != null) {
                if(this.hashTable[hash].key == key) {
                    this.hashTable[hash].value = value;
                    size += 1;
                    return true;
                }

                hash += 1; // hash를 1씩 증가시킴
                if(this.hashTable.length <= hash) return false;
            }

            this.hashTable[hash] = new Node(key, value);
            size += 1;
            return true;
        }

    }

    public String get(String key) {
        int hash = hash(key);

        if(this.hashTable[hash] == null) return null;

        while(this.hashTable[hash] != null) {
            if(this.hashTable[hash].key == key) {
                return this.hashTable[hash].value;
            }

            hash += 1;
            if(this.hashTable.length <= hash) break;
        }

        return null;
    }

    public static void main(String[] args) {
        OpenAddressing openAddressing = new OpenAddressing(26);
        openAddressing.put("DaveLee", "01022223333");
        openAddressing.put("fun-coding", "01033334444");
        openAddressing.put("David", "01044445555");
        openAddressing.put("Dave", "01055556666");

        System.out.println(openAddressing.get("DaveLee")); // 01022223333
        System.out.println(openAddressing.get("fun-coding")); // 01033334444
        System.out.println(openAddressing.get("David")); // 01044445555
        System.out.println(openAddressing.get("Dave")); // 01055556666

        System.out.println(openAddressing.hash("DaveLee")); // 16
        System.out.println(openAddressing.hash("fun-coding")); // 24
        System.out.println(openAddressing.hash("David")); // 16
        System.out.println(openAddressing.hash("Dave")); // 16
    }
}

참고.

https://developer-talk.tistory.com/747

https://tecoble.techcourse.co.kr/post/2020-07-29-equals-and-hashCode/

https://github.com/wjdrbs96/Today-I-Learn/blob/master/Java/Collection/Map/Java%20HashMap%20%EB%8F%99%EC%9E%91%EC%9B%90%EB%A6%AC.md

저작자표시 비영리 변경금지 (새창열림)