[카카오] 파이썬 순위 검색

Dev.Jo·2022년 2월 13일

알고리즘

목록 보기

10/21

풀이코드

from bisect import bisect_left

def solution(info, query):
    table = {}
    result = []
    
    for applicant in info:
        [language,position,career,soulfood,score] = applicant.split(" ")
        for _lan in [language,"-"]:
            for _pos in [position,"-"]:
                for _car in [career,"-"]:
                    for _soul in[soulfood,"-"]:
                        _query = " and ".join([_lan,_pos,_car,_soul])
                        if _query not in table:
                            table[_query] = [int(score)]
                        else :
                            table[_query].append(int(score))
                            
                            
    for scores in table.values():
        scores.sort()
    
    for query_element in query:
        query_info = " ".join(query_element.split(" ")[:-1])
        query_score = int(query_element.split(" ")[-1])
        if query_info in table :
            result.append(len(table[query_info])-bisect_left(table[query_info],query_score))
        else:
            result.append(0)
        
    return result

복기

문제 키워드 : 조합 이진탐색

문제조건 다시 살펴보기

info 배열의 크기는 1 이상 50,000 이하입니다.
query 배열의 크기는 1 이상 100,000 이하입니다.
개발언어는 cpp, java, python 중 하나입니다.
직군은 backend, frontend 중 하나입니다.
경력은 junior, senior 중 하나입니다.
소울푸드는 chicken, pizza 중 하나입니다.

처음 문제접근

처음에는 완전탐색처럼 하나하니씩 비교해나가는 로직을 떠올렸습니다

for queryElement in query:
	for infoElement in info:
    	// queryElement와 infoElement를 비교

이러한 방법으로 문제를 접근하면 50,000 * 100,000 만큼의 연산이 필요하기 때문에 효율성 측면에서 통과하지 못합니다

다음 방법으로 문제를 해결합니다

info의 조합을 미리 생성한 테이블을 이용하자

1. 조건에 맞는 모든 조합 미리 구하기

다음과 같이 info에 대한 해시테이블을 만들어 둔다면 시간복잡도 O(1)로 접근할 수 있습니다


{
	"java and backend and junior and pizza": [100,150,200]
    ...
}

// "java and backend and junior and pizza" query에 대해 시간복잡도 O(1)으로 접근할 수 있습니다

이제 info에 대해 "-"를 고려해서 조합을 만들어야합니다. 총 16가지(2x2x2x2) 방법이 있겠죠??

- and backend and junior and pizza
java and - and junior and pizza
- and backend and - and pizza
java and backend and junior and -
- and backend and - and -
... 계속

저는 4중 for문을 돌면서 조합을 생성하는 방법을 택했습니다. dfs나 다른 방법도 있겠지만 문제에서 이미 정해진 가지수만 고려하면 되기 때문에 구현이 간편한 방법을 택했습니다.

table = { }
for applicant in info:
        [language,position,career,soulfood,score] = applicant.split(" ")
        for _lan in [language,"-"]:
            for _pos in [position,"-"]:
                for _car in [career,"-"]:
                
                    _query = " and ".join([_lan,_pos,_car,_soul])
                    
                        if _query not in table:
                            table[_query] = [int(score)]
                        else :
                            table[_query].append(int(score))

" and ".join()으로 query 문자열 생성한 후, table에 해당 query가 있는 지 검사합니다.
table에 있다면 리스트에 append하고 없다면 리스트로 초기화합니다

2. 이진탐색으로 지원자 수 구하기

이렇게 table에 지원자 info에 대한 모든 조합을 넣는 작업을 마쳤습니다.
이제 query의 점수에 따라 점수 이상의 지원자가 몇명있는지를 검색해야합니다.

여기서도 효율성 문제가 발생합니다. 선형탐색을 하게되면 최악의 경우 50,000 * 100,000 시간복잡도가 발생하 수 있습니다.

이때 시간복잡도 O(logN)에 빛나는 마법같은 이진탐색으로 이것을 해결할 수 있습니다

파이썬에서는 이진탐색을 직접 구현할 수 도 있고 bisect 라이브러리로도 구현할 수 있는데 실수를 줄이고자 라이브러리를 사용했습니다.

해당 점수보다 높은 지원자의 수를 구하기 위해 다음 과정을 따릅니다

[100,150,200,300]

bisect_left(130) // 1
bisect_left(150) // 1

bisect_left메서드로 해당점수보다 크거나 같은 수가 나오는 첫번째 인덱스를 구합니다
전체 배열요소의 수 - 위에서 구한 인덱스 = 해당점수보다 높은 지원자 수

for query_element in query:
		// 쿼리를 info 문자열과 점수로 분리합니다
        query_info = " ".join(query_element.split(" ")[:-1])
        query_score = int(query_element.split(" ")[-1])
        
        
        // 쿼리가 테이블에 있는 경우와 없는 경우를 생각해주어야합니다
        if query_info in table :
        	// 전체 배열요소의 수 - 이진탐색의 점수 첫번째 인덱스            result.append(len(table[query_info])-bisect_left(table[query_info],query_score))
        else:
        	// 테이블에 쿼리가 없는 경우는 0 (조건을 만족하는 지원자가 없음)
            result.append(0)

테이블에 쿼리가 없는 경우도 생각해주어야 합니다

코드 리팩토링

문제는 풀었지만 다른 사람들의 코드를 참고해 깔끔하게 다시 리팩토링하려고 합니다

문자열 대신 튜플로

_query = " and ".join([_lan,_pos,_car,_soul])
if _query not in table:
	table[_query] = [int(score)]
else :
	table[_query].append(int(score))

table에 저장을 할 때 문자열을 key로 저장을 했는데 join() 연산등 번거로운 과정이 있습니다. 문자열 대신 튜플을 키로 사용하겠습니다

table[(_lan,_pos,_car,_soul)] = [int(score)]

문제의 조건에 대해 조합을 만들자

조합을 만들 때 주어진 info 에 대해서만 만들었습니다

for applicant in info:
        [language,position,career,soulfood,score] = applicant.split(" ")

이런 코드 때문에 나중에 query를 찾을 때 테이블에 조건이 없는 경우가 발생했습니다.
처음에 문제에 맞는 모든 조합을 생성하도록 합시다

table = dict()
    for language in ['cpp', 'java', 'python', '-']:
        for position in ['backend', 'frontend', '-']:
            for career in ['junior', 'senior', '-']:
                for soulfood in ['chicken', 'pizza', '-']:
                    table.setdefault((language, position, career, solufood), list())

setdefault 메서드로 기본값을 list로 정할 수 있습니다.

리팩토링 코드

from bisect import bisect_left

def solution(info, query):
    table = dict()
    for language in ['cpp', 'java', 'python', '-']:
        for position in ['backend', 'frontend', '-']:
            for career in ['junior', 'senior', '-']:
                for soulfood in ['chicken', 'pizza', '-']:
                    table.setdefault((language, position, career, soulfood), list())
    
    for applicant in info:
        [language,position,career,soulfood,score] = applicant.split(" ")
        for _lan in [language,"-"]:
            for _pos in [position,"-"]:
                for _car in [career,"-"]:
                    for _soul in[soulfood,"-"]:
                        table[(_lan,_pos,_car,_soul)].append(int(score))

    for scores in table.values():
        scores.sort()
    
    result = []
    for query_element in query:
        query_info = query_element.split(" ")
        query_score = int(query_info[-1])
        [lan,pos,car,soul] = [query_info[0],query_info[2],query_info[4],query_info[6]]
        result.append(len(table[(lan,pos,car,soul)]) - bisect_left(table[(lan,pos,car,soul)],query_score))

    return result

Dev.Jo

소프트웨어 엔지니어, 프론트엔드 개발자

이전 포스트

[백준] 파이썬 숫자 카드 2

다음 포스트