[HackerRank/SQL] Advanced Join 문제풀이

Sooyeon·2023년 11월 3일
0

문제풀이 

목록 보기
12/95
post-thumbnail

[HackerRank/SQL] Advanced Join


SQL Project Planning

Problem

You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.

If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.

Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.

Sample Input

Sample Output

2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04

Explanation
The example describes following four projects:

Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so these are part of the project. Thus start date of project is 2015-10-01 and end date is 2015-10-04, so it took 3 days to complete the project.

Project 2: Tasks 4 and 5 are completed on consecutive days, so these are part of the project. Thus, the start date of project is 2015-10-13 and end date is 2015-10-15, so it took 2 days to complete the project.

Project 3: Only task 6 is part of the project. Thus, the start date of project is 2015-10-28 and end date is 2015-10-29, so it took 1 day to complete the project.

Project 4: Only task 7 is part of the project. Thus, the start date of project is 2015-10-30 and end date is 2015-10-31, so it took 1 day to complete the project.

Problem explanation

- start_date,end_date출력 
1.연속적인 날짜인 start_date와 end_date는 하나의 프로젝트로 구분
2.start_date ~ end_date의 차이를 프로젝트 기간으로 계산,
3.프로젝트 기간으로 오름차순 정렬
4.3번 조건이 같다면, start_date로 정렬하기 


1. start_date와 end_date모두 각각 연속적인 날짜여야하며,
start_date와 end_date는 동일해서는 안된다.
=>from 서브쿼리를 이용해서 start_date는 end_date와 같지 않은날짜만 추출, 
=>end_date는 start_date와 같지 않은날짜만 추출 

SELECT Start_date,End_Date
FROM 
    (SELECT Start_date 
     FROM Projects
     WHERE Start_Date NOT IN(SELECT End_Date FROM Projects)) a,
     (SELECT End_Date
      FROM Projects
      WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
order by Start_Date

2. WHERE 조건에 Start_Date < End_Date 를 추가해준다.

SELECT Start_date,End_Date
FROM 
    (SELECT Start_date 
     FROM Projects
     WHERE Start_Date NOT IN(SELECT End_Date FROM Projects)) a,
     (SELECT End_Date
      FROM Projects
      WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
order by Start_Date,End_Date

3.GROUP BY 로 Start_Date 별, End_Date 중 실제 마감일은 가장 
빠른 일정으로 선택해야함,
프로젝트 기간 순으로 정렬, 시작일 기준으로 정렬 

SELECT Start_Date,MAX(End_Date),DATEDIFF(End_Date,Start_Date)
FROM 
    (SELECT Start_Date 
     FROM Projects
     WHERE Start_Date NOT IN(SELECT End_Date FROM Projects)) a,
     (SELECT End_Date
      FROM Projects
      WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date<End_Date
GROUP BY Start_Date,End_Date
ORDER BY DATEDIFF(MIN(End_Date),Start_Date),Start_Date

# DATEDIFF => 날짜 차이 

solution

SELECT Start_Date,MIN(End_Date)
FROM 
    (SELECT Start_Date 
     FROM Projects
     WHERE Start_Date NOT IN(SELECT End_Date FROM Projects)) a,
     (SELECT End_Date
      FROM Projects
      WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date<End_Date
GROUP BY Start_Date
ORDER BY DATEDIFF(MIN(End_Date),Start_Date),Start_Date

Placements

Problem

You are given three tables: Students, Friends and Packages. Students contains two columns: ID and Name. Friends contains two columns: ID and Friend_ID (ID of the ONLY best friend). Packages contains two columns: ID and Salary (offered salary in $ thousands per month).

Write a query to output the names of those students whose best friends got offered a higher salary than them. Names must be ordered by the salary amount offered to the best friends. It is guaranteed that no two students got same salary offer.

Sample Input

Sample Output

Samantha
Julia
Scarlet

Explanation

See the following table:

Now,

  • Samantha's best friend got offered a higher salary than her at 11.55
  • Julia's best friend got offered a higher salary than her at 12.12
  • Scarlet's best friend got offered a higher salary than her at 15.2
  • Ashley's best friend did NOT get offered a higher salary than her

The name output, when ordered by the salary offered to their friends, will be:

  • Samantha
  • Julia
  • Scarlet

Problem explanation

- 베스트프렌드가 자신보다 더 높은 월급을 받는 학생의 이름을 출력하라.
- 월급으로 오름차순 정렬

1. Students와 Packages 테이블을 ID 기준으로 JOIN해 
각 학생의 이름과 Salary를 확인
2. Students와 Friends 테이블을 ID 기준으로 JOIN해
기존 테이블에 각 학생의 친구 컬럼 추가
3.친구들의 Salary 컬럼을 추가하기 위하여
Friends 테이블의 Friend_id와 Packages 테이블의 ID를 기준으로 JOIN
4.WHERE 조건으로 학생 월급 < 친구월급 추가
5.친구월급 기준으로 정렬

solution

SELECT s.name
FROM students AS s
INNER JOIN packages AS p
ON s.id=p.id
INNER JOIN friends AS f 
On s.id=f.id
INNER JOIN packages AS p2
ON f.friend_id=p2.id
WHERE p.salary<p2.salary
ORDER BY p2.salary

Interviews

Problem

Samantha interviews many candidates from different colleges using coding challenges and contests. Write a query to print the contest_id, hacker_id, name, and the sums of total_submissions, total_accepted_submissions, total_views, and total_unique_views for each contest sorted by contest_id. Exclude the contest from the result if all four sums are 0.

Note: A specific contest can be used to screen candidates at more than one college, but each college only holds 1 screening contest.

Input Format

The following tables hold interview data:

  • Contests: The contest_id is the id of the contest, hacker_id is the id of the hacker who created the contest, and name is the name of the hacker

  • Colleges: The college_id is the id of the college, and contest_id is the id of the contest that Samantha used to screen the candidates.

  • Challenges: The challenge_id is the id of the challenge that belongs to one of the contests whose contest_id Samantha forgot, and college_id is the id of the college where the challenge was given to candidates.

  • View_Stats: The challenge_id is the id of the challenge, total_views is the number of times the challenge was viewed by candidates, and total_unique_views is the number of times the challenge was viewed by

  • Submission_Stats: The challenge_id is the id of the challenge, total_submissions is the number of submissions for the challenge, and total_accepted_submission is the number of submissions that achieved full

Sample Output

66406 17973 Rose 111 39 156 56
66556 79153 Angela 0 0 11 10
94828 80275 Frank 150 38 41 15

Explanation

The contest 66406 is used in the college 11219. In this college 11219, challenges 18765 and 47127 are asked, so from the view and submission stats:

  • Sum of total submissions = 27+56+28=111
  • Sum of total accepted submissions = 10+18+11=39
  • Sum of total views = 43+72+26+15=156
  • Sum of total unique views 10+13+19+14=56

Similarly, we can find the sums for contests 66556 and 94828.

Problem explanation

1.contest_id, hacker_id, name,
the sums of total_submissions,total_accepted_submissions,
total_views, total_unique_views for each contest 추출

2. total_submissions,total_accepted_submissions,
total_views, total_unique_views 네개의 합이 0이면 제외
=> challenge_id별 네개의 계산값을 구해야한다.
따라서,from 서브쿼리에 이용 

3. contest_id별 정렬 

solution

select contests.contest_id, 
        contests.hacker_id, 
        contests.name, 
        sum(total_submissions), 
        sum(total_accepted_submissions), 
        sum(total_views), 
        sum(total_unique_views)
from contests
join colleges on contests.contest_id = colleges.contest_id
join challenges on colleges.college_id = challenges.college_id
left join (select challenge_id, 
           sum(total_views) as total_views, 
           sum(total_unique_views) as total_unique_views
           from view_stats 
           group by challenge_id) vs 
           on challenges.challenge_id = vs.challenge_id 
left join (select challenge_id, 
            sum(total_submissions) as total_submissions, 
           sum(total_accepted_submissions) as total_accepted_submissions 
           from submission_stats group by challenge_id) ss 
           on challenges.challenge_id = ss.challenge_id
group by contests.contest_id, contests.hacker_id, contests.name
having sum(total_submissions)!=0 or
       sum(total_accepted_submissions)!=0 or
       sum(total_views)!=0 or
       sum(total_unique_views)!=0
order by contest_id;

15 Days of Learning SQL

Problem

Julia conducted a days of learning SQL contest. The start date of the contest was March 01, 2016 and the end date was March 15, 2016.

Write a query to print total number of unique hackers who made at least submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made maximum number of submissions each day. If more than one such hacker has a maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by the date.

Input Format

The following tables hold contest data:

  • Hackers: The hacker_id is the id of the hacker, and name is the name of the hacker.

  • Submissions: The submission_date is the date of the submission, submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, and score is the score of the submission.

Sample Input

For the following sample input, assume that the end date of the contest was March 06, 2016.

  • Hackers Table:

  • Submissions Table:

    Sample Output
2016-03-01 4 20703 Angela
2016-03-02 2 79722 Michael
2016-03-03 2 20703 Angela
2016-03-04 2 20703 Angela
2016-03-05 1 36396 Frank
2016-03-06 1 20703 Angela

Explanation

On March 01, 2016 hackers 20703, 36396, 53473, and 79722 made submissions. There are unique hackers who made at least one submission each day. As each hacker made one submission, 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 02, 2016 hackers 15758, 20703, and 79722 made submissions. Now 20703 and 79722 were the only ones to submit every day, so there are 2 unique hackers who made at least one submission each day. 79722 made 2 submissions, and name of the hacker is Michael.

On March 03, 2016 hackers 20703, 36396, 79722 and 79722 made submissions. Now 20703 and 79722 were the only ones, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission so 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 04, 2016 hackers 20703, 44065, 53473, and 79722 made submissions. Now 20703 and 79722 only submitted each day, so there are 2 unique hackers who made at least one submission each day. As each hacker made one submission so 20703 is considered to be the hacker who made maximum number of submissions on this day. The name of the hacker is Angela.

On March 05, 2016 hackers 20703, 36396, 38289 and 62529 made submissions. Now 20703 only submitted each day, so there is only 1 unique hacker who made at least one submission each day. 36396 made 2 submissions and name of the hacker is Frank.

On March 06, 2016 only 20703 made submission, so there is only 1 unique hacker who made at least one submission each day. 20703 made 1 submission and name of the hacker is Angela.

Problem explanation

output: 날짜,첫날부터 현재까지 매일 최소1번 이상 제출한 사람의 수,
해당 날짜에 가장 많이 제출한 사람의 아이디,이름

정렬 조건 : 같은 횟수인 사람 있으면 id로 정렬

1.일자별로 매일 1회 이상 제출한 해커의 총 인원수
=> 해당 조건을 판단하기 위해서는 
해커별로 해당일자까지의 중복을 제외한 제출일 건수가,
해당일자와 시작일자(2016-03-01)까지의 차이 일수와 같아야함 

(SELECT COUNT(DISTINCT HACKER_ID)  
 FROM SUBMISSIONS S2  
 WHERE S2.SUBMISSION_DATE = S1.SUBMISSION_DATE AND    
(SELECT COUNT(DISTINCT S3.SUBMISSION_DATE) 
 FROM SUBMISSIONS S3 WHERE S3.HACKER_ID = S2.HACKER_ID AND S3.SUBMISSION_DATE < S1.SUBMISSION_DATE) = DATEDIFF(S1.SUBMISSION_DATE , '2016-03-01')


 2.일자별, 해커별(GROUP BY 일자, 해커)로 제출 수를 먼저 구합니다.
 =>일자별로 최대 제출 수를 기록한 해커를 구하기
 =>랭킹이 1인 데이터가 최대 제출 수를 기록한 해커가 될 것입니다.
(SELECT HACKER_ID FROM SUBMISSIONS S2 WHERE S2.SUBMISSION_DATE = S1.SUBMISSION_DATE 
GROUP BY HACKER_ID ORDER BY COUNT(SUBMISSION_ID) DESC, HACKER_ID LIMIT 1) AS TMP,
(SELECT NAME FROM HACKERS WHERE HACKER_ID = TMP)FROM
(SELECT DISTINCT SUBMISSION_DATE FROM SUBMISSIONS) S1
GROUP BY SUBMISSION_DATE;

solution

SELECT SUBMISSION_DATE,
(SELECT COUNT(DISTINCT HACKER_ID)  
 FROM SUBMISSIONS S2  
 WHERE S2.SUBMISSION_DATE = S1.SUBMISSION_DATE AND    
(SELECT COUNT(DISTINCT S3.SUBMISSION_DATE) 
 FROM SUBMISSIONS S3 WHERE S3.HACKER_ID = S2.HACKER_ID AND S3.SUBMISSION_DATE < S1.SUBMISSION_DATE) = DATEDIFF(S1.SUBMISSION_DATE , '2016-03-01')),
(SELECT HACKER_ID FROM SUBMISSIONS S2 WHERE S2.SUBMISSION_DATE = S1.SUBMISSION_DATE 
GROUP BY HACKER_ID ORDER BY COUNT(SUBMISSION_ID) DESC, HACKER_ID LIMIT 1) AS TMP,
(SELECT NAME FROM HACKERS WHERE HACKER_ID = TMP)
FROM
(SELECT DISTINCT SUBMISSION_DATE FROM SUBMISSIONS) S1
GROUP BY SUBMISSION_DATE;

0개의 댓글