You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.
Column | Type |
---|---|
Task_ID | Integer |
Start_Date | Date |
End_Date | Date |
If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.
Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.
Sample Input
Sample Output
2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04
Explanation
The example describes following four projects:
Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so these are part of the project. Thus start date of project is 2015-10-01 and end date is 2015-10-04, so it took 3 days to complete the project.
Project 2: Tasks 4 and 5 are completed on consecutive days, so these are part of the project. Thus, the start date of project is 2015-10-13 and end date is 2015-10-15, so it took 2 days to complete the project.
Project 3: Only task 6 is part of the project. Thus, the start date of project is 2015-10-28 and end date is 2015-10-29, so it took 1 day to complete the project.
Project 4: Only task 7 is part of the project. Thus, the start date of project is 2015-10-30 and end date is 2015-10-31, so it took 1 day to complete the project.
SET sql_mode = '';
SELECT Start_Date, End_Date
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY DATEDIFF(End_Date, Start_Date), Start_Date
우선 설명에 앞서 해당 문제는 혼자 못풀어서 정답을 보고 정답 쿼리문을 씹고, 뜯고, 맛보고, 즐겨서 내것으로 만들고 있다!!, 또한 이번문제의 쿼리문은 모두 이해하고 있다!!
1단계 : 테이블 만들기
SELECT Start_Date, End_Date
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b ;
#먼저 연속된날짜를 고려해야 하기 때문에 FROM절을 위와 같이 만들어야한다.(FROM절 조차도 풀지못하였음 ㅠㅠㅠ 뭔가 쿼리문적 사고가 아직은 좀 부족한것같다...)
#Start_Date가 End_Date와 같으면 연속된 프로젝트이기 때문에 우선 연속된 프로젝트를 하나로 묶어야하기 때문에 End_Date와 같지않은 Start_Date와 Start_Date와 같지않은 End_Date로 테이블을 만들어준다.
#End_Date에 똑같은 날짜가 11개씩 출력이 되는것을 볼 수 있다. 따라서 다음 단계는 이를 해결하는것이다.
SELECT Start_Date, End_Date
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
#WHERE절을 걸어서 Start_Date는 당연히 End_Date보다 작아야하므로 해당 조건문을 걸면 다음과 같이 출력이 된다.
#이번 출력에서 문제는 Start_Date의 중복이 보인다. 이것은 Group By로 간단하게 해결할 수 있다.
#" SET sql_mode = ''; " 는 only_full_group_by 를 꺼주는 역할을 합니다. GROUP BY 특성상 위에서 말한것과 같은 조건이 걸려있어서 이것을 끄는 방법입니다!!! only_full_group_by 이 있는 이유가 제대로 된 방법으로 group by를 하라는 의미로 나온것이라고 하네요
SET sql_mode = '';
SELECT Start_Date, End_Date
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
이렇게 해결된것을 볼 수 있습니다.
SET sql_mode = '';
SELECT Start_Date, End_Date
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY DATEDIFF(End_Date, Start_Date), Start_Date
#DATEDIFF()함수는 두개의 날짜값의 차이를 int로 반환하는 Mssql내장함수이다.