reference
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
Problem: https://leetcode.com/problems/duplicate-emails/description/
Return only the emails that appear more than once.
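DataFrame.duplicated() returns a boolean Series that says, for each row of the input, whether it is a duplicate.
subset selects which columns to compare, and keep decides which occurrences are marked as duplicates: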
first -> Mark duplicates as True except for the first occurrence.
last -> Mark duplicates as True except for the last occurrence.
False -> Mark all duplicates as True.
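A minimal sketch of subset and keep on a small table. The id/email columns mirror the LeetCode problem, but the data itself is made up for illustration. Because every id is different, comparing whole rows finds no duplicates. Only restricting the comparison to email does.

```python
import pandas as pd

# Hypothetical sample data: "a@b.com" appears three times
person = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "email": ["a@b.com", "c@d.com", "a@b.com", "a@b.com"],
})

# Whole-row comparison: the ids differ, so no row is a duplicate
all_cols = person.duplicated()

# Compare only the email column
first = person.duplicated(subset=["email"])              # all but the first "a@b.com"
last = person.duplicated(subset=["email"], keep="last")  # all but the last "a@b.com"
every = person.duplicated(subset=["email"], keep=False)  # every "a@b.com" row

print(all_cols.tolist())  # [False, False, False, False]
print(first.tolist())     # [False, False, True, True]
print(last.tolist())      # [True, False, True, False]
print(every.tolist())     # [True, False, True, True]
```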
df.duplicated()
0 False
1 True
2 False
3 False
4 False
dtype: bool
df.duplicated(keep=False)
0 True
1 True
2 False
3 False
4 False
dtype: bool
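The note does not show how df was built. The outputs above match the example DataFrame used in the pandas documentation, in which rows 0 and 1 are identical. Assuming that data, the results can be reproduced like this:

```python
import pandas as pd

# Assumed data (the pandas docs example); rows 0 and 1 are identical
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

default = df.duplicated()               # keep="first": only row 1 is marked
marked_all = df.duplicated(keep=False)  # both copies are marked
marked_last = df.duplicated(keep="last")  # only row 0 is marked

print(default.tolist())      # [False, True, False, False, False]
print(marked_all.tolist())   # [True, True, False, False, False]
print(marked_last.tolist())  # [True, False, False, False, False]
```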
drop_duplicates() removes duplicate rows.
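import pandas as pd

def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # Rows whose email already appeared earlier (keep="first" by default), email column only
    df = person[person.duplicated(subset=["email"])][["email"]]
    # An email seen 3+ times is marked more than once, so remove the repeats
    df = df.drop_duplicates()
    return df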
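A quick check of the solution on hypothetical data in which one email appears three times. It shows why drop_duplicates() is needed. With the default keep="first", duplicated() marks both the second and the third occurrence of "a@b.com", so that address would otherwise be returned twice.

```python
import pandas as pd

def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # Rows whose email already appeared earlier, email column only
    df = person[person.duplicated(subset=["email"])][["email"]]
    # Collapse repeats so each duplicated email is listed once
    return df.drop_duplicates()

# Hypothetical input: "a@b.com" x3, "e@f.com" x2, "c@d.com" once
person = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "email": ["a@b.com", "c@d.com", "a@b.com", "e@f.com", "a@b.com", "e@f.com"],
})

before = person[person.duplicated(subset=["email"])][["email"]]
result = duplicate_emails(person)

print(before["email"].tolist())  # ['a@b.com', 'a@b.com', 'e@f.com']
print(result["email"].tolist())  # ['a@b.com', 'e@f.com']
```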