A package that shows your velog view counts at a glance

최더디 · December 30, 2021

Crawling JWT python velog-hits


🍕 Velog Hits

Github: https://github.com/insutance/velog-hits
PyPI: https://pypi.org/project/velog-hits/

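📍 Introduction

For me, velog is a better place to write development posts than tistory, medium, or naver.
Its biggest drawback used to be the lack of statistics, but velopert, the creator of velog, has since added per-post statistics. I had really wanted this feature, so thank you for building it :)

The drawback of that feature is that you have to open each post and click its statistics button to see the view counts. I kept repeating "click post → click statistics", and as the number of posts grew it became exhausting. To get rid of that routine, I built velog-hits, which shows the view counts of all posts at once.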

📍 Development

This post briefly covers only the important parts of the development 🥸
Instructions for using velog-hits are available through the Github or PyPI links.

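1) Fetching values with GraphQL

This is the most important part of velog-hits.
GraphQL is a query language created by Meta (formerly Facebook). Like SQL it is a query language, but where SQL aims to retrieve data stored in a database, GraphQL is meant to let a web client fetch data from a server efficiently.

On the Velog site, opening F12 → Network tab showed graphql requests appearing again and again, and their responses contained the data I wanted. So sending a POST request with the same query in its body returned the values I needed.

Reference link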

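2) Fetching all posts

With GraphQL the data was easy to get.
The cursor value is the unique ID of the post that serves as the starting point for the next batch, which in practice is the last post already fetched.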

# velog_hits/graphql.py
def graphql_posts(username, cursor=None):
    """Build the GraphQL payload that fetches a page of blog posts."""
    # GraphQL expects either the bare literal null or a double-quoted string.
    cursor = "null" if cursor is None else f'"{cursor}"'

    return {
        "query": f"""
        query {{
          posts(cursor: {cursor}, username: "{username}") {{
            id
            title
            url_slug
            comments_count
            tags
            likes
          }}
        }}
        """
    }
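
One thing worth noting about interpolating values straight into the query text: a username or cursor containing a double quote or backslash would produce an invalid query. The following is a minimal sketch of one way to guard against that, not the package's actual code. `quote_graphql_string` and `build_posts_query` are hypothetical names, and the sketch relies on the assumption that JSON string escaping is acceptable for GraphQL string literals in the common cases (quotes, backslashes, control characters).

```python
import json


def quote_graphql_string(value):
    """Return value as a double-quoted literal with special characters escaped."""
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps(value)


def build_posts_query(username, cursor=None):
    """Hypothetical variant of graphql_posts() that escapes its inputs."""
    cursor_literal = "null" if cursor is None else quote_graphql_string(cursor)
    username_literal = quote_graphql_string(username)
    query = (
        "query { posts(cursor: %s, username: %s) "
        "{ id title url_slug comments_count tags likes } }"
        % (cursor_literal, username_literal)
    )
    return {"query": query}


print(build_posts_query("insutance")["query"])
```

The selected fields are the same ones the original query requests; only the quoting of the two arguments differs.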

Using graphql_posts() above, I ran a loop that keeps feeding in a new cursor value.
If a response contains fewer than 20 posts, I treat that as the last page and break out of the loop. I did this because I saw that the response to each POST returns posts 20 at a time.

# velog_hits/crawler.py
import requests

from velog_hits.graphql import graphql_posts


class HitsCrawler:
    ...
    def get_posts(self) -> list:
        posts = []
        cursor = None

        while True:
            # graphql_posts() already treats cursor=None as the first page.
            query = graphql_posts(self.username, cursor)
            response = requests.post(url="https://v2.velog.io/graphql", json=query)
            page = response.json()["data"]["posts"]
            posts.extend(page)

            # Fewer than 20 posts (including none at all) means the last page.
            if len(page) < 20:
                break

            cursor = posts[-1]["id"]

        return posts
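
To make the termination rule easier to check without calling the real server, here is a small sketch of the same cursor loop with the network call replaced by a stand-in. `collect_all_posts`, `fake_fetch_page`, and `FAKE_POSTS` are hypothetical names introduced only for this illustration; the page size of 20 comes from the observation described above.

```python
PAGE_SIZE = 20  # page size observed in the responses


def collect_all_posts(fetch_page, page_size=PAGE_SIZE):
    """Follow the cursor until a page shorter than page_size arrives.

    fetch_page is a hypothetical callable that takes a cursor (or None)
    and returns the list of posts for that page.
    """
    posts = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        posts.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        cursor = page[-1]["id"]
    return posts


# Stand-in for the server: 45 fake posts served 20 at a time.
FAKE_POSTS = [{"id": str(n)} for n in range(45)]


def fake_fetch_page(cursor):
    start = 0 if cursor is None else int(cursor) + 1
    return FAKE_POSTS[start:start + PAGE_SIZE]


all_posts = collect_all_posts(fake_fetch_page)
print(len(all_posts))  # 45
```

Because the length check runs before the cursor is updated, an account with no posts simply returns an empty list instead of failing on the last-item lookup.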

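3) Fetching the statistics

Statistics are only visible when you are logged in, so an Access Token was required.
First, after clicking the statistics button, I looked at the graphql request under F12 → Network and copied its query, shown below.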

# velog_hits/graphql.py
def graphql_get_status(post_id):
    """Build the GraphQL payload that fetches statistics for one post."""
    return {
        "query": f"""
        query {{
          getStats(post_id: "{post_id}") {{
            total
            count_by_day {{
              count
              day
              __typename
            }}
            __typename
          }}
        }}
        """
    }

Putting the Access Token in the headers and sending the POST returned the statistics I wanted. I did not need the history of past view counts, only today's views and the total, so I extracted just those values.

# velog_hits/crawler.py
import sys

import requests

from velog_hits.graphql import graphql_get_status


class HitsCrawler:
    ...
    def get_hits(self) -> list:
        posts = self.get_posts()
        headers = {"Authorization": f"Bearer {self.access_token}"}

        hits = []
        for post in posts:
            query = graphql_get_status(post["id"])
            response = requests.post(
                url="https://v2.velog.io/graphql",
                json=query,
                headers=headers,
            )
            response_data = response.json()
            try:
                stats = response_data["data"]["getStats"]
                hits.append(
                    {
                        "id": post["id"],
                        "total": stats["total"],
                        "latest_count": stats["count_by_day"][0]["count"],
                        "latest_day": stats["count_by_day"][0]["day"],
                    }
                )
            except TypeError:
                # Subscripting None raises TypeError, e.g. when no stats come back.
                print("The access token may be malformed or expired.")
                sys.exit(1)

        return hits
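
The extraction step can also be written as a small pure function, which makes the edge cases easier to see. This is only a sketch: `extract_hits` is a hypothetical helper, the response shape is inferred from the fields requested in the getStats query, and the sample values (including the format of `day`) are made up for illustration.

```python
def extract_hits(post_id, response_data):
    """Return a hits record, or None when the response carries no stats.

    Treating a missing getStats value as "no access" is an assumption.
    """
    stats = (response_data.get("data") or {}).get("getStats")
    if stats is None:
        return None
    by_day = stats.get("count_by_day") or []
    # Fall back to zero views when there are no daily entries yet.
    latest = by_day[0] if by_day else {"count": 0, "day": None}
    return {
        "id": post_id,
        "total": stats["total"],
        "latest_count": latest["count"],
        "latest_day": latest["day"],
    }


# Made-up sample response, shaped like the getStats query result.
sample = {
    "data": {
        "getStats": {
            "total": 12,
            "count_by_day": [{"count": 3, "day": "2021-12-30"}],
        }
    }
}
print(extract_hits("post-1", sample))
print(extract_hits("post-1", {"data": {"getStats": None}}))  # None
```

Returning None instead of exiting lets the caller decide whether a missing result should stop the whole run or just skip one post.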

4) Joining on ID

By joining the data from get_posts() with the data from get_hits() on the id value, I got all the data I wanted in one place.

# velog_hits/crawler.py
import pandas as pd


class HitsCrawler:
    ...
    def get_post_infos(self) -> pd.DataFrame:
        posts = self.get_posts()
        hits = self.get_hits()  # get_hits() is the stats method defined above

        df_posts = pd.DataFrame(posts)
        df_hits = pd.DataFrame(hits)
        # Inner join keeps only posts that have both metadata and stats.
        post_infos = pd.merge(left=df_posts, right=df_hits, how="inner", on="id")

        return post_infos
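
For readers less familiar with pandas, the effect of an inner join on id can be sketched in plain Python. `inner_join_by_id` is a hypothetical helper written only for this illustration, and the sample records are made up.

```python
def inner_join_by_id(posts, hits):
    """Combine post and hits records that share the same "id"."""
    hits_by_id = {hit["id"]: hit for hit in hits}
    joined = []
    for post in posts:
        hit = hits_by_id.get(post["id"])
        if hit is not None:  # inner join: skip posts without a match
            joined.append({**post, **hit})
    return joined


posts = [{"id": "a", "title": "First"}, {"id": "b", "title": "Second"}]
hits = [{"id": "a", "total": 10}]
print(inner_join_by_id(posts, hits))
# [{'id': 'a', 'title': 'First', 'total': 10}]
```

Post "b" has no stats record, so it drops out of the result, which is the same thing how="inner" does in the merge call.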

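5) Converting to HTML to show the result

With to_html(), the DataFrame could easily be rendered as a table.
The result is written to an index.html file inside a folder named htmlhits, which is created if it does not exist. Since there may be front-end work later to make the table look nicer, I also exported the data as JSON using to_json().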

# velog_hits/convertor.py
import sys
from pathlib import Path


class DF2HTMLConverter:
  def convert_df_to_html(self, df):
    try:
      # Write results into ./htmlhits under the current working directory.
      html_dir = Path.cwd() / "htmlhits"
      html_dir.mkdir(exist_ok=True)

      # escape=False keeps the <a> tags in the "post" column as real links.
      html_path = html_dir / "index.html"
      html_path.write_text(df.to_html(index=False, escape=False), encoding="utf-8")

      json_path = html_dir / "hits_data.json"
      json_path.write_text(df.to_json(orient="records", date_format="iso"), encoding="utf-8")

      print("Velog Hits Success!!")
      print(f"Velog Hits Result: {html_path}")
      return True

    except Exception as error:
      # Report the cause and exit with a non-zero status on failure.
      print(f"Velog Hits Fail: {error}")
      sys.exit(1)

To make the extracted data easier to read, I adjusted a few values before rendering the DataFrame as HTML, using the methods below.

# velog_hits/convertor.py
class DF2HTMLConverter:
  ...
  def get_result_dataframe(self, df, url):
    df = self._create_url(df, url)
    df = self._create_html_link(df)
    df = self._modify_date_format(df)

    df = df[["post", "tags", "comments_count", "likes", "total", "latest_count", "latest_day"]]
    return df

  def _create_url(self, df, url):
    df["url"] = url + df["url_slug"]
    return df

  def _create_html_link(self, df):
    df["post"] = "<a href='" + df["url"] + "'>" + df["title"] + "</a>"
    return df

  def _modify_date_format(self, df):
    df["latest_day"] = pd.to_datetime(df["latest_day"], format="%Y/%m/%d")
    df["latest_day"] = df["latest_day"].dt.date
    return df
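
Because to_html(escape=False) writes cell contents verbatim, a post title containing characters such as < or & would end up in the page unescaped. The following sketch shows one way to escape the title and URL before building the link, using only the standard library. `make_post_link` is a hypothetical helper, not part of the package, and the URL and title are made up.

```python
from html import escape


def make_post_link(url, title):
    """Build an anchor tag with the URL and title escaped for HTML."""
    return "<a href='{}'>{}</a>".format(escape(url, quote=True), escape(title))


print(make_post_link("https://velog.io/@user/my-post", "Using <div> & friends"))
# <a href='https://velog.io/@user/my-post'>Using &lt;div&gt; &amp; friends</a>
```

With a DataFrame, the same function could be applied row by row to the url and title columns before the link column is created.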

6) Creating the command

I used argparse so the program takes its inputs from the command line.
The -u option takes the username and the -at option takes the access token.

# velog_hits/command.py
import argparse

class CommandParser:
  def __init__(self) -> None:
    self.parser = argparse.ArgumentParser(description="Velog Hits")
    self.parser.add_argument("-u", "--username", nargs=1, required=True, help="Velog Username")
    self.parser.add_argument("-at", "--accesstoken", nargs=1, required=True, help="Your Velog Access Token")

  def get_args(self):
    return self.parser.parse_args()
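
One detail of this parser is that nargs=1 makes each option's value a one-element list rather than a plain string. The short sketch below rebuilds the same two arguments and parses an explicit list in place of the real command line; the username and token values are dummies.

```python
import argparse

parser = argparse.ArgumentParser(description="Velog Hits")
parser.add_argument("-u", "--username", nargs=1, required=True, help="Velog Username")
parser.add_argument("-at", "--accesstoken", nargs=1, required=True, help="Your Velog Access Token")

# Passing an explicit list stands in for the real command line.
args = parser.parse_args(["-u", "someone", "-at", "dummy-token"])
print(args.username)        # ['someone']
print(args.accesstoken[0])  # dummy-token
```

This is why the values have to be read with an index of 0 wherever the parsed arguments are used.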

7) Creating main.py

With all the pieces in place, main() ties them together and runs velog hits.

# velog_hits/main.py
from velog_hits.command import CommandParser
from velog_hits.convertor import DF2HTMLConverter
from velog_hits.crawler import HitsCrawler

def main():
  args = CommandParser().get_args()
  username = args.username[0]
  access_token = args.accesstoken[0]

  print(f"Fetching view-count data for '{username}'. Please wait a moment :)")
  hits_crawler = HitsCrawler(username, access_token)
  post_infos = hits_crawler.get_post_infos()
  print(f"Fetched all view-count data for '{username}'!")

  print("Starting HTML conversion...")
  convertor = DF2HTMLConverter()
  df_result = convertor.get_result_dataframe(post_infos, f"https://velog.io/@{username}/")
  convertor.convert_df_to_html(df_result)

8) Creating setup.py

To provide a command that starts with velog-hits, such as $ velog-hits -u {username} -at {access_token}, I registered a new package on PyPI and declared a console script entry point.

from setuptools import setup, find_packages

setup(
    name="velog-hits",
    ...
    entry_points={
        "console_scripts": [
            "velog-hits = velog_hits.main:main"
        ]
    }
)

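With this in place, you can install the package with $ pip install velog-hits and then use the $ velog-hits command.

📍 Wrapping up

I had wanted this feature for a long time, so I am happy to have built it before 2021 ended :)
The HTML is not very pretty yet, but I plan to keep updating it. Until the day velog itself offers a combined view of view counts, I will probably be using velog-hits a lot!

If you find anything problematic or incorrect, or have any questions, I would appreciate a comment or an email :)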


10 comments

안 형준

January 8, 2022

This is really nice, haha

1 reply

LJH

January 9, 2022

Where can I find the access token value?

3 replies

ljkgb