Building a Halla University Notice Alert Bot (2) - Coding

김학진 (Kim Hak-jin) · December 30, 2019


A. Overview

At a high level, the design works out as follows.

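I (the client) want to be alerted whenever the school's notice board is updated.

To achieve that, two requirements have to be met:

1. Detect when the school's notice board is updated.
2. Send an alert (to me) when it is.

So I will build two parts: one that keeps checking whether the notice board has been updated, and one that sends the alert.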

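The alerting part is built on Telegram and KakaoTalk, and the code has to keep checking the notice board for updates around the clock.
I run it on a server because, for personal reasons, I can't keep my PC on all the time. To see new notices in the apps (Telegram, KakaoTalk) at any moment, the code has to run continuously on a machine that never shuts down.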
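The checking part boils down to remembering the last title that triggered an alert and comparing each new fetch against it. Below is a minimal sketch, assuming the state is kept in a plain text file like the `latest.txt` the bot uses. `is_new_notice` is a hypothetical helper name, not something from the original code.

```python
import os
import tempfile


def is_new_notice(latest_title: str, state_path: str) -> bool:
    """Return True if latest_title differs from the stored title, then store it.

    Hypothetical helper: the state file holds only the last alerted title.
    A missing file (first run) counts as "nothing seen yet".
    """
    latest_title = latest_title.strip()
    before = ""
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as f:
            before = f.read().strip()
    if before == latest_title:
        return False
    # Remember the new title so the next check does not alert again
    with open(state_path, "w", encoding="utf-8") as f:
        f.write(latest_title)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "latest.txt")
        print(is_new_notice("Notice A", path))  # True: nothing stored yet
        print(is_new_notice("Notice A", path))  # False: same title
        print(is_new_notice("Notice B", path))  # True: title changed
```

Stripping whitespace on both sides matters: if the scraped title and the stored line differ only by a trailing newline or spaces, the bot would otherwise send the same alert again on every check.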

B. Coding

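import os
import time

import requests
from bs4 import BeautifulSoup
import telegram  # python-telegram-bot; synchronous Bot calls like these need a version before 20

my_token = '****'
bot = telegram.Bot(token=my_token)
my_chat_id = "****"

LIST_URL = 'http://www.halla.ac.kr/mbs/kr/jsp/board/list.jsp?boardId=23401&mcategoryId=&id=kr_060101000000'
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
LATEST_PATH = os.path.join(BASE_DIR, 'latest.txt')  # last title we alerted on
CHECK_INTERVAL = 600  # seconds between checks (assumed value; tune as needed)

while True:
    try:
        req = requests.get(LIST_URL, timeout=10)
        req.raise_for_status()
    except requests.RequestException:
        time.sleep(CHECK_INTERVAL)  # network or server error: try again later
        continue
    req.encoding = 'utf-8'

    soup = BeautifulSoup(req.text, 'html.parser')
    posts = soup.select('table > tbody > tr > td > a')

    count_page_num = 0
    count_notice_num = 0

    for i in posts:
        category = i.get('href', '')  # some links may have no href
        count_page_num = count_page_num + 1  # so count_notice_num ends up one past the last notice-title link
        if 'mcategoryI' not in category:
            count_notice_num = count_page_num

    if count_notice_num >= len(posts):  # no regular post found (e.g. the page layout changed)
        time.sleep(CHECK_INTERVAL)
        continue

    latest = posts[count_notice_num].text.strip()
    latest_category = posts[count_notice_num].get('href', '')

    # readline() stops at the first newline, so compare stripped titles;
    # a missing latest.txt (first run) counts as "nothing seen yet"
    before = ''
    if os.path.exists(LATEST_PATH):
        with open(LATEST_PATH, 'r', encoding='utf-8') as f_read:
            before = f_read.readline().strip()

    if before != latest:
        boardSeq = latest_category.find('boardSeq=')
        boardSeq_number = latest_category[boardSeq + 9:].split('&')[0]  # drop any parameters after boardSeq
        url = "http://www.halla.ac.kr/mbs/kr/jsp/board/view.jsp?spage=1&boardId=23401&boardSeq=" + boardSeq_number + "&mcategoryId=&id=kr_060101000000&column=&search="
        bot.sendMessage(chat_id=my_chat_id, text="There is a new notice.")
        bot.sendMessage(chat_id=my_chat_id, text=latest)
        bot.sendMessage(chat_id=my_chat_id, text=url)
        with open(LATEST_PATH, 'w', encoding='utf-8') as f_write:
            f_write.write(latest)  # the with block closes the file; no close() needed

    time.sleep(CHECK_INTERVAL)  # pause so the school server is not hit continuously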

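That is the full code.
I wrote it before enlisting in the military.
After enlisting I deleted my Telegram account altogether, which meant losing the bot token and chat_id, so those two values are masked out.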
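The code finds `boardSeq` by slicing from `'boardSeq='` to the end of the href, which picks up extra text if other parameters follow it. Here is a sketch of a sturdier alternative using the standard library's `urllib.parse`, assuming the post links are ordinary (possibly relative) URLs with a query string. The helper names and the sample hrefs are made up for illustration; the view URL parameters are the ones the original code uses.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urljoin, urlparse

LIST_URL = "http://www.halla.ac.kr/mbs/kr/jsp/board/list.jsp?boardId=23401&mcategoryId=&id=kr_060101000000"
VIEW_URL = "http://www.halla.ac.kr/mbs/kr/jsp/board/view.jsp"


def extract_board_seq(href: str) -> Optional[str]:
    """Return the boardSeq query parameter of a post link, or None if absent.

    Relative links are resolved against the list page first.
    """
    query = urlparse(urljoin(LIST_URL, href)).query
    values = parse_qs(query).get("boardSeq")
    return values[0] if values else None


def build_view_url(board_seq: str) -> str:
    """Rebuild the post view URL with the same parameters as the bot code."""
    params = {
        "spage": "1",
        "boardId": "23401",
        "boardSeq": board_seq,
        "mcategoryId": "",
        "id": "kr_060101000000",
        "column": "",
        "search": "",
    }
    # urlencode keeps dict order and writes empty values as "key="
    return VIEW_URL + "?" + urlencode(params)
```

For example, `extract_board_seq("view.jsp?boardSeq=12345&mcategoryId=")` gives `'12345'`, and `build_view_url('12345')` reproduces the same view URL string the bot sends.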

Adding a sleep between checks is a good idea, to reduce the load on the school server. (tip from BLEX)
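A sketch of the loop shape with a fixed pause between checks, so the school server sees at most one request per interval. `poll` and `check_once` are hypothetical names: `check_once` stands in for the fetch, compare and alert steps, and `max_checks` exists only so the loop can be stopped in a test.

```python
import time
from typing import Callable, Optional


def poll(check_once: Callable[[], None], interval_sec: float,
         max_checks: Optional[int] = None) -> int:
    """Call check_once repeatedly, sleeping interval_sec between calls.

    Runs forever when max_checks is None; otherwise returns the number of
    checks made. An exception in one check is printed and does not stop
    the loop, so a single network hiccup does not kill the bot.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        try:
            check_once()
        except Exception as exc:  # e.g. a network error or a changed page layout
            print(f"check failed: {exc}")
        checks += 1
        # Sleep between checks, but not after the final one in a bounded run
        if max_checks is None or checks < max_checks:
            time.sleep(interval_sec)
    return checks
```

In the bot, the body of the `while True:` loop would become `check_once`, and `max_checks` would stay `None`.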

a. Libraries

1. requests

2. beautifulsoup4

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations.

This document covers Beautiful Soup version 4.8.1. The examples in this documentation should work the same way in Python 2.7 and Python 3.2.
You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful Soup 3 is no longer being developed and that support for it will be dropped on or after December 31, 2020. If you want to learn about the differences between Beautiful Soup 3 and Beautiful Soup 4, see Porting code to BS4.



In short, Beautiful Soup takes even malformed HTML, repairs it, and turns it into a tree of Python objects that is easy to navigate.
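To see that repair and the bot's selector working together, here is a small offline sketch. The HTML is made up to resemble a board list (it is not the real Halla page), with the closing tags at the end left out on purpose. `latest_post` is a hypothetical helper that applies the same rule as the bot code: the newest post is the link right after the last one whose href lacks `mcategoryI`.

```python
from typing import Optional, Tuple

from bs4 import BeautifulSoup

# Made-up markup shaped like a board list. The final </tr>, </tbody> and
# </table> are missing; Beautiful Soup closes them when the input ends.
SAMPLE_HTML = (
    "<table><tbody>"
    '<tr><td><a href="view.jsp?boardSeq=3">Pinned notice</a></td></tr>'
    '<tr><td><a href="view.jsp?boardSeq=12&amp;mcategoryId=">Newest post</a></td></tr>'
    '<tr><td><a href="view.jsp?boardSeq=11&amp;mcategoryId=">Older post</a></td>'
)


def latest_post(html: str) -> Optional[Tuple[str, str]]:
    """Return (title, href) of the first post after the last notice-title link.

    A link whose href lacks 'mcategoryI' is treated as a notice title.
    Returns None when no post follows the last such link.
    """
    soup = BeautifulSoup(html, "html.parser")
    posts = soup.select("table > tbody > tr > td > a")
    index = 0
    # position is 1-based, so it equals the 0-based index of the next link
    for position, link in enumerate(posts, start=1):
        if "mcategoryI" not in link.get("href", ""):
            index = position
    if index >= len(posts):
        return None
    return posts[index].get_text(strip=True), posts[index].get("href", "")


if __name__ == "__main__":
    print(latest_post(SAMPLE_HTML))
```

With `html.parser`, Beautiful Soup closes the dangling `<tr>`, `<tbody>` and `<table>` at the end of the input, so the `table > tbody > tr > td > a` selector still matches all three links, and the `&amp;` in the attributes comes back as a plain `&`.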