<Web Scraping and Serving the Results on the Web with Flask - Nomad Coders>

  1. Basic setup
from flask import Flask

app = Flask("SuperScrapper")

app.run("0.0.0.0")
  1. Run Python code when a visitor opens /
from flask import Flask

app = Flask("FlaskClass")

@app.route("/")
def home():
  return "hello Welcome"

@app.route("/contact")  # this page is reachable at /contact
def contact():
  return "contact me!"

app.run("0.0.0.0") # for Replit: listen on all interfaces so the preview can reach the server
  1. Using the URL

(Handling a URL segment as a keyword argument)

from flask import Flask

app = Flask("FlaskClass")

@app.route("/")
def home():
  return "hello Welcome"

@app.route("/<username>")
def contact(username):
  return f"Hello, your name is {username}"

app.run("0.0.0.0")
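
To make the `<username>` idea concrete, here is a toy sketch of how a URL pattern can be turned into keyword arguments for a view function. It is only an illustration written with the standard library; `match_route` is a hypothetical helper, not Flask's actual router, and the path `/nico` is an assumed example.

```python
import re


def match_route(pattern, path):
    """Return the captured URL variables as a dict, or None if the path does not match."""
    # Turn "/<username>" into the regex "/(?P<username>[^/]+)".
    regex = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", pattern)
    match = re.fullmatch(regex, path)
    return match.groupdict() if match else None


def contact(username):
    return f"Hello, your name is {username}"


# The captured segment is passed to the view as a keyword argument.
kwargs = match_route("/<username>", "/nico")
greeting = contact(**kwargs)
```

The name inside the angle brackets has to match the parameter name of the function, which is why `contact` takes `username`.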

(Showing the user an actual web page)
First, save the HTML file in the templates folder, which is where render_template looks for it.

(job.html)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
</head>
<body>
  <h1>Job Search</h1>
  <form action="">
    <input placeholder="What job do you want?" required />
    <button>Search</button>
  </form>
</body>
</html>

Then import render_template and return the rendered file; the page shows up in the preview window.

(main.py)

from flask import Flask, render_template

app = Flask("FlaskClass")

@app.route("/")
def home():
  return render_template("job.html")

app.run("0.0.0.0")
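  1. Showing the search the user asked for -> request
    End result: the word typed into the form on one page appears on another page. That means two pages in total: the home page and a "You are looking for ..." page. Filling the template with that value is called rendering.
    report.html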
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
</head>
<body>
  <h1>Search Result</h1>
  <h3>You are looking for {{searchingBy}}</h3>

</body>
</html>

job.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
</head>
<body>
  <h1>Job Search</h1>
  <form action="/report" method="get">
    <input placeholder="What job do you want?" required name = "word" />
    <button>Search</button>
  </form>
</body>
</html>
  1. Normalizing case and handling None (lowercase only; a missing word goes back to home)
from flask import Flask, render_template, request, redirect

app = Flask("FlaskClass")

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word)

app.run("0.0.0.0")
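
The normalization step can also be tried without running a server. The sketch below parses a query string with the standard library and applies the same "lowercase, or None" rule; `normalize_word` is a hypothetical helper, and stripping surrounding whitespace is an extra assumption that the route above does not make.

```python
from urllib.parse import parse_qs, urlsplit


def normalize_word(url):
    """Return the lowercased ?word= value, or None when it is missing or blank."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    values = query.get("word")
    if not values:
        return None
    # Assumption: also trim whitespace, so "  " counts as no search at all.
    return values[0].strip().lower() or None


found = normalize_word("/report?word=Python")
missing = normalize_word("/report")
```

A return value of None corresponds to the branch that redirects to the home page.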

  1. Practice using the Stack Overflow scraping code
  • First, rework the scraper so the search term in the URL can be changed freely
    scrapper.py
import requests
from bs4 import BeautifulSoup

def get_last_page(url):
    result = requests.get(url)
    soup = BeautifulSoup(result.text, "html.parser")
    pages = soup.find("div", {"class": "s-pagination"}).find_all("a")
    last_page = pages[-2].get_text(strip=True)
    return int(last_page)


def extract_job(html):
  title = html.find("h2", {"class" : "mb4"}).find('a')["title"]
  company, location = html.find("h3", {"class" : "mb4"}).find_all("span", recursive = False)
  company = company.get_text(strip = True)
  location = location.get_text(strip = True).strip('-')
  job_id = html["data-jobid"]

  # The key is "link" so that it matches {{job.link}} in report.html and the CSV header.
  return {"title" : title, 'company' : company, 'location' : location, 'link' : f"https://stackoverflow.com/jobs/{job_id}"}



def extract_jobs(last_page, url):
  jobs = []
  for page in range(last_page):
    result = requests.get(f"{url}&pg={page + 1}")  # interpolate the page number; pages start at 1
    soup=BeautifulSoup(result.text, "html.parser")
    results = soup.find_all("div", {"class" : "-job"})
    for result in results:
      job = extract_job(result)
      jobs.append(job)
  return jobs

def get_jobs(word):
  url = f"https://stackoverflow.com/jobs?q={word}&sort=i"
  last_page = get_last_page(url)
  jobs = extract_jobs(last_page, url)
  return jobs
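
The two pagination details in the scraper, picking the second-to-last link as the last page and building one URL per page, can be checked in isolation. This is a minimal sketch with made-up data: the `labels` list is an assumed example of what the pagination links might contain, and both function names are hypothetical.

```python
def last_page_from_labels(labels):
    """The final link is assumed to be "next", so the last page number sits just before it."""
    return int(labels[-2])


def page_urls(url, last_page):
    """Build one URL per result page; range() starts at 0, so add 1 to get 1-based pages."""
    return [f"{url}&pg={page + 1}" for page in range(last_page)]


labels = ["1", "2", "3", "4", "next"]  # assumed pagination link texts
last_page = last_page_from_labels(labels)
urls = page_urls("https://stackoverflow.com/jobs?q=python&sort=i", 3)
```

The braces around `page + 1` matter: without them the f-string sends the literal text `page+1` for every request instead of the page number.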

At this point, searching for any job on the home page takes a little while, but everything does load. (The page layout is the same as before.)

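  1. How do we show the results as a table on the page? -> build a fake DB
  • It is a simple store, so a word that has already been searched does not have to be scraped again.
    main.py
from flask import Flask, render_template, request, redirect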
from scrapper import get_jobs

app = Flask("FlaskClass")

db = {}

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
    fromDb = db.get(word)
    if fromDb:
      jobs = fromDb
    else:
      jobs = get_jobs(word)
      db[word] = jobs
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word)

app.run("0.0.0.0")

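With this in place, repeating a search returns quickly
(the results are already stored in db). Note that db has to be defined outside the route function so that it survives between requests.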
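
The caching behavior can be seen without scraping anything. In this sketch `fake_get_jobs` is a hypothetical stand-in for the real scraper that records each call, so it is easy to confirm that the second lookup never reaches it.

```python
calls = []  # records every time the (fake) scraper actually runs


def fake_get_jobs(word):
    """Hypothetical stand-in for scrapper.get_jobs; returns canned data."""
    calls.append(word)
    return [{"title": f"{word} developer"}]


db = {}  # module-level, so it outlives individual lookups


def cached_jobs(word):
    word = word.lower()
    if word not in db:  # cache miss: scrape once and remember the result
        db[word] = fake_get_jobs(word)
    return db[word]


first = cached_jobs("Python")
second = cached_jobs("python")  # served from db, no second scrape
```

One design difference from the route above: checking `word not in db` also caches an empty result list, whereas `if fromDb:` treats an empty list as a miss and scrapes again.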

  • As a finishing touch, show how many jobs were found
    report.html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
</head>
<body>
  <h1>Search Result</h1>
  <h3>Found {{resultsNumber}} results for : {{searchingBy}}</h3>

</body>
</html>

main.py

from flask import Flask, render_template, request, redirect
from scrapper import get_jobs

app = Flask("FlaskClass")

db = {}

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
    fromDb = db.get(word)
    if fromDb:
      jobs = fromDb
    else:
      jobs = get_jobs(word)
      db[word] = jobs
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word, resultsNumber = len(jobs))

app.run("0.0.0.0")
  1. Rendering the jobs (showing them on the page) using a for loop

report.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
  <style>
    section {
      display : grid;
      gap : 20px;
      grid-template-columns : repeat(4, 1fr);
      
    }
  </style>
</head>
<body>
  <h1>Search Result</h1>
  <h3>Found {{resultsNumber}} results for : {{searchingBy}}</h3>
  <section>
    <h4>title</h4>
    <h4>company</h4>
    <h4>location</h4>
    <h4>link</h4>
    {% for job in jobs %}
    <span>{{job.title}}</span>
    <span>{{job.company}}</span>    
    <span>{{job.location}}</span>
    <a href="{{job.link}}" target = "_blank">apply</a>
    {% endfor %}

  </section>

</body>
</html>
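
For readers more comfortable with Python than with template syntax, the loop in the template does roughly what the function below does: it emits four grid cells per job. This is only an approximation in plain Python, not how Jinja is implemented; `render_rows` is a hypothetical name, the sample job is made up, and the job dictionaries are assumed to carry the keys title, company, location and link.

```python
from html import escape


def render_rows(jobs):
    """Build the grid cells one job at a time, like the {% for job in jobs %} loop."""
    cells = []
    for job in jobs:
        # escape() approximates the auto-escaping Flask applies to .html templates.
        cells.append(f"<span>{escape(job['title'])}</span>")
        cells.append(f"<span>{escape(job['company'])}</span>")
        cells.append(f"<span>{escape(job['location'])}</span>")
        cells.append(f'<a href="{escape(job["link"])}" target="_blank">apply</a>')
    return "\n".join(cells)


sample = [{
    "title": "R&D Engineer",
    "company": "ACME",
    "location": "Seoul",
    "link": "https://example.com/jobs/1",
}]
rows_html = render_rows(sample)
```

Because the grid has four columns, each group of four cells lands on its own row under the four headings.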

main.py

from flask import Flask, render_template, request, redirect
from scrapper import get_jobs

app = Flask("FlaskClass")

db = {}

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
    existingJobs = db.get(word)
    if existingJobs:
      jobs = existingJobs
    else:
      jobs = get_jobs(word)
      db[word] = jobs
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word, resultsNumber = len(jobs), jobs = jobs)

app.run("0.0.0.0")
  • As a result, the page is neatly laid out in four columns. (The rows depend on the search results.)
  1. Saving to a CSV file
  • First, add an export button

report.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Job Search</title>
  <style>
    section {
      display : grid;
      gap : 20px;
      grid-template-columns : repeat(4, 1fr);
      
    }
  </style>
</head>
<body>
  <h1>Search Result</h1>
  <h3>Found {{resultsNumber}} results for : {{searchingBy}}</h3>
  <a href="/export?word={{searchingBy}}">Export to CSV</a> 
  <section>
    <h4>title</h4>
    <h4>company</h4>
    <h4>location</h4>
    <h4>link</h4>
    {% for job in jobs %}
    <span>{{job.title}}</span>
    <span>{{job.company}}</span>    
    <span>{{job.location}}</span>
    <a href="{{job.link}}" target = "_blank">apply</a>
    {% endfor %}

  </section>

</body>
</html>

main.py (exception handling appears here)

from flask import Flask, render_template, request, redirect
from scrapper import get_jobs

app = Flask("FlaskClass")

db = {}

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
    existingJobs = db.get(word)
    if existingJobs:
      jobs = existingJobs
    else:
      jobs = get_jobs(word)
      db[word] = jobs
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word, resultsNumber = len(jobs), jobs = jobs)

@app.route("/export")
def export():
  try: # use exception handling: any failed check below jumps to the except branch
    word = request.args.get('word')
    if not word:
      raise Exception()
    word = word.lower()
    jobs = db.get(word)
    if not jobs:
      raise Exception()
    return f"Generate CSV for {word}"
  except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    return redirect("/")

app.run("0.0.0.0")

After this step, clicking the export link at least confirms where it leads.
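
The "raise on any failed check, handle it in one place" pattern used in the export route can be sketched without Flask. Everything named here is hypothetical: `ExportError` is an illustrative exception class, and the string "redirect:/" merely stands in for the real `redirect("/")` response.

```python
class ExportError(Exception):
    """Raised when an export request cannot be fulfilled (hypothetical name)."""


def jobs_for_export(db, word):
    if not word:
        raise ExportError("no search word given")
    jobs = db.get(word.lower())
    if not jobs:
        raise ExportError(f"nothing cached for {word!r}")
    return jobs


def export(db, word):
    try:
        jobs_for_export(db, word)
    except ExportError:
        return "redirect:/"  # stand-in for redirect("/")
    return f"Generate CSV for {word.lower()}"


db = {"python": [{"title": "Backend Developer"}]}
ok = export(db, "Python")
no_word = export(db, None)
not_cached = export(db, "rust")
```

Catching a dedicated exception type keeps unrelated bugs, such as a typo inside the try block, from being silently turned into a redirect.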

  1. Final version
    exporter.py
import csv

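def save_to_file(jobs):
  # "with" closes the file, so the rows are flushed before send_file reads it;
  # newline="" lets the csv module control line endings (no blank rows on Windows).
  with open("jobs.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["title", "company", "location", "link"])
    for job in jobs:
      # Relies on every job dict keeping the same key order as the header row.
      writer.writerow(list(job.values()))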
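
As a variation on the exporter, `csv.DictWriter` matches values to columns by key instead of by position, so the output does not depend on the order in which each job dictionary was built. The sketch below writes to an in-memory buffer rather than jobs.csv so it can be run anywhere; `jobs_to_csv` and the sample job are made up, and the field names are assumed to equal the header row used above.

```python
import csv
import io

FIELDS = ["title", "company", "location", "link"]  # assumed to match the job dict keys


def jobs_to_csv(jobs):
    """Return the jobs as CSV text, with columns chosen by key rather than position."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


# The keys are deliberately out of order to show that DictWriter sorts them out.
sample = [{
    "link": "https://example.com/jobs/1",
    "title": "Backend Developer",
    "company": "ACME",
    "location": "Seoul",
}]
csv_text = jobs_to_csv(sample)
lines = csv_text.splitlines()
```

By default DictWriter raises ValueError if a dictionary contains a key that is not listed in `fieldnames`, which surfaces a mismatch between the scraper and the exporter early.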

main.py

from flask import Flask, render_template, request, redirect, send_file
from scrapper import get_jobs
from exporter import save_to_file

app = Flask("FlaskClass")

db = {}

@app.route("/")
def home():
  return render_template("job.html")

@app.route("/report")
def report():
  word = request.args.get('word')
  if word:
    word = word.lower()
    existingJobs = db.get(word)
    if existingJobs:
      jobs = existingJobs
    else:
      jobs = get_jobs(word)
      db[word] = jobs
  else:
    return redirect("/")
  return render_template("report.html", searchingBy = word, resultsNumber = len(jobs), jobs = jobs)

@app.route("/export")
def export():
  try:
    word = request.args.get('word')
    if not word:
      raise Exception()
    word = word.lower()
    jobs = db.get(word)
    if not jobs:
      raise Exception()
    save_to_file(jobs)
    return send_file("jobs.csv") # send_file has to be imported from flask
  except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    return redirect("/")



app.run("0.0.0.0")
  • report.html, job.html, and the other templates are left untouched
  • Only at the very end did I realize that redirect has to be imported as well.

Result (Replit)

This was hard. I got an indirect look at what Flask is used for. Something to refer back to if I end up using it later.
