이것이 데이터 분석이다 (This Is Data Analysis) - Web Crawling

화이팅 · February 28, 2023

eda


Source: 이것이 데이터 분석이다 (This Is Data Analysis)

table -> tbody -> tr -> td -> a

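Fetch the HTML document from a specific URL

from selenium import webdriver
driver = webdriver.Chrome()  # assumes Chrome; any Selenium WebDriver works
driver.get(url)  # load the page at url
html = driver.page_source  # HTML of the rendered page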
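
A minimal sketch of the fetch step, wrapped in helpers so the browser is always closed. The names fetch_html and fetch_with_chrome are hypothetical, not from the book, and the sketch assumes Chrome plus a matching driver are installed.

```python
def fetch_html(driver, url):
    """Load url in a Selenium-style driver and return the rendered HTML."""
    driver.get(url)
    return driver.page_source


def fetch_with_chrome(url):
    """Hypothetical convenience wrapper: open Chrome, fetch, then quit."""
    # Imported here so fetch_html() stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        return fetch_html(driver, url)
    finally:
        driver.quit()  # release the browser process even if loading fails
```

Calling html = fetch_with_chrome(url) would then give the same string as the two lines above, without leaving a browser window open.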
Extract data from the HTML document

BeautifulSoup
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML string into a BeautifulSoup object
Use its methods to get specific HTML tags

find(), find_all()
contents_table = soup.find(name='table', attrs={'class': 'table-hover'})  # first matching <table>, or None
tbody = contents_table.find(name='tbody')
rows = tbody.find_all(name='tr')  # list of every <tr> inside the tbody
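
To try the find() / find_all() chain without a live page, here is a small sketch that parses a hand-written HTML string. The sample markup, song titles, and /song/... links are made up for illustration; only the table-hover class and the table -> tbody -> tr -> td -> a nesting come from the notes above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; a real page would come from driver.page_source.
html = """
<table class="table-hover">
  <tbody>
    <tr><td><a href="/song/1">First Song</a></td></tr>
    <tr><td><a href="/song/2">Second Song</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match or None, so check before chaining.
contents_table = soup.find(name="table", attrs={"class": "table-hover"})
links = []
if contents_table is not None:
    tbody = contents_table.find(name="tbody")
    rows = tbody.find_all(name="tr")  # find_all() returns every match as a list
    for row in rows:
        anchor = row.find(name="td").find(name="a")
        links.append((anchor.get_text(strip=True), anchor["href"]))

print(links)
```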


find(): match by tag name, attribute, and attribute value
select(): match by CSS selector
1) Find by tag name

tags_span = soup.select('span')

2) Find by id and class

select('selector'): use #id or .class-name
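
The three selector forms can be compared side by side in a short sketch. The markup below, including the header id and title class, is hypothetical and exists only to show what each selector matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to illustrate the selector forms.
html = """
<div id="header"><span>Chart</span></div>
<ul>
  <li class="title"><span>First Song</span></li>
  <li class="title"><span>Second Song</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

tags_span = soup.select("span")   # by tag name: every <span>
header = soup.select("#header")   # by id: '#' prefix
titles = soup.select(".title")    # by class: '.' prefix

# select() always returns a list; select_one() returns the first match or None.
first_title = soup.select_one(".title")

print(len(tags_span), len(header), len(titles))
print(first_title.get_text(strip=True))
```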

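driver.find_element(By.CSS_SELECTOR, 'selector')
: finds the needed element directly with Selenium, without BeautifulSoup. Needs from selenium.webdriver.common.by import By; the older driver.find_element_by_css_selector('selector') form no longer exists in current Selenium 4 releases.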

ex) Find song information by tr tag

songs = soup.select('table.byChart > tbody > tr')
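
As a rough sketch of what can be done with songs afterwards, the snippet below runs the same selector on made-up chart markup and pulls a rank and title out of each row. The rank class, the column layout, and the song data are assumptions; the real chart page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical chart markup; the real page's columns and classes may differ.
html = """
<table class="byChart">
  <tbody>
    <tr><td class="rank">1</td><td><a href="/song/1">First Song</a></td></tr>
    <tr><td class="rank">2</td><td><a href="/song/2">Second Song</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# '>' is the child combinator: only <tr> elements directly under a <tbody>
# that sits directly under <table class="byChart">.
songs = soup.select("table.byChart > tbody > tr")

chart = []
for song in songs:
    rank = song.select_one("td.rank").get_text(strip=True)
    title = song.select_one("td > a").get_text(strip=True)
    chart.append((int(rank), title))

print(chart)
```

One thing to watch: html.parser does not add a tbody that is missing from the raw HTML, so if the selector returns an empty list, checking whether the source really contains a tbody tag is a reasonable first step.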

Haha... ha.