
Breaking the following URL into its components gives:
URL : https://www.example.com:5000/path/to/page.html?lang=ko&id=001#content
protocol : https
domain : www.example.com
port : 5000
path : /path/to/page.html
query : lang=ko&id=001
fragment : content
Next, use urllib.parse to dissect the URL structure.
import urllib.parse
url_parser = urllib.parse.urlparse("https://www.example.com:5000/path/to/page.html?lang=ko&id=001#content")
print('{:<15} : {}'.format("scheme",url_parser.scheme))
print('{:<15} : {}'.format("netloc",url_parser.netloc))
print('{:<15} : {}'.format("path",url_parser.path))
print('{:<15} : {}'.format("query",url_parser.query))
print('{:<15} : {}'.format("fragment",url_parser.fragment))
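The breakdown at the top lists the domain and the port separately, whereas netloc returns them joined. A minimal sketch of how the two might be separated, using the hostname and port attributes of the parse result (the variable name parts is just an illustrative choice):

```python
from urllib.parse import urlparse

parts = urlparse("https://www.example.com:5000/path/to/page.html?lang=ko&id=001#content")

# netloc holds host and port together; hostname and port split them apart.
print(parts.netloc)    # www.example.com:5000
print(parts.hostname)  # www.example.com
print(parts.port)      # 5000 (an int, or None when the URL has no port)
```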
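To break the query down further, use parse_qs or parse_qsl. Pass only the query string itself, without the leading "?"; otherwise the "?" becomes part of the first key ('?lang').
import urllib.parse
print(urllib.parse.parse_qs("lang=ko&id=001")) # {'lang': ['ko'], 'id': ['001']}
print(urllib.parse.parse_qsl("lang=ko&id=001")) # [('lang', 'ko'), ('id', '001')]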
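Going the other way, a parsed query can be modified and put back into a URL. The following is a rough sketch, assuming we want to switch lang from ko to en in the sample URL; it relies on urlencode with doseq=True and on urlunparse, and the names params, new_query, and new_url are illustrative.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

url = "https://www.example.com:5000/path/to/page.html?lang=ko&id=001#content"
parts = urlparse(url)

# Parse the query into a dict of lists, then change one parameter.
params = parse_qs(parts.query)
params["lang"] = ["en"]

# doseq=True expands each list back into repeated key=value pairs.
new_query = urlencode(params, doseq=True)

# The parse result is a named tuple, so _replace returns a modified copy.
new_url = urlunparse(parts._replace(query=new_query))
print(new_url)  # https://www.example.com:5000/path/to/page.html?lang=en&id=001#content
```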
Send a GET request to a web server and fetch the response.
import urllib.error
import urllib.request
url = 'https://logins.daum.net/accounts/signinform.do?url=https%3A%2F%2Fwww.daum.net%2F'
# A GET request carries no body, so no Content-Type header is needed.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
request = urllib.request.Request(url=url, headers=headers)
try:
    with urllib.request.urlopen(request) as response:
        print(response.status)
        print(response.read().decode())
except urllib.error.HTTPError as e:
    # The server answered, but with an error status (4xx or 5xx).
    print(e.code)
    print(e.read())
except urllib.error.URLError as e:
    # The server could not be reached (DNS failure, refused connection, etc.).
    print(e.reason)
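GET parameters travel in the URL itself, so they can be built with urlencode rather than typed by hand. A small sketch under assumed values: the endpoint https://www.example.com/search and its parameters are hypothetical, and the request is only constructed and inspected, never sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameters, used only for illustration.
base_url = "https://www.example.com/search"
query = urllib.parse.urlencode({"lang": "ko", "id": "001"})

request = urllib.request.Request(
    url=f"{base_url}?{query}",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"},
)

# Nothing is sent yet; the request object is only inspected here.
print(request.get_method())  # GET
print(request.full_url)      # https://www.example.com/search?lang=ko&id=001
```

Passing this request to urllib.request.urlopen would send it in the same way as the example above.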
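For a POST request, pass the additional data argument.
import urllib.error
import urllib.parse
import urllib.request
url = 'https://logins.daum.net/accounts/signinform.do?url=https%3A%2F%2Fwww.daum.net%2F'
# urlencode produces a form-encoded body, so the Content-Type declares that format.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)', 'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'}
# data must be bytes, hence the encode call.
data = urllib.parse.urlencode({'id': '1234'}).encode('utf-8')
request = urllib.request.Request(url=url, headers=headers, data=data)
try:
    with urllib.request.urlopen(request) as response:
        print(response.status)
        print(response.read().decode())
except urllib.error.HTTPError as e:
    # The server answered, but with an error status (4xx or 5xx).
    print(e.code)
    print(e.read())
except urllib.error.URLError as e:
    # The server could not be reached (DNS failure, refused connection, etc.).
    print(e.reason)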
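The body built with urlencode above is form-encoded. If an endpoint expects JSON instead, the body would be serialized with json.dumps and sent with a JSON Content-Type. A minimal sketch, assuming a hypothetical endpoint https://www.example.com/api/items; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and payload, used only for illustration.
url = "https://www.example.com/api/items"
payload = {"id": "1234"}

request = urllib.request.Request(
    url=url,
    data=json.dumps(payload).encode("utf-8"),  # data must be bytes
    headers={"Content-Type": "application/json; charset=utf-8"},
)

# Supplying data switches the method from GET to POST.
print(request.get_method())  # POST
print(request.data)          # b'{"id": "1234"}'
```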