Both the Python module and the tesseract executable must be installed on the machine.
$ pip3 install pytesseract
$ sudo apt install tesseract-ocr
For macOS:
$ brew install tesseract
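To confirm that both pieces are in place, a quick check along these lines may help. This is only a minimal sketch: `find_tesseract` is a hypothetical helper name, and it assumes the executable is called `tesseract` and is reachable through PATH.

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract executable not found; install it with apt or brew")
    else:
        print("tesseract executable:", path)

    try:
        import pytesseract  # noqa: F401  (the Python wrapper installed with pip3)
        print("pytesseract module: OK")
    except ImportError:
        print("pytesseract module not found; run: pip3 install pytesseract")
```

If the executable is installed somewhere outside PATH, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.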
# This Python file uses the following encoding: utf-8
from PIL import Image
from pytesseract import image_to_string
img = Image.open('sample.jpg')
text = image_to_string(img)
print(text)
https://developer.ibm.com/tutorials/document-scanner/
https://m.blog.naver.com/samsjang/220694855018
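# This Python file uses the following encoding: utf-8
# OCR a Korean image whose path is given as the first command-line argument.
from PIL import Image
from pytesseract import image_to_string
import sys

filename = sys.argv[1]
img = Image.open(filename)
# lang='kor' requires the Korean language data (tesseract-ocr-kor).
text = image_to_string(img, lang='kor')
# Python 3 strings are already Unicode, so no encode/decode round trip is needed.
print(text)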
$ sudo add-apt-repository ppa:alex-p/tesseract-ocr
$ sudo apt-get update
$ sudo apt install tesseract-ocr-kor
The above is for Ubuntu 16.
Ubuntu 18 seems to be the same.
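To check whether the Korean data was actually picked up after installing the package, the output of `tesseract --list-langs` can be inspected. The sketch below is an assumption-laden illustration: `parse_langs` and `installed_langs` are hypothetical helper names, and the parser simply assumes one language code per line after a header line starting with "List of".

```python
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def installed_langs():
    """Run tesseract and return the language codes it reports."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        # Merge stderr in, on the assumption that some versions print the list there.
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return parse_langs(result.stdout)

# Example: print("kor" in installed_langs())
```

If `kor` is missing from the returned list, `image_to_string(img, lang='kor')` can be expected to fail until the language data is installed.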
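$ tesseract -c preserve_interword_spaces=1 ../../../eng_kor_test_img.PNG stdout -l kor+eng --psm 4
kor+eng: two languages can be recognized at the same time -> but the recognition rate drops
--psm sets the page segmentation mode (4 = assume a single column of text of variable sizes); it seems to affect the recognition rate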
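The same options can probably be passed from Python through pytesseract's `lang` and `config` arguments. The following is a minimal sketch rather than a tested recipe: `build_config` and `ocr_file` are hypothetical helper names, and the defaults simply mirror the command above.

```python
def build_config(psm=4, preserve_spaces=True):
    """Build the extra command-line flags that pytesseract forwards to tesseract."""
    parts = ["--psm", str(psm)]
    if preserve_spaces:
        parts += ["-c", "preserve_interword_spaces=1"]
    return " ".join(parts)


def ocr_file(filename, lang="kor+eng", psm=4):
    """OCR one image file with the same options as the tesseract command line."""
    # Imported here so build_config can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    img = Image.open(filename)
    return pytesseract.image_to_string(img, lang=lang, config=build_config(psm))

# Example: print(ocr_file("eng_kor_test_img.PNG"))
```

Since combining languages seems to lower the recognition rate, it may be worth calling `ocr_file` with `lang="kor"` or `lang="eng"` alone when the image is known to contain a single language.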
https://niceman.tistory.com/155