2022.01.25 TIL

서승원 · January 25, 2022


Converting a document to JSON with CLOVA OCR

Using CLOVA OCR, an NCLOUD API, I worked through the example code in the manual: upload an image of a document, have the text extracted from that image, and get the result back as JSON.

Suppose we upload an image of a document. The request code looks like this:

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import org.json.JSONArray;
import org.json.JSONObject;

public class 참고2_1 {
    public static void main(String[] args) {
        String apiURL = "";    // fill in your own CLOVA OCR API URL
        String secretKey = ""; // fill in your own secret key

        try {
            URL url = new URL(apiURL);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();

            // This part needs a closer look (the postcard vs. parcel analogy).
            // Presumably this sets up a POST request, which is used to send
            // a large amount of data from the client to the server.
            con.setUseCaches(false);
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setRequestMethod("POST");

            // Information placed in the request headers: apparently this declares
            // that the body is a JSON document.
            con.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            con.setRequestProperty("X-OCR-SECRET", secretKey);

            JSONObject json = new JSONObject();
            json.put("version", "V2");
            json.put("requestId", UUID.randomUUID().toString());
            json.put("timestamp", System.currentTimeMillis());

            // Read the file into a byte[] and put that inside the JSON?
            // Since this service extracts the text from a photo of a document,
            // the photo file is presumably sent inside the JSON.
            JSONObject image = new JSONObject();
            image.put("format", "jpg");
            //image.put("url", "https://kr.object.ncloudstorage.com/ocr-ci-test/sample/1.jpg"); // image should be public, otherwise, should use data
            FileInputStream inputStream = new FileInputStream("img\\003.jpg");
            byte[] buffer = new byte[inputStream.available()];
            inputStream.read(buffer);
            inputStream.close();
            image.put("data", buffer);
            image.put("name", "demo");

            JSONArray images = new JSONArray();
            images.put(image);
            json.put("images", images);

            // Send the JSON as the body of the POST request.
            // Writing UTF-8 bytes (instead of writeBytes) keeps non-ASCII characters intact.
            String postParams = json.toString();
            DataOutputStream wr = new DataOutputStream(con.getOutputStream());
            wr.write(postParams.getBytes(StandardCharsets.UTF_8));
            wr.flush();
            wr.close();

            // 200 means the request was delivered and a response came back successfully.
            int responseCode = con.getResponseCode();
            BufferedReader br;
            if (responseCode == 200) {
                br = new BufferedReader(new InputStreamReader(con.getInputStream(), "utf-8"));
            } else {
                br = new BufferedReader(new InputStreamReader(con.getErrorStream(), "utf-8"));
            }

            // The server has answered the request; read the response and print it.
            String inputLine;
            StringBuffer response = new StringBuffer();
            while ((inputLine = br.readLine()) != null) {
                response.append(inputLine);
            }
            br.close();

            System.out.println(response);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

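The request code reads the image with available() followed by a single read(), which is not guaranteed to return the whole file, and then puts the raw byte[] into the JSON. As a hedged alternative, the sketch below reads a file in one call with Files.readAllBytes and turns the bytes into Base64 text using only the JDK. Whether the "data" field should carry Base64 text is an assumption here and should be checked against the CLOVA OCR API reference. The class name ImageBase64Sketch, the helpers toBase64 and encodeImage, and the temporary file are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ImageBase64Sketch {

    // Encodes raw bytes as a Base64 string (standard alphabet, with padding).
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reads the entire file in one call and returns its Base64 text form.
    static String encodeImage(Path imagePath) throws IOException {
        return toBase64(Files.readAllBytes(imagePath));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real image file: a temp file holding the first three JPEG bytes.
        Path tmp = Files.createTempFile("demo", ".jpg");
        Files.write(tmp, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});

        System.out.println(encodeImage(tmp)); // prints "/9j/"

        Files.delete(tmp);
    }
}
```

With a string like this, the request would use image.put("data", encodedString) instead of passing the byte[] directly.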

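The code runs in the order described by its comments, and the response StringBuffer ends up holding the following (the Korean text in "inferText" is garbled by the character encoding):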
{"version":"V2","requestId":"e5f6dc20-dcf6-4884-936b-a4a8fbf14d27","timestamp":1643076899701,"images":[{"uid":"231ac846900a4a15b212954b114c2e94","name":"demo","inferResult":"SUCCESS","message":"SUCCESS","validationResult":{"result":"NO_REQUESTED"},"fields":[{"valueType":"ALL","boundingPoly":{"vertices":[{"x":140.0,"y":163.0},{"x":435.0,"y":163.0},{"x":435.0,"y":184.0},{"x":140.0,"y":184.0}]},"inferText":"源??닔?븳臾닿굅遺곸씠???몢猷⑤?몄궪泥쒓컩?옄?룞諛⑹궘","inferConfidence":0.9868,"type":"NORMAL","lineBreak":true}]}]}
Reformatting this as indented JSON and fixing the character encoding gives:

{
  "version": "V2",
  "requestId": "a51fc787-157c-457c-9058-e69119f80e52",
  "timestamp": 1643077006457,
  "images": [
    {
      "uid": "c673bbd230c844938b3d3920007ecb72",
      "name": "demo",
      "inferResult": "SUCCESS",
      "message": "SUCCESS",
      "validationResult": {
        "result": "NO_REQUESTED"
      },
      "fields": [
        {
          "valueType": "ALL",
          "boundingPoly": {
            "vertices": [
              { "x": 140.0, "y": 163.0 },
              { "x": 435.0, "y": 163.0 },
              { "x": 435.0, "y": 184.0 },
              { "x": 140.0, "y": 184.0 }
            ]
          },
          "inferText": "김수한무거북이와두루미삼천갑자동방삭",
          "inferConfidence": 0.9868,
          "type": "NORMAL",
          "lineBreak": true
        }
      ]
    }
  ]
}

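The garbled "inferText" in the raw output is what UTF-8 bytes look like when some step (a reader or the console) decodes them with a different charset. The sketch below reproduces the effect with JDK classes only. It uses ISO-8859-1 as the wrong charset purely because it is always available. The garbling in the actual output appears to come from a Korean legacy charset, which is an assumption. The class name MojibakeSketch and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

    // Decodes UTF-8 bytes with the wrong charset, which produces garbled text.
    static String decodeWrongly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8, which recovers the original text.
    static String decodeCorrectly(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "김수한무" written with Unicode escapes so the source file encoding does not matter.
        String original = "\uAE40\uC218\uD55C\uBB34";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String garbled = decodeWrongly(wire);
        String fixed = decodeCorrectly(wire);

        System.out.println(wire.length);              // 12 (three bytes per Hangul syllable)
        System.out.println(garbled.equals(original)); // false
        System.out.println(fixed.equals(original));   // true
    }
}
```

In the request code, the equivalent fix is to pass "utf-8" to the InputStreamReader that wraps the response stream and to make sure the console also displays UTF-8.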

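Once the response is formatted like this, the part we need is "inferText", the string extracted from the document image. The "vertices" entries give the coordinates of that text within the image. I wrote the following code to pull inferText out of the JSONObject so it can be used.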

import org.json.JSONArray;
import org.json.JSONObject;

public class Test914 {
    public static void main(String[] args) {
        // Response from the OCR call, pasted in as a string literal for this test.
        String l = "{\"version\":\"V2\",\"requestId\":\"4e9a9707-d541-4bdf-96e8-909248343e3c\",\"timestamp\":1643077817114,\"images\":[{\"uid\":\"d1d73eae596d4e4fb9219ce031e72ab4\",\"name\":\"demo\",\"inferResult\":\"SUCCESS\",\"message\":\"SUCCESS\",\"validationResult\":{\"result\":\"NO_REQUESTED\"},\"fields\":[{\"valueType\":\"ALL\",\"boundingPoly\":{\"vertices\":[{\"x\":140.0,\"y\":163.0},{\"x\":435.0,\"y\":163.0},{\"x\":435.0,\"y\":184.0},{\"x\":140.0,\"y\":184.0}]},\"inferText\":\"김수한무거북이와두루미삼천갑자동방삭\",\"inferConfidence\":0.9868,\"type\":\"NORMAL\",\"lineBreak\":true}]}]}";

        JSONObject jo = new JSONObject(l);

        // "images" is an array, so read it with getJSONArray and take element 0 as an object.
        JSONArray images = jo.getJSONArray("images");
        JSONObject image = images.getJSONObject(0);

        // "fields" is also an array of objects; the first one holds the recognized text.
        JSONArray fields = image.getJSONArray("fields");
        JSONObject field = fields.getJSONObject(0);

        // "inferText" is a plain string value.
        String inferText = field.getString("inferText");
        System.out.println(inferText);
    }
}


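Following the JSON structure means telling arrays and objects apart and calling the matching method for each type (getJSONArray or getJSONObject). The String stored under "inferText" is then read with getString, and the program prints "김수한무거북이와두루미삼천갑자동방삭".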
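The "vertices" of a field can be used in a similar way once their x and y values have been read out. As a minimal sketch, the code below takes the four corner points from the response above and computes the box's top-left corner, width, and height. It assumes the coordinates are image pixels, which the response itself does not state. The class name BoundingBoxSketch and the helper boundingBox are hypothetical.

```java
import java.util.Arrays;

public class BoundingBoxSketch {

    // Returns {minX, minY, width, height} for a list of (x, y) vertices.
    static double[] boundingBox(double[][] vertices) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : vertices) {
            minX = Math.min(minX, v[0]);
            maxX = Math.max(maxX, v[0]);
            minY = Math.min(minY, v[1]);
            maxY = Math.max(maxY, v[1]);
        }
        return new double[] {minX, minY, maxX - minX, maxY - minY};
    }

    public static void main(String[] args) {
        // The four vertices from the "boundingPoly" in the response.
        double[][] vertices = {
            {140.0, 163.0}, {435.0, 163.0}, {435.0, 184.0}, {140.0, 184.0}
        };
        double[] box = boundingBox(vertices);
        System.out.println(Arrays.toString(box)); // [140.0, 163.0, 295.0, 21.0]
    }
}
```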

