YOLOv8 segmentation on Android - 4

알로에 · October 11, 2023

YOLOv8 android instance segmentation kotlin opencv

YOLOv8-Segmentation (4/6)

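✔ 1. Understanding the output

Post-processing requires knowing the shape of the model output.

It can be summarized as in the figure above: output0 holds the box, the class scores, and the mask weights for each candidate, and output1 holds the prototype masks. The number of candidates in output0 is not always 8400. With a model input size of [640 * 640] it is 8400, but other input sizes give fewer or more. Likewise, the prototype masks are [160 * 160] by default, and this also changes with the model input size.
The number of mask weights in output0 equals the number of masks in output1.
It is tempting to treat the masks in output1 as the final masks, but they are not.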
https://openaccess.thecvf.com/content_ICCV_2019/papers/Bolya_YOLACT_Real-Time_Instance_Segmentation_ICCV_2019_paper.pdf
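The link above is the YOLACT instance segmentation paper (ICCV 2019), which describes the prototype-mask approach in detail.

The figure above is taken from the paper. As it shows, the mask for each object has to be extracted afterwards.
It also shows an NMS step, just as in object detection.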
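The 8400 and 160 figures mentioned earlier in this section can be reproduced with a little arithmetic. The sketch below assumes the standard YOLOv8 layout (three detection heads with strides 8, 16, and 32, prototype masks at 1/4 of the input resolution, and 32 mask weights); a custom export may differ. The function names are hypothetical and are not part of the project code.

```kotlin
// Strides of the three detection heads (assumption: default YOLOv8 architecture).
val STRIDES = intArrayOf(8, 16, 32)

// Assumption: prototype masks are 1/4 of the input resolution.
const val PROTO_DOWNSCALE = 4

/** Number of candidate predictions in output0 for a square input. */
fun predictionCount(inputSize: Int): Int =
    STRIDES.sumOf { stride -> (inputSize / stride) * (inputSize / stride) }

/** Side length of each prototype mask in output1 for a square input. */
fun prototypeSize(inputSize: Int): Int = inputSize / PROTO_DOWNSCALE

/** Values per candidate: 4 box values + class scores + mask weights. */
fun valuesPerPrediction(labelSize: Int, maskWeights: Int = 32): Int =
    4 + labelSize + maskWeights
```

For a 640 input this gives 80 * 80 + 40 * 40 + 20 * 20 = 8400 candidates and 160 * 160 prototype masks. With 80 labels, each candidate carries 4 + 80 + 32 = 116 values.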

✔ 2. Post-processing (1)

Create a Result class to hold the results.

data class Result(var box: Rect, val confidence: Float, val index: Int, var maskMat: Mat)

Here Rect is the Rect class provided by OpenCV. box and maskMat are declared as var: box changes later when it is resized to the screen size, and maskMat is later replaced with the final mask.

Next, define the post-processing methods.

Everything up to NMS comes first, so that part is handled first to reduce the number of boxes. Conveniently, the OpenCV library provides a method for this step.

 private fun boxOutput(output: Mat, labelSize: Int): MutableList<Result> {
        val detections = output.reshape(1, output.total().toInt() / OUTPUT_SIZE).t()

        val boxes = Array(detections.rows()) { Rect2d() }
        val maxScores = Array(detections.rows()) { 0f }
        val indexes = Array(detections.rows()) { 0 }

        for (i in 0 until detections.rows()) {
            val scores = detections.row(i).colRange(4, 4 + labelSize)
            val max = Core.minMaxLoc(scores)
            val xPos = detections.get(i, 0)[0]
            val yPos = detections.get(i, 1)[0]
            val width = detections.get(i, 2)[0]
            val height = detections.get(i, 3)[0]
            val left = 0.0.coerceAtLeast(xPos - width / 2.0)
            val top = 0.0.coerceAtLeast(yPos - height / 2.0)

            boxes[i] = Rect2d(left, top, width, height)
            maxScores[i] = max.maxVal.toFloat()
            indexes[i] = max.maxLoc.x.toInt()

            scores.release()
        }

        val rects = MatOfRect2d(*boxes)
        val floats = MatOfFloat(*maxScores.toFloatArray())
        val ints = MatOfInt(*indexes.toIntArray())
        val indices = MatOfInt()
        Dnn.NMSBoxesBatched(rects, floats, ints, CONFIDENCE_THRESHOLD, NMS_THRESHOLD, indices)

        val list = mutableListOf<Result>()

        if (indices.total().toInt() == 0) return list

        // Build a Result for every box that survived NMS.
        indices.toList().forEach {
            val xPos = detections.get(it, 0)[0]
            val yPos = detections.get(it, 1)[0]
            val width = detections.get(it, 2)[0]
            val height = detections.get(it, 3)[0]

            // Convert the center-based box to a top-left based Rect, clamped to the model input.
            val x = 0.0.coerceAtLeast(xPos - width / 2.0).toInt()
            val y = 0.0.coerceAtLeast(yPos - height / 2.0).toInt()
            val w = INPUT_SIZE.toDouble().coerceAtMost(width).toInt()
            val h = INPUT_SIZE.toDouble().coerceAtMost(height).toInt()
            val rect = Rect(x, y, w, h)

            // Reuse the best score and class index computed before NMS.
            val score = maxScores[it]
            val index = indexes[it]
            // The remaining columns are the mask weights.
            val mask = detections.row(it).colRange(4 + labelSize, detections.cols())
            val result = Result(rect, score, index, mask)
            list.add(result)
        }
        detections.release()
        return list
    }

There are two outputs, but output1 contains only mask values, so this step uses only output0.
For each of the 8400 candidates, take the highest confidence value, its class index, and the box, then run NMS.

Dnn.NMSBoxesBatched(rects, floats, ints, CONFIDENCE_THRESHOLD, NMS_THRESHOLD, indices)
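To make the idea behind this call concrete, here is a minimal pure-Kotlin sketch of greedy, class-aware NMS. It is only an illustration of the technique under simple assumptions, not OpenCV's implementation; the Box type and the nmsPerClass function are hypothetical names.

```kotlin
/** Axis-aligned box given by its top-left corner and size (hypothetical helper type). */
data class Box(val x: Double, val y: Double, val w: Double, val h: Double)

/** Intersection over union of two boxes. */
fun iou(a: Box, b: Box): Double {
    val interW = minOf(a.x + a.w, b.x + b.w) - maxOf(a.x, b.x)
    val interH = minOf(a.y + a.h, b.y + b.h) - maxOf(a.y, b.y)
    if (interW <= 0.0 || interH <= 0.0) return 0.0
    val inter = interW * interH
    return inter / (a.w * a.h + b.w * b.h - inter)
}

/**
 * Greedy class-aware NMS: drop low scores, then walk the boxes from best to
 * worst and discard any box that overlaps an already kept box of the same
 * class by more than nmsThreshold. Returns kept indices, best score first.
 */
fun nmsPerClass(
    boxes: List<Box>,
    scores: List<Float>,
    classIds: List<Int>,
    scoreThreshold: Float,
    nmsThreshold: Float
): List<Int> {
    val candidates = boxes.indices
        .filter { scores[it] > scoreThreshold }
        .sortedByDescending { scores[it] }
    val kept = mutableListOf<Int>()
    for (i in candidates) {
        val suppressed = kept.any { k ->
            classIds[k] == classIds[i] && iou(boxes[k], boxes[i]) > nmsThreshold
        }
        if (!suppressed) kept.add(i)
    }
    return kept
}
```

Because boxes are compared only within the same class, two heavily overlapping boxes with different labels both survive, which is the reason the class indexes are passed to the batched variant.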

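After that, the remaining values are collected into Result objects.
YOLOv8 reports box coordinates as the x, y of the box center, so they must be converted to the x, y of the box's top-left corner to fit OpenCV's Rect.

The stored boxes are currently in the coordinates of the model input size. Define a new method that resizes them to the screen size.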
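As a small illustration of how one candidate is laid out and converted, the sketch below parses a single row on a plain FloatArray instead of a Mat. It assumes the row order [cx, cy, w, h, class scores, mask weights] described above; Detection and parseRow are hypothetical names used only for this example.

```kotlin
/** Parsed form of one output0 candidate (hypothetical helper type). */
class Detection(
    val left: Float, val top: Float, val width: Float, val height: Float,
    val classIndex: Int, val confidence: Float, val maskWeights: FloatArray
)

/**
 * Splits one row laid out as
 * [cx, cy, w, h, score_0 .. score_(labelSize-1), mask weights ...].
 */
fun parseRow(row: FloatArray, labelSize: Int): Detection {
    val cx = row[0]
    val cy = row[1]
    val w = row[2]
    val h = row[3]

    // Find the class with the highest score.
    var best = 0
    for (c in 1 until labelSize) {
        if (row[4 + c] > row[4 + best]) best = c
    }

    return Detection(
        // Center-based coordinates become the top-left corner, clamped at 0.
        left = maxOf(0f, cx - w / 2f),
        top = maxOf(0f, cy - h / 2f),
        width = w,
        height = h,
        classIndex = best,
        confidence = row[4 + best],
        // Everything after the class scores is the mask weight vector.
        maskWeights = row.copyOfRange(4 + labelSize, row.size)
    )
}
```

For example, a center of (320, 240) with a size of 100 x 80 becomes a box whose top-left corner is (270, 200).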

private fun resizeBox(list: MutableList<Result>, width: Int, height: Int) {
        list.forEach {
            val box = it.box
            val x = (box.x * width / INPUT_SIZE)
            val y = (box.y * height / INPUT_SIZE)
            var w = (box.width * width / INPUT_SIZE)
            var h = (box.height * height / INPUT_SIZE)

            if(w > width) w = width
            if(h > height) h = height

            if(x + w > width) w = width - x
            if(y + h > height) h = height - y

            val rect = Rect(x, y, w, h)
            it.box = rect
        }
    }

The width and height are each adjusted so that the box does not extend beyond the screen.
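The same scaling and clamping can be checked with concrete numbers. This is a standalone sketch on plain integers, assuming a square model input; PixelBox and scaleToImage are hypothetical names, and the clamp is written as a single coerceAtMost, which should be equivalent to the two-step check above because x and y are never negative.

```kotlin
/** Integer box in pixels: top-left corner plus size (hypothetical helper type). */
data class PixelBox(val x: Int, val y: Int, val w: Int, val h: Int)

/** Scales a box from model-input coordinates to a width x height image and clamps it. */
fun scaleToImage(box: PixelBox, inputSize: Int, width: Int, height: Int): PixelBox {
    val x = box.x * width / inputSize
    val y = box.y * height / inputSize
    // Keep the box inside the image: at most (width - x) wide and (height - y) tall.
    val w = (box.w * width / inputSize).coerceAtMost(width - x)
    val h = (box.h * height / inputSize).coerceAtMost(height - y)
    return PixelBox(x, y, w, h)
}
```

With a 640 input and a 1080 x 1920 screen, the box (320, 160, 160, 320) maps to (540, 480, 270, 960), and a box that would run past the right edge, such as (480, 0, 320, 640), is cut back to (810, 0, 270, 1920).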

Now create a method that calls the methods defined above.

    private fun postProcess(
        output0: Mat,
        output1: Mat,
        labelSize: Int,
        width: Int,
        height: Int
    ): MutableList<Result> {
        val lists = boxOutput(output0, labelSize)
        resizeBox(lists, width, height)
//        maskOutput(lists, output1, width, height)

        output0.release()
        output1.release()
        return lists
    }

The commented-out maskOutput is the method that later derives the final masks.

✔ 3. Post-processing (2)

To compute the final masks, create the interface below and have the previously created Inference interface extend it.

interface Segment {
}

interface Inference : Load, Segment{

// same as before
...

}

Now add the following methods to the Segment interface.

interface Segment {

    fun maskOutput(
        boxOutputs: MutableList<Result>,
        output1: Mat,
        matWidth: Int,
        matHeight: Int
    ) {

        if (boxOutputs.size == 0) return

        val maskPredictionList = boxOutputs.map { it.maskMat }
        val maskPredictionMat = Mat()
        Core.vconcat(maskPredictionList, maskPredictionMat)
        val reshapeSize = Inference.OUTPUT_MASK_SIZE * Inference.OUTPUT_MASK_SIZE
        val outputMat = output1.reshape(1, output1.total().toInt() / reshapeSize)
        val matMul = Mat()

        Core.gemm(maskPredictionMat, outputMat, 1.0, Mat(), 0.0, matMul)
        val masks = sigmoid(matMul)
        val resizedBoxes = resizeBoxes(boxOutputs, matWidth, matHeight)
        val blurSize = Size(
            (matWidth / Inference.OUTPUT_MASK_SIZE).toDouble(),
            (matHeight / Inference.OUTPUT_MASK_SIZE).toDouble()
        )

        for (i in 0 until resizedBoxes.size) {
            val resizeBox = resizedBoxes[i]
            val scaleX = resizeBox.x
            val scaleY = resizeBox.y
            val scaleW = resizeBox.width
            val scaleH = resizeBox.height

            val w = boxOutputs[i].box.width
            val h = boxOutputs[i].box.height

            val mask = masks.row(i).reshape(1, Inference.OUTPUT_MASK_SIZE)
            val resizedCropMask = Mat(mask, Rect(scaleX, scaleY, scaleW, scaleH))
            val cropMask = Mat()
            val blurMask = Mat()
            val thresholdMask = Mat()
            val resizeSize = Size(w.toDouble(), h.toDouble())

            Imgproc.resize(resizedCropMask, cropMask, resizeSize, 0.0, 0.0, Imgproc.INTER_LINEAR)
            Imgproc.blur(cropMask, blurMask, blurSize)
            Imgproc.threshold(blurMask, thresholdMask, 0.5, 1.0, Imgproc.THRESH_BINARY)

            thresholdMask.convertTo(thresholdMask, CvType.CV_8UC1)
            boxOutputs[i].maskMat.release()
            boxOutputs[i].maskMat = thresholdMask

            mask.release()
            resizedCropMask.release()
            cropMask.release()
            blurMask.release()
        }

        maskPredictionMat.release()
        output1.release()
        outputMat.release()
        matMul.release()
        masks.release()
        maskPredictionList.forEach { it.release() }
    }

    private fun sigmoid(mat: Mat): Mat {
        val oneMat = Mat.ones(mat.size(), mat.type())
        val mulMat = Mat()
        val expMat = Mat()
        val outMat = Mat()

        Core.multiply(mat, Scalar(-1.0), mulMat)
        Core.exp(mulMat, expMat)
        Core.add(oneMat, expMat, outMat)
        Core.divide(oneMat, outMat, outMat)

        oneMat.release()
        mulMat.release()
        expMat.release()

        return outMat
    }

    private fun resizeBoxes(
        boxOutputs: MutableList<Result>,
        width: Int,
        height: Int
    ): MutableList<Rect> {
        val resizedBoxes = mutableListOf<Rect>()
        boxOutputs.forEach {
            val rect = it.box
            val x = rect.x * Inference.OUTPUT_MASK_SIZE / width
            val w = rect.width * Inference.OUTPUT_MASK_SIZE / width
            val y = rect.y * Inference.OUTPUT_MASK_SIZE / height
            val h = rect.height * Inference.OUTPUT_MASK_SIZE / height

            resizedBoxes.add(Rect(x, y, w, h))
        }
        return resizedBoxes
    }
}

There are three methods. In order: one that computes the final masks from the mask weights and the prototype masks, one that applies sigmoid to a Mat element-wise, and one that scales the boxes down to the mask size.
The box resize used earlier in Inference rescales the predicted boxes to the size of the original image. The resizeBoxes used here scales those boxes down to the mask size ([160 * 160]).

The maskOutput method may look complicated. It applies the following logic.

Take the mask weights from the Result objects that survived NMS.

As the shape of output0 showed earlier, each set of mask weights has a fixed size of 32, so the matrix has shape [number of results * 32].

output1 has shape [32 * 25,600]. The 25,600 is one prototype mask flattened into a single row, i.e. 160 * 160.

Multiply the mask weights by the prototype masks as matrices. [number of results * 32][32 * 25600] -> [number of results * 25600]

Apply sigmoid.

For each result, reshape the 25,600-element row into a 2D [160 * 160] mask.

Scale the detected box to the prototype mask size (160*160), crop that region from the mask, resize the crop to the box size, and blur it (noise removal).

Apply a threshold and keep only the values above it (the final mask).
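The core of these steps (matrix product, sigmoid, threshold) can be sketched on plain arrays without OpenCV. This is a minimal illustration under simple assumptions, not the method used in the project; combineMasks and binarize are hypothetical names, and the crop, resize, and blur steps are left out.

```kotlin
import kotlin.math.exp

/** Logistic function applied to one value. */
fun sigmoid(x: Float): Float = 1f / (1f + exp(-x))

/**
 * Combines mask weights with prototype masks.
 * weights:    [numResults][numProtos]
 * prototypes: [numProtos][maskSize * maskSize], each prototype flattened
 * Returns [numResults][maskSize * maskSize] values between 0 and 1.
 */
fun combineMasks(weights: Array<FloatArray>, prototypes: Array<FloatArray>): Array<FloatArray> {
    val pixels = prototypes[0].size
    return Array(weights.size) { r ->
        FloatArray(pixels) { p ->
            // Dot product of one weight vector with one pixel across all prototypes.
            var sum = 0f
            for (k in prototypes.indices) sum += weights[r][k] * prototypes[k][p]
            sigmoid(sum)
        }
    }
}

/** Turns the values into a binary mask: 1 where the value exceeds the threshold. */
fun binarize(mask: FloatArray, threshold: Float = 0.5f): IntArray =
    IntArray(mask.size) { if (mask[it] > threshold) 1 else 0 }
```

In the real model the weights matrix is [number of results * 32] and the prototypes matrix is [32 * 25600], so each returned row can be reshaped to [160 * 160] before cropping.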

Now add the code that calls the methods defined above.
In the postProcess method of the Inference interface written earlier, uncomment the line below.

private fun postProcess(
        output0: Mat, output1: Mat, labelSize: Int, width: Int, height: Int
    ): MutableList<Result> {
        val lists = boxOutput(output0, labelSize)
        resizeBox(lists, width, height)
        // uncommented
        maskOutput(lists, output1, width, height)

        output0.release()
        output1.release()
        return lists
    }

Call postProcess from the detect method.

fun detect(mat: Mat, net: Net, labels: Array<String>) {
        if (isDetect) return

        isDetect = true
        val inputMat = Mat()
        Imgproc.resize(mat, inputMat, Size(INPUT_SIZE.toDouble(), INPUT_SIZE.toDouble()))
        Imgproc.cvtColor(inputMat, inputMat, Imgproc.COLOR_RGBA2RGB)
        inputMat.convertTo(inputMat, CvType.CV_32FC3)
        val blob = Dnn.blobFromImage(inputMat, SCALE_FACTOR)
        net.setInput(blob)

        val output0 = Mat()
        val output1 = Mat()
        val outputList = arrayListOf(output0, output1)
        val outputNameList = arrayListOf(OUTPUT_NAME_0, OUTPUT_NAME_1)

        net.forward(outputList, outputNameList)
        val lists =
            postProcess(outputList[0], outputList[1], labels.size, mat.width(), mat.height())

        blob.release()
        inputMat.release()

        isDetect = false
    }

The list now holds the box, confidence, index, and final mask for each result.

The next post will cover drawing these detections.
