[Android][Java] Object Detection With EfficientDet - 3: Applying the Model to Real-Time Images

παντοκράτωρ · July 15, 2024

Java Object Detection android tensorflow


Progress so far

In the previous post, startCamera in CameraUtils called analyzerUtils.analyze() with each ImageProxy frame as its argument.

imageAnalysis.setAnalyzer(cameraExecutor, new ImageAnalysis.Analyzer() {
    @OptIn(markerClass = ExperimentalGetImage.class)
    @Override
    public void analyze(@NonNull ImageProxy image) {
        analyzerUtils.analyze(image);
    }
});

Creating AnalyzerUtils

AnalyzerUtils.java

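import android.app.Activity;
import android.content.Context;
import android.content.res.AssetFileDescriptor;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.graphics.ImageFormat;
import android.graphics.Rect;
import android.graphics.RectF;
import android.graphics.YuvImage;
import android.media.Image;
import android.util.Log;
import android.widget.TextView;

import androidx.annotation.OptIn;
import androidx.camera.core.ExperimentalGetImage;
import androidx.camera.core.ImageProxy;

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.image.TensorImage;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnalyzerUtils {
    private static final String TAG = "AnalyzerUtils";
    private static final int INPUT_SIZE = 320; // input size of the EfficientDet Lite0 model
    // private static final int INPUT_SIZE = 384; // input size of the EfficientDet Lite1 model
    // private static final int INPUT_SIZE = 448; // input size of the EfficientDet Lite2 model
    // private static final int INPUT_SIZE = 512; // input size of the EfficientDet Lite3 model
    // private static final int INPUT_SIZE = 640; // input size of the EfficientDet Lite4 model
    private static final float CONFIDENCE_THRESHOLD = 0.5f;
    private Activity dstActivity;
    private TextView textView;

    Interpreter tflite;

    ObjectMarker objectMarker;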

    public AnalyzerUtils(Activity dstActivity) {
        try {
            tflite = new Interpreter(loadModelFile(dstActivity, "EfficientDet0.tflite"));
        } catch (Exception e) {
            e.printStackTrace();
        }

        objectMarker = new ObjectMarker(dstActivity);
        this.dstActivity = dstActivity;
    }

    @OptIn(markerClass = ExperimentalGetImage.class)
    public void analyze(ImageProxy image) {
        // Close the frame on every path; the next image is not processed until this one is closed
        try {
            Image mediaImage = image.getImage();
            if (mediaImage == null) {
                return;
            }

            if (objectMarker != null) {
                objectMarker.setImageSourceInfo(mediaImage.getWidth(), mediaImage.getHeight());
            }

            Bitmap bitmap = imageProxyToBitmap(image);
            TensorImage inputImageBuffer = preprocess(bitmap);

            // First run: fetch only numDetections (output index 3)
            float[] numDetections = new float[1];
            Map<Integer, Object> outputMapFirstRun = new HashMap<>();
            outputMapFirstRun.put(3, numDetections);

            tflite.runForMultipleInputsOutputs(new Object[]{inputImageBuffer.getBuffer()}, outputMapFirstRun);
            Log.d(TAG, "analyze: numDetections=" + numDetections[0]);

            int detectionsCount = (int) numDetections[0];

            // Size the output buffers to the number of detected objects
            float[][][] outputLocations = new float[1][detectionsCount][4];
            float[][] outputClasses = new float[1][detectionsCount];
            float[][] outputScores = new float[1][detectionsCount];

            Map<Integer, Object> outputMap = new HashMap<>();
            outputMap.put(0, outputLocations);
            outputMap.put(1, outputClasses);
            outputMap.put(2, outputScores);
            outputMap.put(3, numDetections);

            // Original image size
            int imageWidth = bitmap.getWidth();
            int imageHeight = bitmap.getHeight();

            // Second run: fill every output (note that this repeats inference on the same frame)
            tflite.runForMultipleInputsOutputs(new Object[]{inputImageBuffer.getBuffer()}, outputMap);

            // Interpret the results
            for (int i = 0; i < detectionsCount; i++) {
                // For simplicity, handle only the first detection and exit
                if (i == 1) {
                    break;
                }
                if (outputScores[0][i] >= CONFIDENCE_THRESHOLD) {
                    // Locations are normalized [ymin, xmin, ymax, xmax];
                    // scale them straight to the original image size
                    RectF boundingBox = new RectF(
                            outputLocations[0][i][1] * imageWidth,
                            outputLocations[0][i][0] * imageHeight,
                            outputLocations[0][i][3] * imageWidth,
                            outputLocations[0][i][2] * imageHeight
                    );
                    objectMarker.updateRect(boundingBox);

                    int detectedClass = (int) outputClasses[0][i];
                    float score = outputScores[0][i];
                    final String category = items.get(detectedClass);
                    Log.d(TAG, "Detected object: Class=" + category + ", Score=" + score + ", Box=" + boundingBox);

                    // Show the detected category in the TextView (views must be updated on the UI thread)
                    dstActivity.runOnUiThread(() -> {
                        textView = dstActivity.findViewById(R.id.textView);
                        textView.setText(category);
                    });
                }
            }
        } finally {
            image.close();
        }
    }

    private TensorImage preprocess(Bitmap bitmap) {
        // Resize to the model's input size, then wrap the resized bitmap in a TensorImage
        Bitmap resizedBitmap = Bitmap.createScaledBitmap(bitmap, INPUT_SIZE, INPUT_SIZE, true);
        TensorImage tensorImage = new TensorImage();
        tensorImage.load(resizedBitmap);
        return tensorImage;
    }

    private MappedByteBuffer loadModelFile(Context context, String modelPath) throws IOException {
        // Release the descriptor and stream once the model is mapped; the mapping itself stays valid
        try (AssetFileDescriptor fileDescriptor = context.getAssets().openFd(modelPath);
             FileInputStream inputStream = new FileInputStream(fileDescriptor.getFileDescriptor())) {
            FileChannel fileChannel = inputStream.getChannel();
            long startOffset = fileDescriptor.getStartOffset();
            long declaredLength = fileDescriptor.getDeclaredLength();
            return fileChannel.map(FileChannel.MapMode.READ_ONLY, startOffset, declaredLength);
        }
    }

    private Bitmap imageProxyToBitmap(ImageProxy image) {
        ImageProxy.PlaneProxy[] planes = image.getPlanes();
        ByteBuffer yBuffer = planes[0].getBuffer();
        ByteBuffer uBuffer = planes[1].getBuffer();
        ByteBuffer vBuffer = planes[2].getBuffer();

        int ySize = yBuffer.remaining();
        int uSize = uBuffer.remaining();
        int vSize = vBuffer.remaining();

        byte[] nv21 = new byte[ySize + uSize + vSize];

        yBuffer.get(nv21, 0, ySize);
        vBuffer.get(nv21, ySize, vSize);
        uBuffer.get(nv21, ySize + vSize, uSize);

        YuvImage yuvImage = new YuvImage(nv21, ImageFormat.NV21, image.getWidth(), image.getHeight(), null);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        yuvImage.compressToJpeg(new Rect(0, 0, image.getWidth(), image.getHeight()), 100, out);
        byte[] imageBytes = out.toByteArray();
        return BitmapFactory.decodeByteArray(imageBytes, 0, imageBytes.length);
    }


    static List<String> items = Arrays.asList(
            "person",
            "bicycle",
            "car",
            "motorcycle",
            "airplane",
            "bus",
            "train",
            "truck",
            "boat",
            "traffic light",
            "fire hydrant",
            "street sign",
            "stop sign",
            "parking meter",
            "bench",
            "bird",
            "cat",
            "dog",
            "horse",
            "sheep",
            "cow",
            "elephant",
            "bear",
            "zebra",
            "giraffe",
            "hat",
            "backpack",
            "umbrella",
            "shoe",
            "eye glasses",
            "handbag",
            "tie",
            "suitcase",
            "frisbee",
            "skis",
            "snowboard",
            "sports ball",
            "kite",
            "baseball bat",
            "baseball glove",
            "skateboard",
            "surfboard",
            "tennis racket",
            "bottle",
            "plate",
            "wine glass",
            "cup",
            "fork",
            "knife",
            "spoon",
            "bowl",
            "banana",
            "apple",
            "sandwich",
            "orange",
            "broccoli",
            "carrot",
            "hot dog",
            "pizza",
            "donut",
            "cake",
            "chair",
            "couch",
            "potted plant",
            "bed",
            "mirror",
            "dining table",
            "window",
            "desk",
            "toilet",
            "door",
            "tv",
            "laptop",
            "mouse",
            "remote",
            "keyboard",
            "cell phone",
            "microwave",
            "oven",
            "toaster",
            "sink",
            "refrigerator",
            "blender",
            "book",
            "clock",
            "vase",
            "scissors",
            "teddy bear",
            "hair drier",
            "toothbrush",
            "hair brush"
    );
}
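
A note on imageProxyToBitmap above: it copies the Y, V and U plane buffers back to back. As far as I can tell, that only yields valid NV21 when the chroma planes are already interleaved in memory and the rows carry no padding, which is common but not guaranteed for YUV_420_888 frames. The sketch below shows a stride-aware way to pack the same data. It is plain Java so it can be tried off-device: the ByteBuffers and stride values stand in for what ImageProxy.PlaneProxy reports through getBuffer(), getRowStride() and getPixelStride(). The names Nv21Packer and toNv21 are made up for this example, and it assumes an even width and height and buffers that start at index 0.

```java
import java.nio.ByteBuffer;

final class Nv21Packer {
    /**
     * Packs three YUV 4:2:0 planes into one NV21 array:
     * width*height luma bytes followed by interleaved V,U pairs.
     * Assumption: width and height are even, and the U and V planes
     * share the same row stride and pixel stride.
     */
    static byte[] toNv21(ByteBuffer y, int yRowStride,
                         ByteBuffer u, ByteBuffer v,
                         int uvRowStride, int uvPixelStride,
                         int width, int height) {
        byte[] out = new byte[width * height * 3 / 2];
        int pos = 0;

        // Luma: copy row by row, skipping any padding at the end of each row.
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                out[pos++] = y.get(row * yRowStride + col);
            }
        }

        // Chroma: one V,U pair per 2x2 pixel block, V first as NV21 requires.
        for (int row = 0; row < height / 2; row++) {
            for (int col = 0; col < width / 2; col++) {
                int index = row * uvRowStride + col * uvPixelStride;
                out[pos++] = v.get(index);
                out[pos++] = u.get(index);
            }
        }
        return out;
    }
}
```

The resulting array can be handed to YuvImage with ImageFormat.NV21 exactly as imageProxyToBitmap already does. The per-byte loops favor clarity over speed.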

Summary

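Load the EfficientDet model from the assets folder into the Interpreter tflite variable.
Convert the ImageProxy data to a Bitmap, then preprocess it (resize it to the model's input size, etc.).
Create variables that match the shape of the output data, then fetch the object detection data through outputMap (it holds object locations, object categories, and so on; see the EfficientDet metadata for details).
Interpret the results, call ObjectMarker.updateRect() to draw a rectangle at the object's location, and print the detected category name in the TextView.
Call close() on the ImageProxy (the next image can only be processed after close() is called).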
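
The result-interpretation step can be tried without a device. Below is a minimal sketch in plain Java that takes arrays shaped like the model outputs, keeps the detections at or above a score threshold, and scales each box to image pixels. It assumes what the analyze() code in this post assumes: boxes are normalized to 0..1 and ordered [ymin, xmin, ymax, xmax]. DetectionDecoder, Detection and decode are names invented for this example, not part of TensorFlow Lite.

```java
import java.util.ArrayList;
import java.util.List;

final class DetectionDecoder {
    /** One detection with its box expressed in source-image pixels. */
    static final class Detection {
        final int classIndex;
        final float score;
        final float left, top, right, bottom;

        Detection(int classIndex, float score,
                  float left, float top, float right, float bottom) {
            this.classIndex = classIndex;
            this.score = score;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /**
     * Keeps detections whose score is at least the threshold and converts
     * their normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates.
     */
    static List<Detection> decode(float[][] boxes, float[] classes, float[] scores,
                                  int count, float threshold,
                                  int imageWidth, int imageHeight) {
        List<Detection> result = new ArrayList<>();
        // Never read past the arrays, even if the reported count is larger.
        int n = Math.min(count, scores.length);
        for (int i = 0; i < n; i++) {
            if (scores[i] < threshold) {
                continue;
            }
            float[] box = boxes[i];
            result.add(new Detection(
                    (int) classes[i],
                    scores[i],
                    box[1] * imageWidth,    // left   = xmin
                    box[0] * imageHeight,   // top    = ymin
                    box[3] * imageWidth,    // right  = xmax
                    box[2] * imageHeight)); // bottom = ymax
        }
        return result;
    }
}
```

In the app, boxes, classes and scores would be outputLocations[0], outputClasses[0] and outputScores[0], and each Detection's classIndex would be looked up in the items list.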

ObjectMarker and its updateRect() function are covered in the next post.

Modifying activity_main.xml

activity_main.xml

    <TextView
        android:id="@+id/textView"
        android:layout_width="360dp"
        android:layout_height="80dp"
        android:gravity="center"
        android:text="Initial"
        android:textColor="@color/black"
        android:background="@color/white"
        android:textSize="60sp"

        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />

Add a TextView to activity_main.xml to display the category of the recognized object.
