[iOS] speech to text by Speech

์ •์œ ์ง„ยท2022๋…„ 8์›” 11์ผ

๐Ÿค“ ๋“ค์–ด๊ฐ€๋ฉฐ

text๋ฅผ speech๋กœ ์˜ฎ๊ธฐ๋Š” ์ผ์€ ์ƒ๊ฐ๋ณด๋‹ค ๊ฐ„๋‹จํ–ˆ๋‹ค. '๊ทธ๋Ÿฐ๋ฐ speech๋ฅผ text๋กœ ๋ฐ”๊พธ๋Š” ๊ฒƒ์€ ์–ด๋–ป๊ฒŒ ํ•˜์ง€? MLKit ์ด๋ผ๋„ ์‚ฌ์šฉํ•ด์•ผ ํ•˜๋‚˜?' ๋ผ๋Š” ๊ฑฑ์ •. ๊ฒ€์ƒ‰ํ•ด๋ณด๋‹ˆ MLKit๊นŒ์ง€ ์‚ฌ์šฉํ•  ํ•„์š”๋Š” ์—†์„ ๊ฒƒ ๊ฐ™๊ณ  SFSpeechRecognizer, AVAudioEngine ์„ ์‚ฌ์šฉํ•ด์„œ ๊ธฐ๋Šฅ์„ ๊ตฌํ˜„ํ•œ SwiftUI ์ƒ˜ํ”Œ์ด ์žˆ์–ด ์‚ดํŽด๋ณด๊ณ ์ž ํ•œ๋‹ค. ํ† ์ดํ”„๋กœ์ ํŠธ์— ํ•ด๋‹น ๊ธฐ๋Šฅ์„ ๋„ฃ์–ด ๋ฒˆ์—ญํ•˜๊ธฐ ์›ํ•˜๋Š” ๋ฌธ์žฅ์„ text๋กœ ์ž…๋ ฅํ•  ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋งˆ์ดํฌ๋ฅผ ์‚ฌ์šฉํ•ด ์ž…๋ ฅํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋งŒ๋“ค ์˜ˆ์ •์ด๋‹ค.

https://developer.apple.com/tutorials/app-dev-training/transcribing-speech-to-text
base๊ฐ€ ๋œ ์ƒ˜ํ”Œ ์˜ˆ์ œ. ํŠœํ† ๋ฆฌ์–ผ์ด ์—„์ฒญ๋‚˜๊ฒŒ ์นœ์ ˆํ•˜๋‹ค.

๐ŸŽค Speech

https://developer.apple.com/documentation/speech

๐Ÿ“ฐ overview

ํ•ด๋‹น ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋…น์Œ๋œ ์˜ค๋””์˜ค๋‚˜ ์Œ์„ฑ์—์„œ ๋‹จ์–ด๋ฅผ ํฌ์ฐฉํ•  ์ˆ˜ ์žˆ๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์•„์ดํฐ ํ‚ค๋ณด๋“œ์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐ›์•„์“ฐ๊ธฐ ๊ธฐ๋Šฅ์ด ์˜ค๋””์˜ค๋ฅผ ํ…์ŠคํŠธ๋กœ ๋ฐ”๊พธ๊ธฐ ์œ„ํ•ด ์Šคํ”ผ์น˜ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ํ‚ค๋ณด๋“œ์— ์˜์กดํ•˜์ง€ ์•Š๊ณ  ์Œ์„ฑ์œผ๋กœ ๋ช…๋ นํ•˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅํ•ด์ง„๋‹ค. ํ•ด๋‹น ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๋‹ค๊ตญ์–ด๋ฅผ ์ง€์›ํ•˜๋‚˜ ๊ฐ๊ฐ์˜ SFSpeechRecognizer ๊ฐ์ฒด๋Š” ํ•œ๊ฐ€์ง€ ์–ธ์–ด๋งŒ ์ปค๋ฒ„ํ•œ๋‹ค. (๋‚˜์˜ ๊ฒฝ์šฐ 4๊ฐœ ๊ตญ์–ด๋ฅผ ๋ฒˆ์—ญํ•˜๊ธฐ ์œ„ํ•ด 4๊ฐœ์˜ ๊ฐ์ฒด๋ฅผ ์‚ฌ์šฉํ•ด์•ผํ•  ๊ฒƒ) ๋‹ค๊ตญ์–ด ์ง€์›์„ ์œ„ํ•ด์„œ๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ๋„คํŠธ์›Œํฌ ์—ฐ๊ฒฐ์ด ๋˜๋Š” ํ™˜๊ฒฝ์„ ์ „์ œ๋กœ ํ•œ๋‹ค. (on-device๋กœ ์ง€์›ํ•˜๋Š” ์–ธ์–ด๊ฐ€ ์žˆ์ง€๋งŒ ํ”„๋ ˆ์ž„์›Œํฌ ์ž์ฒด๊ฐ€ ์• ํ”Œ ์„œ๋ฒ„์— ์˜์กดํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.)

๐Ÿ“• Class SFSpeechRecognizer

class SFSpeechRecognizer : NSObject
  • ์Œ์„ฑ ์ธ์‹ ํ”„๋กœ์„ธ์Šค์˜ ๋ฉ”์ธ ์ค‘์ถ” ๊ฐ์ฒด
  • ์ตœ์ดˆ์˜ ์Œ์„ฑ ์ธ์‹์„ ์œ„ํ•˜์—ฌ authorization ์š”์ฒญ
  • ํ”„๋กœ์„ธ์Šค ์ค‘ ์–ด๋–ค ์–ธ์–ด๋ฅผ ์‚ฌ์šฉํ•  ์ง€ ๊ฒฐ์ •
  • ์Œ์„ฑ ์ธ์‹ task init

1. โ›น๏ธโ€โ™‚๏ธ ์‹œ์ž‘ํ•˜๊ธฐ

1) authorization setup

project > target > Info
Privacy - Speech Recognition Usage Description : You can view a text transcription of your meeting in the app.
Privacy - Microphone Usage Description : Audio is recorded to transcribe the meeting. Audio recordings are discarded after transcription.

2) create object

import AVFoundation
import Foundation
import Speech
import SwiftUI

// interface๊ฐ€ ๋˜๋Š” wrapper class๋ž„์ง€.. 
class SpeechRecognizer: ObservableObject {
	private let recognizer: SFSpeechRecognizer?
    
     init() {
        recognizer = SFSpeechRecognizer()
        
        Task(priority: .background) {
            do {
                guard recognizer != nil else {
                    throw RecognizerError.nilRecognizer
                }
                guard await SFSpeechRecognizer.hasAuthorizationToRecognize() else {
                    throw RecognizerError.notAuthorizedToRecognize
                }
                guard await AVAudioSession.sharedInstance().hasPermissionToRecord() else {
                    throw RecognizerError.notPermittedToRecord
                }
            } catch {
                speakError(error)
            }
        }
     }
    
  • initํ•˜๋ฉด์„œ ๊ฐ์ฒด์— ๋Œ€ํ•œ ์œ ํšจ์„ฑ ๊ฒ€์‚ฌ (์—๋Ÿฌ๊ด€๋ จ enum์€ ์ž์œ ๋กญ๊ฒŒ customํ•˜๋ฉด ๋œ๋‹ค.)
  • async ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด Task ์„ ์–ธ
  • ๊ฐ€๋“œ๋ฌธ์—์„œ ์‚ฌ์šฉ๋œ ๋ฉ”์„œ๋“œ๋Š” extension์—์„œ ์„ ์–ธ๋œ ๋ฉ”์„œ๋“œ์ด๋ฏ€๋กœ ๊ณต์‹๋ฌธ์„œ์—๋Š” ์กด์žฌํ•˜์ง€ ์•Š๋Š”๋‹ค. ์•„๋ž˜์˜ ์ฝ”๋“œ ์ฐธ๊ณ .

3) check authorization

extension SFSpeechRecognizer {
    static func hasAuthorizationToRecognize() async -> Bool {
        await withCheckedContinuation { continuation in
            requestAuthorization { status in
                continuation.resume(returning: status == .authorized)
            }
        }
    }
}
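
The guard on the microphone permission uses a companion extension on AVAudioSession. It isn't shown above, but a sketch built on the real requestRecordPermission(_:) callback API would look along these lines:

extension AVAudioSession {
    func hasPermissionToRecord() async -> Bool {
        await withCheckedContinuation { continuation in
            requestRecordPermission { authorized in
                continuation.resume(returning: authorized)
            }
        }
    }
}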

๐Ÿ” auth check ํ•˜๋‚˜ํ•˜๋‚˜ ๋œฏ์–ด๋ณด๊ธฐ

func withCheckedContinuation<T>(function: String = #function, _ body: (CheckedContinuation<T, Never>) -> Void) async -> T
  • withCheckedContinuation: a concurrency generic function. It suspends the current task and runs the closure; the closure receives a CheckedContinuation as an argument, and you must resume it for the suspended task to continue.
class func requestAuthorization(_ handler: @escaping (SFSpeechRecognizerAuthorizationStatus) -> Void)
  • requestAuthorization(_:): a type method on SFSpeechRecognizer. The handler receives an authorization status value, and the block executes once the status is "known". The status parameter carries my app's current authorization state.
func resume(returning value: T)
  • An instance method of CheckedContinuation.
  • It resumes the task at the point where it was suspended and hands value back to the caller, i.e. the place that called the method, which here is guard await SFSpeechRecognizer.hasAuthorizationToRecognize().
  • So if the status is anything other than .authorized, false is returned and the guard statement throws.

2. ๐Ÿ˜Ž locale?

์•ž์„œ์„œ SFSpeechRecognizer๋Š” ํ•œ ๊ฐ์ฒด๊ฐ€ ํ•˜๋‚˜์˜ ์–ธ์–ด๋งŒ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ์–ธ๊ธ‰ํ–ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ธฐ๋ณธ ์ƒ์„ฑ์ž๋Š” ์‚ฌ์šฉ์ž์˜ default ์–ธ์–ด ์…‹ํŒ…์„ ๋”ฐ๋ผ๊ฐ€๋Š” ๊ฐ์ฒด๊ฐ€ ์ƒ์„ฑ๋œ๋‹ค. ํŠน์ • ์–ธ์–ด๋ฅผ ์ธ์‹ํ•˜๋„๋ก ์„ค์ •ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋‹ค๋ฅธ ์ƒ์„ฑ์ž๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.

init?(locale: Locale)
// Creates a speech recognizer associated with the specified locale.
  • Locale(identifier: "en-US")
  • Locale(identifier: "ko_KR")
  • Locale(identifier: "ca_ES")
  • Locale(identifier: "ja_JP")
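
A minimal sketch of what that could look like for the four languages above; the dictionary keys are made up, and the identifiers are worth double-checking against SFSpeechRecognizer.supportedLocales():

import Speech

// One recognizer per language; the initializer is failable and returns nil
// for locales the framework doesn't support.
let recognizers: [String: SFSpeechRecognizer] = [
    "en": SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
    "ko": SFSpeechRecognizer(locale: Locale(identifier: "ko_KR")),
    "ca": SFSpeechRecognizer(locale: Locale(identifier: "ca_ES")),
    "ja": SFSpeechRecognizer(locale: Locale(identifier: "ja_JP")),
].compactMapValues { $0 }

// Which locales are supported at all?
print(SFSpeechRecognizer.supportedLocales().map(\.identifier).sorted())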

3.๐Ÿ—ฃ transcribe, ์ด๊ฒƒ์„ ์œ„ํ•ด ๋‹ฌ๋ ค์™”๋‹ค.

ํŠน์ • view์—์„œ ์Œ์„ฑ ์ธ์‹์„ ์‹œ์ž‘ํ•˜๊ฑฐ๋‚˜ ๋ฐ˜๋Œ€๋กœ ์Œ์„ฑ ์ธ์‹ ํ”„๋กœ์„ธ์Šค๋ฅผ ์ •์ง€์‹œํ‚ค๋Š” ํฌ์ธํŠธ๊ฐ€ ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•˜์ž. ๋ฒ„ํŠผ tap ์ด๋ฒคํŠธ์ผ ์ˆ˜๋„ ์žˆ๊ฒ ๊ณ  UIKit์ด๋ผ๋ฉด viewWillAppear/viewWillDisapear, SwiftUI๋ผ๋ฉด onAppear/OnDisappear ์ผ ์ˆ˜ ์žˆ๊ฒ ๋‹ค. ์–ธ์ œ์ธ์ง€๋Š” ์„ ํƒํ•˜๊ธฐ ๋‚˜๋ฆ„์ด๊ณ  ์ค‘์š”ํ•œ ๊ฒƒ์€ ํŠน์ • ์‹œ์ ์— ์šฐ๋ฆฌ๊ฐ€ ์ƒ์„ฑํ•œ speechRecognizer๋ฅผ ์‹œ์ž‘ํ•ด์•ผํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

.onAppear {
    speechRecognizer.reset()
    speechRecognizer.transcribe()
}
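
And the symmetric counterpart when the view goes away; a sketch assuming reset() is also how the recognition gets stopped:

.onDisappear {
    speechRecognizer.reset()   // cancel the task and tear down the audio engine
}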

โš™๏ธ func reset

class SpeechRecognizer: ObservableObject {

    private var audioEngine: AVAudioEngine?
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    private let recognizer: SFSpeechRecognizer?

    func reset() {
        task?.cancel()
        audioEngine?.stop()
        audioEngine = nil
        request = nil
        task = nil
    }
  • ์œ„์˜ ์ฝ”๋“œ๋ฅผ ๋ณด๋ฉด ๋Š๊ปด์ง€๊ฒ ์ง€๋งŒ ์Œ์„ฑ ์ธ์‹์„ ํ•œ๋‹ค๋Š”๊ฒŒ ๋‹จ์ˆœํžˆ SFSpeechRecognizer ๊ฐ์ฒด๋งŒ ์žˆ๋‹ค๊ณ  ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๊ฒƒ์ด ์•„๋‹ˆ๋‹ค.
  • recognizer๊ฐ€ ์Œ์„ฑ ์ธ์‹ task๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด ํ•„์š”ํ•œ ์ค€๋น„๋ฌผ 2๊ฐ€์ง€
    • audioEngine: AVAudioEngine
    • request: SFSpeechAudioBufferRecognitionRequest

โš™๏ธ func prepareEngine

let (audioEngine, request) = try Self.prepareEngine()

private static func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
    let audioEngine = AVAudioEngine()
    
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true
    
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    let inputNode = audioEngine.inputNode
    
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        request.append(buffer)
    }
    audioEngine.prepare()
    try audioEngine.start()
    
    return (audioEngine, request)
}

๊ทธ๋‚˜์ €๋‚˜ tuple๋กœ returnํ•ด์„œ tuple๋กœ ๋ฐ›์œผ๋ฉด ์—ฌ๋Ÿฌ๊ฐœ์˜ ๋ณ€์ˆ˜๋ฅผ ํ•œ ๋ฒˆ์— initializeํ•  ์ˆ˜ ์žˆ๋Š”๊ฑฐ ๋‚˜๋งŒ ๋ชฐ๋ž๋‚˜ ํ˜น์‰ฌ? ๐Ÿคช

๐Ÿ” audio engine prepare ๋œฏ์–ด๋ณด๊ธฐ

๊ทธ๋ƒฅ ์ƒ๊ฐ์—†์ด ๋ณต์‚ฌ+๋ถ™์—ฌ๋„ฃ๊ธฐ ํ•˜๋ฉด ํŽธํ• ๊ฑฐ ๊ฐ™์€๋ฐ ๊ทธ๋ƒฅ ๋„˜์–ด๊ฐ€๊ธฐ์—๋Š” ๋งˆ์Œ์ด ๋ถˆํŽธํ•˜๋‹ค. audioEngine์€ ๋”ฑ ๋ด๋„ ํ•„์š”ํ• ๊ฑฐ ๊ฐ™์€๋ฐ (๋‚ฉ๋“ 100) ์ € AudioSession์ด๋ž‘ inputNode๊ฐ€ ๋Œ€์ฒด ๋ฌด์—‡์ด์ง€.๐Ÿง ๊ทธ๋Ÿฌ๋‹ˆ๊นŒ ์•Œ์•„๋ณด์ž.

1) ๐Ÿ”Š AudioSession

class AVAudioSession: NSObject
  • Acts as an intermediary between my app and the OS system (= the audio hardware).
  • Thanks to this session I don't have to interfere with or directly interact with each hardware behavior. (Thank you, dear audio session! 👍)
  • Every Apple app has a default audio session.
  • (The default configuration is already set up, so you can just take it and use it.)
AVAudioSession.sharedInstance()
  • What the default session provides:
    • audio playback (playing back what was recorded)
    • on iOS, silencing all audio playback when silent mode is on
  • What if the default session doesn't provide what I need? -> Modify the category of the app's audio session.
var category: AVAudioSession.Category { get }
func setCategory(AVAudioSession.Category, options: AVAudioSession.CategoryOptions)
  • The audio session category defines the audio behavior.
  • The default value is soloAmbient.
  • Other categories include multiRoute, playAndRecord, playback, record...
  • Options are only valid for specific audio session categories.
  • duckOthers: lowers the volume of other audio sessions while this session's audio is playing.
  • https://developer.apple.com/documentation/avfaudio/avaudiosession/categoryoptions (see this link for the other options)
func setActive(
    _ active: Bool,
    options: AVAudioSession.SetActiveOptions = []
) throws
  • Activates the audio session configured with the options set above.
  • notifyOthersOnDeactivation: notifies other apps when my app's audio session is deactivated, so they can turn their audio back on (sketched below).
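
prepareEngine above only activates the session; a sketch of the deactivation side, where notifyOthersOnDeactivation comes into play once recording is finished:

// Deactivate the session when we're done recording and let other apps
// know so they can resume their own audio.
do {
    try AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
} catch {
    print("Failed to deactivate the audio session: \(error)")
}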

2) ๐Ÿ’ฌ inputNode

class AVAudioEngine : NSObject
  • The audio engine is the object that manages audio nodes, controls playback, and configures rendering in real time.
  • The engine holds a group of AVAudioNode objects, and this group of nodes is what processes the audio.

let audioFile = /* An AVAudioFile instance that points to file that's open for reading. */
let audioEngine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()

// Attach the player node to the audio engine.
audioEngine.attach(playerNode)

// Connect the player node to the output node.
audioEngine.connect(playerNode, 
                    to: audioEngine.outputNode, 
                    format: audioFile.processingFormat)
playerNode.scheduleFile(audioFile, 
                        at: nil, 
                        completionCallbackType: .dataPlayedBack) { _ in
    /* Handle any work that's necessary after playback. */
}
  • ์˜ค๋””์˜ค ํŒŒ์ผ์„ ์žฌ์ƒํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด, ์ค€๋น„๋ฌผ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค
    • ์˜ค๋””์˜ค ์—”์ง„ ๊ฐ์ฒด
    • ์—”์ง„์— ๋ถ™์ผ ๋…ธ๋“œ ๊ฐ์ฒด
    • readํ•  ์ˆ˜ ์žˆ๋Š” ์˜ค๋””์˜ค ํŒŒ์ผ ๊ฐ์ฒด
  • ๋…ธ๋“œ๋ฅผ -> ์—”์ง„์— attach -> ์˜ค๋””์˜ค ํŒŒ์ผ์„ -> ๋…ธ๋“œ์— schedule
  • ๋น„์œ ํ•˜์ž๋ฉด ์˜ค๋””์˜ค ์—”์ง„์ด ์นด์„ธํŠธ ํ”Œ๋ ˆ์ด์–ด ๐Ÿ“น, ๋…ธ๋“œ๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์นด์„ธํŠธ ํ…Œ์ดํ”„ ๐Ÿ“ผ, ์˜ค๋””์˜ค ํŒŒ์ผ์ด ํ…Œ์ดํ”„์— ์ƒˆ๊ฒจ์ง„ ๋ฐ์ดํ„ฐ๐ŸŽž ๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์„๊นŒ? (๋‚ก์€ ๋น„์œ ๐Ÿฅฒ)
  • ๊ทธ๋Ÿฐ๋ฐ input tape์™€ output tape๊ฐ€ ์กด์žฌํ•˜๋Š”...* (input tape์— ๋…น์Œํ•˜๊ณ  output tape๋กœ ์žฌ์ƒํ•œ๋‹ค๊ณ  ์ดํ•ดํ•˜๋ฉด ๋ ์ง€)
var inputNode: AVAudioInputNode { get }
  • To take in audio input, you work through this inputNode.
  • When I access this property, the audio engine returns a singleton object.
  • So how do I get the input back out? There are two ways:
    • connect another node to this input node's output
    • install a recording tap
func outputFormat(forBus bus: AVAudioNodeBus) -> AVAudioFormat
  • Retrieves the output format for the given bus (...?)
  • AVAudioNodeBus: the index of a bus attached to an audio node; it takes an Int value.
  • According to the documentation, an audio node object can have multiple input/output buses.
  • Picture the buses attached to the node in a single row. (The docs say connections are always 1:1.)
  • AVAudioFormat: an audio format object, a wrapper around AudioStreamBasicDescription.
func installTap(
    onBus bus: AVAudioNodeBus,
    bufferSize: AVAudioFrameCount,
    format: AVAudioFormat?,
    block tapBlock: @escaping AVAudioNodeTapBlock
)

typealias AVAudioNodeTapBlock = (AVAudioPCMBuffer, AVAudioTime) -> Void
  • audio tap ์ด๋ž€, ํŠธ๋ž™์˜ ์˜ค๋””์˜ค ๋ฐ์ดํ„ฐ์— ์ ‘๊ทผํ•  ๋•Œ ์‚ฌ์šฉํ•œ๋‹ค.
  • ๋‚ด๊ฐ€ ์„ ํƒํ•œ ๋ฒ„์Šค๋Š” record, monitor, observeํ•  ๋•Œ ์‚ฌ์šฉ๋˜๊ณ  ์ด ๋ฒ„์Šค์— ์˜ค๋””์˜ค ํƒญ์„ ๋ถ™์ธ๋‹ค. ๋…ธ๋“œ์— ์ค„์ค„์ด ์„œ์žˆ๋Š” ๋ฒ„์Šค์™€ ๊ทธ ๋ฒ„์Šค๋งˆ๋‹ค ๋‹ฌ๋ ค์žˆ๋Š” ํƒญ์„ ์ƒ์ƒํ•˜๋Š” ์ค‘.
  • bus: tap์„ ๋ถ™์ผ output bus
  • format: ์ด๋ฏธ connected state์ธ output์— ์—ฐ๊ฒฐํ•  ๊ฒฝ์šฐ ์—๋Ÿฌ ๋ฐœ์ƒ
  • tapBlock: ์˜ค๋””์˜ค ๋ฒ„ํผ(์˜ค๋””์˜ค ์‹œ์Šคํ…œ์ด ๋…ธ๋“œ์—์„œ ์บก์ฒ˜ํ•œ ์˜ค๋””์˜ค output)์™€ ํ•จ๊ป˜ ์ฒ˜๋ฆฌํ•  ์ž‘์—…
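
To see the tap in isolation, a minimal sketch that only inspects each captured buffer (the engine and the print statement are just for illustration):

import AVFoundation

let engine = AVAudioEngine()
let inputNode = engine.inputNode
let format = inputNode.outputFormat(forBus: 0)

// The tapBlock is called repeatedly with the audio captured on bus 0.
inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, when in
    // buffer: AVAudioPCMBuffer holding the captured samples
    // when:   AVAudioTime of the capture
    print("captured \(buffer.frameLength) frames")
}

do {
    engine.prepare()
    try engine.start()   // once started, the tap begins receiving buffers
} catch {
    print("Audio engine failed to start: \(error)")
}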

โš™๏ธ func transcribe

์ด์ „์˜ ๋ชจ๋“  ๊ณผ์ •์€ ์ด transcribe์„ ์œ„ํ•œ ๊ฒƒ์ด์—ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด ๊ณผ์ •์„ ํ†ตํ•ด ์ค€๋น„ํ•œ ์˜ค๋””์˜ค ์—”์ง„์œผ๋กœ ์Œ์„ฑ์„ ๋ฐ›๊ณ  ๊ทธ ์Œ์„ฑ์„ ํ† ๋Œ€๋กœ ํ…์ŠคํŠธ๋ฅผ ์ถ”์ถœํ•œ๋‹ค.

func transcribe() {
    DispatchQueue(label: "Speech Recognizer Queue", qos: .background).async { [weak self] in
        guard let self = self, let recognizer = self.recognizer, recognizer.isAvailable else {
            self?.speakError(RecognizerError.recognizerIsUnavailable)
            return
        }
        
        do {
            let (audioEngine, request) = try Self.prepareEngine()
            self.audioEngine = audioEngine
            self.request = request
            self.task = recognizer.recognitionTask(with: request, resultHandler: self.recognitionHandler(result:error:))
        } catch {
            self.reset()
            self.speakError(error)
        }
    }
}
  • ์šฐ๋ฆฌ์˜ ์Œ์„ฑ์„ ์ธ์‹ ํ•  recognize ์œ ํšจํ•œ์ง€ ํ™•์ธํ•˜๊ณ 
  • recognitionTask(with:resultHandler:) ๋ฉ”์„œ๋“œ๋กœ request๋ฅผ ์‹คํ–‰ํ•˜๊ณ  ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ํ•ธ๋“ค๋Ÿฌ ์•ˆ์—์„œ ์ฒ˜๋ฆฌํ•œ๋‹ค.
  • @escaping ํ•จ์ˆ˜ ์•ˆ์—์„œ๋Š” SFSpeechRecognitionResult, Error ๋ฅผ ๋ฐ›๋Š”๋‹ค.
  • ๋ฐ›์€ argument๋ฅผ ๊ทธ๋Œ€๋กœ parameter๋กœ ๋„˜๊ธฐ๊ธฐ ๋•Œ๋ฌธ์— (result:error:)๋กœ ์“ธ ์ˆ˜ ์žˆ๋‹ค.

๐Ÿ” recognitionHandler ํ•˜๋‚˜ํ•˜๋‚˜ ๋œฏ์–ด๋ณด๊ธฐ

์—ญ์‹œ ์ฝ”๋“œ๊ฐ€ ๊น”๋”ํ•œ ๋ฐ์—๋Š” ์ด์œ ๊ฐ€ ์žˆ๋‹ค. ๊ธฐ๋Šฅ์„ extractํ•˜์—ฌ ํ•ธ๋“ค๋Ÿฌ๋ฅผ ๋ณ„๊ฐœ์˜ private ๋ฉ”์„œ๋“œ๋กœ ์„ ์–ธํ–ˆ๋‹ค. ๋‚˜๋„ ์•ž์œผ๋กœ ์ด๋ ‡๊ฒŒ ์ฝ”๋“œ๋ฅผ ์งœ์•ผ์ง€.

private func recognitionHandler(result: SFSpeechRecognitionResult?, error: Error?) {
    let receivedFinalResult = result?.isFinal ?? false
    let receivedError = error != nil
    
    if receivedFinalResult || receivedError {
        audioEngine?.stop()
        audioEngine?.inputNode.removeTap(onBus: 0)
    }
    
    if let result = result {
        speak(result.bestTranscription.formattedString)
    }
}

private func speak(_ message: String) {
    transcript = message
}
  • ๊ฒฐ๊ณผ๊ฐ’์ด ์—๋Ÿฌ์ด๋ฉด ์˜ค๋””์˜ค ์—”์ง„์„ ์ค‘์ง€ํ•˜๊ณ  inputnode์— ๋ถ™์˜€๋˜ ํƒญ์„ ์ œ๊ฑฐํ•œ๋‹ค.
  • ๊ฒฐ๊ณผ ๊ฐ’์ด ์—๋Ÿฌ ๋˜๋Š” nil์ด ์•„๋‹ˆ๋ฉด bestTranscription ๋ง ๊ทธ๋Œ€๋กœ ๊ฐ€์žฅ ์‹ ๋ขฐ์„ฑ์ด ๋†’์€ ์Œ์„ฑ -> ๋ฌธ์ž์˜ ๊ฒฐ๊ณผ๋ฅผ ์ถ”์ถœํ•ด ํด๋ž˜์Šค์˜ ํ”„๋กœํผํ‹ฐ์— ๋‹ด๋Š”๋‹ค.
  • ๊ฒฐ๊ณผ ํ…์ŠคํŠธ๋ฅผ ์–ด๋–ป๊ฒŒ ์‚ฌ์šฉํ•  ์ง€๋Š” ์šฐ๋ฆฌ์—๊ฒŒ ๋‹ฌ๋ ค์žˆ๋‹ค. ๐Ÿ˜† ํ™”์ดํŒ…!

๐Ÿ˜‡ ์ •๋ฆฌํ•˜๋ฉฐ

์ฝ”๋“œ ์ž์ฒด๋Š” ๋ช‡์ค„ ๋˜์ง€ ์•Š์ง€๋งŒ ์˜ค๋””์˜ค๋ฅผ ๊ฐ€์ ธ๋‹ค ์“ฐ๋Š” ๊ฐœ๋…์ด ์ƒ์†Œํ•ด์„œ ๋…ธ๋“œ๋‹ˆ ์ธํ’‹ ์•„์›ƒํ’‹์ด๋‹ˆ ๋œฏ์–ด๋ณด๋Š” ๋ฐ์— ์‹œ๊ฐ„์ด ๊ฑธ๋ ธ๋‹ค. ๋ฌผ๋ก  ๋‹ค ์ดํ•ดํ•œ ๊ฒƒ์€ ์•„๋‹ˆ๊ณ  ์ด ์˜ค๋””์˜ค ํ”„๋กœ์„ธ์Šค๋ฅผ ํ•˜๋‚˜ํ•˜๋‚˜ ์™„๋ฒฝํ•˜๊ฒŒ ์ดํ•ดํ•  ์ƒ๊ฐ์€ ์—†๋‹ค. ์›น๊ฐœ๋ฐœ์„ ํ•  ๋•Œ์—๋Š” ์ด๋ฏธ ์ƒ์„ฑ๋œ ์˜ค๋””์˜ค ํŒŒ์ผ์„ ๋‹ค๋ฃจ๋Š” ์ผ ์ •๋„๋Š” ์žˆ์—ˆ์ง€๋งŒ ๊ทธ ์ด์ƒ์€ ์—†์—ˆ๋Š”๋ฐ ์ด๋ฒˆ ์‹œ๋„์—๋Š” ๋‚ด๊ฐ€ ๋„ค์ดํ‹ฐ๋ธŒ ์ž์›์„ ํ™œ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์ด ๋ถ„๋ช…ํ•˜๊ฒŒ ๋Š๊ปด์ ธ์„œ ์žฌ๋ฐŒ์—ˆ๋‹ค.
