Handling User Voice Input in the Browser

이종경 · September 23, 2024

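Introduction

This post summarizes how to capture and process the user's voice in React. I searched for sources but could not find many recent write-ups, so I am sharing code that I adapted until it worked. One regret: I put a lot of effort into refactoring, but readability still leaves a lot to be desired.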

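These are the core features I had in mind.

Core features

Capture the user's voice and show that voice input is being received.

Capture the user's voice and process it (e.g., send it to a server or save it).

The visualization of the user's voice is implemented with the Web Audio API.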

Capturing and recording the user's voice

The user's voice is captured with the Media Capture and Streams API (Media Stream). The code is as follows.

  const startRecording = async () => {
    // navigator.mediaDevices is undefined in insecure (non-HTTPS) contexts
    if (navigator.mediaDevices !== undefined) {
      try {
        streamRef.current = await navigator.mediaDevices.getUserMedia({
          audio: true,
        });
        setAudioContext(new AudioContext()); // create the AudioContext
      } catch (error) {
        console.error(error); // e.g. the user denied microphone permission
      }
    }
  };
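getUserMedia rejects with a DOMException whose name tells you why capture failed, so the catch block can show the user something more useful than a console message. Here is a minimal sketch of that idea. The helper describeMicError and its message strings are my own assumptions, not part of the component in this post; only the error names are standard.

```typescript
// Hypothetical helper: turn a getUserMedia rejection into a user-facing message.
// The names checked below are standard DOMException names; the messages are illustrative.
function describeMicError(error: unknown): string {
  const name =
    typeof error === "object" && error !== null && "name" in error
      ? String((error as { name: unknown }).name)
      : "";
  switch (name) {
    case "NotAllowedError":
      return "Microphone permission was denied.";
    case "NotFoundError":
      return "No microphone was found.";
    case "NotReadableError":
      return "The microphone is in use or unavailable.";
    default:
      return "Could not start recording.";
  }
}
```

In startRecording, the catch block could pass its error to this helper and put the result in state instead of only logging it.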

The Hook below runs whenever the AudioContext created here changes.
Details are in the comments.

  useEffect(() => {
    if (!audioContext || !streamRef.current) return;

    analyserRef.current = audioContext.createAnalyser(); // created to visualize the audio data
    sourceRef.current = audioContext.createMediaStreamSource(streamRef.current); // feed the user's voice into the AudioContext
    sourceRef.current.connect(analyserRef.current); // connect the stream source to the analyser

    frameRef.current = requestAnimationFrame(updateDecibel); // start polling the voice level; keep the id so cleanup can cancel it

    recorderRef.current = new MediaRecorder(streamRef.current); // create the MediaRecorder

    const audioChunks: Blob[] = []; // collects the recorded chunks
    recorderRef.current.ondataavailable = (e) => { // fires when recorded data becomes available
      audioChunks.push(e.data);
    };

    recorderRef.current.onstop = () => { // fires when recording stops
      // label the blob with the type the recorder actually used
      const audioBlob = new Blob(audioChunks, { type: recorderRef.current?.mimeType });
      setAudioBlob(audioBlob); // store the blob once recording has stopped
    };

    recorderRef.current.start(); // start recording
    return () => {
      if (!frameRef.current) return;
      cancelAnimationFrame(frameRef.current);
    };
  }, [audioContext]);

Processing the recorded audio

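  useEffect(() => {
    if (!audioBlob) return;
    console.log(audioBlob); // the recorded audio

    // To save it locally and listen to it, write something like the following.
    // MediaRecorder typically produces WebM, Ogg, or MP4 rather than WAV,
    // so use an extension that matches audioBlob.type in your browser.
    const url = URL.createObjectURL(audioBlob);
    const a = document.createElement("a");
    a.href = url;
    a.download = "audio.webm";
    document.body.appendChild(a);
    a.click();
    URL.revokeObjectURL(url);
    document.body.removeChild(a);
  }, [audioBlob]);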

This hook is where you add the voice processing logic that should run once audioBlob has been created (an API request, saving locally, and so on).
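For the "API request" case, a sketch like the one below could go inside that hook. Everything named here is an assumption for illustration: the helpers extensionFor and uploadRecording, the "/api/recordings" endpoint, and the "file" field name are placeholders, and your server may expect something different. It also picks the file extension from the blob's MIME type, since browsers record in WebM, Ogg, or MP4 depending on the engine.

```typescript
// Hypothetical helper: choose a file extension from a recorder MIME type.
function extensionFor(mimeType: string): string {
  // Drop parameters such as ";codecs=opus" and normalize case.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  switch (base) {
    case "audio/webm":
      return "webm";
    case "audio/ogg":
      return "ogg";
    case "audio/mp4":
      return "m4a";
    case "audio/wav":
      return "wav";
    default:
      return "bin"; // unknown type: do not guess
  }
}

// Hypothetical upload: the endpoint and the "file" field name are placeholders.
async function uploadRecording(blob: Blob, url = "/api/recordings"): Promise<Response> {
  const form = new FormData();
  form.append("file", blob, `recording.${extensionFor(blob.type)}`);
  return fetch(url, { method: "POST", body: form });
}
```

Calling uploadRecording(audioBlob) from the hook would send the recording as multipart form data; error handling and retries are left out to keep the sketch short.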

Running an animation based on the decibel level

First, write a function like the following to compute a decibel value from the analyser.

  const getDecibel = (analyser: AnalyserNode) => {
    const bufferLength = analyser.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength); // buffer for the frequency data
    analyser.getByteFrequencyData(dataArray); // fill it with the current frequency data (0-255 per bin)
    const sum = dataArray.reduce((acc, cur) => acc + cur, 0);
    const average = sum / bufferLength;
    // A relative level, not a calibrated dB SPL value; silence gives log10(0) = -Infinity
    return Math.floor(20 * Math.log10(average));
  };

I wrote the decibel formula with reference to the following.

Source: Namuwiki - Decibel
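The value above is an average of byte-scaled frequency bins, so it works as a relative loudness indicator rather than a physical decibel reading. If you want a level with a conventional meaning, one common alternative is to take the RMS of the time-domain samples and express it in dBFS (0 is full scale, quieter input is negative). The sketch below shows that calculation; rmsDbfs and its floorDb default are my own assumptions, and this is not the method used in the component in this post.

```typescript
// Level of a block of time-domain samples (each in [-1, 1]) in dBFS.
// floorDb is returned for empty or silent input so callers never see -Infinity.
function rmsDbfs(samples: Float32Array, floorDb = -100): number {
  if (samples.length === 0) return floorDb;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumOfSquares / samples.length);
  if (rms === 0) return floorDb; // avoid log10(0)
  return Math.max(floorDb, 20 * Math.log10(rms));
}

// In the browser, the samples would come from the analyser:
//   const samples = new Float32Array(analyser.fftSize);
//   analyser.getFloatTimeDomainData(samples);
//   const level = rmsDbfs(samples);
```

With this scale the speaking threshold would be a negative number (something like -50 is a guess you would need to tune), not the 10 used with getDecibel.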

The requestAnimationFrame method is used to register the animation callback.

With this in place, the callback runs at the start of every frame.

  const updateDecibel = () => {
    if (!analyserRef.current || !streamRef.current || !audioContext) return;

    if (audioContext.currentTime >= 20) { // currentTime is in seconds
      // enforce the maximum recording length
      saveRecording();
    }

    const isStopRecording = streamRef.current.getAudioTracks().every((track) => track.readyState === "ended"); // check whether the audio input has ended

    if (isStopRecording && frameRef.current) {
      return cancelAnimationFrame(frameRef.current); // stop updating the decibel value
    }
    const decibel = getDecibel(analyserRef.current); // read the level on every frame
    setIsSpeaking(decibel > 10); // true when the level is above 10 dB
    frameRef.current = requestAnimationFrame(updateDecibel); // keep the id in a ref so the loop can be cancelled when recording ends
  };
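Because the level is sampled on every frame, isSpeaking can flicker on and off between syllables. One optional way to smooth that is to keep the flag true for a short hold time after the level last crossed the threshold. This is a sketch under my own assumptions: createSpeakingDetector and the 300 ms hold in the usage note are hypothetical and not part of the component in this post.

```typescript
// Hypothetical helper: reports "speaking" while the level is above thresholdDb,
// and keeps reporting it for holdMs after the level last exceeded the threshold.
function createSpeakingDetector(thresholdDb: number, holdMs: number) {
  let lastLoudAt = -Infinity; // timestamp (ms) of the last frame above the threshold
  return (decibel: number, nowMs: number): boolean => {
    if (decibel > thresholdDb) lastLoudAt = nowMs;
    return nowMs - lastLoudAt <= holdMs;
  };
}
```

To use it, create the detector once (for example in a ref, with createSpeakingDetector(10, 300)) and call setIsSpeaking(detect(decibel, performance.now())) inside updateDecibel in place of the direct comparison.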

Full code

import { useEffect, useRef, useState } from "react";

function RecordButton() {
  const streamRef = useRef<MediaStream | null>(null);
  const recorderRef = useRef<MediaRecorder | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const sourceRef = useRef<MediaStreamAudioSourceNode | null>(null);
  const frameRef = useRef<number | null>(null);

  const [audioContext, setAudioContext] = useState<AudioContext | null>(null);
  const [onRecord, setOnRecord] = useState<boolean>(false);
  const [audioBlob, setAudioBlob] = useState<Blob | null>(null);
  const [isSpeaking, setIsSpeaking] = useState<boolean>(false);

  const getDecibel = (analyser: AnalyserNode) => {
    const bufferLength = analyser.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength); 
    analyser.getByteFrequencyData(dataArray);
    const sum = dataArray.reduce((acc, cur) => acc + cur);
    const average = sum / bufferLength;
    return Math.floor(20 * Math.log10(average));
  };

  const updateDecibel = () => {
    if (!analyserRef.current || !streamRef.current || !audioContext) return;

    if (audioContext.currentTime >= 20) {
      saveRecording();
    }

    const isStopRecording = streamRef.current.getAudioTracks().every((track) => track.readyState === "ended");

    if (isStopRecording && frameRef.current) {
      return cancelAnimationFrame(frameRef.current);
    }
    const decibel = getDecibel(analyserRef.current);
    setIsSpeaking(decibel > 10);
    frameRef.current = requestAnimationFrame(updateDecibel);
  };

  const startRecording = async () => {
    if (navigator.mediaDevices !== undefined) {
      try {
        streamRef.current = await navigator.mediaDevices.getUserMedia({
          audio: true,
        });
        setAudioContext(new AudioContext());
      } catch (error) {
        console.log(error);
      }
    }
  };

  const saveRecording = () => {
    if (!streamRef.current || !audioContext || !sourceRef.current || !recorderRef.current) return;
    recorderRef.current.stop();

    streamRef.current.getAudioTracks().forEach((track) => {
      track.stop(); 
    });
    sourceRef.current.disconnect();
    void audioContext.close(); // release the audio resources held by this context
    setAudioContext(null);
    setOnRecord(false);
  };

  useEffect(() => {
    if (!audioContext || !streamRef.current) return;

    analyserRef.current = audioContext.createAnalyser();
    sourceRef.current = audioContext.createMediaStreamSource(streamRef.current);
    sourceRef.current.connect(analyserRef.current);

    frameRef.current = requestAnimationFrame(updateDecibel); // keep the id so cleanup can cancel the first frame too

    recorderRef.current = new MediaRecorder(streamRef.current);

    const audioChunks: Blob[] = [];
    recorderRef.current.ondataavailable = (e) => {
      audioChunks.push(e.data);
    };

    recorderRef.current.onstop = () => {
      // MediaRecorder does not produce WAV; label the blob with the type the recorder actually used
      const audioBlob = new Blob(audioChunks, { type: recorderRef.current?.mimeType });
      setAudioBlob(audioBlob);
    };

    recorderRef.current.start(); 
    setOnRecord(true);

    return () => {
      if (!frameRef.current) return;
      cancelAnimationFrame(frameRef.current);
    };
  }, [audioContext]);

  useEffect(() => {
    if (!audioBlob) return;
    console.log(audioBlob);
  }, [audioBlob]);

  return (
    <button
      className={`audio-button ${onRecord ? "bg-blue" : "bg-orange"} ${
        onRecord && isSpeaking ? "border-green" : "border-gray"
      }`}
      onClick={onRecord ? saveRecording : startRecording}
    >
      {onRecord ? "Stop recording" : "Start recording"}
    </button>
  );
}

References
MDN
AudioContext
MediaRecorder
MediaElementAudioSourceNode
BaseAudioContext
Window: requestAnimationFrame() method
etc
A guide to optimizing web animations with requestAnimationFrame
