🗣️ Implementing TTS (Text To Speech) Without a Library

Haizel · April 19, 2023

📮 SENDY | Main Project


💭 Why I added a TTS feature


To make the Sendy service usable for people who have trouble reading a letter (for example, when they cannot look at their phone, or because of presbyopia or a disability), I added a TTS feature that converts text into speech.


🧐 Why use the Web Speech API built into window rather than a separate library or API?


The TTS services offered by Kakao and Google have more natural voices and better compatibility, but they are paid services. I therefore implemented TTS with the Web Speech API exposed on window, which is free to use.


💌 The TTS feature in Sendy


Unfortunately, velog does not accept video uploads, so a GIF stands in for the demo. If you are curious, please visit the Sendy site linked above! 📮


🗣️ Web Speech API features and usage


The Web Speech API handles voice data in web apps. It has two parts: SpeechSynthesis (text to speech) and SpeechRecognition (asynchronous speech recognition).
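Since the API is provided by the browser rather than by a package, it can help to check which of the two parts is actually available before wiring up any buttons. The sketch below is only an illustration: detectSpeechSupport is a hypothetical helper name, and it assumes the global object (window in a browser) is passed in.

```javascript
// Report which halves of the Web Speech API a global scope exposes.
// `detectSpeechSupport` is a hypothetical helper, not part of the API.
// `scope` is expected to be `window` when running in a browser.
function detectSpeechSupport(scope) {
  return {
    // Text to speech needs both the controller and the utterance constructor.
    synthesis:
      "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope,
    // Some browsers expose recognition only under a webkit prefix.
    recognition:
      "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
  };
}

// Browser usage (skipped when there is no window, e.g. during server rendering).
if (typeof window !== "undefined") {
  console.log(detectSpeechSupport(window));
}
```

If synthesis comes back false, the speech button could be hidden instead of failing silently.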

1. SpeechSynthesis

An interface that acts as the controller: it retrieves the voices available on the device and starts and stops speech.

| Main method | What it does |
| --- | --- |
| getVoices() | Returns the voices available on the device currently in use. |
| speak() | Adds an utterance (speech) to the utterance queue so that it is read aloud. |
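Because speak() appends to a queue, several utterances can be handed over at once and are spoken in order. A minimal sketch of that idea follows; queueSentences and the makeUtterance factory are hypothetical names introduced here so the function can also be tried outside a browser.

```javascript
// Queue several sentences; each speak() call appends to the utterance queue.
// `synth` is expected to be window.speechSynthesis, and `makeUtterance`
// a factory such as (text) => new SpeechSynthesisUtterance(text).
// Both parameter names are assumptions made for this sketch.
function queueSentences(synth, makeUtterance, sentences) {
  const queued = [];
  for (const sentence of sentences) {
    const text = sentence.trim();
    if (text === "") continue; // skip empty entries
    const utterance = makeUtterance(text);
    synth.speak(utterance);
    queued.push(utterance);
  }
  return queued;
}

// Browser usage, guarded so the file also loads where speech is unavailable.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  queueSentences(
    window.speechSynthesis,
    (text) => new SpeechSynthesisUtterance(text),
    ["Hello.", "You have a new letter."]
  );
}
```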

2. SpeechSynthesisUtterance

An interface that represents a speech request. It holds what to read and how to read it.

| Main property | Setting |
| --- | --- |
| lang | The language to use, such as ko-KR. If unset, it falls back to the page's <html> lang value, or to the user-agent default. |
| pitch | Pitch of the speech. The default is 1 and it can be set from 0 to 2. |
| rate | Speed of the speech. The default is 1 and it can be set from 0.1 to 10; a larger number is faster. |
| voice | Voice used for the speech. If unset, the default voice for the utterance's language is used. |
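As a small illustration of the properties above, the sketch below builds the settings for an utterance and keeps pitch and rate inside the ranges listed in the table. utteranceOptions and applyUtteranceOptions are hypothetical helper names, and the defaults simply mirror the table.

```javascript
// Keep a number inside [min, max].
const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Build utterance settings, clamping to the ranges listed above:
// pitch 0 to 2, rate 0.1 to 10. Helper names are hypothetical.
function utteranceOptions({ lang = "ko-KR", pitch = 1, rate = 1 } = {}) {
  return {
    lang,
    pitch: clamp(pitch, 0, 2),
    rate: clamp(rate, 0.1, 10),
  };
}

// Copy the settings onto an utterance (or any plain object, for testing).
function applyUtteranceOptions(utterance, options) {
  return Object.assign(utterance, utteranceOptions(options));
}

// Browser usage, guarded so the file also loads where speech is unavailable.
if (typeof window !== "undefined" && "SpeechSynthesisUtterance" in window) {
  const utterance = applyUtteranceOptions(
    new SpeechSynthesisUtterance("Hello"),
    { lang: "en-US", rate: 0.8 }
  );
  window.speechSynthesis.speak(utterance);
}
```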

3. Fixing the delay

  • After implementing TTS and trying it out, I found that window.speechSynthesis.getVoices() caused a delay: on first use, the audio started only after a few seconds. To fix this, I put the call inside useEffect so that the voice list is requested when the page first renders. As a result, the TTS feature now works without the delay.
  // Preload the TTS voices
  useEffect(() => {
    window.speechSynthesis.getVoices();
  }, []);
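In some browsers the voice list is loaded asynchronously, so the first getVoices() call can return an empty array and a voiceschanged event fires once the list is ready. As an alternative to the preload above, the sketch below waits for that event explicitly. onVoicesReady is a hypothetical helper name, and it assumes synth is window.speechSynthesis.

```javascript
// Call `callback` with the voice list as soon as it is non-empty.
// Returns a cleanup function, which fits the return value of useEffect.
// `onVoicesReady` is a hypothetical helper, not part of the Web Speech API.
function onVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices); // list already loaded
    return () => {};
  }
  const handler = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // still empty, keep waiting
    synth.removeEventListener("voiceschanged", handler);
    callback(loaded);
  };
  synth.addEventListener("voiceschanged", handler);
  return () => synth.removeEventListener("voiceschanged", handler);
}

// Browser usage, guarded so the file also loads where speech is unavailable.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  onVoicesReady(window.speechSynthesis, (voices) => {
    console.log(`${voices.length} voices available`);
  });
}
```

Inside a component this could be written as useEffect(() => onVoicesReady(window.speechSynthesis, setVoices), []), where setVoices is an assumed state setter.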

👩🏻‍💻 Final code


TTS.js

  • Play TTS audio

export const getSpeech = (text) => {
  let voices = [];

  // Get the voices built into the device.
  const setVoiceList = () => {
    voices = window.speechSynthesis.getVoices();
  };

  setVoiceList();

  if (window.speechSynthesis.onvoiceschanged !== undefined) {
    // When the voice list changes, fetch the voices again.
    window.speechSynthesis.onvoiceschanged = setVoiceList;
  }

  const speech = (txt) => {
    const lang = "ko-KR";
    const utterThis = new SpeechSynthesisUtterance(txt);
    // rate: speech speed (default 1, range 0.1 to 10; a larger number is faster)
    const rate = 0.8;

    utterThis.lang = lang;
    utterThis.rate = rate;

    /* Find a Korean voice.
        Depending on the device, the Korean voice's lang is defined as ko-KR or ko_KR.
    */
    const kor_voice = voices.find(
      (elem) => elem.lang === lang || elem.lang === lang.replace("-", "_")
    );

    // If a Korean voice exists, set it on the utterance; otherwise return so that nothing is spoken.
    if (kor_voice) {
      utterThis.voice = kor_voice;
    } else {
      return;
    }

    // Play (speak) the utterance.
    window.speechSynthesis.speak(utterThis);
  };

  speech(text);
};
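The voice lookup in getSpeech matches ko-KR or ko_KR exactly. A slightly more tolerant lookup could normalize the tag first and fall back to any voice that shares the base language. This is only a sketch under that assumption; findVoice is a hypothetical helper name and is not part of the code above.

```javascript
// Find a voice for `lang`, tolerating "_" vs "-" and letter case,
// and falling back to any voice with the same base language (e.g. "ko").
// `findVoice` is a hypothetical helper; `voices` is the getVoices() result.
function findVoice(voices, lang) {
  const normalize = (tag) => tag.replace(/_/g, "-").toLowerCase();
  const wanted = normalize(lang);
  const base = wanted.split("-")[0];
  return (
    voices.find((voice) => normalize(voice.lang) === wanted) ||
    voices.find((voice) => normalize(voice.lang).split("-")[0] === base) ||
    null
  );
}

// Browser usage, guarded so the file also loads where speech is unavailable.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const voice = findVoice(window.speechSynthesis.getVoices(), "ko-KR");
  console.log(voice ? voice.name : "no Korean voice found");
}
```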
  • Stop TTS audio
export const pauseSpeech = () => {
  window.speechSynthesis.cancel();
};
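Note that cancel() removes every utterance from the queue, so playback cannot continue from where it stopped. SpeechSynthesis also provides pause() and resume(), and a true pause button could toggle between them. A minimal sketch follows; togglePause is a hypothetical helper name, and it assumes synth is window.speechSynthesis.

```javascript
// Toggle between pause and resume; returns what was done.
// `paused` is checked first because `speaking` stays true while paused.
// `togglePause` is a hypothetical helper, not part of the Web Speech API.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing is being spoken
}
```

In ReadLetter.js this could be called from the pause handler as togglePause(window.speechSynthesis).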

ReadLetter.js

import { useEffect } from "react";
import { getSpeech, pauseSpeech } from "./TTS";

...

  // Text to read aloud
  const voiceValue = `${data.content}`;

  // TTS speech button
  const handleSpeechButton = () => {
    getSpeech(voiceValue);
  };
  // TTS pause button
  const handlePauseButton = () => {
    pauseSpeech();
  };

  // Preload the TTS voices
  useEffect(() => {
    window.speechSynthesis.getVoices();
  }, []);

  return (
    <>
      ...
      <AiOutlineSound onClick={handleSpeechButton} />
      <HiPause onClick={handlePauseButton} />
      ...
    </>
  );
}

💭 What could be better


  • Because it is a free API, the voice sounds somewhat unnatural and out of place.
  • Also, because the code was written against the browser, the TTS audio does not play in mobile environments.
  • Later, I plan to refactor with tts-react or react-speech-kit to address these issues.

