Meeting 3

그녕 · October 7, 2024


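Multi-source papers are hard enough when the sources are treated independently; let's try treating them as dependent instead.
Look for similar papers (it takes two hands to clap: focus on the relationship).
Check what the sound annotations in AudioCLIP actually are (the object itself, or an expanded phrase).
Study an audio processing tutorial.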

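My research direction: examine the relationship between sound and images (video). (The ear is more powerful than the eye.)
For example, compare a person kicking a trash can, which produces a bang, with a person kicking the empty air next to the trash can, which produces no sound. From the visuals alone it is hard to tell whether the person actually kicked the trash can, but the sound makes it obvious.
Given this relationship between sound and images (video), the task could be to generate text such as "a person kicked the trash can," or to localize where the sound occurred (the point and moment where the foot hits the trash can)?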

Similar papers

I looked for papers on audio-visual tasks that focus on the relationship between the two modalities.
Keywords used:

audio visual causality detection
causal action sound recognition
audio visual grounding

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos (CVPR 2024)
Paper link

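=> A method for distinguishing sounds directly caused by a person's actions from sounds that are not.

=> Looking at the dataset pairs, I can get what I need without collecting data myself. The annotation details are also given in the supplementary material at the end. Read it carefully!
Read the action-to-sound paper.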

Look up audio dataset annotations

The two datasets used in AudioCLIP

ESC-50 dataset
git: https://github.com/karolpiczak/ESC-50

2,000 labeled audio recordings, split into 5 folds (for cross-validation)
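The five-way split is meant for cross-validation. Below is a minimal sketch of a leave-one-fold-out split, assuming metadata rows with `filename`, `fold`, and `category` columns as in the repository's `meta/esc50.csv`. The sample rows and the helper names `load_rows` and `split_by_fold` are made up for illustration and are not real dataset entries.

```python
import csv
import io

# Hypothetical stand-in for meta/esc50.csv; the real file has 2,000 rows
# and additional columns. File names and categories here are invented.
SAMPLE_META = """filename,fold,category
a.wav,1,dog
b.wav,2,rain
c.wav,3,dog
d.wav,4,siren
e.wav,5,rain
f.wav,1,siren
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_by_fold(rows, test_fold):
    """Hold out one fold for testing and train on the remaining folds."""
    train = [r for r in rows if int(r["fold"]) != test_fold]
    test = [r for r in rows if int(r["fold"]) == test_fold]
    return train, test


rows = load_rows(SAMPLE_META)
for fold in range(1, 6):
    train, test = split_by_fold(rows, fold)
    print(f"fold {fold}: {len(train)} train / {len(test)} test")
```

Splitting on the predefined fold column, rather than shuffling clips at random, keeps results comparable with other work that uses the same folds.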

Urban Sound Datasets
Annotation fields: start time, end time, salience label, class label
salience label: 1 = foreground, 2 = background
class label: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, street_music
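A rough sketch of how annotations with these four fields could be read, assuming each row is `start,end,salience,class` in a headerless CSV. The sample rows, the `SoundEvent` class, and `parse_events` are hypothetical names and data for illustration; check the dataset's own documentation for the actual file layout.

```python
import csv
import io
from dataclasses import dataclass

# Salience codes as described in the notes: 1 = foreground, 2 = background.
SALIENCE = {1: "foreground", 2: "background"}


@dataclass
class SoundEvent:
    start: float   # seconds
    end: float     # seconds
    salience: str  # "foreground" or "background"
    label: str     # e.g. "dog_bark"

    @property
    def duration(self):
        return self.end - self.start


# Invented example rows, not taken from the dataset.
SAMPLE = """0.5,3.0,1,dog_bark
2.0,10.0,2,engine_idling
"""


def parse_events(text):
    """Turn headerless CSV rows into SoundEvent objects."""
    events = []
    for start, end, salience, label in csv.reader(io.StringIO(text)):
        events.append(
            SoundEvent(float(start), float(end), SALIENCE[int(salience)], label)
        )
    return events


events = parse_events(SAMPLE)
foreground = [e.label for e in events if e.salience == "foreground"]
print(foreground)
```

The class labels name the sound source itself (for example `dog_bark`), which is relevant to the question of whether annotations describe the object or a longer phrase.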

Audio Signal Processing for Machine Learning

youtube link
github link

0. Audio Signal Processing for Machine Learning

Sound waves, DAC/ADC, Audio transformations ...

1. Sound and Waveforms

Sound is produced by the vibration of an object. Vibrations cause air molecules to oscillate.
A waveform carries information about frequency, intensity, and timbre.

Higher frequency -> higher pitch
Larger amplitude(진폭) -> louder sound
Mapping pitch to frequency:

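$$F(p) = 2^{(p - 69)/12} \cdot 440$$

Here p is the pitch as a MIDI note number (p = 69 is A4) and F(p) is the frequency in Hz.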
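The pitch-to-frequency mapping can be checked with a few lines of code. This is a minimal sketch, assuming p is a MIDI note number with note 69 tuned to 440 Hz; the function name `pitch_to_frequency` and the parameter `a4_hz` are illustrative choices.

```python
def pitch_to_frequency(p, a4_hz=440.0):
    """Frequency in Hz of pitch p, where p = 69 maps to a4_hz.

    Each step of 12 in p (one octave) doubles the frequency.
    """
    return a4_hz * 2 ** ((p - 69) / 12)


for p in (57, 60, 69, 81):
    print(p, round(pitch_to_frequency(p), 2))
```

Note that the exponent is (p - 69) / 12 as a whole: subtracting 69 first, then dividing by the 12 semitones of an octave.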

2. Intensity, Loudness, and Timbre

Sound power - measured in watts (W)
Sound intensity - sound power per unit area, measured in W/m^2
Intensity level - measured in dB

$$\mathrm{dB} = 10 \log_{10}\left(\frac{I_1}{I_0}\right)$$

Here I_0 is the reference intensity and I_1 is the intensity being compared.
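A small sketch of the intensity-level calculation. The default reference of 1e-12 W/m^2 (the commonly used threshold of hearing) is an assumption added here, not something stated in the notes, and `intensity_level_db` is an illustrative name.

```python
import math


def intensity_level_db(intensity, reference=1e-12):
    """Intensity level in dB of `intensity` relative to `reference`.

    Both values are sound intensities in W/m^2. The default reference
    is an assumed threshold-of-hearing value.
    """
    return 10 * math.log10(intensity / reference)


# Multiplying the intensity by 10 adds 10 dB; doubling adds about 3 dB.
print(intensity_level_db(1e-12))
print(intensity_level_db(1e-11))
print(intensity_level_db(2e-12))
```

Because the scale is logarithmic, equal ratios of intensity correspond to equal differences in dB.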

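Loudness - what we actually perceive; the subjective sense of sound intensity, measured in phons
Timbre - the color of a sound; what distinguishes two sounds that have the same intensity, frequency, and duration. Described with words such as bright, dark, warm, and harsh. Multidimensional.
One characteristic of timbre is the sound envelope (how the sound changes over time)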
Complex sound
