Auditory & Acoustic
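Teaching robots to make sense of what they hear: from speech recognition to separating environmental sounds to acoustic localization. This is the most underrated part of embodied perception, yet every household robot needs it.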
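Read these three first.
Whisper trains ASR on 680,000 hours of weakly labeled audio → AudioLM models audio as a sequence of tokens → Acoustic Swarms uses a swarm of microphones to separate sound sources.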
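1. Robust Speech Recognition via Large-Scale Weak Supervision
   Whisper throws 680,000 hours of web audio and its transcripts into one pot and feeds the mix to a plain Transformer. Out of the box it copes with varied accents, background noise, and long recordings, and it can translate as well. It wins on the sheer diversity of its data.
2. AudioLM
   It cuts sound into two kinds of "audio words": one kind carries what is being said, the other carries the timbre of the voice. The model then continues the sequence the way a language model continues a sentence; given a 3-second prompt, it produces a continuation that sounds like the original speaker.
3. Creating speech zones with self-distributing acoustic swarms
   Seven small robots, each about the size of a die, move onto the table on their own and spread out in a ring. When several people at the table talk at once, the system can tell who said what.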
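Why spreading microphones around a table helps can be illustrated with a small sketch. This is not the method from the acoustic swarms paper; it is a classical time-difference-of-arrival (TDOA) grid search, shown here only to make the geometry concrete. The ring radius, the speaker positions, the grid size, and the helper names `ring_of_mics`, `arrival_delays`, and `locate` are all assumptions made for illustration, and the delays are simulated without noise.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def ring_of_mics(n=7, radius=0.5):
    """Place n hypothetical microphones evenly on a circle (coordinates in metres)."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def arrival_delays(source, mics):
    """Arrival time at each microphone relative to microphone 0, in seconds."""
    times = [math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    return [t - times[0] for t in times]


def locate(measured, mics, span=1.0, step=0.02):
    """Brute-force grid search for the point whose predicted delays best match."""
    best, best_err = None, float("inf")
    steps = int(round(2 * span / step))
    for i in range(steps + 1):
        for j in range(steps + 1):
            cand = (-span + i * step, -span + j * step)
            pred = arrival_delays(cand, mics)
            err = sum((p - q) ** 2 for p, q in zip(pred, measured))
            if err < best_err:
                best, best_err = cand, err
    return best


# Two assumed speaker positions; each produces a distinct delay pattern on the ring.
mics = ring_of_mics()
speaker_a = (0.30, -0.20)
speaker_b = (-0.40, 0.10)
est_a = locate(arrival_delays(speaker_a, mics), mics)
est_b = locate(arrival_delays(speaker_b, mics), mics)
print(est_a, est_b)
```

Because each position on the table produces its own pattern of delays across the ring, two people talking from different seats can in principle be told apart by where their sound comes from. Real recordings add noise, reverberation, and overlapping speech, which this sketch ignores.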
How the 15 papers are spread across 2019 to 2024, grouped by era: Founders, Classics, Frontier.
All 15 papers in Auditory & Acoustic.
| era | year | title | venue |
|---|---|---|---|
| Frontier | 2024 | Proactive Hearing Assistants that Isolate Egocentric Conversations | UIST |
| Classics | 2024 | NeuralAids: Wireless Hearables With Programmable Speech AI Accelerators | MobiCom |
| Founders | 2023 | Creating speech zones with self-distributing acoustic swarms | Nature |
| Founders | 2019 | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation | IEEE/ACM TASLP |
| Founders | 2022 | SoundStream: An End-to-End Neural Audio Codec | IEEE/ACM TASLP |
| Classics | 2020 | Conformer | Interspeech |
| Classics | 2020 | Dual-path RNN | ICASSP |
| Classics | 2021 | Meta-StyleSpeech | ICML |
| Classics | 2023 | AudioLM | TASLP |
| Classics | 2023 | EnCodec | TMLR |
| Classics | 2023 | MusicLM | arXiv |
| Classics | 2023 | Robust Speech Recognition via Large-Scale Weak Supervision | ICML |
| Frontier | 2023 | SeamlessM4T | arXiv |
| Frontier | 2024 | Stable Audio | ICML |
| Frontier | 2024 | Universal Source Separation with Weakly Labelled Data | TASLP |