RF Perception & Mapping · Plate Nº 90

RadarSLAM: Radar based Large-Scale SLAM in All Weathers

7 min read · 2,561 characters · ⭐⭐⭐⭐ · short summary

This note is based on the abstract and publicly available material; I have not read the full paper.

In one sentence (TL;DR)

Let a spinning radar keep mapping for the car, and remembering where it has been, even in heavy fog and snow.

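What's the scenario

Imagine driving home in heavy snow. Snowflakes smear the windshield, the headlights bounce back as glare, and the GPS drops out under an overpass, so you creep forward on a vague memory of intersections and road signs. A self-driving car runs into exactly the same predicament:

  • Camera (the car's "eyes"): like human eyes, it is lost in the dark, in fog, and in snow
  • Lidar (measures distance in all directions with a laser ruler): accurate, but raindrops and snowflakes scatter the laser, like shining a flashlight through smoke
  • Radar (millimeter wave, similar to a bat's echolocation but with electromagnetic waves): the wavelength is longer, so it can "see through" rain, fog, and snow and is almost indifferent to weather

RadarSLAM aims to do two things at once using only the echoes from this "all-weather bat": work out where the vehicle is, and draw a map of the surroundings. The hard part is that radar images are blurry: they are noisy, low in resolution, and often contain "ghost" returns (multipath reflections and speckle noise), so they cannot be read at a glance the way a camera photo can.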

Plate Nº I. RadarSLAM, scene illustration: the real-world problem this paper tackles

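How earlier work approached it

  • Visual SLAM (the ORB-SLAM family): matches feature points extracted from camera images. Lightweight and cheap, but it struggles in darkness, backlight, rain, and fog
  • Lidar SLAM (LOAM, Cartographer): relies on point cloud registration and scan matching (ICP / NDT style). Accurate, but the laser attenuates badly in severe weather
  • Early radar odometry: only estimates "how far did I move between two consecutive frames", with no loop closure and no global map, so drift keeps accumulating
  • Deep-learning-based radar localization: treats the radar image as an image and learns from it directly, but needs large amounts of labeled data and generalizes poorly to new cities
  • Radar + GPS / IMU fusion: removes drift with external sensors, but GPS is unreliable in tunnels, indoors, and urban canyons

RadarSLAM takes the radar-only plus full graph optimization route: it does not depend on GPS or on learned models, and instead uses the classic SLAM framework (front-end odometry + back-end pose graph) to get the most out of radar.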

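The paper's key ideas

An analogy: visual SLAM is a veteran chef in a bright, open kitchen, with a mature recipe (front-end tracking + loop closure + full graph optimization). RadarSLAM carries that recipe unchanged into a "basement kitchen". The ingredients (data) are blurrier and the lights are dimmer, but the workflow is essentially the same, with new tools for only a few key steps:

  1. Pose tracking (odometry, "how far did I just move"): extract stable feature points from the scanning radar's polar image (explained below), compare two adjacent frames, and estimate how far the car drove and how much it turned
  2. Loop closure detection ("I think I've been here before"): when the car returns to an intersection it has already visited, the system has to recognize it. The paper uses a descriptor (explained below) suited to radar images for place recognition, and adds an "I'm back here" constraint once a match is found
  3. Pose graph optimization ("straighten the trajectory that wandered off"): feed all odometry estimates and loop closure constraints into a global graph optimizer (tools like g2o / GTSAM), so that long-term accumulated drift is pulled back at the loop closures
  4. All-weather robustness: because radar itself is largely indifferent to weather, the pipeline above holds up in rain, fog, snow, and at night. This is the core selling point compared with visual/lidar SLAM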

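Hold on, slow down a beat: what do these terms actually mean?

  • polar image: each time the radar completes a rotation, the echoes received at different angles are stacked into one image. The horizontal axis is range and the vertical axis is angle, which is nothing like a camera photo, so it has to be handled as a different kind of "image"
  • descriptor: compresses one frame of observations into a string of numbers, a "fingerprint". If two frames have similar fingerprints, they are very likely the same place
  • pose graph: a graph of "where I have been and how I moved between those places". Each node is the pose of one frame, and each edge is one relative displacement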
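To make the polar image concrete, here is a minimal sketch of mapping one cell (row = angle, column = range) to metric x, y coordinates. The function name, the 0.5-bin centre offset, and the convention that azimuth 0 points along +x and increases counter-clockwise are all assumptions for illustration; a real sensor's driver defines its own convention.

```python
import math

def polar_to_cartesian(azimuth_idx, range_idx, num_azimuths, range_resolution_m):
    """Map one polar-image cell to (x, y) in metres, in the sensor frame.

    Assumed layout: rows are evenly spaced azimuths over a full turn,
    columns are range bins of equal width.
    """
    theta = 2.0 * math.pi * azimuth_idx / num_azimuths
    r = (range_idx + 0.5) * range_resolution_m  # centre of the range bin
    return r * math.cos(theta), r * math.sin(theta)
```

For example, with 400 azimuths per turn and 0.5 m bins (illustrative values), row 100 is a quarter turn, so column 9 lands about 4.75 m along the +y axis.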

You can think of it as: "visual SLAM's methodology + radar's sensor advantage = a SLAM that works in real-world bad weather".

Plate Nº II. RadarSLAM, method illustration: the core pipeline

How it works (method)

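Part 1: radar data preprocessing (like picking the "bright spots" out of a blurry photo). On each rotation the scanning radar outputs one polar image, where each row is the echo intensity of one beam at one angle (intensity vs range). The raw image is noisy, so peak detection plus threshold filtering is needed to pick out the reflections that look like real objects. This step turns raw radar data into a sparse set of 2D keypoints, similar to extracting ORB feature points from a camera image.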
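The peak detection plus threshold step can be sketched as follows. This is a deliberately simple stand-in (a 1D local-maximum test along each azimuth row), not the paper's actual extractor; `extract_peaks` and its `threshold` parameter are hypothetical names.

```python
def extract_peaks(polar_image, threshold):
    """Return (azimuth_idx, range_idx) of local maxima above threshold.

    polar_image is a list of rows, one row per azimuth, each row a list of
    echo intensities indexed by range bin. The first and last bin of each
    row are never reported, because they lack a neighbour on one side.
    """
    peaks = []
    for a, row in enumerate(polar_image):
        for r in range(1, len(row) - 1):
            v = row[r]
            # Strictly greater than the left neighbour so that a flat
            # plateau yields a single peak rather than one per bin.
            if v >= threshold and v > row[r - 1] and v >= row[r + 1]:
                peaks.append((a, r))
    return peaks
```

Each returned index pair can then be converted to metric coordinates to form the sparse 2D keypoint set described above.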

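Part 2: frame-to-frame odometry (like lining up the edges of two puzzle pieces to see how far they shifted). Take the keypoint set of the current frame, match it against the previous frame, and estimate the relative pose ΔT between the two frames (dx, dy, dθ: forward/backward, left/right, and heading change). The registration method may be RANSAC (a statistical method for rejecting wildly wrong points) plus geometric consistency filtering, or an ICP-like iterative closest point scheme. This step gives a short-horizon motion estimate, but its error accumulates into drift over time.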
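As one possible reading of "RANSAC plus geometric consistency", the sketch below estimates (dx, dy, dθ) from already-matched 2D keypoints using a closed-form rigid fit inside a RANSAC loop. It assumes the matching itself has been done elsewhere; all function names, the inlier tolerance, and the iteration count are illustrative choices, not values from the paper.

```python
import math
import random

def fit_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping src points onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy
        bx, by = qx - cdx, qy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return (cdx - (c * csx - s * csy), cdy - (s * csx + c * csy), theta)

def apply_pose(pose, p):
    """Rotate point p by theta, then translate by (tx, ty)."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def ransac_rigid_2d(src, dst, iters=200, tol=0.5, seed=0):
    """Fit on random 2-point samples, keep the largest inlier set, refit."""
    rng = random.Random(seed)
    indices = range(len(src))
    best = []
    for _ in range(iters):
        i, j = rng.sample(indices, 2)
        pose = fit_rigid_2d([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in indices
                   if math.dist(apply_pose(pose, src[k]), dst[k]) < tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("not enough consistent matches")
    pose = fit_rigid_2d([src[k] for k in best], [dst[k] for k in best])
    return pose, best
```

Passing the current frame's keypoints as `src` and the previous frame's as `dst` yields the pose of the current frame expressed in the previous frame, which is the vehicle motion ΔT for static landmarks.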

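Part 3: loop closure detection (like spotting a familiar road sign and remembering "I've been here"). At regular distance intervals, the global descriptor of the current frame is compared against the library of past frames. If a highly similar past frame is found and it is geometrically consistent (not a coincidence), a loop closure is triggered and a constraint edge is added to the pose graph. Designing the radar descriptor is the key challenge: it has to be robust to viewpoint changes while staying insensitive to noise and moving objects. The specific descriptor design requires reading the paper.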
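Since the note does not cover the paper's descriptor, the sketch below uses a hypothetical stand-in: a normalized histogram of keypoint counts per range ring, which is unchanged when the vehicle revisits a place with a different heading. It only illustrates the "fingerprint, compare, skip recent frames" flow; `ring_descriptor`, `find_loop_candidate`, `min_gap`, and `min_score` are all invented for this example, and a real system would still verify the candidate geometrically.

```python
import math

def ring_descriptor(points, max_range, num_rings):
    """Unit-length histogram of keypoint counts per concentric range ring."""
    hist = [0.0] * num_rings
    for x, y in points:
        r = math.hypot(x, y)
        if r < max_range:
            ring = min(int(r / max_range * num_rings), num_rings - 1)
            hist[ring] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist

def find_loop_candidate(query, database, current_idx, min_gap=50, min_score=0.9):
    """Return (index, score) of the most similar old frame, or (None, 0.0)."""
    best_idx, best_score = None, 0.0
    for idx, desc in enumerate(database):
        if current_idx - idx < min_gap:
            continue  # too recent: would match trivially, not a real loop
        # Descriptors are unit length, so the dot product is cosine similarity.
        score = sum(a * b for a, b in zip(query, desc))
        if score >= min_score and score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score
```

A ring histogram discards bearing information entirely, so it trades discriminative power for rotation invariance; that trade-off is exactly the design tension described above.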

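Part 4: full graph optimization and mapping (like a teacher spreading the correction for a wrong answer back over the previous pages). All frame-to-frame constraints and loop closure constraints are handed to a nonlinear least-squares solver (pose graph optimization), which optimizes every past pose so that the error accumulated at the loop closure is distributed back along the whole trajectory. The final output is a globally consistent trajectory plus a global map stitched together from the radar points of all keyframes.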
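The "spread the error back along the trajectory" effect can be seen in a heavily simplified sketch that optimizes 2D positions only, with no headings, so the problem becomes linear least squares solved by Gauss-Seidel sweeps. A real pose graph also carries orientation and is solved with a nonlinear solver such as g2o or GTSAM; the function name and edge format `(i, j, dx, dy, weight)` are assumptions made for this illustration.

```python
def optimize_positions(initial, edges, iters=500):
    """Translation-only pose graph: minimise sum of w * |x_j - x_i - d|^2.

    initial: list of (x, y) guesses; node 0 is held fixed as the anchor.
    edges:   list of (i, j, dx, dy, w) meaning x_j - x_i should equal (dx, dy).
    """
    x = [list(p) for p in initial]
    neighbours = [[] for _ in x]
    for i, j, dx, dy, w in edges:
        neighbours[j].append((i, dx, dy, w))    # x_j is predicted as x_i + d
        neighbours[i].append((j, -dx, -dy, w))  # x_i is predicted as x_j - d
    for _ in range(iters):
        for k in range(1, len(x)):
            total_w = sum(w for _, _, _, w in neighbours[k])
            if total_w == 0:
                continue  # unconstrained node, leave as is
            # Setting the gradient to zero for node k gives the weighted
            # mean of the positions predicted by its neighbours.
            x[k][0] = sum(w * (x[m][0] + dx) for m, dx, _, w in neighbours[k]) / total_w
            x[k][1] = sum(w * (x[m][1] + dy) for m, _, dy, w in neighbours[k]) / total_w
    return [tuple(p) for p in x]
```

With four odometry edges around a square that end 0.1 m off in x and y, plus one loop closure edge saying the last node coincides with the first, equal weights split the 0.1 m error evenly over the five edges, so the last node ends up 0.02 m from the start instead of 0.1 m.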

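What the experiments do

The paper evaluates on public radar datasets (the Oxford Radar RobotCar Dataset or similar), focusing on:

  • Trajectory accuracy: compared against ground truth (typically GPS-RTK + INS), using absolute and relative pose error (ATE / RPE). The specific numbers require reading the paper
  • Weather robustness: run separately in rain, fog, snow, night, and other conditions to see whether accuracy degrades significantly. This is the paper's biggest selling point
  • Baseline comparison: against visual SLAM, lidar SLAM, and pure radar odometry (no loop closure)
  • Large-scale scenes: urban routes of several kilometers or even tens of kilometers, to check whether loop closure and global optimization really keep drift under control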
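For readers new to ATE and RPE, the sketch below computes simplified, translation-only versions of both on 2D trajectories. It assumes the estimated and ground-truth trajectories are already time-synchronized and expressed in the same frame; full definitions also include rotation and an alignment step. The function names are hypothetical.

```python
import math

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of per-frame position differences."""
    sq = [(ex - gx) ** 2 + (ey - gy) ** 2
          for (ex, ey), (gx, gy) in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))

def rpe_rmse(est, gt, delta=1):
    """Relative error: RMSE of differences in displacement over delta frames."""
    sq = []
    for k in range(len(est) - delta):
        edx = est[k + delta][0] - est[k][0]
        edy = est[k + delta][1] - est[k][1]
        gdx = gt[k + delta][0] - gt[k][0]
        gdy = gt[k + delta][1] - gt[k][1]
        sq.append((edx - gdx) ** 2 + (edy - gdy) ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

On a trajectory that drifts sideways by a constant 0.1 m per frame, the relative error stays at 0.1 m while the absolute error keeps growing with length, which is why long routes are the real test of loop closure.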

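A few new terms you should know

  • FMCW scanning radar: frequency-modulated continuous-wave scanning radar. It transmits an electromagnetic wave whose frequency changes linearly with time, measures range from the frequency difference of the echo, and sweeps a full circle by mechanical rotation
  • Polar image: an image in polar coordinates. Each row is one angle, and each column is the echo intensity at a different range along that angle. Unlike a camera image (Cartesian coordinates), it often has to be converted to Cartesian for processing
  • Pose graph optimization: each node is the pose of one frame, and each edge is one relative pose constraint. The objective is to minimize the residuals of all constraints
  • Loop closure: recognizing that "the place I am in now is one I have visited before", then adding a constraint that spans many frames to pull the accumulated drift back
  • Place recognition descriptor: compresses one frame of sensor observations into a compact vector, and uses vector similarity to decide whether two frames are the same place
  • Odometry drift: short-term estimates are reasonably accurate, but the small per-frame errors keep adding up, so after a long run the trajectory "floats away"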
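The FMCW entry above ("measures range from the frequency difference of the echo") corresponds to a standard textbook relation for a linear chirp: range R = c · f_b · T / (2B), with range resolution c / (2B). A small sketch, with parameter names and example values chosen purely for illustration rather than taken from any specific sensor:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def fmcw_range(beat_hz, bandwidth_hz, chirp_s):
    """Range of a target from the beat frequency of a linear FMCW chirp.

    The chirp sweeps bandwidth_hz in chirp_s seconds, so a round-trip delay
    of 2R/c shows up as a beat frequency of (bandwidth / chirp) * 2R / c.
    """
    return SPEED_OF_LIGHT * beat_hz * chirp_s / (2.0 * bandwidth_hz)

def range_resolution(bandwidth_hz):
    """Smallest range separation two targets need to be told apart."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)
```

For instance, a 1 MHz beat on a hypothetical 1 GHz, 1 ms chirp corresponds to roughly 150 m, and a 1.5 GHz sweep resolves targets about 10 cm apart.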

How it relates to other papers

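  • Sensor-side lineage: in the same spirit as millimap (millimeter-wave mapping) and the rf-slam line (early RF SLAM), which all "replace light with electromagnetic waves that penetrate better"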
  • Methodological lineage: borrows directly from the classic pipelines of visual and lidar SLAM (ORB-SLAM, Cartographer), especially the pose graph + loop closure architecture
  • Datasets: it and public radar datasets such as Oxford Radar RobotCar and MulRan have pushed each other forward
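  • Contemporary and later work: it belongs to the same wave of radar SLAM/localization work as Under the Radar, Kidnapped Radar, and RaLL (radar localization on lidar maps), alongside multimodal SLAM that fuses radar with lidar/camera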
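  • Cross-references: in this repository, nlos-mmwave.md covers non-line-of-sight millimeter-wave sensing and millimap.md covers radar mapping; this paper is a representative of pushing those two lines toward a complete SLAM system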

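How I suggest reading it

  1. Start with the figures and experiments: go straight to the trajectory plots and qualitative mapping figures to get a feel for what a map drawn in snow or fog looks like. This is the part that conveys the motivation best
  2. Read the odometry section: understand how radar keypoints are extracted and how frame-to-frame matching is done. This is where it differs most from visual SLAM
  3. Read the loop closure section: focus on the descriptor design. Global descriptors for radar images are the sub-problem with the most research value on this line
  4. Skip the pose graph details: if you already know g2o/GTSAM, this part is standard practice; if not, read the pose graph chapter of a SLAM textbook first and then come back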

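Why it's worth reading

  • Opening up the sensor perspective: at a time when visual/lidar SLAM has been pushed to its limits, it is a reminder that changing the sensor changes the whole problem domain. For real deployments such as autonomous driving, low-altitude flight, and outdoor robots, bad weather cannot be avoided
  • The "classic framework + new sensor" pattern: it shows a pragmatic way of doing research. Rather than inventing new theory, it "translates" a mature methodology onto new data and solves the key technical problems that come up in the translation
  • A complete system: it does not stop at odometry but builds full SLAM (front end + loop closure + full graph optimization), so it can serve as a worked example of how to assemble an end-to-end SLAM system
  • Relevance to embodied AI: robots and drones that leave the lab and face real weather will need all-weather perception. RadarSLAM is an early milestone on that road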

Cite this note
BibTeX
@online{eai_radarslam_2026,
  title       = {(readable note) RadarSLAM: Radar based Large-Scale SLAM in All Weathers},
  author      = {Zhou, Jason},
  year        = {2026},
  note        = {Note on a 2020 paper},
  howpublished = {\url{https://estelledc.github.io/embodied-ai-reading-station/papers/radarslam/}},
  organization = {Embodied AI Reading Station}
}
