| 2025 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control | World Model & Video Policy | arXiv |
| 2025 | DiT-Policy | Diffusion Policy | ICRA |
| 2025 | Diffusion Policy Policy Optimization (DPPO) | Diffusion Policy | ICLR |
| 2025 | FlowPolicy: 3D Flow-based Policy via Consistency Flow Matching | Diffusion Policy | AAAI |
| 2025 | FAST: Efficient Action Tokenization for VLA | Diffusion Policy | RSS |
| 2025 | pi_0.5: VLA with Open-World Generalization | Diffusion Policy | arXiv |
| 2025 | Generalizable Humanoid Manipulation with 3D Diffusion Policies (iDP3) | Imitation Learning | RSS |
| 2025 | SmolVLA | Imitation Learning | arXiv |
| 2025 | TLA: Tactile-Language-Action | Multimodal Ecology | ICRA |
| 2025 | Wave-Former: Through-Occlusion 3D Reconstruction via Wireless Shape Completion | RF Perception & Mapping | arXiv |
| 2025 | DexVLA | End-to-End VLA | arXiv |
| 2025 | OpenHelix | End-to-End VLA | arXiv |
| 2025 | OpenVLA-OFT | End-to-End VLA | RSS |
| 2025 | Dreamer V3: Mastering Diverse Domains through World Models | World Model & Video Policy | Nature |
| 2025 | 1X World Model Challenge | World Model & Video Policy | arXiv |
| 2025 | Cosmos World Foundation Model Platform | World Model & Video Policy | arXiv |
| 2025 | Navigation World Models | World Model & Video Policy | CVPR |
| 2024 | Stable Audio | Auditory & Acoustic | ICML |
| 2024 | DROID | Datasets & Benchmarks | RSS |
| 2024 | RoboCasa | Datasets & Benchmarks | RSS |
| 2024 | 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Diffusion Policy | RSS |
| 2024 | Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation | Diffusion Policy | RSS |
| 2024 | EquiBot: SIM(3)-Equivariant Diffusion Policy | Diffusion Policy | CoRL |
| 2024 | Affordance-based Robot Manipulation with Flow Matching | Diffusion Policy | IROS |
| 2024 | pi_0: Vision-Language-Action Flow Model | Diffusion Policy | arXiv |
| 2024 | ALOHA 2 | Imitation Learning | Tech Report |
| 2024 | DexCap | Imitation Learning | RSS |
| 2024 | HumanPlus | Imitation Learning | CoRL |
| 2024 | Mobile ALOHA | Imitation Learning | CoRL |
| 2024 | Universal Manipulation Interface | Imitation Learning | RSS |
| 2024 | Behavior Generation with Latent Actions (VQ-BeT) | Imitation Learning | ICML |
| 2024 | RoboFlamingo | High-Level Planning | ICLR |
| 2024 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | RF Perception & Mapping | CVPR |
| 2024 | 3D Diffusion Policy (DP3) | End-to-End VLA | RSS |
| 2024 | Octo: An Open-Source Generalist Robot Policy | End-to-End VLA | RSS |
| 2024 | 3D-VLA | End-to-End VLA | ICML |
| 2024 | GR-2: Generative Video-Language-Action Model | End-to-End VLA | arXiv |
| 2024 | RDT-1B: Diffusion Foundation Model for Bimanual Manipulation | End-to-End VLA | ICLR |
| 2024 | RoboMamba | End-to-End VLA | NeurIPS |
| 2024 | TinyVLA | End-to-End VLA | RA-L |
| 2024 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | VLM Foundation | ECCV |
| 2024 | Genie: Generative Interactive Environments | World Model & Video Policy | ICML |
| 2024 | UniSim | World Model & Video Policy | ICLR |
| 2023 | 3DShape2VecSet: 3D Shape Representation for Diffusion Models | VLM Foundation | SIGGRAPH |
| 2023 | MusicLM | Auditory & Acoustic | arXiv |
| 2023 | BridgeData V2 | Datasets & Benchmarks | dataset-eval |
| 2023 | LIBERO | Datasets & Benchmarks | NeurIPS |
| 2023 | RH20T | Datasets & Benchmarks | RSS Workshop |
| 2023 | Diffusion Policy: Visuomotor Policy Learning via Action Diffusion | Diffusion Policy | RSS |
| 2023 | Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (ACT/ALOHA) | Imitation Learning | RSS |
| 2023 | AnyTeleop | Imitation Learning | CoRL |
| 2023 | RoboCat | Imitation Learning | TMLR |
| 2023 | VoxPoser | High-Level Planning | CoRL |
| 2023 | RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches | End-to-End VLA | ICLR |
| 2023 | GAIA-1 | World Model & Video Policy | arXiv |
| 2022 | Behavior Transformers: Cloning k Modes with One Stone | Imitation Learning | NeurIPS |
| 2021 | What Matters in Learning from Offline Human Demonstrations for Robot Manipulation | Datasets & Benchmarks | CoRL |
| 2021 | ManiSkill | Simulation & Sim2Real | NeurIPS |