| 2025 | VLAS: VLA Model With Speech Instructions | Multimodal Ecology | ICLR |
| 2025 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control | World Model & Video Policy | arXiv |
| 2025 | FAST: Efficient Action Tokenization for VLA | Diffusion Policy | RSS |
| 2025 | pi_0.5: VLA with Open-World Generalization | Diffusion Policy | arXiv |
| 2025 | SmolVLA | Imitation Learning | arXiv |
| 2025 | Tactile-VLA | Multimodal Ecology | CoRL |
| 2025 | DexVLA | End-to-End VLA | arXiv |
| 2025 | OpenHelix | End-to-End VLA | arXiv |
| 2025 | OpenVLA-OFT | End-to-End VLA | RSS |
| 2025 | SpatialVLA | End-to-End VLA | arXiv |
| 2025 | Dreamer V3: Mastering Diverse Domains through World Models | World Model & Video Policy | Nature |
| 2025 | Cosmos World Foundation Model Platform | World Model & Video Policy | arXiv |
| 2024 | OpenVLA: An Open-Source Vision-Language-Action Model | End-to-End VLA | CoRL |
| 2024 | MLA: Multisensory Language-Action Model | Multimodal Ecology | arXiv |
| 2024 | RoboCasa | Datasets & Benchmarks | RSS |
| 2024 | SimplerEnv | Datasets & Benchmarks | NeurIPS |
| 2024 | pi_0: Vision-Language-Action Flow Model | Diffusion Policy | arXiv |
| 2024 | Behavior Generation with Latent Actions (VQ-BeT) | Imitation Learning | ICML |
| 2024 | Sparsh: Self-supervised Touch Representations | Multimodal Ecology | CoRL |
| 2024 | BEHAVIOR-1K | Simulation & Sim2Real | CoRL |
| 2024 | Octo: An Open-Source Generalist Robot Policy | End-to-End VLA | RSS |
| 2024 | GR-2: Generative Video-Language-Action Model | End-to-End VLA | arXiv |
| 2024 | RoboMamba | End-to-End VLA | NeurIPS |
| 2024 | TraceVLA: Visual Trace Prompting | End-to-End VLA | ICLR |
| 2024 | LLaVA-OneVision: Easy Visual Task Transfer | VLM Foundation | arXiv |
| 2023 | BridgeData V2 | Datasets & Benchmarks | dataset-eval |
| 2023 | LIBERO | Datasets & Benchmarks | NeurIPS |
| 2023 | Open X-Embodiment | Datasets & Benchmarks | ICRA |
| 2023 | ProgPrompt | High-Level Planning | ICRA |
| 2023 | ChatGPT for Robotics | High-Level Planning | IEEE Access |
| 2023 | VoxPoser | High-Level Planning | CoRL |
| 2023 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | End-to-End VLA | CoRL |
| 2022 | CALVIN | Datasets & Benchmarks | RA-L |