Embodied AI Reading Station
Tag

#language (73 papers)

year | title | topic | venue
2025 | VLAS: VLA Model With Speech Instructions | Multimodal Ecology | ICLR
2025 | FAST: Efficient Action Tokenization for VLA | Diffusion Policy | RSS
2025 | pi_0.5: VLA with Open-World Generalization | Diffusion Policy | arXiv
2025 | Generalizable Humanoid Manipulation with 3D Diffusion Policies (iDP3) | Imitation Learning | RSS
2025 | SmolVLA | Imitation Learning | arXiv
2025 | Tactile-VLA | Multimodal Ecology | CoRL
2025 | TLA: Tactile-Language-Action | Multimodal Ecology | ICRA
2025 | OpenHelix | End-to-End VLA | arXiv
2025 | OpenVLA-OFT | End-to-End VLA | RSS
2025 | 1X World Model Challenge | World Model & Video Policy | arXiv
2025 | Cosmos World Foundation Model Platform | World Model & Video Policy | arXiv
2024 | OpenVLA: An Open-Source Vision-Language-Action Model | End-to-End VLA | CoRL
2024 | MLA: Multisensory Language-Action Model | Multimodal Ecology | arXiv
2024 | mmCLIP: Boosting mmWave-based Zero-shot HAR via Signal-Text Alignment | RF Perception & Mapping | SenSys
2024 | DROID | Datasets & Benchmarks | RSS
2024 | pi_0: Vision-Language-Action Flow Model | Diffusion Policy | arXiv
2024 | Behavior Generation with Latent Actions (VQ-BeT) | Imitation Learning | ICML
2024 | OneLLM | Multimodal Ecology | CVPR
2024 | GenSim | High-Level Planning | ICLR
2024 | RoboFlamingo | High-Level Planning | ICLR
2024 | Tree-Planner | High-Level Planning | ICLR
2024 | Habitat 3.0 | Simulation & Sim2Real | ICLR
2024 | Octo: An Open-Source Generalist Robot Policy | End-to-End VLA | RSS
2024 | 3D-VLA | End-to-End VLA | ICML
2024 | GR-2: Generative Video-Language-Action Model | End-to-End VLA | arXiv
2024 | RDT-1B: Diffusion Foundation Model for Bimanual Manipulation | End-to-End VLA | ICLR
2024 | RoboMamba | End-to-End VLA | NeurIPS
2024 | TinyVLA | End-to-End VLA | RA-L
2024 | TraceVLA: Visual Trace Prompting | End-to-End VLA | ICLR
2024 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | VLM Foundation | arXiv
2024 | Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | VLM Foundation | CVPR
2024 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | VLM Foundation | CVPR
2024 | Improved Baselines with Visual Instruction Tuning | VLM Foundation | CVPR
2024 | What matters when building vision-language models? | VLM Foundation | NeurIPS
2024 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | VLM Foundation | arXiv
2024 | The Llama 3 Herd of Models | VLM Foundation | arXiv
2024 | LLaVA-NeXT-Interleave | VLM Foundation | arXiv
2024 | LLaVA-OneVision: Easy Visual Task Transfer | VLM Foundation | arXiv
2024 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | VLM Foundation | ECCV
2024 | Pixtral 12B | VLM Foundation | arXiv
2023 | LLaVA: Visual Instruction Tuning | VLM Foundation | NeurIPS
2023 | AudioLM | Auditory & Acoustic | TASLP
2023 | EnCodec | Auditory & Acoustic | TMLR
2023 | Robust Speech Recognition via Large-Scale Weak Supervision | Auditory & Acoustic | ICML
2023 | SeamlessM4T | Auditory & Acoustic | arXiv
2023 | Open X-Embodiment | Datasets & Benchmarks | ICRA
2023 | RoboCat | Imitation Learning | TMLR
2023 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | Multimodal Ecology | EACL
2023 | AudioPaLM | Multimodal Ecology | arXiv
2023 | FROMAGe: Grounding LLMs to Images | Multimodal Ecology | ICML
2023 | Code as Policies: Language Model Programs for Embodied Control | High-Level Planning | ICRA
2023 | LLM+P: Empowering LLMs with Optimal Planning | High-Level Planning | arXiv
2023 | PaLM-E: An Embodied Multimodal Language Model | High-Level Planning | ICML
2023 | ProgPrompt | High-Level Planning | ICRA
2023 | ChatGPT for Robotics | High-Level Planning | IEEE Access
2023 | VoxPoser | High-Level Planning | CoRL
2023 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | End-to-End VLA | CoRL
2023 | RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches | End-to-End VLA | ICLR
2023 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | VLM Foundation | ICML
2023 | OBELICS | VLM Foundation | NeurIPS
2023 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | VLM Foundation | arXiv
2023 | Transformers are Sample-Efficient World Models | World Model & Video Policy | ICLR
2023 | TWM: Transformer-based World Models | World Model & Video Policy | ICLR
2023 | GAIA-1 | World Model & Video Policy | arXiv
2022 | SayCan: Do As I Can, Not As I Say | High-Level Planning | CoRL
2022 | Behavior Transformers: Cloning k Modes with One Stone | Imitation Learning | NeurIPS
2022 | X-VLM: Multi-Grained Vision Language Pre-Training | Multimodal Ecology | ICML
2022 | Inner Monologue: Embodied Reasoning through Planning with Language Models | High-Level Planning | CoRL
2022 | ProcTHOR | Simulation & Sim2Real | NeurIPS
2022 | RT-1: Robotics Transformer for Real-World Control at Scale | End-to-End VLA | RSS
2022 | Flamingo: a Visual Language Model for Few-Shot Learning | VLM Foundation | NeurIPS
2022 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | VLM Foundation | ICML
2021 | Learning Transferable Visual Models From Natural Language Supervision | VLM Foundation | ICML