Embodied AI Reading Station
Tag

#vision (92 papers)

year | title | topic | venue
2025 | DiT-Policy | Diffusion Policy | ICRA
2025 | Diffusion Policy Policy Optimization (DPPO) | Diffusion Policy | ICLR
2025 | FlowPolicy: 3D Flow-based Policy via Consistency Flow Matching | Diffusion Policy | AAAI
2025 | Generalizable Humanoid Manipulation with 3D Diffusion Policies (iDP3) | Imitation Learning | RSS
2025 | Tactile Beyond Pixels (Sparsh-X) | Multimodal Ecology | CoRL
2025 | Tactile-VLA | Multimodal Ecology | CoRL
2025 | TLA: Tactile-Language-Action | Multimodal Ecology | ICRA
2025 | Wave-Former: Through-Occlusion 3D Reconstruction via Wireless Shape Completion | RF Perception & Mapping | arXiv
2025 | DexVLA | End-to-End VLA | arXiv
2025 | OpenVLA-OFT | End-to-End VLA | RSS
2025 | SpatialVLA | End-to-End VLA | arXiv
2024 | OpenVLA: An Open-Source Vision-Language-Action Model | End-to-End VLA | CoRL
2024 | mmCLIP: Boosting mmWave-based Zero-shot HAR via Signal-Text Alignment | RF Perception & Mapping | SenSys
2024 | Stable Audio | Auditory & Acoustic | ICML
2024 | Universal Source Separation with Weakly Labelled Data | Auditory & Acoustic | TASLP
2024 | DROID | Datasets & Benchmarks | RSS
2024 | SimplerEnv | Datasets & Benchmarks | NeurIPS
2024 | 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations | Diffusion Policy | RSS
2024 | EquiBot: SIM(3)-Equivariant Diffusion Policy | Diffusion Policy | CoRL
2024 | Affordance-based Robot Manipulation with Flow Matching | Diffusion Policy | IROS
2024 | pi_0: Vision-Language-Action Flow Model | Diffusion Policy | arXiv
2024 | DexCap | Imitation Learning | RSS
2024 | Mobile ALOHA | Imitation Learning | CoRL
2024 | Universal Manipulation Interface | Imitation Learning | RSS
2024 | OneLLM | Multimodal Ecology | CVPR
2024 | Sparsh: Self-supervised Touch Representations | Multimodal Ecology | CoRL
2024 | GenSim | High-Level Planning | ICLR
2024 | RoboFlamingo | High-Level Planning | ICLR
2024 | Argus: Multi-View Egocentric Human Mesh Reconstruction Based on Stripped-Down Wearable mmWave Add-on | RF Perception & Mapping | SenSys
2024 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | RF Perception & Mapping | CVPR
2024 | Enabling Visual Recognition at Radio Frequency (PanoRadar) | RF Perception & Mapping | MobiCom
2024 | 3D Diffusion Policy (DP3) | End-to-End VLA | RSS
2024 | Octo: An Open-Source Generalist Robot Policy | End-to-End VLA | RSS
2024 | 3D-VLA | End-to-End VLA | ICML
2024 | RDT-1B: Diffusion Foundation Model for Bimanual Manipulation | End-to-End VLA | ICLR
2024 | RoboMamba | End-to-End VLA | NeurIPS
2024 | TinyVLA | End-to-End VLA | RA-L
2024 | TraceVLA: Visual Trace Prompting | End-to-End VLA | ICLR
2024 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | VLM Foundation | arXiv
2024 | Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | VLM Foundation | CVPR
2024 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | VLM Foundation | CVPR
2024 | Improved Baselines with Visual Instruction Tuning | VLM Foundation | CVPR
2024 | What matters when building vision-language models? | VLM Foundation | NeurIPS
2024 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | VLM Foundation | arXiv
2024 | The Llama 3 Herd of Models | VLM Foundation | arXiv
2024 | LLaVA-NeXT-Interleave | VLM Foundation | arXiv
2024 | LLaVA-OneVision: Easy Visual Task Transfer | VLM Foundation | arXiv
2024 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | VLM Foundation | ECCV
2024 | Pixtral 12B | VLM Foundation | arXiv
2024 | Genie: Generative Interactive Environments | World Model & Video Policy | ICML
2024 | UniSim | World Model & Video Policy | ICLR
2023 | LLaVA: Visual Instruction Tuning | VLM Foundation | NeurIPS
2023 | MusicLM | Auditory & Acoustic | arXiv
2023 | Robust Speech Recognition via Large-Scale Weak Supervision | Auditory & Acoustic | ICML
2023 | BridgeData V2 | Datasets & Benchmarks | dataset-eval
2023 | LIBERO | Datasets & Benchmarks | NeurIPS
2023 | RH20T | Datasets & Benchmarks | RSS Workshop
2023 | AnyTeleop | Imitation Learning | CoRL
2023 | RoboCat | Imitation Learning | TMLR
2023 | ImageBind: One Embedding Space To Bind Them All | Multimodal Ecology | CVPR
2023 | AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | Multimodal Ecology | EACL
2023 | FROMAGe: Grounding LLMs to Images | Multimodal Ecology | ICML
2023 | PaLM-E: An Embodied Multimodal Language Model | High-Level Planning | ICML
2023 | VoxPoser | High-Level Planning | CoRL
2023 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | End-to-End VLA | CoRL
2023 | RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches | End-to-End VLA | ICLR
2023 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | VLM Foundation | ICML
2023 | EVA-CLIP: Improved Training Techniques for CLIP at Scale | VLM Foundation | arXiv
2023 | OBELICS | VLM Foundation | NeurIPS
2023 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | VLM Foundation | arXiv
2023 | Sigmoid Loss for Language Image Pre-Training | VLM Foundation | ICCV
2023 | GAIA-1 | World Model & Video Policy | arXiv
2022 | CALVIN | Datasets & Benchmarks | RA-L
2022 | X-VLM: Multi-Grained Vision Language Pre-Training | Multimodal Ecology | ICML
2022 | Inner Monologue: Embodied Reasoning through Planning with Language Models | High-Level Planning | CoRL
2022 | RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals | RF Perception & Mapping | TMM
2022 | DexMV | Simulation & Sim2Real | ECCV
2022 | Flamingo: a Visual Language Model for Few-Shot Learning | VLM Foundation | NeurIPS
2022 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | VLM Foundation | ICML
2022 | FILIP: Fine-grained Interactive Language-Image Pre-Training | VLM Foundation | ICLR
2022 | DayDreamer | World Model & Video Policy | CoRL
2021 | ManiSkill | Simulation & Sim2Real | NeurIPS
2021 | Learning Transferable Visual Models From Natural Language Supervision | VLM Foundation | ICML
2021 | Mastering Atari with Discrete World Models | World Model & Video Policy | ICLR
2020 | Conformer | Auditory & Acoustic | Interspeech
2020 | See Through Smoke: Robust Indoor Mapping with Low-cost mmWave Radar | RF Perception & Mapping | SenSys
2020 | milliEgo: Single-chip mmWave Radar Aided Egomotion Estimation via Deep Sensor Fusion | RF Perception & Mapping | SenSys
2020 | RadarSLAM: Radar based Large-Scale SLAM in All Weathers | RF Perception & Mapping | BMVC
2019 | RLBench: The Robot Learning Benchmark & Learning Environment | Datasets & Benchmarks | RA-L
2019 | Connecting Touch and Vision via Cross-Modal Prediction | Multimodal Ecology | CVPR
2019 | Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm | RF Perception & Mapping | arXiv
2019 | Habitat: A Platform for Embodied AI Research | Simulation & Sim2Real | ICCV