Visual Localization

448 papers
ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting
Yingdong Gu et al. · 2026-05-06
Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds
Heejoon Moon et al. · 2026-05-01
MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation
Xianbo Cai et al. · 2026-05-01
AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision
Xiaoya Cheng et al. · 2026-04-29
3D-LENS: A 3D Lifting-based Elevated Novel-view Synthesis method for Single-View Aerial-Ground Re-Identification
William Grolleau et al. · 2026-04-29
COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization
Muhammad Shaheer et al. · 2026-04-28
Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval
Esteban Rodríguez-Betancourt et al. · 2026-04-27
Region Matters: Efficient and Reliable Region-Aware Visual Place Recognition
Shunpeng Chen et al. · 2026-04-24
Revisiting Geometric Obfuscation with Dual Convergent Lines for Privacy-Preserving Image Queries in Visual Localization
Jeonggon Kim et al. · 2026-04-24
TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval
Zixu Li et al. · 2026-04-24
ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval
Zixu Li et al. · 2026-04-22
UniCVR: From Alignment to Reranking for Unified Zero-Shot Composed Visual Retrieval
Haokun Wen et al. · 2026-04-22
Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval
Zhiheng Fu et al. · 2026-04-22
SL(C)AMma: Simultaneous Localisation, (Calibration) and Mapping With a Magnetometer Array
Thomas Edridge et al. · 2026-04-21
T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability
Savya Khosla et al. · 2026-04-20
INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval
Zhiwei Chen et al. · 2026-04-20
HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval
Zixu Li et al. · 2026-04-20
Brain-Inspired Capture: Evidence-Driven Neuromimetic Perceptual Simulation for Visual Decoding
Feixue Shao et al. · 2026-04-20
ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval
Zixu Li et al. · 2026-04-20
Subject-Aware Multi-Granularity Alignment for Zero-Shot EEG-to-Image Retrieval
Lin Jiang et al. · 2026-04-20
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
Kyeong Seon Kim et al. · 2026-04-18
KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains
Parthaw Goswami et al. · 2026-04-18
Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization
Siddhant Bharadwaj et al. · 2026-04-17
Continual Hand-Eye Calibration for Open-world Robotic Manipulation
Fazeng Li et al. · 2026-04-17
Sketch and Text Synergy: Fusing Structural Contours and Descriptive Attributes for Fine-Grained Image Retrieval
Siyuan Wang et al. · 2026-04-17
SceneGlue: Scene-Aware Transformer for Feature Matching without Scene-Level Annotation
Songlin Du et al. · 2026-04-15
Indexing Multimodal Language Models for Large-scale Image Retrieval
Bahey Tharwat et al. · 2026-04-14
A Sanity Check on Composed Image Retrieval
Yikun Liu et al. · 2026-04-14
VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale
Parth Parag Kulkarni et al. · 2026-04-14
Human-Inspired Context-Selective Multimodal Memory for Social Robots
Hangyeol Kang et al. · 2026-04-13
Privacy-Preserving Structureless Visual Localization via Image Obfuscation
Vojtech Panek et al. · 2026-04-13
Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
Seongyu Kim et al. · 2026-04-13
CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space
Sohwi Lim et al. · 2026-04-13
FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
Peng Yuan et al. · 2026-04-11
AsymLoc: Towards Asymmetric Feature Matching for Efficient Visual Localization
Mohammad Omama et al. · 2026-04-10
Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
Sharva Gogawale et al. · 2026-04-09
SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving
Felix Embacher et al. · 2026-04-09
Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
Zhuohong Chen et al. · 2026-04-09
VGGT-SLAM++
Avilasha Mandal et al. · 2026-04-08
Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
Yiyang Zhang et al. · 2026-04-07
WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval
Yizhuo Xu et al. · 2026-04-07
LSGS-Loc: Towards Robust 3DGS-Based Visual Localization for Large-Scale UAV Scenarios
Xiang Zhang et al. · 2026-04-07
Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval
Yuxin Yang et al. · 2026-04-07
CraterBench-R: Instance-Level Crater Retrieval for Planetary Scale
Jichao Fang et al. · 2026-04-06
MPTF-Net: Multi-view Pyramid Transformer Fusion Network for LiDAR-based Place Recognition
Shuyuan Li et al. · 2026-04-06
MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network
Guozhi Qiu et al. · 2026-03-31
RHO: Robust Holistic OSM-Based Metric Cross-View Geo-Localization
Junwei Zheng et al. · 2026-03-29
NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex Natural Language Queries
Mahdi Erfanian et al. · 2026-03-29
TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval
David G. Shatwell et al. · 2026-03-28
Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Moritz Nottebaum et al. · 2026-03-27
HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network
Mingyu Zhang et al. · 2026-03-27
4DRaL: Bridging 4D Radar with LiDAR for Place Recognition using Knowledge Distillation
Ningyuan Huang et al. · 2026-03-27
Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization Methods
Ofer Idan et al. · 2026-03-26
Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming
Yunus Talha Erzurumlu et al. · 2026-03-26
On-Demand Instructional Material Providing Agent Based on MLLM for Tutoring Support
Takumi Kato et al. · 2026-03-26
Sparse Autoencoders for Interpretable Medical Image Representation Learning
Philipp Wesp et al. · 2026-03-24
ARGENT: Adaptive Hierarchical Image-Text Representations
Chuong Huynh et al. · 2026-03-24
Retrieval-Guided Photovoltaic Inventory Estimation from Satellite Imagery for Distribution Grid Planning
Muhao Guo et al. · 2026-03-24
SOUPLE: Enhancing Audio-Visual Localization and Segmentation with Learnable Prompt Contexts
Khanh Binh Nguyen et al. · 2026-03-24
HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment
Sangmin Jo et al. · 2026-03-24
ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval
Zhuocheng Zhang et al. · 2026-03-23
SATTC: Structure-Aware Label-Free Test-Time Calibration for Cross-Subject EEG-to-Image Retrieval
Qunjie Huang et al. · 2026-03-21
A Multihead Continual Learning Framework for Fine-Grained Fashion Image Retrieval with Contrastive Learning and Exponential Moving Average Distillation
Ling Xiao et al. · 2026-03-21
IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment
Simone Magistri et al. · 2026-03-20
IUP-Pose: Decoupled Iterative Uncertainty Propagation for Real-time Relative Pose Regression via Implicit Dense Alignment
Jun Wang et al. · 2026-03-20
MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval
Xuri Ge et al. · 2026-03-18
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
Zhengbo Zhang et al. · 2026-03-18
Visual Product Search Benchmark
Karthik Sulthanpete Govindappa · 2026-03-17
Retrieving Counterfactuals Improves Visual In-Context Learning
Guangzhi Xiong et al. · 2026-03-17
HMAR: Hierarchical Modality-Aware Expert and Dynamic Routing Medical Image Retrieval Architecture
Aojie Yuan · 2026-03-17
Rethinking Pose Refinement in 3D Gaussian Splatting under Pose Prior and Geometric Uncertainty
Mangyu Kong et al. · 2026-03-17
Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics
Dennis Haitz et al. · 2026-03-14
Sky2Ground: A Benchmark for Site Modeling under Varying Altitude
Zengyan Wang et al. · 2026-03-14
A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks
Tangzheng Lian et al. · 2026-03-13
Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval
Jing Yang et al. · 2026-03-13
CM-Bench: A Comprehensive Cross-Modal Feature Matching Benchmark Bridging Visible and Infrared Images
Liangzheng Sun et al. · 2026-03-13
FBCIR: Balancing Cross-Modal Focuses in Composed Image Retrieval
Chenchen Zhao et al. · 2026-03-12
Efficient Cross-View Localization in 6G Space-Air-Ground Integrated Network
Min Hao et al. · 2026-03-12
Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations
Yuheng Wang et al. · 2026-03-10
$L^3$: Scene-agnostic Visual Localization in the Wild
Yu Zhang et al. · 2026-03-09
QdaVPR: A novel query-based domain-agnostic model for visual place recognition
Shanshan Wan et al. · 2026-03-08
T2Nav Algebraic Topology Aware Temporal Graph Memory and Loop Detection for ZeroShot Visual Navigation
Quang-Anh N. D. et al. · 2026-03-06
EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition
Adam D. Hines et al. · 2026-03-06
Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval
Donghoon Han et al. · 2026-03-06
Loop Closure via Maximal Cliques in 3D LiDAR-Based SLAM
Javier Laserna et al. · 2026-03-05
PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing
Rohan Mahadev et al. · 2026-03-04
SSR: A Generic Framework for Text-Aided Map Compression for Localization
Mohammad Omama et al. · 2026-03-04
Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark
Martin Kvisvik Larsen et al. · 2026-03-04
VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale
Sven Elflein et al. · 2026-02-26
WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval
Tianyue Wang et al. · 2026-02-26
Autoregressive Visual Decoding from EEG Signals
Sicheng Dai et al. · 2026-02-26
Pix2Key: Controllable Open-Vocabulary Retrieval with Semantic Decomposition and Self-Supervised Visual Dictionary Learning
Guoyizhe Wei et al. · 2026-02-26
Global-Aware Edge Prioritization for Pose Graph Initialization
Tong Wei et al. · 2026-02-25
Automatic Map Density Selection for Locally-Performant Visual Place Recognition
Somayeh Hussaini et al. · 2026-02-25
Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
Jianglin Lu et al. · 2026-02-24
LST-SLAM: A Stereo Thermal SLAM System for Kilometer-Scale Dynamic Environments
Zeyu Jiang et al. · 2026-02-24
Long-Term Multi-Session 3D Reconstruction Under Substantial Appearance Change
Beverley Gorry et al. · 2026-02-24
Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval
Yibo Yan et al. · 2026-02-23
VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments
Jingyi Xu et al. · 2026-02-23
Evaluating the Impact of Data Anonymization on Image Retrieval
Marvin Chen et al. · 2026-02-23
Knowledge-aware Visual Question Generation for Remote Sensing Images
Siran Li et al. · 2026-02-22
Questions beyond Pixels: Integrating Commonsense Knowledge in Visual Question Generation for Remote Sensing
Siran Li et al. · 2026-02-22
IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping
Tingyang Xiao et al. · 2026-02-21
VQPP: Video Query Performance Prediction Benchmark
Adrian Catalin Lutu et al. · 2026-02-19
DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition
Ji Li et al. · 2026-02-12
Arbitrary Ratio Feature Compression via Next Token Prediction
Yufan Liu et al. · 2026-02-12
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
Chenlong Deng et al. · 2026-02-11
WristMIR: Coarse-to-Fine Region-Aware Retrieval of Pediatric Wrist Radiographs with Radiology Report-Driven Learning
Mert Sonmezer et al. · 2026-02-10
OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval
Teng Wang et al. · 2026-02-09
A Sketch+Text Composed Image Retrieval Dataset for Thangka
Jinyu Xu et al. · 2026-02-09
UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban Science
Jie Zhang et al. · 2026-02-09
SDR-CIR: Semantic Debias Retrieval Framework for Training-Free Zero-Shot Composed Image Retrieval
Yi Sun et al. · 2026-02-05
SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation
David F. Ramirez et al. · 2026-02-04
Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition
Dhyey Manish Rajani et al. · 2026-02-04
Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement
Zipeng Zhu et al. · 2026-02-04
Invariance on Manifolds: Understanding Robust Visual Representations for Place Recognition
Jintao Cheng et al. · 2026-02-04
LaVPR: Benchmarking Language and Vision for Place Recognition
Ofer Idan et al. · 2026-02-03
ObjEmbed: Towards Universal Multimodal Object Embeddings
Shenghao Fu et al. · 2026-02-03
Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss
Enguang Fan · 2026-02-02
ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval
Tianyu Yang et al. · 2026-02-02
Interacted Planes Reveal 3D Line Mapping
Zeran Ke et al. · 2026-02-01
Variance & Greediness: A comparative study of metric-learning losses
Donghuo Zeng et al. · 2026-01-29
When Vision Meets Texts in Listwise Reranking
Hongyi Cai · 2026-01-28
Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval
Zhuocheng Zhang et al. · 2026-01-28
VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction
Dominic Maggio et al. · 2026-01-27
Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models
Jeonghwan Kim et al. · 2026-01-27
X-Aligner: Composed Visual Retrieval without the Bells and Whistles
Yuqian Zheng et al. · 2026-01-23
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
Tingyu Song et al. · 2026-01-22
Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
Haomiao Tang et al. · 2026-01-22
Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration
Xinyuan Zhang et al. · 2026-01-21
LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval
Chao Gao et al. · 2026-01-21
XR: Cross-Modal Agents for Composed Image Retrieval
Zhongyu Yang et al. · 2026-01-20
Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration
Yongcong Ye et al. · 2026-01-20
Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning
Hongbo Bai et al. · 2026-01-20
DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition
Hanyu Zhu et al. · 2026-01-19
SupScene: Learning Overlap-Aware Global Descriptor for Unconstrained SfM
Xulei Shi et al. · 2026-01-17
Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals
Matteo Ciferri et al. · 2026-01-16
Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text
Piyush Singh Pasi · 2026-01-15
UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval
Xiaoxu Ma et al. · 2026-01-14
Hybrid guided variational autoencoder for visual place recognition
Ni Wang et al. · 2026-01-14
Keyframe-based Dense Mapping with the Graph of View-Dependent Local Maps
Krzysztof Zielinski et al. · 2026-01-13
Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation
Kang Fu et al. · 2026-01-13
Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization
Miao Pan et al. · 2026-01-13
Multi-task Cross-modal Learning for Chest X-ray Image Retrieval
Zhaohui Liang et al. · 2026-01-08
ImLoc: Revisiting Visual Localization with Image-based Representation
Xudong Jiang et al. · 2026-01-07
CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval
Zhipeng Qian et al. · 2026-01-07
BREATH-VL: Vision-Language-Guided 6-DoF Bronchoscopy Localization via Semantic-Geometric Fusion
Qingyao Tian et al. · 2026-01-07
HOLO: Homography-Guided Pose Estimator Network for Fine-Grained Visual Localization on SD Maps
Xuchang Zhong et al. · 2026-01-07
Comparative Analysis of Binarization Methods For Medical Image Hashing On Odir Dataset
Nedim Muzoglu · 2026-01-07
Loop Closure using AnyLoc Visual Place Recognition in DPV-SLAM
Wenzheng Zhang et al. · 2026-01-06
Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach
Biao Wu et al. · 2026-01-05
OCP-LS: An Efficient Algorithm for Visual Localization
Jindi Zhong et al. · 2025-12-31
Geometric Multi-Session Map Merging with Learned Local Descriptors
Yanlong Ma et al. · 2025-12-30
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
Guo Ye et al. · 2025-12-29
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
Jiawei Chen et al. · 2025-12-29
Anomaly Detection by Effectively Leveraging Synthetic Images
Sungho Kang et al. · 2025-12-29
UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer
Tianchen Deng et al. · 2025-12-28
Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer
Tianchen Deng et al. · 2025-12-26
Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval
Dao Sy Duy Minh et al. · 2025-12-24
Soft Filtering: Guiding Zero-shot Composed Image Retrieval with Prescriptive and Proscriptive Constraints
Youjin Jung et al. · 2025-12-23
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
Hao Guo et al. · 2025-12-23
Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis
Argha Kamal Samanta et al. · 2025-12-22
Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation
Connor Kilrain et al. · 2025-12-22
Text2Graph VPR: A Text-to-Graph Expert System for Explainable Place Recognition in Changing Environments
Saeideh Yousefzadeh et al. · 2025-12-21
Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval
Dimitrios Georgoulopoulos et al. · 2025-12-20
Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors
Son Tung Nguyen et al. · 2025-12-19
The Effect of Negation on CLIP in Medical Imaging: Limitations of Contrastive Language-Image Pretraining
Jasmine Vu et al. · 2025-12-18
MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval
Amna Amir et al. · 2025-12-18
CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer
Xianwei Cao et al. · 2025-12-16
Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries
Emanuele Mezzi et al. · 2025-12-16
Towards Test-time Efficient Visual Place Recognition via Asymmetric Query Processing
Jaeyoon Kim et al. · 2025-12-15
Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
Wonseok Choi et al. · 2025-12-14
Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval
J. Xiao et al. · 2025-12-11
YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos
Ryan Meegan et al. · 2025-12-10
Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics
Nick Trinh et al. · 2025-12-09
Generalized Referring Expression Segmentation on Aerial Photos
Luís Marnoto et al. · 2025-12-08
Spatial Retrieval Augmented Autonomous Driving
Xiaosong Jia et al. · 2025-12-07
Language-driven Fine-grained Retrieval
Shijie Wang et al. · 2025-12-06
GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers
Hochul Hwang et al. · 2025-12-05
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Shengyuan Ding et al. · 2025-12-04
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Haobo Yuan et al. · 2025-12-04
Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding
Abhigyan Bhattacharya et al. · 2025-12-04
Revealing stimulus-dependent dynamics through statistical complexity
Edson V. de Paula et al. · 2025-12-04
Influence of Object Affordance on Action Language Understanding: Evidence from Dynamic Causal Modeling Analysis
Supriya Bordoloi et al. · 2025-12-04
LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging
Zhijian Shu et al. · 2025-12-04
Terahertz Fourier Ptychographic Imaging
Pitambar Mukherjee et al. · 2025-12-04
TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards
Mauro Martini et al. · 2025-12-04
MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
Massimo Bini et al. · 2025-12-04
Spectral micro-CT for quantitative analysis of calcification in fibrocartilage
Vittoria Mazzini et al. · 2025-12-04
HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval
Zhiwei Chen et al. · 2025-12-02
GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization
Zixuan Song et al. · 2025-12-02
Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval
Xin Wang et al. · 2025-12-01
Winning Solutions for the Rayan AI Contest: Compositional Retrieval, Zero-Shot Anomaly Detection, and Backdoor Detection
Ali Nafisi et al. · 2025-12-01
MARVO: Marine-Adaptive Radiance-aware Visual Odometry
Sacchin Sundar et al. · 2025-11-28
UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries
Hoang-Bao Le et al. · 2025-11-27
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Naifu Zhang et al. · 2025-11-26
Fast 3D Ultrasound Localization Microscopy via Projection-based Processing Framework
Jingke Zhang et al. · 2025-11-26
Qwen3-VL Technical Report
Shuai Bai et al. · 2025-11-26
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Teng Hu et al. · 2025-11-26
FITRep: Attention-Guided Item Representation via MLLMs
Guoxiao Zhang et al. · 2025-11-26
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
Xin Gu et al. · 2025-11-26
HTTM: Head-wise Temporal Token Merging for Faster VGGT
Weitian Wang et al. · 2025-11-26
Low-dose Chemically Specific Bioimaging via Deep-UV Lensless Holographic Microscopy on a Standard Camera
Piotr Arcab et al. · 2025-11-26
Adaptive Lighting Control in Visible Light Systems: An Integrated Sensing, Communication, and Illumination Framework
Xinyan Xie et al. · 2025-11-26
Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition
Baoli Sun et al. · 2025-11-26
Wigner and Gabor phase-space analysis of propagators for evolution equations
Elena Cordero et al. · 2025-11-24
Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments
Jorge Ortigoso-Narro et al. · 2025-11-24
In-vivo imaging with a low-cost MRI scanner and cloud data processing in low-resource settings
Teresa Guallart-Naval et al. · 2025-11-24
Can Modern Vision Models Understand the Difference Between an Object and a Look-alike?
Itay Cohen et al. · 2025-11-24
From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation
Moazzam Umer Gondal et al. · 2025-11-24
Graph-based 3D Human Pose Estimation using WiFi Signals
Jichao Chen et al. · 2025-11-24
Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach
Fan Nie et al. · 2025-11-24
LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space
Hai Wu et al. · 2025-11-24
Multi-Agent Monocular Dense SLAM With 3D Reconstruction Priors
Haihang Wu et al. · 2025-11-24
Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting
Qiyang Yu et al. · 2025-11-24
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
Yikun Wang et al. · 2025-11-19
First Frame Is the Place to Go for Video Content Customization
Jingxi Chen et al. · 2025-11-19
Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
Tao Hu et al. · 2025-11-19
Multi-Text Guided Few-Shot Semantic Segmentation
Qiang Jiao et al. · 2025-11-19
SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome
Dabin Jeong et al. · 2025-11-19
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Linyin Luo et al. · 2025-11-19
The Empowerment of Science of Science by Large Language Models: New Tools and Methods
Guoqiang Liang et al. · 2025-11-19
C2F-Space: Coarse-to-Fine Space Grounding for Spatial Instructions using Vision-Language Models
Nayoung Oh et al. · 2025-11-19
Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval
Qing Wang et al. · 2025-11-19
Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation
Jin Wang et al. · 2025-11-19
Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments
Laura Alejandra Encinar Gonzalez et al. · 2025-11-07
DAFM: Dynamic Adaptive Fusion for Multi-Model Collaboration in Composed Image Retrieval
Yawei Cai et al. · 2025-11-07
Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA
Itbaan Safwan et al. · 2025-11-06
An Efficient Algorithm for Learning-Based Visual Localization
Jindi Zhong et al. · 2025-11-06
Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization
Tao Liu et al. · 2025-11-04
LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment
Rohan Wandre et al. · 2025-11-04
SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment
Xinyu Mao et al. · 2025-11-03
Evaluating Perspectival Biases in Cross-Modal Retrieval
Teerapol Saengsukhiran et al. · 2025-11-03
Dynamic Multi-level Weighted Alignment Network for Zero-shot Sketch-based Image Retrieval
Hanwen Su et al. · 2025-11-02
Multi-Mapcher: Loop Closure Detection-Free Heterogeneous LiDAR Multi-Session SLAM Leveraging Outlier-Robust Registration for Autonomous Vehicles
Hyungtae Lim et al. · 2025-11-01
Approximate Diverse $k$-nearest Neighbor Search in Vector Database
Jiachen Zhao et al. · 2025-10-31
Scaling Image Geo-Localization to Continent Level
Philipp Lindenberger et al. · 2025-10-30
Instance-Level Composed Image Retrieval
Bill Psomas et al. · 2025-10-29
DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts
Binbin Li et al. · 2025-10-28
Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
Hongyi Wang et al. · 2025-10-27
Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models
Yang Zhang et al. · 2025-10-26
TWC-SLAM: Multi-Agent Cooperative SLAM with Text Semantics and WiFi Features Integration for Similar Indoor Environments
Chunyu Li et al. · 2025-10-26
Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities
Ningli Xu et al. · 2025-10-26
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
Mahiro Ukai et al. · 2025-10-26
Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing
Xiang Fei et al. · 2025-10-26
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Ziheng Zhang et al. · 2025-10-24
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Ji Du et al. · 2025-10-21
ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization
Yuanhe Guo et al. · 2025-10-21
DualHash: A Stochastic Primal-Dual Algorithm with Theoretical Guarantee for Deep Hashing
Luxuan Li et al. · 2025-10-21
Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition
Timur Ismagilov et al. · 2025-10-20
Small Language Models Offer Significant Potential for Science Community
Jian Zhang et al. · 2025-10-18
Acquisition of interpretable domain information during brain MR image harmonization for content-based image retrieval
Keima Abe et al. · 2025-10-16
Through the Lens of Doubt: Robust and Efficient Uncertainty Estimation for Visual Place Recognition
Emily Miller et al. · 2025-10-15
Embedding the Teacher: Distilling vLLM Preferences for Scalable Image Retrieval
Eric He et al. · 2025-10-13
Hierarchical Scheduling for Multi-Vector Image Retrieval
Maoliang Li et al. · 2025-10-10
DarkHash: A Data-Free Backdoor Attack Against Deep Hashing
Ziqi Zhou et al. · 2025-10-09
CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning
Weihuang Lin et al. · 2025-10-09
Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision
Xiaoxu Ma et al. · 2025-10-09
Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Retrieval
Didrik Bergström et al. · 2025-10-08
CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval
Bin Kang et al. · 2025-10-07
Personalizing Retrieval using Joint Embeddings or "the Return of Fluffy"
Bruno Korbar et al. · 2025-10-06
Flexible and Efficient Spatio-Temporal Transformer for Sequential Visual Place Recognition
Yu Kiu et al. · 2025-10-05
The Overlooked Value of Test-time Reference Sets in Visual Place Recognition
Mubariz Zaffar et al. · 2025-10-04
Novel UWB Synthetic Aperture Radar Imaging for Mobile Robot Mapping
Charith Premachandra et al. · 2025-10-03
Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation -- Technical Report for IROS 2025 RoboSense Challenge Track 4
Lingfeng Zhang et al. · 2025-10-03
EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
Jiahao Wang et al. · 2025-10-01
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
Axel Barroso-Laguna et al. · 2025-10-01
Semantic Visual Simultaneous Localization and Mapping: A Survey on State of the Art, Challenges, and Future Directions
Thanh Nguyen Canh et al. · 2025-10-01
Video Object Segmentation-Aware Audio Generation
Ilpo Viertola et al. · 2025-09-30
SQUARE: Semantic Query-Augmented Fusion and Efficient Batch Reranking for Training-free Zero-Shot Composed Image Retrieval
Ren-Di Wu et al. · 2025-09-30
SETR: A Two-Stage Semantic-Enhanced Framework for Zero-Shot Composed Image Retrieval
Yuqi Xiao et al. · 2025-09-30
SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
Shunpeng Chen et al. · 2025-09-30
Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity
Tu-Hoa Pham et al. · 2025-09-29
Performance-Efficiency Trade-off for Fashion Image Retrieval
Julio Hurtado et al. · 2025-09-29
Prepare for Warp Speed: Sub-millisecond Visual Place Recognition Using Event Cameras
Vignesh Ramanathan et al. · 2025-09-28
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
Jinpeng Lu et al. · 2025-09-26
Efficient Multimodal Dataset Distillation via Generative Models
Zhenghao Zhao et al. · 2025-09-25
A Versatile Foundation Model for AI-enabled Mammogram Interpretation
Fuxiang Huang et al. · 2025-09-24
SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
Binod Singh et al. · 2025-09-23
Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
Ioanna Ntinou et al. · 2025-09-23
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata
Oussema Dhaouadi et al. · 2025-09-22
Learning Attribute-Aware Hash Codes for Fine-Grained Image Retrieval via Query Optimization
Peng Wang et al. · 2025-09-21
SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models
Thong Nguyen et al. · 2025-09-18
PRISM: Product Retrieval In Shopping Carts using Hybrid Matching
Arda Kabadayi et al. · 2025-09-18
Chain-of-Thought Re-ranking for Image Retrieval Tasks
Shangrong Wu et al. · 2025-09-18
DiffVL: Diffusion-Based Visual Localization on 2D Maps via BEV-Conditioned GPS Denoising
Li Gao et al. · 2025-09-18
Event-LAB: Towards Standardized Evaluation of Neuromorphic Localization Methods
Adam D. Hines et al. · 2025-09-18
Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models
Ilyass Moummad et al. · 2025-09-17
CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts
Leonard Hackel et al. · 2025-09-17
DiffHash: Text-Guided Targeted Attack via Diffusion Models against Deep Hashing Image Retrieval
Zechao Liu et al. · 2025-09-17
Semantic-Enhanced Cross-Modal Place Recognition for Robust Robot Localization
Yujia Lin et al. · 2025-09-16
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Nikhil Keetha et al. · 2025-09-16
Bridging Vision Language Models and Symbolic Grounding for Video Question Answering
Haodi Ma et al. · 2025-09-15
Listening for "You": Enhancing Speech Image Retrieval via Target Speaker Extraction
Wenhao Yang et al. · 2025-09-11
Aerial-ground Cross-modal Localization: Dataset, Ground-truth, and Benchmark
Yandi Yang et al. · 2025-09-09
Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval
Emil Demić et al. · 2025-09-08
Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)
Emanuela Boros et al. · 2025-09-05
FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph
Zhangding Liu et al. · 2025-09-05
Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking
Dror Aiger et al. · 2025-09-05
DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
Ruohong Yang et al. · 2025-09-04
Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time
Jintao Cheng et al. · 2025-09-02
Ensemble-Based Event Camera Place Recognition Under Varying Illumination
Therese Joseph et al. · 2025-09-02
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision
Che Liu et al. · 2025-09-01
ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization
Thinh-Phuc Nguyen et al. · 2025-09-01
FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval
Jeong-Woo Park et al. · 2025-07-17
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval
Jeong-Woo Park et al. · 2025-07-17
QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval
Jaehyun Kwak et al. · 2025-07-16
CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning
Peiwen Xia et al. · 2025-07-16
GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
David G. Shatwell et al. · 2025-07-14
Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources
Daniele Rege Cambrin et al. · 2025-07-14
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
Xinlong Ding et al. · 2025-07-14
RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features
Inye Na et al. · 2025-07-11
LiDAR, GNSS and IMU Sensor Alignment through Dynamic Time Warping to Construct 3D City Maps
Haitian Wang et al. · 2025-07-11
Deep Hashing with Semantic Hash Centers for Image Retrieval
Li Chen et al. · 2025-07-11
SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation
Juyeop Han et al. · 2025-07-10
VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching
Yu Chen et al. · 2025-07-10
Evaluating Attribute Confusion in Fashion Text-to-Image Generation
Ziyue Liu et al. · 2025-07-09
MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval
Naoya Sogi et al. · 2025-07-09
Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval
Haiwen Li et al. · 2025-07-08
OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval
Zhiwei Chen et al. · 2025-07-08
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
Hahyeon Choi et al. · 2025-07-08
Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model
Mengyao Xu et al. · 2025-07-07
An analysis of vision-language models for fabric retrieval
Francesco Giuliari et al. · 2025-07-07
Simultaneous Localization and Mapping Using Active mmWave Sensing in 5G NR
Tao Du et al. · 2025-07-07
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li et al. · 2025-07-06
Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition
Jiuhong Xiao et al. · 2025-07-04
LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment
Juelin Zhu et al. · 2025-07-01
Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data
Ghufran A. Omran et al. · 2025-06-28
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval
Li-Cheng Shen et al. · 2025-06-28
MatChA: Cross-Algorithm Matching with Feature Augmentation
Paula Carbó Cubero et al. · 2025-06-27
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Caoshuo Li et al. · 2025-06-26
Referring Expression Instance Retrieval and A Strong End-to-End Baseline
Xiangzhao Hao et al. · 2025-06-26
Visualizing intercalation effects in 2D materials using AFM based techniques
Karmen Kapustić et al. · 2025-06-25
On the Burstiness of Faces in Set
Jiong Wang et al. · 2025-06-25
jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval
Michael Günther et al. · 2025-06-24
Class Agnostic Instance-level Descriptor for Visual Instance Search
Qi-Ying Sun et al. · 2025-06-20
MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval
Chao He et al. · 2025-06-19
Fine-grained Image Retrieval via Dual-Vision Adaptation
Xin Jiang et al. · 2025-06-19
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation
Connor Malone et al. · 2025-06-19
Semantic and Feature Guided Uncertainty Quantification of Visual Localization for Autonomous Vehicles
Qiyuan Wu et al. · 2025-06-18
ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections
Ziling Huang et al. · 2025-06-18
HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search
Qian Xu et al. · 2025-06-17
TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Indoor Localization and Mapping
Jeewon Kim et al. · 2025-06-17
Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval
Kshitij Kavimandan et al. · 2025-06-17
A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation
Xiaoyang Wei et al. · 2025-06-16
EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition
Bingxi Liu et al. · 2025-06-16
SuperPlace: The Renaissance of Classical Feature Aggregation for Visual Place Recognition in the Era of Foundation Models
Bingxi Liu et al. · 2025-06-16
Feature Complementation Architecture for Visual Place Recognition
Weiwei Wang et al. · 2025-06-14
Towards a general-purpose foundation model for fMRI analysis
Cheng Wang et al. · 2025-06-11
Improving Personalized Search with Regularized Low-Rank Parameter Updates
Fiona Ryan et al. · 2025-06-11
Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints
Xiangkai Zhang et al. · 2025-06-11
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
Tianyu Chen et al. · 2025-06-10
Robust Visual Localization via Semantic-Guided Multi-Scale Transformer
Zhongtao Tian et al. · 2025-06-10
Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs
Yikun Ji et al. · 2025-06-08
Zero Shot Composed Image Retrieval
Santhosh Kakarla et al. · 2025-06-07
GenIR: Generative Visual Feedback for Mental Image Retrieval
Diji Yang et al. · 2025-06-06
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning
Sheng Chen et al. · 2025-06-06
HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition
Suhan Woo et al. · 2025-06-05
Deep Learning Reforms Image Matching: A Survey and Outlook
Shihua Zhang et al. · 2025-06-05
Entity Image and Mixed-Modal Image Retrieval Datasets
Cristian-Ioan Blaga et al. · 2025-06-02
Quantization-based Bounds on the Wasserstein Metric
Jonathan Bobrutsky et al. · 2025-06-01
SORCE: Small Object Retrieval in Complex Environments
Chunxu Liu et al. · 2025-05-30
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain et al. · 2025-05-29
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians
Hidenobu Matsuki et al. · 2025-05-28
UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images
Junhuan Liu et al. · 2025-05-28
Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule
San Jiang et al. · 2025-05-28
Visual Loop Closure Detection Through Deep Graph Consensus
Martin Büchner et al. · 2025-05-27
QuARI: Query Adaptive Retrieval Improvement
Eric Xing et al. · 2025-05-27
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing et al. · 2025-05-27
Visualized Text-to-Image Retrieval
Di Wu et al. · 2025-05-26
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval
Rong-Cheng Tu et al. · 2025-05-26
Can Visual Encoder Learn to See Arrows?
Naoyuki Terashita et al. · 2025-05-26
TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition
Oliver Grainge et al. · 2025-05-22
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Siting Li et al. · 2025-05-21
SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval
Nikolaos Chaidos et al. · 2025-05-21
Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models
Kiarash Naghavi Khanghah et al. · 2025-05-20
MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark
Yiwei Ou et al. · 2025-05-18
Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization
Aaron Wilhelm et al. · 2025-05-16
Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing
Mathis Jürgen Adler et al. · 2025-05-16
SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment
Ganesh Sapkota et al. · 2025-05-13
Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions
Lukas Schichler et al. · 2025-05-06
LiftFeat: 3D Geometry-Aware Local Feature Matching
Yepeng Liu et al. · 2025-05-06
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon et al. · 2025-05-06
OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery
Chongsheng Zhang et al. · 2025-05-04
NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization
Xun Li et al. · 2025-05-02
GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting
Jongwon Lee et al. · 2025-05-01
From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval
Yabing Wang et al. · 2025-04-25
A Guide to Structureless Visual Localization
Vojtech Panek et al. · 2025-04-24
Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval
Xin Jiang et al. · 2025-04-23
Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs
Merve Cerit et al. · 2025-04-22
A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling
Kyle Buettner et al. · 2025-04-19
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
Haoxuan Li et al. · 2025-04-17
Generalized Visual Relation Detection with Diffusion Models
Kaifeng Gao et al. · 2025-04-16
Visual Re-Ranking with Non-Visual Side Information
Gustav Hanning et al. · 2025-04-15
TMCIR: Token Merge Benefits Composed Image Retrieval
Chaoyang Wang et al. · 2025-04-15
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition
Changwei Wang et al. · 2025-04-14
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng et al. · 2025-04-12
HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields
Asterios Reppas et al. · 2025-04-11
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle et al. · 2025-04-11
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh et al. · 2025-04-11
PNE-SGAN: Probabilistic NDT-Enhanced Semantic Graph Attention Network for LiDAR Loop Closure Detection
Xiong Li et al. · 2025-04-11
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
Zehong Ma et al. · 2025-04-10
A Pointcloud Registration Framework for Relocalization in Subterranean Environments
David Akhihiero et al. · 2025-04-09
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng et al. · 2025-04-09
To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition
Davide Sferrazza et al. · 2025-04-08
NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval
Peng Gao et al. · 2025-04-06
Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye et al. · 2025-04-06
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury et al. · 2025-04-04
A Chefs KISS -- Utilizing semantic information in both ICP and SLAM framework
Sven Ochs et al. · 2025-04-02
Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval
Yuji Nozawa et al. · 2025-04-02
IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval
Bangwei Liu et al. · 2025-04-01
Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data
Yiqun Duan et al. · 2025-04-01
CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization
Yingrui Ji et al. · 2025-03-31
LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds
Masahiko Tsuji et al. · 2025-03-31
Multiview Image-Based Localization
Cameron Fiore et al. · 2025-03-30
LOCORE: Image Re-ranking with Long-Context Sequence Modeling
Zilin Xiao et al. · 2025-03-27
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck
Adrian Bulat et al. · 2025-03-27
UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation
Yehui Shen et al. · 2025-03-27
FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval
Zixu Li et al. · 2025-03-27
Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing
Shuai Li et al. · 2025-03-27
CoLLM: A Large Language Model for Composed Image Retrieval
Chuong Huynh et al. · 2025-03-25
Scene-agnostic Pose Regression for Visual Localization
Junwei Zheng et al. · 2025-03-25
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
Zhiwei Huang et al. · 2025-03-25
Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval
Haoqiang Lin et al. · 2025-03-25
LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space
Zhangyu Wang et al. · 2025-03-23
Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning
Xiang Fang et al. · 2025-03-23
What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images
Dongheng Lin et al. · 2025-03-23
good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval
Pranavi Kolouju et al. · 2025-03-22
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang et al. · 2025-03-21
Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions
Muhua Zhang et al. · 2025-03-21
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou et al. · 2025-03-20
Automating 3D Dataset Generation with Neural Radiance Fields
P. Schulz et al. · 2025-03-20
3D Densification for Multi-Map Monocular VSLAM in Endoscopy
X. Anadón et al. · 2025-03-18
A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios
Huy-Hoang Bui et al. · 2025-03-18
Scale Efficient Training for Large Datasets
Qing Zhou et al. · 2025-03-17
Multi-Platform Teach-and-Repeat Navigation by Visual Place Recognition Based on Deep-Learned Local Features
Václav Truhlařík et al. · 2025-03-17
All You Need to Know About Training Image Retrieval Models
Gabriele Berton et al. · 2025-03-17
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
Pengfei Luo et al. · 2025-03-13
Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark
Yibin Ye et al. · 2025-03-12
Revisiting Medical Image Retrieval via Knowledge Consolidation
Yang Nan et al. · 2025-03-12
CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition
Dongyue Li et al. · 2025-03-11
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Michael Green et al. · 2025-03-10
Zero-Shot Hashing Based on Reconstruction With Part Alignment
Yan Jiang et al. · 2025-03-10
Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction
Somayeh Hussaini et al. · 2025-03-10
RoboDesign1M: A Large-scale Dataset for Robot Design Understanding
Tri Le et al. · 2025-03-09
StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition
Yanqing Shen et al. · 2025-03-09
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification
Huaqi Tao et al. · 2025-03-09
NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features
Hongjia Zhai et al. · 2025-03-08