Visual Localization

ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting

Yingdong Gu et al. · 2026-05-06

Depth-Guided Privacy-Preserving Visual Localization Using 3D Sphere Clouds

Heejoon Moon et al. · 2026-05-01

MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation

Xianbo Cai et al. · 2026-05-01

AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision

Xiaoya Cheng et al. · 2026-04-29

3D-LENS: A 3D Lifting-based Elevated Novel-view Synthesis method for Single-View Aerial-Ground Re-Identification

William Grolleau et al. · 2026-04-29

COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization

Muhammad Shaheer et al. · 2026-04-28

Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval

Esteban Rodríguez-Betancourt et al. · 2026-04-27

Region Matters: Efficient and Reliable Region-Aware Visual Place Recognition

Shunpeng Chen et al. · 2026-04-24

Revisiting Geometric Obfuscation with Dual Convergent Lines for Privacy-Preserving Image Queries in Visual Localization

Jeonggon Kim et al. · 2026-04-24

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

Zixu Li et al. · 2026-04-24

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

Zixu Li et al. · 2026-04-22

UniCVR: From Alignment to Reranking for Unified Zero-Shot Composed Visual Retrieval

Haokun Wen et al. · 2026-04-22

Air-Know: Arbiter-Calibrated Knowledge-Internalizing Robust Network for Composed Image Retrieval

Zhiheng Fu et al. · 2026-04-22

SL(C)AMma: Simultaneous Localisation, (Calibration) and Mapping With a Magnetometer Array

Thomas Edridge et al. · 2026-04-21

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

Savya Khosla et al. · 2026-04-20

INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

Zhiwei Chen et al. · 2026-04-20

HABIT: Chrono-Synergia Robust Progressive Learning Framework for Composed Image Retrieval

Zixu Li et al. · 2026-04-20

Brain-Inspired Capture: Evidence-Driven Neuromimetic Perceptual Simulation for Visual Decoding

Feixue Shao et al. · 2026-04-20

ReTrack: Evidence-Driven Dual-Stream Directional Anchor Calibration Network for Composed Video Retrieval

Zixu Li et al. · 2026-04-20

Subject-Aware Multi-Granularity Alignment for Zero-Shot EEG-to-Image Retrieval

Lin Jiang et al. · 2026-04-20

mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval

Kyeong Seon Kim et al. · 2026-04-18

KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains

Parthaw Goswami et al. · 2026-04-18

Where Do Vision-Language Models Fail? World Scale Analysis for Image Geolocalization

Siddhant Bharadwaj et al. · 2026-04-17

Continual Hand-Eye Calibration for Open-world Robotic Manipulation

Fazeng Li et al. · 2026-04-17

Sketch and Text Synergy: Fusing Structural Contours and Descriptive Attributes for Fine-Grained Image Retrieval

Siyuan Wang et al. · 2026-04-17

SceneGlue: Scene-Aware Transformer for Feature Matching without Scene-Level Annotation

Songlin Du et al. · 2026-04-15

Indexing Multimodal Language Models for Large-scale Image Retrieval

Bahey Tharwat et al. · 2026-04-14

A Sanity Check on Composed Image Retrieval

Yikun Liu et al. · 2026-04-14

VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale

Parth Parag Kulkarni et al. · 2026-04-14

Human-Inspired Context-Selective Multimodal Memory for Social Robots

Hangyeol Kang et al. · 2026-04-13

Privacy-Preserving Structureless Visual Localization via Image Obfuscation

Vojtech Panek et al. · 2026-04-13

Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions

Seongyu Kim et al. · 2026-04-13

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

Sohwi Lim et al. · 2026-04-13

FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data

Peng Yuan et al. · 2026-04-11

AsymLoc: Towards Asymmetric Feature Matching for Efficient Visual Localization

Mohammad Omama et al. · 2026-04-10

Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval

Sharva Gogawale et al. · 2026-04-09

SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving

Felix Embacher et al. · 2026-04-09

Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering

Zhuohong Chen et al. · 2026-04-09

VGGT-SLAM++

Avilasha Mandal et al. · 2026-04-08

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang et al. · 2026-04-07

WRF4CIR: Weight-Regularized Fine-Tuning Network for Composed Image Retrieval

Yizhuo Xu et al. · 2026-04-07

LSGS-Loc: Towards Robust 3DGS-Based Visual Localization for Large-Scale UAV Scenarios

Xiang Zhang et al. · 2026-04-07

Beyond Semantic Search: Towards Referential Anchoring in Composed Image Retrieval

Yuxin Yang et al. · 2026-04-07

CraterBench-R: Instance-Level Crater Retrieval for Planetary Scale

Jichao Fang et al. · 2026-04-06

MPTF-Net: Multi-view Pyramid Transformer Fusion Network for LiDAR-based Place Recognition

Shuyuan Li et al. · 2026-04-06

MELT: Improve Composed Image Retrieval via the Modification Frequentation-Rarity Balance Network

Guozhi Qiu et al. · 2026-03-31

RHO: Robust Holistic OSM-Based Metric Cross-View Geo-Localization

Junwei Zheng et al. · 2026-03-29

NeedleDB: A Generative-AI Based System for Accurate and Efficient Image Retrieval using Complex Natural Language Queries

Mahdi Erfanian et al. · 2026-03-29

TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval

David G. Shatwell et al. · 2026-03-28

Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

Moritz Nottebaum et al. · 2026-03-27

HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network

Mingyu Zhang et al. · 2026-03-27

4DRaL: Bridging 4D Radar with LiDAR for Place Recognition using Knowledge Distillation

Ningyuan Huang et al. · 2026-03-27

Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization Methods

Ofer Idan et al. · 2026-03-26

Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

Yunus Talha Erzurumlu et al. · 2026-03-26

On-Demand Instructional Material Providing Agent Based on MLLM for Tutoring Support

Takumi Kato et al. · 2026-03-26

Sparse Autoencoders for Interpretable Medical Image Representation Learning

Philipp Wesp et al. · 2026-03-24

ARGENT: Adaptive Hierarchical Image-Text Representations

Chuong Huynh et al. · 2026-03-24

Retrieval-Guided Photovoltaic Inventory Estimation from Satellite Imagery for Distribution Grid Planning

Muhao Guo et al. · 2026-03-24

SOUPLE: Enhancing Audio-Visual Localization and Segmentation with Learnable Prompt Contexts

Khanh Binh Nguyen et al. · 2026-03-24

HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment

Sangmin Jo et al. · 2026-03-24

ADaFuSE: Adaptive Diffusion-generated Image and Text Fusion for Interactive Text-to-Image Retrieval

Zhuocheng Zhang et al. · 2026-03-23

SATTC: Structure-Aware Label-Free Test-Time Calibration for Cross-Subject EEG-to-Image Retrieval

Qunjie Huang et al. · 2026-03-21

A Multihead Continual Learning Framework for Fine-Grained Fashion Image Retrieval with Contrastive Learning and Exponential Moving Average Distillation

Ling Xiao et al. · 2026-03-21

IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

Simone Magistri et al. · 2026-03-20

IUP-Pose: Decoupled Iterative Uncertainty Propagation for Real-time Relative Pose Regression via Implicit Dense Alignment

Jun Wang et al. · 2026-03-20

MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval

Xuri Ge et al. · 2026-03-18

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents

Zhengbo Zhang et al. · 2026-03-18

Visual Product Search Benchmark

Karthik Sulthanpete Govindappa · 2026-03-17

Retrieving Counterfactuals Improves Visual In-Context Learning

Guangzhi Xiong et al. · 2026-03-17

HMAR: Hierarchical Modality-Aware Expert and Dynamic Routing Medical Image Retrieval Architecture

Aojie Yuan · 2026-03-17

Rethinking Pose Refinement in 3D Gaussian Splatting under Pose Prior and Geometric Uncertainty

Mangyu Kong et al. · 2026-03-17

Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics

Dennis Haitz et al. · 2026-03-14

Sky2Ground: A Benchmark for Site Modeling under Varying Altitude

Zengyan Wang et al. · 2026-03-14

A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks

Tangzheng Lian et al. · 2026-03-13

Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

Jing Yang et al. · 2026-03-13

CM-Bench: A Comprehensive Cross-Modal Feature Matching Benchmark Bridging Visible and Infrared Images

Liangzheng Sun et al. · 2026-03-13

FBCIR: Balancing Cross-Modal Focuses in Composed Image Retrieval

Chenchen Zhao et al. · 2026-03-12

Efficient Cross-View Localization in 6G Space-Air-Ground Integrated Network

Min Hao et al. · 2026-03-12

Composed Vision-Language Retrieval for Skin Cancer Case Search via Joint Alignment of Global and Local Representations

Yuheng Wang et al. · 2026-03-10

$L^3$: Scene-agnostic Visual Localization in the Wild

Yu Zhang et al. · 2026-03-09

QdaVPR: A novel query-based domain-agnostic model for visual place recognition

Shanshan Wan et al. · 2026-03-08

T2Nav: Algebraic Topology Aware Temporal Graph Memory and Loop Detection for Zero-Shot Visual Navigation

Quang-Anh N. D. et al. · 2026-03-06

EventGeM: Global-to-Local Feature Matching for Event-Based Visual Place Recognition

Adam D. Hines et al. · 2026-03-06

Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval

Donghoon Han et al. · 2026-03-06

Loop Closure via Maximal Cliques in 3D LiDAR-Based SLAM

Javier Laserna et al. · 2026-03-05

PinPoint: Evaluation of Composed Image Retrieval with Explicit Negatives, Multi-Image Queries, and Paraphrase Testing

Rohan Mahadev et al. · 2026-03-04

SSR: A Generic Framework for Text-Aided Map Compression for Localization

Mohammad Omama et al. · 2026-03-04

Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark

Martin Kvisvik Larsen et al. · 2026-03-04

VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale

Sven Elflein et al. · 2026-02-26

WISER: Wider Search, Deeper Thinking, and Adaptive Fusion for Training-Free Zero-Shot Composed Image Retrieval

Tianyue Wang et al. · 2026-02-26

Autoregressive Visual Decoding from EEG Signals

Sicheng Dai et al. · 2026-02-26

Pix2Key: Controllable Open-Vocabulary Retrieval with Semantic Decomposition and Self-Supervised Visual Dictionary Learning

Guoyizhe Wei et al. · 2026-02-26

Global-Aware Edge Prioritization for Pose Graph Initialization

Tong Wei et al. · 2026-02-25

Automatic Map Density Selection for Locally-Performant Visual Place Recognition

Somayeh Hussaini et al. · 2026-02-25

Seeing Through Words: Controlling Visual Retrieval Quality with Language Models

Jianglin Lu et al. · 2026-02-24

LST-SLAM: A Stereo Thermal SLAM System for Kilometer-Scale Dynamic Environments

Zeyu Jiang et al. · 2026-02-24

Long-Term Multi-Session 3D Reconstruction Under Substantial Appearance Change

Beverley Gorry et al. · 2026-02-24

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Yibo Yan et al. · 2026-02-23

VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments

Jingyi Xu et al. · 2026-02-23

Evaluating the Impact of Data Anonymization on Image Retrieval

Marvin Chen et al. · 2026-02-23

Knowledge-aware Visual Question Generation for Remote Sensing Images

Siran Li et al. · 2026-02-22

Questions beyond Pixels: Integrating Commonsense Knowledge in Visual Question Generation for Remote Sensing

Siran Li et al. · 2026-02-22

IRIS-SLAM: Unified Geo-Instance Representations for Robust Semantic Localization and Mapping

Tingyang Xiao et al. · 2026-02-21

VQPP: Video Query Performance Prediction Benchmark

Adrian Catalin Lutu et al. · 2026-02-19

DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition

Ji Li et al. · 2026-02-12

Arbitrary Ratio Feature Compression via Next Token Prediction

Yufan Liu et al. · 2026-02-12

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Chenlong Deng et al. · 2026-02-11

WristMIR: Coarse-to-Fine Region-Aware Retrieval of Pediatric Wrist Radiographs with Radiology Report-Driven Learning

Mert Sonmezer et al. · 2026-02-10

OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Teng Wang et al. · 2026-02-09

A Sketch+Text Composed Image Retrieval Dataset for Thangka

Jinyu Xu et al. · 2026-02-09

UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban Science

Jie Zhang et al. · 2026-02-09

SDR-CIR: Semantic Debias Retrieval Framework for Training-Free Zero-Shot Composed Image Retrieval

Yi Sun et al. · 2026-02-05

SAR-RAG: ATR Visual Question Answering by Semantic Search, Retrieval, and MLLM Generation

David F. Ramirez et al. · 2026-02-04

Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition

Dhyey Manish Rajani et al. · 2026-02-04

Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement

Zipeng Zhu et al. · 2026-02-04

Invariance on Manifolds: Understanding Robust Visual Representations for Place Recognition

Jintao Cheng et al. · 2026-02-04

LaVPR: Benchmarking Language and Vision for Place Recognition

Ofer Idan et al. · 2026-02-03

ObjEmbed: Towards Universal Multimodal Object Embeddings

Shenghao Fu et al. · 2026-02-03

Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss

Enguang Fan · 2026-02-02

ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval

Tianyu Yang et al. · 2026-02-02

Interacted Planes Reveal 3D Line Mapping

Zeran Ke et al. · 2026-02-01

Variance & Greediness: A comparative study of metric-learning losses

Donghuo Zeng et al. · 2026-01-29

When Vision Meets Texts in Listwise Reranking

Hongyi Cai · 2026-01-28

Eliminating Hallucination in Diffusion-Augmented Interactive Text-to-Image Retrieval

Zhuocheng Zhang et al. · 2026-01-28

VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction

Dominic Maggio et al. · 2026-01-27

Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models

Jeonghwan Kim et al. · 2026-01-27

X-Aligner: Composed Visual Retrieval without the Bells and Whistles

Yuqian Zheng et al. · 2026-01-23

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

Tingyu Song et al. · 2026-01-22

Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning

Haomiao Tang et al. · 2026-01-22

Unified Multimodal and Multilingual Retrieval via Multi-Task Learning with NLU Integration

Xinyuan Zhang et al. · 2026-01-21

LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

Chao Gao et al. · 2026-01-21

XR: Cross-Modal Agents for Composed Image Retrieval

Zhongyu Yang et al. · 2026-01-20

Fine-Grained Zero-Shot Composed Image Retrieval with Complementary Visual-Semantic Integration

Yongcong Ye et al. · 2026-01-20

Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning

Hongbo Bai et al. · 2026-01-20

DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition

Hanyu Zhu et al. · 2026-01-19

SupScene: Learning Overlap-Aware Global Descriptor for Unconstrained SfM

Xulei Shi et al. · 2026-01-17

Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals

Matteo Ciferri et al. · 2026-01-16

Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text

Piyush Singh Pasi · 2026-01-15

UniHash: Unifying Pointwise and Pairwise Hashing Paradigms for Seen and Unseen Category Retrieval

Xiaoxu Ma et al. · 2026-01-14

Hybrid guided variational autoencoder for visual place recognition

Ni Wang et al. · 2026-01-14

Keyframe-based Dense Mapping with the Graph of View-Dependent Local Maps

Krzysztof Zielinski et al. · 2026-01-13

Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation

Kang Fu et al. · 2026-01-13

Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization

Miao Pan et al. · 2026-01-13

Multi-task Cross-modal Learning for Chest X-ray Image Retrieval

Zhaohui Liang et al. · 2026-01-08

ImLoc: Revisiting Visual Localization with Image-based Representation

Xudong Jiang et al. · 2026-01-07

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval

Zhipeng Qian et al. · 2026-01-07

BREATH-VL: Vision-Language-Guided 6-DoF Bronchoscopy Localization via Semantic-Geometric Fusion

Qingyao Tian et al. · 2026-01-07

HOLO: Homography-Guided Pose Estimator Network for Fine-Grained Visual Localization on SD Maps

Xuchang Zhong et al. · 2026-01-07

Comparative Analysis of Binarization Methods For Medical Image Hashing On Odir Dataset

Nedim Muzoglu · 2026-01-07

Loop Closure using AnyLoc Visual Place Recognition in DPV-SLAM

Wenzheng Zhang et al. · 2026-01-06

Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach

Biao Wu et al. · 2026-01-05

OCP-LS: An Efficient Algorithm for Visual Localization

Jindi Zhong et al. · 2025-12-31

Geometric Multi-Session Map Merging with Learned Local Descriptors

Yanlong Ma et al. · 2025-12-30

Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation

Guo Ye et al. · 2025-12-29

MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning

Jiawei Chen et al. · 2025-12-29

Anomaly Detection by Effectively Leveraging Synthetic Images

Sungho Kang et al. · 2025-12-29

UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer

Tianchen Deng et al. · 2025-12-28

Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer

Tianchen Deng et al. · 2025-12-26

Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval

Dao Sy Duy Minh et al. · 2025-12-24

Soft Filtering: Guiding Zero-shot Composed Image Retrieval with Prescriptive and Proscriptive Constraints

Youjin Jung et al. · 2025-12-23

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark

Hao Guo et al. · 2025-12-23

Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis

Argha Kamal Samanta et al. · 2025-12-22

Finer-Personalization Rank: Fine-Grained Retrieval Examines Identity Preservation for Personalized Generation

Connor Kilrain et al. · 2025-12-22

Text2Graph VPR: A Text-to-Graph Expert System for Explainable Place Recognition in Changing Environments

Saeideh Yousefzadeh et al. · 2025-12-21

Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval

Dimitrios Georgoulopoulos et al. · 2025-12-20

Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors

Son Tung Nguyen et al. · 2025-12-19

The Effect of Negation on CLIP in Medical Imaging: Limitations of Contrastive Language-Image Pretraining

Jasmine Vu et al. · 2025-12-18

MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval

Amna Amir et al. · 2025-12-18

CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer

Xianwei Cao et al. · 2025-12-16

Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries

Emanuele Mezzi et al. · 2025-12-16

Towards Test-time Efficient Visual Place Recognition via Asymmetric Query Processing

Jaeyoon Kim et al. · 2025-12-15

Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching

Wonseok Choi et al. · 2025-12-14

Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval

J. Xiao et al. · 2025-12-11

YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos

Ryan Meegan et al. · 2025-12-10

Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics

Nick Trinh et al. · 2025-12-09

Generalized Referring Expression Segmentation on Aerial Photos

Luís Marnoto et al. · 2025-12-08

Spatial Retrieval Augmented Autonomous Driving

Xiaosong Jia et al. · 2025-12-07

Language-driven Fine-grained Retrieval

Shijie Wang et al. · 2025-12-06

GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers

Hochul Hwang et al. · 2025-12-05

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

Shengyuan Ding et al. · 2025-12-04

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark

Haobo Yuan et al. · 2025-12-04

Semantic-Guided Two-Stage GAN for Face Inpainting with Hybrid Perceptual Encoding

Abhigyan Bhattacharya et al. · 2025-12-04

Revealing stimulus-dependent dynamics through statistical complexity

Edson V. de Paula et al. · 2025-12-04

Influence of Object Affordance on Action Language Understanding: Evidence from Dynamic Causal Modeling Analysis

Supriya Bordoloi et al. · 2025-12-04

LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging

Zhijian Shu et al. · 2025-12-04

Terahertz Fourier Ptychographic Imaging

Pitambar Mukherjee et al. · 2025-12-04

TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards

Mauro Martini et al. · 2025-12-04

MemLoRA: Distilling Expert Adapters for On-Device Memory Systems

Massimo Bini et al. · 2025-12-04

Spectral micro-CT for quantitative analysis of calcification in fibrocartilage

Vittoria Mazzini et al. · 2025-12-04

HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval

Zhiwei Chen et al. · 2025-12-02

GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Zixuan Song et al. · 2025-12-02

Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval

Xin Wang et al. · 2025-12-01

Winning Solutions for the Rayan AI Contest: Compositional Retrieval, Zero-Shot Anomaly Detection, and Backdoor Detection

Ali Nafisi et al. · 2025-12-01

MARVO: Marine-Adaptive Radiance-aware Visual Odometry

Sacchin Sundar et al. · 2025-11-28

UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries

Hoang-Bao Le et al. · 2025-11-27

Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models

Naifu Zhang et al. · 2025-11-26

Fast 3D Ultrasound Localization Microscopy via Projection-based Processing Framework

Jingke Zhang et al. · 2025-11-26

Qwen3-VL Technical Report

Shuai Bai et al. · 2025-11-26

Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy

Teng Hu et al. · 2025-11-26

FITRep: Attention-Guided Item Representation via MLLMs

Guoxiao Zhang et al. · 2025-11-26

Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning

Xin Gu et al. · 2025-11-26

HTTM: Head-wise Temporal Token Merging for Faster VGGT

Weitian Wang et al. · 2025-11-26

Low-dose Chemically Specific Bioimaging via Deep-UV Lensless Holographic Microscopy on a Standard Camera

Piotr Arcab et al. · 2025-11-26

Adaptive Lighting Control in Visible Light Systems: An Integrated Sensing, Communication, and Illumination Framework

Xinyan Xie et al. · 2025-11-26

Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition

Baoli Sun et al. · 2025-11-26

Wigner and Gabor phase-space analysis of propagators for evolution equations

Elena Cordero et al. · 2025-11-24

Real-Time Object Tracking with On-Device Deep Learning for Adaptive Beamforming in Dynamic Acoustic Environments

Jorge Ortigoso-Narro et al. · 2025-11-24

In-vivo imaging with a low-cost MRI scanner and cloud data processing in low-resource settings

Teresa Guallart-Naval et al. · 2025-11-24

Can Modern Vision Models Understand the Difference Between an Object and a Look-alike?

Itay Cohen et al. · 2025-11-24

From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation

Moazzam Umer Gondal et al. · 2025-11-24

Graph-based 3D Human Pose Estimation using WiFi Signals

Jichao Chen et al. · 2025-11-24

Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie et al. · 2025-11-24

LAA3D: A Benchmark of Detecting and Tracking Low-Altitude Aircraft in 3D Space

Hai Wu et al. · 2025-11-24

Multi-Agent Monocular Dense SLAM With 3D Reconstruction Priors

Haihang Wu et al. · 2025-11-24

Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting

Qiyang Yu et al. · 2025-11-24

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Yikun Wang et al. · 2025-11-19

First Frame Is the Place to Go for Video Content Customization

Jingxi Chen et al. · 2025-11-19

Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning

Tao Hu et al. · 2025-11-19

Multi-Text Guided Few-Shot Semantic Segmentation

Qiang Jiao et al. · 2025-11-19

SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome

Dabin Jeong et al. · 2025-11-19

HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation

Linyin Luo et al. · 2025-11-19

The Empowerment of Science of Science by Large Language Models: New Tools and Methods

Guoqiang Liang et al. · 2025-11-19

C2F-Space: Coarse-to-Fine Space Grounding for Spatial Instructions using Vision-Language Models

Nayoung Oh et al. · 2025-11-19

Towards Unbiased Cross-Modal Representation Learning for Food Image-to-Recipe Retrieval

Qing Wang et al. · 2025-11-19

Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation

Jin Wang et al. · 2025-11-19

Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments

Laura Alejandra Encinar Gonzalez et al. · 2025-11-07

DAFM: Dynamic Adaptive Fusion for Multi-Model Collaboration in Composed Image Retrieval

Yawei Cai et al. · 2025-11-07

Multi-Task Learning for Visually Grounded Reasoning in Gastrointestinal VQA

Itbaan Safwan et al. · 2025-11-06

An Efficient Algorithm for Learning-Based Visual Localization

Jindi Zhong et al. · 2025-11-06

Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization

Tao Liu et al. · 2025-11-04

LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment

Rohan Wandre et al. · 2025-11-04

SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment

Xinyu Mao et al. · 2025-11-03

Evaluating Perspectival Biases in Cross-Modal Retrieval

Teerapol Saengsukhiran et al. · 2025-11-03

Dynamic Multi-level Weighted Alignment Network for Zero-shot Sketch-based Image Retrieval

Hanwen Su et al. · 2025-11-02

Multi-Mapcher: Loop Closure Detection-Free Heterogeneous LiDAR Multi-Session SLAM Leveraging Outlier-Robust Registration for Autonomous Vehicles

Hyungtae Lim et al. · 2025-11-01

Approximate Diverse $k$-nearest Neighbor Search in Vector Database

Jiachen Zhao et al. · 2025-10-31

Scaling Image Geo-Localization to Continent Level

Philipp Lindenberger et al. · 2025-10-30

Instance-Level Composed Image Retrieval

Bill Psomas et al. · 2025-10-29

DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts

Binbin Li et al. · 2025-10-28

Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment

Hongyi Wang et al. · 2025-10-27

Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models

Yang Zhang et al. · 2025-10-26

TWC-SLAM: Multi-Agent Cooperative SLAM with Text Semantics and WiFi Features Integration for Similar Indoor Environments

Chunyu Li et al. · 2025-10-26

Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities

Ningli Xu et al. · 2025-10-26

STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models

Mahiro Ukai et al. · 2025-10-26

Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing

Xiang Fei et al. · 2025-10-26

BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models

Ziheng Zhang et al. · 2025-10-24

Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection

Ji Du et al. · 2025-10-21

ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization

Yuanhe Guo et al. · 2025-10-21

DualHash: A Stochastic Primal-Dual Algorithm with Theoretical Guarantee for Deep Hashing

Luxuan Li et al. · 2025-10-21

Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition

Timur Ismagilov et al. · 2025-10-20

Small Language Models Offer Significant Potential for Science Community

Jian Zhang et al. · 2025-10-18

Acquisition of interpretable domain information during brain MR image harmonization for content-based image retrieval

Keima Abe et al. · 2025-10-16

Through the Lens of Doubt: Robust and Efficient Uncertainty Estimation for Visual Place Recognition

Emily Miller et al. · 2025-10-15

Embedding the Teacher: Distilling vLLM Preferences for Scalable Image Retrieval

Eric He et al. · 2025-10-13

Hierarchical Scheduling for Multi-Vector Image Retrieval

Maoliang Li et al. · 2025-10-10

DarkHash: A Data-Free Backdoor Attack Against Deep Hashing

Ziqi Zhou et al. · 2025-10-09

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning

Weihuang Lin et al. · 2025-10-09

Mutual Learning for Hashing: Unlocking Strong Hash Functions from Weak Supervision

Xiaoxu Ma et al. · 2025-10-09

Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Retrieval

Didrik Bergström et al. · 2025-10-08

CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval

Bin Kang et al. · 2025-10-07

Personalizing Retrieval using Joint Embeddings or "the Return of Fluffy"

Bruno Korbar et al. · 2025-10-06

Flexible and Efficient Spatio-Temporal Transformer for Sequential Visual Place Recognition

Yu Kiu et al. · 2025-10-05

The Overlooked Value of Test-time Reference Sets in Visual Place Recognition

Mubariz Zaffar et al. · 2025-10-04

Novel UWB Synthetic Aperture Radar Imaging for Mobile Robot Mapping

Charith Premachandra et al. · 2025-10-03

Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation -- Technical Report for IROS 2025 RoboSense Challenge Track 4

Lingfeng Zhang et al. · 2025-10-03

EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory

Jiahao Wang et al. · 2025-10-01

A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features

Axel Barroso-Laguna et al. · 2025-10-01

Semantic Visual Simultaneous Localization and Mapping: A Survey on State of the Art, Challenges, and Future Directions

Thanh Nguyen Canh et al. · 2025-10-01

Video Object Segmentation-Aware Audio Generation

Ilpo Viertola et al. · 2025-09-30

SQUARE: Semantic Query-Augmented Fusion and Efficient Batch Reranking for Training-free Zero-Shot Composed Image Retrieval

Ren-Di Wu et al. · 2025-09-30

SETR: A Two-Stage Semantic-Enhanced Framework for Zero-Shot Composed Image Retrieval

Yuqi Xiao et al. · 2025-09-30

SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition

Shunpeng Chen et al. · 2025-09-30

Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity

Tu-Hoa Pham et al. · 2025-09-29

Performance-Efficiency Trade-off for Fashion Image Retrieval

Julio Hurtado et al. · 2025-09-29

Prepare for Warp Speed: Sub-millisecond Visual Place Recognition Using Event Cameras

Vignesh Ramanathan et al. · 2025-09-28

Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation

Jinpeng Lu et al. · 2025-09-26

Efficient Multimodal Dataset Distillation via Generative Models

Zhenghao Zhao et al. · 2025-09-25

A Versatile Foundation Model for AI-enabled Mammogram Interpretation

Fuxiang Huang et al. · 2025-09-24

SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment

Binod Singh et al. · 2025-09-23

Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions

Ioanna Ntinou et al. · 2025-09-23

OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

Oussema Dhaouadi et al. · 2025-09-22

Learning Attribute-Aware Hash Codes for Fine-Grained Image Retrieval via Query Optimization

Peng Wang et al. · 2025-09-21

SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models

Thong Nguyen et al. · 2025-09-18

PRISM: Product Retrieval In Shopping Carts using Hybrid Matching

Arda Kabadayi et al. · 2025-09-18

Chain-of-Thought Re-ranking for Image Retrieval Tasks

Shangrong Wu et al. · 2025-09-18

DiffVL: Diffusion-Based Visual Localization on 2D Maps via BEV-Conditioned GPS Denoising

Li Gao et al. · 2025-09-18

Event-LAB: Towards Standardized Evaluation of Neuromorphic Localization Methods

Adam D. Hines et al. · 2025-09-18

Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

Ilyass Moummad et al. · 2025-09-17

CSMoE: An Efficient Remote Sensing Foundation Model with Soft Mixture-of-Experts

Leonard Hackel et al. · 2025-09-17

DiffHash: Text-Guided Targeted Attack via Diffusion Models against Deep Hashing Image Retrieval

Zechao Liu et al. · 2025-09-17

Semantic-Enhanced Cross-Modal Place Recognition for Robust Robot Localization

Yujia Lin et al. · 2025-09-16

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Nikhil Keetha et al. · 2025-09-16

Bridging Vision Language Models and Symbolic Grounding for Video Question Answering

Haodi Ma et al. · 2025-09-15

Listening for "You": Enhancing Speech Image Retrieval via Target Speaker Extraction

Wenhao Yang et al. · 2025-09-11

Aerial-ground Cross-modal Localization: Dataset, Ground-truth, and Benchmark

Yandi Yang et al. · 2025-09-09

Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval

Emil Demić et al. · 2025-09-08

Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)

Emanuela Boros et al. · 2025-09-05

FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph

Zhangding Liu et al. · 2025-09-05

Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking

Dror Aiger et al. · 2025-09-05

DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval

Ruohong Yang et al. · 2025-09-04

Scale, Don't Fine-tune: Guiding Multimodal LLMs for Efficient Visual Place Recognition at Test-Time

Jintao Cheng et al. · 2025-09-02

Ensemble-Based Event Camera Place Recognition Under Varying Illumination

Therese Joseph et al. · 2025-09-02

M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision

Che Liu et al. · 2025-09-01

ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization

Thinh-Phuc Nguyen et al. · 2025-09-01

FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval

Jeong-Woo Park et al. · 2025-07-17

MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval

Jeong-Woo Park et al. · 2025-07-17

QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval

Jaehyun Kwak et al. · 2025-07-16

CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning

Peiwen Xia et al. · 2025-07-16

GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space

David G. Shatwell et al. · 2025-07-14

Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources

Daniele Rege Cambrin et al. · 2025-07-14

Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures

Xinlong Ding et al. · 2025-07-14

RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features

Inye Na et al. · 2025-07-11

LiDAR, GNSS and IMU Sensor Alignment through Dynamic Time Warping to Construct 3D City Maps

Haitian Wang et al. · 2025-07-11

Deep Hashing with Semantic Hash Centers for Image Retrieval

Li Chen et al. · 2025-07-11

SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation

Juyeop Han et al. · 2025-07-10

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching

Yu Chen et al. · 2025-07-10

Evaluating Attribute Confusion in Fashion Text-to-Image Generation

Ziyue Liu et al. · 2025-07-09

MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval

Naoya Sogi et al. · 2025-07-09

Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval

Haiwen Li et al. · 2025-07-08

OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval

Zhiwei Chen et al. · 2025-07-08

What's Making That Sound Right Now? Video-centric Audio-Visual Localization

Hahyeon Choi et al. · 2025-07-08

Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model

Mengyao Xu et al. · 2025-07-07

An analysis of vision-language models for fabric retrieval

Francesco Giuliari et al. · 2025-07-07

Simultaneous Localization and Mapping Using Active mmWave Sensing in 5G NR

Tao Du et al. · 2025-07-07

U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration

Xiaofan Li et al. · 2025-07-06

Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition

Jiuhong Xiao et al. · 2025-07-04

LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment

Juelin Zhu et al. · 2025-07-01

Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data

Ghufran A. Omran et al. · 2025-06-28

Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval

Li-Cheng Shen et al. · 2025-06-28

MatChA: Cross-Algorithm Matching with Feature Augmentation

Paula Carbó Cubero et al. · 2025-06-27

OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography

Caoshuo Li et al. · 2025-06-26

Referring Expression Instance Retrieval and A Strong End-to-End Baseline

Xiangzhao Hao et al. · 2025-06-26

Visualizing intercalation effects in 2D materials using AFM based techniques

Karmen Kapustić et al. · 2025-06-25

On the Burstiness of Faces in Set

Jiong Wang et al. · 2025-06-25

jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval

Michael Günther et al. · 2025-06-24

Class Agnostic Instance-level Descriptor for Visual Instance Search

Qi-Ying Sun et al. · 2025-06-20

MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval

Chao He et al. · 2025-06-19

Fine-grained Image Retrieval via Dual-Vision Adaptation

Xin Jiang et al. · 2025-06-19

Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation

Connor Malone et al. · 2025-06-19

Semantic and Feature Guided Uncertainty Quantification of Visual Localization for Autonomous Vehicles

Qiyuan Wu et al. · 2025-06-18

ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections

Ziling Huang et al. · 2025-06-18

HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search

Qian Xu et al. · 2025-06-17

TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Indoor Localization and Mapping

Jeewon Kim et al. · 2025-06-17

Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval

Kshitij Kavimandan et al. · 2025-06-17

A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation

Xiaoyang Wei et al. · 2025-06-16

EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition

Bingxi Liu et al. · 2025-06-16

SuperPlace: The Renaissance of Classical Feature Aggregation for Visual Place Recognition in the Era of Foundation Models

Bingxi Liu et al. · 2025-06-16

Feature Complementation Architecture for Visual Place Recognition

Weiwei Wang et al. · 2025-06-14

Towards a general-purpose foundation model for fMRI analysis

Cheng Wang et al. · 2025-06-11

Improving Personalized Search with Regularized Low-Rank Parameter Updates

Fiona Ryan et al. · 2025-06-11

Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints

Xiangkai Zhang et al. · 2025-06-11

Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment

Tianyu Chen et al. · 2025-06-10

Robust Visual Localization via Semantic-Guided Multi-Scale Transformer

Zhongtao Tian et al. · 2025-06-10

Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs

Yikun Ji et al. · 2025-06-08

Zero Shot Composed Image Retrieval

Santhosh Kakarla et al. · 2025-06-07

GenIR: Generative Visual Feedback for Mental Image Retrieval

Diji Yang et al. · 2025-06-06

Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Sheng Chen et al. · 2025-06-06

HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition

Suhan Woo et al. · 2025-06-05

Deep Learning Reforms Image Matching: A Survey and Outlook

Shihua Zhang et al. · 2025-06-05

Entity Image and Mixed-Modal Image Retrieval Datasets

Cristian-Ioan Blaga et al. · 2025-06-02

Quantization-based Bounds on the Wasserstein Metric

Jonathan Bobrutsky et al. · 2025-06-01

SORCE: Small Object Retrieval in Complex Environments

Chunxu Liu et al. · 2025-05-30

Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch

Aneeshan Sain et al. · 2025-05-29

4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians

Hidenobu Matsuki et al. · 2025-05-28

UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images

Junhuan Liu et al. · 2025-05-28

Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule

San Jiang et al. · 2025-05-28

Visual Loop Closure Detection Through Deep Graph Consensus

Martin Büchner et al. · 2025-05-27

QuARI: Query Adaptive Retrieval Improvement

Eric Xing et al. · 2025-05-27

ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

Eric Xing et al. · 2025-05-27

Visualized Text-to-Image Retrieval

Di Wu et al. · 2025-05-26

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

Rong-Cheng Tu et al. · 2025-05-26

Can Visual Encoder Learn to See Arrows?

Naoyuki Terashita et al. · 2025-05-26

TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

Oliver Grainge et al. · 2025-05-22

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

Siting Li et al. · 2025-05-21

SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval

Nikolaos Chaidos et al. · 2025-05-21

Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models

Kiarash Naghavi Khanghah et al. · 2025-05-20

MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark

Yiwei Ou et al. · 2025-05-18

Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization

Aaron Wilhelm et al. · 2025-05-16

Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing

Mathis Jürgen Adler et al. · 2025-05-16

SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment

Ganesh Sapkota et al. · 2025-05-13

Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions

Lukas Schichler et al. · 2025-05-06

LiftFeat: 3D Geometry-Aware Local Feature Matching

Yepeng Liu et al. · 2025-05-06

Seeing the Abstract: Translating the Abstract Language for Vision Language Models

Davide Talon et al. · 2025-05-06

OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery

Chongsheng Zhang et al. · 2025-05-04

NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

Xun Li et al. · 2025-05-02

GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting

Jongwon Lee et al. · 2025-05-01

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

Yabing Wang et al. · 2025-04-25

A Guide to Structureless Visual Localization

Vojtech Panek et al. · 2025-04-24

Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval

Xin Jiang et al. · 2025-04-23

Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs

Merve Cerit et al. · 2025-04-22

A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling

Kyle Buettner et al. · 2025-04-19

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs

Haoxuan Li et al. · 2025-04-17

Generalized Visual Relation Detection with Diffusion Models

Kaifeng Gao et al. · 2025-04-16

Visual Re-Ranking with Non-Visual Side Information

Gustav Hanning et al. · 2025-04-15

TMCIR: Token Merge Benefits Composed Image Retrieval

Chaoyang Wang et al. · 2025-04-15

Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition

Changwei Wang et al. · 2025-04-14

Evolved Hierarchical Masking for Self-Supervised Learning

Zhanzhou Feng et al. · 2025-04-12

HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields

Asterios Reppas et al. · 2025-04-11

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges

Joshua Fixelle et al. · 2025-04-11

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

Cheng-Yu Hsieh et al. · 2025-04-11

PNE-SGAN: Probabilistic NDT-Enhanced Semantic Graph Attention Network for LiDAR Loop Closure Detection

Xiong Li et al. · 2025-04-11

Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval

Zehong Ma et al. · 2025-04-10

A Pointcloud Registration Framework for Relocalization in Subterranean Environments

David Akhihiero et al. · 2025-04-09

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Ruotian Peng et al. · 2025-04-09

To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition

Davide Sferrazza et al. · 2025-04-08

NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval

Peng Gao et al. · 2025-04-06

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye et al. · 2025-04-06

REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval

Shabnam Choudhury et al. · 2025-04-04

A Chefs KISS -- Utilizing semantic information in both ICP and SLAM framework

Sven Ochs et al. · 2025-04-02

Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval

Yuji Nozawa et al. · 2025-04-02

IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval

Bangwei Liu et al. · 2025-04-01

Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data

Yiqun Duan et al. · 2025-04-01

CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization

Yingrui Ji et al. · 2025-03-31

LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds

Masahiko Tsuji et al. · 2025-03-31

Multiview Image-Based Localization

Cameron Fiore et al. · 2025-03-30

LOCORE: Image Re-ranking with Long-Context Sequence Modeling

Zilin Xiao et al. · 2025-03-27

Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck

Adrian Bulat et al. · 2025-03-27

UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation

Yehui Shen et al. · 2025-03-27

FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval

Zixu Li et al. · 2025-03-27

Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing

Shuai Li et al. · 2025-03-27

CoLLM: A Large Language Model for Composed Image Retrieval

Chuong Huynh et al. · 2025-03-25

Scene-agnostic Pose Regression for Visual Localization

Junwei Zheng et al. · 2025-03-25

From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting

Zhiwei Huang et al. · 2025-03-25

Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval

Haoqiang Lin et al. · 2025-03-25

LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space

Zhangyu Wang et al. · 2025-03-23

Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning

Xiang Fang et al. · 2025-03-23

What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images

Dongheng Lin et al. · 2025-03-23

good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval

Pranavi Kolouju et al. · 2025-03-22

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Yuanmin Tang et al. · 2025-03-21

Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions

Muhua Zhang et al. · 2025-03-21

PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval

Qiang Zou et al. · 2025-03-20

Automating 3D Dataset Generation with Neural Radiance Fields

P. Schulz et al. · 2025-03-20

3D Densification for Multi-Map Monocular VSLAM in Endoscopy

X. Anadón et al. · 2025-03-18

A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios

Huy-Hoang Bui et al. · 2025-03-18

Scale Efficient Training for Large Datasets

Qing Zhou et al. · 2025-03-17

Multi-Platform Teach-and-Repeat Navigation by Visual Place Recognition Based on Deep-Learned Local Features

Václav Truhlařík et al. · 2025-03-17

All You Need to Know About Training Image Retrieval Models

Gabriele Berton et al. · 2025-03-17

ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning

Pengfei Luo et al. · 2025-03-13

Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

Yibin Ye et al. · 2025-03-12

Revisiting Medical Image Retrieval via Knowledge Consolidation

Yang Nan et al. · 2025-03-12

CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition

Dongyue Li et al. · 2025-03-11

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

Michael Green et al. · 2025-03-10

Zero-Shot Hashing Based on Reconstruction With Part Alignment

Yan Jiang et al. · 2025-03-10

Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction

Somayeh Hussaini et al. · 2025-03-10

RoboDesign1M: A Large-scale Dataset for Robot Design Understanding

Tri Le et al. · 2025-03-09

StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition

Yanqing Shen et al. · 2025-03-09

TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification

Huaqi Tao et al. · 2025-03-09

NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features

Hongjia Zhai et al. · 2025-03-08