3D Vision arXiv Daily

World Model

80 papers
Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations?
Yukiyasu Kamitani · 2026-05-07
Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models
Nilaksh et al. · 2026-05-07
Earth-o1: A Grid-free Observation-native Atmospheric World Model
Junchao Gong et al. · 2026-05-07
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
Ashwani Anand et al. · 2026-05-07
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
Roussel Desmond Nzoyem et al. · 2026-05-07
EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields
Zhaoyang Yang et al. · 2026-05-07
Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark
Cristiano da Costa Cunha et al. · 2026-05-07
HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning
Haoyun Tang et al. · 2026-05-07
LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)
Wei Luo et al. · 2026-05-06
Executable World Models for ARC-AGI-3 in the Era of Coding Agents
Sergey Rodionov · 2026-05-06
Implementing True MPI Sessions and Evaluating MPI Initialization Scalability
Hui Zhou et al. · 2026-05-05
A Benchmark for Interactive World Models with a Unified Action Generation Framework
Jianjie Fang et al. · 2026-05-05
RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
Hao Wu et al. · 2026-05-05
What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
Haoxi Li et al. · 2026-05-05
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
Tencent HY Team · 2026-05-05
Learning to Theorize the World from Observation
Doojin Baek et al. · 2026-05-05
High-Fidelity Full-Sky Video Prediction for Photovoltaic Ramp Event Forecasting
Siyuan Wang et al. · 2026-05-04
Existence, Asymptotic Behavior, and Numerical Analysis of a Generalized Abel Differential Equation with Applications in Financial Modeling
Dragos-Patru Covei · 2026-05-04
DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation
Danil Tokhchukov et al. · 2026-05-04
Shadow-Loom: Causal Reasoning over Graphical World Model of Narratives
David Wilmot · 2026-05-04
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
Xin Zhou et al. · 2026-04-30
LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
Hao Chen et al. · 2026-04-30
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Keming Wu et al. · 2026-04-30
Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces
Andrew Bond et al. · 2026-04-30
Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA
Feeza Khan Khanzada et al. · 2026-04-30
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
Junan Hu et al. · 2026-04-30
Flying by Inference: Active Inference World Models for Adaptive UAV Swarms
Kaleem Arshid et al. · 2026-04-30
Simulating clinical interventions with a generative multimodal model of human physiology
Guy Lutsker et al. · 2026-04-30
Graph World Models: Concepts, Taxonomy, and Future Directions
Jiawei Liu et al. · 2026-04-30
MotuBrain: An Advanced World Action Model for Robot Control
MotuBrain Team et al. · 2026-04-30
Seeing Fast and Slow: Learning the Flow of Time in Videos
Yen-Siang Wu et al. · 2026-04-23
Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
Jiseon Kim et al. · 2026-04-23
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Yaxuan Li et al. · 2026-04-23
WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Xiaojie Xu et al. · 2026-04-23
Building a Precise Video Language with Human-AI Oversight
Zhiqiu Lin et al. · 2026-04-22
Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
Abhishek Dharmaratnakar et al. · 2026-04-22
Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics
Open-H-Embodiment Consortium et al. · 2026-04-22
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
Hyeonwoo Kim et al. · 2026-04-22
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Aravind Venugopal et al. · 2026-04-22
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
Xingcheng Zhou et al. · 2026-04-22
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Blaise Delaney et al. · 2026-04-20
The Umwelt Representation Hypothesis: Rethinking Universality
Victoria Bosch et al. · 2026-04-20
Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer
Tianfu Wang et al. · 2026-04-20
Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception
Siyuan Meng et al. · 2026-04-19
Dual-Anchoring: Addressing State Drift in Vision-Language Navigation
Kangyi Wu et al. · 2026-04-19
Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation
Zhijiang Tang et al. · 2026-04-19
DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior
Junjia Huang et al. · 2026-04-19
TensorHub: Rethinking AI Model Hub with Tensor-Centric Compression
Tingfeng Lan et al. · 2026-04-18
LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing
Weicheng Wang et al. · 2026-04-18
SafeDream: Safety World Model for Proactive Early Jailbreak Detection
Bo Yan et al. · 2026-04-18
Seedance 2.0: Advancing Video Generation for World Complexity
Team Seedance et al. · 2026-04-15
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
Weijie Wang et al. · 2026-04-15
Beyond State Consistency: Behavior Consistency in Text-Based World Models
Youling Huang et al. · 2026-04-15
Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
Hanxuan Chen et al. · 2026-04-15
DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
Hengye Lyu et al. · 2026-04-15
VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning
Yifan Li et al. · 2026-04-15
Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models
Zijian Song et al. · 2026-04-14
A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture
Yeeun Park et al. · 2026-04-14
ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models
Xinliang Wang et al. · 2026-04-14
Grounded World Model for Semantically Generalizable Planning
Quanyi Li et al. · 2026-04-13
Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics
Ying Shen et al. · 2026-04-09
Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework
Seyed Amir Ahmad Safavi-Naini et al. · 2026-04-09
Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation
Xiaoqian Qi et al. · 2026-04-09
ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
Jindi Lv et al. · 2026-04-09
MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
Zile Guo et al. · 2026-04-09
WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
Hongjin Chen et al. · 2026-04-09
DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics
Hang Zhang et al. · 2026-04-09
CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics
Ziyi Ding et al. · 2026-04-09
Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations
Chao Tang et al. · 2026-04-08
GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control
Prakul Sunil Hiremath · 2026-04-08
How Much LLM Does a Self-Revising Agent Actually Need?
Seongwoo Jeong et al. · 2026-04-08
PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing
Ruihang Xu et al. · 2026-04-08
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
InSpatio Team et al. · 2026-04-08
Radio-Frequency Inverse Rendering for Wireless Environment Modeling
Fuhai Wang et al. · 2026-04-08
Telecom World Models: Unifying Digital Twins, Foundation Models, and Predictive Planning for 6G
Hang Zou et al. · 2026-04-08
The Rhetoric of Machine Learning
Robert C. Williamson · 2026-04-08
Controllable Generative Video Compression
Ding Ding et al. · 2026-04-08
Neural Computers
Mingchen Zhuge et al. · 2026-04-07
Evolution of Video Generative Foundations
Teng Hu et al. · 2026-04-07
Action Images: End-to-End Policy Learning via Multiview Video Generation
Haoyu Zhen et al. · 2026-04-07