World Model

Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations?

Yukiyasu Kamitani · 2026-05-07

Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models

Nilaksh et al. · 2026-05-07

Earth-o1: A Grid-free Observation-native Atmospheric World Model

Junchao Gong et al. · 2026-05-07

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

Ashwani Anand et al. · 2026-05-07

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

Roussel Desmond Nzoyem et al. · 2026-05-07

EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

Zhaoyang Yang et al. · 2026-05-07

Causal Reinforcement Learning for Complex Card Games: A Magic The Gathering Benchmark

Cristiano da Costa Cunha et al. · 2026-05-07

HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning

Haoyun Tang et al. · 2026-05-07

LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

Wei Luo et al. · 2026-05-06

Executable World Models for ARC-AGI-3 in the Era of Coding Agents

Sergey Rodionov · 2026-05-06

Implementing True MPI Sessions and Evaluating MPI Initialization Scalability

Hui Zhou et al. · 2026-05-05

A Benchmark for Interactive World Models with a Unified Action Generation Framework

Jianjie Fang et al. · 2026-05-05

RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models

Hao Wu et al. · 2026-05-05

What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity

Haoxi Li et al. · 2026-05-05

AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics

Tencent HY Team · 2026-05-05

Learning to Theorize the World from Observation

Doojin Baek et al. · 2026-05-05

High-Fidelity Full-Sky Video Prediction for Photovoltaic Ramp Event Forecasting

Siyuan Wang et al. · 2026-05-04

Existence, Asymptotic Behavior, and Numerical Analysis of a Generalized Abel Differential Equation with Applications in Financial Modeling

Dragos-Patru Covei · 2026-05-04

DynoSLAM: Dynamic SLAM with Generative Graph Neural Networks for Real-World Social Navigation

Danil Tokhchukov et al. · 2026-05-04

Shadow-Loom: Causal Reasoning over Graphical World Model of Narratives

David Wilmot · 2026-05-04

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Xin Zhou et al. · 2026-04-30

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

Hao Chen et al. · 2026-04-30

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Keming Wu et al. · 2026-04-30

Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces

Andrew Bond et al. · 2026-04-30

Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA

Feeza Khan Khanzada et al. · 2026-04-30

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Junan Hu et al. · 2026-04-30

Flying by Inference: Active Inference World Models for Adaptive UAV Swarms

Kaleem Arshid et al. · 2026-04-30

Simulating clinical interventions with a generative multimodal model of human physiology

Guy Lutsker et al. · 2026-04-30

Graph World Models: Concepts, Taxonomy, and Future Directions

Jiawei Liu et al. · 2026-04-30

MotuBrain: An Advanced World Action Model for Robot Control

MotuBrain Team et al. · 2026-04-30

Seeing Fast and Slow: Learning the Flow of Time in Videos

Yen-Siang Wu et al. · 2026-04-23

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Jiseon Kim et al. · 2026-04-23

Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training

Yaxuan Li et al. · 2026-04-23

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Xiaojie Xu et al. · 2026-04-23

Building a Precise Video Language with Human-AI Oversight

Zhiqiu Lin et al. · 2026-04-22

Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction

Abhishek Dharmaratnakar et al. · 2026-04-22

Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

Open-H-Embodiment Consortium et al. · 2026-04-22

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Hyeonwoo Kim et al. · 2026-04-22

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

Aravind Venugopal et al. · 2026-04-22

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs

Xingcheng Zhou et al. · 2026-04-22

Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

Blaise Delaney et al. · 2026-04-20

The Umwelt Representation Hypothesis: Rethinking Universality

Victoria Bosch et al. · 2026-04-20

Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

Tianfu Wang et al. · 2026-04-20

Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

Siyuan Meng et al. · 2026-04-19

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

Kangyi Wu et al. · 2026-04-19

Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

Zhijiang Tang et al. · 2026-04-19

DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior

Junjia Huang et al. · 2026-04-19

TensorHub: Rethinking AI Model Hub with Tensor-Centric Compression

Tingfeng Lan et al. · 2026-04-18

LIVE: Leveraging Image Manipulation Priors for Instruction-based Video Editing

Weicheng Wang et al. · 2026-04-18

SafeDream: Safety World Model for Proactive Early Jailbreak Detection

Bo Yan et al. · 2026-04-18

Seedance 2.0: Advancing Video Generation for World Complexity

Team Seedance et al. · 2026-04-15

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

Weijie Wang et al. · 2026-04-15

Beyond State Consistency: Behavior Consistency in Text-Based World Models

Youling Huang et al. · 2026-04-15

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

Hanxuan Chen et al. · 2026-04-15

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

Hengye Lyu et al. · 2026-04-15

VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning

Yifan Li et al. · 2026-04-15

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

Zijian Song et al. · 2026-04-14

A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture

Yeeun Park et al. · 2026-04-14

ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models

Xinliang Wang et al. · 2026-04-14

Grounded World Model for Semantically Generalizable Planning

Quanyi Li et al. · 2026-04-13

Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics

Ying Shen et al. · 2026-04-09

Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework

Seyed Amir Ahmad Safavi-Naini et al. · 2026-04-09

Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation

Xiaoqian Qi et al. · 2026-04-09

ViVa: A Video-Generative Value Model for Robot Reinforcement Learning

Jindi Lv et al. · 2026-04-09

MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models

Zile Guo et al. · 2026-04-09

WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models

Hongjin Chen et al. · 2026-04-09

DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics

Hang Zhang et al. · 2026-04-09

CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics

Ziyi Ding et al. · 2026-04-09

Grasp as You Dream: Imitating Functional Grasping from Generated Human Demonstrations

Chao Tang et al. · 2026-04-08

GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

Prakul Sunil Hiremath · 2026-04-08

How Much LLM Does a Self-Revising Agent Actually Need?

Seongwoo Jeong et al. · 2026-04-08

PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing

Ruihang Xu et al. · 2026-04-08

INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

InSpatio Team et al. · 2026-04-08

Radio-Frequency Inverse Rendering for Wireless Environment Modeling

Fuhai Wang et al. · 2026-04-08

Telecom World Models: Unifying Digital Twins, Foundation Models, and Predictive Planning for 6G

Hang Zou et al. · 2026-04-08

The Rhetoric of Machine Learning

Robert C. Williamson · 2026-04-08

Controllable Generative Video Compression

Ding Ding et al. · 2026-04-08

Neural Computers

Mingchen Zhuge et al. · 2026-04-07

Evolution of Video Generative Foundations

Teng Hu et al. · 2026-04-07

Action Images: End-to-End Policy Learning via Multiview Video Generation

Haoyu Zhen et al. · 2026-04-07