
Tesla’s Vision-Only FSD Builds a Real 3D World From Camera Pixels
This piece explains how Tesla's FSD builds a live 3D representation of the world from camera input alone, fusing multi-view image features into a unified 3D space. Two patents, "Vision-Based Occupancy Determination" and "Vision-Based Surface Determination," describe a voxel-based occupancy map and a 3D surface mesh, respectively, each built from 2D images via transformers, temporal fusion, and deconvolution, enabling depth, motion, and material understanding without LiDAR. The two systems work together to inform prediction, path planning, and control; trained on ground truth derived from LiDAR and photogrammetry alongside camera feeds, they produce a robust, dynamic world model for real-time driving decisions.
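The core lifting step, projecting voxel centers into each camera and pooling the 2D features that land on them, can be sketched as follows. This is a hedged illustration, not Tesla's implementation: the pinhole camera model, the mean-pooling fusion, and all names and grid parameters are assumptions standing in for the learned transformer-based fusion the patents describe.

```python
import numpy as np

def project_points(pts_world, K, R, t):
    """Project Nx3 world points into pixel coords with a pinhole model."""
    cam = (R @ pts_world.T).T + t          # world frame -> camera frame
    in_front = cam[:, 2] > 1e-6            # keep points ahead of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]            # perspective divide
    return uv, in_front

def fuse_to_voxels(feature_maps, cams, grid_min, voxel, shape):
    """Average per-camera 2D features into a 3D voxel feature volume."""
    gx, gy, gz = shape
    # world-space coordinates of every voxel center
    idx = np.stack(np.meshgrid(np.arange(gx), np.arange(gy),
                               np.arange(gz), indexing="ij"), -1)
    centers = grid_min + (idx.reshape(-1, 3) + 0.5) * voxel
    n_channels = feature_maps[0].shape[-1]
    acc = np.zeros((centers.shape[0], n_channels))
    cnt = np.zeros((centers.shape[0], 1))
    for fmap, (K, R, t) in zip(feature_maps, cams):
        H, W, _ = fmap.shape
        uv, ok = project_points(centers, K, R, t)
        u = uv[:, 0].round().astype(int)
        v = uv[:, 1].round().astype(int)
        ok &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
        acc[ok] += fmap[v[ok], u[ok]]      # sample feature at pixel hit
        cnt[ok] += 1
    # mean over the cameras that actually see each voxel
    vol = np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0)
    return vol.reshape(gx, gy, gz, n_channels)
```

In a full occupancy network, a learned head (e.g. an MLP or deconvolution stack) would then map each voxel's fused feature vector to an occupancy probability; here the geometry of the lift is the point being illustrated.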
