An “automated video editor” (AVE) automatically processes one or more input videos to create an edited video stream with little or no user interaction. The AVE produces cinematic effects, including cross-cuts, zooms, pans, insets, and 3-D effects, by applying a combination of cinematic rules, object recognition techniques, and digital editing of the input video. Consequently, the AVE can use a simple video taken with a fixed camera to automatically simulate cinematic editing effects that would normally require multiple cameras and/or professional editing. The AVE first defines a list of scenes in the video and generates a rank-ordered list of candidate shots for each scene. Each frame of each scene is then analyzed, or “parsed,” using object detection techniques (“detectors”) that isolate unique objects (faces, moving or stationary objects, etc.) in the scene. Shots are then automatically selected for each scene and used to construct the edited video stream.
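
The flow described above (scenes, rank-ordered candidate shots, per-frame parsing by detectors, then shot selection) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the `Scene` class, the `SHOT_REQUIREMENTS` table, the toy detectors, and in particular the selection rule, which assumes that a shot is usable only when the objects it needs were detected in every frame of the scene. The source does not specify how shots are selected, so this is one plausible reading rather than the AVE's actual method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A frame is modeled as a plain dict of flags; a real system would hold pixels.
Frame = Dict[str, bool]
# A detector maps a frame to the set of object labels it found (assumed API).
Detector = Callable[[Frame], Set[str]]

# Hypothetical table: object labels each shot type needs in order to be usable.
SHOT_REQUIREMENTS: Dict[str, Set[str]] = {
    "zoom_on_face": {"face"},
    "pan_following_motion": {"moving"},
    "wide": set(),  # needs nothing, so it always works as a fallback
}


@dataclass
class Scene:
    name: str
    frames: List[Frame]
    candidate_shots: List[str]  # rank-ordered, most preferred first


def parse_scene(scene: Scene, detectors: List[Detector]) -> Set[str]:
    """Run every detector on every frame; keep labels found in all frames."""
    per_frame: List[Set[str]] = []
    for frame in scene.frames:
        labels: Set[str] = set()
        for detect in detectors:
            labels.update(detect(frame))
        per_frame.append(labels)
    return set.intersection(*per_frame) if per_frame else set()


def select_shot(scene: Scene, available: Set[str]) -> str:
    """Pick the highest-ranked known shot whose requirements are all met."""
    for shot in scene.candidate_shots:
        required = SHOT_REQUIREMENTS.get(shot)
        if required is not None and required <= available:
            return shot
    return "wide"  # assumed fallback when no candidate qualifies


def edit_video(scenes: List[Scene], detectors: List[Detector]) -> List[Tuple[str, str]]:
    """Return the edit decision list as (scene name, chosen shot) pairs."""
    return [(s.name, select_shot(s, parse_scene(s, detectors))) for s in scenes]


# Toy detectors standing in for real face and motion detection.
def face_detector(frame: Frame) -> Set[str]:
    return {"face"} if frame.get("face") else set()


def motion_detector(frame: Frame) -> Set[str]:
    return {"moving"} if frame.get("moving") else set()
```

In this sketch the output is only an edit decision list; rendering the chosen zooms, pans, or insets into an actual video stream is left out. Requiring a label in every frame is a deliberately conservative choice, so that a zoom is not selected for a face that disappears partway through the scene.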