Temporal and spatial scaling of video images including video object planes (VOPs) (117, 118, 119, 405, 415, 420, 430, 520, 522, 524, 526, 532, 542, 705, 730, 750, 760, 780, 790, 805, 815, 820, 830, 850, 860, 880, 890) in an input
digital video sequence is provided. Coding efficiency is improved by adaptively compressing scaled field mode video. Upsampled VOPs (450, 490, 522, 542, 750, 790) in the enhancement layer are reordered to provide a greater correlation with the input
video sequence based on a linear criterion. The resulting residue is coded using a
spatial transformation such as the DCT. A motion compensation scheme is used for coding enhancement layer VOPs (450, 460, 480, 490, 522, 524, 526, 542, 750, 760, 780, 790, 850, 860, 880, 890) by scaling motion vectors which have already been determined for the base layer VOPs (405, 415, 420, 430, 520, 532, 705, 730, 805, 815, 820, 830). A reduced search area whose center is defined by the scaled motion vectors is provided. The motion compensation scheme is suitable for use with scaled frame mode or field mode video. Various processor configurations achieve particular scaleable coding results. Applications of scaleable coding include
stereoscopic video, picture-in-picture, preview access channels, and ATM communications.
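
The motion compensation scheme described above (scaling base layer motion vectors, then searching a reduced area centered on the scaled vector) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text: the function names, the scaling factor of 2, the 4x4 block size, the one-pixel search radius, and the use of a sum of absolute differences as the matching cost.

```python
def scale_motion_vector(mv, scale_x=2, scale_y=2):
    """Scale a base layer motion vector (dx, dy) to enhancement layer resolution.

    The factor of 2 is an assumed spatial scaling ratio, used only for illustration.
    """
    dx, dy = mv
    return (dx * scale_x, dy * scale_y)


def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry). The matching cost is an assumption."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def reduced_search(cur, ref, bx, by, base_mv, size=4, radius=1, scale=2):
    """Find the best enhancement layer vector for the block at (bx, by).

    Only candidates within `radius` pixels of the scaled base layer vector
    are examined, instead of a full search window.
    """
    height, width = len(ref), len(ref[0])
    sx, sy = scale_motion_vector(base_mv, scale, scale)
    best_mv, best_cost = None, None
    for dy in range(sy - radius, sy + radius + 1):
        for dx in range(sx - radius, sx + radius + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose reference block falls outside the picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a radius of 1, the search evaluates at most 9 candidate positions per block, which is the kind of saving a reduced search area is meant to provide when the base layer vector is already a good predictor.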