An invention for generating standard description records from multimedia information. The invention utilizes fundamental entity-relation models for the Generic AV DS that classify the entities, the entity attributes, and the relationships into relevant types to describe visual data. It also involves classification of entity attributes into syntactic and semantic attributes. Syntactic attributes can be categorized into different levels: type / technique, global distribution, local structure, and global composition. Semantic attributes can likewise be discretely categorized: generic object, generic scene, specific object, specific scene, abstract object, and abstract scene. The invention further classifies entity relationships into syntactic and semantic categories. Syntactic relationship categories include spatial, temporal, and visual categories.
Semantic relationship categories include lexical and predicative categories. Spatial and temporal relationships can be topological or directional; visual relationships can be global, local, or composition; lexical relationships can be synonymy, antonymy, hyponymy / hypernymy, or meronymy / holonymy; and predicative relationships can be actions (events) or states.
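
The classification above can be pictured as a small data model. The following is only an illustrative sketch in Python, not the description scheme defined by the invention: the names `SyntacticLevel`, `SemanticLevel`, `RELATIONSHIP_TAXONOMY`, `kind_of`, and `Relationship` are hypothetical and chosen for this example, and the entity labels in the usage note are made up.

```python
from dataclasses import dataclass
from enum import Enum


class SyntacticLevel(Enum):
    """Levels of syntactic attributes, as listed in the abstract."""
    TYPE_TECHNIQUE = "type / technique"
    GLOBAL_DISTRIBUTION = "global distribution"
    LOCAL_STRUCTURE = "local structure"
    GLOBAL_COMPOSITION = "global composition"


class SemanticLevel(Enum):
    """Levels of semantic attributes, as listed in the abstract."""
    GENERIC_OBJECT = "generic object"
    GENERIC_SCENE = "generic scene"
    SPECIFIC_OBJECT = "specific object"
    SPECIFIC_SCENE = "specific scene"
    ABSTRACT_OBJECT = "abstract object"
    ABSTRACT_SCENE = "abstract scene"


# Relationship taxonomy: kind -> category -> allowed subtypes.
# The nested-dict layout is an assumption made for this sketch.
RELATIONSHIP_TAXONOMY = {
    "syntactic": {
        "spatial": ("topological", "directional"),
        "temporal": ("topological", "directional"),
        "visual": ("global", "local", "composition"),
    },
    "semantic": {
        "lexical": ("synonymy", "antonymy",
                    "hyponymy / hypernymy", "meronymy / holonymy"),
        "predicative": ("action (event)", "state"),
    },
}


def kind_of(category: str) -> str:
    """Return 'syntactic' or 'semantic' for a relationship category."""
    for kind, categories in RELATIONSHIP_TAXONOMY.items():
        if category in categories:
            return kind
    raise ValueError(f"unknown relationship category: {category!r}")


@dataclass(frozen=True)
class Relationship:
    """A typed relationship between two entities (hypothetical record)."""
    source: str
    target: str
    category: str
    subtype: str

    def __post_init__(self) -> None:
        # Reject subtypes that the taxonomy does not list for the category.
        allowed = RELATIONSHIP_TAXONOMY[kind_of(self.category)][self.category]
        if self.subtype not in allowed:
            raise ValueError(
                f"{self.subtype!r} is not a {self.category} relationship subtype"
            )

    @property
    def kind(self) -> str:
        return kind_of(self.category)
```

As a usage example under the same assumptions, `Relationship("ball", "player", "spatial", "directional").kind` evaluates to `"syntactic"`, while `Relationship("ball", "player", "visual", "synonymy")` raises `ValueError` because synonymy is a lexical subtype, not a visual one.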