The invention discloses a method and a system for semantically describing audio and video content. The method comprises the following steps: first, splitting the audio and video content into multiple fragments, assigning each fragment a structure attribute that marks its sequence and nesting relation, and generating an 
XML (Extensible Markup Language) file that records the sequence and the nesting relation; second, semantically describing each fragment in the 
XML file according to a structural dictionary and a 
semantic dictionary to form a new XML file; and finally, adding an 
XML Schema declaration and a copyright notice to the new XML file, and then either embedding the new XML file in the original audio and video file to generate an audio and video file that contains the XML file, or also adding the position of each fragment within the corresponding audio and video file to the new XML file to generate an XML file corresponding to the original audio and video file. According to the method and 
system provided by the invention, precise search of audio and video can be realized based on either the generated audio and video file or the XML file corresponding to the original audio and video file.
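
The steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: the element names (`description`, `fragment`, `keyword`, `copyright`), the attribute names (`seq`, `level`, `type`, `start`, `end`), the schema file name `av-description.xsd`, and the `search` helper are all hypothetical, chosen only to show how sequence, nesting, dictionary terms, and fragment positions might be recorded in one XML file and then queried.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Fragment:
    label: str                 # term from a (hypothetical) structural dictionary
    start: float               # position in the media file, in seconds
    end: float
    keywords: list = field(default_factory=list)   # semantic-dictionary terms
    children: list = field(default_factory=list)   # nested fragments


def to_element(frag, seq, level=1):
    """Convert one fragment into XML, recording sequence and nesting depth."""
    el = ET.Element("fragment", {
        "seq": str(seq),
        "level": str(level),
        "type": frag.label,
        "start": f"{frag.start:.1f}",
        "end": f"{frag.end:.1f}",
    })
    for kw in frag.keywords:
        ET.SubElement(el, "keyword").text = kw
    for i, child in enumerate(frag.children, start=1):
        el.append(to_element(child, i, level + 1))
    return el


def build_description(fragments, copyright_notice):
    """Wrap the fragments with a schema declaration and a copyright notice."""
    root = ET.Element("description", {
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        # Assumed schema location, for illustration only.
        "xsi:noNamespaceSchemaLocation": "av-description.xsd",
    })
    ET.SubElement(root, "copyright").text = copyright_notice
    for i, frag in enumerate(fragments, start=1):
        root.append(to_element(frag, i))
    return root


def search(root, keyword):
    """Return (start, end) positions of fragments tagged directly with keyword."""
    hits = []
    for el in root.iter("fragment"):
        if any(k.text == keyword for k in el.findall("keyword")):
            hits.append((float(el.get("start")), float(el.get("end"))))
    return hits


scene = Fragment("scene", 0.0, 60.0, ["interview"], children=[
    Fragment("shot", 0.0, 12.5, ["close-up"]),
    Fragment("shot", 12.5, 60.0, ["interview", "studio"]),
])
root = build_description([scene], "(c) hypothetical rights holder")
print(search(root, "interview"))  # [(0.0, 60.0), (12.5, 60.0)]
```

Because each fragment carries its own position, a keyword match maps straight to a time range in the media file, which is what makes the search precise at fragment level rather than file level.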