Automatic extraction method for key features of digital documents

A technique for extracting key features from digital files, applied in the fields of electronic digital data processing, special data processing applications, and instruments. It addresses the limited application range of existing methods, the slow comparison speed of thesaurus-based methods, and the inability to extract overlapping repeated strings.

Status: Inactive; Publication date: 2006-04-26
WEBGENIE INFORMATION +1

AI Technical Summary

Problems solved by technology

[0018] 1. The thesaurus comparison method returns only words that are already in the thesaurus, so its results are valid terms, but it cannot guarantee that every keyword is extracted. Maintaining the thesaurus also takes considerable manpower and time, the method cannot cope with unexpected proper names such as names of people, places, and institutions, and comparison slows down as the thesaurus grows;
[0019] 2. Most grammar analysis programs rely on pre-built dictionaries or corpora, so they share the shortcomings of the thesaurus comparison method. In addition, some can analyze only complete, grammatical sentences, so they cannot extract keywords from data such as bibliographies, titles, and OCR text;
[0020] 3. The statistical analysis method can extract new vocabulary, but it needs a large number of data samples to determine appropriate statistical parameters. Keywords with insufficient statistical support are not selected, which limits its scope of application;
[0021] 4. For music melodies, existing methods of extracting repeated strings have a computational complexity of O(n²); some methods reduce this complexity but cannot extract overlapping repeated strings; and
[0022] 5. Some methods of extracting repeated strings are restricted in use. For example, they can find only all repeated strings of a fixed length K, or only the repeated strings of the greatest length L.
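
The fixed-length limitation in item 5 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `repeated_ngrams` and the word-level tokenization are assumptions made for illustration. The function finds every repeated token sequence of exactly length k and nothing else.

```python
from collections import Counter

def repeated_ngrams(tokens, k):
    """Return every length-k token sequence that occurs at least twice.

    Hypothetical helper for illustration; not part of the patented method.
    """
    counts = Counter(
        tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)
    )
    return {gram: n for gram, n in counts.items() if n >= 2}

sample = "information retrieval system and information retrieval system".split()
print(repeated_ngrams(sample, 2))  # two-word repeats only
print(repeated_ngrams(sample, 3))  # the three-word repeat needs a second pass
```

With k = 2 the three-word phrase is reported only as two separate fragments, so a caller would have to rerun the search for every candidate length to recover all repeats.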


Embodiment Construction

[0052] For music files, repeated segments can serve as the basis for key melody extraction. For text data, repeated segments can likewise serve as the basis for automatic keyword extraction, because an article that discusses a certain topic tends to mention some strings several times. For example, an article discussing information retrieval will inevitably mention "information retrieval", "retrieval system", and "information retrieval system" several times. An ideal automatic key feature extraction method should therefore be able to extract at least all distinct maximal repeated segments, including those that overlap. "Maximal" here refers to the longest string length or the highest number of occurrences. That is, a repeated segment that is not a substring of any other repeated segment should be extracted; in addition, even when a repeated segment is a substring of another repeated string, if its occurrence fr...
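
The selection criterion described in this paragraph can be sketched as a brute-force Python reference. This is an illustration, not the patented procedure: it enumerates a quadratic number of candidate segments, which is exactly the cost the invention aims to avoid. The function names are hypothetical, the text is tokenized by words as an assumption, and because the source sentence is truncated, the rule "keep a sub-segment only if it occurs more often than every repeated segment that contains it" is an assumed reading of "the highest number of occurrences".

```python
from collections import Counter

def all_repeats(tokens, min_len=2):
    """Count every token sequence of length >= min_len; keep those seen twice or more."""
    counts = Counter()
    n = len(tokens)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            counts[tuple(tokens[start:start + length])] += 1
    return {gram: c for gram, c in counts.items() if c >= 2}

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run strictly inside `longer`."""
    k = len(shorter)
    return len(longer) > k and any(
        longer[i:i + k] == shorter for i in range(len(longer) - k + 1)
    )

def maximal_repeats(tokens, min_len=2):
    """Keep a repeat if no longer repeat contains it, or (assumed rule)
    if it occurs more often than every longer repeat that contains it."""
    repeats = all_repeats(tokens, min_len)
    keep = {}
    for gram, count in repeats.items():
        supers = [c for other, c in repeats.items() if contains(other, gram)]
        if not supers or count > max(supers):
            keep[gram] = count
    return keep

sample = ("information retrieval system uses information retrieval system "
          "for information retrieval").split()
print(maximal_repeats(sample))
```

For this sample the sketch keeps "information retrieval system" (2 occurrences) and "information retrieval" (3 occurrences), and drops "retrieval system", which never occurs outside the longer phrase.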


Abstract

An automatic extraction method for key features of digital files. The digital file is first converted into a columnar data structure that includes at least one column element. A combined column area is then set to the empty state, and the column elements in the columnar data structure are taken out in sequence. Depending on the situation, each column element is either recombined and placed into the combined column area or the last column area, or discarded directly. The combined column area is then converted into the columnar data structure, and the above steps are repeated until a stop condition is reached. Finally, the column elements in the last column area are filtered according to a key feature filter condition, and the result is used for key feature retrieval.
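
The abstract does not state which situations cause an element to be recombined, kept, or discarded, so the following Python sketch is only one plausible reading of the loop, under assumptions that are not in the source: elements start as overlapping two-word sequences, two adjacent elements are recombined when both occur more than `threshold` times and overlap, and an element that repeats but was not absorbed into a longer one is moved to the last area. The names `extract_repeats`, `current`, `merged`, `final`, and `threshold` are hypothetical; `current` plays the role of the columnar data structure, `merged` the combined column area, and `final` the last column area. The closing filter step is omitted.

```python
from collections import Counter

def extract_repeats(tokens, threshold=1):
    """Iteratively merge adjacent repeated segments; return {segment: count}.

    Assumed merge and discard rules; the abstract does not specify them.
    """
    # Assumed starting point: overlapping 2-grams as the initial elements.
    current = [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    final = {}
    while current:  # assumed stop condition: nothing left to merge
        freq = Counter(current)
        merged = []        # the "combined" area starts out empty
        absorbed = False   # was this element already merged with its predecessor?
        for i, gram in enumerate(current):
            nxt = current[i + 1] if i + 1 < len(current) else None
            can_merge = (
                nxt is not None
                and freq[gram] > threshold
                and freq[nxt] > threshold
                and gram[1:] == nxt[:-1]  # the two elements must overlap
            )
            if can_merge:
                merged.append(gram + nxt[-1:])  # recombine into a longer element
                absorbed = True
            else:
                if freq[gram] > threshold and not absorbed:
                    final[gram] = freq[gram]    # keep: repeated but not extended
                absorbed = False                # otherwise the element is discarded
        current = merged  # the combined area becomes the next input
    return final

sample = ("information retrieval system uses information retrieval system "
          "for information retrieval").split()
print(extract_repeats(sample))
```

Each pass lengthens the surviving elements by one token, so the number of passes is bounded by the length of the longest repeated segment rather than by the length of the document.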

Description

Technical field

[0001] The invention relates to an automatic retrieval method for digital files, and in particular to an automatic retrieval method for key features of digital files.

Background

[0002] The current accessibility and popularity of the Internet have led to faster data growth and more frequent use of various retrieval systems. The new generation of information retrieval systems, especially those that allow full-text or content-based queries, must use more efficient automation technologies to provide simple and effective retrieval services. However, most of these automated technologies, such as automatic indexing, automatic index dictionary creation, automatic summarization, automatic classification, relevance feedback, automatic filtering, and approximate retrieval, must first extract the key features of documents and then carry out further processing based on the results. Therefore, whether it is bibliographic d...


Application Information

IPC(8): G06F17/30
Inventor: 曾元显
Owner: WEBGENIE INFORMATION