Acquisition system and method of text field based on concept symbols

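A technique for acquiring the field (domain) of a text based on concept symbols, applied in text and language information processing. It addresses three problems: pre-classified training corpora are difficult to provide, the field categories produced by clustering are rough, and clustering results are hard to combine with actual classification needs.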

Status: Inactive
Publication Date: 2010-02-10
INST OF ACOUSTICS CHINESE ACAD OF SCI
1 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

However, the domain categories obtained by automatic text clustering are very rough, and because clustering has no classification guidance, the results are difficult to adapt to actual needs.
Moreover, the same text clustering method may work well on one text set but poorly on another; that is, text clustering falls short in practicality and stability.
[0004] In summary, statistical text classification requires a large pre-classified training corpus, which is often difficult to provide.
Although text clustering overcomes this shortcoming, its results are difficult to combine with the actual needs of classification.



Examples


Embodiment Construction

[0069] The present invention will be described in detail below in conjunction with specific embodiments and accompanying drawings.

[0070] First, 11 texts of news reports about the 2004 Athens Olympic Games were downloaded from the Internet, with a total of 60 natural paragraphs and 6501 Chinese characters.

[0071] Second, following the design principles and design symbols in "Basic Theorems of Language Concept Space and Mathematical Physical Expressions" (Ocean Press, July 2004), the concept symbols of the q73 (competition) field are refined, yielding the set of concept symbols for the competition field. At the same time, words related to competition and their semantic knowledge are added to the word knowledge base.

[0072] Third, use the word segmentation processor to split a text into paragraphs, sentences, and words. For example, the following text: Title: Malaysia's "little flag bearer" did not enter the diving semi-finals

[0073] Xin...
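The segmentation step in [0072] can be sketched as follows. This is a minimal illustration, not the patent's word segmentation processor: the source does not say which algorithm it uses, so forward maximum matching is swapped in as a common stand-in. The lexicon and the Chinese title string are hypothetical (the title is a back-translation of the English title above, not the original text).

```python
import re

# Hypothetical toy lexicon; the patent's word knowledge base is not reproduced here.
LEXICON = {"马来西亚", "小旗手", "进入", "跳水", "半决赛"}

# Hypothetical back-translation of the example title, used only for illustration.
TITLE = "马来西亚小旗手未进入跳水半决赛"


def split_paragraphs(text):
    """Split raw text into non-empty paragraphs, one per line."""
    return [p.strip() for p in text.split("\n") if p.strip()]


def split_sentences(paragraph):
    """Split a paragraph into sentences, keeping the end punctuation."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", paragraph)
    return [s.strip() for s in parts if s.strip()]


def segment_words(sentence, lexicon=LEXICON):
    """Forward maximum matching: take the longest lexicon word at each position.

    Characters that start no lexicon word are emitted as single-character words.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    print(segment_words(TITLE))
```

Forward maximum matching is greedy and can mis-segment ambiguous strings; a real segmenter would likely use the semantic knowledge stored with each word to resolve such cases.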



Abstract

The invention discloses a system and a method for acquiring the field of a text based on concept symbols. The system comprises a concept symbol set for expressing word concepts and field categories, a word knowledge base for storing words and their concept symbols, a word segmentation processor, a statement semantic analyzer and a field arbiter. The method comprises the following steps:
(1) segmenting an input text into paragraphs, statements and words;
(2) carrying out semantic analysis on the statements to obtain their concept categories and semantic blocks;
(3) obtaining the activating words in the statements according to the field concept symbol set and the semantic concept symbols in the word knowledge base;
(4) comprehensively scoring the field concept symbols of the activating words and taking the field concept symbol with the highest score as the field of the statement;
(5) merging the statements in each paragraph according to their field concept symbols to obtain statement groups and their fields; and
(6) obtaining the field of the text according to the title of the text and the frequency of occurrence and position of the statement groups.
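Steps (3) to (6) of the abstract can be sketched roughly as below. This is an illustration under stated assumptions, not the patented method: the abstract does not define "comprehensive scoring" or how title, frequency and position are combined, so a plain count and an ad hoc weighting are assumed. Only `q73` comes from the text; `qXX`, the English stand-in words, and every function name and weight are hypothetical.

```python
from collections import Counter
from itertools import groupby

# "q73" (competition) appears in the text; "qXX" is a hypothetical second field.
FIELD_SYMBOLS = {"q73", "qXX"}

# Hypothetical word knowledge base: word -> concept symbols (English stand-ins).
WORD_KB = {
    "diving": ["q73"],
    "semi-finals": ["q73"],
    "Olympic": ["q73"],
    "market": ["qXX"],
}


def activating_words(words):
    """Step (3): words whose knowledge-base entry carries a field symbol."""
    return [w for w in words
            if any(s in FIELD_SYMBOLS for s in WORD_KB.get(w, []))]


def sentence_field(words):
    """Step (4): score field symbols (assumed: simple count) and pick the best."""
    scores = Counter()
    for w in activating_words(words):
        for s in WORD_KB[w]:
            if s in FIELD_SYMBOLS:
                scores[s] += 1
    return scores.most_common(1)[0][0] if scores else None


def sentence_groups(paragraph):
    """Step (5): merge adjacent statements that share a field.

    paragraph is a list of word lists; returns (field, statement_count) pairs.
    """
    fields = [sentence_field(words) for words in paragraph]
    return [(field, len(list(run))) for field, run in groupby(fields)]


def text_field(title_words, paragraphs, title_weight=2.0):
    """Step (6): combine title, group size and group position (assumed weights)."""
    scores = Counter()
    title = sentence_field(title_words)
    if title is not None:
        scores[title] += title_weight
    position = 0
    for paragraph in paragraphs:
        for field, size in sentence_groups(paragraph):
            if field is not None:
                # Assumption: earlier statement groups count more than later ones.
                scores[field] += size / (1 + position)
            position += 1
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    title = ["Malaysia", "diving", "semi-finals"]
    body = [[["Olympic", "diving"], ["market"]], [["semi-finals"]]]
    print(text_field(title, body))
```

Step (2), the statement semantic analysis, is omitted because the abstract gives no detail from which it could be sketched.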

Description

Technical field

[0001] The present invention relates to the field of text information processing using computer science and technology, in particular to a system and method for acquiring the field of a text based on concept symbols.

Background technique

[0002] Text classification technology is the method and process of using computers to classify a text into one or more domain categories according to certain rules, knowledge and steps. The general method of text classification is to represent texts as feature vectors; when the angle between the feature vectors of two texts is below a certain threshold, the texts are classified into the same category. Generally, words are selected as the text features that form the feature vector of the text. The feature vector is mostly constructed with the TF*IDF method or the TF*IWF method derived from it. TF*IDF uses the frequency of occurrence of words in the document and in the document collection. The product of the reciprocal of th...
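The vector-space approach described in [0002] can be illustrated with a short sketch. The TF*IDF formula in the source is truncated, so the code assumes a common textbook variant (relative term frequency times the log of inverse document frequency); the function names, sample documents and the angle threshold are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF*IDF vector (dict) per document.

    docs is a list of word lists. Assumed weighting: (count / doc length)
    times log(N / document frequency), a common textbook variant.
    """
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append({w: (c / len(doc)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors


def angle_degrees(u, v):
    """Angle between two sparse vectors; 90 degrees if either is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 90.0
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))


def same_category(u, v, max_angle=80.0):
    """Two texts share a category when their angle is below the threshold."""
    return angle_degrees(u, v) < max_angle


if __name__ == "__main__":
    docs = [
        ["diving", "final", "medal"],
        ["diving", "final", "score"],
        ["stock", "market", "price"],
    ]
    vecs = tfidf_vectors(docs)
    print(same_category(vecs[0], vecs[1]), same_category(vecs[0], vecs[2]))
```

The threshold has to be tuned per corpus, and any category scheme still has to come from pre-classified training data, which is the dependency the background section criticizes.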


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 韦向峰, 黄曾阳, 张全, 缪建明
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI