Mixed text feature word extraction method

An extraction method and text technology, applied in the field of semantic networks. It addresses the problem that commonly used extraction methods cannot extract features from a single text and have low accuracy, and it achieves high application value, strict method conditions, and high accuracy.

Inactive. Publication Date: 2017-05-03
SICHUAN YONGLIAN INFORMATION TECH CO LTD
4 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0003] Commonly used text feature extraction methods require a large number of training sets as a premise for extraction; if only a single text is given, the text feature will not...

Examples

Embodiment Construction

[0024] Commonly used text feature extraction methods require a large number of training sets as a premise for extraction, so text features cannot be extracted when only one text is given, and the accuracy of these methods is not high. To solve these problems, the present invention is described in detail with reference to Figures 1-4, and its specific implementation steps are as follows:

[0025] Step 1: Use Chinese word segmentation technology to segment the text. The segmentation process is as follows:

[0026] Step 1.1: Using the "word segmentation dictionary", find the words in the sentence to be segmented that match dictionary entries: scan the entire Chinese character string to be segmented, search for matches in the system's dictionary, and mark each dictionary word that is encountered; if there is no relevant match in the dictionary, simply split the...
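The dictionary lookup in Step 1.1 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's own procedure: it uses forward maximum matching as the scanning strategy, and because the source text is cut off after "simply split the", the fallback of emitting a single unmatched character is an assumption. The names `segment`, `dictionary`, and `max_word_len` are hypothetical.

```python
def segment(text, dictionary, max_word_len=4):
    """Forward maximum matching (assumed strategy, not stated in the source).

    Returns a list of (token, matched) pairs, where `matched` marks
    tokens that were found in the dictionary.
    """
    tokens = []
    i = 0
    while i < len(text):
        token, matched = text[i], False  # assumed fallback: one character
        longest = min(max_word_len, len(text) - i)
        # Try the longest candidate first, then shorter ones.
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                token, matched = candidate, True
                break
        tokens.append((token, matched))
        i += len(token)
    return tokens


# Hypothetical three-word dictionary for illustration.
dictionary = {"文本", "特征", "提取"}
result = segment("文本特征词提取", dictionary)
```

Here the character 词 is not in the toy dictionary, so it is emitted on its own and flagged as unmatched, while the three dictionary words are marked as matched.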

Abstract

The invention discloses a mixed text feature word extraction method comprising the steps of: segmenting a text into words using a word segmentation technology; removing stop words by matching against a stop word list; obtaining a series of word position and word property weight values from research statistics; extracting the first word c(w1) and the m-1 words with the largest RE(ci, c(w1)) values, according to the information quantity of the words in the text and based on the above two factors; building a word network model based on a word semantic similarity method; and finally finding the text feature word set that meets an importance condition according to a neighborhood method. Compared with traditional text feature word extraction methods, the method provided by the invention has higher accuracy, stricter conditions, and higher application value. The method can extract the features of a text when only a single text is provided, without a text set or prior classification, and it computes the contribution of different words to the main idea of the text, providing a good theoretical basis for subsequent text similarity and text clustering.
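The stop-word step named in the abstract can be sketched as a simple list match. This is a minimal illustration only: the function name `remove_stop_words` and the sample stop word list are hypothetical, and the source does not specify which stop word list is used.

```python
def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that do not appear in the stop word list."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical stop word list and segmented sentence for illustration.
stop_words = {"这", "是", "的"}
filtered = remove_stop_words(["这", "是", "文本", "的", "特征"], stop_words)
```

Using a set for the stop word list keeps each membership check constant-time, which matters when the list is large.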

Description

Technical field

[0001] The invention relates to the technical field of semantic networks, and in particular to a method for extracting mixed text feature words.

Background

[0002] Text features are the set of words that best represent the subject of a text. They not only summarize the main content and subject of the text well but also reduce the complexity of text processing. Commonly used text feature extraction methods currently include term frequency-inverse document frequency (TF-IDF), information gain, and other methods. Because of its simple structure, TF-IDF cannot effectively reflect the importance of words or phrases or the distribution of feature values, so its accuracy is not very high. The information gain method is only suitable for extracting text features of one category and cannot be used for extracting text features of multiple categories. The above two text feature ex...
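To illustrate why a corpus-based method such as TF-IDF needs more than one text, the sketch below uses the common textbook formulation tf * log(N / df). This formulation is an assumption for illustration; the source does not give a formula, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each word in each tokenized document with tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


single = tf_idf([["a", "b"]])
pair = tf_idf([["a", "b"], ["a", "c"]])
```

With a single document, every word has N / df = 1, so every score is zero and no word can be distinguished from another. With two documents, the word shared by both still scores zero, while the words unique to one document receive a positive score.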

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/284, G06F40/30
Inventor: 金平艳
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD