New text feature vocabulary extraction method

A text feature extraction method in the field of semantic networks. It addresses the problem that commonly used extraction methods cannot extract features from a single text and have low accuracy, and it achieves high accuracy and good practical value.

Status: Inactive. Publication Date: 2017-05-03
SICHUAN YONGLIAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Commonly used text feature extraction methods require a large training set as a precondition: given only a single text, they cannot extract its features, and their precision is low. To address this, the present invention provides a new method for extracting text feature vocabulary.

Examples

Embodiment Construction

[0020] Commonly used text feature extraction methods require a large training set as a precondition, cannot extract features when only a single text is given, and have low accuracy. To solve this problem, the present invention is described in detail with reference to Figures 1 to 3. The specific implementation steps are as follows:

[0021] Step 1: Segment the text into words using Chinese word segmentation technology. The segmentation process is as follows:

[0022] Step 1.1: Using the "word segmentation dictionary", find the words in the sentence to be segmented that match dictionary entries: scan the Chinese character string to be segmented in full, look up candidates in the system dictionary, and mark every dictionary word encountered. If there is no relevant match in the dictionary, simply split the s...
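The dictionary lookup in Step 1.1 can be illustrated with a minimal sketch. The source does not say how competing matches are resolved, and the fallback for unmatched text is truncated, so this example assumes greedy forward longest-match and assumes that an unmatched character is emitted on its own. The function name `segment`, the `max_len` parameter, and the toy dictionary are all hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward longest-match segmentation against a dictionary.

    Assumption: longest match wins; the patent text does not specify this.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # Assumed fallback: no dictionary entry starts here,
            # so emit the single character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"文本", "特征", "提取", "方法"}
print(segment("文本特征提取的方法", toy_dictionary))
```

A production segmenter would normally also handle ambiguity and unknown words; this sketch covers only the scan-and-match idea.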


Abstract

The invention provides a new text feature vocabulary extraction method. The method comprises the following steps: segmenting the text into words using word segmentation technology; removing stop words from the vocabulary using a stop-word matching table; obtaining a series of word positions and word weight values from survey statistics; and, based on these two factors and the information content of the words in the text, extracting the first word c(w1) and the m-1 words with the largest RE(ci, c(w1)) values, which together form the feature vocabulary vector of the text. Compared with traditional text feature vocabulary extraction methods, the method has higher accuracy and greater application value: it can extract the features of a text when no text collection is provided, no categories have been assigned in advance, and only a single text is given. It calculates the contribution of different words to the main idea of the text and provides a sound theoretical basis for subsequent text similarity and text clustering work.
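The table-based filtering step can be sketched as ordinary stop-word removal. This is an illustration under assumptions, not the patented procedure: the function name `remove_stop_words` and the tiny stop-word table are hypothetical, and the position weights and the RE measure are not shown because the source does not define them here.

```python
def remove_stop_words(tokens, stop_words):
    """Keep only tokens that are absent from the stop-word table."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical stop-word table and token list, for illustration only.
stop_word_table = {"的", "了", "和"}
tokens = ["文本", "特征", "提取", "的", "方法"]
print(remove_stop_words(tokens, stop_word_table))
```

Storing the table as a set keeps each membership check constant-time on average, which matters once the table holds thousands of entries.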

Description

Technical field

[0001] The invention relates to the technical field of semantic networks, in particular to a new method for extracting text feature vocabulary.

Background

[0002] Text features are the set of words that best represents the subject of a text. They not only summarize the main content and subject of the text well, but also reduce the complexity of text processing. Commonly used text feature extraction methods include term frequency-inverse document frequency (TF-IDF), information gain, and others. The TF-IDF method has a simple structure and cannot effectively reflect the importance of words or phrases or the distribution of feature values, so its accuracy is not very high. The information gain method is only suitable for extracting text features of one category and cannot be used for extracting text features of multiple categories. The above two text feature extraction...
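The dependence of TF-IDF on a document collection can be made concrete with a short sketch. It assumes the textbook definition tf(w, d) × log(N / df(w)), which the source does not spell out; the function name `tf_idf` and the sample tokens are hypothetical. With a single document, N = df(w) = 1 for every word, so every weight collapses to zero, which is consistent with the limitation described above.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Assumed formula: (count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# A single document: every idf term is log(1/1) = 0.
single = tf_idf([["text", "feature", "text"]])
print(single)

# Two documents: only words that do not occur in both get a nonzero weight.
pair = tf_idf([["text", "feature"], ["text", "cluster"]])
print(pair)
```

Smoothed idf variants avoid exact zeros, but they still carry no corpus-level information when only one text is available.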

Application Information

IPC(8): G06F17/30
CPC: G06F16/3344, G06F16/335, G06F16/35, G06F16/36
Inventor: 金平艳
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD