Method for realizing text feature extraction based on improved small-world network model

A small-world network model implementation technique, applied to text feature extraction. It addresses two problems of existing methods: they do not consider the semantic status and contribution of feature words, and they lack data normalization. The stated effects are a lower result error rate, greater application value, and more standardized data processing.

Status: Inactive. Publication date: 2017-08-11
SICHUAN YONGLIAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

However, commonly used text feature extraction methods do not consider the semantic status of the feature vocabulary.

Examples


Embodiment Construction

[0028] Commonly used text feature extraction methods consider neither the semantic status of the feature vocabulary nor its contribution to expressing the main idea of the text, and they lack data normalization. To solve these problems, the present invention is described in detail with reference to Figures 1-4. Its specific implementation steps are as follows:

[0029] Step 1: Initialize the text corpus module and perform Chinese word segmentation preprocessing on the text W. The process is described as follows:

[0030] Word segmentation is combined with stop-word removal. The flow chart of the Chinese text preprocessing process is shown in Figure 2.

[0031] The word segmentation method used here is a Chinese automatic word segmentation algorithm based on information theory. The specific word segmentation and stop-word removal steps are as follows:

[0032] Step 1.1: Use the stop table to proces...
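The stop-word removal part of this preprocessing can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens (the information-theoretic segmentation algorithm itself is not reproduced here), and the function name `remove_stop_words`, the sample tokens, and the tiny stop-word table are all hypothetical.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the stop-word table, preserving order."""
    stop_set = set(stop_words)  # set lookup keeps filtering linear in len(tokens)
    return [t for t in tokens if t and t not in stop_set]


# Hypothetical pre-segmented sentence and an illustrative stop-word table.
tokens = ["本", "发明", "的", "文本", "特征", "提取", "方法"]
stop_words = ["的", "了", "和"]

filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

In practice the stop-word table would be loaded from a file and applied to every segmented document in the corpus before any weighting is computed.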


Abstract

The invention discloses a method for realizing text feature extraction based on an improved small-world network model. Following a Chinese word segmentation preprocessing step, the method determines vocabulary position weights and word class weights and combines them with a HowNet-based relevancy algorithm for word pairs and a word-to-text importance measure to define a semantic relevancy function. This function is normalized so that values are computed on a consistent scale. Two parameters, a density parameter and a weight parameter, are set for the lexical semantic network model graph; the two are fused, and an appropriate threshold is set to extract the text feature vocabulary. The method has higher accuracy and overcomes the limitation of traditional methods that are only suitable for extracting text features of a single category. It has higher application value: the contribution of different words to the main idea of the text can be calculated precisely, data processing is more standardized, the result error rate is lowered, and the constructed lexical semantic network model graph conforms better to the actual situation. It also provides a sound theoretical basis for subsequent text clustering.
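The fusion-and-threshold idea in the abstract can be sketched as follows. This excerpt does not give the patent's actual formulas, so everything here is an assumption made for illustration: min-max normalization of edge relevancies, density taken as a node's degree divided by the maximum possible degree, weight taken as the mean normalized relevancy of a node's edges, a linear fusion controlled by a hypothetical `alpha`, and the function names themselves.

```python
from collections import defaultdict


def min_max_normalize(values):
    """Rescale a dict of raw scores into [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def extract_features(edges, alpha=0.5, threshold=0.5):
    """Select words whose fused density/weight score reaches the threshold.

    edges maps (word_a, word_b) to a raw semantic relevancy value.
    Returns (sorted selected words, score per word).
    """
    norm = min_max_normalize(edges)
    neighbours = defaultdict(dict)
    for (a, b), w in norm.items():  # build an undirected weighted graph
        neighbours[a][b] = w
        neighbours[b][a] = w

    n = len(neighbours)
    scores = {}
    for word, adj in neighbours.items():
        density = len(adj) / (n - 1) if n > 1 else 0.0  # assumed: relative degree
        weight = sum(adj.values()) / len(adj)           # assumed: mean edge relevancy
        scores[word] = alpha * density + (1 - alpha) * weight  # assumed: linear fusion
    selected = sorted(w for w, s in scores.items() if s >= threshold)
    return selected, scores


# Hypothetical relevancy values between four words.
edges = {
    ("text", "feature"): 0.9,
    ("text", "network"): 0.6,
    ("feature", "network"): 0.3,
    ("network", "today"): 0.1,
}
selected, scores = extract_features(edges)
print(selected)
```

With these made-up values, the weakly connected word "today" falls below the threshold while the three well-connected words are kept. Changing `alpha` shifts the balance between how many neighbours a word has and how strongly it relates to them.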

Description

Technical field

[0001] The invention relates to the technical field of semantic networks, in particular to a text feature extraction method based on an improved small-world network model.

Background technique

[0002] At present, traditional keyword extraction algorithms attend only to the surface statistical characteristics of a document (such as word frequency, word position, and word length) and ignore its semantic and structural information, so the extracted keywords lack semantic and structural information. Existing keyword extraction algorithms based on word networks do use the structural information of the document to a certain extent, but their use of semantic information is still insufficient, and the network construction process depends too heavily on the granularity of word segmentation. The information gain method is only suitable for extracting text features of one category, but cannot be used ...


Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/216, G06F40/242, G06F40/284, G06F40/30
Inventor: 金平艳
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD