
New small world network model realization text feature extraction method

A small-world network model implementation technology, applied in the field of text feature extraction based on small-world network models. It addresses problems such as ignoring the semantic status of feature words, the lack of semantic and structural information in keywords, and the neglect of the semantic and structural information of documents. The claimed effects are high utilization value, standardized data processing, and high accuracy.

Status: Inactive. Publication date: 2017-08-18
SICHUAN YONGLIAN INFORMATION TECH CO LTD
Cites: 3; cited by: 2

AI Technical Summary

Problems solved by technology

However, commonly used text feature extraction methods do not consider the semantic status of feature words or their contribution to expressing the text, and they ignore the semantic and structural information of the document. As a result, the extracted keywords lack semantic and structural information.




Embodiment Construction

[0027] To solve the problem that commonly used text feature extraction methods do not consider the semantic status of feature words or their contribution to expressing the text, the present invention is described in detail below with reference to Figures 1 to 4. The specific implementation steps are as follows:

[0028] Step 1: Initialize the text corpus module and perform Chinese word segmentation preprocessing on the text W, as follows:

[0029] The Chinese text preprocessing combines word segmentation with stop-word removal; its flow chart is shown in Figure 2.

[0030] Word segmentation uses a Chinese automatic word segmentation algorithm based on information theory. The specific word segmentation and stop-word removal steps are as follows:

[0031] Step 1.1: Use the stop-word table to remove stop words from the text.

[0032] Step 1.2: According to...
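Step 1.1 only states that a stop-word table is applied. The following is a minimal sketch of that filtering, assuming the text has already been split into tokens (the information-theoretic segmenter from [0030] is not reproduced here) and using a small hypothetical stop-word set `STOP_WORDS`; the real stop table is not given in this excerpt.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word table; the patent's actual stop table is not given here.
STOP_WORDS: Set[str] = {"的", "了", "是", "在", "和"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop stop words and whitespace-only tokens, preserving the original order."""
    return [t for t in tokens if t.strip() and t not in stop_words]


if __name__ == "__main__":
    # Assumed output of a Chinese word segmenter for one sentence.
    segmented = ["小世界", "网络", "的", "文本", "特征", "提取", "是", "关键"]
    print(remove_stop_words(segmented))
```

A set is used for the table so that each membership test is constant time on average, which matters when the filter runs over a whole corpus.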


Abstract

The invention discloses a text feature extraction method realized by a new small-world network model. Based on a Chinese word segmentation preprocessing process, the position weight and part-of-speech weight of each word are determined. A semantic correlation function, defined in the specification, is obtained by combining a HowNet-based algorithm for the correlation between two words with a measure of each word's importance to the text. All of these functions are normalized, which makes the value-calculation conditions more standardized and stricter. Two parameters, a density parameter and an edge weight parameter, are set for the lexical semantic network graph, and suitable thresholds are chosen for them; words that satisfy these conditions are taken as text feature words. The method achieves higher accuracy, overcomes the limitation that the information gain method is only suitable for extracting features of a single text category, and has higher application value. It can accurately calculate the contribution of different words to the main idea of the text, makes data processing more standardized, produces a lexical semantic network graph that better matches actual conditions, and provides a sound theoretical basis for subsequent text clustering.
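The abstract names the ingredients (position and part-of-speech weights, a normalized correlation function, a lexical semantic network graph, and thresholds on a density parameter and an edge weight parameter) but this excerpt does not give their formulas. The sketch below is a hypothetical illustration of how such threshold-based selection could be wired together, not the patent's method. Every name and formula in it is an assumption: the POS weight table, `TITLE_BONUS` as the position weight, normalization by the maximum, windowed co-occurrence as a stand-in for the HowNet-based correlation (HowNet is not used), and a density parameter defined as node weight times weighted degree divided by (n - 1).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple, TypeVar

K = TypeVar("K")
Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag) pairs after preprocessing

# Hypothetical weights; the patent's actual values are not given in this excerpt.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_POS_WEIGHT = 0.3
TITLE_BONUS = 2.0  # hypothetical position weight for words that also appear in the title


def max_normalize(scores: Dict[K, float]) -> Dict[K, float]:
    """Scale positive scores into (0, 1] by dividing by the largest score."""
    if not scores:
        return {}
    top = max(scores.values())
    return {k: v / top for k, v in scores.items()}


def word_weights(tagged: Tagged, title_words: Set[str]) -> Dict[str, float]:
    """Sum position weight x POS weight over each word's occurrences, then normalize."""
    raw: Dict[str, float] = defaultdict(float)
    for word, pos in tagged:
        position_w = TITLE_BONUS if word in title_words else 1.0
        raw[word] += position_w * POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT)
    return max_normalize(raw)


def edge_weights(tagged: Tagged, window: int = 3) -> Dict[Tuple[str, str], float]:
    """Stand-in for the HowNet-based correlation: co-occurrence within `window` words."""
    words = [w for w, _ in tagged]
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, a in enumerate(words):
        for b in words[i + 1 : i + window]:
            if a != b:
                counts[(min(a, b), max(a, b))] += 1.0
    return max_normalize(counts)


def select_feature_words(tagged: Tagged, title_words: Set[str],
                         density_threshold: float, edge_threshold: float) -> List[str]:
    """Keep words whose density parameter and strongest edge both reach their thresholds."""
    weights = word_weights(tagged, title_words)
    n = len(weights)
    strength: Dict[str, float] = defaultdict(float)   # sum of incident edge weights
    strongest: Dict[str, float] = defaultdict(float)  # largest incident edge weight
    for (a, b), w in edge_weights(tagged).items():
        for node in (a, b):
            strength[node] += w
            strongest[node] = max(strongest[node], w)
    selected = []
    for word, ww in weights.items():
        # Hypothetical density parameter: node weight x weighted degree / (n - 1).
        density = ww * strength[word] / max(n - 1, 1)
        if density >= density_threshold and strongest[word] >= edge_threshold:
            selected.append(word)
    # Highest-weighted words first; ties broken by code-point order for determinism.
    return sorted(selected, key=lambda w: (-weights[w], w))


if __name__ == "__main__":
    doc = [("小世界", "n"), ("网络", "n"), ("模型", "n"), ("提取", "v"),
           ("文本", "n"), ("特征", "n"), ("网络", "n"), ("文本", "n")]
    print(select_feature_words(doc, {"网络", "文本"}, density_threshold=0.1, edge_threshold=0.5))
```

In this sketch, raising `density_threshold` keeps only words that are both heavily weighted and well connected, while raising `edge_threshold` additionally requires at least one strong link to another word. Because every score is normalized into (0, 1], both thresholds live on the same scale, which mirrors the abstract's emphasis on normalization.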

Description

Technical field [0001] The invention relates to the technical field of semantic networks, in particular to a method for extracting text features realized by a new small-world network model.

Background [0002] Commonly used text feature extraction methods currently include term frequency-inverse document frequency (TF-IDF), information gain, and mutual information. The simple structure of TF-IDF cannot effectively reflect the importance of words or phrases or the distribution of feature values, so its accuracy is not very high. The information gain method is only suitable for extracting text features of a single category and cannot be used for extracting text features of multiple categories. The mutual information method considers the ratio of the probability of occurrence within a category to the probability of occurrence in the whole collection, which introduces a defect: the difference in the number of texts in the category collection w...
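To make the criticism of TF-IDF concrete, here is a minimal sketch of the baseline using one common variant (tf = count / document length, idf = ln(N / df)); other smoothing variants exist, and this is not code from the patent. It shows that the score depends only on counts, with no notion of a word's semantics, position, or part of speech.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score each word of each tokenized document with tf x ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()})
    return scores


if __name__ == "__main__":
    corpus = [["网络", "模型", "网络"], ["文本", "特征"], ["网络", "文本"]]
    print(tf_idf(corpus)[0])
```

In the first document, "网络" occurs twice but also appears elsewhere in the corpus, so it scores lower than "模型", which occurs once but only there. Nothing in the formula can tell whether either word is semantically central to the text, which is the gap the small-world network model targets.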


Application Information

Patent type & authority: Applications (China)
IPC (8): G06F17/27; G06F17/30
CPC: G06F16/35; G06F40/216; G06F40/242; G06F40/284; G06F40/30
Inventor: 金平艳
Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD