Improved algorithm for extracting text feature by small-world model

An improved small-world model algorithm for extracting text features. It addresses the problems that common methods ignore the semantic and structural information of documents and do not consider the semantic status of feature words, which leaves keywords without semantic and structural information. The stated effects are high accuracy, high precision, and high application value.

Inactive Publication Date: 2017-12-01
SICHUAN YONGLIAN INFORMATION TECH CO LTD
Cites: 3; Cited by: 2

AI Technical Summary

Problems solved by technology

Commonly used text feature extraction methods consider neither the semantic status of feature words nor their contribution to expressing the text. They also ignore the semantic and structural information of the document, so the extracted keywords lack semantic and structural information.


Embodiment Construction

[0025] To address the problem that commonly used text feature extraction methods do not consider the semantic status of feature words or their contribution to expressing the text, and that keywords therefore lack semantic and structural information, the present invention is described in detail below with reference to Figures 1 to 4. The specific implementation steps are as follows:

[0026] Step 1: Initialize the text corpus module and perform Chinese word segmentation preprocessing on the text W, as described below:

[0027] Preprocessing combines word segmentation and stop-word removal. The flow chart of the Chinese text preprocessing process is shown in Figure 2.

[0028] Word segmentation uses a Chinese automatic word segmentation algorithm based on information theory. The specific word segmentation and stop-word removal steps are as follows:

[0029] Step 1.1: Use the stop-word table to process t...
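
The stop-word removal part of Step 1 can be illustrated with a minimal sketch. It assumes the text has already been segmented into tokens; the information-theory-based segmentation algorithm named above is not reproduced here. The function name `remove_stop_words`, the sample tokens, and the two-entry stop-word table are all hypothetical and chosen only for illustration.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are non-blank and not in the stop-word table.

    Assumption: `tokens` is the output of a separate word segmentation step.
    """
    stop = set(stop_words)  # set lookup keeps the filter linear in len(tokens)
    return [t for t in tokens if t.strip() and t not in stop]


# Hypothetical pre-segmented Chinese tokens and a tiny stop-word table.
tokens = ["文本", "的", "特征", "提取", "是", "关键"]
stop_table = {"的", "是"}

filtered = remove_stop_words(tokens, stop_table)
print(filtered)
```

A real stop-word table would be loaded from a file and would be far larger; the filtering logic stays the same.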

Abstract

The invention discloses an improved algorithm for extracting text features using a small-world model. After Chinese word segmentation preprocessing, the position weight and part-of-speech weight of each word are determined, and a semantic relatedness function R(c1, c2) is determined by combining a HowNet-based relatedness algorithm for two words with a method for measuring a word's importance to the text. A word semantic network model graph is then constructed based on clustering characteristics. Two parameters, a density parameter and an edge weight parameter, are set, each with its own appropriate threshold, and words that meet the conditions are taken as text feature words. The algorithm offers better accuracy and application value, overcomes the limitation that the information gain method is only suitable for extracting text features of a single category, can accurately calculate the contribution of different words to the main idea of the text, and provides a good theoretical basis for subsequent text clustering.
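
The thresholding idea in the abstract can be sketched as follows. This is not the patented procedure: the abstract does not define the density parameter, so the sketch assumes it is the fraction of other words a word is linked to, and it replaces the HowNet-based R(c1, c2) with a hypothetical lookup table (`TOY_R`). The function names and threshold values are likewise illustrative assumptions.

```python
from itertools import combinations


def build_network(words, relatedness, edge_threshold):
    """Link two words when their relatedness reaches the edge weight threshold."""
    graph = {w: {} for w in words}
    for a, b in combinations(words, 2):
        r = relatedness(a, b)
        if r >= edge_threshold:
            graph[a][b] = r
            graph[b][a] = r
    return graph


def select_features(graph, density_threshold):
    """Keep words whose assumed density (degree / (n - 1)) reaches the threshold."""
    n = len(graph)
    if n < 2:
        return []
    return [w for w, nbrs in graph.items()
            if len(nbrs) / (n - 1) >= density_threshold]


# Hypothetical relatedness scores standing in for the HowNet-based R(c1, c2).
TOY_R = {
    frozenset(("text", "feature")): 0.8,
    frozenset(("text", "semantic")): 0.7,
    frozenset(("feature", "semantic")): 0.6,
    frozenset(("text", "weather")): 0.1,
}


def toy_relatedness(a, b):
    return TOY_R.get(frozenset((a, b)), 0.0)


words = ["text", "feature", "semantic", "weather"]
graph = build_network(words, toy_relatedness, edge_threshold=0.5)
features = select_features(graph, density_threshold=0.5)
print(features)
```

In this toy input, "weather" has no edge above the edge weight threshold, so its density is zero and it is not selected as a feature word.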

Description

Technical field

[0001] The invention relates to the technical field of semantic networks, in particular to an improved small-world model algorithm for extracting text features.

Background art

[0002] In the era of information explosion, the amount of information increases exponentially. Faced with massive amounts of text, quickly grasping the theme of an article and the author's ideas has become key to saving readers' time and improving their reading speed. Text features, which embody the theme of the article and the author's ideas, can effectively solve this problem. However, the vast majority of articles on the Internet do not provide keywords. Marking these texts by manual indexing is time-consuming, laborious, and inefficient, and it is also relatively subjective.

[0003] Currently commonly used text feature extraction methods include the term frequency-inverse document frequency method (TF-IDF), the information gain method, mutual ...
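
For context, the TF-IDF baseline named in the background can be sketched in a few lines. This uses one common variant (term frequency normalized by document length, with idf = log(N / df)); the patent does not specify which variant it has in mind, and the function name and toy corpus are hypothetical.

```python
import math


def tf_idf(term, doc, corpus):
    """Score `term` in `doc` using tf = count / len(doc) and idf = log(N / df)."""
    if not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # number of documents containing term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical pre-segmented corpus of three short documents.
corpus = [
    ["small", "world", "model"],
    ["text", "feature", "model"],
    ["text", "model"],
]

score_model = tf_idf("model", corpus[0], corpus)  # appears in every document
score_small = tf_idf("small", corpus[0], corpus)  # appears in one document
print(score_model, score_small)
```

The score depends only on word counts, which illustrates the limitation the background raises: purely frequency-based methods carry no information about a word's semantic role or position in the document.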

Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/284, G06F40/30
Inventor 金平艳
Owner SICHUAN YONGLIAN INFORMATION TECH CO LTD