A text representation method based on WT-GloVe word vector construction

A text representation and word vector technology, applied in the fields of natural language processing, data mining and text classification, which addresses the problems of complex calculation and insufficient representation of text information.

Active Publication Date: 2021-09-10
XIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a text representation method based on WT-GloVe word vector construction, which solves the problem that traditional text representation methods in the prior art are either computationally complex or do not represent text information comprehensively enough.



Examples


Embodiment Construction

[0056] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0057] The flow chart of the text representation method based on WT-GloVe word vector construction of the present invention is shown in Figure 1. The specific steps are as follows:

[0058] Step 1. Evaluate the importance of each feature of the network text by calculating its word distance, judge the feature's contribution to the category according to its inter-class distribution, and combine the two into a feature weighting model based on word distance and inter-class distribution, called WDID-TFIDF. Step 1 is implemented as follows:

[0059] Load the 20NewsGroups data set, import the required modules, specify the GloVe model, and set the training data storage path and the encoding format; define functions, introduce the general English stop word list, perform word segmentation o...
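The preprocessing part of this step (word segmentation followed by stop word removal) can be illustrated with a minimal sketch. It is not the patent's code: the stop word list, the function names `tokenize` and `preprocess`, and the two sample sentences are all hypothetical, and the 20NewsGroups data set is not actually loaded here.

```python
import re

# Hypothetical, tiny stop word list for illustration only; the patent refers
# to a general English stop vocabulary without naming a specific one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize the text and drop every token found in the stop word list."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


# Two made-up sentences standing in for documents from the data set.
docs = ["The GPU is rendering the scene.", "Hockey players are on the ice."]
tokens = [preprocess(d) for d in docs]
print(tokens)
```

In a real run the sample sentences would be replaced by the loaded documents, and the stop word list by whichever general English list is chosen.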


Abstract

The invention discloses a text representation method based on WT-GloVe word vector construction. First, the importance of each feature of the network text is evaluated by calculating its word distance, and its contribution to the category is judged according to the inter-class distribution of the feature; the two are combined into a feature weighting model based on word distance and inter-class distribution, called WDID-TFIDF. Then, to address the shortcomings of the GloVe model, irrelevant words are filtered out to improve the quality of word vector training. Finally, the corresponding word distance and inter-class distribution feature weights are selected according to the results and applied by dot multiplication to obtain the weighted word vector model, which is the final text representation. The invention solves the problem that traditional text representation methods in the prior art are either computationally complex or do not represent text information comprehensively enough.
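The final weighting step can be sketched roughly as follows. The sketch makes several assumptions that are not in the source: plain TF-IDF stands in for WDID-TFIDF (whose formula is not given in this excerpt), a hand-written two-dimensional lookup table stands in for trained GloVe vectors, and the weighted vectors are summed to give one vector per document. All function and variable names are hypothetical.

```python
from collections import Counter
import math


def tfidf_weights(docs):
    """Plain TF-IDF per document: (count / doc length) * log(N / df).

    Assumption: this is a stand-in for the WDID-TFIDF weights.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        if not doc:
            weights.append({})
            continue
        tf = Counter(doc)
        weights.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return weights


def weighted_text_vector(doc_weights, vectors, dim):
    """Scale each word vector by its weight and sum the results.

    Assumption: summation is used as the aggregation; words without a
    vector are skipped.
    """
    total = [0.0] * dim
    for word, weight in doc_weights.items():
        vec = vectors.get(word)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
    return total


# Toy tokenized documents and toy 2-dimensional "word vectors".
docs = [["gpu", "render", "gpu"], ["hockey", "ice", "render"]]
vectors = {
    "gpu": [1.0, 0.0],
    "render": [0.5, 0.5],
    "hockey": [0.0, 1.0],
    "ice": [0.0, 0.8],
}

weights = tfidf_weights(docs)
text_vecs = [weighted_text_vector(w, vectors, dim=2) for w in weights]
print(text_vecs)
```

Note that "render" occurs in both toy documents, so its plain TF-IDF weight is zero and it contributes nothing to either text vector; a weighting that also accounts for word distance and inter-class distribution would behave differently.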

Description

Technical field

[0001] The invention belongs to the technical fields of natural language processing, data mining and text classification, and in particular relates to a text representation method based on WT-GloVe word vector construction.

Background

[0002] The rapid development of the Internet industry has given rise to a large number of industries such as social networks and the mobile Internet, and the continuous growth in the number of websites worldwide has produced an explosion of information. Spam filtering of emails, question classification in question answering systems, identification of query information in search engines, judgment of positive and negative product sentiment on shopping websites, analysis of public opinion in government systems, discovery of new topics in social media, monitoring of online public opinion, and so on all require continuous updating of the technologies for processing ultra-large-scale text data sets. At the sam...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06F40/20
CPC: G06F40/20, G06F18/22, G06F18/24, G06F18/214
Inventor: 姚全珠, 古倩, 费蓉, 赵佳瑜, 李莎莎
Owner: XIAN UNIV OF TECH