A method and apparatus for enriching short text semantics based on graph vector representation

A short-text representation technology, applied in semantic analysis, natural language data processing, instruments, etc. It addresses the problems of external noise, interference with text information mining, and information clutter.

Active Publication Date: 2019-03-29
SUN YAT SEN UNIV
3 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

These two methods may cause information clutter, introduce external noise, and interfere with the mining of text information.

Method used

Adjacent words in each short text are connected to form a chain, the chains formed by different short texts are linked through shared keywords to build a word graph, and a graph vector representation algorithm is applied to the word graph to obtain a vector representation of each node for use in a machine learning model.

Examples

[0089] A practical example of the present invention is as follows:

[0090] S10: perform word segmentation and stop-word removal on the input short text corpus data; see Figure 3, a sample of the word segmentation output.
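
As a rough illustration of S10, the sketch below tokenizes two short texts and drops stop words. The regex tokenizer, the `STOP_WORDS` set, and the `preprocess` helper are assumptions made for this example; the source does not name a segmenter or a stop-word list, and Chinese text would need a dedicated word segmenter instead of a regex.

```python
import re

# Hypothetical stop-word list for illustration; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "for"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]

corpus = ["The price of the phone is great", "Great phone for the price"]
segmented = [preprocess(doc) for doc in corpus]
print(segmented)
```

Each short text becomes an ordered list of content words, which is the form the later graph-building step needs.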

[0091] S20: after the data has been segmented into words, connect adjacent words in pairs so that each short text becomes a chain. Chains from different texts are joined through the keywords they share, and together they form a graph; see Figure 4, a partial sample of the word graph.
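
A minimal sketch of S20, assuming an undirected graph stored as an adjacency dictionary (the source does not state whether edges are directed). Because nodes are keyed by the word itself, a keyword that appears in two texts automatically joins their chains. The `build_word_graph` name and the sample texts are hypothetical.

```python
from collections import defaultdict

def build_word_graph(segmented_texts):
    """Link adjacent words of every text; shared words merge the chains."""
    graph = defaultdict(set)
    for tokens in segmented_texts:
        for word in tokens:
            graph[word]  # register the node even if it has no neighbor
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # assumption: skip self-loops from repeated words
                graph[left].add(right)
                graph[right].add(left)
    return dict(graph)

texts = [["cheap", "phone", "battery"], ["battery", "life", "short"]]
graph = build_word_graph(texts)
for word, neighbors in graph.items():
    print(word, sorted(neighbors))
```

Here the word "battery" occurs in both texts, so it is the node that connects the two chains.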

[0092] S30: perform a random walk on the word graph. Starting from each node in turn, randomly select a word connected to the current node as the next node, repeat this random selection, and stop once the walk reaches the specified walk length; see Figure 5, a sample of the walk results for part of the word graph.
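
The walk in S30 can be sketched as follows, assuming uniform neighbor selection and a small hand-written graph. The `random_walks` function and its `walks_per_node` and `seed` parameters are hypothetical additions for this example, not details from the source.

```python
import random

def random_walks(graph, walk_length, walks_per_node=1, seed=0):
    """Start a walk at every node and step to a random neighbor each time."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            while len(walk) < walk_length:
                # Sort so the result depends only on the seed, not set order.
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:
                    break  # isolated node: nothing to walk to
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

graph = {
    "cheap": {"phone"},
    "phone": {"cheap", "battery"},
    "battery": {"phone", "life"},
    "life": {"battery", "short"},
    "short": {"life"},
}
walks = random_walks(graph, walk_length=4)
for walk in walks:
    print(" -> ".join(walk))
```

Every walk is a word sequence in which consecutive words are neighbors in the graph, so a walk can cross from one short text into another through a shared keyword.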

[0093] S40: compute vector representations from the obtained sequences. The model structure of the vectorized representation is shown in Figure 6; it is divided int...
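
The description of the S40 model is cut off in this excerpt, so its structure cannot be recovered here. One common way to learn node vectors from walk sequences is a skip-gram style objective; whether the patent uses it is an assumption. Under that assumption, the sketch below only prepares (center, context) training pairs from the walks. The `skipgram_pairs` name and the `window` parameter are hypothetical.

```python
def skipgram_pairs(walks, window=2):
    """Pair each node with the nodes at most `window` steps away in its walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

walks = [["cheap", "phone", "battery"]]
pairs = skipgram_pairs(walks, window=1)
print(pairs)
```

A model trained on such pairs would place words that co-occur in walks close together in vector space, which is the property the later machine learning model relies on.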

Abstract

The invention discloses a method and apparatus for enriching short text semantics based on graph vector representation; the apparatus is used to implement the method. The method includes: performing word segmentation and stop-word removal on short text corpus data; connecting adjacent words in the processed corpus data to form a word graph; performing a random walk on the word graph, generating a sequence step by step from each node to the next, and stopping the walk when the text chain reaches the specified length, so as to obtain the sequences of all nodes; inputting the acquired node sequences to the vectorization representation model, which computes a vector representation for every node; and outputting the vector representations corresponding to all nodes. The invention constructs a chain by connecting adjacent words in a short text, constructs a graph by linking the chains formed by different short texts through shared keywords, and applies a graph vector representation algorithm to the constructed word graph to obtain a vector representation of each node, which can then be used in a machine learning model.

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a method and device for enriching short text semantics based on graph vector representation.

Background

[0002] Short text is a carrier of fast information transmission. Examples include Weibo posts, comments, search queries, and news recommendations, all of which play an important role in people's daily lives. A lot of valuable information can be extracted from these data. For example, short texts from Weibo can be used for online public opinion analysis and hot topic discovery, short texts from user comments can be used to optimize recommendation algorithms and marketing strategies, and short texts from searches can be used for user profile analysis, such as inferring a user's age, gender, and education, so as to provide users with better and more personalized services. Text classification is an important means of extracting potential information of short texts and mining their h...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/30, G06F40/284
Inventors: 郑子彬, 马璐
Owner: SUN YAT SEN UNIV