Method and device for determining semantic similarity of text based on graph data

Graph data and similarity technology, applied in the computer field, addresses the limited effectiveness of text similarity models trained in settings with complex semantic expression and large data volumes, and provides an effective method for determining the semantic similarity of text.

Active Publication Date: 2022-04-12
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] In some scenarios with large data volumes and complex semantic expressions (such as cloud customer service), a large corpus has accumulated, but high-quality annotated data is difficult to collect for any single business, which limits how well a text similarity model can be trained.

Detailed Description of Embodiments

[0046] The solutions provided in this specification will be described below in conjunction with the accompanying drawings.

[0047] Figure 1 shows a schematic diagram of graph data suitable for the implementation architecture of this specification. This graph data describes the sentences and words in a corpus and the relationships between them. In the graph data, each sentence and each word corresponds to a node, and each node can be represented by a corresponding node vector. Relationships between nodes are represented by connecting edges. The corpus can include data obtained from various channels or sources, such as news corpora, everyday chat corpora, diplomatic corpora, professional academic material (such as agricultural, medical, and other specialist corpora), customer service corpora from various network platforms, and so on.
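The node-and-edge structure described in [0047] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `build_graph`, whitespace tokenisation, random initial node vectors, and the rule that two words are linked when they co-occur in a sentence are all assumptions made for the example.

```python
import random
from itertools import combinations


def build_graph(sentences, dim=4, seed=0):
    """Build a toy sentence/word graph (hypothetical helper).

    Returns (vectors, edges): vectors maps a node id to its node vector,
    edges is a set of undirected edges stored as pairs of node ids.
    """
    rng = random.Random(seed)
    vectors, edges = {}, set()

    def add_node(node_id):
        # Assumption: node vectors start out random; the text above does
        # not say how they are initialised or trained.
        if node_id not in vectors:
            vectors[node_id] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    for i, sentence in enumerate(sentences):
        s_id = ("sentence", i)
        add_node(s_id)
        # Assumption: lowercase whitespace tokenisation, duplicates dropped.
        words = sorted(set(sentence.lower().split()))
        for w in words:
            add_node(("word", w))
            edges.add((s_id, ("word", w)))  # sentence-word edge
        for a, b in combinations(words, 2):
            # Assumption: words are linked when they share a sentence.
            edges.add((("word", a), ("word", b)))  # word-word edge
    return vectors, edges


vectors, edges = build_graph(["reset my password", "change my password"])
print(len(vectors), "nodes,", len(edges), "edges")
```

Storing edges as a set of id pairs keeps the sketch small; a real system would more likely use an adjacency list or a graph library so that neighbours of a node can be looked up directly.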

[0048] In Figure 1, graph data is given ...

Abstract

The embodiments of this specification provide a technical concept for using graph data to determine the semantic similarity of text. Under this concept, the nodes in the graph data correspond to the sentences and words in a corpus; associated sentences and words, and associated words and words, are connected by connecting edges; and each node corresponds to a node expression vector that expresses the semantic information of the corresponding word or sentence. During preprocessing, a large amount of unsupervised data can be used to construct the graph data, which describes the relationships between words and sentences and between words and words. The parameters of the processing model can then be optimized with a small amount of supervised data so that the vector representations of similar texts can interact with each other, which allows texts and vocabulary to be represented effectively as vectors through the graph data. To determine the semantic similarity of texts, the vectors of the texts in question are obtained through the graph data, and the semantic similarity is determined from the similarity of those vectors. In this way, the generality, accuracy, and effectiveness of text semantic similarity determination can be improved.
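The final step of the abstract, comparing texts by the similarity of their vectors, can be illustrated with a short sketch. The abstract does not name a similarity measure, so cosine similarity is used here purely as an assumption; the function names `cosine_similarity` and `most_similar` and the example vectors are hypothetical, and the vectors are taken as given rather than derived from a trained graph model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(query_vec, candidates):
    """Return the (text, score) pair whose vector is closest to query_vec.

    candidates maps each candidate text to its vector.
    """
    scored = ((text, cosine_similarity(query_vec, vec))
              for text, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Made-up vectors standing in for node vectors read from the graph data.
standard_questions = {
    "How do I reset my password?": [0.9, 0.1],
    "How do I close my account?": [0.0, 1.0],
}
best_text, score = most_similar([1.0, 0.0], standard_questions)
print(best_text)
```

In a customer service setting of the kind the specification mentions, the candidates would be the standard questions and the query vector would represent the user's question.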

Description

Technical field

[0001] One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for expressing text as vectors based on graph data, and a computer-implemented method and device for determining text similarity based on graph data.

Background

[0002] With the development of artificial intelligence technology, more and more business tasks can be completed by machine learning models. Natural language processing with machine learning models is also an important research direction. For example, in intelligent customer service, it is usually necessary to identify the semantics of a text and determine the standard question that corresponds to the user's question, so that the user can be given an appropriate answer. Many such schemes involve the problem of text similarity, that is, the degree of similarity between the natural language expression text of the user's question and the stan...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06Q30/00, G06F16/33, G06F40/289, G06F40/30
CPC: G06Q30/01, G06F16/3344
Inventor: 杨明晖, 崔恒斌, 陈晓军, 陈显玲
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD