Sentence similarity calculation method based on edge information and semantic information

A sentence similarity technique that uses edge information and semantic information, applicable to special data processing applications, unstructured text data retrieval, text database querying, and similar fields. It addresses the problems of high computational complexity and low accuracy, and improves calculation accuracy.

Active Publication Date: 2020-04-10
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a sentence similarity calculation method based on edge information and semantic information, in order to overcome the low accuracy and high computational complexity of existing sentence similarity calculation methods.

Examples

Embodiment 1

[0069] As shown in Figure 1, a method for calculating sentence similarity based on edge information and semantic information includes the following steps:

[0070] S1: Input the sentences to be compared, and calculate the sentence length difference;

[0071] S2: Perform text preprocessing on the sentences to be compared, and dynamically generate the first word pair vector and the second word pair vector;

[0072] S3: Calculate the similarity of the first word pair vector and the second word pair vector based on the edge information and semantic information to obtain the mixed similarity;

[0073] S4: Use the overall similarity variable to correct the similarity vector and obtain the corrected vector;

[0074] S5: Calculate the sentence dependency variable from the first word pair vector and the second word pair vector through the dependency model;

[0075] S6: Use the sentence dependency variable and the sentence length difference to further correct the corrected ve...
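
Step S1 can be illustrated with a minimal sketch. The source does not state how sentence length is measured, so this example assumes the length is the number of whitespace-separated tokens and the difference is taken as an absolute value. The function name `sentence_length_difference` is hypothetical.

```python
def sentence_length_difference(sentence_a: str, sentence_b: str) -> int:
    """Absolute difference in token counts between two sentences.

    Assumption: length is the number of whitespace-separated tokens.
    The patent text shown here does not define the measure.
    """
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()
    return abs(len(tokens_a) - len(tokens_b))


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the patent.
    print(sentence_length_difference("A cat sits on the mat", "A dog sleeps"))
```

A character-based or morpheme-based length would fit the same interface; only the tokenization line would change.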

Embodiment 2

[0106] More specifically, building on Embodiment 1 and as shown in Figure 3, the Pearson coefficient of the algorithm proposed by the present invention is higher than that of the existing traditional algorithm, indicating better performance. Table 1 lists the R&G word pair similarity data:

[0107] Table 1 R&G word pair similarity data table

[0111] The table above gives the R&G word pair similarity values calculated using this algorithm.
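
The Pearson coefficient used for the comparison above can be computed as in the following sketch. It only shows the standard Pearson correlation formula; the score lists are hypothetical placeholders, not values from Table 1 or from the patent's experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical human ratings and algorithm scores for four word pairs.
    human_scores = [0.10, 0.40, 0.60, 0.90]
    algorithm_scores = [0.20, 0.35, 0.70, 0.95]
    print(pearson(human_scores, algorithm_scores))
```

A value closer to 1 means the algorithm's scores track the human ratings more closely, which is the sense in which a higher Pearson coefficient indicates better performance.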

[0112] More specifically, step S4 proceeds as follows:

[0113] S41: According to the R&G definition, a word pair whose similarity value is greater than 0.8025 can be regarded as a synonym pair [6]. The number of values exceeding 0.8025 in the two mixed similarity vectors is therefore counted, and the overall similarity variable is calculated as:

[0114] ω = sum(C1, C2) / γ

[0115] Where C1, C2 Res...
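
The calculation in S41 can be sketched as follows. This assumes that C1 and C2 are the counts of values above 0.8025 in the first and second mixed similarity vectors. The definition of γ is cut off in the text, so it is passed in as a plain parameter here. The function names are hypothetical.

```python
SYNONYM_THRESHOLD = 0.8025  # R&G synonym threshold stated in [0113]


def count_above(vector, threshold=SYNONYM_THRESHOLD):
    """Count the entries of a similarity vector strictly above the threshold."""
    return sum(1 for value in vector if value > threshold)


def overall_similarity_variable(vector_1, vector_2, gamma):
    """Compute omega = (C1 + C2) / gamma.

    Assumption: C1 and C2 are the per-vector counts of values above the
    threshold. gamma is supplied by the caller because its definition is
    truncated in the source text.
    """
    if gamma == 0:
        raise ValueError("gamma must be non-zero")
    c1 = count_above(vector_1)
    c2 = count_above(vector_2)
    return (c1 + c2) / gamma


if __name__ == "__main__":
    # Hypothetical mixed similarity vectors and gamma value.
    mixed_1 = [0.90, 0.50, 0.85]
    mixed_2 = [0.81, 0.30]
    print(overall_similarity_variable(mixed_1, mixed_2, gamma=5))
```

The comparison is strict, so a value of exactly 0.8025 is not counted, matching the "greater than 0.8025" wording.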

Abstract

The invention provides a sentence similarity calculation method based on edge information and semantic information. The method comprises the following steps: inputting the sentences to be compared and calculating a sentence length difference; performing text preprocessing on the sentences to be compared and dynamically generating word pair vectors; calculating word pair vector similarity based on the edge information and the semantic information to obtain a mixed similarity; correcting the similarity vector using an overall similarity variable; calculating a sentence dependency variable from the word pair vectors through a dependency model; and further correcting the corrected vector using the sentence dependency variable and the sentence length difference, and outputting a final similarity score. The method improves word similarity calculation accuracy, uses the overall sentence similarity variable to reduce the influence of sentence length on sentence similarity accuracy, and uses the dependency variable and the sentence length difference to correct the overall sentence similarity, thereby improving sentence similarity calculation accuracy.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and more specifically to a sentence similarity calculation method based on edge information and semantic information.

Background

[0002] Existing methods for calculating the similarity of words or sentences can be divided into four categories [1]: 1) calculating similarity based on word co-occurrence [2]; 2) calculating similarity based on corpus data [3]; 3) calculating similarity based on a network engine [4]; 4) using a neural network to calculate similarity based on word embeddings.

[0003] First, the method of calculating similarity based on word co-occurrence places words in a tree-like knowledge base and quantifies word similarity using features such as the shortest path length between word sub-concepts, common node depth, and concept density. This method has obvious shortcomings: it ignores the position informatio...
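
The tree-based prior approach described in [0003] can be illustrated with a toy sketch that scores two words by the shortest path between them in a small taxonomy. The taxonomy, the function names, and the `1 / (1 + path length)` scoring are all assumptions made for illustration. The source does not give the formula used by the prior methods.

```python
# Hypothetical toy taxonomy, stored as child -> parent links.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def ancestors(word):
    """Return the chain from the word up to the root, word first."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(word_a, word_b):
    """Number of edges on the shortest path between two words in the tree."""
    chain_b = ancestors(word_b)
    steps_in_b = {node: steps for steps, node in enumerate(chain_b)}
    for steps_a, node in enumerate(ancestors(word_a)):
        if node in steps_in_b:
            # First shared node is the lowest common ancestor.
            return steps_a + steps_in_b[node]
    raise ValueError("words share no common ancestor")


def path_similarity(word_a, word_b):
    """Assumed scoring: shorter paths give scores closer to 1."""
    return 1.0 / (1.0 + path_length(word_a, word_b))


if __name__ == "__main__":
    print(path_similarity("dog", "cat"))
    print(path_similarity("dog", "oak"))
```

In this toy tree, "dog" and "cat" meet at "animal" while "dog" and "oak" only meet at the root, so the first pair scores higher.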

Application Information

IPC(8): G06F16/33
CPC: G06F16/334, G06F16/3344, G06F16/3334, G06F16/3335, Y02D10/00
Inventor: 张琳, 叶家豪
Owner SUN YAT SEN UNIV