Chinese sentence similarity calculation method based on Word2Vec

A sentence similarity calculation technique, applied in computing, semantic analysis, and natural language data processing. It addresses the problem that relationships between words in a sentence are not sufficiently considered or explicitly calculated, and it improves the accuracy and reliability of similarity calculation.

Status: Inactive
Publication Date: 2018-12-21
NORTHEASTERN UNIV
Cites: 3; Cited by: 19

AI Technical Summary

Problems solved by technology

However, this method does not fully consider or explicitly calculate the relationships between words in a sentence, and

Examples


Embodiment Construction

[0035] Specific embodiments of the present invention are further described below with reference to the accompanying drawings:

[0036] As shown in Figures 1-3, a method for calculating the similarity of Chinese sentences based on Word2Vec includes:

[0037] S1.1, train a Chinese corpus through Word2Vec to obtain a word vector model;

[0038] S1.2, collect an online corpus with a web crawler and create question templates;

[0039] S1.3, perform word segmentation, part-of-speech analysis and syntactic analysis on the question Q input by the user and on a question A in the question template;

[0040] S1.4, match the question Q input by the user against the question A in the question template through the word vector model, and obtain the similarity adjustment coefficient score1 and the semantic similarity score2 between the question Q and the question A;

[0041] S1.5, obtain the sentence similarity score betwee...
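The matching idea in S1.4 can be illustrated with a minimal sketch. It is not the patented calculation: it swaps in a simple averaged-word-vector cosine similarity, ignores syntactic structure entirely, and assumes the sentences have already been segmented into words (S1.3). The vector table `TOY_VECTORS` and the helper names `sentence_vector`, `semantic_similarity` and `best_template` are hypothetical; a real system would look vectors up in the Word2Vec model trained in S1.1.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a trained Word2Vec model (S1.1).
TOY_VECTORS = {
    "如何": [0.9, 0.1, 0.0],
    "怎样": [0.8, 0.2, 0.1],
    "退款": [0.1, 0.9, 0.2],
    "退货": [0.2, 0.8, 0.3],
    "天气": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_vector(words, vectors):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def semantic_similarity(q_words, a_words, vectors=TOY_VECTORS):
    """Assumed stand-in for score2: cosine of the two averaged sentence vectors."""
    q_vec = sentence_vector(q_words, vectors)
    a_vec = sentence_vector(a_words, vectors)
    if q_vec is None or a_vec is None:
        return 0.0
    return cosine(q_vec, a_vec)


def best_template(q_words, templates, vectors=TOY_VECTORS):
    """Return the template question (a word list) most similar to the user question."""
    return max(templates, key=lambda a_words: semantic_similarity(q_words, a_words, vectors))
```

For example, `best_template(["怎样", "退货"], [["天气"], ["如何", "退款"]])` selects the refund-related template, because its averaged vector points in nearly the same direction as the user question's.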


Abstract

The invention discloses a Chinese sentence similarity calculation method based on Word2Vec. The method trains a word vector model on a large-scale corpus and represents each sentence as a syntactic component tree using the LTP syntactic parser. The calculation method comprises the following steps: accepting a question Q input by a user; performing word segmentation, part-of-speech analysis and syntactic analysis on the question Q; matching the question Q against each question A in the question template to obtain the similarity adjustment coefficient score1 and the semantic similarity score2 between question Q and question A; and calculating the sentence similarity score between question Q and question A from score1 and score2. By adding sentence structure information to the sentence similarity calculation and calculating the syntactic relations between words, the invention effectively improves the accuracy of similarity calculation.
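The abstract does not state how score1 is computed or how score1 and score2 are combined, so the following is only a sketch under loudly stated assumptions. It assumes score1 is the Jaccard overlap of the dependency relation labels found in the two parses (labels such as SBV, VOB, HED and ADV are used by LTP), and that the final score is the product of score1 and score2. Both rules and the function names `adjustment_coefficient` and `sentence_similarity` are hypothetical, chosen to show how a structural coefficient could scale a semantic score.

```python
def adjustment_coefficient(q_relations, a_relations):
    """Hypothetical score1: Jaccard overlap of dependency relation labels.

    Each argument is an iterable of relation labels taken from a sentence's
    dependency parse. This rule is an assumption, not the patent's formula.
    """
    q_set, a_set = set(q_relations), set(a_relations)
    if not q_set and not a_set:
        return 1.0  # two empty parses are treated as structurally identical
    return len(q_set & a_set) / len(q_set | a_set)


def sentence_similarity(score1, score2):
    """Assumed combination rule: the structural coefficient scales the semantic score."""
    return score1 * score2


# Usage: relation labels for question Q and template question A (illustrative values).
q_labels = ["SBV", "VOB", "HED"]
a_labels = ["SBV", "HED", "ADV"]
score1 = adjustment_coefficient(q_labels, a_labels)  # 2 shared labels out of 4 distinct
score = sentence_similarity(score1, 0.8)             # 0.8 is an illustrative score2
```

A multiplicative rule keeps the final score low whenever either the structure or the semantics disagree; a weighted sum would be an equally plausible alternative under these assumptions.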

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method for calculating the similarity of Chinese sentences.

Background

[0002] Sentence similarity calculation is an important basic research work in text information processing. This technology is widely used in text summarization, automatic question answering system and machine translation. The accuracy of these application systems largely depends on the accuracy of sentence similarity calculation. Therefore, improving the accuracy of sentence similarity calculation is the primary problem to be solved in current research.

[0003] Statistical language models have become the mainstream in the field of natural language processing research. However, in the past, most of the statistical learning methods in the field of natural language processing belonged to shallow models, and the ability to learn data representation was weak. The calculation of s...


Application Information

IPC(8): G06F17/27
CPC: G06F40/211, G06F40/289, G06F40/30
Inventors: 姜涛, 王庆, 宫俊
Owner: NORTHEASTERN UNIV