Sentence similarity calculation method based on sentence meaning structure characteristics

A sentence similarity technology based on sentence meaning structure features, applied in computing, computer components, semantic analysis, and related fields. It addresses the problems of sparse features and the neglect of deep semantic information, with the effect of reducing loss and improving accuracy.

Inactive Publication Date: 2017-02-22
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0008] To solve the problems of sparse features and the neglect of deep semantic information in sentence similarity calculation for social short texts, the present invention proposes a sentence similarity calculation method that uses sentence structure features.


Examples

Embodiment Construction

[0032] In order to better illustrate the purpose and advantages of the present invention, the implementation of the method of the present invention will be further described in detail below in conjunction with specific examples.

[0033] The experiment uses the corpus released for the Chinese microblog opinion element extraction evaluation task of the 2013 NLP&CC conference. Five topics with a total of 10,896 sentences were randomly selected as the short-text collection. The sentence similarity calculation was evaluated indirectly: it was applied to short-text clustering, and the quality of the resulting clusters was assessed. Clustering quality is measured with the Silhouette Coefficient, a concept first proposed by Peter J. Rousseeuw in 1986, which combines cohesion and separation to judge the clustering effect.
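As an illustration of how cohesion and separation combine, the following is a minimal Python sketch of the commonly used textbook definition of the silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all samples. It is not taken from the patent text and may differ from the patent's own steps. The function names, the Euclidean distance, and the convention of scoring singleton clusters as 0 are assumptions made for this example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_coefficient(points, labels, dist=euclidean):
    """Mean silhouette value over all points.

    a(i): mean distance from point i to the other members of its own
          cluster (cohesion).
    b(i): smallest mean distance from point i to the members of any
          other cluster (separation).
    s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies in [-1, 1].
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_coefficient(pts, [0, 0, 1, 1]))  # tight, well-separated clusters
    print(silhouette_coefficient(pts, [0, 1, 0, 1]))  # poor assignment, negative score
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned samples. In a setting like the one described here, `dist` could be replaced by a distance derived from the sentence similarity, for example one minus the similarity.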

[0034] The calculation steps of the silhouette coefficient are as follows:

[003...

Abstract

The invention provides a sentence similarity calculation method based on sentence meaning structure features, aiming to solve the problem of feature sparsity in sentence similarity calculation for social short texts. The method analyzes the meaning of a sentence according to a sentence meaning structure model, mines latent topic knowledge with a topic model, and expands the sentence features according to the topic-word distribution to obtain a sentence vector based on sentence features. It also introduces the Paragraph Vector deep learning model to learn the context features of the sentence, obtaining a sentence vector based on context information. The final sentence similarity is a weighted combination of the similarities calculated from the two sentence vectors. Because the semantic information and context information of sentences are mined in depth, the internal relations among sentences are described comprehensively and accurately, and the accuracy of similarity calculation is improved.
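The final weighting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure or the weights, so cosine similarity, the parameter name `alpha`, and its default value are hypothetical choices. The input vectors are assumed to have been produced already by the feature-expansion step and by Paragraph Vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_sentence_similarity(feat_a, feat_b, ctx_a, ctx_b, alpha=0.5):
    """Combine two per-view similarities into one score.

    feat_a, feat_b: sentence vectors based on (topic-expanded) sentence features.
    ctx_a, ctx_b:   sentence vectors based on context information.
    alpha:          hypothetical weight in [0, 1] for the feature-based view;
                    the context-based view receives 1 - alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sim_feat = cosine_similarity(feat_a, feat_b)
    sim_ctx = cosine_similarity(ctx_a, ctx_b)
    return alpha * sim_feat + (1.0 - alpha) * sim_ctx


if __name__ == "__main__":
    # Toy vectors: identical feature vectors, orthogonal context vectors.
    score = weighted_sentence_similarity([1.0, 0.0], [1.0, 0.0],
                                         [1.0, 0.0], [0.0, 1.0], alpha=0.7)
    print(score)
```

A single weight keeps the combined score in the same range as the two inputs. In practice the weight would be tuned on held-out data, for instance by maximizing the clustering quality.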

Description

Technical field

[0001] The invention relates to a sentence similarity calculation method using sentence meaning structure features, and belongs to the fields of computer science and natural language processing.

Background technique

[0002] Sentence similarity calculation measures the semantic similarity between two texts and is a basic component of information retrieval and automatic summarization in natural language processing. With the rapid development of social networking sites, a large number of social short texts, typified by Weibo posts, have emerged; they are short in length and diverse in form. Because such texts lack the structured information of long documents, traditional sentence similarity calculation methods cannot be applied directly to sentence similarity calculation for short texts.

[0003] At present, according to the depth of semantic analysis of sentences, the similarity calculation methods for sentences in short so...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62
CPC: G06F40/211, G06F40/30, G06F18/22
Inventor: 罗森林, 陈倩柔, 潘丽敏, 原玉娇
Owner: BEIJING INSTITUTE OF TECHNOLOGY