Sentence semantic distance measurement method

A sentence semantic distance measurement technology, applied in special-purpose data processing, instruments, unstructured text data retrieval, and related fields.

Active Publication Date: 2019-07-12
网经科技(苏州)有限公司
4 cites, 8 cited by

AI Technical Summary

Problems solved by technology

[0006] Direct sentence embedding not only requires a very large training corpus but also relies on a relatively complex neural network model. Although it has achieved a small lead in some evaluation tasks, its use in vertical domains is constrained by corpus size and computing resources, and its interpretability is low, so it is not universally applicable.

Examples


Embodiment Construction

[0073] In order to have a clearer understanding of the technical features, purposes and effects of the present invention, specific implementations are now described in detail.

[0074] The sentence semantic distance measurement method of the present invention comprises the following steps:

[0075] 1) Perform word segmentation and stop-word removal preprocessing on the sentence data set;

[0076] Raw sentence data usually has no delimiters between words and contains function words, symbols, and other elements that do not contribute to semantic expression, so preprocessing is required;

[0077] Use word segmentation methods or tools to preprocess the sentence data set. The word segmentation methods are dictionary-based maximum matching methods, full-segmentation path selection methods, sequence-tagging-based methods, or transition-based word segmentation methods. The word segmentation tools are open-source or closed-source word segmentation tools; word segmentation tools provide a v...
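The preprocessing step in [0075] to [0077] can be illustrated with a minimal sketch of dictionary-based forward maximum matching followed by stop-word removal. This is not the patent's own implementation: the function names, the toy dictionary `DICTIONARY`, and the stop-word list `STOP_WORDS` are hypothetical and chosen only for illustration. Characters that match no dictionary entry are emitted as single-character tokens.

```python
# Hypothetical toy resources; a real system would load a full lexicon
# and a curated stop-word list.
DICTIONARY = {"句子", "语义", "距离", "计算", "的"}
STOP_WORDS = {"的"}


def forward_max_match(sentence, dictionary):
    """Segment `sentence` greedily, preferring the longest dictionary word."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted, so the loop advances.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def preprocess(sentence, dictionary, stop_words):
    """Segment the sentence, then drop tokens found in the stop-word list."""
    return [t for t in forward_max_match(sentence, dictionary)
            if t not in stop_words]


tokens = preprocess("句子的语义距离计算", DICTIONARY, STOP_WORDS)
print(tokens)
```

Greedy maximum matching is the simplest of the segmentation methods listed above; the sequence-tagging and transition-based methods generally need a trained model and are not shown here.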


Abstract

The invention relates to a sentence semantic distance measurement method. The method comprises the following steps: first, performing word segmentation and stop-word removal preprocessing on a sentence data set; selecting a word-sense similarity scheme and setting a threshold to normalize synonyms and near-synonyms; then calculating the vector space distance of the two sentences by combining smooth inverse frequency weighting with common component removal; measuring the word order distance of the two sentences according to their degree of disorder; calculating the semantic dependency distance of the two sentences from semantic dependency quintuple features; and finally combining the vector space distance, the word order distance, and the semantic dependency distance by hybrid weighting. The measurement covers three dimensions, namely sentence vector representation, sentence word order, and sentence component dependency, and the final semantic distance is obtained by weighted summation. The method uses word-level computation while absorbing the idea of sentence-level operations, and by introducing and combining the vector space distance, the word order distance, and the semantic dependency distance, it measures the semantic distance between sentences more comprehensively and reasonably.
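As a rough illustration of the last two ideas in the abstract, the sketch below computes a word order distance from the degree of disorder and then combines three component distances by weighted summation. The abstract does not give the exact formulas, so everything here is an assumption: the disorder measure is taken to be the normalized inversion count over words that occur exactly once in both sentences, the default weights `(0.6, 0.2, 0.2)` are arbitrary placeholders, and the function names are hypothetical. The vector space and semantic dependency distances are treated as precomputed inputs.

```python
def word_order_distance(tokens_a, tokens_b):
    """Normalized inversion count over words shared by both sentences.

    Assumption: only words occurring exactly once in each sentence are
    compared; with fewer than two such words the distance is taken as 0.
    """
    shared = [w for w in tokens_a
              if tokens_a.count(w) == 1 and tokens_b.count(w) == 1]
    n = len(shared)
    if n < 2:
        return 0.0
    # Positions in sentence B, listed in the order the words appear in A.
    positions = [tokens_b.index(w) for w in shared]
    # An inversion is a pair of shared words whose relative order differs.
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if positions[i] > positions[j])
    return inversions / (n * (n - 1) / 2)


def hybrid_distance(vector_dist, order_dist, dependency_dist,
                    weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three component distances (weights are assumed)."""
    w_vec, w_ord, w_dep = weights
    return w_vec * vector_dist + w_ord * order_dist + w_dep * dependency_dist


order = word_order_distance(["a", "b", "c"], ["a", "c", "b"])
print(order)
print(hybrid_distance(0.5, order, 0.0))
```

With this definition, identical word order gives 0 and fully reversed order gives 1, so the word order term lies in the same range as a normalized vector distance before weighting.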

Description

Technical field

[0001] The invention relates to a method for measuring the semantic distance between sentences, and belongs to the technical field of text information processing.

Background

[0002] Semantic computing is one of the basic tasks in text information processing, with practical use cases at every level, from words and sentences to paragraphs and whole documents. Given the current state of natural language processing technology, different ideas and strategies are adopted for semantic computing at different levels. For the semantic distance between sentences, research focuses mainly on two levels: the word level and the sentence level.

[0003] The main idea of the word-level measurement method is to segment the two sentences to be compared and then pick out the word strings that have a greater impact on semantics, optionally with a shallow dependency analysis. For each word in the word...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/33
CPC: G06F40/211, G06F40/284, G06F40/30, Y02D10/00
Inventor: 孟亚磊, 刘继明, 金宁, 陈浮, 刘松
Owner: 网经科技(苏州)有限公司