A Calculation Method of Document Semantic Similarity

A semantic similarity calculation technology, applied in the field of document semantic similarity calculation. It addresses the problem that low similarity values carry little meaning yet burden system storage, which prevents some semantic calculation methods from being applied at scale.

Publication date: 2018-01-19 (status: inactive)
ANHUI HUAZHEN INFORMATION SCI & TECH
4 citations; cited by 0

AI Technical Summary

Problems solved by technology

Because most word pairs have a non-zero similarity, the word similarity matrix is dense. This poses a huge challenge to the storage capacity of the system, and it is also an important reason why some semantic computing methods cannot be applied in large-scale systems.
[0004] Therefore, the small similarity values in the similarity matrix are not only of little significance but also place a huge burden on the system, so the data in the similarity matrix needs to be screened.
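A minimal sketch of the screening described in [0004], assuming the word similarity matrix is available as a dense nested list and that a single threshold decides which values are kept. The function name `screen_similarity_matrix`, the threshold of 0.5, and the matrix values are hypothetical and chosen only for illustration; the patent does not specify them.

```python
from typing import Dict, List, Tuple


def screen_similarity_matrix(
    matrix: List[List[float]], threshold: float
) -> Dict[Tuple[int, int], float]:
    """Keep only word pairs whose similarity reaches the threshold.

    The dense n x n matrix is replaced by a sparse dict keyed by (i, j);
    pairs below the threshold are treated as 0 and are not stored.
    """
    sparse: Dict[Tuple[int, int], float] = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value >= threshold:
                sparse[(i, j)] = value
    return sparse


if __name__ == "__main__":
    # Hypothetical 3-word similarity matrix (values invented for illustration).
    dense = [
        [1.0, 0.62, 0.05],
        [0.62, 1.0, 0.12],
        [0.05, 0.12, 1.0],
    ]
    print(screen_similarity_matrix(dense, threshold=0.5))
    # {(0, 0): 1.0, (0, 1): 0.62, (1, 0): 0.62, (1, 1): 1.0, (2, 2): 1.0}
```

With this kind of screening, storage grows with the number of significant pairs rather than with n², which is the burden [0004] describes.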



Examples


Detailed Description of the Embodiments

[0030] Referring to Figure 1, the document semantic similarity calculation method proposed by the present invention adopts a threshold-setting method and calculates the similarity by intervals. It specifically comprises the following steps:

[0031] A. Construct one or more ontology databases: each ontology database is built by inputting a concept system and the main descriptive terms. Within an ontology database, concepts are organized into concept trees according to their degree of association, and the concept trees together form a concept forest;
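One way to hold the concept forest of step A is a parent-link map in which every root starts its own concept tree. The sketch below assumes that the distance d used later in the abstract is the number of edges between two concepts through their lowest common ancestor; the patent only says "distance between two concepts in the concept tree", so this measure, the class name `ConceptForest`, and the example concepts are hypothetical.

```python
from typing import Dict, List, Optional


class ConceptForest:
    """Concepts stored as parent links; each root starts one concept tree."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}

    def add(self, concept: str, parent: Optional[str] = None) -> None:
        # A concept without a parent becomes the root of a new tree.
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parent[concept] = parent

    def path_to_root(self, concept: str) -> List[str]:
        path = [concept]
        node = self.parent[concept]
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path

    def distance(self, a: str, b: str) -> Optional[int]:
        """Edges between a and b, or None if they lie in different trees."""
        depth_in_a = {c: i for i, c in enumerate(self.path_to_root(a))}
        for steps_b, c in enumerate(self.path_to_root(b)):
            if c in depth_in_a:  # first shared ancestor = lowest common ancestor
                return depth_in_a[c] + steps_b
        return None


if __name__ == "__main__":
    forest = ConceptForest()
    forest.add("document")
    forest.add("report", "document")
    forest.add("paper", "document")
    forest.add("journal paper", "paper")
    forest.add("vehicle")  # root of a second concept tree
    print(forest.distance("report", "journal paper"))  # 3
    print(forest.distance("report", "vehicle"))  # None
```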

[0032] B. Calculate the semantic similarity: use the tf-idf (term frequency-inverse document frequency) algorithm to calculate the semantic similarity between the query object vQuery_m and the document vDoc_m. The calculation formula is as follows:

[0033]

[0034] where tf is the number of times the query object appears in the document, and idf is a measure of the general importance of the query object, ...
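The patent's own formula in [0033] is not reproduced here, and it also involves concept similarities. As a point of reference only, the sketch below computes a conventional tf-idf weight, taking tf as the raw count described in [0034] and assuming the common idf variant log(N / df). The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from typing import List


def tf_idf(term: str, document: List[str], corpus: List[List[str]]) -> float:
    """Conventional tf-idf weight of `term` in `document`.

    tf  = raw count of the term in the document;
    idf = log(N / df), where N is the corpus size and df is the number of
          documents containing the term (an assumed, common variant).
    """
    tf = document.count(term)
    df = sum(1 for doc in corpus if term in doc)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


if __name__ == "__main__":
    corpus = [
        ["semantic", "similarity", "ontology"],
        ["keyword", "matching"],
        ["semantic", "search", "semantic"],
    ]
    print(round(tf_idf("semantic", corpus[2], corpus), 4))  # 0.8109
    print(round(tf_idf("keyword", corpus[1], corpus), 4))  # 1.0986
```

A term that appears in every document gets idf = log(1) = 0, which is how idf discounts terms of low general importance.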



Abstract

The invention provides a document semantic similarity calculation method that reduces the document retrieval workload and improves working efficiency. A threshold-setting method is adopted, and the similarity is calculated by intervals. The method specifically comprises the following steps: A, constructing one or more ontology libraries: each ontology library is formed by inputting a concept system and the main descriptive terms; within each ontology library, concepts form concept trees according to their correlations, and the concept trees form a concept forest; B, calculating the semantic similarity: the semantic similarity between a query object vQuery_m and a document vDoc_m is calculated with the tf-idf (term frequency-inverse document frequency) algorithm, using the formula shown in the specification, where tf is the number of times the query object appears in the document and idf is a measure of the general importance of the query object. sim(c_mi, c_nj) is the semantic similarity between concepts c_mi and c_nj, also given by a formula shown in the specification, in which d is the distance between the two concepts in the concept tree, c is a parameter adjusted automatically by the system, and p is the predefined correlation between c_mi and c_nj in the ontology, with a default value of 1.
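The abstract names the inputs of sim(c_mi, c_nj) (distance d, tuning parameter c, predefined correlation p) but not the formula itself. The sketch below uses one common distance-decay form, p * c / (d + c), purely as an assumption to show how the three parameters can interact; it is not the patent's formula, and the name `concept_similarity` is hypothetical.

```python
import math


def concept_similarity(d: float, c: float, p: float = 1.0) -> float:
    """Hypothetical sim(c_mi, c_nj) = p * c / (d + c); NOT the patent's formula.

    d: distance between the two concepts in the concept tree
       (math.inf if they lie in different trees, giving similarity 0);
    c: positive tuning parameter, assumed to be adjusted by the system;
    p: predefined correlation of the two concepts in the ontology (default 1).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    if d < 0:
        raise ValueError("d must be non-negative")
    return p * c / (d + c)


if __name__ == "__main__":
    print(concept_similarity(0, 2.0))  # 1.0  (same concept: similarity equals p)
    print(concept_similarity(2, 2.0))  # 0.5  (similarity halves when d == c)
    print(concept_similarity(math.inf, 2.0))  # 0.0  (different concept trees)
```

Under this assumed form, a larger c makes similarity decay more slowly with distance, and p scales the result for concept pairs whose correlation is predefined in the ontology.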

Description

Technical Field

[0001] The invention relates to the technical field of document-oriented intelligent information retrieval, and in particular to a method for calculating document semantic similarity.

Background Art

[0002] Semantic computing is a way of writing information content on the basis of meanings and vocabulary shared by users and computers, grounded in people's real life, thereby enriching the meaning and value of the real world as a whole.

[0003] A search engine based on keyword matching judges whether a query matches a document through literal matching of keywords, which is a binary logic. In a semantic search engine, by contrast, most documents in theory have a non-zero degree of relevance to the queried concept. From the definition of semantic similarity, this phenomenon arises because most words have a non-zero similarity in the word similarity matrix. This will pose a huge challenge to the storage capacity of the sys...


Application Information

Patent type & authority: Patent (China)
IPC (8th edition): G06F17/30
CPC: G06F16/3335; G06F16/3344; G06F16/36
Inventor: Jia Yan (贾岩)
Owner: ANHUI HUAZHEN INFORMATION SCI & TECH