Method and apparatus for measuring similarity of documents

A document similarity technology, applied in the computer field, that addresses low similarity accuracy and the inability of keyword-based representations to accurately summarize document content.

Inactive Publication Date: 2016-07-27
INSPUR SOFTWARE CO LTD
Cites: 4; Cited by: 8

AI Technical Summary

Problems solved by technology

[0004] It can be seen from the above description that the solutions in the prior art only use some keywords to represent documents, and cannot accurately summarize the contents of the documents, and the accuracy of the obtained similarity is not high.

Examples

Detailed Description of Embodiments

[0065] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments, without creative work, fall within the scope of protection of the present invention.

[0066] As shown in Figure 1, an embodiment of the present invention provides a method for measuring the similarity of documents. The method may include the following steps:

[0067] S1: Obtain a target document and at least one comparison document;

[0068] S2: Perform word segmentation on the target document and the at least one comparison ...

Abstract

The invention provides a method and apparatus for measuring a similarity of documents. The method comprises the steps of acquiring a target document and at least one comparison document; carrying out word segmentation on the target document and the at least one comparison document to acquire words to be processed of each comparison document and words to be processed of the target document; according to an occurrence frequency of each word to be processed in each comparison document, generating a comparison semantic vector of each comparison document, and according to an occurrence frequency of each word to be processed of the target document, generating a target semantic vector of the target document; and according to the target semantic vector and each comparison semantic vector, determining the similarity of the target document and each comparison document. According to the method and apparatus for measuring the similarity of the documents, which are provided by the invention, the similarity between the documents can be more accurately determined.
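
The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the semantic vectors are compared, so cosine similarity is assumed here, and the regex-based `segment` function is a hypothetical stand-in for real word segmentation (Chinese text would need a dedicated segmenter). All function names are illustrative.

```python
import math
import re
from collections import Counter


def segment(text):
    """Stand-in word segmentation (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequency_vector(words, vocabulary):
    """Build a vector of occurrence counts, one entry per vocabulary word."""
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def measure_similarity(target, comparisons):
    """Return one similarity score per comparison document."""
    # Segment the target document and every comparison document.
    target_words = segment(target)
    comparison_words = [segment(doc) for doc in comparisons]
    # Use a shared vocabulary so that all vectors have the same dimensions.
    vocabulary = sorted(set(target_words).union(*comparison_words))
    # Build the target vector, then compare it with each comparison vector.
    target_vec = frequency_vector(target_words, vocabulary)
    return [
        cosine(target_vec, frequency_vector(words, vocabulary))
        for words in comparison_words
    ]


if __name__ == "__main__":
    print(measure_similarity(
        "the cat sat on the mat",
        ["the cat sat on the mat", "stock prices fell sharply"],
    ))
```

In this sketch an identical comparison document scores approximately 1.0 and a document sharing no words with the target scores 0.0. Because every word contributes to the vector, the score reflects the whole document rather than a handful of keywords.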

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for measuring the similarity of documents.

Background

[0002] As people pay more attention to science, technology, and social development, they rely increasingly on retrieval tools in daily life and production. Text deduplication is a key processing technique for making retrieval tools efficient and accurate, and achieving it requires determining the similarity between documents.

[0003] In the prior art, when calculating the similarity between two documents, some keywords are generally extracted from each document, and the similarity between the two documents is determined by comparing these keywords.

[0004] As the above description shows, prior-art solutions use only some keywords to represent documents, but cannot accurately summarize the contents of the documents,...
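
The limitation described in paragraphs [0003] and [0004] can be illustrated with a small sketch of a keyword-based comparison. The source does not specify how the prior art extracts or compares keywords, so this example assumes the keywords are the `k` most frequent words and that the comparison is set overlap (Jaccard index). The function names are hypothetical.

```python
import re
from collections import Counter


def top_keywords(text, k=3):
    """Assumed keyword extraction: the k most frequent words in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word, _ in Counter(words).most_common(k)}


def keyword_overlap(doc_a, doc_b, k=3):
    """Jaccard overlap between the two keyword sets (assumed comparison)."""
    a, b = top_keywords(doc_a, k), top_keywords(doc_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = "apple apple banana banana cherry cherry engine failure report"
    doc_b = "apple apple banana banana cherry cherry wedding cake recipe"
    # Both documents share the same three keywords, so the score is 1.0
    # even though the remaining words are entirely different.
    print(keyword_overlap(doc_a, doc_b))
```

The two sample documents differ in a third of their words, yet the keyword comparison treats them as identical because everything outside the top keywords is discarded.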

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F40/30
Inventor: 于文才, 甄教明, 王茂帅, 高峰, 柳廷娜
Owner: INSPUR SOFTWARE CO LTD