Comparison matrix similarity retrieval method based on multi-order fingerprints

A comparison matrix and similarity technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the problem that similarity retrieval mechanisms cannot be effectively migrated
CN108573045A · Active · Publication Date: 2018-09-25 · 同方知网数字出版技术股份有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
同方知网数字出版技术股份有限公司
Publication Date
2018-09-25


Abstract

The invention discloses a comparison matrix similarity retrieval method based on multi-order fingerprints. The method comprises the following steps: fragmenting the texts, saving the fragments in a database, and cleaning the text data into a unified format; encoding the unified-format text with the simhash algorithm to form a 64-bit binary multi-order fingerprint feature value, which is saved in the database; calculating the Hamming distance between the feature value of the text under comparison and the feature values of the other texts, and selecting the texts whose Hamming distance is below the threshold of 3 for a secondary calculation; constructing a comparison matrix by pairing the original text with each comparison text, calculating the text similarity and the similar content, and marking them in the output; and optimizing the calculation of text similarity and similar content, using parallel computing to run multiple threads simultaneously.
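The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the cleaning rule, whitespace tokenization, MD5-based token hash, unit token weights, and the use of `difflib.SequenceMatcher` as the fragment similarity measure are all assumptions, since the abstract does not specify them. Only the 64-bit fingerprint width and the Hamming threshold of 3 come from the text.

```python
import hashlib
import re
from difflib import SequenceMatcher

FINGERPRINT_BITS = 64   # from the abstract: 64-bit binary fingerprint
HAMMING_THRESHOLD = 3   # from the abstract: keep texts with distance below 3


def clean(text):
    """Normalize text to a unified format (assumed: lowercase, single spaces)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def simhash(text):
    """Compute a 64-bit simhash over whitespace tokens (each token has weight 1)."""
    counts = [0] * FINGERPRINT_BITS
    for token in clean(text).split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits of MD5
        for bit in range(FINGERPRINT_BITS):
            counts[bit] += 1 if (token_hash >> bit) & 1 else -1
    fingerprint = 0
    for bit, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def select_candidates(query, corpus):
    """Return keys of corpus texts whose fingerprint is within the threshold."""
    query_fp = simhash(query)
    return [
        key for key, text in corpus.items()
        if hamming(query_fp, simhash(text)) < HAMMING_THRESHOLD
    ]


def comparison_matrix(query_fragments, candidate_fragments):
    """Pairwise similarity of every query fragment against every candidate fragment."""
    return [
        [SequenceMatcher(None, q, c).ratio() for c in candidate_fragments]
        for q in query_fragments
    ]
```

In this sketch the fingerprint filter is the cheap first pass and the matrix is the more expensive secondary calculation, run only on the surviving candidates. Because each row of the matrix is independent, the rows could be computed in parallel, which is in the spirit of the optimization the abstract mentions.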

Description

Technical field

[0001] The invention relates to the technical fields of text mining and computer information processing, and in particular to a comparison matrix similarity retrieval method based on multi-order fingerprints.

Background technique

[0002] As computers are widely used for natural language processing tasks such as handling text information, the increasingly complex needs of today's society place higher demands on computer text processing. In the field of similarity retrieval, existing methods cannot be reproduced and require extensive hardware support and special database support, so they cannot meet the diverse needs of enterprises. State-owned enterprises, public institutions, and state secret agencies in particular cannot use public similarity retrieval systems because their data must be kept confidential. Faced with the increasing demand for project declarations, it is only possible to conduct similar ch...
