Document repeatability identification method and device, electronic equipment and storage medium

A document repeatability identification technique, applied in electrical digital data processing, instruments, calculations, etc., that addresses the problems of duplicate documents impairing accurate information transmission and reducing work efficiency.
CN112926314A | Pending | Publication Date: 2021-06-08 | CHINA CONSTRUCTION BANK

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
CHINA CONSTRUCTION BANK
Publication Date
2021-06-08


Abstract

The invention relates to the technical field of artificial intelligence, and discloses a document repeatability identification method and device, electronic equipment and a storage medium. The method includes: extracting at least two target words from a target document and constructing a target word sequence for the target document from them, wherein the target words comprise at least the nouns, verbs and quantity words in the target document; determining a one-hot coding vector for each word in the target word sequence; determining a feature vector of the target document from the one-hot coding vectors of the words in the target word sequence; and determining the repeatability of the target document from the distance between the feature vectors of other documents and the feature vector of the target document. This technical scheme improves the accuracy of document query while balancing the time complexity and space complexity of document duplicate checking, and provides a new approach to identifying repeated documents.
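The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the one-hot vectors are combined into a feature vector or which distance is used, so the sketch assumes a simple sum of one-hot vectors and cosine distance. It also assumes the target words (nouns, verbs and quantity words) have already been extracted by some part-of-speech tagger. All function names and the `threshold` value are hypothetical.

```python
import math


def build_vocabulary(word_sequences):
    """Assign each distinct word an index, in first-seen order."""
    vocab = {}
    for seq in word_sequences:
        for word in seq:
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Return the one-hot coding vector of a word over the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def feature_vector(word_sequence, vocab):
    """Assumption: the document feature vector is the sum of one-hot vectors."""
    feat = [0] * len(vocab)
    for word in word_sequence:
        for i, bit in enumerate(one_hot(word, vocab)):
            feat[i] += bit
    return feat


def cosine_distance(a, b):
    """Assumption: cosine distance; the abstract only says 'distance'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def is_duplicate(target_seq, other_seq, threshold=0.2):
    """Flag the target as a repeat if the distance is within the threshold.

    The threshold value is a hypothetical placeholder, not from the patent.
    """
    vocab = build_vocabulary([target_seq, other_seq])
    target_feat = feature_vector(target_seq, vocab)
    other_feat = feature_vector(other_seq, vocab)
    return cosine_distance(target_feat, other_feat) <= threshold


# Target word sequences, assumed to be already extracted from each document.
doc_a = ["bank", "raises", "rate", "25"]
doc_b = ["bank", "raises", "rate", "25"]
doc_c = ["fund", "reports", "loss"]

print(is_duplicate(doc_a, doc_b))
print(is_duplicate(doc_a, doc_c))
```

Summing one-hot vectors yields a bag-of-words count vector, which keeps the sketch short but discards word order; the full patent description may build the feature vector differently.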

Description

Technical field

[0001] The embodiments of the present application relate to the technical field of artificial intelligence, in particular to the technical field of natural language processing, and specifically to a method, device, electronic device, and storage medium for identifying repetitiveness of documents.

Background technique

[0002] With the development of Internet technology, documents of all kinds, from all walks of life, can be obtained from the Internet. For example, financial institutions access a large number of financial documents from the Internet every day, including market express reports, financial information, research reports, policy interpretations, announcements, etc. Many documents from different data sources are the same or similar. If they are not filtered, a large number of duplicate or similar documents will flood in, which will greatly impair the accurate transmission of information and reduce work efficiency. Therefore, it is particularly important to ...

Claims
