Document repeatability identification method and device, electronic equipment and storage medium
A document repeatability identification technology, applied in electric digital data processing, instruments, computing, and similar fields, that can solve problems such as reduced work efficiency and impaired accurate information transmission.
Embodiment 1
[0027] Figure 1 is a flow chart of a document repeatability identification method provided by Embodiment 1 of the present application. This embodiment is applicable to identifying repeated documents, and in particular to identifying repeated financial documents. The method can be executed by a document repeatability identification device, which is implemented in software and/or hardware and can be integrated into an electronic device that carries a document repeatability identification function, such as a server.
[0028] As shown in Figure 1, the method may specifically include:
[0029] S110. Extract at least two target words of the target document, and construct a target word sequence of the target document according to the at least two target words.
[0030] Here, the target document refers to the document to be checked for repetition; a target word is a word in the target document that can express the main mea...
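Step S110 can be illustrated with a minimal sketch. The source text is truncated before it explains how target words are selected, so this sketch assumes a simple rule: lowercase the text, split it into alphanumeric tokens, and drop words from a hypothetical stop-word list (`STOP_WORDS`). The function names `extract_target_words` and `build_target_word_sequence` are likewise illustrative. Text without spaces between words (for example Chinese) would need a word segmenter instead of the regular expression used here.

```python
import re

# Hypothetical stop-word list; the source does not specify how target words are chosen.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "on", "is", "by"}


def extract_target_words(text: str) -> list[str]:
    """Return lowercase alphanumeric tokens that are not stop words, in document order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_target_word_sequence(text: str) -> list[str]:
    """Build the ordered target word sequence; S110 requires at least two target words."""
    words = extract_target_words(text)
    if len(words) < 2:
        raise ValueError("S110 expects at least two target words")
    return words


print(build_target_word_sequence("Payment of the invoice to the supplier"))
# ['payment', 'invoice', 'supplier']
```

Keeping the words in their original order matters because later steps, such as the sliding-window traversal in Embodiment 2, operate on the sequence rather than on an unordered set.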
Embodiment 2
[0047] Figure 2 is a flow chart of a document repeatability identification method provided in Embodiment 2 of the present application. On the basis of the above embodiment, this embodiment further optimizes the step "determine the feature vector of the target document according to the one-hot encoding vector of each word in the target word sequence" and provides an optional implementation.
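For reference, the step quoted above can be sketched in its simplest form. The aggregation used here (summing the one-hot vectors element-wise into a term-count vector) and the use of cosine similarity to compare two documents are assumptions chosen for illustration; the source text does not state how the one-hot vectors are combined or how feature vectors are compared. `feature_vector` and `cosine_similarity` are hypothetical names.

```python
import math


def feature_vector(one_hot_rows: list[list[int]]) -> list[int]:
    """Assumed aggregation: element-wise sum of one-hot rows (a term-count vector)."""
    if not one_hot_rows:
        raise ValueError("need at least one one-hot vector")
    return [sum(column) for column in zip(*one_hot_rows)]


def cosine_similarity(u: list[int], v: list[int]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Vocabulary order: invoice, payment, supplier
doc_a = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]  # payment invoice supplier invoice
doc_b = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # invoice invoice payment supplier
fa, fb = feature_vector(doc_a), feature_vector(doc_b)
print(fa, fb)  # [2, 1, 1] [2, 1, 1]
print(round(cosine_similarity(fa, fb), 6))  # 1.0
```

Summing discards word order, so the two sequences above, which differ only in the order of their words, receive identical feature vectors.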
[0048] As shown in Figure 2, the method may specifically include:
[0049] S210. Extract at least two target words of the target document, and construct a target word sequence of the target document according to the at least two target words.
[0050] S220. Determine the one-hot encoding vector of each word in the target word sequence.
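Step S220 can be sketched as follows, assuming a vocabulary built from the sorted set of words in the sequences being compared, so that vectors from different documents share the same positions. The source does not say how the vocabulary is formed; `build_vocabulary` and `one_hot_encode` are hypothetical names.

```python
def build_vocabulary(*sequences: list[str]) -> dict[str, int]:
    """Assumed vocabulary: sorted union of all words, mapped to vector positions."""
    return {word: i for i, word in enumerate(sorted(set().union(*sequences)))}


def one_hot_encode(sequence: list[str], vocab: dict[str, int]) -> list[list[int]]:
    """Return one one-hot row per word; raises KeyError for a word missing from vocab."""
    rows = []
    for word in sequence:
        row = [0] * len(vocab)
        row[vocab[word]] = 1  # exactly one position is set for each word
        rows.append(row)
    return rows


seq = ["payment", "invoice", "supplier", "invoice"]
vocab = build_vocabulary(seq)  # {'invoice': 0, 'payment': 1, 'supplier': 2}
for word, row in zip(seq, one_hot_encode(seq, vocab)):
    print(word, row)
# payment [0, 1, 0]
# invoice [1, 0, 0]
# supplier [0, 0, 1]
# invoice [1, 0, 0]
```

Passing several documents to `build_vocabulary` gives them a common vector layout, which any later comparison of their vectors would require.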
[0051] S230. Based on the set sliding window, traverse the target word sequence to obtain at least two word segments.
[0052] In this embodiment, the sliding window is set by those skilled in the art according to the actual situation.
[0053] In this embodiment, b...
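The sliding-window traversal in S230 can be sketched as below. Because the window is left to the practitioner, the window length (`window=3`) and stride (`step=1`) are assumed defaults, and treating a sequence no longer than the window as a single segment is a hypothetical fallback. `sliding_window_segments` is an illustrative name.

```python
def sliding_window_segments(sequence: list[str], window: int = 3, step: int = 1) -> list[tuple[str, ...]]:
    """Return consecutive word segments of length `window`, advancing by `step` words."""
    if window < 1 or step < 1:
        raise ValueError("window and step must be >= 1")
    if len(sequence) <= window:
        # Hypothetical fallback: a short sequence becomes a single segment.
        return [tuple(sequence)]
    return [tuple(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, step)]


seq = ["payment", "invoice", "supplier", "amount", "date"]
print(sliding_window_segments(seq, window=3))
# [('payment', 'invoice', 'supplier'), ('invoice', 'supplier', 'amount'), ('supplier', 'amount', 'date')]
```

With `step=1`, any sequence longer than the window yields at least two overlapping segments, which matches the "at least two word segments" required by S230. A larger stride yields fewer segments and may skip trailing words.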
Embodiment 3
[0065] Figure 3 is a flow chart of a document repeatability identification method provided in Embodiment 3 of the present application. On the basis of the above embodiments, a global signature and a local signature are added to further optimize the document repeatability identification method.
[0066] As shown in Figure 3, the method may specifically include:
[0067] S310. Determine the global signature and local signature of the target document.
[0068] In this embodiment, the global signature is used to characterize the overall features of the target document; the local signature is used to characterize the salient local features of the target document.
[0069] Optionally, the global signature of the target document may be determined by performing a hash operation on the text information in the target document. Specifically, the text information in the target document can be hashed using a Secure Hash Algorithm (SHA) t...
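The global signature described above can be sketched with SHA-256 from Python's standard `hashlib` module. The source text is cut off before naming a specific SHA variant, so SHA-256 is an assumption. Collapsing runs of whitespace before hashing is also an assumption, made so that layout differences alone do not change the signature. `global_signature` is a hypothetical name.

```python
import hashlib


def global_signature(text: str) -> str:
    """SHA-256 hex digest of the document text after collapsing whitespace (assumed step)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = global_signature("Invoice No. 001\nAmount: 100.00")
b = global_signature("Invoice No. 001  Amount: 100.00")
c = global_signature("Invoice No. 002\nAmount: 100.00")
print(a == b)  # True
print(a == c)  # False
print(len(a))  # 64
```

Equal global signatures indicate that the normalized texts are identical. A change of a single character produces a different digest, which is why a separate local signature is needed to characterize documents that are similar but not identical.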