Multi-document text duplicate checking method, electronic equipment and storage medium

A multi-document text technology, applied in text database querying, unstructured text data retrieval, electronic digital data processing, etc. It addresses problems such as a low duplicate-check accuracy rate, inaccurate duplicate-check results, and missing documents, improving duplicate-check accuracy and resolving the method's poor fit to business needs.

Pending Publication Date: 2022-03-01
EMOTIBOT TECH LTD
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0005] Method 1 splices multiple documents together. The documents are not ranked as primary or secondary, and the key business information is not prominent enough, which may lead to a low duplicate-checking accuracy rate.
Method 2 assumes that every document has a one-to-one counterpart. In actual business, documents are often missing, so the method loses its effectiveness and the duplicate-checking results become inaccurate.

Examples

Embodiment Construction

[0048] The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

[0049] Like numbers and letters denote similar items in the following figures, so that once an item is defined in one figure, it does not require further definition and explanation in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second" and the like are only used to distinguish descriptions, and cannot be understood as indicating or implying relative importance.

[0050] Figure 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be used to execute the multi-document text duplicate checking method provided by the embodiment of the present application. As shown in Figure 1, the electronic device 100 includes: one or more processors 102, and ...

Abstract

The invention provides a multi-document text duplicate checking method and electronic equipment. The method comprises: acquiring a to-be-identified document set; for preset key indexes, extracting from the to-be-identified document set the paragraph set corresponding to each key index; determining, from the item paragraph set corresponding to each key index in the item document set, the similarity between the item paragraph set and the paragraph set under each key index; and determining, from those per-index similarities, the similarity between the item document set and the to-be-identified document set. This improves duplicate-checking accuracy and addresses the problems that multi-document duplicate checking is inaccurate and poorly adapted to business needs.
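The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the key indexes, the keyword-based paragraph extraction, the token-level Jaccard measure, and the plain averaging across indexes are all assumptions chosen for illustration, and every name below (`KEY_INDEX_KEYWORDS`, `extract_paragraph_sets`, `document_set_similarity`) is hypothetical.

```python
from typing import Dict, List

# Hypothetical key indexes and trigger keywords; the source does not list them.
KEY_INDEX_KEYWORDS: Dict[str, List[str]] = {
    "diagnosis": ["diagnosis", "diagnosed"],
    "surgery": ["surgery", "operation"],
}


def extract_paragraph_sets(documents: List[str]) -> Dict[str, List[str]]:
    """Pool the paragraphs of every document in a set under each key index."""
    sets: Dict[str, List[str]] = {index: [] for index in KEY_INDEX_KEYWORDS}
    for document in documents:
        for paragraph in document.split("\n"):
            text = paragraph.strip().lower()
            if not text:
                continue
            for index, keywords in KEY_INDEX_KEYWORDS.items():
                # Assumed extraction rule: a paragraph belongs to an index
                # if it mentions one of that index's keywords.
                if any(keyword in text for keyword in keywords):
                    sets[index].append(text)
    return sets


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-level Jaccard similarity of two paragraph sets (assumed measure)."""
    tokens_a = {token for paragraph in a for token in paragraph.split()}
    tokens_b = {token for paragraph in b for token in paragraph.split()}
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def document_set_similarity(item_docs: List[str], candidate_docs: List[str]) -> float:
    """Average the per-index similarities into one document-set similarity."""
    item_sets = extract_paragraph_sets(item_docs)
    candidate_sets = extract_paragraph_sets(candidate_docs)
    scores = [
        jaccard(item_sets[index], candidate_sets[index])
        for index in KEY_INDEX_KEYWORDS
        # Skip indexes that neither document set mentions at all.
        if item_sets[index] or candidate_sets[index]
    ]
    return sum(scores) / len(scores) if scores else 0.0


item = ["Diagnosis: acute appendicitis\nSurgery: laparoscopic appendectomy"]
split_copy = ["Diagnosis: acute appendicitis", "Surgery: laparoscopic appendectomy"]
other = ["Diagnosis: fractured wrist"]
print(document_set_similarity(item, split_copy))
print(document_set_similarity(item, other))
```

Because paragraphs are pooled per key index before comparison, the same content split across a different number of documents still matches, and a missing document lowers the score only for the indexes it would have covered. A production system would likely replace the keyword rule and the Jaccard measure with stronger extraction and similarity models.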

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, in particular to a multi-document text duplicate checking method, electronic equipment and a storage medium.

Background technique

[0002] In the real world, text is an important carrier of information; research shows that 80% of information exists in text. In many scenarios, information is redundant and repetitive. The main goal of text duplicate checking technology is to detect redundancy and repetition of information.

[0003] Duplicate checking usually detects repetition within a single segment, such as an article or a paragraph. However, in many scenarios, measuring the repetition of information depends not on one document but on multiple documents. A duplicate-checking scenario may contain multiple information sources, such as medical records, surgical records, hospital records, etc.

[0004] Existing technologies mainly have two methods in multi-document p...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/194
CPC: G06F16/334, G06F40/194
Inventor: 简仁贤, 任钊立, 马永宁
Owner: EMOTIBOT TECH LTD