Text information plagiarism checking method, device and electronic equipment

A text information plagiarism checking technology, applied in the field of information processing, that addresses the slow retrieval speed and poor similarity performance of retrieval results in existing methods.

Active Publication Date: 2019-02-19
中孚安全技术有限公司

AI Technical Summary

Problems solved by technology

[0005] In view of this, the object of the present invention is to provide a text information plagiarism checking method, device and electronic equipment to allev


Examples

Embodiment 1

[0028] Figure 1 shows a method for checking plagiarism of text information according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0029] Step S11, obtaining the text to be queried;

[0030] In the embodiment of the present invention, the text to be queried can be supplied by the user or obtained automatically from a network node. The user can configure this according to requirements, and the embodiment of the present invention does not specifically limit it.

[0031] In addition, the present invention supports a big data architecture, can perform real-time stream processing on query text, and can optimize and upgrade computing and storage performance more conveniently and quickly when the amount of data increases.

[0032] Step S12, splitting the text to be queried into paragraphs to obtain multiple split paragraphs of the text to be queried;

[0033] After the text to be queried i...
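Step S12 can be illustrated with a minimal sketch. The visible text does not state the splitting rule, so splitting on line breaks is an assumption here, and the name `split_paragraphs` is hypothetical.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into non-empty paragraphs.

    Assumption: a paragraph boundary is one or more line breaks.
    The patent text shown here does not specify the actual rule.
    """
    parts = re.split(r"(?:\r?\n)+", text)
    # Drop surrounding whitespace and discard empty pieces.
    return [part.strip() for part in parts if part.strip()]


query_text = "First paragraph.\n\nSecond paragraph.\nThird paragraph.  \n"
paragraphs = split_paragraphs(query_text)
```

Each element of `paragraphs` would then be passed on to the later steps as one split paragraph.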

Embodiment 2

[0099] The embodiment of the present invention also provides a device for checking plagiarism of text information, which is mainly used to implement the method for checking plagiarism of text information provided in the above content of the embodiment of the present invention. The device provided in the embodiment of the present invention is introduced in detail below.

[0100] Figure 5 is a schematic diagram of a device for checking plagiarism of text information according to an embodiment of the present invention. As shown in Figure 5, the device mainly includes an acquisition module 10, a splitting module 20, a function processing module 30, and a segmentation matching module 40, wherein:

[0101] The acquisition module is configured to acquire the text to be queried;

[0102] The splitting module is used to split the paragraphs of the text to be queried to obtain multipl...
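One way to picture how the four modules fit together is the sketch below. It is only an illustration: the class name `PlagiarismCheckDevice`, the field names, and the stand-in callables (including the use of SHA-256 as a fingerprint) are all assumptions and are not taken from the patent.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlagiarismCheckDevice:
    """Hypothetical composition of the four modules named in [0100]."""

    acquire: Callable[[], str]         # acquisition module 10
    split: Callable[[str], list[str]]  # splitting module 20
    fingerprint: Callable[[str], int]  # function processing module 30
    match: Callable[[int], bool]       # segmentation matching module 40

    def check(self) -> list[bool]:
        """Run the modules in order and return one result per paragraph."""
        text = self.acquire()
        return [self.match(self.fingerprint(p)) for p in self.split(text)]


def stand_in_fingerprint(paragraph: str) -> int:
    # Stand-in hash for illustration only (assumed, not the patent's function).
    digest = hashlib.sha256(paragraph.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


known_fingerprints = {stand_in_fingerprint("a known paragraph")}

device = PlagiarismCheckDevice(
    acquire=lambda: "a known paragraph\nan unseen paragraph",
    split=str.splitlines,
    fingerprint=stand_in_fingerprint,
    match=lambda fp: fp in known_fingerprints,
)
results = device.check()
```

Keeping each module as a separate callable mirrors the module boundaries in the description and lets any one stage be swapped without touching the others.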

Embodiment 3

[0113] Referring to Figure 6, the embodiment of the present invention also provides an electronic device, including a processor 50, a memory 51, a bus 52 and a communication interface 53. The processor 50, the communication interface 53 and the memory 51 are connected through the bus 52. The processor 50 is used to execute executable modules, such as computer programs, stored in the memory 51.

[0114] The memory 51 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between the network element of this system and at least one other network element is realized through at least one communication interface 53 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on.

[0115] The bus 52 can be an ISA bus, a PCI bus or an EISA bus, etc. The bus can be divide...

Abstract

The invention provides a text information plagiarism checking method, device and electronic equipment, which relate to the technical field of information processing. The method comprises the following steps: obtaining a text to be queried; splitting the text to be queried into paragraphs to obtain a plurality of split paragraphs; applying a hash function to each split paragraph to obtain the feature fingerprint of that paragraph; segmenting the feature fingerprint of each split paragraph to obtain a plurality of feature fingerprint fragments; matching each feature fingerprint fragment against the preset feature fingerprint fragments in its corresponding feature fingerprint fragment library; and determining the text information checking result according to the matching results. Because each feature fingerprint fragment is matched against its own corresponding fragment library, the checking results can be obtained quickly, and the duplicate checking results of each split paragraph have good similarity performance. This alleviates the technical problems of existing text information duplicate checking methods, such as slow retrieval speed and poor similarity performance of retrieval results.
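The fingerprinting and fragment-matching steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the hash function, so a SimHash-style fingerprint built from MD5 token hashes is swapped in, and the 64-bit width, the four equal fragments, and every name (`fingerprint`, `fragments`, `FragmentLibraries`) are hypothetical.

```python
import hashlib
from collections import defaultdict

FINGERPRINT_BITS = 64  # assumed fingerprint width
NUM_FRAGMENTS = 4      # assumed number of fragments per fingerprint
FRAGMENT_BITS = FINGERPRINT_BITS // NUM_FRAGMENTS


def token_hash(token: str) -> int:
    """Map a token to a 64-bit integer using the first 8 bytes of its MD5."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(paragraph: str) -> int:
    """SimHash-style fingerprint (an assumed choice, not from the patent)."""
    weights = [0] * FINGERPRINT_BITS
    for token in paragraph.lower().split():
        h = token_hash(token)
        for bit in range(FINGERPRINT_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(FINGERPRINT_BITS) if weights[bit] > 0)


def fragments(fp: int) -> list[int]:
    """Cut a fingerprint into NUM_FRAGMENTS equal-width pieces, low bits first."""
    mask = (1 << FRAGMENT_BITS) - 1
    return [(fp >> (i * FRAGMENT_BITS)) & mask for i in range(NUM_FRAGMENTS)]


class FragmentLibraries:
    """One lookup table per fragment position (hypothetical structure)."""

    def __init__(self) -> None:
        self.libraries = [defaultdict(set) for _ in range(NUM_FRAGMENTS)]

    def add(self, paragraph_id: str, paragraph: str) -> None:
        for i, frag in enumerate(fragments(fingerprint(paragraph))):
            self.libraries[i][frag].add(paragraph_id)

    def match(self, paragraph: str) -> dict[str, int]:
        """Return, per stored paragraph, how many fragment positions matched."""
        hits: dict[str, int] = defaultdict(int)
        for i, frag in enumerate(fragments(fingerprint(paragraph))):
            for paragraph_id in self.libraries[i].get(frag, ()):
                hits[paragraph_id] += 1
        return dict(hits)


libs = FragmentLibraries()
libs.add("p1", "big data makes duplicate detection necessary")
hits = libs.match("big data makes duplicate detection necessary")
```

Looking up each fragment in its own table is an exact-match dictionary access, which is one plausible reading of why matching fragment by fragment is fast. How the per-fragment hits are combined into a final checking result is not shown in the abstract, so the sketch simply reports the hit counts.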

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a text information plagiarism checking method, device and electronic equipment.

Background

[0002] In the era of big data, the amount of data far exceeds what humans can process, and most of it is duplicated, reposted or plagiarized. To trace the source of plagiarized text or to avoid storing the same content repeatedly, a duplicate-checking or similarity-checking system is required.

[0003] The prior-art method for checking text information for plagiarism segments the text content into words, removes the stop words, extracts the feature words, and stores the feature word vectors in a database. A requested article is processed in the same way, and the distance between feature word vectors, such as the Hamming distance or the cosine distance, is computed; the closer the distance, the higher the similarity. This method can find highly simi...
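The two distance measures named in the prior-art description can be written out as a short sketch. The function names are hypothetical, and representing a binary feature vector as an integer bit pattern is an assumption made for brevity.

```python
import math


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary vectors differ."""
    return bin(a ^ b).count("1")


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


bits_differing = hamming_distance(0b1011, 0b0010)   # the patterns differ in 2 bits
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
```

In both cases a smaller value means the two vectors are closer, which corresponds to the higher similarity described above.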

Application Information

IPC(8): G06F16/332, G06F17/27
CPC: G06F40/279
Inventor: 熊英超, 孙宏跃, 刘志远
Owner: 中孚安全技术有限公司