
Sentence information fingerprint representation method and sentence duplicate checking method and system

A sentence fingerprinting technique, applied in the field of information processing, that addresses the inaccurate fingerprint information caused by the difficulty of extracting keywords and high-frequency words, producing accurate information fingerprints and speeding up duplicate judgment.

Pending Publication Date: 2021-11-12
路米科技江苏有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a sentence information fingerprint representation method and a sentence duplicate checking method and system, so as to address the problem in the prior art that fingerprint information is not accurate enough because keywords and high-frequency words are difficult to extract.



Examples


Embodiment

[0071] Some implementations of the present application will be described in detail below in conjunction with the accompanying drawings. In the case of no conflict, each of the following embodiments and each feature in the embodiments can be combined with each other.

[0072] Please refer to Figure 1, which is a flow chart of a sentence information fingerprint representation method provided by an embodiment of the present invention. The sentence information fingerprint representation method comprises the following steps:

[0073] Step S110: Obtain the sentence information to be tested, which includes the sentence content, punctuation marks, and the like. The sentence information to be tested can be in Chinese or English. Non-alphabetic scripts such as Chinese are counted in units of characters; alphabetic scripts such as English are counted in units of words. For example: the sentence information to be tested is "vehicle maintenance and test...
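The unit convention described in step S110 can be sketched roughly as follows. This is only an illustration, not the patent's implementation: the function name `split_units`, the CJK range used, and the decision to skip punctuation and whitespace are all assumptions, since the visible text does not say how punctuation is counted.

```python
import re

# Assumption: one unit per CJK ideograph (U+4E00..U+9FFF), one unit per
# run of ASCII letters/digits. Punctuation and whitespace are skipped,
# which the source text does not confirm.
_UNIT_PATTERN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def split_units(sentence: str) -> list[str]:
    """Split a sentence into counting units (hypothetical helper)."""
    return _UNIT_PATTERN.findall(sentence)


if __name__ == "__main__":
    print(split_units("vehicle maintenance and test"))
    print(len(split_units("车辆维修")))
```

Under these assumptions, an English sentence is counted word by word and a Chinese sentence character by character, so the "number of words" used in the later steps is simply the length of the returned list.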


Abstract

The invention provides a sentence information fingerprint representation method and a sentence duplicate checking method and system, and relates to the technical field of information processing. The sentence information fingerprint representation method comprises the following steps: acquiring the sentence information to be tested; calculating the number of words in the sentence information to be tested; looking up that word count in a preset word number table to obtain the corresponding interception number; sequentially extracting from the sentence information to be tested the number of words given by the interception number; UTF-8 encoding each extracted word to generate the coding information corresponding to that word; taking the remainder of each word's coding information with respect to a preset parameter to generate a code remainder for each word; and arranging the code remainders in sequence to generate the information fingerprint of the sentence. This avoids the problem that fingerprint information is not accurate enough because keywords and high-frequency words are difficult to extract.
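The steps listed in the abstract can be sketched as a short pipeline. Everything concrete here is an assumption made for illustration: the contents of `INTERCEPT_TABLE`, the value of `MODULUS`, reading "sequentially extracting" as taking the first k units, interpreting the UTF-8 bytes as a big-endian integer, and keeping the remainders in their original order. The input is a list of units that has already been split.

```python
from bisect import bisect_right

# Hypothetical preset word number table: (minimum unit count, interception number).
INTERCEPT_TABLE = [(1, 1), (5, 3), (10, 5), (20, 8)]
# Hypothetical preset parameter used for the remainder step.
MODULUS = 97


def interception_number(unit_count: int) -> int:
    """Look up how many units to extract for a sentence of this length."""
    thresholds = [minimum for minimum, _ in INTERCEPT_TABLE]
    index = bisect_right(thresholds, unit_count) - 1
    if index < 0:
        raise ValueError("sentence has no units to fingerprint")
    return INTERCEPT_TABLE[index][1]


def fingerprint(units: list[str]) -> tuple[int, ...]:
    """Build a fingerprint from pre-split units (characters or words)."""
    k = interception_number(len(units))
    selected = units[:k]  # assumption: take the first k units in order
    # Assumption: treat each unit's UTF-8 bytes as one big-endian integer.
    codes = [int.from_bytes(unit.encode("utf-8"), "big") for unit in selected]
    # Remainders are kept in the order the units appeared.
    return tuple(code % MODULUS for code in codes)


if __name__ == "__main__":
    print(fingerprint(["a", "b", "c", "d", "e"]))  # (0, 1, 2)
```

Because the fingerprint depends only on the extracted units and two preset values, two sentences can be compared by comparing these small tuples, without any keyword or frequency extraction.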

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a sentence information fingerprint representation method and a sentence duplicate checking method and system.

Background art

[0002] At present, fingerprints are extracted from a piece of text mainly by extracting its keywords and high-frequency words and using them as the fingerprint information. When 8 keywords and their word frequencies are selected as the fingerprint, the accuracy is above 98% and the recall rate is about 30%. This shows that a piece of information can be basically represented by "summarizing" it and finding its 8 most frequently used words. However, extracting keywords and high-frequency words is relatively difficult, which makes the fingerprint information inaccurate.

Contents of the invention

[0003] The purpose of the present invention is to provide a sentence information fingerprint ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/216, G06F40/284, G06F16/215
CPC: G06F40/211, G06F40/284, G06F40/216, G06F16/215
Inventor: 祁智恒
Owner: 路米科技江苏有限公司