Cross-language document similarity detection method

A similarity detection method for cross-language documents. It addresses the obstacles that arise in cross-language document similarity detection and the inapplicability of traditional document similarity detection methods, and it copes with changes and deformations of the text.

Inactive Publication Date: 2012-02-22
BEIHANG UNIV
9 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0013] As the example shows, the diversity of expressions and the polysemy of vocabulary also create obstacles to similarity detection of cross-language documents.
[0014] Because of these technical difficulties, traditional document similarity detection methods are not applicable to document similarity detection in cross-language situations.

Examples

Embodiment Construction

[0033] The present invention will be further described in detail with reference to the accompanying drawings and embodiments.

[0034] The cross-language document similarity detection method of the present invention, as shown in Figure 1, specifically includes the following steps:

[0035] Step 1. The source document and the target document to be compared are each converted into an intermediate document based on words of the same language. The source document and the target document are plain-text documents in any language.

[0036] The conversion proceeds as follows. First, the source document or the target document is divided at a granularity of one or several words. Then each resulting word or phrase is converted into a set, called a Slot, composed of intermediate representations; an intermediate representation is a word or phrase in a chosen language that corresponds to the word or phrase taken from the source document or the target document. Finally, an index is built for...
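
The conversion in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the lexicon `TOY_LEXICON`, the whitespace segmentation, and the function names `to_intermediate` and `build_index` are all hypothetical. Because the source text is truncated before it says what the index covers, the inverted index shown here (intermediate word to Slot positions) is only one plausible assumption.

```python
from collections import defaultdict

# Hypothetical toy lexicon: source-language word -> candidate English words.
# A real system would use a full bilingual dictionary or translation service.
TOY_LEXICON = {
    "gato": {"cat"},
    "negro": {"black", "dark"},
    "duerme": {"sleeps", "sleep"},
}


def to_intermediate(text, lexicon):
    """Convert a plain-text document into a list of Slots.

    Each Slot is the set of intermediate-language words that may
    correspond to one segmented unit of the input.
    """
    slots = []
    # Assumption: whitespace segmentation at the granularity of one word.
    for token in text.lower().split():
        # Tokens missing from the lexicon are kept as-is so no position is lost.
        slots.append(set(lexicon.get(token, {token})))
    return slots


def build_index(slots):
    """Assumed inverted index: intermediate word -> list of Slot positions."""
    index = defaultdict(list)
    for pos, slot in enumerate(slots):
        for word in sorted(slot):
            index[word].append(pos)
    return dict(index)


slots = to_intermediate("Gato negro duerme", TOY_LEXICON)
index = build_index(slots)
```

Keeping every candidate translation in a Slot, rather than picking one, is what lets later matching tolerate polysemy: two documents match at a position if their Slots share any intermediate word.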

Abstract

The invention provides a cross-language document similarity detection method and belongs to the technical field of document similarity comparison. The method comprises the following steps: first, the source document and the target document to be compared are each converted into an intermediate document based on words of the same language; next, similar intermediate representation sets are searched for between the two intermediate documents in order to establish a mapping set; finally, similar text sections between the source document and the target document are found through the mapping set, using a method for searching similar text sections. The method eases the difficulties of cross-language document similarity detection and yields better detection results.
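
The second and third steps of the abstract might look like the sketch below. It is an assumption-laden illustration: the source does not state the similarity criterion or the section search method in this excerpt, so "similar" is taken here to mean that two Slots share at least one intermediate word, and sections are formed by a simple greedy clustering of nearby mapped pairs. The names `build_mapping`, `similar_sections`, `max_gap`, and `min_pairs` are hypothetical.

```python
def build_mapping(src_slots, tgt_slots):
    """Return the set of (i, j) pairs whose Slots share an intermediate word."""
    return {
        (i, j)
        for i, s in enumerate(src_slots)
        for j, t in enumerate(tgt_slots)
        if s & t
    }


def _span(cluster):
    """Bounding positions (src_start, src_end, tgt_start, tgt_end) of a cluster."""
    src = [i for i, _ in cluster]
    tgt = [j for _, j in cluster]
    return (min(src), max(src), min(tgt), max(tgt))


def similar_sections(mapping, max_gap=2, min_pairs=2):
    """Greedily cluster mapped pairs that lie close together in both documents.

    A pair joins the current cluster if it is within max_gap positions of the
    previous pair on both sides, which tolerates some local word reordering.
    """
    sections, cluster = [], []
    for pair in sorted(mapping):
        if cluster and (
            pair[0] - cluster[-1][0] > max_gap
            or abs(pair[1] - cluster[-1][1]) > max_gap
        ):
            if len(cluster) >= min_pairs:
                sections.append(_span(cluster))
            cluster = []
        cluster.append(pair)
    if len(cluster) >= min_pairs:
        sections.append(_span(cluster))
    return sections


# Slots for a source phrase and for the English phrase "the black cat sleeps".
src = [{"cat"}, {"black", "dark"}, {"sleeps", "sleep"}]
tgt = [{"the"}, {"black"}, {"cat"}, {"sleeps"}]

mapping = build_mapping(src, tgt)
sections = similar_sections(mapping)
```

In this toy run the noun and adjective appear in opposite order in the two documents, yet the three mapped pairs still fall into one section because the gap tolerance absorbs the swap. The quadratic pairwise comparison in `build_mapping` is for clarity only; an index over intermediate words would avoid it on real documents.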

Description

Technical field

[0001] The invention relates to the technical field of document similarity comparison, and in particular to a cross-language document similarity detection method.

Background art

[0002] Plagiarism of papers has long been a major problem for the academic community. In recent years there has been a great deal of research on similarity detection for documents in the same language, and many products are available. Research on cross-language document similarity detection, however, is still essentially a blank area, while cross-language plagiarism of papers is becoming more and more serious. Studying document similarity detection in the cross-language situation is therefore a valuable and meaningful topic.

[0003] At present, the difficulty of cross-language document similarity detection is mainly reflected in two aspects:

[0004] 1. Differences in grammatical structure ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 赵长海, 晏海华, 杨沐杉
Owner: BEIHANG UNIV