Query generation method for source retrieval based on machine learning in plagiarism detection

An information retrieval technique that applies machine learning to query generation. It addresses problems such as the lack of continuous improvement capability in existing methods.

Status: Inactive. Publication Date: 2017-07-18
HEILONGJIANG INST OF TECH
Cites: 2; Cited by: 7

AI Technical Summary

Problems solved by technology

[0015] The invention aims to solve a problem with prior-art query generation methods that are based on heuristics: they lack the capability for continuous improvement.

Examples

Specific Embodiment 1

[0053] Specific Embodiment 1. The machine-learning-based query generation method for source retrieval in plagiarism detection described in this embodiment works as follows:

[0054] For a suspicious document fragment s_k, use n existing query generation methods to obtain a set of candidate queries (the set's notation is not reproduced in this extract). Sort all candidate queries in the set to obtain a sorted list.

[0055] Take the first m queries of the sorted list as the queries for the suspicious document fragment s_k.
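The three steps of [0054] and [0055] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Each existing query generation method is modeled as a hypothetical function that maps a fragment to a list of candidate query strings. The ranking criterion is passed in as a generic `score` function, which Embodiments 2 and 3 refine. Dropping duplicate candidates is an added assumption that the source does not mention.

```python
from typing import Callable, List

# Hypothetical model of one existing method: fragment -> candidate queries.
QueryGenerator = Callable[[str], List[str]]


def generate_queries(
    fragment: str,
    generators: List[QueryGenerator],
    score: Callable[[str], float],
    m: int,
) -> List[str]:
    """Pool candidates from n generators, sort them, and keep the first m."""
    # Step 1: pool the candidate queries produced by the n existing methods.
    # (Assumption: exact duplicates are dropped, keeping first-seen order.)
    candidates: List[str] = []
    seen = set()
    for gen in generators:
        for query in gen(fragment):
            if query not in seen:
                seen.add(query)
                candidates.append(query)
    # Step 2: sort all candidates from the highest score to the lowest.
    ranked = sorted(candidates, key=score, reverse=True)
    # Step 3: the first m queries become the queries for this fragment.
    return ranked[:m]


# Toy generators and a toy score (query length), for illustration only.
first_words = lambda s: [" ".join(s.split()[:3])]
last_words = lambda s: [" ".join(s.split()[-3:])]
frag = "query generation for source retrieval in plagiarism detection"
print(generate_queries(frag, [first_words, last_words], score=len, m=1))
# prints ['in plagiarism detection']
```

In this design, the generators and the ranker are independent of each other. The ranker can therefore be swapped or retrained without touching the generators, which fits the continuous-improvement goal stated in [0015].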

[0056] In this embodiment, each candidate query in the candidate set is extracted from the suspicious document fragment s_k by one of the existing source retrieval query generation methods. For example, the first candidate query is the query that existing query generation method 1 extracts from s_k.

[0057] The existing source retrieval query generation methods referred to in this embodiment are known query generation methods, for example: TF, ...
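The extract names TF only as an example and does not define it. One common reading, assumed here, is a term-frequency method that builds a query from the most frequent non-stopword terms of the fragment. The sketch below follows that assumption. The stopword list, the tokenizer, the tie-breaking rule, and the parameter `k` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "is", "on", "with"}


def tf_query(fragment: str, k: int = 3) -> str:
    """Build one query from the k most frequent non-stopword terms."""
    # Lowercase, split on non-alphanumerics, and drop stopwords.
    tokens = [t for t in re.findall(r"[a-z0-9]+", fragment.lower())
              if t not in STOPWORDS]
    counts = Counter(tokens)
    # Record each term's first position so that ties are broken by order of appearance.
    first_pos = {}
    for i, term in enumerate(tokens):
        first_pos.setdefault(term, i)
    top = sorted(counts, key=lambda t: (-counts[t], first_pos[t]))[:k]
    return " ".join(top)


text = ("The source retrieval task finds the source of a plagiarized text; "
        "source retrieval uses queries.")
print(tf_query(text))  # prints: source retrieval task
```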

Specific Embodiment 2

[0058] Specific Embodiment 2. This embodiment further limits the machine-learning-based query generation method for source retrieval in plagiarism detection described in Specific Embodiment 1. In this embodiment, all candidate queries in the set are sorted from high to low according to the source retrieval evaluation index of each query.

[0059] The source retrieval evaluation index is the index that an existing evaluation method produces when it assesses source retrieval results, and it indicates the quality of source retrieval. In this embodiment, candidate queries are ranked by this index. As a result, queries with relatively high evaluation indices are selected as the final queries, which improves the quality of source retrieval.
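As a sketch of Embodiment 2, assume that the evaluation index is the F1 score of the documents a query retrieves, measured against the fragment's known source documents. The text here does not name the index, so F1 is an assumption. The hypothetical `results` mapping (from each query to the IDs of the documents it retrieved) is also an assumption. This ranking can only be computed when the true sources are known, for example on labeled training data.

```python
from typing import Dict, List, Set


def f1(retrieved: Set[str], sources: Set[str]) -> float:
    """F1 of the retrieved document IDs against the known source IDs."""
    hits = len(retrieved & sources)
    if hits == 0:
        return 0.0  # also covers empty retrieved or source sets
    precision = hits / len(retrieved)
    recall = hits / len(sources)
    return 2 * precision * recall / (precision + recall)


def sort_by_evaluation(results: Dict[str, Set[str]], sources: Set[str]) -> List[str]:
    """Sort candidate queries from high to low evaluation index (here, F1)."""
    return sorted(results, key=lambda q: f1(results[q], sources), reverse=True)


sources = {"d1", "d2"}
results = {"q_a": {"d1", "d9"}, "q_b": {"d1", "d2"}, "q_c": {"d7"}}
print(sort_by_evaluation(results, sources))  # prints ['q_b', 'q_a', 'q_c']
```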

Specific Embodiment 3

[0060] Specific Embodiment 3. This embodiment further limits the machine-learning-based query generation method for source retrieval in plagiarism detection described in Specific Embodiment 1. In this embodiment, the sorting is performed by a machine learning method.
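Embodiment 3 does not specify which machine learning method is used. One standard option, swapped in here purely as an illustration, is pairwise learning to rank. The model learns a weight vector so that, among the candidate queries for the same fragment, queries with a higher evaluation index (as in Embodiment 2) receive a higher score. The sketch uses a simple pairwise perceptron. The two-feature vectors and the training data are hypothetical, and a RankSVM or a gradient-boosted ranker would be common alternatives.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One group holds the labeled candidate queries of a single training fragment.
Group = List[Tuple[Vector, float]]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(groups: List[Group], epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Pairwise ranking perceptron over (features, evaluation index) pairs."""
    w = [0.0] * len(groups[0][0][0])
    for _ in range(epochs):
        for group in groups:
            # Only compare candidates that belong to the same fragment.
            for (xa, ya), (xb, yb) in combinations(group, 2):
                if ya == yb:
                    continue  # no preference between equally rated queries
                better, worse = (xa, xb) if ya > yb else (xb, xa)
                diff = [b - c for b, c in zip(better, worse)]
                # Update unless the pair is already strictly in the right order.
                if dot(w, diff) <= 0:
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w: Vector, candidates: List[Tuple[str, Vector]]) -> List[str]:
    """Sort candidate queries by learned score, highest first."""
    ordered = sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)
    return [query for query, _ in ordered]


# Toy training data: two fragments, two candidate queries each.
train = [
    [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.1)],
    [([0.8, 0.3], 0.7), ([0.2, 0.9], 0.2)],
]
w = train_pairwise(train)
print(rank(w, [("q1", [0.1, 0.9]), ("q2", [0.9, 0.1])]))  # prints ['q2', 'q1']
```

Presumably the value of a learned ranker is that it scores queries for a new suspicious fragment, where the sources are unknown and the evaluation index of Embodiment 2 cannot be measured directly.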

Abstract

The invention discloses a machine-learning-based query generation method for source retrieval in plagiarism detection. It relates to the technical field of information retrieval, in particular to query generation within information retrieval. It addresses two problems of prior-art source retrieval techniques that generate queries heuristically: dependence on expert experience and lack of continuous improvement capability. The method comprises three steps. First, for a suspicious document fragment s_k, a set of candidate queries (defined in the specification) is obtained using n existing query generation methods. Second, all candidate queries in the set are sorted to obtain a sorted list. Third, the first m queries of the sorted list are taken as the queries of s_k (defined in the specification). The method departs from the established line of research on query generation in existing source retrieval. It also fully exploits the fact that different source retrieval methods perform differently on the same suspicious document fragment.

Description

Technical field

[0001] The invention relates to the technical field of information retrieval, and in particular to query generation technology within information retrieval.

Background art

[0002] With the development and spread of computer network technology, network resources have become widely used, which in turn has driven the rapid development of web search engines. Search engines enable people to make full use of network resources for learning, communication, and entertainment. Although network resources and search engine technology bring convenience, they also have negative effects. For example, in teaching and scientific research, some people use network resources and search engines to plagiarize, which society calls academic fraud. As such incidents are frequently exposed, a new technology for identifying this kind of "academic fraud" has emerged, namely: plagiarism retrieval t...

Application Information

IPC(8): G06F17/30; G06K9/62
CPC: G06F16/3344; G06F16/35; G06F18/2411
Inventors: 孔蕾蕾 (Kong Leilei), 齐浩亮 (Qi Haoliang), 韩中元 (Han Zhongyuan), 韩咏 (Han Yong), 郝振元 (Hao Zhenyuan)
Owner: HEILONGJIANG INST OF TECH