A Similarity Detection Method for Science and Technology Project Application Form Based on Synonym Analysis

A similarity detection method for science and technology project application forms, applied in the field of natural language processing. It addresses the inability of the vector space model to process semantics, with the effect of eliminating interference and improving accuracy.

Active Publication Date: 2018-11-06
SCI & TECH INFORMATION INST ZHEJIANG PROV
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

During the project application process, the full text, paragraphs, and sentences of each science and technology project application form are processed with natural language processing techniques, including the Tongyici Cilin synonym thesaurus ("synonym forest") and word segmentation, and the results are used to build a feature weight vector. Analyzing the text with synonym analysis and the TF-IDF model yields a feature weight vector space, which overcomes the vector space model's inability to handle semantics and improves the accuracy of Chinese text similarity detection.
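As a rough illustration of why synonym analysis helps a vector space model, the following minimal sketch maps synonyms to one canonical term before computing TF-IDF vectors and cosine similarity. It is not the patented procedure: the `SYNONYMS` table, the English sample tokens, the whitespace tokenization, and the smoothed IDF variant are all assumptions made for this example. Real Chinese text would first need a word segmenter.

```python
import math
from collections import Counter

# Hypothetical synonym dictionary: each word maps to one canonical form.
SYNONYMS = {"automobile": "car", "identify": "detect"}

def normalize(tokens):
    """Replace each token with its canonical synonym, if one is listed."""
    return [SYNONYMS.get(tok, tok) for tok in tokens]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumed smoothed IDF so that tiny corpora do not yield zero weights.
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[word] for word, weight in u.items() if word in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = [
    "detect car similarity".split(),
    "identify automobile similarity".split(),
    "knit fabric yarn".split(),
]
raw = tfidf_vectors(docs)
merged = tfidf_vectors([normalize(d) for d in docs])
raw_sim = cosine(raw[0], raw[1])        # only "similarity" is shared
merged_sim = cosine(merged[0], merged[1])  # synonyms now coincide
print(raw_sim, merged_sim)
```

Without normalization the first two documents share a single term, so their similarity is low; after synonym normalization they become identical bags of words and the similarity rises to about 1.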


Examples

Embodiment Construction

[0039] The present invention is further described below with reference to the accompanying drawings and an example.

[0040] For convenience of description, the relevant symbols are defined as follows:

[0041] L: The basic lexical corpus.

[0042] T: The Tongyici Cilin synonym thesaurus.

[0043] S_i: The i-th (i = 1, 2, ..., n) Chinese character string.

[0044] D_i: The i-th (i = 1, 2, ...) science and technology project application form.

[0045] |D|: The total number of science and technology project application forms in the text database.

[0046] P_j: The j-th (j = 1, 2, ..., n) text block in science and technology project application form D_i.

[0047] V_i: The i-th (i = 1, 2, ..., n) word vector.

[0048] w_k: The k-th entry (term).

[0049] |{t : w_k ∈ D_t}|: The number of science and technology project application forms that contain the entry w_k.

[0050] Frequency(w_k): The word frequency of the entry w_k.

[0051] Weight(P_j): The weight of text block P_j.

[0052] Science and technology project application form D_i: the text feature ve...
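The symbols above are the usual ingredients of a TF-IDF weight. The patent's own formula is not visible in this excerpt, so the following is only the common textbook form written with those symbols, offered as an assumption about how they could fit together. The name tfidf is introduced here for illustration.

```latex
\mathrm{tfidf}(w_k, D_i)
  = \mathrm{Frequency}(w_k)\,
    \log\frac{|D|}{\left|\{\, t : w_k \in D_t \,\}\right|}
```

Under this form, an entry that appears in every application form gets a weight of zero, while an entry concentrated in a few forms is weighted more heavily.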

Abstract

The invention discloses a similarity detection method for science and technology project application forms based on synonym analysis. The method combines synonym analysis with a vector model to calculate the similarity between application forms. Synonym analysis is applied to the dictionary to build a synonym dictionary. Because application forms follow a particular format, the text of each form is processed in blocks, and each text block is then analyzed with a word segmentation algorithm. During the project application process, the full text, paragraphs, and sentences of each application form are processed with natural language processing techniques, including the Tongyici Cilin synonym thesaurus ("synonym forest") and word segmentation, and the results are used to build a feature weight vector. Analyzing the text with synonym analysis and the TF-IDF model yields a feature weight vector space, which overcomes the vector space model's inability to handle semantics and improves the accuracy of Chinese text similarity detection.
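The block-wise processing described in the abstract can be sketched as a weighted combination of per-block similarities. This is an illustrative assumption only: the excerpt does not state how block weights are combined, and the block names, the weights in `BLOCK_WEIGHTS`, and the use of plain term counts are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical block names and weights; the excerpt does not list the real ones.
BLOCK_WEIGHTS = {"title": 0.2, "objectives": 0.3, "technical_content": 0.5}

def cosine_counts(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0

def form_similarity(form_a, form_b, weights=BLOCK_WEIGHTS):
    """Weighted average of per-block similarities (assumed combination rule)."""
    total = sum(weights.values())
    score = 0.0
    for block, weight in weights.items():
        score += weight * cosine_counts(form_a.get(block, []), form_b.get(block, []))
    return score / total

form_a = {
    "title": ["text", "similarity", "detection"],
    "objectives": ["detect", "plagiarism"],
    "technical_content": ["tfidf", "vector", "model"],
}
form_b = {
    "title": ["text", "similarity", "detection"],
    "objectives": ["improve", "yield"],
    "technical_content": ["tfidf", "vector", "model"],
}
score = form_similarity(form_a, form_b)
print(round(score, 3))
```

Here the two forms match in the title and technical content blocks but not in the objectives block, so the combined score is roughly the sum of the two matching block weights.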

Description

Technical field

[0001] The invention belongs to the field of natural language processing and is mainly used for similarity detection of science and technology project application forms.

Background

[0002] In recent years, as the central government has invested substantial funding in scientific research projects, domestic science and technology undertakings have flourished. At the same time, problems such as plagiarism and duplicate submission of application forms have emerged and have seriously hindered the healthy development of science and technology. To address plagiarism and duplicate submission, the invention provides a Chinese text similarity detection method that helps the project application center effectively identify application forms with serious plagiarism. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/22, G06F17/27
CPC: G06F40/194, G06F40/247, G06F40/284, G06F40/30
Inventor: 严伟, 吕跃华, 沈凯, 杨威, 杨朔
Owner: SCI & TECH INFORMATION INST ZHEJIANG PROV