Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method

An author identification technology based on classification models, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as low recognition accuracy and declining recognition accuracy.

Status: Inactive
Publication Date: 2013-01-16
HUNAN UNIV
Cites: 1 · Cited by: 16

AI Technical Summary

Problems solved by technology

[0007] The present invention aims to discover new effective features for Chinese text; to solve the problem that noise contained in high-dimensional feature vectors causes recognition accuracy to decline; and to solve the problem that recognition accuracy is low when the number of authors is relatively large (greater than 20).


Embodiment Construction

[0029] To make the purpose and technical solution of the present invention clearer, specific embodiments of the invention are described in detail below.

[0030] The specific steps of the Chinese author identification method based on the two-layer classification model are as follows:

[0031] The first step is to obtain the author vector. A word-sense tagging module tags the words in each Chinese work. The module takes a work as input and calls the full-text word sense disambiguation module of the Language Technology Platform (LTP), which is freely shared by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology. After the sentences in the work have been tagged, the tagged results are saved in a new document.

[0032] For each word-sense-tagged document, the calculation module takes the document as input and extracts the frequency of 88 word-sense tags (the 88 word-sense tags are counted in alphabetical ...
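The frequency extraction described in [0032] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `word/Tag` token format, the three-tag toy inventory, the function name `tag_frequency_vector`, and the use of relative (rather than raw) frequencies are all assumptions made here, since the excerpt does not specify the LTP output format or the 88 actual tags. It covers a single document only; how per-document counts are combined into an author vector is not shown in the excerpt.

```python
from collections import Counter


def tag_frequency_vector(tagged_text, tag_inventory):
    """Return the relative frequency of each tag, in alphabetical tag order.

    Assumes (hypothetically) that tokens are whitespace-separated and
    written as "word/Tag".
    """
    tags = sorted(tag_inventory)  # fixed alphabetical order of dimensions
    counts = Counter()
    for token in tagged_text.split():
        # Split on the last slash so words containing "/" still parse.
        _word, sep, tag = token.rpartition("/")
        if sep and tag in tag_inventory:
            counts[tag] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(tags)
    return [counts[t] / total for t in tags]


# Toy inventory of 3 made-up tags; the patent uses 88 word-sense tags.
inventory = {"Aa", "Hb", "Kd"}
doc = "我/Aa 走/Hb 了/Kd 他/Aa"
vector = tag_frequency_vector(doc, inventory)
print(vector)
```

Sorting the inventory once fixes the meaning of each vector position, so vectors from different documents stay comparable.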


Abstract

The invention relates to a Chinese author identification method based on a double-layer classification model and a device for realizing the method, belonging to the field of information security. To address the low identification accuracy caused by a large number of authors, an author grouping layer is added to the author identification model: each author is represented as an author vector, and the authors are grouped by a clustering algorithm. The second layer is an author identification layer, in which dependency relations, function words, punctuation marks, and part-of-speech tags are extracted as features and author identification is carried out within the group. The method and device can effectively solve the problem that identification accuracy declines when there are too many authors. In addition, a proposed feature dimensionality reduction and optimization method based on principal component analysis solves the problem that noise in high-dimensional feature vectors degrades identification accuracy. The method can be applied to authorship attribution of literary works and to information security tasks such as copyright protection.
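The two-layer structure described in the abstract can be sketched as below. This is a simplified illustration under several assumptions, not the patented method: plain k-means stands in for the unspecified clustering algorithm, a nearest-centroid rule stands in for the unspecified second-layer classifier, the dimensionality reduction step is omitted, and the two-dimensional vectors are toy placeholders for real author vectors and stylistic features. All function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20):
    """Plain k-means, seeded deterministically with the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return centers, labels


def train(author_vectors, author_features, k):
    """Layer 1: cluster authors. Layer 2: one feature profile per author."""
    authors = sorted(author_vectors)
    centers, labels = kmeans([author_vectors[a] for a in authors], k)
    groups = {c: [] for c in range(k)}
    for author, lab in zip(authors, labels):
        groups[lab].append(author)
    profiles = {a: mean(author_features[a]) for a in authors}
    return centers, groups, profiles


def identify(model, doc_vector, doc_features):
    """Pick the nearest author group, then the nearest author inside it."""
    centers, groups, profiles = model
    non_empty = [c for c in groups if groups[c]]
    group = min(non_empty, key=lambda c: dist(doc_vector, centers[c]))
    return min(groups[group], key=lambda a: dist(doc_features, profiles[a]))


# Toy data: A/B and C/D have similar author vectors; A/C and B/D have
# identical feature profiles, so only the grouping layer separates them.
author_vectors = {"A": [0.0, 0.0], "B": [0.1, 0.0],
                  "C": [1.0, 1.0], "D": [0.9, 1.0]}
author_features = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]],
                   "C": [[1.0, 0.0]], "D": [[0.0, 1.0]]}
model = train(author_vectors, author_features, k=2)
result = identify(model, [0.95, 0.9], [0.1, 0.9])
print(result)
```

In the toy data, authors B and D share the same second-layer profile, so the grouping layer is what disambiguates them; this mirrors the abstract's motivation of narrowing the candidate set before identification.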

Description

Technical field

[0001] The invention relates to the field of Chinese natural language processing and the field of Chinese author identification, in particular to a Chinese author identification method and device based on a two-layer classification model.

Background art

[0002] In recent years, plagiarism has intensified in academic and creative fields such as literary creation and thesis writing. For example, there were several cases of plagiarism in papers of National Social Science Fund projects in Shanghai; post-80s writer Guo Jingming's "How Many Flowers Fall in Dreams" was suspected of plagiarism; Sang Yuzhu, executive vice chairman of the Photographers Association of Jilin Federation of Literary and Art Circles, was suspected of plagiarizing and using other people's works; Wang Hui, a professor of the Chinese Department of Tsinghua University and former editor-in-chief of "Reading" magazine, wrote his doctoral thesis "Resisting Despair" more than 2...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 刘玉玲, 万晶
Owner: HUNAN UNIV