A method for grading Chinese texts and calculating difficulty scores for Chinese texts

The method is applied to grading Chinese texts and calculating their difficulty scores. It addresses the heavy workload, subjectivity, and lack of consensus associated with manual grading, and achieves a good scoring effect.

Active Publication Date: 2021-05-04
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

However, when the existing approach is extended to more general scenarios, problems arise: the selected features are not comprehensive enough, and the linear model used is not ideal.
More importantly, the features selected for Pinyin texts do not adequately reflect the difficulty characteristics of Chinese texts.
Because no tool in China can score the difficulty of Chinese texts, many domestic difficulty-scoring tasks, such as the grading of teaching materials, are still done by manual grading, the most primitive method.
However, given the wide range of themes and styles in existing Chinese texts, manual difficulty rating involves a huge workload and is very time-consuming.
Moreover, the results of manual grading are often highly subjective, and it is difficult to reach a consensus when the grading is repeated.

Embodiment Construction

[0046] The present invention is further described below with reference to a specific embodiment.

[0047] As shown in Figure 1, the method for grading Chinese texts and calculating their difficulty scores described in this embodiment includes the following steps:

[0048] S1. Text acquisition and grade labeling: obtain articles with classification labels as a training set. The specific steps are as follows:

[0049] S11. Select suitable teaching materials to build a dedicated corpus of Chinese teaching-material texts;

[0050] S12. Perform preliminary screening on the corpus texts and eliminate articles with low data quality;

[0051] S13. Assign a grade label to each article based on the corpus information combined with expert opinion;
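Steps S11 to S13 can be sketched roughly as follows. This is only an illustrative sketch: the source does not state the screening criteria or how labels are stored, so the `Article` record, the `is_low_quality` rule (minimum length and share of Chinese characters), and the `expert_grades` mapping from textbook to grade are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Article:
    """One passage in the teaching-material corpus (field names are hypothetical)."""
    source: str                   # textbook the passage was taken from (S11)
    text: str                     # passage content
    grade: Optional[int] = None   # difficulty label, assigned in S13


def is_low_quality(article: Article, min_chars: int = 50) -> bool:
    """S12, assumed criterion: too short, or mostly non-Chinese characters."""
    chars = [c for c in article.text if not c.isspace()]
    if len(chars) < min_chars:
        return True
    # Count characters in the basic CJK Unified Ideographs block.
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars) < 0.5


def build_training_set(
    corpus: List[Article],
    expert_grades: Dict[str, int],
    min_chars: int = 50,
) -> List[Article]:
    """S12 + S13: drop low-quality passages, then attach expert-assigned grades."""
    training_set = []
    for article in corpus:
        if is_low_quality(article, min_chars):
            continue
        grade = expert_grades.get(article.source)
        if grade is None:
            continue  # no expert label available, so the passage is left out
        training_set.append(Article(article.source, article.text, grade))
    return training_set
```

Keeping the screening rule in its own function makes it easy to swap in whatever quality criteria the corpus builders actually use.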

[0052] By referring to two language framework systems, the Common European Framework of Reference for Languages and the syllabus for Chinese language teaching promulgated by Hanban, the difficulty level of Chin...

Abstract

The invention relates to a method for grading Chinese texts and calculating the difficulty score of Chinese texts. First, text acquisition and grade labeling are performed to obtain articles with classification labels as a training set. Next, feature extraction is performed to obtain, for the text of each article, the values of all its linguistic features. A model is then built and tested until one whose prediction accuracy meets expectations is obtained, and finally the resulting model is used to predict the difficulty of a text. The invention is applicable to various scenarios in which text readability needs to be evaluated. The support vector regression model it adopts achieves a better scoring effect by mapping the features into a higher-dimensional space, and outperforms traditional linear models.
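The contrast between support vector regression and a linear model can be illustrated with a small sketch. It assumes scikit-learn and NumPy are available and uses a synthetic one-feature dataset invented for the example (it is not data from the patent). The RBF kernel and the values of `C` and `epsilon` are likewise assumptions, since the abstract does not name a kernel or hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_toy_data(n=60):
    """Synthetic stand-in for (feature value, difficulty score) pairs."""
    x = np.linspace(-3.0, 3.0, n)
    X = x.reshape(-1, 1)   # one feature column
    y = x ** 2             # deliberately nonlinear in the feature
    return X, y


def fit_models(X, y):
    """Fit an RBF-kernel SVR and a plain linear regression on the same data."""
    # Hyperparameters are illustrative assumptions, not values from the source.
    svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
    linear = make_pipeline(StandardScaler(), LinearRegression())
    svr.fit(X, y)
    linear.fit(X, y)
    return svr, linear


X, y = make_toy_data()
svr, linear = fit_models(X, y)
# score() returns R^2; here it is computed on the training data only.
print("SVR R^2:   ", round(svr.score(X, y), 3))
print("Linear R^2:", round(linear.score(X, y), 3))
```

The toy target is chosen so that no straight line can fit it, which is the situation in which a kernel model has an advantage. A real evaluation would score both models on held-out articles, in line with the model testing step the abstract mentions.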

Description

Technical field

[0001] The invention relates to the technical field of model prediction, and in particular to a method for grading Chinese texts and calculating the difficulty score of Chinese texts.

Background

[0002] With the development of network technology, massive amounts of unstructured data such as text, images, and videos are generated on the Internet every day. The text data among them can be processed with modern natural language processing technology to mine the valuable information hidden in the text. For a long time, the mainstream technology of natural language processing was rule-based, ranging from various kinds of syntactic analysis to semantic analysis. Later, as the development of the Internet produced large amounts of corpus data, statistical natural language processing gradually emerged. Text readability research is one of its research topics. In the process of language learning, teachers need to select moderately difficult texts from a l...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F40/242, G06F16/35, G06K9/62
CPC: G06F16/35, G06F40/242, G06F40/289, G06F18/2411
Inventors: 郑子彬, 林星彤
Owner: SUN YAT SEN UNIV