Multi-scale difficulty vector classification method for graded reading materials

A multi-scale classification method, applied to text database clustering/classification, special data processing applications, instruments, and similar fields. It addresses problems such as time-consuming processing, limited applicability, and insufficient extraction of sentence information, and achieves richer difficulty feature representation, stronger generalization, and fast training.
CN110727796A · Active · Publication Date: 2020-01-24 · SOUTH CHINA UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: SOUTH CHINA UNIV OF TECH
Publication Date: 2020-01-24


Abstract

The invention discloses a multi-scale difficulty vector classification method for graded reading materials. The method first constructs word-matching features, context features, topic features, and similar features to enrich the feature representation. These are combined with the most salient features from previous research to obtain a lightweight yet comprehensive sentence difficulty vector, which is then fed into a classifier such as a GBDT (gradient boosting decision tree). The approach achieves good results on both educational graded-reading corpora and general corpora. The method simplifies the feature representation, so that sentence difficulty can be reflected with a vector of only 21 dimensions. Introducing multi-scale features enriches the difficulty representation and improves model generalization. By incorporating newly introduced context information, the method builds a difficulty vector representation suitable for both the sentence level and the article level, and it performs well on data sets at both levels. Because the classifier is a gradient boosting tree, training is fast and a feature importance ranking can be obtained.
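As a rough illustration of the idea of a compact sentence difficulty vector, the following Python sketch concatenates a few word-matching, surface, and context features. It is a minimal sketch under stated assumptions: the excerpt does not list the 21 features, so the features here are illustrative stand-ins, and `GRADED_VOCAB`, `MAX_LEVEL`, and all function names are hypothetical. In the described method, a vector of this kind would then be passed to a gradient boosting classifier, which is not shown here.

```python
from collections import Counter

# Hypothetical graded vocabulary: word -> level (1 = easiest).
# A real system would load a curated graded word list instead.
GRADED_VOCAB = {
    "the": 1, "cat": 1, "sat": 1, "on": 1, "mat": 1,
    "government": 2, "announced": 2, "policy": 2,
    "macroeconomic": 3, "stabilization": 3,
}
MAX_LEVEL = 3


def word_matching_features(tokens):
    """Share of tokens at each vocabulary level, then the out-of-vocabulary share."""
    counts = Counter(GRADED_VOCAB.get(t, 0) for t in tokens)  # 0 marks unknown words
    n = len(tokens) or 1
    return [counts[level] / n for level in range(1, MAX_LEVEL + 1)] + [counts[0] / n]


def surface_features(tokens):
    """Token count and mean token length."""
    n = len(tokens) or 1
    return [float(len(tokens)), sum(len(t) for t in tokens) / n]


def context_features(sentences, index, window=1):
    """Mean token count of the neighbouring sentences inside the window."""
    neighbours = [
        s for i, s in enumerate(sentences)
        if i != index and abs(i - index) <= window
    ]
    if not neighbours:
        return [0.0]
    return [sum(len(s) for s in neighbours) / len(neighbours)]


def difficulty_vector(sentences, index):
    """Concatenate all feature groups for the sentence at position `index`."""
    tokens = sentences[index]
    return (
        word_matching_features(tokens)
        + surface_features(tokens)
        + context_features(sentences, index)
    )


# Example document: a list of already tokenized sentences.
doc = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "government", "announced", "macroeconomic", "stabilization", "policy"],
]
vec_easy = difficulty_vector(doc, 0)
vec_hard = difficulty_vector(doc, 1)
```

Here the second sentence puts more of its mass on the higher vocabulary levels than the first, which is the kind of signal a downstream classifier could use. Keeping each feature group in its own function makes it straightforward to add further groups, such as topic features, without changing the rest of the pipeline.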

Description

technical field

[0001] The invention relates to the technical field of readability analysis in natural language processing, and in particular to a multi-scale difficulty vector classification method for graded reading materials.

background technique

[0002] The task of difficulty vector classification is as follows: given a text, analyze it and output its difficulty value, or determine which level of reader it is suitable for. In education, this can guide the selection of graded corpora and textbook materials, and it can quantitatively measure how difficult and complex a sentence is to understand. For general text such as news, it can also be used to analyze reading difficulty and degree of specialization. The difficulty vector measures the difficulty and complexity of text comprehension more accurately, provides an important basis for sentence simplification and refinement, and also provides a reference for the selec...
