Linear model method used for simplified-Chinese readability measurement

A technology based on linear models and linear regression models, applied in special data processing applications, unstructured text data retrieval, text database clustering/classification, etc. It addresses problems such as large differences between studies, empirical research on Chinese readability indicators not yet having been carried out comprehensively and systematically, and the small scale of research data.

Inactive Publication Date: 2018-05-01
GUANGDONG UNIVERSITY OF FOREIGN STUDIES
5 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

However, existing research still has many deficiencies: the research data are very small in scale (e.g., twenty or thirty articles or a few dozen sentences), studies are narrowly targeted (e.g., at foreign students with particular native languages), variable selection is relatively subjective (e.g., automatically learned variables are omitted), empirical testing is insufficient (e.g., goodness-of-fit tests are lacking),...



Examples


Embodiment

[0099] As shown in Figure 1, this embodiment discloses a linear model method for measuring the readability of Simplified Chinese, which includes the following steps:

[0100] S1. Construct a corpus of Simplified Chinese texts and their readability levels;

[0101] S2. Preprocess the corpus texts, including word segmentation, sentence segmentation, part-of-speech tagging, named entity recognition, component syntax analysis, dependency syntax analysis, clause tagging, and/or stroke counting;

[0102] S3. Extract and calculate the text language features;

[0103] S4. Construct the best feature combination based on the language features and a regression algorithm;

[0104] S5. Construct a linear regression model for measuring readability.
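Steps S2 and S3 can be illustrated with a few character-level shallow features. The sketch below is an illustration under stated assumptions and does not reproduce the patent's own feature list in Table 1: it splits sentences at an assumed set of sentence-final punctuation marks, counts Han characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF), and derives mean sentence length and a character type-token ratio. The delimiter set, the function names, and the feature names (num_sentences, mean_sentence_length, char_type_token_ratio) are hypothetical. Word segmentation, parsing, and stroke counting need external tools or dictionaries and are omitted.

```python
import re

# Assumed sentence-final delimiters: Chinese full stop, question mark,
# exclamation mark, semicolon and ellipsis, plus ASCII "!", "?" and ";".
SENTENCE_END = re.compile(r"[。！？；!?;…]+")
# Han characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_CHAR = re.compile(r"[\u4e00-\u9fff]")


def split_sentences(text: str) -> list[str]:
    """S2 (partial, hypothetical): split a text at sentence-final punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def shallow_features(text: str) -> dict[str, float]:
    """S3 (partial, hypothetical): a few character-level shallow features."""
    sentences = split_sentences(text)
    chars = HAN_CHAR.findall(text)
    n_sent, n_chars = len(sentences), len(chars)
    return {
        "num_sentences": float(n_sent),
        "num_chars": float(n_chars),
        # Average number of Han characters per sentence.
        "mean_sentence_length": n_chars / n_sent if n_sent else 0.0,
        # Distinct Han characters divided by total Han characters.
        "char_type_token_ratio": len(set(chars)) / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    print(shallow_features("我爱读书。你呢？"))
```

For the sample text the sketch reports 2 sentences, 6 Han characters, a mean sentence length of 3.0, and a type-token ratio of 1.0. Character-level counts are used here only because they need no segmenter; a real feature set built per step S2 would work on segmented words and parse trees.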

[0105] The text language features are as follows:

[0106] Table 1. Text language feature table

[0111] In the above table, the low strokes, medium strokes, and high strokes in the...


Abstract

The invention discloses a linear model method for measuring the readability of Simplified Chinese. The method includes the steps of: constructing a corpus of Simplified Chinese texts and their readability levels; preprocessing the texts, including word segmentation, sentence segmentation, part-of-speech tagging, named-entity recognition, component syntax analysis, dependency syntax analysis, clause tagging, and stroke counting; extracting and calculating text language features; constructing a best feature combination according to the language features and a regression algorithm; and constructing a linear regression model for readability measurement. The text language features adopted by the model cover four aspects: shallow features, part-of-speech tag features (also called semantic or lexical features), grammatical features, and textual features. The model can automatically predict the readability level of Simplified Chinese texts for learners whose native language is Chinese, filling a gap in readability prediction models for Simplified Chinese texts. The model has a high goodness of fit and high interpretability, is extensible, and has important reference value for evaluating the readability of application texts.
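The abstract says the best feature combination is chosen according to the language features and a regression algorithm, and that the resulting linear model fits well, but this text does not name the selection procedure or the fit statistic. One way to realise steps S4 and S5 is sketched below, assuming ordinary least squares for the linear model and greedy forward selection scored by adjusted R² for the feature combination; both are assumptions, not necessarily the inventors' procedure. It is pure Python (the normal equations are solved by Gaussian elimination) so it runs without dependencies, and all function names are hypothetical.

```python
# Assumes every feature column and y are equal-length lists of numbers,
# and that y is not constant (otherwise R^2 is undefined).


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear features?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Return [b0, b1, ..., bk] minimising squared error, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def adjusted_r2(X: list[list[float]], y: list[float], coef: list[float]) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n, k = len(y), len(coef) - 1
    mean_y = sum(y) / n
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(features: dict[str, list[float]], y: list[float]) -> tuple[list[str], float]:
    """Greedily add the feature that most raises adjusted R^2; stop when none does."""
    chosen: list[str] = []
    best_score = float("-inf")
    while True:
        best_name = None
        for name in features:
            cols = chosen + [name]
            if name in chosen or len(y) - len(cols) - 1 <= 0:
                continue  # already used, or too few texts for k + 1 parameters
            X = [[features[c][i] for c in cols] for i in range(len(y))]
            try:
                score = adjusted_r2(X, y, fit_ols(X, y))
            except ValueError:
                continue  # skip candidates that make the system singular
            if score > best_score:
                best_score, best_name = score, name
        if best_name is None:
            return chosen, best_score
        chosen.append(best_name)


if __name__ == "__main__":
    features = {"x1": [1, 2, 3, 4, 5, 6], "x2": [3, 1, 4, 1, 5, 9]}
    y = [5.1, 7.9, 11.2, 13.8, 17.1, 19.9]
    print(forward_select(features, y))
```

Adjusted R² is used instead of plain R² because plain R² never decreases when a feature is added, so it could not tell the greedy loop when to stop. For real corpora, a statistics library's OLS routine would replace the hand-written solver and would also supply the significance and goodness-of-fit tests the problem statement says earlier studies lacked.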

Description

Technical Field

[0001] The invention relates to the technical field of readability measurement, and in particular to a linear model method for measuring the readability of Simplified Chinese.

Background Art

[0002] Linguistic complexity is a multidimensional, interdisciplinary academic concept that can be studied from the perspectives of natural language processing, second language acquisition, psycholinguistics, cognitive science, and contrastive linguistics. It can be defined in two senses. In the strict sense, language complexity refers to the complexity of linguistic structure; research of this kind is mostly used in cross-linguistic comparative studies and automated essay scoring. In a relative sense, it covers notions such as readability, language difficulty, and cognitive cost; research of this kind is mostly used in applied studies that serve language learning and text comprehension. This project examines the linguist...

Application Information

IPC(8): G06F17/30; G06F17/27
CPC: G06F16/35; G06F40/284
Inventors: 丘心颖 (Qiu Xinying), 邓可斌 (Deng Kebin)
Owner: GUANGDONG UNIVERSITY OF FOREIGN STUDIES