Linear model method for measuring the readability of Simplified Chinese
A linear model and linear regression technique, applied in special data processing applications, unstructured text data retrieval, text database clustering/classification, etc. It addresses problems such as large differences, the fact that empirical research on Chinese readability indicators has not yet been carried out comprehensively and systematically, and the small size of research data.
Examples
Embodiment
[0099] As shown in Figure 1, this embodiment discloses a linear model method for measuring the readability of Simplified Chinese. The method includes the following steps:
[0100] S1. Construct a corpus of Simplified Chinese texts and their readability levels;
[0101] S2. Preprocess the corpus text, including word segmentation, sentence segmentation, part-of-speech tagging, named entity recognition, constituency parsing, dependency parsing, clause tagging, and/or stroke counting;
[0102] S3. Extract and calculate text language features;
[0103] S4. Construct the best combination of features based on the language features and regression algorithms;
[0104] S5. Construct a linear regression model for measuring readability.
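Steps S3 to S5 can be illustrated with a minimal, dependency-free Python sketch. It is not the patented procedure. The two surface features (average sentence length in characters and the ratio of distinct characters), the exhaustive subset search scored by leave-one-out error, and all function names and toy data are assumptions introduced here for illustration. The source does not specify which regression algorithms or selection criterion are used, and the sketch does not guard against singular (collinear) feature matrices.

```python
import re
from itertools import combinations

def extract_features(text):
    """Toy S2-S3: split on Chinese sentence-final punctuation, then compute
    [average sentence length in Chinese characters, distinct-character ratio]."""
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    return [len(chars) / max(len(sentences), 1),
            len(set(chars)) / max(len(chars), 1)]

def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved through the normal
    equations by Gauss-Jordan elimination. Returns [intercept, w1, w2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coef, x):
    """Apply a fitted linear model to one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def loo_error(X, y, cols):
    """Mean squared leave-one-out prediction error using only columns `cols`."""
    total = 0.0
    for i in range(len(X)):
        train_X = [[r[c] for c in cols] for j, r in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        coef = fit_ols(train_X, train_y)
        total += (predict(coef, [X[i][c] for c in cols]) - y[i]) ** 2
    return total / len(X)

def select_features(X, y, tol=1e-9):
    """Assumed S4: try every feature subset, smallest first, and keep a larger
    subset only if it lowers the leave-one-out error by more than `tol`."""
    best_cols, best_err = None, float("inf")
    for k in range(1, len(X[0]) + 1):
        for cols in combinations(range(len(X[0])), k):
            err = loo_error(X, y, cols)
            if err < best_err - tol:
                best_cols, best_err = cols, err
    return best_cols

# Hypothetical toy data: two numeric features per text, readability level as target.
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 6], [6, 2]]
y = [3, 5, 7, 9, 11, 13]
cols = select_features(X, y)                             # S4: chosen feature columns
model = fit_ols([[r[c] for c in cols] for r in X], y)    # S5: final linear model
print(cols, model)
```

In practice the feature vectors would come from a function like `extract_features` applied to each corpus text, with the richer features that the preprocessing in S2 makes available. Exhaustive search is feasible only for a small number of features, so a stepwise or regularized selection would be a natural substitute for larger sets.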
[0105] The text language features are as follows:
[0106] Table 1. Text language feature table
[0111] In the above table, the low strokes, medium strokes, and high strokes in the...
Application Information
- IPC
- G06F17/30; G06F17/27
- CPC
- G06F16/35; G06F40/284
- Inventors
- δΈεΏι’; ιε―ζ



