
Four-layer structure Chinese text regularized system and realization thereof

A layered-structure text technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as rules being difficult to write and maintain and generalizing only moderately well.

Status: Inactive. Publication date: 2012-12-12
BEIJING UNIV OF POSTS & TELECOMM
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

The rule-based method is more intuitive, but it has obvious disadvantages: the rules are difficult to write and maintain, and their generalizability is mediocre.




Embodiment Construction

[0018] The Chinese text regularization system proposed by the present invention comprises three parts, namely non-standard word recognition, non-standard word disambiguation, and standard pinyin generation, which are organized into a four-layer structure. First, finite automata recognize non-standard words in real text and assign each one a specific category label. Second, a conditional random field (CRF) model handles the ambiguous non-standard words, and corresponding rules assign their sub-categories. Third, an error-driven rule learning method constructs optimal rules that further refine the results of the previous stage. Finally, both the basic and the ambiguous non-standard words are passed to the last part, which generates the standard pronunciation. The complete Chinese text regularization system also provides web services in C/S mode and can support up ...
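The first layer, in which a finite automaton recognizes non-standard words and labels their category, can be illustrated with a toy deterministic finite automaton. This is a minimal sketch, not the patent's implementation: the three labels `NUMBER`, `DECIMAL` and `PERCENT`, the transition table, and the functions `recognize` and `longest_match` are hypothetical, chosen only to show longest-match scanning with category labels. The real system builds its automata from a dictionary over a larger non-standard-word taxonomy that is not given here.

```python
from typing import List, Tuple

# Hypothetical category labels for accepting states; the patent's real taxonomy is larger.
ACCEPTING = {1: "NUMBER", 3: "DECIMAL", 4: "PERCENT"}

# Transition table: (state, character class) -> next state. Missing entries mean "reject".
TRANSITIONS = {
    (0, "digit"): 1,
    (1, "digit"): 1,
    (1, "dot"): 2,
    (1, "percent"): 4,
    (2, "digit"): 3,
    (3, "digit"): 3,
    (3, "percent"): 4,
}


def char_class(ch: str) -> str:
    """Map a character to the input alphabet of the toy automaton."""
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    if ch in "%％":  # ASCII and full-width percent signs
        return "percent"
    return "other"


def longest_match(text: str, start: int) -> Tuple[int, str]:
    """Run the DFA from `start`; return (end, label) of the longest accepted prefix, or (start, "")."""
    state, pos = 0, start
    best_end, best_label = start, ""
    while pos < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[pos])))
        if nxt is None:
            break
        state, pos = nxt, pos + 1
        if state in ACCEPTING:
            best_end, best_label = pos, ACCEPTING[state]
    return best_end, best_label


def recognize(text: str) -> List[Tuple[str, str]]:
    """Split text into (span, label) pairs; label "" marks an ordinary character."""
    out, i = [], 0
    while i < len(text):
        end, label = longest_match(text, i)
        if label:
            out.append((text[i:end], label))
            i = end
        else:
            out.append((text[i], ""))
            i += 1
    return out


if __name__ == "__main__":
    print(recognize("增长12.5%"))  # [('增', ''), ('长', ''), ('12.5%', 'PERCENT')]
```

Longest-match scanning means "12.5%" is labeled once as `PERCENT` rather than as a `NUMBER` followed by stray characters; a span such as "3" in "3年" would still be ambiguous, which is the kind of case the later CRF layer is meant to resolve.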



Abstract

The invention provides a new regularization method for Chinese text that combines machine learning with rules to substantially improve the precision of Chinese text regularization. The method comprises the following steps: first, analyze the non-standard words in a designated corpus, summarize their types, and use finite automata to build a dictionary for recognizing non-standard words in real text; select the most frequent types and their features to build a feature template; model them with the conditional random field algorithm, and process and sub-classify the remaining parts with suitable rules, further improving the recognition precision of non-standard words and resolving their ambiguity; for the errors that occur during recognition, use an error-driven rule learning method to select the optimal rules, improving precision further; and finally, generate the correct pronunciation of the non-standard words with a standard pronunciation generation module. Based on this method, the invention designs a four-layer Chinese text regularization system that substantially improves both the precision and the efficiency of Chinese text regularization.
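The error-driven rule learning step can be sketched as greedy, Brill-style transformation-based learning: compare the current labels (for example, the CRF output) with gold labels, propose rules from the positions that are wrong, keep the rule with the largest net gain, apply it, and repeat. This is a minimal sketch under assumptions, not the patent's procedure: the single rule template "change label SRC to DST when the previous token is PREV", the labels `NUMBER`, `DIGITS` and `O`, and the names `learn_rules`, `apply_rule` and `min_gain` are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical rule template: "change label SRC to DST when the previous token is PREV".
Rule = Tuple[str, str, str]


def apply_rule(tokens: List[str], labels: List[str], rule: Rule) -> List[str]:
    """Apply one rule from the second token onward, testing conditions on the input labels."""
    src, dst, prev = rule
    new = list(labels)
    for i in range(1, len(tokens)):
        if labels[i] == src and tokens[i - 1] == prev:
            new[i] = dst
    return new


def n_correct(labels: List[str], gold: List[str]) -> int:
    """Count positions whose label matches the gold label."""
    return sum(a == b for a, b in zip(labels, gold))


def learn_rules(
    tokens: List[str], labels: List[str], gold: List[str], min_gain: int = 1, max_rules: int = 10
) -> Tuple[List[Rule], List[str]]:
    """Greedily keep the rule with the largest net gain (errors fixed minus
    correct labels broken) until no candidate gains at least `min_gain`."""
    labels = list(labels)
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules only from positions that are currently wrong.
        candidates = {
            (labels[i], gold[i], tokens[i - 1])
            for i in range(1, len(tokens))
            if labels[i] != gold[i]
        }
        base = n_correct(labels, gold)
        best_rule: Optional[Rule] = None
        best_gain = 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = n_correct(apply_rule(tokens, labels, rule), gold) - base
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None or best_gain < min_gain:
            break
        labels = apply_rule(tokens, labels, best_rule)
        learned.append(best_rule)
    return learned, labels


if __name__ == "__main__":
    tokens = ["电话", "110", "共", "110", "人", "拨打", "120"]
    initial = ["O", "NUMBER", "O", "NUMBER", "O", "O", "NUMBER"]
    gold = ["O", "DIGITS", "O", "NUMBER", "O", "O", "DIGITS"]
    rules, final = learn_rules(tokens, initial, gold)
    print(rules)
    print(final == gold)  # True
```

Scoring by net gain rather than by errors fixed alone keeps a rule from being accepted if it breaks as many correct labels as it repairs.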

Description

Technical field [0001] The invention belongs to the field of human-computer communication. It relates to a Chinese text regularization system with a multi-level structure that also supports Web access in C/S (Client/Server) mode. The invention introduces the concept of non-standard words in Chinese text, classifies them effectively on the basis of systematic analysis and induction, and adopts machine learning methods such as conditional random fields to build a four-layer Chinese text regularization model covering non-standard word recognition, disambiguation, and standard pronunciation generation. The model is suited to practical applications such as speech synthesis and machine translation.

Background art [0002] With the development of information, language, and computer technology, the demands placed on text processing keep rising, especially for Chinese, which is enjoy...
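The last layer, standard pronunciation generation, turns a labeled non-standard word into the characters a speech synthesizer would read. The sketch below assumes hypothetical labels (`DIGITS` read digit by digit, `NUMBER`/`DECIMAL` read as a cardinal, `PERCENT` read as 百分之…) and covers only integers below 100,000,000. It is an illustration, not the patent's module: it ignores conventions such as 两 versus 二 and 幺 for 1 in phone numbers.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def read_digits(s: str) -> str:
    """Read a digit string one digit at a time, e.g. for phone numbers."""
    return "".join(DIGITS[int(c)] for c in s)


def read_below_10000(n: int) -> str:
    """Read 0 < n < 10000, collapsing internal zeros to one 零 and dropping trailing zeros."""
    s, zero_pending = "", False
    for pos in range(3, -1, -1):
        d = (n // 10 ** pos) % 10
        if d == 0:
            if s:
                zero_pending = True
            continue
        if zero_pending:
            s += "零"
            zero_pending = False
        s += DIGITS[d] + UNITS[pos]
    return s


def read_cardinal(n: int) -> str:
    """Read an integer as a Chinese cardinal number, grouping by 万."""
    if n == 0:
        return "零"
    if not 0 < n < 10 ** 8:
        raise ValueError("sketch only handles 0 <= n < 100000000")
    high, low = divmod(n, 10000)
    if high == 0:
        text = read_below_10000(low)
    else:
        text = read_below_10000(high) + "万"
        if low:
            if low < 1000:  # a gap between the 万 group and the rest is read as 零
                text += "零"
            text += read_below_10000(low)
    if text.startswith("一十"):  # 12 is read 十二, not 一十二
        text = text[1:]
    return text


def read_decimal(s: str) -> str:
    """Read "12.5" as 十二点五: cardinal integer part, fractional digits one by one."""
    whole, _, frac = s.partition(".")
    text = read_cardinal(int(whole))
    return text + "点" + read_digits(frac) if frac else text


def pronounce(span: str, label: str) -> str:
    """Dispatch on the (hypothetical) category label assigned by earlier layers."""
    if label == "DIGITS":
        return read_digits(span)
    if label in ("NUMBER", "DECIMAL"):
        return read_decimal(span)
    if label == "PERCENT":
        return "百分之" + read_decimal(span.rstrip("%％"))
    return span


if __name__ == "__main__":
    print(pronounce("12.5%", "PERCENT"))  # 百分之十二点五
    print(pronounce("110", "DIGITS"))     # 一一零
    print(pronounce("110", "NUMBER"))     # 一百一十
```

The same surface string "110" yields different readings depending on its label, which is why the recognition and disambiguation layers must run before this one.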


Application Information

Patent type & authority: Patent (China)
IPC(8): G06F17/27; G06F17/28; G06N1/00
Inventors: 董远 (Dong Yuan), 周涛 (Zhou Tao)
Owner: BEIJING UNIV OF POSTS & TELECOMM