A Method for Segmenting Chinese General Texts Based on Partially Supervised Learning

A partially supervised learning method for segmenting Chinese general text. It addresses inaccurate text segmentation, a low word recognition rate, and the manpower and time consumed by manual annotation, with the aim of improving segmentation performance and feature accuracy while saving manpower.

Active Publication Date: 2020-05-19
CHENGDU UNIV OF INFORMATION TECH
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] In summary, the existing technology has the following problems: it relies on large-scale manually annotated data sets, which consume considerable manpower and time; its word recognition rate is low; and it cannot accurately segment text into words of appropriate granularity.

Examples

Embodiment Construction

[0034] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0035] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0036] The Chinese generalized text segmentation method based on partially supervised learning provided by the embodiment of the present invention regards the Chinese short text word segmentation task as a two-class or three-class classification problem and, according to the main features of short text, combines preceding and following contextual feature information that carries relatively little noise with a partially supervised l...
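The classification framing in [0036] can be pictured as labeling each gap between two adjacent characters. The sketch below is only an illustration under stated assumptions: it shows the two-class case (boundary or no boundary), and the names `gap_labels`, `gap_features`, the `#` padding symbol, and the choice of `n` characters on each side are hypothetical, not taken from the patent text.

```python
from typing import Dict, List

PAD = "#"  # hypothetical padding symbol for positions outside the text


def gap_labels(words: List[str]) -> List[int]:
    """One label per gap between adjacent characters.

    1 = the gap is a word boundary, 0 = the gap lies inside a word.
    (Two-class case only; the three-class variant is not shown.)
    """
    labels: List[int] = []
    for i, word in enumerate(words):
        labels.extend([0] * (len(word) - 1))  # gaps inside this word
        if i < len(words) - 1:
            labels.append(1)  # gap between this word and the next one
    return labels


def gap_features(text: str, gap: int, n: int = 2) -> Dict[str, str]:
    """Context features for the gap between text[gap] and text[gap + 1].

    Assumption: the context is the n characters on each side of the gap.
    """
    padded = PAD * n + text + PAD * n
    centre = gap + 1 + n  # index in `padded` of the first character right of the gap
    return {
        "left": padded[centre - n:centre],
        "right": padded[centre:centre + n],
    }


if __name__ == "__main__":
    words = ["我", "爱", "北京"]
    text = "".join(words)
    for gap, label in enumerate(gap_labels(words)):
        print(gap, label, gap_features(text, gap, n=2))
```

Treating each gap as an independent example keeps the feature window short, which matches the text's emphasis on context with little noise; how the patent itself encodes the context is not visible in this excerpt.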

Abstract

The invention belongs to the technical field of language processing and discloses a Chinese generalized text segmentation method based on partially supervised learning. The word segmentation task for Chinese short texts is treated as a two-class or three-class classification problem: contextual feature information with relatively little noise is extracted according to the main features of short text and combined with a partially supervised learning method to perform word segmentation. Control experiments on five groups plus one set of "difficult" data show that short-text segmentation results depend strongly on the length of the contextual information. Binary (two-element) contextual information best fits the characteristics of short-text word segmentation and effectively improves segmentation performance. Mixed two-element and three-element features express the information of each "empty" better and give the best performance; using more or fewer elements reduces performance. Applying partially supervised learning to short-text word segmentation also shows its strong ability to complete parameters, which greatly reduces manual labeling work while obtaining better performance.
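The abstract does not name the learning algorithm, so the following is a sketch of one possible form of partially supervised learning rather than the patented method: a two-class Naive Bayes model fitted with expectation-maximization (EM) over labeled and unlabeled gaps. Every name here (`fit_partially_supervised`, `predict_proba`, the `left`/`right` feature keys, the smoothing constant `alpha`) is an assumption made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (features, label); label is 1 (boundary), 0 (no boundary) or None (unlabeled)
Example = Tuple[Dict[str, str], Optional[int]]


def _m_step(data: List[Example], post: List[float]):
    """Accumulate soft counts per class from the current boundary probabilities."""
    class_w = [0.0, 0.0]
    feat_w = [defaultdict(float), defaultdict(float)]
    for (feats, _), p in zip(data, post):
        for c, w in ((0, 1.0 - p), (1, p)):
            class_w[c] += w
            for item in feats.items():
                feat_w[c][item] += w
    return class_w, feat_w


def predict_proba(model, feats: Dict[str, str]) -> float:
    """Probability that a gap with these features is a word boundary."""
    class_w, feat_w, sizes, alpha = model
    total = class_w[0] + class_w[1]
    log_p = []
    for c in (0, 1):
        lp = math.log((class_w[c] + alpha) / (total + 2 * alpha))
        for k, v in feats.items():
            size = sizes.get(k, 0) + 1  # +1 reserves mass for unseen values
            count = feat_w[c].get((k, v), 0.0)
            lp += math.log((count + alpha) / (class_w[c] + alpha * size))
        log_p.append(lp)
    top = max(log_p)
    e0, e1 = math.exp(log_p[0] - top), math.exp(log_p[1] - top)
    return e1 / (e0 + e1)


def fit_partially_supervised(data: List[Example], iterations: int = 10,
                             alpha: float = 1.0):
    """EM for Naive Bayes: labeled gaps stay fixed, unlabeled gaps get soft labels."""
    values = defaultdict(set)
    for feats, _ in data:
        for k, v in feats.items():
            values[k].add(v)
    sizes = {k: len(vs) for k, vs in values.items()}
    # Unlabeled gaps start undecided at 0.5.
    post = [float(y) if y is not None else 0.5 for _, y in data]
    model = (*_m_step(data, post), sizes, alpha)
    for _ in range(iterations):
        # E-step: re-estimate only the unlabeled gaps.
        post = [float(y) if y is not None else predict_proba(model, feats)
                for feats, y in data]
        # M-step: refit the parameters from the updated soft labels.
        model = (*_m_step(data, post), sizes, alpha)
    return model
```

In this sketch the few labeled gaps anchor the two classes and the unlabeled gaps refine the counts, which is one way to read the abstract's claim that partial supervision reduces manual labeling.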

Description

Technical field

[0001] The invention belongs to the technical field of language processing, and in particular relates to a Chinese generalized text segmentation method based on partially supervised learning.

Background

[0002] In natural language processing, the most basic task is to segment a piece of text into blocks that carry the most basic semantics, and words best meet this requirement. In languages such as English, whose words come with their own delimiters, words are easily extracted by splitting on spaces; in languages such as Chinese, which have no separators, a separate word segmentation task must be carried out. At present there are two conventional methods. One is the matching-based method, which uses a manually constructed dictionary to perform word-by-word comparison to verify whether...
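The matching-based approach described in the background can be illustrated with forward maximum matching, one common dictionary-based scheme. The source does not say which matching variant it has in mind, so the function name `forward_max_match`, the `max_len` limit, and the toy dictionary below are assumptions for illustration only.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation using a manually built dictionary."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    toy_dictionary = {"我", "爱", "北京", "大学", "北京大学"}  # hypothetical entries
    print(forward_max_match("我爱北京大学", toy_dictionary))
```

A scheme like this depends entirely on dictionary coverage, which is consistent with the reliance on manually built resources that the background section criticizes.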

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06K9/62
CPC: G06F40/289, G06F18/24155
Inventor: 王亚强, 何思佑, 唐聃, 舒红平
Owner: CHENGDU UNIV OF INFORMATION TECH