Intelligent Chinese word segmentation method based on statistics and deep learning

Chinese word segmentation and deep learning technology, applied in the field of word segmentation, that addresses problems such as high complexity and slow segmentation speed.

Active Publication Date: 2019-11-05
SHANDONG UNIV OF SCI & TECH
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide an intelligent Chinese word segmentation method based on statistics and deep learning that addresses the high complexity and slow segmentation speed that arise when Chinese word segmentation relies on the bidirectional LSTM algorithm alone.



Examples


Embodiment Construction

[0020] The present invention will be described in detail below in combination with specific embodiments.

[0021] The flow of the intelligent Chinese word segmentation method based on statistics and deep learning of the present invention is shown in Figure 1. The steps are as follows:

[0022] Step 1. Data preprocessing. Figure 2 is a flow chart of the data (text) preprocessing process. The text document to be segmented is first preprocessed: it is split at the original punctuation marks, paragraph separators and other symbols that act as separators in the text, yielding shorter sentences or character strings.
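The splitting described in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the separator set in `SEPARATORS` and the function name `split_into_fragments` are assumptions chosen for the example, since the source does not list the exact symbols it uses.

```python
import re

# Assumed separator set: common Chinese and ASCII punctuation plus any
# whitespace (which covers line breaks between paragraphs). The source
# does not enumerate its separators, so this list is illustrative only.
SEPARATORS = re.compile(r"[，。！？；：、,.!?;:\s]+")


def split_into_fragments(text: str) -> list[str]:
    """Split text at separator symbols and drop empty fragments."""
    return [fragment for fragment in SEPARATORS.split(text) if fragment]


sample = "中文分词是自然语言处理的基础。分词方法很多，例如基于统计的方法！\n深度学习也可用于分词"
fragments = split_into_fragments(sample)
for fragment in fragments:
    print(fragment)
```

Each resulting fragment contains no separator symbols, so a later segmentation stage only has to handle short, punctuation-free strings.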

[0023] Given Chinese writing conventions, an author usually places content that is similar or closely connected logically in the same natural paragraph. Technical terms of a field therefore tend to appear repeatedly within one or several natural paragraphs. For paragraphs with a l...
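The observation that domain terms recur within a paragraph can be illustrated with a simple frequency count. The passage above is truncated and does not say how the method exploits this repetition, so the sketch below is only an illustration of the idea: the function name `repeated_ngrams`, the fixed n-gram length and the `min_count` threshold are all hypothetical and not taken from the patent.

```python
from collections import Counter


def repeated_ngrams(fragments: list[str], n: int = 4, min_count: int = 2) -> dict[str, int]:
    """Count character n-grams across the fragments of one paragraph and
    keep those that occur at least min_count times.

    Hypothetical helper: n-grams are taken inside each fragment only, so
    they never span a punctuation boundary.
    """
    counts: Counter[str] = Counter()
    for fragment in fragments:
        for start in range(len(fragment) - n + 1):
            counts[fragment[start:start + n]] += 1
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Fragments of a single paragraph, already split at punctuation.
paragraph_fragments = ["分词方法很多", "统计分词方法简单", "深度学习分词方法较慢"]
candidates = repeated_ngrams(paragraph_fragments, n=4)
print(candidates)
```

Strings that repeat within a paragraph are only candidates; deciding which of them are genuine domain terms would need further criteria that the visible text does not specify.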


Abstract

The invention discloses an intelligent Chinese word segmentation method based on statistics and deep learning. The method comprises the following steps: constructing a domain term set; selecting a word segmentation method; and making a word segmentation decision. The method adopts a word segmentation model that combines statistics-based word segmentation with deep learning technology. Its advantages are a wide application range, accurate segmentation of professional terms in specialized fields, a simple algorithm, and high segmentation speed.

Description

Technical field

[0001] The invention belongs to the technical field of word segmentation and relates to a technology that improves the segmentation accuracy of professional terms in documents from specialized fields.

Background technique

[0002] Chinese word segmentation is the process of dividing a sequence of Chinese characters into individual words, and it is the basis of natural language processing. As a branch of natural language processing, Chinese information processing includes three levels: lexical analysis, syntactic analysis and semantic analysis, of which Chinese word segmentation is the first step of lexical analysis. Chinese word segmentation has a wide range of applications, from part-of-speech (POS) tagging and named entity recognition (NER) to automatic classification, automatic proofreading, search engines, speech synthesis, machine translation, etc. The Chinese word segmentation method based on statistics ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: Y02D10/00
Inventor: 徐建国, 刘梦凡, 刘泳慧
Owner: SHANDONG UNIV OF SCI & TECH