Chinese commercial text preprocessing method based on machine learning

A machine learning-based preprocessing technology, applied in machine learning, electrical digital data processing, special data processing applications, etc. It addresses the problem that existing techniques cannot be used to process Chinese text, and it improves the accuracy of text understanding.

Pending Publication Date: 2019-11-15
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Chinese text differs greatly from text in other languages. For example, Chinese is written continuously without spaces between words, has no inflection for voice or tense, and contains polyphonic characters, all of which make Chinese more flexible. As a result, many mature techniques developed for other languages cannot be applied directly to Chinese text.



Examples


Detailed description of the embodiments

[0046] To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below in conjunction with examples. The specific embodiments described here serve only to explain the present invention, not to limit it.

[0047] The application principle of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0048] The present invention is implemented in Python on the Windows platform. As shown in Figures 1-3, Chinese commercial text input into the platform is processed by the following steps:

[0049] (1) Sentence and word segmentation for Chinese business texts

[0050] Sentences are split using periods as delimiters, and words are segmented based on a statistical probability model. The specific word segmentation process is that the cha...
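The source text is truncated before it describes the segmentation model in detail, so the following is only a minimal sketch of one common statistical approach, not the patented procedure. It splits sentences at the Chinese full stop and then picks the word sequence with the highest product of unigram probabilities by dynamic programming. The frequency table `TOY_FREQ`, the unknown-character penalty, and the function names are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy word-frequency table (an assumption for illustration);
# a real system would estimate these counts from a large corpus.
TOY_FREQ = {
    "我们": 8, "提供": 5, "商业": 6, "贷款": 4, "服务": 7,
    "我": 3, "们": 1, "商": 1, "业": 1,
}
TOTAL = sum(TOY_FREQ.values())
MAX_WORD_LEN = max(len(w) for w in TOY_FREQ)
# Assumed penalty for a single character missing from the table.
UNKNOWN_LOGP = math.log(0.5 / TOTAL)


def split_sentences(text):
    """Split text at the Chinese full stop and drop empty pieces."""
    return [s.strip() for s in text.split("。") if s.strip()]


def segment(sentence):
    """Return the segmentation with the highest unigram log-probability."""
    n = len(sentence)
    # best[i] = (best log-probability of sentence[:i], start of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in TOY_FREQ:
                logp = math.log(TOY_FREQ[word] / TOTAL)
            elif len(word) == 1:
                logp = UNKNOWN_LOGP  # fall back to a single unseen character
            else:
                continue  # unseen multi-character strings are not candidates
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the sentence to recover the words.
    words, pos = [], n
    while pos > 0:
        start = best[pos][1]
        words.append(sentence[start:pos])
        pos = start
    return words[::-1]


sentences = split_sentences("我们提供商业贷款。我们提供服务。")
tokens = [segment(s) for s in sentences]
```

With this toy table, the two-character dictionary words score higher than their single-character splits, so `tokens[0]` comes out as the four words 我们, 提供, 商业, 贷款.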


Abstract

The invention discloses a Chinese commercial text preprocessing method based on machine learning. The input Chinese commercial text is processed through the following steps: (1) performing sentence segmentation and word segmentation on the Chinese commercial text; (2) carrying out part-of-speech tagging on the segmented words using a decision tree; (3) performing word sense disambiguation using conditional probability based on a Bayesian classifier; (4) representing word vectors with a hybrid model combining One-Hot encoding and a Skip-Gram model; (5) adjusting word weights using TF-IDF and determining the meaning of each polysemous word in the current context; and (6) outputting the Chinese commercial text preprocessed based on machine learning. The method can effectively address the problems that a Chinese commercial question-answering system fails to answer questions and is limited in its response scenarios because of insufficient text preprocessing. It can improve the accuracy with which a computer understands text, and it makes downstream work such as machine translation and intelligent question answering feasible.
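As an illustration of the weighting in step (5), the sketch below computes TF-IDF weights over already-segmented documents. It assumes the common textbook variant (term frequency normalized by document length, inverse document frequency as the natural log of N/df); the abstract does not state which variant the invention uses, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    docs is a list of token lists, e.g. the output of word segmentation.
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical segmented documents for illustration.
docs = [["商业", "贷款", "利率"], ["商业", "服务"]]
weights = tf_idf(docs)
```

A word that appears in every document, such as 商业 here, receives a weight of zero, while words specific to one document receive a positive weight, which is what lets the weights emphasize context-specific terms.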

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a machine learning-based Chinese business text preprocessing method.

Background

[0002] The combination of business development and artificial intelligence has received increasing attention, and speech recognition technology is the basis of human-computer interaction. Current natural language processing usually adopts one of two methods. The first is the rule-based method, which starts from syntax and other aspects and analyzes and processes text according to the rules of the language. After many years of experiments with this method in China and abroad, its results remain very unsatisfactory: there are too many rules for any fixed method to cover, and new rules constantly emerge from everyday language use, so it is very difficult to implement. Another method is the natural language pr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06N20/00
CPC: G06N20/00
Inventors: 桂冠, 张婕, 杨洁
Owner: NANJING UNIV OF POSTS & TELECOMM