A Lao text topic classification method

A text topic classification technology, applied in the fields of natural language processing and machine learning. It addresses problems such as ignored information and text misunderstanding, avoids the zero-probability problem, and improves classification accuracy.

Active Publication Date: 2019-02-01
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, naive Bayes has its own shortcoming: it assumes that all feature attributes are conditionally independent, which is equivalent to putting text featu...



Examples


Embodiment 1

[0020] Embodiment 1: As shown in Figures 1-2, a Lao text topic classification method comprises the following steps.

Step 1: Use web crawler technology to crawl Lao text in five categories: economy, politics, education, tourism, and general. Store the texts in five corresponding folders, each named after its category to ease later retrieval and processing. Then process the crawled articles to remove noise words that are unrelated to classification, thereby building a corpus.

Further, the noise words may be set to include emoticons, digits, spaces, and stop words. Emoticons, digits, and spaces are removed with regular expressions; stop words are removed using a stop-word list (any word that appears in the list is removed). The regular expression used to remove the unrelated noise words is encoded as follows: u"^[\u0000-\u10ff...
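The noise-removal step can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the source's own regular expression is truncated, so the patterns here are assumptions (keep only characters in the Lao Unicode block U+0E80 to U+0EFF, then drop Lao digits). The names `strip_noise` and `clean_tokens` are hypothetical, and the input is assumed to be already segmented into tokens.

```python
import re

# Assumption: these patterns are illustrative stand-ins; the regular
# expression quoted in the source is truncated and is not reproduced here.
NON_LAO = re.compile(r"[^\u0E80-\u0EFF]")    # anything outside the Lao block
LAO_DIGITS = re.compile(r"[\u0ED0-\u0ED9]")  # Lao digits 0-9


def strip_noise(token: str) -> str:
    """Remove emoticons, ASCII digits, spaces and Lao digits from one token."""
    token = NON_LAO.sub("", token)
    return LAO_DIGITS.sub("", token)


def clean_tokens(tokens, stop_words):
    """Strip noise from each token, then drop empty tokens and stop words."""
    cleaned = []
    for tok in tokens:
        tok = strip_noise(tok)
        if tok and tok not in stop_words:
            cleaned.append(tok)
    return cleaned


if __name__ == "__main__":
    # Hypothetical sample tokens and a one-word stop-word list.
    sample = ["ເສດຖະກິດ", "2019", " ", "😀", "ແລະ", "ການສຶກສາ໑໒"]
    print(clean_tokens(sample, stop_words={"ແລະ"}))
```

Note that keeping only Lao-block characters also discards Latin-script words; whether that is desirable depends on the corpus.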


Abstract

The invention discloses a Lao text topic classification method, belonging to the technical fields of natural language processing and machine learning. The method combines an N-gram language feature extraction model with a naive Bayes model to recognize the topic of Lao articles, which to some extent overcomes the limitations of naive Bayes. Under its conditional independence assumption, naive Bayes treats a text as a bag of words and ignores the order of words. The method addresses this by using unigram and bigram features together, which improves the recognition rate for text.
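A minimal sketch of how unigram and bigram features might be combined with a naive Bayes classifier is given below. It rests on several assumptions: add-one (Laplace) smoothing is used to avoid zero probabilities, although the source does not specify its smoothing scheme; the class `NgramNaiveBayes` and the function `ngram_features` are hypothetical names; and English tokens stand in for segmented Lao words.

```python
import math
from collections import Counter, defaultdict


def ngram_features(tokens):
    """Return unigram tuples followed by adjacent-pair bigram tuples."""
    unigrams = [(t,) for t in tokens]
    bigrams = list(zip(tokens, tokens[1:]))
    return unigrams + bigrams


class NgramNaiveBayes:
    """Multinomial naive Bayes over unigram + bigram features.

    Assumption: additive smoothing with parameter alpha (1.0 = Laplace).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()                 # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()                          # all features seen in training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            feats = ngram_features(tokens)
            self.class_docs[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)   # log prior
            for feat in ngram_features(tokens):
                if feat in self.vocab:              # skip features unseen in training
                    score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    docs = [["bank", "rate", "rise"], ["vote", "law", "pass"]]
    model = NgramNaiveBayes().fit(docs, ["economy", "politics"])
    print(model.predict(["bank", "rate"]))
```

The bigram features carry some local word-order information that a pure bag-of-words model discards, while smoothing keeps a feature that is missing from one class from driving that class's probability to zero.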

Description

Technical field

[0001] The invention relates to a Lao text topic classification method, which belongs to the technical fields of natural language processing and machine learning.

Background

[0002] With the spread of the Internet, the amount of information online is growing exponentially. When users query a search engine, it often returns thousands of relevant pages. How can users locate the desired information quickly and effectively without viewing these pages one by one? This is where topic recognition plays an important role: a pre-trained classifier can identify the topic of the content the user wants from the limited information the user enters, so that the system can respond effectively. The naive Bayes classification model is a method with a long history and a solid theoretical foundation. It is a direct and efficient method for dealing with many problems at the same time, and many ...


Application Information

IPC(8): G06F16/9535
Inventor: 周兰江, 王兴金, 张建安, 周枫
Owner: KUNMING UNIV OF SCI & TECH