Web-based text classification mining system and web-based text classification mining method

A text classification technology built on a classification algorithm, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of dynamic changes that are difficult to control, inconsistent data forms, and data security that is difficult to guarantee, and it improves classification accuracy, the precision rate, and the recall rate.

Inactive Publication Date: 2011-09-14
悠易互通(北京)广告有限公司
2 Cites, 61 Cited by

AI Technical Summary

Problems solved by technology

It developed from data mining and faces many unprecedented problems: for example, the amount of data continues to expand, it is difficult




Example Embodiment

[0030] The system and method of the present invention will be further described in detail below in conjunction with the drawings and the embodiments of the present invention.

[0031] The main task of data mining is to find hidden or potential data patterns, internal connections, laws, development trends, and other useful information in large volumes of incomplete, noisy, fuzzy, and random data. These data are often stored in a structured static database (data warehouse), but they also include other forms of data collection. Given the diversity of data, data mining tasks, and data mining methods, data mining faces many challenging new topics. The design of data mining languages, the development of efficient and useful data mining methods and systems, the establishment of interactive and integrated data mining environments, and the application of data mining techniques to solve large-scale practical application problems are all current data mining...



Abstract

The invention discloses a web-based text classification mining system and a web-based text classification mining method. The system mainly comprises a text pre-processing module, a word segmentation processing module, and a classification algorithm module. The text pre-processing module automatically screens specific information from the texts to be tested, pre-processes it, and filters out irrelevant information so that the texts are represented effectively. The word segmentation processing module segments the texts into words and finds the attributes/attributive words of each text in preparation for the selection of characteristic words. The classification algorithm module either carries out characteristic selection to obtain an optimum characteristic sub-set, or looks up the corresponding probabilities from the data provided by a training result file, compares them to obtain the type with the maximum probability, and finally stores the conclusion in a file. The system uses hypertext markup language (HTML) tag weights to overcome the shortcoming of the conditional independence assumption of the naive Bayes algorithm, improves the classifier, and can improve the recall ratio and precision ratio of data mining.
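The three modules described in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading, not the patented implementation: the source does not give the tag weights, the weighting formula, or the segmenter, so `TAG_WEIGHTS`, `SKIPPED_TAGS`, the regular-expression tokenizer (a stand-in for a real word segmenter), and the class `TagWeightedNaiveBayes` are all assumptions made for illustration. Here the tag weight simply scales each word's count in a multinomial naive Bayes model with Laplace smoothing.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical values: the source says HTML tag weights are used,
# but not which tags are weighted or by how much.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
SKIPPED_TAGS = {"script", "style"}  # treated as irrelevant information


class WeightedTextExtractor(HTMLParser):
    """Pre-processing: keep visible text and record a weight per token."""

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags
        self.weighted_tokens = []  # (token, weight) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.stack):
            return
        weight = max([TAG_WEIGHTS.get(t, 1.0) for t in self.stack], default=1.0)
        # Stand-in for word segmentation: lowercase alphanumeric runs.
        for token in re.findall(r"[a-z0-9]+", data.lower()):
            self.weighted_tokens.append((token, weight))


def weighted_counts(html):
    """Return a Counter mapping each token to the sum of its tag weights."""
    parser = WeightedTextExtractor()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for token, weight in parser.weighted_tokens:
        counts[token] += weight
    return counts


class TagWeightedNaiveBayes:
    """Multinomial naive Bayes over tag-weighted counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_docs = Counter()             # documents per class
        self.term_weights = defaultdict(Counter)
        self.vocab = set()

    def fit(self, pages, labels):
        for html, label in zip(pages, labels):
            counts = weighted_counts(html)
            self.class_docs[label] += 1
            self.term_weights[label].update(counts)
            self.vocab.update(counts)
        return self

    def log_scores(self, html):
        counts = weighted_counts(html)
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            denom = sum(self.term_weights[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for token, weight in counts.items():
                if token in self.vocab:  # ignore words never seen in training
                    likelihood = (self.term_weights[label][token] + self.alpha) / denom
                    score += weight * math.log(likelihood)
            scores[label] = score
        return scores

    def predict(self, html):
        scores = self.log_scores(html)
        return max(scores, key=scores.get)  # class with the maximum probability


pages = [
    "<html><title>football match</title><body>the team won the match</body></html>",
    "<html><title>stock market</title><body>shares and bonds fell today</body></html>",
]
model = TagWeightedNaiveBayes().fit(pages, ["sports", "finance"])
query = "<html><title>market report</title><body>the team of analysts</body></html>"
print(model.predict(query))  # prints "finance"
```

In this toy example the title word "market" carries three times the weight of a body word, which is enough to outweigh the body words "the" and "team" that were seen in the sports page. A production system would replace the tokenizer with a proper word segmenter and tune the weights on held-out data.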

Description

Technical field

[0001] The invention relates to information retrieval and data mining technology, in particular to a web-based text classification mining system and method.

Background technique

[0002] With the extensive development of computers in today's world and the maturity of database technology, the amount of data accumulated by human beings has become larger and larger, forming a situation where data is rich but knowledge is scarce. People in various fields look forward to a method that can efficiently discover useful information, that is, knowledge, from a large amount of data. Under this background, research on knowledge discovery and data mining has become a hot topic.

[0003] Data mining is the process of using a variety of analytical tools to discover patterns and relationships among massive amounts of data that can be used to make predictions. Data mining involves theories and techniques in many fields such as database, artificial intelligence, machine learn...


Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 张杰, 刘奎飞
Owner 悠易互通(北京)广告有限公司