Domain-knowledge-based short text classification method and text classification system

This technology combines text classification with domain knowledge. It is applied in text classification systems and addresses problems such as weak feature information and an increased classification error rate.

Inactive Publication Date: 2011-09-21
SHANGHAI BIJIA DATA

AI Technical Summary

Problems solved by technology

Therefore, the text vector to be classified is brought into the classifier, and the feature informati



Embodiment Construction

[0083] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0084] As shown in Figure 2, Embodiment 1 of the present invention is a text classification system based on domain knowledge. The text classification system includes:

[0085] The training data acquisition module is used to acquire the data for model training to obtain the learning library;

[0086] The training data acquisition module mainly acquires data for model training. Preferably, it acquires this data through a web crawler program to obtain the learning library; that is, it analyzes the data categories of B2C-type websites and uses web crawler technology to obtain class-labeled data.
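The acquisition step in [0086] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes the crawler has already fetched each category page, that the category name serves as the class label, and that item titles sit in `<a class="item">` elements. The markup, the `TitleExtractor` class, and the `build_learning_library` function are all hypothetical names introduced here.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <a class="item"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_item = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs supplied by HTMLParser.
        if tag == "a" and ("class", "item") in attrs:
            self._in_item = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_item:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_item = False


def build_learning_library(pages):
    """Turn (category_label, html) pairs into class-labeled short texts.

    `pages` is assumed to come from a crawler that walked a site's
    category tree; fetching is deliberately left out of this sketch.
    """
    library = []
    for label, html in pages:
        parser = TitleExtractor()
        parser.feed(html)
        parser.close()
        for title in parser.titles:
            title = " ".join(title.split())  # normalize whitespace
            if title:
                library.append({"label": label, "text": title})
    return library
```

Because the label comes from the category page on which an item was found, no manual annotation is needed; the quality of the learning library then depends on how cleanly the source site's categories map onto the target classes.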

[0087] The data preprocessing module is used for information extraction to process the unstructured data into structured data, obtain the original data for building a model or classifying the mo...


Abstract

The invention discloses a domain-knowledge-based short text classification method and text classification system for use in the field of information technology. The method addresses the inability of traditional text classification methods to classify short texts well. Because short texts carry relatively weak concept signals and have severely insufficient text features, the invention provides a short text classification method and a text classification system suited to commodity web page data. According to the embodiment, a commodity classifier with excellent classification performance is obtained by reworking a traditional classifier, introducing new elements, and matching the algorithm to the data. The new elements are introduced as follows: the concept of domain words is introduced into the classifier, which effectively increases the amount of information in the short texts; and semantic analysis based on different lexical item sets is performed on the short text data, particularly web page commodity data, and its result is introduced into the classifier, which adds new information to the commodity data and improves the accuracy of text classification.
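As a rough illustration of how domain words can add information to a short text, the sketch below appends a domain-word feature to the token list before classification. The classifier here is a plain multinomial naive Bayes with add-one smoothing, chosen only for brevity; the visible text does not say which traditional classifier the patent modifies. The `DOMAIN_WORDS` lexicon, the `featurize` function, and the `NaiveBayes` class are assumptions introduced for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical domain lexicon: surface word -> domain word (illustrative only).
DOMAIN_WORDS = {
    "iphone": "DOMAIN_phone",
    "galaxy": "DOMAIN_phone",
    "novel": "DOMAIN_book",
    "paperback": "DOMAIN_book",
}


def featurize(text):
    """Tokenize on whitespace and append one domain feature per lexicon hit."""
    tokens = text.lower().split()
    return tokens + [DOMAIN_WORDS[t] for t in tokens if t in DOMAIN_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = featurize(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in featurize(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For instance, after training on "iphone 64gb black" (phone) and "paperback novel classic" (book), the query "galaxy 128gb" shares no surface token with either training text. Only the appended `DOMAIN_phone` feature links it to the phone class, which is the kind of extra signal the abstract attributes to domain words.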

Description

Technical Field

[0001] The invention relates to the field of information technology, in particular to a text classification method and a text classification system based on domain knowledge.

Background

[0002] With the rapid development of information technology, users can obtain a large amount of information through various channels, for example by browsing the web, using search engines for information retrieval, and receiving emails. A problem that often arises is that there is a large amount of data but little effective information.

[0003] For example, searching for a keyword on sites such as Baidu and Google returns a large number of webpage links containing the keyword. Some of these links point to webpages whose content is related to the keyword, while others are only weakly related. When users are unwilling or unable to go through every piece of data because of the sheer volume, how to obtain effec...


Application Information

IPC(8): G06F17/30
Inventors: 陈吕祥, 刘敏
Owner: SHANGHAI BIJIA DATA