Text classification method and obtained text classifier

A text classification method and the resulting text classifier, applied in the field of text classification technology, address the high input cost, long processing time, and heavy use of computing resources found in the prior art.

Active Publication Date: 2017-07-14
数库(上海)科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The present invention addresses a shortcoming of prior-art text classification technology: the classification system cannot be changed at will. Changing it takes a lot of time and computing resources, and the corpus must be labeled manually, which is costly and slow. The purpose is to provide a text classification method that can flexibly change the classification system and automatically label text, saving substantial computing resources, time, and cost.


Examples


Embodiment Construction

[0062] Preferred embodiments are described below with reference to Figure 1 and Figure 2 to illustrate more clearly and completely the method for obtaining a text classifier for automatically labeling a corpus, and the implementation process of the text classifier, in the present invention.

[0063] Step A, the concept determination process includes:

[0064] Concept set X consists of concepts x_i, where i = 1, 2, 3, ..., n. Each concept x_i in X corresponds to a concept keyword set Y_i composed of at least one concept keyword. A text may be associated with one or more concepts x_i, or with none. If a text contains substantial content related to a concept x_i in X, the text is associated with x_i; if the content of a text is not related to any concept x_i in X, the text is said to be not associated with any concept. From the perspective of text c...
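The relationship between concepts x_i and their keyword sets Y_i can be sketched in Python as follows. This is only an illustration under stated assumptions: the concept names, the keywords, and the case-insensitive substring matching rule are hypothetical, since the text above does not specify how relatedness is decided.

```python
from typing import Dict, Set

# Hypothetical concept set X mapped to keyword sets Y_i (illustrative values only).
CONCEPT_KEYWORDS: Dict[str, Set[str]] = {
    "merger": {"merger", "acquisition"},
    "dividend": {"dividend", "payout"},
}


def associated_concepts(text: str, concept_keywords: Dict[str, Set[str]]) -> Set[str]:
    """Return every concept whose keyword set has at least one keyword in the text.

    Assumption: a case-insensitive substring match stands in for "related content".
    An empty result means the text is not associated with any concept.
    """
    lowered = text.lower()
    return {
        concept
        for concept, keywords in concept_keywords.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    }
```

A text can map to several concepts or to none, which mirrors the two cases described in paragraph [0064].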


Abstract

The technical scheme of the invention discloses a method for obtaining a text classifier used for automatic corpus tagging, and the text classifier itself. The method comprises: determining a concept set; using the concept keywords in each concept's keyword set to match untagged corpus texts and tag them automatically; and, for each concept, training a corresponding text classification model to obtain a text classifier once the number of texts in the tagged corpus text set for that concept meets a threshold condition. The result is a set of text classifiers, one for each concept whose text count meets the threshold condition. The invention provides a general-purpose algorithm structure in which the classification system can be changed flexibly, saving computing time and resources. The text classifier needs only a few initial corpus texts and tags automatically, without manual tagging, which further saves time and cost.
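The flow in the abstract (keyword-based automatic tagging, then per-concept training gated by a threshold) might look roughly like the Python sketch below. Several parts are assumptions rather than the patented method: the threshold is modeled as a minimum text count `min_texts`, matching is a case-insensitive substring test, and `train_word_model` is a stand-in smoothed word log-odds scorer, because the abstract does not name the classification model. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Set


def auto_tag(texts: List[str], concept_keywords: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group untagged texts under each concept whose keywords they contain."""
    tagged: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        lowered = text.lower()
        for concept, keywords in concept_keywords.items():
            if any(keyword.lower() in lowered for keyword in keywords):
                tagged[concept].append(text)
    return tagged


def train_word_model(positives: List[str], negatives: List[str]) -> Callable[[str], bool]:
    """Stand-in trainer (an assumption): add-one smoothed word log-odds."""
    pos = Counter(w for t in positives for w in t.lower().split())
    neg = Counter(w for t in negatives for w in t.lower().split())
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    weights = {
        w: math.log((pos[w] + 1) / pos_total) - math.log((neg[w] + 1) / neg_total)
        for w in vocab
    }

    def classify(text: str) -> bool:
        # Unknown words contribute nothing; a positive total score means "associated".
        return sum(weights.get(w, 0.0) for w in text.lower().split()) > 0

    return classify


def build_classifier_set(
    texts: List[str], concept_keywords: Dict[str, Set[str]], min_texts: int
) -> Dict[str, Callable[[str], bool]]:
    """Train one classifier per concept whose tagged-text count reaches min_texts."""
    tagged = auto_tag(texts, concept_keywords)
    classifiers: Dict[str, Callable[[str], bool]] = {}
    for concept, positives in tagged.items():
        if len(positives) < min_texts:  # assumed form of the threshold condition
            continue
        negatives = [t for t in texts if t not in positives]
        classifiers[concept] = train_word_model(positives, negatives)
    return classifiers
```

Because each concept gets its own independently trained classifier, adding or removing a concept only touches that concept's entry, which is one way to read the abstract's claim that the classification system can be changed flexibly.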

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence text classification, in particular to a text classification method and an obtained text classifier.

Background technique

[0002] With the rapid development of network technology, the requirements for effectively organizing and managing electronic text information, and for finding relevant information quickly, accurately, and comprehensively, keep rising. As a key technology for processing and organizing large amounts of text data, text classification largely solves the problem of disorganized information and provides a technical foundation for users to accurately obtain the information they need. Text classification generally includes representing the text, selecting and training text classifiers, and evaluating and feeding back the classification results. The existing text classification technology is usually implemented according to th...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 贾宁夏磊
Owner: 数库(上海)科技有限公司