A method for classifying enterprise domains and screening enterprise keywords

A screening method and keyword technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the lack of classification technology and the excess of noise words in tags, and achieves a low industry classification error rate, high classification efficiency, and excellent comprehensive performance.

Active Publication Date: 2018-12-28
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: In view of the above problems in the prior art, the present invention proposes a method for classifying enterprise fields and screening enterprise keywords. It addresses two current problems: the lack of suitable text classification technology for the enterprise field, and the many noise words among the labels extracted by enterprise search engines. The method classifies enterprises with high accuracy and provides a way of extracting enterprise labels. The keywords it extracts from enterprise documents reduce noisy labels in enterprise search engines and make their results more precisely targeted.

Examples

Embodiment Construction

[0046] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0047] The general steps of the enterprise field classification and enterprise keyword screening method of the present invention are as follows:

[0048] First, a crawler program collects a large number of enterprise introduction documents from different fields, and these are sorted by category into a training corpus. The training corpus is a database of enterprise documents whose categories have been manually classified and calibrated. The improved TF-IDF algorithm provided by the present invention is then used to train the enterprise classification dictionary. The traditional TF-IDF algorithm uses only the word frequency information in the text, so the keywords it extracts are not very accurate. The improved TF-IDF algorithm of the present invention is aimed at the characteri...
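For reference, the traditional TF-IDF baseline mentioned above can be sketched as follows. This is a minimal illustration of standard TF-IDF only, not the improved algorithm of the invention, whose details are not given in this excerpt. The function names `tf_idf` and `top_keywords` are hypothetical, and the input is assumed to be already segmented into word tokens (Chinese text would need a word segmenter first).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Standard TF-IDF. docs is a list of token lists (assumed pre-segmented).

    Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_keywords(score, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy corpus (invented for illustration): two tokenized enterprise profiles.
docs = [
    ["bank", "loan", "bank", "company"],
    ["software", "cloud", "company"],
]
scores = tf_idf(docs)
print(top_keywords(scores[0], k=2))
```

In this toy corpus the word "company" appears in every document, so its score is zero. The baseline relies on frequency statistics alone, which is the limitation the paragraph above attributes to traditional TF-IDF.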

Abstract

The invention discloses a method for classifying enterprise domains and screening enterprise keywords. The method trains on an enterprise domain corpus to obtain the feature words of each field, which serve as a classification dictionary, and then uses the classification dictionary to classify enterprise profile documents. After classification, the method can also extract from the enterprise profile documents the industry labels that represent the enterprise's domain. In addition, the method overcomes the influence of most noise words in Chinese text processing, and has a low error rate, high classification efficiency and excellent comprehensive performance.
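As a rough illustration of the dictionary-based flow summarized above, the sketch below scores a tokenized profile against per-domain feature words and then picks labels from the winning domain. The abstract does not specify the scoring rule, so the weighted-sum scoring, the dictionary layout, and the names `classify` and `extract_labels` are all assumptions made for illustration, and the weights are invented.

```python
from collections import Counter

def classify(tokens, dictionary):
    """Score a tokenized profile against each domain's feature words.

    dictionary maps domain -> {feature_word: weight} (assumed layout).
    Returns (best_domain, scores); ties go to the alphabetically first domain.
    """
    counts = Counter(tokens)
    scores = {
        domain: sum(weight * counts[word] for word, weight in words.items())
        for domain, words in dictionary.items()
    }
    best = max(sorted(scores), key=scores.get)
    return best, scores

def extract_labels(tokens, dictionary, domain, k=3):
    """Pick up to k words from the profile that are feature words of the domain."""
    words = dictionary[domain]
    present = {t for t in tokens if t in words}
    return sorted(present, key=lambda w: (-words[w], w))[:k]

# Hypothetical classification dictionary and profile, for illustration only.
dictionary = {
    "finance": {"bank": 2.0, "loan": 1.5},
    "software": {"cloud": 2.0, "platform": 1.0},
}
profile = ["the", "company", "offers", "cloud", "platform", "services", "cloud"]
domain, scores = classify(profile, dictionary)
labels = extract_labels(profile, dictionary, domain)
print(domain, labels)
```

Words such as "the" or "services" never reach the labels because they are not feature words of the chosen domain, which is one simple way a classification dictionary can filter noise words.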

Description

Technical field

[0001] The invention belongs to the fields of Chinese text processing and text mining, and in particular relates to a method for classifying enterprise fields and selecting enterprise keywords.

Background technique

[0002] In the information age, a large amount of information is stored in text, such as various research documents, enterprise information documents, books, web documents, etc. In recent years, computer technology has advanced by leaps and bounds, and technologies such as data mining and text information mining have become hotspots in information science research, and it is expected that some mature text mining technologies will be used in production.

[0003] In many cases, readers do not have enough energy to read through all the texts obtained, so many documents provide abstracts and keywords to help readers judge whether the content of the text is of interest to them and whether they want to continue reading. In the past, text summarization ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62, G06Q10/06
CPC: G06Q10/06, G06F40/216, G06F40/289, G06F18/24
Inventor: 邝野, 夏思宇, 李钢
Owner: SOUTHEAST UNIV