Information gain-based method for classifying English social media accounts

A technique combining social media and information gain, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the lack of prior research on this problem and improves accuracy.

Status: Inactive; Publication Date: 2017-12-12
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

No prior research has applied text classification methods to the classification of social media accounts.

Embodiment Construction

[0044] The technical solution of the present invention will be further described below in conjunction with the accompanying drawings.

[0045] As shown in Figure 1, the information gain-based method for classifying English social media accounts includes the following steps:

[0046] S1. Data preprocessing: Segment the posts published by each social media account into words, remove stop words and useless symbols, and obtain the account's feature words. Represent the account with the bag-of-words model, a simplified representation used in natural language processing and information retrieval (IR). Under this model, a text such as a sentence or a document is represented as the bag of words it contains, disregarding grammar and word order.
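The preprocessing in S1 can be illustrated with a minimal Python sketch. The stop-word list, the regular expression, and the function name `extract_feature_words` are assumptions made for illustration; the source does not specify the tokenizer or the stop-word list it uses.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "and", "of", "in", "is", "too", "also"}

def extract_feature_words(posts):
    """Return the lowercase tokens of all posts, minus stop words and symbols."""
    words = []
    for post in posts:
        # Keep runs of ASCII letters (with an optional inner apostrophe);
        # digits, punctuation and other symbols are dropped.
        for token in re.findall(r"[a-z]+(?:'[a-z]+)?", post.lower()):
            if token not in STOP_WORDS:
                words.append(token)
    return words

posts = ["John likes to watch movies!!", "Mary likes movies too :)"]
print(extract_feature_words(posts))
```

The returned list keeps repeated words, so it can be counted directly to build a bag-of-words representation of the account.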

[0047] Example: (1) John likes to watch movies. Mary likes movies too.

[0048] (2) John also likes to watch football games.

[0049] In the above two sentences, the...
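As an illustration of the bag-of-words representation, the following Python sketch builds count vectors for the two example sentences. The tokenizer and the first-seen vocabulary order are assumptions, and stop words are deliberately kept here so that every word of the example stays visible.

```python
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, drop periods, split on whitespace.
    return text.lower().replace(".", " ").split()

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]

# Vocabulary: every distinct word across both documents, in first-seen order.
vocabulary = []
for doc in docs:
    for word in tokenize(doc):
        if word not in vocabulary:
            vocabulary.append(word)

def bag_of_words(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc) for doc in docs]
print(vocabulary)
print(vectors[0])  # [1, 2, 1, 1, 2, 1, 1, 0, 0, 0]
print(vectors[1])  # [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

Each vector position corresponds to one vocabulary word, and word order within the sentences plays no role in the result.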

Abstract

The invention discloses an information gain-based method for classifying English social media accounts. The method comprises the following steps: S1, data preprocessing, to obtain the feature words of an account; S2, feature selection, in which the account's feature words are filtered by information gain to obtain feature words that are representative of a class; S3, feature diffusion, in which synonyms of the feature words are found with WordNet and certain keywords of a field category are added manually to expand the feature words; S4, classification model construction, in which machine learning is used to obtain an account classification model; and S5, classification of unknown social media accounts. The method applies common text classification techniques to the classification of English social media accounts, so that users can quickly find accounts of a given field category among a massive number of accounts and then obtain relevant, useful information about that category.
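The feature selection in S2 can be sketched in Python using the standard definition of information gain: the entropy of the class labels minus the expected entropy after splitting accounts by whether they contain a word. The toy accounts, the labels, and the function names are hypothetical, and the source does not state the exact formula or threshold it uses.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(word, accounts):
    """accounts: list of (set_of_feature_words, class_label) pairs."""
    all_labels = [label for _, label in accounts]
    with_word = [label for words, label in accounts if word in words]
    without_word = [label for words, label in accounts if word not in words]
    p_with = len(with_word) / len(accounts)
    # Expected class entropy once we know whether the word is present.
    conditional = p_with * entropy(with_word) + (1 - p_with) * entropy(without_word)
    return entropy(all_labels) - conditional

# Hypothetical labeled accounts, each reduced to its set of feature words.
accounts = [
    ({"goal", "match", "team"}, "sports"),
    ({"match", "league", "news"}, "sports"),
    ({"chip", "software", "news"}, "tech"),
    ({"software", "release", "team"}, "tech"),
]

# Rank every word by information gain, highest first.
all_words = set().union(*(words for words, _ in accounts))
ranked = sorted(all_words, key=lambda w: (-information_gain(w, accounts), w))
for w in ranked:
    print(w, round(information_gain(w, accounts), 3))
```

In this toy data, "match" and "software" separate the two classes perfectly and score highest, while "news" and "team" occur equally in both classes and score zero, so a selection step would keep the former and discard the latter.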

Description

Technical field

[0001] The invention belongs to the technical field of text classification, and in particular relates to a method for classifying English social media accounts based on information gain.

Background

[0002] Text classification is an important foundation of information retrieval and text mining. Its main task is to determine the category of a text from its content, given a predefined set of category labels. Text classification is widely used in natural language processing and understanding, information organization and management, content filtering, and other fields. The machine-learning-based text classification methods that gradually matured in the 1990s focus more on automatically mining, generating, and dynamically optimizing the classifier model, and outperform the earlier text classification models based on knowledge engineering and expert systems in terms of classification effect and...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06Q50/00
Inventors: 费高雷, 朱闻一, 陈浩, 赵海林, 谢星辰
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA