Method and device for recognizing class of social contact short texts and method and device for training classification models

A short text classification technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc., with the effects of improving classification accuracy and enriching user experience.

Inactive Publication Date: 2015-09-30
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

It can be seen that this classification method only considers the text content attributes of Weibo, and public opinion users are more concerned about t



Examples


Example Embodiment

[0035] Embodiment one

[0036] Figure 1 is a flow chart showing the method for identifying categories of social short texts according to Embodiment 1 of the present invention. The method can be performed, for example, on a microblog server.

[0037] Referring to Figure 1, in step S110, social short text data is acquired.

[0038] For example, the obtained social short text data is shown in Table 1 below:

[0039] Table 1

[0040]

[0041] It can be seen that the category information of the social short text data in Table 1 is unknown, and subsequent processing is required to identify the category of the social short text.

[0042] In step S120, text feature data is extracted from the social short text data.

[0043] Here, the text feature data may include, but is not limited to, at least one of the following: plain text feature data, writing habit feature data, social feature data and user feature data.
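The four kinds of text feature data named in step S120 can be illustrated with a minimal Python sketch. The source excerpt does not list the individual features, so every field and feature name below (`Post`, `hashtag_count`, `follower_count` and so on) is a hypothetical placeholder chosen for illustration, not the patent's actual feature set.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical container for one social short text and its metadata."""
    text: str
    repost_count: int = 0
    comment_count: int = 0
    follower_count: int = 0
    verified: bool = False


def extract_features(post: Post) -> dict:
    """Return illustrative features, grouped by the four kinds in [0043]."""
    text = post.text
    return {
        # Plain text features (assumed): derived from the words themselves.
        "text_length": len(text),
        "token_count": len(text.split()),
        # Writing habit features (assumed): punctuation and markup usage.
        "hashtag_count": text.count("#"),
        "mention_count": text.count("@"),
        "url_count": len(re.findall(r"https?://\S+", text)),
        "exclamation_count": text.count("!"),
        # Social features (assumed): how other users interacted with the post.
        "repost_count": post.repost_count,
        "comment_count": post.comment_count,
        # User features (assumed): properties of the author.
        "follower_count": post.follower_count,
        "verified": int(post.verified),
    }
```

A flat dictionary keeps the sketch simple; a real system would likely convert it to a numeric vector before passing it to a classifier.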

[0044] Wherein, the plain text feature data may include the data of the...

Example Embodiment

[0061] Embodiment two

[0062] Figure 2 is a flow chart showing the training method of the short text classification model in Embodiment 2 of the present invention. The short text classification model is used to identify the categories of social short texts.

[0063] Referring to Figure 2, in step 210, a plurality of labeled sample data is acquired. Each labeled sample includes social short text data, labeled text feature data and category information.

[0064] Here, the text feature data may include, but is not limited to, at least one of the following: plain text feature data, writing habit feature data, social feature data, and user feature data.

[0065] In addition, considering that social short texts have both media and social attributes, reasonable categories need to be set for social short texts. Therefore, the category information can be news events, advertisements, non-commercial sharing or private conversations. Among them, news events, advertisements...
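The labeled sample data of step 210 and the four categories named above can be sketched as plain Python data structures. The names `Category`, `LabeledSample` and `category_counts` are illustrative assumptions; the source does not prescribe any data layout.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four categories listed in paragraph [0065]."""
    NEWS_EVENT = "news event"
    ADVERTISEMENT = "advertisement"
    NON_COMMERCIAL_SHARING = "non-commercial sharing"
    PRIVATE_CONVERSATION = "private conversation"


@dataclass
class LabeledSample:
    """One labeled sample: short text, its feature data, and its category."""
    text: str
    features: dict
    category: Category


def category_counts(samples):
    """Count how many labeled samples fall into each category."""
    return Counter(sample.category for sample in samples)
```

Counting samples per category is one simple way to check that the labeled set is not dominated by a single category before any model is trained on it.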

Example Embodiment

[0080] Embodiment three

[0081] Figure 3 is a logical block diagram showing the device for identifying categories of social short texts according to Embodiment 3 of the present invention. The device can be used to execute the method steps of the embodiment shown in Figure 1.

[0082] Referring to Figure 3, the device for identifying the category of the social short text includes a text data acquisition module 310, a feature data extraction module 320, a category information acquisition module 330 and a category information determination module 340.

[0083] The text data acquisition module 310 is used to acquire social short text data.

[0084] The feature data extraction module 320 is used to extract text feature data from the social short text data.

[0085] Here, the text feature data may include at least one of the following: plain text feature data, writing habit feature data, social feature data and user feature data.

[0086] Specifically, the plain text feature d...
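The way modules 310 to 340 hand data to one another can be sketched as a small Python class. This is only an illustration of the wiring: the class name `CategoryRecognitionDevice`, its constructor parameters and the use of plain callables for each module are assumptions, not the device's actual implementation.

```python
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]


class CategoryRecognitionDevice:
    """Hypothetical wiring of modules 310, 320, 330 and 340."""

    def __init__(
        self,
        acquire_text: Callable[[], str],
        extract_features: Callable[[str], Features],
        models: Sequence[Callable[[Features], str]],
        determine: Callable[[List[str]], str],
    ) -> None:
        # The abstract requires at least two trained classification models.
        if len(models) < 2:
            raise ValueError("at least two trained models are required")
        self.acquire_text = acquire_text
        self.extract_features = extract_features
        self.models = list(models)
        self.determine = determine

    def recognize(self) -> str:
        text = self.acquire_text()                      # module 310
        features = self.extract_features(text)          # module 320
        first = [model(features) for model in self.models]  # module 330
        return self.determine(first)                    # module 340
```

Passing each module in as a callable keeps the sketch independent of any particular feature set or classifier.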


Abstract

An embodiment of the invention provides a method and device for recognizing the category of social short texts, and a method and device for training classification models. The recognition method includes: acquiring social short text data; extracting text feature data from the social short text data; with the text feature data as input, acquiring first category information of the social short text data from at least two trained short text classification models; and determining second category information of the social short text data according to the acquired first category information. With these methods and devices, the category of a social short text can be recognized automatically and accurately, which improves the effectiveness and accuracy of classifying large volumes of social short texts. The methods and devices can be widely applied to various short text analysis scenarios and improve users' network experience.
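The abstract does not say how the second category information is derived from the first category information. One plausible reading, shown here purely as an assumption, is a majority vote over the categories returned by the individual models. The function name `determine_category` and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def determine_category(first_category_info):
    """Combine per-model categories into one final category.

    Assumption: simple majority vote. Ties are resolved in favour of the
    earliest model in the list whose category is among the most frequent.
    """
    if len(first_category_info) < 2:
        raise ValueError("expected predictions from at least two models")
    counts = Counter(first_category_info)
    best = max(counts.values())
    for label in first_category_info:
        if counts[label] == best:
            return label
```

For example, `determine_category(["advertisement", "news event", "advertisement"])` returns `"advertisement"`. Weighted voting or a second-stage classifier would fit the same description equally well.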

Description

Technical field

[0001] The present invention relates to the technical field of network information processing, in particular to a method for identifying social short text categories, a classification model training method and a device.

Background

[0002] With the widespread use of applications such as Weibo, Tieba and WeChat, a large amount of text data has been generated on the Internet, most of which consists of fragmentary descriptions or opinion comments. Because their content is short, these texts are called social short texts. In the face of massive text data, how to classify it accurately and effectively has become a topic of widespread concern and research in the Internet industry.

[0003] Usually, word-based vector space models are constructed for short texts, which makes the vector space representation of short texts too sparse. Furthermore, using a single model for training and learning yields low classification effectiveness and accuracy. In addition, taking ...
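The sparsity problem mentioned in the background can be made concrete with a small bag-of-words sketch. The helper names `bag_of_words` and `sparsity` and the whitespace tokenizer are assumptions made for illustration; they are not part of the source.

```python
def bag_of_words(texts):
    """Build word-count vectors over the shared vocabulary of all texts."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for text in texts:
        vector = [0] * len(vocab)
        for word in text.lower().split():
            vector[index[word]] += 1
        vectors.append(vector)
    return vocab, vectors


def sparsity(vectors):
    """Fraction of vector entries that are zero."""
    total = sum(len(vector) for vector in vectors)
    zeros = sum(1 for vector in vectors for value in vector if value == 0)
    return zeros / total
```

With the three short texts `"sale today"`, `"rain today"` and `"new phone launch"`, the vocabulary has six words and 11 of the 18 vector entries are zero. The vocabulary grows with the corpus while each short text stays only a few words long, so the zero fraction approaches one on realistic data.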


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 莫洋, 沈剑平, 李炫, 宋元峰, 骆金昌, 陈玉光
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD