
Method, device and system for extracting live comment theme based on N-gram model and medium

The extraction method and system apply to bullet-screen (barrage) topic extraction based on the N-gram model. They address the high labor and material cost, low efficiency, and inaccurate topic extraction of existing approaches, and achieve an accurate representation of bullet-screen content.

Active Publication Date: 2019-01-15
WUHAN DOUYU NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

At present, most traditional barrage text extraction solutions in the live broadcast industry rely on manual labeling, which consumes considerable manpower and material resources and is inefficient.
Moreover, existing bullet chat text is represented purely with the bag-of-words model, which ignores the relationship between a word and its context and makes the extraction of bullet chat topics inaccurate.
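
The limitation of the bag-of-words model can be illustrated with a minimal sketch. The two token lists and the helper names `bag_of_words` and `ngrams` are hypothetical and do not come from the patent; they only show that word counts alone cannot distinguish two comments that differ in word order, whereas contiguous N-grams can.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count tokens, discarding the order in which they appear."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return the contiguous n-token tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Two hypothetical comments with the same words in a different order.
a = ["streamer", "beats", "boss"]
b = ["boss", "beats", "streamer"]

print(bag_of_words(a) == bag_of_words(b))  # True: word order is lost
print(ngrams(a, 2) == ngrams(b, 2))        # False: bigrams keep local context
```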

Method used

The method extracts barrage data, builds a custom lexicon of words expressing a specific intention and a custom stop-word lexicon, removes empty comments and punctuation, represents the preprocessed bullet chat content with an N-gram model, and segments each comment into a group of word vectors.



Embodiment Construction

[0030] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0031] As shown in Figure 1, an embodiment of the present invention provides a method for extracting bullet chat topics based on the N-gram model, comprising the following steps:

[0032] S1. Data preparation: extract barrage data;

[0033] S2. Build barrage features: extract features corresponding to words representing a specific intention, and add them to the custom lexicon; add words that have no practical meaning to the custom stop-word lexicon;

[0034] S3. Data preprocessing: remove the data whose "bullet chat content" field is empty; remove the punctuation marks in the "bullet chat content" field;

[0035] S4. Use the N-gram model to represent the content of the bullet chat as a word vector: the content of the bullet chat after data preprocessing is represented by the N-gram model, and the N-gram model indicates that the ...
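
Steps S2 to S4, together with the segmentation and stop-word filtering described in the abstract, can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the sample records, the field name `content`, the lexicon entries, and the function names are all hypothetical, and forward maximum matching is swapped in as a simple stand-in because the visible text does not say which segmentation algorithm is used.

```python
import re

# Hypothetical records; "content" stands in for the "bullet chat content" field.
RAW = [
    {"content": "主播加油！！！"},
    {"content": ""},
    {"content": "主播的操作666啊"},
]

# S2 (assumed entries): intent-bearing words and words with no practical meaning.
CUSTOM_LEXICON = {"主播", "加油", "操作", "666"}
STOP_WORDS = {"的", "啊"}


def preprocess(records):
    """S3: strip punctuation and drop comments that are (or become) empty."""
    cleaned = []
    for rec in records:
        text = re.sub(r"[^\w]", "", rec.get("content") or "")
        if text:
            cleaned.append(text)
    return cleaned


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each
    position and fall back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def ngrams(tokens, n):
    """Return the contiguous n-token tuples of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract(records, n=2):
    """Preprocess, segment, filter stop words, then emit N-grams per comment."""
    result = []
    for text in preprocess(records):
        tokens = [t for t in segment(text, CUSTOM_LEXICON) if t not in STOP_WORDS]
        result.append(ngrams(tokens, n))
    return result


for grams in extract(RAW):
    print(grams)
```

With `n=2`, each surviving comment is reduced to a list of adjacent word pairs, so a downstream topic counter would see "主播" followed by "加油" as one unit rather than as two unrelated words.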


Abstract

The invention discloses a method, a device, a system and a medium for extracting live comment themes based on an N-gram model, and relates to the live broadcast field. The method comprises the following steps: extracting live comment data; extracting features corresponding to words expressing a specific intention and adding them to a custom thesaurus; adding meaningless words to a custom stop-word thesaurus; performing data preprocessing by removing the data whose live comment content field is empty and removing punctuation marks from the live comment content field; representing the preprocessed live comment content using an N-gram model, wherein the N-gram model assumes that the occurrence probability of a word in a sentence depends on the preceding N-1 words, and N is a positive integer; and segmenting each live comment content into a group of word vectors, wherein the segmentation follows the word formation rules in the custom thesaurus and useless words are filtered out according to the custom stop-word thesaurus. The invention can accurately extract the live comment theme.
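
The N-gram assumption stated in the abstract, that a word's probability depends on the preceding N-1 words, can be made concrete with a small counting sketch. The toy corpus and the function name `conditional_prob` are hypothetical, and maximum-likelihood estimation from raw counts is a standard textbook choice assumed here; the abstract does not say how the probabilities are estimated.

```python
from collections import Counter


def conditional_prob(corpus, n):
    """Estimate P(word | preceding n-1 words) by relative frequency."""
    ngram_counts = Counter()
    context_counts = Counter()
    for tokens in corpus:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1  # the n-1 words before the last one
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}


# Hypothetical, already-segmented comments.
corpus = [
    ["streamer", "nice", "play"],
    ["streamer", "nice", "shot"],
    ["streamer", "lost"],
]

probs = conditional_prob(corpus, 2)
# "nice" follows "streamer" in 2 of the 3 bigrams that start with "streamer".
print(probs[("streamer", "nice")])
```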

Description

Technical field

[0001] The present invention relates to the field of live broadcasting, in particular to a method, medium, device and system for extracting a barrage topic based on an N-gram model.

Background technique

[0002] The main text content of the live broadcast platform is generally displayed as a barrage. In order to count the content of the barrage, it is necessary to extract the text information of the barrage of the live platform. At present, most of the traditional barrage text extraction solutions in the live broadcast industry use manual labeling. This method consumes a lot of manpower and material costs. The method is obviously inefficient. Moreover, the existing bullet chat text is represented solely based on the bag-of-words model, ignoring the relationship between a single word and the context, making the extraction of bullet chat topics inaccurate.

Contents of the invention

[0003] The object of the present invention is to overcome the shortcomings...


Application Information

IPC(8): G06F17/27, H04N21/235, H04N21/435
CPC: H04N21/235, H04N21/435, G06F40/289
Inventors: 龚灿, 陈少杰, 张文明
Owner: WUHAN DOUYU NETWORK TECH CO LTD