Bullet chat topic extraction method, medium, equipment and system based on n-gram model

An extraction method and system, applied to bullet chat topic extraction based on the N-gram model, that addresses high manpower and material costs, inaccurate bullet chat topic extraction, and low efficiency, and that reduces computational complexity.

Active Publication Date: 2022-06-21
WUHAN DOUYU NETWORK TECH CO LTD
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

At present, most traditional bullet chat text extraction solutions in the live broadcast industry rely on manual labeling, which consumes considerable manpower and material resources and is clearly inefficient.
Moreover, existing approaches represent bullet chat text purely with the bag-of-words model, ignoring the relationship between an individual word and its context, which makes bullet chat topic extraction inaccurate.
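
The limitation of the bag-of-words representation can be illustrated with a minimal Python sketch. The two messages are hypothetical, pre-tokenized examples that are not taken from the patent, and the helper names are chosen here for illustration only.

```python
from collections import Counter


def bag_of_words(tokens):
    """Unordered word counts: word order is discarded."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Contiguous n-token windows, which preserve local word order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Two hypothetical messages containing the same words in a different order.
a = ["streamer", "beats", "boss"]
b = ["boss", "beats", "streamer"]

same_bag = bag_of_words(a) == bag_of_words(b)   # bag of words cannot tell them apart
same_bigrams = ngrams(a, 2) == ngrams(b, 2)     # bigrams can
```

Because the bigrams keep each word next to its neighbor, the two messages get different representations even though their word counts are identical.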

Examples

Embodiment Construction

[0030] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0031] As shown in Figure 1, an embodiment of the present invention provides a method for extracting bullet chat topics based on an N-gram model, including the following steps:

[0032] S1. Data preparation: extract the bullet chat data;

[0033] S2. Build bullet chat features: extract the features corresponding to words that represent a specific intent and add them to the custom lexicon; add words that have no practical meaning to the custom stop lexicon;

[0034] S3. Data preprocessing: remove records whose "Bullet Chat Content" field is empty; remove the punctuation marks in the "Bullet Chat Content" field;

[0035] S4. Use the N-gram model to represent the bullet chat content as word vectors: the preprocessed bullet chat content is represented with the N-gram model. The N-gram model indicates that the occurre...
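
Steps S2 to S4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The lexicon contents, the punctuation set, the whitespace-separated English sample messages, and the greedy longest-match merging of lexicon phrases are all assumptions made here. Real bullet chat text would typically be Chinese and would need a proper word segmenter.

```python
import string

CONTENT_FIELD = "Bullet Chat Content"

# Hypothetical lexicons for illustration; the source does not list their contents.
CUSTOM_LEXICON = {"free gift", "world cup"}   # multi-word terms kept as one token
STOP_LEXICON = {"the", "a", "lol"}            # words with no practical meaning

# ASCII punctuation plus a few common full-width marks (an assumed set).
PUNCTUATION = string.punctuation + "，。！？、；：（）《》"
_STRIP = str.maketrans("", "", PUNCTUATION)


def preprocess(records):
    """S3: drop records with empty content, then strip punctuation."""
    cleaned = []
    for record in records:
        content = (record.get(CONTENT_FIELD) or "").strip()
        if not content:
            continue
        content = content.translate(_STRIP).strip()
        if content:
            cleaned.append(content)
    return cleaned


def tokenize(text, lexicon=CUSTOM_LEXICON, stop=STOP_LEXICON):
    """Merge lexicon phrases greedily (longest match first), then drop stop words."""
    words = text.lower().split()
    max_len = max((len(term.split()) for term in lexicon), default=1)
    tokens, i = [], 0
    while i < len(words):
        for size in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return [t for t in tokens if t not in stop]


def ngrams(tokens, n):
    """S4: contiguous n-token windows over one message."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


records = [
    {CONTENT_FIELD: "The free gift, lol!"},
    {CONTENT_FIELD: ""},
    {CONTENT_FIELD: "World Cup final tonight!!!"},
]
messages = preprocess(records)
tokenized = [tokenize(m) for m in messages]
bigrams = [ngrams(t, 2) for t in tokenized]
```

In this sketch the custom lexicon keeps intent-bearing phrases such as "world cup" together as one token, and the stop lexicon removes filler words before the n-grams are formed.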

Abstract

The invention discloses a bullet chat topic extraction method, medium, equipment and system based on an N-gram model, and relates to the field of live broadcasting. The method includes the following steps: extracting bullet chat data; extracting features corresponding to words that express a specific intent and adding them to a custom lexicon; adding words without practical meaning to a custom stop lexicon; preprocessing the data by removing records whose "Bullet Chat Content" field is empty and removing the punctuation marks in the "Bullet Chat Content" field; representing the preprocessed bullet chat content with the N-gram model, under which the probability of a word occurring in a sentence depends on the previous N-1 words, where N is a positive integer; and dividing each bullet chat message into a set of word vectors according to the word-formation rules in the custom lexicon, using the stop lexicon to filter out useless words. The present invention can accurately extract bullet chat topics.
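
The N-gram assumption described in the abstract can be written in standard textbook notation as follows. The symbols $w_i$ (the $i$-th word), $m$ (the sentence length) and $C(\cdot)$ (a corpus count) are introduced here for illustration. The count-based estimate in the second line is the usual maximum-likelihood form and is an assumption, since the source does not state how the probabilities are estimated.

```latex
% N-gram (Markov) assumption: each word depends only on the previous N-1 words
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-N+1}, \dots, w_{i-1}\right)

% Common maximum-likelihood estimate from corpus counts (assumed, not from the source)
P\left(w_i \mid w_{i-N+1}, \dots, w_{i-1}\right) \approx
  \frac{C(w_{i-N+1}, \dots, w_{i-1}, w_i)}{C(w_{i-N+1}, \dots, w_{i-1})}
```

For $N = 2$ this reduces to the bigram case, where each word depends only on the word immediately before it.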

Description

Technical Field

[0001] The present invention relates to the field of live broadcasting, and in particular to a method, medium, device and system for extracting bullet chat topics based on an N-gram model.

Background

[0002] On a live broadcast platform, the main text content generally takes the form of bullet chats. To compile statistics on bullet chat content, the bullet chat text of the platform must first be extracted. At present, most traditional bullet chat text extraction solutions in the live broadcast industry rely on manual tagging, which consumes considerable manpower and material resources and is clearly inefficient. Moreover, existing approaches represent bullet chat text purely with the bag-of-words model, ignoring the relationship between an individual word and its context, which makes bullet chat topic extraction inaccurate.

Summary of the Invention

[0003] The purpose of the present inve...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284, G06F40/216, H04N21/235, H04N21/435
CPC: H04N21/235, H04N21/435, G06F40/289
Inventor: 龚灿, 陈少杰, 张文明
Owner: WUHAN DOUYU NETWORK TECH CO LTD