Radio and television news keyword automatic extraction method based on deep learning

A radio and television news technology based on deep learning, applied in neural network learning methods, electrical digital data processing, natural language data processing, etc. It addresses problems such as the inability to obtain certain vocabulary and incomplete keyword coverage. It also organizes and manages media resources more accurately and improves management efficiency.

Inactive Publication Date: 2021-05-28
成都索贝视频云计算有限公司 (Chengdu Sobey Video Cloud Computing Co., Ltd.)
Cites: 5; Cited by: 2

AI Technical Summary

Problems solved by technology

Existing models that rely on word segmentation (such as TextRank and TF-IDF) cannot obtain such vocabulary. Moreover, these keywords lack consistent linguistic characteristics, so they cannot be fully covered even by adding them to the word-segmentation lexicon.



Examples


Embodiment 1

[0047] As shown in Figure 1, the deep-learning-based method for automatically extracting keywords from radio and television news includes the following steps:

[0048] S1: annotate keyword information in the radio and television news data to be analyzed, and construct a keyword data set;

[0049] S2: build a keyword extraction model on top of a pre-trained model, and train it with the keyword data set constructed in step S1;

[0050] S3: use the keyword extraction model trained in step S2 to run prediction on the input radio and television news and obtain the keyword extraction result.
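To make step S3 concrete: if the model emits one tag per character, the keyword extraction result can be read off by collecting the tagged spans. The sketch below assumes a character-level BIO tag set (`B-KW`, `I-KW`, `O`) and a hypothetical helper name `tags_to_keywords`. The patent text shown here specifies neither.

```python
from typing import List


def tags_to_keywords(text: str, tags: List[str]) -> List[str]:
    """Collect character spans tagged B-KW / I-KW into keyword strings.

    Assumes a character-level BIO scheme (B-KW begins a keyword, I-KW
    continues it, O is outside); this tag set is hypothetical.
    """
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    keywords: List[str] = []
    current: List[str] = []
    for ch, tag in zip(text, tags):
        if tag == "B-KW":
            # A new keyword starts; flush any keyword still open
            if current:
                keywords.append("".join(current))
            current = [ch]
        elif tag == "I-KW" and current:
            current.append(ch)
        else:
            # "O", or an I-KW with no open span, closes the open keyword
            if current:
                keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    # Deduplicate while keeping first-occurrence order
    return list(dict.fromkeys(keywords))
```

For example, `tags_to_keywords("abcde", ["B-KW", "I-KW", "O", "B-KW", "I-KW"])` yields `["ab", "de"]`. Dropping an orphan `I-KW` is one choice among several. Treating it as the start of a new keyword is an equally plausible policy.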

Embodiment 2

[0052] On the basis of Embodiment 1, the following steps are performed to construct the keyword data set in step S1:

[0053] S11: collect radio and television news data, and use the keywords given by relevant professionals as candidate keywords. Then clean the candidate keywords, removing meaningless and redundant ones, to obtain the final keyword results. Cleaning works as follows. First, entity recognition identifies the entities in the news text, and entity words are removed from the candidate keywords. Next, keywords that are too long or too short are removed, as are keywords that do not appear in the original text. Removing meaningless and redundant keywords lets the subsequent training better capture the features of meaningful keywords, so the trained model extracts meaningful keywords more reliably.
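The cleaning rules in S11 can be sketched as a single filter. This is a minimal illustration under stated assumptions. The entity recognizer is represented only by a precomputed set of entity strings, and the length bounds `min_len` and `max_len` are hypothetical values because the source gives no thresholds. Treating exact duplicates as the "redundant" keywords is also an interpretation.

```python
from typing import Iterable, List, Set


def clean_candidates(
    text: str,
    candidates: Iterable[str],
    entities: Set[str],
    min_len: int = 2,  # hypothetical lower bound; the source gives no value
    max_len: int = 8,  # hypothetical upper bound; the source gives no value
) -> List[str]:
    """Apply the S11 cleaning rules to professionally annotated candidates.

    `entities` stands in for the output of an entity recognizer run on `text`.
    """
    cleaned: List[str] = []
    seen: Set[str] = set()
    for kw in candidates:
        kw = kw.strip()
        # Drop empty strings and exact duplicates (assumed meaning of "redundant")
        if not kw or kw in seen:
            continue
        # Rule 1: remove entity words found by entity recognition
        if kw in entities:
            continue
        # Rule 2: remove keywords that are too short or too long
        if not (min_len <= len(kw) <= max_len):
            continue
        # Rule 3: remove keywords that do not appear in the original text
        if kw not in text:
            continue
        seen.add(kw)
        cleaned.append(kw)
    return cleaned
```

The filters run in the order the patent lists them. Because each rule only removes items, reordering them would not change the result.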

[0054] S12: after aggregating the radio and television news text data by sentence and paragraph, according to the final keyword resu...
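The S12 paragraph breaks off before it describes how the final keywords become training labels. Embodiment 3 mentions a sequence labeling layer, so a common approach is to mark each keyword occurrence in the text with character-level tags. The sketch below assumes a BIO scheme (`B-KW`, `I-KW`, `O`) and a hypothetical helper `label_text`. It also assumes that longer keywords take precedence when spans nest. None of these details come from the patent.

```python
from typing import List


def label_text(text: str, keywords: List[str]) -> List[str]:
    """Tag every occurrence of each keyword with character-level BIO labels.

    Assumed scheme: B-KW starts a keyword, I-KW continues it, O is outside.
    Longer keywords are placed first, so a shorter keyword nested inside an
    already-tagged span does not overwrite it (an assumed policy).
    """
    tags = ["O"] * len(text)
    # Longest first; ties broken alphabetically so the output is deterministic
    for kw in sorted(set(keywords), key=lambda k: (-len(k), k)):
        if not kw:
            continue  # an empty keyword would match everywhere
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            # Only tag spans whose characters are all still untagged
            if all(t == "O" for t in tags[start:end]):
                tags[start] = "B-KW"
                for i in range(start + 1, end):
                    tags[i] = "I-KW"
            start = text.find(kw, start + 1)
    return tags
```

For example, `label_text("abcab", ["ab"])` tags both occurrences: `["B-KW", "I-KW", "O", "B-KW", "I-KW"]`.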

Embodiment 3

[0056] On the basis of Embodiment 1, the keyword extraction model in step S2 consists of three layers connected in series: a text vectorization layer, a first keyword prediction layer, and a second keyword sequence labeling layer.
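The patent does not say how the second keyword sequence labeling layer decodes its output. A linear-chain CRF is a common choice for a layer stacked on BERT and an LSTM. Assuming a CRF, prediction reduces to Viterbi decoding over per-character tag scores plus tag-to-tag transition scores. The sketch below uses plain Python, and the function name `viterbi_decode` and the score layout are illustrative assumptions.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag path under an assumed linear-chain CRF.

    emissions[t][k]: score of tag k at character t (e.g. from the LSTM layer)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n = len(emissions)
    if n == 0:
        return []
    num_tags = len(emissions[0])
    # score[k]: best total score of any path ending in tag k at the current step
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for t in range(1, n):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final tag to recover the full path
    best = max(range(num_tags), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backptr):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

Suppose the tags are `O=0, B=1, I=2` and a large negative `transitions[0][2]` forbids `O -> I`. In that case, decoding prefers `B, I` over the per-position argmax `O, I`. Enforcing valid tag sequences this way is the usual reason to place a CRF after an LSTM.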

[0057] The text vectorization layer uses a pre-trained BERT layer to convert the text sequence $X = (x_1, x_2, \ldots, x_n)$ into a vector sequence $E = (e_1, e_2, \ldots, e_n)$. Here $X$ denotes the input text sequence, $E$ denotes the text vector sequence encoded by the text vectorization layer, and $n$ denotes the total number of characters in the input text sequence. In this embodiment, BERT's strong language representation capability yields better character-level semantic embeddings.

[0058] The construction process of the first keyword prediction layer is as follows:

[0059] S21: for the forward LSTM, define the forget gate matrix and its bias parameter, the memory gate matrix and its bias parameter, the outp...
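The S21 paragraph is cut off after it names the gate parameters. For reference, a minimal sketch of a standard LSTM cell built from those parameters follows. The forget gate and memory (input) gate match the patent's wording. The candidate-memory parameters, the output gate form, the zero initial state, and the parameter names `W_f`, `b_f`, and so on are assumptions based on the textbook LSTM. The mention of a "forward LSTM" suggests a backward one as well, so the sketch concatenates both directions as a BiLSTM. That layout is inferred, not stated.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Params = Dict[str, list]  # hypothetical keys: W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o


def _affine(w: List[Vector], b: Vector, v: Vector) -> Vector:
    # w has shape (hidden, len(v)); returns w @ v + b
    return [sum(wk * vk for wk, vk in zip(row, v)) + bk for row, bk in zip(w, b)]


def _sigmoid(z: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in z]


def lstm_step(x: Vector, h: Vector, c: Vector, p: Params) -> Tuple[Vector, Vector]:
    z = list(h) + list(x)  # concatenation [h_{t-1}; x_t]
    f = _sigmoid(_affine(p["W_f"], p["b_f"], z))  # forget gate
    i = _sigmoid(_affine(p["W_i"], p["b_i"], z))  # memory (input) gate
    c_hat = [math.tanh(v) for v in _affine(p["W_c"], p["b_c"], z)]  # candidate memory
    o = _sigmoid(_affine(p["W_o"], p["b_o"], z))  # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, c_hat)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(xs: List[Vector], p: Params, hidden: int) -> List[Vector]:
    h, c = [0.0] * hidden, [0.0] * hidden  # zero initial state (assumed)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(xs: List[Vector], p_fwd: Params, p_bwd: Params, hidden: int) -> List[Vector]:
    fwd = run_lstm(xs, p_fwd, hidden)
    # Run over the reversed sequence, then re-align states to original positions
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]
```

Each output vector has length `2 * hidden`, one half per direction. In the model these `xs` would be the BERT vectors $e_1, \ldots, e_n$.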



Abstract

The invention discloses a deep-learning-based method for automatically extracting keywords from radio and television news. It comprises the following steps. S1: annotate keyword information in the radio and television news data to be analyzed and construct a keyword data set. S2: build a keyword extraction model on top of a pre-trained model and train it with the keyword data set from step S1. S3: use the keyword extraction model trained in step S2 to run prediction on input radio and television news and obtain the keyword extraction result. The method extracts keywords efficiently from radio and television news content. As a result, media resources can be organized and managed more accurately, management efficiency improves, and users' retrieval, recommendation, and publishing services receive better technical support.

Description

Technical field

[0001] The present invention relates to the field of automatic indexing of radio and television news media assets. More specifically, it relates to a deep-learning-based method for automatically extracting keywords from radio and television news.

Background art

[0002] In the era of converged media, the explosive growth of video data poses major challenges to the reuse of media resources, and real-time requirements for program cataloging and indexing have risen accordingly. Big data and artificial intelligence can automatically classify, identify, and index media content along multiple dimensions. This makes it possible to extract content tags from the media resources themselves, which is of great significance for improving cataloging quality and efficiency. This is the supporting basis for the gradual transformation of media data management from traditional manual cataloging to automatic cataloging relying on an intelligent mana...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F40/295, G06N3/04, G06N3/08, G06F16/38, G06F16/31
CPC: G06F40/295, G06N3/08, G06F16/38, G06F16/31, G06N3/044, G06N3/045
Inventors: 温序铭 (Wen Xuming), 朱婷婷 (Zhu Tingting), 杨瀚 (Yang Han)
Owner: 成都索贝视频云计算有限公司 (Chengdu Sobey Video Cloud Computing Co., Ltd.)