Radio and television news keyword automatic extraction method based on deep learning

A technology combining radio and television news with deep learning, applied in neural learning methods, electrical digital data processing, natural language data processing, and related fields. It addresses problems such as the inability to obtain certain vocabulary and incomplete keyword coverage, and it enables more accurate organization and management of media resources and improved management efficiency.

Status: Inactive. Publication date: 2021-05-28

成都索贝视频云计算有限公司

Cites: 5. Cited by: 2.

Problems solved by technology

Existing models that rely on word segmentation (such as TextRank and TF-IDF) cannot obtain such vocabulary. Moreover, because these keywords lack definite linguistic characteristics, they cannot be fully covered even by extending the word segmentation thesaurus.

Examples

Embodiment 1

[0047] As shown in Figure 1, the deep-learning-based method for automatically extracting radio and television news keywords includes the following steps:

[0048] S1: annotate the keyword information of the radio and television news data to be analyzed, and construct a keyword data set;

[0049] S2: build a keyword extraction model on a pre-trained model, and train the constructed model with the keyword data set from step S1;

[0050] S3: use the keyword extraction model trained in step S2 to run prediction on the input radio and television news and obtain a keyword extraction result.
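
As a rough illustration of the last stage of step S3, the sketch below turns per-character tags into a keyword list. Embodiment 3 mentions a keyword sequence labeling layer, but the B/I/O tag scheme, the function name `decode_keywords`, and the order-preserving deduplication are assumptions made for this example, not details stated in the patent text.

```python
from typing import List


def decode_keywords(chars: List[str], tags: List[str]) -> List[str]:
    """Collect keywords from per-character tags.

    Assumed (hypothetical) tag scheme: "B" starts a keyword,
    "I" continues it, and any other tag (e.g. "O") is outside a keyword.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    keywords: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new keyword starts; flush any keyword still open.
            if current:
                keywords.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        else:
            # "O", or a stray "I" with no open keyword: close the span.
            if current:
                keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))

    # Drop repeated keywords while keeping first-seen order.
    seen = set()
    unique: List[str] = []
    for kw in keywords:
        if kw not in seen:
            seen.add(kw)
            unique.append(kw)
    return unique


if __name__ == "__main__":
    chars = list("5G基站建设提速")
    tags = ["B", "I", "B", "I", "O", "O", "B", "I"]
    print(decode_keywords(chars, tags))
```

Working at the character level matches the character-level embeddings described in Embodiment 3 and avoids any dependence on a word segmenter, which is the limitation the patent attributes to TextRank and TF-IDF.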

Embodiment 2

[0052] On the basis of Embodiment 1, the following steps are performed to construct the keyword data set in step S1:

[0053] S11: collect radio and television news data and take the keywords assigned by relevant professionals as candidate keywords. Then clean the candidates, removing meaningless and redundant keywords, to obtain the final keyword results. Cleaning includes: first, using entity recognition to identify entities in the news text and removing entity words from the candidate keywords; then removing keywords that are too long or too short, and keywords that do not appear in the original text. With meaningless and redundant keywords removed, the model can better learn the features of meaningful keywords during subsequent training, and the trained model can therefore extract meaningful keywords more reliably.
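
The cleaning rules in S11 can be sketched as a small filter. This is a minimal illustration under stated assumptions: the entity words are taken as already produced by some upstream entity recognizer, the length thresholds `min_len` and `max_len` are hypothetical values (the patent text does not give them), and treating exact duplicates as the "redundant" keywords is one possible reading.

```python
from typing import Iterable, List, Set


def clean_candidates(
    text: str,
    candidates: Iterable[str],
    entity_words: Set[str],
    min_len: int = 2,   # hypothetical lower bound on keyword length
    max_len: int = 10,  # hypothetical upper bound on keyword length
) -> List[str]:
    """Filter candidate keywords following the S11 cleaning rules."""
    cleaned: List[str] = []
    seen: Set[str] = set()
    for raw in candidates:
        kw = raw.strip()
        if kw in seen:
            continue  # redundant: already kept once
        if kw in entity_words:
            continue  # entity words are removed from the candidates
        if not (min_len <= len(kw) <= max_len):
            continue  # too short or too long
        if kw not in text:
            continue  # does not appear in the original text
        seen.add(kw)
        cleaned.append(kw)
    return cleaned


if __name__ == "__main__":
    news = ("The city council approved a new subway line "
            "to reduce traffic congestion.")
    candidates = [
        "subway line",
        "traffic congestion",
        "subway line",       # duplicate
        "city council",      # entity word
        "a",                 # too short
        "housing prices",    # not in the text
    ]
    print(clean_candidates(news, candidates, {"city council"}, max_len=20))
```

Passing the entity set in as an argument keeps the filter independent of any particular entity recognition tool.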

[0054] S12: after the radio and television news text data has been aggregated by sentence and paragraph, according to the final keyword resu...

Embodiment 3

[0056] On the basis of Embodiment 1, the keyword extraction model in step S2 consists of a text vectorization layer, a first keyword prediction layer, and a second keyword sequence labeling layer, connected in series in that order.

[0057] The text vectorization layer uses a pre-trained BERT layer to convert the input text sequence into a sequence of text vectors, one vector per character, where n is the total number of characters in the input text sequence. In this embodiment, BERT's strong language representation capability yields better character-level semantic embeddings.

[0058] The construction process of the first keyword prediction layer is as follows:

[0059] S21: for the forward LSTM, define the forget gate matrix and its bias parameter, the memory gate matrix and its bias parameter, the outp...
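
For orientation, the sketch below computes one time step of a textbook LSTM cell with forget, memory (input), and output gates. It is not taken from the patent, whose equations are cut off above. The parameter names (`W_f`, `b_f`, and so on), the concatenation of the previous hidden state with the current input, and the pure-Python list representation are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W @ v + b for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(
    x: Vector, h_prev: Vector, c_prev: Vector, params: Dict[str, list]
) -> Tuple[Vector, Vector]:
    """One forward LSTM step (standard formulation, hypothetical names)."""
    z = h_prev + x  # list concatenation: [h_{t-1}; x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]    # memory (input) gate
    g = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate cell state
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]    # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


if __name__ == "__main__":
    hidden, inp = 2, 3
    zeros = {k: [[0.0] * (hidden + inp) for _ in range(hidden)]
             for k in ("W_f", "W_i", "W_c", "W_o")}
    zeros.update({k: [0.0] * hidden for k in ("b_f", "b_i", "b_c", "b_o")})
    print(lstm_step([1.0, 2.0, 3.0], [0.1, -0.1], [1.0, -2.0], zeros))
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is half the previous one. A backward LSTM would run the same step over the reversed sequence.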

Abstract

The invention discloses a deep-learning-based method for automatically extracting keywords from radio and television news, comprising the following steps: S1, annotating keyword information in the radio and television news data to be analyzed and constructing a keyword data set; S2, building a keyword extraction model on a pre-trained model and training the constructed model with the keyword data set from step S1; and S3, using the keyword extraction model trained in step S2 to run prediction on input radio and television news and obtain a keyword extraction result. The method enables efficient keyword extraction from radio and television news content, so that media resources are organized and managed more accurately, management efficiency is improved, and retrieval, recommendation, and publishing services for users are better supported.

Description

Technical field

[0001] The present invention relates to the field of automatic indexing of radio and television news media assets, and more specifically to a method for automatically extracting keywords from radio and television news based on deep learning.

Background

[0002] In the era of converged media, the explosive growth of video data has brought huge challenges to the reuse of media resources, and the real-time requirements for program cataloging and indexing have risen accordingly. It is of great significance to use big data and artificial intelligence technology to automatically classify, identify, and index media content in multiple dimensions, to automatically extract content tags from the media resources themselves, and to improve cataloging quality and work efficiency. This is the supporting basis for the gradual transformation of media data management from traditional manual cataloging to automatic cataloging relying on an intelligent mana...

Application Information

Patent type and authority: Applications (China)

IPC(8): G06F40/295, G06N3/04, G06N3/08, G06F16/38, G06F16/31

CPC: G06F40/295, G06N3/08, G06F16/38, G06F16/31, G06N3/044, G06N3/045

Inventors: 温序铭, 朱婷婷, 杨瀚

Owner: 成都索贝视频云计算有限公司
