
Adaptive radio and television news keyword standardization method

This technology concerns radio and television news keywords and applies to special data processing applications, unstructured text data retrieval, text database indexing, and similar areas. It addresses the problem that extracted keywords cannot meet users' business needs, and it enables more accurate organization and management of media resources and improved management efficiency.

Active Publication Date: 2021-09-03
CHENGDU SOBEY DIGITAL TECH CO LTD
24 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

In media content tags, keywords are an important item in traditional catalogues and are closely related to the content. However, keywords extracted by AI technology cannot meet users' actual business needs.



Examples


Embodiment 1

[0031] Figure 1 shows the adaptive radio and television news keyword standardization method.

[0032] The adaptive radio and television news keyword standardization method includes:

[0033] Step A: standardize the candidate keywords based on the basic keyword library, and add words that cannot be standardized to a whitelist. When the number of whitelisted words reaches a set amount, analyze the words in the whitelist, extract representative words, and return them to the user so that the basic keyword library can be expanded.

[0034] In this embodiment, the basic keyword library may be a characteristic vocabulary database on the user side.
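
The loop described in Step A can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `KeywordStandardizer`, the similarity cutoff, the use of `difflib` character similarity in place of word vectors, and the frequency-based choice of "representative" words are all assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher


class KeywordStandardizer:
    """Toy Step A: standardize against a base library, whitelist the rest."""

    def __init__(self, base_library, sim_threshold=0.8, whitelist_limit=3):
        self.base_library = set(base_library)
        self.sim_threshold = sim_threshold      # assumed cutoff, not from the source
        self.whitelist_limit = whitelist_limit  # the "set amount" of whitelisted words
        self.whitelist = Counter()              # word -> number of times it was seen

    @staticmethod
    def _similarity(a, b):
        # Stand-in measure: character overlap ratio instead of word vectors.
        return SequenceMatcher(None, a, b).ratio()

    def standardize(self, candidate):
        """Return the matching library keyword, or None if the word was whitelisted."""
        if candidate in self.base_library:
            return candidate
        best = max(self.base_library,
                   key=lambda kw: self._similarity(candidate, kw),
                   default=None)
        if best is not None and self._similarity(candidate, best) >= self.sim_threshold:
            return best
        self.whitelist[candidate] += 1
        return None

    def pending_suggestions(self, top_k=2):
        """Once enough distinct words are whitelisted, propose the most frequent ones."""
        if len(self.whitelist) < self.whitelist_limit:
            return []
        return [word for word, _ in self.whitelist.most_common(top_k)]

    def accept(self, word):
        """The user confirms a suggested word, so it joins the base library."""
        self.base_library.add(word)
        self.whitelist.pop(word, None)
```

In this sketch the user stays in the loop: `pending_suggestions` only proposes words, and the library grows only when `accept` is called, which mirrors the source's statement that representative words are returned to the user for expanding the library.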

Embodiment 2

[0036] On the basis of Embodiment 1, the candidate keywords are obtained as follows: the trained keyword extraction model predicts on the input radio and television news to produce keyword extraction results, and blacklist filtering is applied to those results to form the candidate keywords.

[0037] In this embodiment, the keyword extraction model used to predict keywords for radio and television news may be the deep-learning-based keyword extraction model provided by the present invention or another existing keyword extraction model. If the deep-learning-based model provided by the present invention is adopted, it comprises a text vectorization layer, a first keyword prediction layer, and a second keyword sequence labeling layer connected in series. The text vectorization layer is mainly based on a pre-trained language model, which can obtain vocabulary that cannot be obtained by traditional models ...
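
The extraction-then-filter flow of this embodiment might look like the sketch below. The trained model is replaced by a hypothetical `toy_extractor` that simply returns capitalized tokens; the function names, the case-insensitive comparison, and the de-duplication step are assumptions for illustration and are not specified in the source.

```python
from typing import Callable, Iterable, List


def candidate_keywords(news_text: str,
                       extract: Callable[[str], Iterable[str]],
                       blacklist: Iterable[str]) -> List[str]:
    """Run an extractor on the news text, then drop blacklisted results."""
    blocked = {word.strip().lower() for word in blacklist}
    seen = set()
    candidates = []
    for keyword in extract(news_text):
        cleaned = keyword.strip()
        key = cleaned.lower()
        # Skip empty strings, blacklisted words, and repeats (assumed behavior).
        if not cleaned or key in blocked or key in seen:
            continue
        seen.add(key)
        candidates.append(cleaned)
    return candidates


def toy_extractor(text: str) -> List[str]:
    """Hypothetical stand-in for the trained model: keep capitalized tokens."""
    return [tok.strip(".,;:!?") for tok in text.split() if tok[:1].isupper()]


if __name__ == "__main__":
    text = "Reporter said Chengdu hosts the Universiade, and Chengdu is ready."
    print(candidate_keywords(text, toy_extractor, blacklist=["reporter"]))
```

Because the extractor is passed in as a callable, any existing keyword extraction model could be substituted without changing the filtering step.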

Embodiment 3

[0043] On the basis of Embodiment 1, standardizing the candidate keywords based on the basic keyword library includes:

[0044] Step A1: obtain multiple news text corpora and build the training samples used to train the FastText word vector model. In this embodiment, tools such as web crawlers can be used to obtain the news text corpora. The FastText word vector model represents words with character-level n-grams. This technique yields better word vectors for low-frequency words and also allows the model to encode any word, including words that do not appear in the lexicon. Keywords in radio and television news share these characteristics: some keywords have low frequency, and keywords are not always single words but may be words, phrases, or multi-word expressions. The FastText word vector model is therefore well suited to word vectorization in the scenario of the present invention.
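
To make the character-level n-gram idea concrete, the sketch below decomposes a word into subword units in the FastText style, with `<` and `>` marking word boundaries. It only illustrates why an unseen word can still be related to known ones; it is not a trained model. The function names and the Jaccard overlap measure are assumptions for the example, and the n-gram range of 3 to 6 is an illustrative choice.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap of two words' n-gram sets (a crude similarity proxy)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(sorted(char_ngrams("tv", 3, 3)))
    # A word missing from the lexicon still shares subword units with a known one.
    print(ngram_overlap("broadcast", "broadcasting"))
    print(ngram_overlap("broadcast", "weather"))
```

A real FastText model sums learned vectors for these n-grams to form the word vector, which is how low-frequency keywords and out-of-lexicon phrases still receive usable representations.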

[0045]...



Abstract

The invention discloses an adaptive method for standardizing radio and television news keywords, including steps such as step A: standardize candidate keywords based on the basic keyword library, and add words that cannot be standardized to a whitelist; when the number of whitelisted words reaches a set amount, analyze the words in the whitelist, extract representative words, and return them to the user for expanding the basic keyword library. The invention uses intelligent technology to label the content of radio and television news automatically. While taking users' actual business needs into account, it adaptively standardizes keywords and expands the characteristic thesaurus, so that media resources can be organized and managed more accurately and management efficiency is improved.

Description

Technical field

[0001] The invention relates to the field of automatic indexing of radio and television news media assets and, more specifically, to an adaptive method for standardizing radio and television news keywords.

Background

[0002] In the era of converged media, the explosive growth of news video data has brought huge challenges to the reuse of media resources. Cataloging relevant news videos quickly, cheaply, and easily has become very important. At the same time, with the improvement of computing capabilities and the gradual maturity of vision and NLP algorithms, using big data and artificial intelligence technology to index video content automatically, improving cataloging quality and real-time performance, has become a trend. In this environment, the management of media data has gradually shifted from traditional manual cataloging to automatic cataloging relying on an intelligent man...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/31
CPC: G06F16/313
Inventor: 温序铭朱婷婷杨瀚严照宇陈智
Owner: CHENGDU SOBEY DIGITAL TECH CO LTD