Adaptive radio and television news keyword standardization method
The technology relates to radio and television news keywords and is applied in special data processing applications, unstructured text data retrieval, text database indexing, and related fields. It addresses the problem that extracted keywords cannot meet users' business needs, and it achieves accurate organization and management of media resources and improved management efficiency.
Examples
Embodiment 1
[0031] Figure 1 shows the adaptive radio and television news keyword standardization method.
[0032] The adaptive radio and television news keyword standardization method includes:
[0033] Step A: standardize the candidate keywords based on the basic keyword library, and add words that cannot be standardized to the whitelist. When the number of whitelisted words reaches the set amount, analyze the words in the whitelist, extract representative words, and return them to the user so that the basic keyword library can be expanded.
[0034] In this embodiment, the basic keyword library may be a characteristic vocabulary library on the user side.
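The flow of Step A can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `KeywordStandardizer`, the dictionary structure of the basic keyword library, the parameters `whitelist_limit` and `top_k`, and the choice of "most frequent" as the meaning of "representative" are all assumptions made for illustration, since the source does not specify how the whitelist is analyzed.

```python
from collections import Counter


class KeywordStandardizer:
    """Sketch of Step A: library lookup plus a whitelist for unmatched words."""

    def __init__(self, base_library, whitelist_limit=3, top_k=2):
        # Assumed structure: surface form -> standard keyword.
        self.base_library = dict(base_library)
        # Whitelist of words that could not be standardized, with counts.
        self.whitelist = Counter()
        # The "set amount" from Step A (hypothetical default value).
        self.whitelist_limit = whitelist_limit
        # How many representative words to return (hypothetical).
        self.top_k = top_k

    def standardize(self, candidates):
        """Return (standardized keywords, representative words or None)."""
        standardized = []
        for word in candidates:
            if word in self.base_library:
                standardized.append(self.base_library[word])
            else:
                self.whitelist[word] += 1

        suggestions = None
        if len(self.whitelist) >= self.whitelist_limit:
            # Assumption: "representative" is taken to mean "most frequent".
            suggestions = [w for w, _ in self.whitelist.most_common(self.top_k)]
        return standardized, suggestions

    def expand(self, word, standard_form=None):
        """The user accepts a suggestion: add it to the library."""
        self.base_library[word] = standard_form or word
        self.whitelist.pop(word, None)
```

In this sketch the user closes the loop by calling `expand` on any returned suggestion, after which the same word is standardized on later calls instead of being whitelisted again.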
Embodiment 2
[0036] On the basis of Embodiment 1, the candidate keywords are obtained as follows: use the trained keyword extraction model to predict keywords for the input radio and television news, obtain the keyword extraction results, and apply blacklist filtering to those results to form the candidate keywords.
[0037] In this embodiment, the keyword extraction model used to predict keywords for radio and television news may be the deep-learning-based keyword extraction model provided by the present invention or another existing keyword extraction model. If the deep-learning-based model provided by the present invention is adopted, the model includes a text vectorization layer, a first keyword prediction layer, and a second keyword sequence labeling layer connected in series. The text vectorization layer is mainly based on a pre-trained language model, which can obtain vocabulary that cannot be obtained by traditional models ...
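The candidate keyword step of this embodiment (model prediction followed by blacklist filtering) might look like the following Python sketch. The function names `form_candidate_keywords` and `toy_extractor` are hypothetical. The extractor is a trivial stand-in for the trained model, not the three-layer model described above. Case-insensitive matching and duplicate removal are assumptions that the source does not state.

```python
def form_candidate_keywords(news_text, extract_fn, blacklist):
    """Run an extraction model, then drop blacklisted keywords.

    extract_fn is any callable mapping text to a list of keywords; it
    stands in for the trained keyword extraction model.
    """
    blocked = {w.strip().lower() for w in blacklist}
    candidates = []
    seen = set()
    for keyword in extract_fn(news_text):
        key = keyword.strip().lower()
        # Assumption: matching is case-insensitive and duplicates are dropped.
        if not key or key in blocked or key in seen:
            continue
        seen.add(key)
        candidates.append(keyword.strip())
    return candidates


def toy_extractor(text):
    # Hypothetical placeholder model: returns capitalized tokens only.
    return [tok.strip(".,") for tok in text.split() if tok[:1].isupper()]
```

For example, `form_candidate_keywords(text, toy_extractor, {"reporter"})` would pass the surviving keywords on to the standardization step of Embodiment 1.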
Embodiment 3
[0043] On the basis of Embodiment 1, standardizing the candidate keywords based on the basic keyword library includes:
[0044] Step A1: obtain a plurality of news text corpora and build the training samples used to train the FastText word vector model. In this embodiment, tools such as web crawlers can be used to obtain the news text corpora. The FastText word vector model represents words with character-level n-grams. This technique produces better word vectors for low-frequency words and also allows the model to encode any word, including words that do not appear in the lexicon. Keywords in radio and television news have these same characteristics: some keywords have low frequency, and keywords are not always single words but may be words, phrases, or multi-word expressions. The FastText word vector model is therefore well suited to word vectorization in the scenario of the present invention.
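The character-level n-gram idea behind this choice can be shown with a short pure-Python sketch. It does not call the FastText library or train any vectors. It only decomposes words into n-grams with the `<` and `>` boundary markers that FastText uses, and the helper names `char_ngrams` and `ngram_overlap` are hypothetical. The 3 to 6 length range is assumed here to match FastText's usual defaults.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        # Slide a window of length n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap of two n-gram sets: a rough proxy for how much
    subword information two words would share (illustrative only)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

Because a word absent from the lexicon still decomposes into n-grams, many of which also occur in known words, a model built on these units can assign it a vector. This is the property the paragraph above relies on for low-frequency and multi-word keywords.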
[0045]...
