Article topic keyword extraction method and apparatus based on low-rank matrix decomposition
A low-rank matrix decomposition and keyword extraction technology, applied to article topic keyword extraction, that can solve problems such as heavy workload.
Examples
Embodiment 1
[0069] This embodiment of the present invention provides a method for extracting article topic keywords based on low-rank matrix decomposition. As shown in Figure 1, the method includes the following steps:
[0070] Step S110: Perform data preprocessing (cleaning, word segmentation, and stop-word removal) on the text of the article to be processed, so as to obtain text suitable for subsequent event keyword extraction. The articles may be news, microblogs, blogs, comments, etc.
[0071] In the text preprocessing stage, the present invention mainly performs the following steps: remove URL links, emoticons, and invalid characters from the article text; since there are no spaces between Chinese words, segment the text into words before keyword extraction, for which the present invention uses HanLP, an open-source natural language processing toolkit that performs well; then remove stop words...
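The preprocessing in Step S110 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the regular expressions, the function names `clean`, `whitespace_segment`, and `preprocess`, and the bracketed emoticon format (for example `[微笑]`) are assumptions made for illustration. The patent uses HanLP for word segmentation; here the segmenter is a pluggable argument with a whitespace-splitting placeholder so that the sketch runs without external dependencies.

```python
import re
from typing import Callable, List, Set

# URL links, restricted to ASCII URL characters so that adjacent
# Chinese text (which has no separating spaces) is not swallowed.
URL_RE = re.compile(r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#@!$&'()*+,;=%]+")
# Bracketed emoticon codes such as [smile] or [微笑] (assumed format).
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")
# Anything that is neither a word character (letters, digits, CJK) nor whitespace.
INVALID_RE = re.compile(r"[^\w\s]")


def clean(text: str) -> str:
    """Remove URL links, emoticons, and invalid characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = INVALID_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def whitespace_segment(text: str) -> List[str]:
    """Placeholder segmenter; Chinese text needs a real one such as HanLP."""
    return text.split()


def preprocess(
    text: str,
    stop_words: Set[str],
    segment: Callable[[str], List[str]] = whitespace_segment,
) -> List[str]:
    """Clean the text, segment it into words, and drop stop words."""
    tokens = segment(clean(text))
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    sample = "Read this [smile] at http://example.com/a?b=1 !!! the news"
    print(preprocess(sample, {"the", "at", "this"}))
```

For Chinese articles, a segmenter backed by HanLP would be passed as the `segment` argument in place of the whitespace placeholder; the stop-word list is likewise supplied by the caller.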
Embodiment 2
[0097] This embodiment provides a device for extracting article topic keywords based on low-rank matrix decomposition. The specific structure of the device is shown in Figure 3 and includes:
[0098] The data preprocessing module 31 is used to perform data preprocessing on the article text to be processed, before the preprocessed text is used to train the tool that represents words as real-valued vectors; the data preprocessing includes cleaning, word segmentation, and removing stop words;
[0099] The word vectorized file generation module 32 is used to train the tool on the preprocessed article text so as to represent words as real-valued vectors and obtain a word vectorized file, which includes a plurality of word vectors; the word vectors cover both keywords and non-keywords;
[0100] The keyword matrix building module 33 is used to use the keyword extraction algorithm ...