Theme word extraction method, and method and device for obtaining related digital resource by using same
A subject word extraction technology for digital resources, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as interference from polysemous words and synonyms and poor robustness, and achieves enhanced robustness and improved accuracy.
Examples
Embodiment 1
[0048] This embodiment provides a method for extracting subject words from digital resources. The digital resources can be a single file or multiple files. The digital resources are selected in advance, and keywords are then extracted from the selected digital resources. The flow chart of the method is shown in Figure 1 and includes the following steps:
[0049] S11. Segment the text of the digital resource.
[0050] After the digital resources are selected, the set of selected digital resources is defined as D = {d_1, d_2, ..., d_m}, where d_i (i = 1, ..., m) represents the i-th news text, and m can be 1. The user dictionary is loaded to segment each news text. The user dictionary is a collection of words consisting of idioms, abbreviations, and new words. Its purpose is to add special terms from specific fields so that the tokenizer segments words more accurately. It is defined as userLib = {e_1 ...
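As an illustration of how a user dictionary can steer segmentation in step S11, here is a minimal sketch. The excerpt does not name the tokenizer, so forward maximum matching is an assumption chosen for simplicity. The function `segment`, the parameter `max_len`, and the sample dictionary entries are hypothetical.

```python
def segment(text, user_lib, max_len=6):
    """Forward maximum matching (an assumed technique, not the patent's):
    at each position take the longest user-dictionary entry that matches,
    otherwise emit a single character."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        # Try the longest window first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in user_lib:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


# Hypothetical userLib with two special terms:
# "数字资源" (digital resource) and "主题词" (subject word).
user_lib = {"数字资源", "主题词"}

# A one-document set D (m = 1); the text means
# "digital resource subject word extraction".
D = ["数字资源主题词提取"]

segmented = [segment(d, user_lib) for d in D]
print(segmented)
```

Without the dictionary entries, every character would be emitted on its own. With them, the two special terms survive as single tokens, which is the effect the paragraph attributes to the user dictionary.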
Embodiment 2
[0072] This embodiment provides a method for obtaining related digital resources, used to find, among massive digital resources, those related to a selected digital resource. First, a first digital resource is selected. The first digital resource can be a single article or multiple digital resources belonging to one topic. The purpose of this embodiment is to find other digital resources related to the first digital resource. The flow chart of the method is shown in Figure 2 and includes the following steps:
[0073] S21. Extract the subject words of the first digital resource using the method in Embodiment 1. After the first digital resource is selected, its subject words are extracted with the method in Embodiment 1, which is not repeated here. This yields the subject term vector of the first digital resource, topicWords = (tterm_1, tterm_2, ..., tterm_q), where tterm ...
Embodiment 3
[0099] This embodiment provides a topic generation method. Based on the files of interest that the user has read, it obtains files in the resource library that belong to the same topic and pushes these topics to the user to improve the user experience. The flow of the topic generation method is shown in Figure 3 and includes the following steps:
[0100] S31. Select a first digital resource. The digital resources that the user is interested in or follows can be selected, or some of the digital resources that the user has read. This step selects the reference information: the first digital resource serves as the reference for subsequent processing.
[0101] S32. Select one candidate digital resource at a time, in sequence, as the second digital resource. Each is taken from the candidate resource library as the second digital resource for subsequent processing.
[0102] S33. Use the method described in Embodime...
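The control flow of steps S31 to S33 can be sketched as a loop over the candidate library. The excerpt truncates both the extraction method of Embodiment 1 and the comparison used in S33, so the sketch substitutes stand-ins for each: a frequency-based `subject_term_vector` and a `cosine` comparison with a `threshold`. These names, the threshold value, and the sample data are all assumptions, not the patent's method.

```python
import math
from collections import Counter


def subject_term_vector(words, q=3):
    """Stand-in for Embodiment 1 (assumption): keep the q most frequent
    words of an already-segmented text, with their counts as weights."""
    return dict(Counter(words).most_common(q))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def related_resources(first, candidates, threshold=0.5):
    # S31: the first digital resource is the reference information.
    ref = subject_term_vector(first)
    related = []
    # S32: take each candidate in turn as the second digital resource.
    for name, words in candidates.items():
        # S33 is truncated in the source; a cosine comparison is assumed.
        if cosine(ref, subject_term_vector(words)) >= threshold:
            related.append(name)
    return related


# Hypothetical, already-segmented sample data.
first = ["topic", "word", "topic", "resource"]
candidates = {
    "a": ["topic", "resource", "topic"],
    "b": ["weather", "rain"],
}
print(related_resources(first, candidates))
```

Candidate "a" shares weighted terms with the reference and passes the assumed threshold, while "b" shares none and is skipped.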