Keyword extraction method and apparatus
A keyword extraction method and technology, applied in the video field, that addresses problems such as insufficient comprehensiveness and the inability to reflect the position information of words, and improves the accuracy rate.
Examples
Embodiment 1
[0027] Figure 1 is a technical flowchart of Embodiment 1 of the present invention. With reference to Figure 1, a keyword extraction method in the embodiment of the present invention mainly includes the following steps:
[0028] Step 110: use a tokenizer to segment the text into words, and filter the words to obtain candidate keywords;
[0029] In the embodiment of the present invention, the collected text is divided into separate words using an existing word segmenter, which also yields the part of speech of each word. The word segmenter may be one based on a dictionary matching algorithm, one based on thesaurus matching, one based on word frequency statistics, or one based on knowledge understanding, among others; the embodiments of the present invention do not limit this choice.
[0030] After the words are obtained by the tokenizer, they need further processing, such as filtering the word f...
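The filtering in Step 110 (described for module 310 in Embodiment 3 as filtering by part of speech and a preset blacklist) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the input is a list of (word, part-of-speech) pairs such as a tokenizer might return, and the tag whitelist KEEP_POS, the stop-word set BLACKLIST, and the function name filter_candidates are hypothetical; the source does not specify which tags or stop words are used.

```python
from typing import Iterable

# HYPOTHETICAL part-of-speech whitelist (jieba-style tags: nouns, proper
# nouns, verbs, verbal nouns); the source does not list the kept tags.
KEEP_POS = frozenset({"n", "nr", "ns", "nz", "v", "vn"})
# HYPOTHETICAL preset blacklist of stop words.
BLACKLIST = frozenset({"is", "be", "have"})


def filter_candidates(
    tagged_words: Iterable[tuple[str, str]],
    keep_pos: frozenset = KEEP_POS,
    blacklist: frozenset = BLACKLIST,
) -> list[str]:
    """Return candidate keywords in first-occurrence order."""
    candidates: list[str] = []
    seen: set[str] = set()
    for word, pos in tagged_words:
        if pos not in keep_pos:   # drop words whose part of speech is not kept
            continue
        if word in blacklist:     # drop stop words listed in the blacklist
            continue
        if word in seen:          # keep each candidate keyword only once
            continue
        seen.add(word)
        candidates.append(word)
    return candidates


if __name__ == "__main__":
    tagged = [("keyword", "n"), ("extraction", "vn"), ("the", "u"),
              ("method", "n"), ("is", "v"), ("keyword", "n")]
    print(filter_candidates(tagged))  # ['keyword', 'extraction', 'method']
```

Deduplicating the candidates suits a similarity-based graph, where each keyword is one node; a co-occurrence-based variant would need to keep the word sequence instead.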
Embodiment 2
[0047] Figure 2 is the technical flowchart of Embodiment 2 of the present invention. With reference to Figure 2, a keyword extraction method in the embodiment of the present invention can be further refined into the following steps:
[0048] Step 210: use the tokenizer to segment the text to obtain each word and its part of speech;
[0049] In the embodiment of the present invention, the text may be segmented into words using any of the following existing word segmentation methods, or a combination of several of them.
[0050] The word segmenter based on a dictionary matching algorithm performs segmentation using dictionary matching, Chinese morphology, or other Chinese language knowledge, for example the maximum matching method or the minimum word segmentation method. The word segmenter based on thesaurus matching relies on statistical information about characters and words, such as the information between adjacent characters, word frequency and correspond...
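The maximum matching method named above can be illustrated with a forward maximum matching sketch. It assumes a plain set of dictionary words and uses the hypothetical function name forward_max_match; characters not covered by the dictionary fall back to single-character words. It covers only the dictionary-matching family, not the statistics-based segmenters, and it does not produce part-of-speech tags.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    # Longest window worth trying is the longest dictionary entry.
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


if __name__ == "__main__":
    vocab = {"关键词", "关键", "提取", "方法"}
    print(forward_max_match("关键词提取方法", vocab))  # ['关键词', '提取', '方法']
```

Because it always prefers the longest match, "关键词" wins over the shorter entry "关键"; this greedy choice is what distinguishes maximum matching from the minimum word segmentation method.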
Embodiment 3
[0086] Figure 3 is the technical flowchart of Embodiment 3 of the present invention. With reference to Figure 3, a keyword extraction device of the present invention mainly includes a candidate keyword acquisition module 310, a similarity calculation module 320, an inverse document frequency calculation module 330, and a keyword extraction module 340.
[0087] The candidate keyword acquisition module 310 is used to segment the text with a tokenizer to obtain each word and its part of speech, and to perform stop word filtering on the words according to the part of speech and a preset blacklist to obtain candidate keywords;
[0088] The similarity calculation module 320 is used to calculate the similarity between any two candidate keywords;
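The source does not say how the similarity between two candidate keywords is measured. One common choice, assumed here purely for illustration, is the cosine similarity of word embedding vectors (for example, vectors trained with word2vec). In the sketch below the vectors are supplied by the caller, the names cosine and similarity_matrix are hypothetical, negative similarities are clipped to zero so they can serve as edge weights, and the diagonal stays at zero because a keyword is not linked to itself.

```python
import math
from typing import Mapping, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(
    candidates: Sequence[str],
    vectors: Mapping[str, Sequence[float]],
) -> list[list[float]]:
    """Symmetric pairwise similarity of candidates (ASSUMED: embedding cosine)."""
    n = len(candidates)
    sim = [[0.0] * n for _ in range(n)]  # diagonal stays 0: no self-links
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors[candidates[i]], vectors[candidates[j]])
            # Clip negatives so every entry is a usable non-negative edge weight.
            sim[i][j] = sim[j][i] = max(0.0, s)
    return sim


if __name__ == "__main__":
    vecs = {"video": [1.0, 0.0], "clip": [1.0, 0.0], "price": [0.0, 1.0]}
    print(similarity_matrix(["video", "clip", "price"], vecs))
    # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```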
[0089] The inverse document frequency calculation module 330 is configured to iteratively calculate the weight of each candidate keyword with the TextRank formula according to the similarity, and to calculate the inverse document frequency...
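The iterative TextRank weighting driven by similarity can be sketched with the weighted TextRank recurrence WS(V_i) = (1 - d) + d * sum over j of (w_ji / sum_k w_jk) * WS(V_j), where w_ji is the similarity between candidates j and i and d is the damping factor (0.85 in the original TextRank paper). The text is cut off before it explains how the inverse document frequency is computed and combined with the weight, so the smoothed IDF formula log((1 + N) / (1 + df)) and the product used in rank_keywords are assumptions, not the patented method. The names textrank, idf, and rank_keywords are hypothetical.

```python
import math
from typing import Sequence


def textrank(sim: Sequence[Sequence[float]], d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> list[float]:
    """Weighted TextRank over a similarity matrix with a zero diagonal."""
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]  # sum_k w_jk for each node j
    scores = [1.0] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to w_ji.
            incoming = sum(sim[j][i] / out_weight[j] * scores[j]
                           for j in range(n) if j != i and out_weight[j] > 0)
            new_scores.append((1 - d) + d * incoming)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once the weights no longer change noticeably
            break
    return scores


def idf(word: str, documents: Sequence[set[str]]) -> float:
    """ASSUMED smoothed IDF: log((1 + N) / (1 + df))."""
    df = sum(1 for doc in documents if word in doc)
    return math.log((1 + len(documents)) / (1 + df))


def rank_keywords(candidates: Sequence[str], sim: Sequence[Sequence[float]],
                  documents: Sequence[set[str]], top_k: int = 5) -> list[str]:
    """HYPOTHETICAL combination: score = TextRank weight * IDF."""
    weights = textrank(sim)
    scored = [(w * idf(word, documents), word)
              for word, w in zip(candidates, weights)]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_k]]


if __name__ == "__main__":
    sim = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    docs = [{"video"}, {"editing"}, {"clip"}, {"music"}]
    print(textrank(sim))
    print(rank_keywords(["video", "editing", "clip"], sim, docs))
```

In the usage example "video" is similar to both other candidates, so it collects the most weight and ranks first; with the assumed smoothed IDF, a word that appears in every document gets an IDF of 0 and drops to the bottom regardless of its TextRank weight.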