Multiple-document automatic abstracting method based on frequent itemset
A frequent-itemset-based automatic summarization technology, applied in the field of data processing. It addresses problems such as inconsistent contributions, phrase shifting, and low similarity scores for sentences with similar semantics, and aims to produce abstracts that are clear, practical, and concise.
Specific Embodiment
[0051] A multi-document collection covering 20 topics is selected from the Sogou dataset; together, these 20 topics contain about 250 Chinese documents. The word segmentation algorithm of the Chinese Academy of Sciences is used to segment the documents into words, stop words are removed according to a stop word list, and the Apriori association algorithm is used to mine frequent itemsets. Multi-document abstracts are generated at compression ratios of 10%, 20%, and 30%.
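The frequent itemset mining step named above can be illustrated with a minimal Apriori sketch. It assumes each sentence, after word segmentation and stop word removal, is treated as one transaction of words; the function name `apriori`, the `min_support` threshold, and the sample data are illustrative assumptions, not details taken from the patent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset that appears
    in at least min_support transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Assumed toy input: each inner list is the word set of one sentence.
sentences = [
    ["search", "engine", "index"],
    ["search", "engine", "query"],
    ["search", "index"],
    ["engine", "query"],
]
frequent = apriori(sentences, min_support=2)
```

In this toy input, `{"search", "engine"}` is frequent because it occurs in two sentences, while `{"engine", "index"}` occurs in only one and is discarded. A real run would use the segmented Sogou documents and a support threshold chosen for that corpus.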
[0052] 1. Preprocessing: sentence segmentation, word segmentation, and stop word removal
[0053] In order to cluster all the document sentences in the multi-document collection, the documents must first be segmented into sentences. Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the multi-document collection, where $d_i$ represents a single document. Use regular expressions to match the end of the sentence, and divide the mul...
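A minimal sketch of regular-expression sentence segmentation over a collection D is given here for illustration. The set of sentence-ending punctuation marks and the names `SENTENCE`, `split_sentences`, and `split_collection` are assumptions; the source text is truncated before it specifies the actual pattern.

```python
import re

# Assumed sentence terminators: Chinese full stop, exclamation mark, and
# question mark, plus ASCII "!" and "?". The ASCII period is left out so
# that decimal numbers are not split.
SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(document):
    """Split one document d_i into sentences, keeping end punctuation."""
    pieces = (m.group().strip() for m in SENTENCE.finditer(document))
    return [p for p in pieces if p]


def split_collection(documents):
    """Split every document in D and record which document each
    sentence came from, as (document_index, sentence) pairs."""
    return [(i, s)
            for i, doc in enumerate(documents)
            for s in split_sentences(doc)]


# Assumed toy collection D with two short documents.
D = [
    "今天天气很好。我们去公园吧！",
    "你来吗？没有标点的结尾",
]
sentence_pairs = split_collection(D)
```

Keeping the document index alongside each sentence is a design choice of this sketch; it makes it possible to trace a sentence back to its source document in later steps.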