Network text segmenting method based on genetic algorithm
A text segmentation technique based on a genetic algorithm, applied in the field of network text segmentation. It addresses problems such as reduced accuracy of similarity calculation, the inability to provide word frequency information, and reduced accuracy of text segmentation results.
Embodiment Construction
[0026] With reference to the accompanying drawings, this embodiment targets texts on the theme of "Beijing Olympics". The language usage in these texts is standardized and the texts are relatively short. The specific steps of text segmentation are as follows:
[0027] The first step is to set the search theme of the web spider to vocabulary related to the Olympic Games and to use the web spider to collect web pages from the Internet. The Olympic theme vocabulary is determined in three steps: 1) manually select a number of texts that represent the search theme, usually 10 to 20; 2) count the word frequency of nouns and verbs in these texts and select the high-frequency words as the candidate theme vocabulary set, with the word frequency threshold set to 30; 3) from the candidate theme vocabulary set, manually select 10 to 15 words as the theme vocabulary.
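The frequency-counting part of step 2) can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: it assumes the seed texts have already been tokenized and part-of-speech tagged into `(word, tag)` pairs, and the tag names `NOUN` and `VERB`, the function name `candidate_theme_words`, and the reading of the threshold as "frequency of at least 30" are all assumptions, since the source names neither a tagger nor the exact comparison.

```python
from collections import Counter

# Hypothetical tag names; the source does not specify a tagger or tag set.
CONTENT_TAGS = {"NOUN", "VERB"}


def candidate_theme_words(tagged_texts, threshold=30):
    """Return nouns/verbs whose total frequency across the seed texts
    reaches the threshold, most frequent first (ties broken alphabetically)."""
    counts = Counter(
        word
        for text in tagged_texts      # each text is a list of (word, tag) pairs
        for word, tag in text
        if tag in CONTENT_TAGS        # only nouns and verbs are counted
    )
    # Assumption: "threshold set to 30" is read as frequency >= 30.
    kept = [word for word, count in counts.items() if count >= threshold]
    return sorted(kept, key=lambda word: (-counts[word], word))
```

The returned list corresponds to the candidate theme vocabulary set; the final choice of 10 to 15 theme words in step 3) remains a manual selection and is not modeled here.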
[0028] Web pages are all HTML documents, and it is necessary to perform text prepro...