Enhanced distributed large-scale data dimension extraction method for unstructured text data
A technology for large-scale data and text data, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses problems such as the difficulty of handling unstructured text data and the inability to build dimensions from it.
Examples
Embodiment 1
[0059] An enhanced distributed large-scale data dimension extraction method for unstructured text data, including:
[0060] Step 1: Text word segmentation: Segment the input text and compute the mutual information value between the smallest semantic units. Set a first threshold through training and compare it with the mutual information value between the smallest semantic units. When the mutual information value is greater than or equal to the first threshold, a word segmentation result is obtained;
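As an illustration of Step 1, the following is a minimal sketch, not the patented implementation. It assumes that the smallest semantic units are single characters, that "mutual information" means pointwise mutual information (PMI) estimated from adjacent-pair counts in a corpus, and that adjacent units are merged into one word when their PMI reaches the threshold. The names `pmi_table` and `segment` are hypothetical, and the threshold is passed in as a fixed value rather than obtained through training.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """Estimate PMI (in bits) for every adjacent unit pair seen in the corpus.

    corpus: list of sequences, each a list of smallest semantic units
    (assumed here to be single characters).
    """
    unigrams = Counter()
    bigrams = Counter()
    for units in corpus:
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment(units, table, threshold):
    """Merge adjacent units whose PMI >= threshold; otherwise start a new word."""
    if not units:
        return []
    words = [units[0]]
    for prev, cur in zip(units, units[1:]):
        # Pairs never seen in the corpus are treated as a word boundary.
        if table.get((prev, cur), float("-inf")) >= threshold:
            words[-1] += cur
        else:
            words.append(cur)
    return words


# Toy corpus (illustrative data only) in which "ab" and "cd" recur as units.
corpus = [list(s) for s in ["abcd", "cdab", "abab", "cdcd"]]
table = pmi_table(corpus)
print(segment(list("abcdab"), table, threshold=1.0))
```

In this toy corpus the pairs inside "ab" and "cd" have a higher PMI than the pairs that straddle them, so a threshold between the two values separates the words. In practice the threshold would come from training, as the step describes.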
[0061] Step 2: Word frequency statistics: According to the word segmentation results, perform word frequency statistics on the input text, and establish a corresponding word frequency relationship table;
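Step 2 can be sketched in the same hedged way. The snippet below assumes the "word frequency relationship table" is a list of rows holding each word, its raw count, and its relative frequency. The relative-frequency column and the function name `word_frequency_table` are assumptions made for illustration and are not stated in the source.

```python
from collections import Counter


def word_frequency_table(words):
    """Build rows of (word, count, relative frequency), most frequent first.

    words: the segmentation result for the input text, as a list of strings.
    """
    counts = Counter(words)
    total = sum(counts.values())
    return [(word, count, count / total) for word, count in counts.most_common()]


# Example using a small, made-up segmentation result.
for row in word_frequency_table(["ab", "cd", "ab"]):
    print(row)
```

Sorting by count keeps the most frequent words at the top of the table, which is convenient if later steps only inspect the leading rows.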
[0062] Step 3: Input text topic extraction: According to the target field of interest for the extraction, determine the set of topic words in the target field, and determine the stability of the topic words in the input text when they appear together with all the words...