Chinese text parallel data mining method based on hierarchy
A text and Chinese technology, applied in the field of information processing, can solve problems such as low mining efficiency and large amount of original data, and achieve the effects of improving calculation speed, high word segmentation accuracy, and high efficiency
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
Embodiment Construction
[0013] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.
[0014] A layer-based Chinese text parallel data mining method is characterized in that it comprises the following steps:
[0015] Step 1: Establishment of the Chinese text vector space model: By segmenting the entire Chinese text collection, the word segmentation form of each text and the feature entry set containing all deduplicated entries in the text set are obtained, and then the feature entry set is used to count each The term frequency inverse document frequency (TFIDF) of the text, and the text vector space model is established according to the term frequency inverse document frequency (TFIDF).
[0016] Definition of Term Frequency Inverse Document Frequency (TFIDF): It refers to an index that an entry represents the amount of text information that contains the entry. Its calculation formula is: TFIDF ij =TF ij *IDF i
[0017] TF i...
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com