Text mining method for a semi-structured document set
A structured-document text mining technology, applied in the fields of instruments, computing, and electrical digital data processing. It addresses two problems of existing approaches: document structure is not fully used to improve text mining results, and no mathematical model is formed. The aim is to improve mining results and to make the method widely applicable.
Embodiment Construction
[0015] The present invention is further described below with reference to the accompanying drawings. Some term-entry documents from the terminology database of the China Encyclopedia were selected as example data; each term-entry document is a semi-structured XML document.
[0016] First, as shown in Figure 1, the document is read in and its structure is analyzed, as shown in Figure 2. For each node of the document, determine whether it already exists in the structure tree. If the node information does not exist in the structure tree, add it to the structure tree and assign the node a unique identification number, as shown in Figure 3.
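The structure-analysis step in [0016] can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the excerpt does not say how two nodes are judged to be "the same", so the sketch assumes a node is identified by its path of tag names from the root. The function name `build_structure_tree` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_structure_tree(xml_texts):
    """Merge the node structure of several XML documents.

    Returns a dict mapping each distinct node path to a unique ID.
    Assumption: a node "already exists" when its tag path was seen before.
    """
    structure = {}  # node path -> unique identification number

    def visit(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        if path not in structure:
            # New node: register it and assign the next free ID.
            structure[path] = len(structure) + 1
        for child in elem:
            visit(child, path)

    for text in xml_texts:
        visit(ET.fromstring(text), "")
    return structure


# Hypothetical term-entry documents with slightly different structures.
docs = [
    "<entry><title>Text mining</title><body><def>A definition</def></body></entry>",
    "<entry><title>XML</title><body><def>A definition</def><see>SGML</see></body></entry>",
]
tree = build_structure_tree(docs)
for path, node_id in tree.items():
    print(node_id, path)
```

The second document adds only one new path (`/entry/body/see`), so only that node receives a new identification number; every other node is recognized as already present in the structure tree.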
[0017] Second, if the node currently being analyzed contains child nodes, continue with its first child node, descending until a node without child nodes (a data node) is reached; if the current node is a data node, perform word segmentation on the text field of the data node, and according...
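The traversal described in [0017] can be sketched as a depth-first walk that descends through child nodes and segments the text of each data node (a node without children). This is a rough sketch under stated assumptions: the excerpt is cut off after the segmentation step, so the code stops there and simply returns the words found at each data node. The `segment` function is a stand-in regular-expression tokenizer for illustration only; term entries in Chinese would need a proper Chinese word segmenter. All names and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def segment(text):
    # Stand-in tokenizer (assumption): lowercase runs of word characters.
    # Not the segmentation method of the source, which is not specified.
    return re.findall(r"\w+", text.lower())


def collect_data_nodes(elem, path=""):
    """Depth-first walk; returns a list of (node path, words) per data node."""
    path = f"{path}/{elem.tag}"
    children = list(elem)
    if children:
        # Non-data node: keep descending, first child first.
        result = []
        for child in children:
            result.extend(collect_data_nodes(child, path))
        return result
    # Data node: segment its text field.
    return [(path, segment(elem.text or ""))]


doc = ET.fromstring(
    "<entry><title>Text Mining</title>"
    "<body><def>Finding patterns in text</def></body></entry>"
)
nodes = collect_data_nodes(doc)
for node_path, words in nodes:
    print(node_path, words)
```

Keeping the node path next to each word list preserves which part of the document structure a word came from, which a later step could use if needed.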