
DINFO-OEC text analysis mining method and device thereof

A text mining method and device, applied in the field of text mining, that addresses the failure of existing approaches to consider the needs of business personnel, and achieves the effects of reducing business maintenance investment, improving maintainability, and improving mining accuracy.

Active Publication Date: 2015-11-04
ZHONGKE DINGFU BEIJING TECH DEV
6 Cites, 21 Cited by

AI Technical Summary

Problems solved by technology

[0008] However, existing technologies generally apply statistical methods to text mining and provide only mining algorithms, without considering the needs of business personnel, which causes them considerable difficulty.
The problem facing text mining technology is how to analyze and mine the valuable information that users care about from a single text or a large volume of unstructured text, so that business personnel can define mining requirements and mining rules from a business perspective, without having to deal with the linguistic ambiguity caused by diverse habits of language expression.



Examples


Embodiment Construction

[0022] To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0023] Figure 1 is a schematic diagram of the DINFO-OEC unstructured text big data analysis and mining method. As shown in Figure 1, the input of the DINFO-OEC text analysis and mining method 100 is unstructured text 111. Here, "unstructured text" refers to a text fragment, which may be a single sentence such as "I want to apply for a card" or an entire article. It also covers text written in various languages, such as Simplified Chinese and English.

[0024] Step S120 preprocesses the unstructured text 111, including sentence segmentation, word segmentation, and part-of-speech tagging. Sentence segmentation breaks the text 111 at periods and divides it into ...
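
The preprocessing in step S120 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `split_sentences`, `tokenize`, and `tag`, the set of sentence terminators, and the tiny `POS_LEXICON` are all assumptions made for illustration. In particular, the regex tokenizer only works for space-delimited languages; Chinese word segmentation would need a dictionary or a trained model.

```python
import re

# Hypothetical toy lexicon for part-of-speech lookup (illustration only).
POS_LEXICON = {
    "i": "PRON", "want": "VERB", "to": "PART", "apply": "VERB",
    "for": "ADP", "a": "DET", "card": "NOUN",
}

def split_sentences(text):
    """Break text after a period or similar terminator (ASCII or Chinese)."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag(tokens):
    """Attach a part-of-speech tag to each token; unknown words get 'UNK'."""
    return [(tok, POS_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]

if __name__ == "__main__":
    text = "I want to apply for a card. How long does it take?"
    for sentence in split_sentences(text):
        print(tag(tokenize(sentence)))
```

Each stage consumes the output of the previous one, so a production tokenizer or tagger could replace the toy versions without changing the surrounding code.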



Abstract

The invention provides a concept-based method and device for analyzing and mining unstructured text big data. The method comprises the following steps: (1) pre-processing, including word segmentation and named entity recognition; (2) performing concept extraction and concept expression identification on the input text; (3) analyzing and mining the concept expressions of the input text according to mining rules; (4) calculating confidence levels for the mining results; (5) outputting the mining results according to the confidence levels; and (6) displaying the mining results visually. The mining model of the method comprises three trees: an ontology tree, an element tree, and a concept tree. The device comprises a modelling unit (1), a pre-processing unit (2), a concept extraction and expression identification unit (3), an analysis and mining unit (4), and a visual display unit (5). The method and device have the following advantages: the diversity of services is separated from the diversity of natural language expressions during modelling, which lowers the investment in service maintenance, and the mining method can greatly improve the accuracy of analysis and mining.
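
The separation of business rules from language expressions described in the abstract can be illustrated with a small Python sketch. Everything here is hypothetical: the `CONCEPTS` table, the `Rule` class, and the functions `extract_concepts` and `mine` are invented for illustration, and the confidence formula (the fraction of a rule's required concepts found in the text) is an assumption, since the abstract does not say how confidence is calculated. The ontology, element, and concept trees are reduced to a flat dictionary.

```python
from dataclasses import dataclass

# Hypothetical concept table: concept name -> surface expressions.
# Language variation is maintained here, apart from the business rules.
CONCEPTS = {
    "APPLY": ["apply for", "sign up for"],
    "CARD": ["card"],
}

@dataclass
class Rule:
    label: str       # business category produced by the rule
    required: list   # concept names the rule refers to (not raw words)

# Hypothetical business rule, written only in terms of concepts.
RULES = [Rule("card_application", ["APPLY", "CARD"])]

def extract_concepts(text):
    """Return the set of concepts whose expressions occur in the text."""
    lowered = text.lower()
    return {name for name, exprs in CONCEPTS.items()
            if any(expr in lowered for expr in exprs)}

def mine(text, threshold=0.5):
    """Apply every rule and keep results whose confidence meets the threshold.

    Confidence is ASSUMED to be the share of required concepts that were found.
    """
    found = extract_concepts(text)
    results = []
    for rule in RULES:
        hits = sum(1 for concept in rule.required if concept in found)
        confidence = hits / len(rule.required)
        if confidence >= threshold:
            results.append((rule.label, confidence))
    # Output the results ordered by confidence, highest first.
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    print(mine("I want to apply for a card"))
```

In this arrangement, adding a new way of saying "apply" only touches `CONCEPTS`, and changing a business category only touches `RULES`, which is the kind of maintenance saving the abstract claims.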

Description

Technical field

[0001] The invention relates to the technical field of text mining, and in particular to a concept-based DINFO-OEC mining method and DINFO-OEC mining device.

Background art

[0002] 80% of social big data is unstructured data, and processing unstructured big data is the biggest challenge that big data faces. Structured data analysis cannot fully mine and discover the semantics in big data.

[0003] The challenges of unstructured text mining are:

[0004] Maintenance challenges brought about by language diversity: texts contain many different language expressions, and irregular usages such as abbreviations and shorthand are common.

[0005] Maintenance challenges brought about by numerous business categories and fast-changing rules: there are many business categories, and they change quickly. Every time a category changes, the language rules of all related categories must be reorganized. The maintenance workload is huge and the maintenance efficie...


Application Information

IPC(8): G06F17/30
Inventor: not disclosed (the inventor requested that the name not be published)
Owner ZHONGKE DINGFU BEIJING TECH DEV