Text processing and search system based on big data platform

Big data platform and text processing technology, applied in electronic digital data processing, special-purpose data processing applications, unstructured text data retrieval, etc. Effect: improved efficiency.

Status: Inactive. Publication date: 2017-04-26
NO 32 RES INST OF CHINA ELECTRONICS TECH GRP
Cites: 5; Cited by: 23

AI Technical Summary

Problems solved by technology

The patent "A Semantic-Based Big Data Analysis Business Intelligence Service System" (Chinese Patent Publication No. CN104182389A, published 2014-12-03) discloses a semantic-based big data analysis business intelligence service system, which realizes the Internet-rich Ac...




Embodiment Construction

[0040] Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, to explain the technical solution of the present invention.

[0041] As shown in Figure 1, the text processing and retrieval system 101 based on the big data platform provided by the embodiment of the present invention includes a text extraction module 102, a text segmentation module 103, an index establishment module 104, an entity recognition module 105, a keyword extraction module 106, an automatic summary module 107, a text clustering module 108, an automatic classification module 109, a service interface module 110, a semantic tagging module 111, and a shared retrieval module 112 based on distributed memory.
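The paragraph lists the modules but not how data moves between them. One way to picture it is a chain of stages that each take and return a shared document record. This is a minimal sketch under that assumption only: `Document`, `segment`, `extract_keywords`, and `run_pipeline` are hypothetical names, and the two stages are toy stand-ins for modules 103 and 106, not the algorithms the patent uses.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical record handed from one processing module to the next."""
    raw: str
    tokens: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


# Each module is modelled as a function Document -> Document (an assumption).
Stage = Callable[[Document], Document]


def segment(doc: Document) -> Document:
    """Toy stand-in for text segmentation module 103.

    Splits on runs of word characters; Chinese text would need a real word segmenter.
    """
    doc.tokens = re.findall(r"\w+", doc.raw.lower())
    return doc


def extract_keywords(doc: Document, top_k: int = 3) -> Document:
    """Toy stand-in for keyword extraction module 106: most frequent tokens longer than 3 chars."""
    counts = Counter(t for t in doc.tokens if len(t) > 3)
    # Ties keep first-seen order, as documented for Counter.most_common.
    doc.keywords = [word for word, _ in counts.most_common(top_k)]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each module in order, feeding each one's output to the next."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

For example, `run_pipeline(Document("Hadoop stores text; Hadoop indexes text"), [segment, extract_keywords]).keywords` gives `['hadoop', 'text', 'stores']`. Keeping every stage on the same signature makes it straightforward to insert, remove, or reorder modules such as entity recognition or summarization.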

[0042] The text extraction module 102 receives an external text file and first determines whether the file is damaged. If it is, no follow-up text processing is performed; otherwise, the module identifies the file format and carries out the corresponding text extr...
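The damage check followed by format dispatch can be sketched as below. This is a minimal illustration under assumptions not stated in the patent: formats are identified by leading magic bytes, integrity checks cover only ZIP-based Office files (member CRCs) and PDFs (trailing `%%EOF` marker), and plain text is decoded by trying UTF-8 and then GB18030. `detect_format`, `is_damaged`, and `extract_text` are hypothetical names; the source is truncated before it names its actual extraction tools.

```python
import io
import zipfile
import zlib
from typing import Optional

# Leading "magic" bytes of a few common container formats (assumed subset).
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # DOCX, XLSX, ODT, ... are ZIP containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc / .xls
]


def detect_format(data: bytes) -> str:
    """Identify the container format from its leading bytes; default to plain text."""
    for prefix, name in MAGIC:
        if data.startswith(prefix):
            return name
    return "text"


def is_damaged(data: bytes) -> bool:
    """Cheap integrity checks; a real module would need full per-format validation."""
    if not data:
        return True
    fmt = detect_format(data)
    if fmt == "zip":
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                return zf.testzip() is not None  # CRC check of every member
        except (zipfile.BadZipFile, zlib.error, EOFError):
            return True
    if fmt == "pdf":
        # A complete PDF ends with an %%EOF marker (possibly followed by whitespace).
        return b"%%EOF" not in data[-1024:]
    return False


def extract_text(data: bytes) -> Optional[str]:
    """Damage check first, then format detection, then format-specific extraction."""
    if is_damaged(data):
        return None  # damaged file: skip all follow-up processing
    fmt = detect_format(data)
    if fmt == "text":
        # Try the encodings the corpus is assumed to use, in order.
        for encoding in ("utf-8", "gb18030"):
            try:
                return data.decode(encoding)
            except UnicodeDecodeError:
                continue
        return None
    # PDF and Office extraction would call a format-specific parser here.
    raise NotImplementedError(f"no extractor sketched for format {fmt!r}")
```

Returning `None` for a damaged file mirrors the rule that such files receive no further processing. Trying UTF-8 before GB18030 works because GB18030-encoded Chinese text almost never forms valid UTF-8.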



Abstract

The invention discloses a text processing and search system based on a big data platform. The system comprises a Hadoop-based text processing portion and a Hadoop-based distributed search portion. The text processing portion comprises a text extraction module and other modules; the distributed search portion comprises a semantic annotation module and a search module based on distributed memory sharing. The system can process text data in different formats and different character encodings. It performs comprehensive processing on the text, including content extraction, word segmentation, index building, entity recognition, keyword extraction, automatic summarization, text clustering, and automatic classification, so as to fully mine the information and value contained in the text data. Processing results can be published through a service interface, which improves the interactivity and extensibility of the system. Full-text search based on distributed memory sharing improves the efficiency of searching the processed text.
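The abstract credits faster full-text search to distributed memory sharing but does not describe the index layout. The following single-process sketch shows one common pattern under stated assumptions. Documents are hash-partitioned by ID across in-memory shards, each standing in for one node's memory. Queries fan out to every shard in parallel, a match must contain all query terms, and results are ranked by summed term frequency. `Shard`, `ShardedIndex`, and the scoring rule are hypothetical and not taken from the patent; a real deployment would place shards on separate machines.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a stand-in for the segmentation module."""
    return re.findall(r"\w+", text.lower())


class Shard:
    """One in-memory partition of the inverted index (stands in for one node's memory)."""

    def __init__(self) -> None:
        # term -> {doc_id: term frequency}
        self.postings: Dict[str, Dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, terms: List[str]) -> Dict[int, int]:
        """Return {doc_id: score} for documents containing every term (AND semantics)."""
        if not terms:
            return {}
        lists = [self.postings.get(t, {}) for t in terms]
        common = set(lists[0]).intersection(*lists[1:])
        return {d: sum(pl[d] for pl in lists) for d in common}


class ShardedIndex:
    """Hash-partitions documents across shards and fans queries out to all of them."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id: int, text: str) -> None:
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, query: str) -> List[Tuple[int, int]]:
        terms = tokenize(query)
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda shard: shard.search(terms), self.shards))
        merged: Dict[int, int] = {}
        for partial in partials:
            merged.update(partial)  # doc IDs are disjoint across shards
        # Highest score first; ties broken by ascending doc ID.
        return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

With three shards holding `"Hadoop text processing"` (ID 0), `"distributed memory search"` (ID 1), and `"text search on Hadoop with Hadoop"` (ID 2), the query `"hadoop text"` returns `[(2, 3), (0, 2)]`. Because each shard answers only for its own documents, merging is a simple union, and adding shards spreads both memory use and query work.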

Description

Technical field

[0001] The invention relates to a computer information processing system, and in particular to a text processing and retrieval system based on a big data platform.

Background art

[0002] The explosive growth of data is the most typical feature of the information age. According to a research report by International Data Corporation (IDC), 1.8 ZB (i.e., 1.8 trillion GB) of data was created worldwide in 2011. That is equivalent to every American writing 3 tweets per minute on Twitter, non-stop, for 27,000 years. Google's data centers run on the order of millions of servers and process more than 100 PB of data every day. Such big data includes a large amount of structured and unstructured data, with text being the most representative form of unstructured data. The two key issues in dealing with massive data are its storage and its computation. In neither of these two aspects can traditional text processing systems meet the de...


Application Information

IPC(8): G06F17/30
CPC: G06F16/316; G06F16/35; G06F16/367
Inventors: 姜鑫 (Jiang Xin), 王金华 (Wang Jinhua)
Owner: NO 32 RES INST OF CHINA ELECTRONICS TECH GRP