A multi-keyword parallel search method for streaming RDF data based on Spark Streaming

A keyword search method for multi-keyword parallel search over streaming RDF data, intended to achieve efficient real-time queries, improve search efficiency, and reduce the number of tasks.

Active Publication Date: 2021-11-02
FUZHOU UNIV
Cites: 4; cited by: 0

AI Technical Summary

Problems solved by technology

However, Hadoop is poorly suited to real-time processing of big data and has clear limitations there.



Examples


Embodiment Construction

[0040] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0041] As shown in Figure 1, this embodiment provides a multi-keyword parallel search method for streaming RDF data based on Spark Streaming, comprising the following steps:

[0042] Step S1: Using the Redis-based distributed storage scheme, map the keywords input by the user to class vertices or attribute edges on the RDF ontology graph, construct the two-dimensional class-attribute model of the RDF ontology, and apply pruning, deduplication, and join operations based on the relationships between classes to construct the corresponding ontology query subgraph;
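The keyword mapping and subgraph construction in Step S1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy ontology, the `ex:` identifiers, the `KEYWORD_INDEX` and `CLASS_ATTRIBUTE_MODEL` structures, and the pruning rule are hypothetical, and a plain dictionary stands in for the Redis store so the example runs without a server. It is not the patent's actual algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword index: keyword -> ontology elements it maps to.
# In the described scheme this lookup would be served by Redis; a dict
# stands in for it here (assumption).
KEYWORD_INDEX: Dict[str, List[Tuple[str, str]]] = {
    "professor": [("class", "ex:Professor")],
    "course": [("class", "ex:Course")],
    "teaches": [("edge", "ex:teaches")],
}

# Hypothetical class-attribute model: attribute edge -> (domain, range).
CLASS_ATTRIBUTE_MODEL: Dict[str, Tuple[str, str]] = {
    "ex:teaches": ("ex:Professor", "ex:Course"),
    "ex:advises": ("ex:Professor", "ex:Student"),
}


def map_keywords(keywords: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each keyword to class vertices or attribute edges ([] if unknown)."""
    return {kw: KEYWORD_INDEX.get(kw.lower(), []) for kw in keywords}


def build_query_subgraph(keywords: List[str]) -> List[Tuple[str, str, str]]:
    """Build a deduplicated list of (domain, edge, range) ontology triples.

    Illustrative pruning rule: keep an edge only if it was matched directly
    by a keyword, or if it connects two classes matched by keywords.
    """
    mapping = map_keywords(keywords)
    hits = [hit for found in mapping.values() for hit in found]
    classes = {iri for kind, iri in hits if kind == "class"}
    edges = {iri for kind, iri in hits if kind == "edge"}

    subgraph = set()  # a set removes duplicate triples
    for edge, (domain, rng) in CLASS_ATTRIBUTE_MODEL.items():
        if edge in edges or (domain in classes and rng in classes):
            subgraph.add((domain, edge, rng))
    return sorted(subgraph)
```

Calling `build_query_subgraph(["Professor", "course"])` keeps the `ex:teaches` edge because both of its endpoint classes were matched, and drops `ex:advises` because `ex:Student` was not.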

[0043] Step S2: Build a correlation evaluation function to score and sort ontology query subgraphs from two aspects: structural tightness and content relevance;
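The text does not give the formula of the evaluation function, so the following is only a sketch of how a two-factor score could be combined and used for ranking. The definitions of tightness (inverse of edge count) and relevance (fraction of keywords covered), the weight `alpha`, and the candidate record layout are all assumptions introduced for illustration.

```python
from typing import Dict, List


def score(edge_count: int, matched: int, total_keywords: int,
          alpha: float = 0.5) -> float:
    """Combine two hypothetical factors into one score.

    tightness: fewer edges means a more compact subgraph (assumption).
    relevance: share of the user's keywords the subgraph covers (assumption).
    """
    tightness = 1.0 / (1 + edge_count)
    relevance = matched / total_keywords if total_keywords else 0.0
    return alpha * tightness + (1 - alpha) * relevance


def rank(candidates: List[Dict], total_keywords: int,
         alpha: float = 0.5) -> List[Dict]:
    """Sort candidate query subgraphs from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda c: score(c["edges"], c["matched"], total_keywords, alpha),
        reverse=True,
    )


# Hypothetical candidates: number of edges and number of keywords covered.
candidates = [
    {"id": "c", "edges": 1, "matched": 1},
    {"id": "b", "edges": 3, "matched": 2},
    {"id": "a", "edges": 1, "matched": 2},
]
ranked_ids = [c["id"] for c in rank(candidates, total_keywords=2)]
```

With equal weights, the compact subgraph that covers both keywords ranks first, followed by the larger subgraph with full coverage, then the compact one with partial coverage.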

[0044] Step S3: According to the priority of the ontology query subgraph, use the MapReduce computing framework to sea...


Abstract

The present invention relates to a multi-keyword parallel search method for streaming RDF data based on Spark Streaming. First, the input keywords are mapped to class vertices or attribute edges on an RDF ontology graph, and a two-dimensional class-attribute model of the RDF ontology is constructed. Pruning, deduplication, and join operations based on the relationships between classes then yield the corresponding ontology query subgraphs. A correlation evaluation function scores and ranks the ontology query subgraphs on two criteria: structural tightness and content relevance. Subgraphs are searched in priority order, highest score first: the MapReduce computing framework searches the RDF data graph in parallel for matching instance triples, which are joined according to the connection relationships of the ontology query subgraph to obtain the Top-k results. The invention avoids iteratively searching for connection paths among the large number of vertices of the data graph, improves query accuracy, and further improves search efficiency.
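The search stage summarized above can be sketched in plain Python as a map phase that matches instance triples against query subgraphs and a reduce phase that groups the matches, followed by score-ordered Top-k selection. This is a simplified illustration under stated assumptions: each query subgraph has a single edge (so the join across several edges is omitted), the data, the `ex:` identifiers, and the scores are hypothetical, and ordinary functions stand in for a distributed MapReduce or Spark Streaming job.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def map_phase(triples: List[Triple], types: Dict[str, str],
              queries: Dict[str, Triple]) -> Iterator[Tuple[str, Triple]]:
    """Emit (query_id, triple) for every instance triple matching a query.

    A triple matches when its predicate equals the query edge and the
    classes of its subject and object equal the query's domain and range.
    """
    for s, p, o in triples:
        for qid, (domain, edge, rng) in queries.items():
            if p == edge and types.get(s) == domain and types.get(o) == rng:
                yield qid, (s, p, o)


def reduce_phase(pairs: Iterator[Tuple[str, Triple]]) -> Dict[str, List[Triple]]:
    """Group matched triples by the query subgraph they belong to."""
    grouped = defaultdict(list)
    for qid, triple in pairs:
        grouped[qid].append(triple)
    return dict(grouped)


def top_k(grouped: Dict[str, List[Triple]], scores: Dict[str, float],
          k: int) -> List[Tuple[str, Triple]]:
    """Collect results from the highest-scoring subgraph first, up to k."""
    results: List[Tuple[str, Triple]] = []
    for qid in sorted(scores, key=scores.get, reverse=True):
        for triple in grouped.get(qid, []):
            if len(results) == k:
                return results
            results.append((qid, triple))
    return results


# Hypothetical instance data and single-edge query subgraphs.
types = {"ex:alice": "ex:Professor", "ex:db101": "ex:Course", "ex:bob": "ex:Student"}
triples = [
    ("ex:alice", "ex:teaches", "ex:db101"),
    ("ex:alice", "ex:advises", "ex:bob"),
    ("ex:bob", "ex:teaches", "ex:db101"),  # subject is not a Professor
]
queries = {
    "q1": ("ex:Professor", "ex:teaches", "ex:Course"),
    "q2": ("ex:Professor", "ex:advises", "ex:Student"),
}
scores = {"q1": 0.5, "q2": 0.75}

grouped = reduce_phase(map_phase(triples, types, queries))
best = top_k(grouped, scores, k=1)
```

Because results are drawn from subgraphs in descending score order, the selection can stop as soon as k results are collected, which is the motivation the abstract gives for searching higher-scoring subgraphs first.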

Description

Technical field

[0001] The invention relates to the technical field of streaming retrieval of massive RDF data, in particular to a multi-keyword parallel search method for streaming RDF data based on Spark Streaming.

Background art

[0002] With the advent of big data, distributed processing platforms such as Hadoop have obvious advantages in batch processing but many shortcomings in real-time processing of streaming data. The emergence of real-time stream processing platforms makes up for the weakness of batch processing platforms in real-time processing. Real-time search of streaming data has become a new research hotspot. A wide variety of streaming data is generated on the Internet. Because the data is heterogeneous, RDF is widely used to provide a unified metadata representation in data streams, and dynamic RDF data streams have attracted considerable interest in the Semantic Web community. In response to this growing dem...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/27, G06F16/28, G06F16/2458, G06F16/2453, G06F40/30
CPC: G06F40/30
Inventor: 汪璟玢, 于龙
Owner: FUZHOU UNIV