Query task optimization method based on science and technology consultation large-scale graph data

A task optimization and query optimization technology, applied in the fields of electronic digital data processing, special data processing applications, and digital data information retrieval. It addresses problems such as high communication cost, high processing overhead, and inapplicability of servers, with the effect of improving flexibility, reducing complexity, and improving query efficiency.

Pending Publication Date: 2022-02-08
BEIJING UNIV OF POSTS & TELECOMM
0 Cites 1 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0003] At present, although the query optimization technology on graph data has made great progress, there are still some problems: the graph partition technology for graph query optimization can split graph data into multiple servers, but the communication cost ...

Method used

[0079] Specifically, pattern advance in the embodiments of the present disclosure replaces a traversal operation in the pattern with an efficient set lookup. The company-"business abnormal" pattern is evaluated in advance: the IDs of the companies associated with the "business abnormal" node are put into a hash table, and the filter condition then checks whether a "company" node exists in the hash table. If the "company" node is not in the hash table, the company has no abnormal operation. Only 3292 set lookups of O(1) time complexity are needed, which improves query efficiency.
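The lookup described in [0079] can be sketched as follows. This is a minimal illustration, not the patented implementation: the edge list, the company IDs, and the function names are hypothetical, and a Python set stands in for the hash table.

```python
# Hypothetical toy data: (company_id, label) pairs standing in for edges
# between "company" nodes and the "business abnormal" node.
abnormal_edges = [("c2", "business_abnormal"), ("c5", "business_abnormal")]
candidate_companies = ["c1", "c2", "c3", "c5"]


def build_abnormal_set(edges):
    """Evaluate the company-"business abnormal" pattern once, in advance."""
    return {company_id for company_id, label in edges if label == "business_abnormal"}


def filter_normal(companies, abnormal_ids):
    """Keep companies that are absent from the precomputed set.

    Each membership test is an expected O(1) hash lookup, so no graph
    traversal is needed per candidate company.
    """
    return [c for c in companies if c not in abnormal_ids]


abnormal_ids = build_abnormal_set(abnormal_edges)
normal = filter_normal(candidate_companies, abnormal_ids)
```

The set is built once per query, so the cost of the filter grows with the number of candidate companies rather than with the number of edges traversed.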
[0081] Exemplary, in an embodiment of the present disclosure, the query task in the scientific and technological consulting scene specifies the industrial chain tag information tag, starting from th...

Abstract

The invention provides a query task optimization method and system based on science and technology consultation large-scale graph data, and a storage medium. The identifier of a query task is obtained, and a corresponding query optimization method is selected according to that identifier; the query optimization methods comprise graph traversal expansion order strategy adjustment, cardinality reduction, pattern advance, and view materialization. A graph database is then queried using the selected query optimization method, and the query result is output. Because the query optimization method is selected according to the identifier of the query task, the flexibility of the query method is improved. The query optimization methods also improve the query efficiency of query tasks on science and technology consultation large-scale graph data in different scenarios, reduce the complexity of query computation, and shorten the time spent on queries.

Application Domain

  • Special data processing applications
  • Semantic tool creation

Technology Topic

  • Query optimization
  • Graph traversal



Example Embodiment

[0024] Embodiment one
[0025] Figure 1 is a flow diagram of the query task optimization method based on science and technology consultation large-scale graph data according to the present application. As shown in Figure 1, the method can include:
[0026] Step 101: obtain the identifier of the query task.
[0027] It should be noted that in the embodiments of the present disclosure, the query task can concern an institution, talent, or an industrial chain. In the embodiments of the present disclosure, the institution can be a company ID, talent can be a person
[0028] In the embodiments of the present disclosure, the identifier of the query task can be obtained from the content of the query task. For example, assume that the query task is to view the companies associated with a person; the identifier of that query task is then obtained.
[0029] Step 102: select the corresponding query optimization method according to the identifier of the query task, where the query optimization methods include graph traversal expansion order strategy adjustment, cardinality reduction, pattern advance, and view materialization.
[0030] In the embodiments of the present disclosure, different identifiers correspond to different query optimization methods, and the corresponding method can be selected according to the identifier of the query task.
[0031] The query optimization methods in the embodiments of the present disclosure can include graph traversal expansion order strategy adjustment, cardinality reduction, pattern advance, and view materialization.
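The selection in Step 102 amounts to a lookup from task identifier to optimization method. The sketch below shows one way this could be organized. The identifier strings and their pairing with particular methods are hypothetical; the source does not state which identifier maps to which method.

```python
from enum import Enum


class Optimization(Enum):
    """The four query optimization methods named in the disclosure."""
    EXPANSION_ORDER = "graph traversal expansion order strategy adjustment"
    CARDINALITY_REDUCTION = "cardinality reduction"
    PATTERN_ADVANCE = "pattern advance"
    MATERIALIZED_VIEW = "view materialization"


# Hypothetical identifiers and pairings, for illustration only.
OPTIMIZATION_BY_TASK_ID = {
    "tag_to_person": Optimization.EXPANSION_ORDER,
    "company_to_person": Optimization.CARDINALITY_REDUCTION,
    "company_without_abnormal": Optimization.PATTERN_ADVANCE,
    "frequent_report": Optimization.MATERIALIZED_VIEW,
}


def select_optimization(task_id):
    """Return the optimization registered for a query task identifier."""
    try:
        return OPTIMIZATION_BY_TASK_ID[task_id]
    except KeyError:
        raise ValueError(f"unknown query task identifier: {task_id!r}") from None
```

Keeping the mapping in a table means a new query scenario can be supported by adding an entry rather than changing the selection logic.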
[0032] Further, in the embodiments of the present disclosure, the graph traversal expansion order strategy adjustment takes the actual query scenarios of science and technology consultation into account and designs a bidirectional BFS graph traversal expansion. The search starts from both the start point and the end point. Once one direction reaches a location already searched by the other direction (that is, some state has been visited in both directions), a shortest path connecting the start point and the end point has been found. Because the two searches meet at some point in the middle of the shortest path, the number of nodes expanded by the bidirectional BFS is on the order of 2 * n^(m/2+1).
[0033] Specifically, in the embodiments of the present disclosure, the graph traversal expansion order strategy adjustment can include the following steps:
[0034] S11: input the source entity node and the target entity node, together with the intermediate entity node type mtype and the path pattern pattern;
[0035] S12: initialize two node sets S1 and S2, where S1 is initialized to the input source entity node and S2 to the input target entity node;
[0036] S13: use pattern and mtype to compute the bidirectional BFS expansion order, with pattern1 denoting the expansion order from the left end and pattern2 the expansion order from the right end;
[0037] S14: if S1 or S2 is not empty, continue with step S15; otherwise, go to step S111;
[0038] S15: let S be the set of nodes expanded in this layer;
[0039] S16: swap S1 and S2, so that expansion alternates between the left end and the right end;
[0040] S17: for each node in the set S1, expand the node's next-layer neighbor nodes according to the pattern and denote them next_nodes;
[0041] S18: check each node in next_nodes; if the node is in the set S, a path has been found, so go to step S111;
[0042] S19: add all nodes next_nodes expanded in this layer to the set S, copy the set S to S1, and store the path;
[0043] S110: repeat from step S14;
[0044] S111: end.
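Steps S11 to S111 can be approximated by a standard bidirectional BFS. The sketch below is a simplification under stated assumptions: the graph is an undirected adjacency dictionary, the pattern and mtype constraints of S13 are ignored, and the search stops as soon as either frontier is exhausted. All names are hypothetical.

```python
def _build_path(meet, src_parent, dst_parent):
    """Join the two half-paths that met at node `meet`."""
    path = []
    node = meet
    while node is not None:          # walk back to the source
        path.append(node)
        node = src_parent[node]
    path.reverse()                   # now: source ... meet
    node = dst_parent[meet]
    while node is not None:          # walk forward to the target
        path.append(node)
        node = dst_parent[node]
    return path


def bidirectional_bfs(graph, source, target):
    """Return one shortest path in an unweighted, undirected graph, or None."""
    if source == target:
        return [source]
    # Parent maps double as the visited sets of each direction (S12).
    parent_a, parent_b = {source: None}, {target: None}
    frontier_a, frontier_b = [source], [target]
    a_is_source_side = True
    while frontier_a and frontier_b:
        next_frontier = []
        for node in frontier_a:                      # S17: expand one layer
            for neighbor in graph.get(node, ()):
                if neighbor in parent_a:
                    continue
                parent_a[neighbor] = node
                if neighbor in parent_b:             # S18: the searches met
                    if a_is_source_side:
                        return _build_path(neighbor, parent_a, parent_b)
                    return _build_path(neighbor, parent_b, parent_a)
                next_frontier.append(neighbor)
        # S16: swap sides so that expansion alternates between the two ends.
        frontier_a, frontier_b = frontier_b, next_frontier
        parent_a, parent_b = parent_b, parent_a
        a_is_source_side = not a_is_source_side
    return None


# Hypothetical toy graph: tag - sub-tag - patent - company - person.
graph = {
    "tag": ["sub1", "sub2"],
    "sub1": ["tag", "p1", "p2"],
    "sub2": ["tag", "p3"],
    "p1": ["sub1"],
    "p2": ["sub1", "co1"],
    "p3": ["sub2"],
    "co1": ["p2", "alice"],
    "alice": ["co1"],
}
path = bidirectional_bfs(graph, "tag", "alice")
```

Because each side expands one full layer at a time and the search stops at the first node seen from both sides, the joined path is a shortest one in an unweighted graph.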
[0045] In the embodiments of the present disclosure, the query task gives an industrial chain label tag and person information person, and queries the sub-industrial-chain labels of the tag, the patents belonging to those sub-industrial-chain labels, the companies to which the patents belong, and the persons related to those companies through employment, investment, and other relations. In the already constructed science and technology consultation knowledge graph, the path industrial chain label - sub-industrial chain label - patent generates 146,284 intermediate patent nodes. With a one-way BFS, expanding these 146,284 patents produces an explosion of intermediate results and seriously degrades query performance.
[0046] With the bidirectional BFS of the present disclosure, the search proceeds from both ends, that is, along industrial chain label - sub-industrial chain label - patent and along person - company - patent. The 146,284 intermediate patent nodes generated by the path industrial chain label - sub-industrial chain label - patent are put into a hash table. The search then runs in reverse from the person node, and the path person - company - patent generates a result set. Finally, this result set is intersected with the hash table to find the paths that satisfy the condition and connect the start point and the end point, so the time complexity is only O(N).
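The meet-in-the-middle join in [0046] can be sketched with a dictionary acting as the hash table. The adjacency maps, node names, and function names below are hypothetical, and the sketch assumes for simplicity that each patent belongs to a single sub-industrial-chain label.

```python
# Hypothetical adjacency maps for the two half-paths.
tag_to_subtags = {"tag": ["sub1", "sub2"]}
subtag_to_patents = {"sub1": ["p1", "p2"], "sub2": ["p3"]}
person_to_companies = {"alice": ["co1", "co2"]}
company_to_patents = {"co1": ["p2", "p9"], "co2": ["p3"]}


def patents_under_tag(tag):
    """Forward half: tag -> sub-tag -> patent, stored as patent -> sub-tag."""
    table = {}
    for sub in tag_to_subtags.get(tag, []):
        for patent in subtag_to_patents.get(sub, []):
            table[patent] = sub  # assumes one sub-tag per patent
    return table


def join_paths(tag, person):
    """Backward half: person -> company -> patent, probed against the table."""
    forward = patents_under_tag(tag)
    paths = []
    for company in person_to_companies.get(person, []):
        for patent in company_to_patents.get(company, []):
            sub = forward.get(patent)  # expected O(1) hash lookup
            if sub is not None:
                paths.append((tag, sub, patent, company, person))
    return paths


paths = join_paths("tag", "alice")
```

Each patent reached from the person side costs one hash probe, so the work is linear in the sizes of the two half-results rather than in their product.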
[0047] Further, in the embodiments of the present disclosure, cardinality denotes the number of unique values after deduplication; for example, column cardinality refers to the number of unique values that a column contains. This number directly affects performance, in terms of both model compression and engine scanning. Cardinality should therefore be reduced as far as possible to shorten the time required for a query.
[0048] In the embodiments of the present disclosure, cardinality reduction may include the following steps:
[0049] S21: input the source entity node and the path pattern pattern;
[0050] S22: let next_nodes be the set of next-layer nodes, initialized to the next-layer neighbor nodes expanded from the source entity node according to the pattern;
[0051] S23: deduplicate the nodes in next_nodes;
[0052] S24: let Q be a node queue, initialized to next_nodes;
[0053] S25: if Q is not empty, continue with step S26; otherwise, go to step S212;
[0054] S26: let size be the current number of nodes in the queue;
[0055] S27: if size is not zero, continue with step S28; otherwise, go to step S211;
[0056] S28: pop the current node from the queue;
[0057] S29: expand the node's next-layer neighbor nodes next_nodes according to the pattern;
[0058] S210: add next_nodes to the queue Q;
[0059] S211: if the pattern has been fully traversed, go to step S212; otherwise, return to step S25;
[0060] S212: end.
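The effect of the deduplication in steps S21 to S212 can be illustrated with a layer-by-layer expansion. This is a sketch under assumptions, not the patented procedure: it uses per-layer lists instead of the explicit queue, deduplicates every layer, and represents the pattern as a list of allowed relation sets, one per hop. The graph, relation names, and function name are hypothetical.

```python
def traverse(graph, source, pattern, distinct=True):
    """Expand `source` hop by hop along `pattern`.

    graph:   node -> list of (relation, neighbor) pairs
    pattern: list of sets of allowed relations, one set per hop
    Returns (final_layer, expansions), where `expansions` counts how many
    nodes had their neighbors expanded.
    """
    layer = [source]
    expansions = 0
    for allowed in pattern:
        next_nodes = []
        for node in layer:
            expansions += 1
            for relation, neighbor in graph.get(node, []):
                if relation in allowed:
                    next_nodes.append(neighbor)
        if distinct:
            # Order-preserving deduplication of the layer.
            next_nodes = list(dict.fromkeys(next_nodes))
        layer = next_nodes
    return layer, expansions


# Hypothetical data: one person is linked to the company by three relations.
graph = {
    "co1": [
        ("incumbent", "alice"),
        ("investor", "alice"),
        ("public_shareholder", "alice"),
        ("incumbent", "bob"),
    ],
    "alice": [("invents", "p1")],
    "bob": [("invents", "p2")],
}
pattern = [{"incumbent", "investor", "public_shareholder"}, {"invents"}]

with_distinct = traverse(graph, "co1", pattern, distinct=True)
without_distinct = traverse(graph, "co1", pattern, distinct=False)
```

In this toy graph, deduplicating the "person" layer lowers the number of expanded nodes from 5 to 3 and removes the repeated patent from the final layer.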
[0061] For example, in the embodiments of the present disclosure, in the graph data of an actual science and technology consultation scenario there may be multiple edges, or edges of different types, between two nodes. For instance, between a "company" node and a "person" node there are three relations: "company - incumbent - person", "company - investor - person", and "company - public shareholder - person". Therefore, when looking for adjacent "person" nodes starting from a certain company, the same "person" node may be reached through more than one of the three relations, producing duplicate nodes. Redundant duplicate nodes increase cardinality: when a duplicated "person" node continues to look for adjacent nodes, the traversal is repeated, which increases the number of intermediate nodes and therefore the query time. Accordingly, the embodiments of the present disclosure apply a distinct (deduplication) optimization strategy in advance to reduce cardinality.

