Query statement conversion method and device, electronic equipment, storage medium and product
By generating and transforming abstract syntax trees, the problem of incompatibility between query languages on different data platforms is solved, enabling efficient and secure automated conversion of query statements and improving the flexibility and efficiency of data analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING BAIDU NETCOM SCI & TECH CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
AI Technical Summary
The incompatibility of query languages between different data platforms leads to low query conversion efficiency and security risks, and existing technologies struggle to achieve accurate and efficient automated conversion.
By generating an abstract syntax tree for the first query language, identifying and transforming target nodes, constructing an abstract syntax tree adapted to the second query language, and finally generating the target query statement, the visitor pattern is used to ensure syntactic correctness.
It achieves high-precision, high-efficiency, and high-security automated conversion between different query languages, improving the flexibility and efficiency of data analysis and reducing labor costs and security risks.
Figure CN122240127A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of data processing technology, and more particularly to the field of big data or data query technology. Specifically, this disclosure relates to a query statement conversion method, apparatus, electronic device, storage medium, and product.
Background Technology
[0002] Currently, with the increasing complexity of data processing needs, data platforms are also becoming more diversified. Various data platforms mostly use their own proprietary query languages. The query languages of different data platforms are incompatible with each other, creating significant "data silos."
[0003] In certain scenarios, there is a need to convert queries in one query language into queries in another. How to accurately and efficiently automate this conversion between different query languages has become a critical technical problem that urgently needs to be solved.
Summary of the Invention
[0004] To address at least one of the aforementioned deficiencies, this disclosure provides a query statement conversion method, apparatus, electronic device, storage medium, and product.
[0005] According to a first aspect of this disclosure, a query statement conversion method is provided, the method comprising: generating a first abstract syntax tree based on a first query statement in a first query language; in response to a pre-specified target node being present in the first abstract syntax tree, transforming the target node in the first abstract syntax tree to obtain a second abstract syntax tree adapted to a second query language; and generating a second query statement in the second query language based on the second abstract syntax tree.
[0006] According to a second aspect of this disclosure, a query statement conversion apparatus is provided, the apparatus comprising: an abstract syntax tree generation module configured to generate a first abstract syntax tree based on a first query statement in a first query language; a node transformation module configured to, in response to a pre-specified target node being present in the first abstract syntax tree, transform the target node in the first abstract syntax tree to obtain a second abstract syntax tree adapted to a second query language; and a query statement generation module configured to generate a second query statement in the second query language based on the second abstract syntax tree.
[0007] According to a third aspect of this disclosure, an electronic device is provided, the electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the above-described query statement conversion method.
[0008] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium is provided that stores computer instructions, wherein the computer instructions are used to cause a computer to execute the above-described query statement conversion method.
[0009] According to a fifth aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the above-described query statement conversion method.
[0010] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Attached Figure Description
[0011] The accompanying drawings are provided to better understand this solution and do not constitute a limitation of this disclosure.
[0012] Figure 1 is a flowchart illustrating a query statement conversion method provided in an embodiment of this disclosure.
[0013] Figure 2 is a schematic diagram of the structure of the query statement conversion system provided in the embodiments of this disclosure.
[0014] Figure 3 is a flowchart illustrating one specific implementation of the method provided in this disclosure.
[0015] Figure 4 is a schematic diagram of the structure of a query statement conversion device provided in an embodiment of this disclosure.
[0016] Figure 5 is a block diagram of an electronic device used to implement the query statement conversion method provided in the embodiments of this disclosure.
Detailed Implementation
[0017] The exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding, and should be considered merely exemplary. Therefore, those skilled in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and brevity, descriptions of well-known functions and structures are omitted in the following description.
[0018] First, some terms used in this application will be explained.
[0019] An abstract syntax tree (AST) is a tree-like data structure used to represent the syntactic structure of source code. Each node in an AST represents a syntactic structural unit in the source code, such as an operator, a function call, a command, or a value.
[0020] Kibana Query Language (KQL) is a query language provided by the Kibana platform for quickly searching and filtering data in Elasticsearch.
[0021] Elasticsearch Query Language (ES|QL) is a pipeline query language designed for Elasticsearch, offering intuitive and powerful data exploration and transformation capabilities.
[0022] Search Processing Language (SPL) is the native language of the Splunk platform, specifically designed for searching, processing, and analyzing machine-generated big data (such as logs and events).
[0023] ArcSight Query Language (AQL) is a dedicated query language for the SIEM platform ArcSight, used to query, correlate, and analyze security event data.
[0024] Currently, data platforms typically use their own proprietary query languages. In some scenarios, there's a need to convert queries from one query language to another. For example, a query developed on one data platform based on complex query rules cannot be directly transferred to another; it requires query language conversion. Similarly, during cross-platform data migration, accumulated queries from the original platform need to be migrated to the new platform, a process involving converting queries to the new platform's query language. Furthermore, when performing unified queries on data sources from different platforms, a single query language cannot achieve cross-platform data queries; separate query language conversions are required for each platform.
[0025] In related technologies, manual rewriting is often used when converting between different query languages, which consumes a lot of manpower and has low conversion efficiency.
[0026] There are also some automatic conversion methods in related technologies, such as mapping strings in query statements based on regular expressions to achieve conversion. However, this method has problems with syntax errors or semantic deviations when dealing with complex query structures such as nested functions, parameter order changes, and aggregation logic, resulting in poor conversion effects and failing to meet usage requirements.
[0027] Other related technologies include translation using large language models. However, large language models are prone to hallucinations, and the style or structure of the generated code may vary from run to run, leading to insufficient reliability. Furthermore, large language models typically have high latency, which cannot meet the needs of high-concurrency real-time query scenarios. Additionally, queries may involve sensitive or private information, and using external large language models for translation may pose security risks.
[0028] In summary, there is an urgent need for a solution that can accurately and efficiently automate the conversion between different query languages.
[0029] The query statement conversion method, apparatus, electronic device, storage medium, and product provided in this disclosure are intended to solve at least one of the above-mentioned technical problems in the prior art.
[0030] Figure 1 is a flowchart illustrating the query statement conversion method provided in the embodiments of this disclosure. As shown in Figure 1, the method may include the following steps:
Step S110: generate a first abstract syntax tree based on the first query statement in the first query language;
Step S120: in response to a pre-specified target node being present in the first abstract syntax tree, transform the target node in the first abstract syntax tree to obtain a second abstract syntax tree adapted to the second query language;
Step S130: generate the second query statement in the second query language based on the second abstract syntax tree.
[0031] As can be seen from the above process, this disclosure generates a first abstract syntax tree based on the query statement of the source language (i.e., the first query language), transforms the target nodes in the first abstract syntax tree to obtain a second abstract syntax tree adapted to the target language (i.e., the second query language), and then generates the query statement of the target language based on the second abstract syntax tree. Based on this scheme, accurate and efficient automatic conversion between different query languages can be achieved, which helps to improve the flexibility and efficiency of data analysis.
[0032] The following describes in detail each step of the above process and the effects that can be further produced, with reference to the embodiments. It should be noted that the terms "first" and "second" involved in this disclosure do not have limitations in terms of size, order, or quantity, but are only used to distinguish them in name. For example, "first query language" and "second query language" are used to distinguish two different query languages.
[0033] First, the above step S110, namely "generating a first abstract syntax tree based on a first query statement of a first query language", will be described in detail with reference to the embodiments.
[0034] The first query language is the source language to be converted, and the first query statement is a query statement written based on the first query language.
[0035] In this step, by converting the first query statement into a structured, hierarchical tree model (i.e., the first abstract syntax tree), the syntactic structure of the first query statement can be clearly obtained, providing a foundation for subsequent query language conversion.
[0036] For example, nodes in the abstract syntax tree can be predefined, and nodes can be generated by parsing the first query statement to finally obtain the first abstract syntax tree.
[0037] For example, the nodes defined in the abstract syntax tree specifically include: base class node, root node, expression node, and command node.
[0038] Among them, the base class node is the base class of all nodes.
[0039] The root node represents a complete query. It contains the data source definition and an ordered list of commands, i.e., the execution sequence of the pipeline.
[0040] The expression node represents a unit of computation that produces a value; it is the base class for the other concrete expression nodes. Expression nodes can be of the following types:
Identifier: the name of a field or variable.
[0041] Literal: a constant value, such as the integer 100, the string "error", or the boolean value true.
[0042] Binary operator: a binary operation, such as ">" (greater than) or "AND" (logical and).
[0043] Function call: a function call, which includes the function name and parameter list.
[0044] Unary operator: a unary operation, such as "NOT".
[0045] The command node represents a specific processing stage in a pipeline and serves as the base class for the other concrete command nodes. Command nodes include the following types:
Filter command: used for data filtering.
[0046] Projection command: used to select or exclude fields.
[0047] Aggregation command: used for grouped statistics, including aggregation functions and grouping fields.
[0048] Sort command: used to sort query results.
[0049] Limit command: used to limit the number of results returned, for example, "return only the first 10 records".
[0050] Extend command: used to generate new fields.
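The node hierarchy described above can be sketched as a small set of Python dataclasses. This is only an illustrative sketch: the class and field names (`Node`, `Expression`, `Command`, `Query`, `condition`, `count`, and so on) are assumptions chosen for this example and do not appear in the disclosure, and only two of the command types are shown.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Base class of all AST nodes."""


@dataclass
class Expression(Node):
    """A unit of computation that produces a value."""


@dataclass
class Identifier(Expression):
    name: str  # name of a field or variable


@dataclass
class Literal(Expression):
    value: Union[int, float, str, bool]  # constant value


@dataclass
class BinaryOp(Expression):
    op: str  # e.g. ">" or "AND"
    left: Expression
    right: Expression


@dataclass
class UnaryOp(Expression):
    op: str  # e.g. "NOT"
    operand: Expression


@dataclass
class FunctionCall(Expression):
    name: str
    args: List[Expression] = field(default_factory=list)


@dataclass
class Command(Node):
    """One processing stage of the pipeline."""


@dataclass
class Filter(Command):
    condition: Expression


@dataclass
class Limit(Command):
    count: int


@dataclass
class Query(Node):
    """Root node: the data source plus the ordered command list."""
    source: str
    commands: List[Command] = field(default_factory=list)


# Hypothetical query: read from "logs", keep rows with status > 100,
# then return only the first 10 records.
example = Query(
    source="logs",
    commands=[
        Filter(BinaryOp(">", Identifier("status"), Literal(100))),
        Limit(10),
    ],
)
```

Keeping commands in an ordered list on the root node mirrors the pipeline execution sequence, so later stages can walk them in order.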
[0051] The following describes in detail step S120, namely, "in response to the existence of a pre-specified target node in the first abstract syntax tree, the target node in the first abstract syntax tree is transformed to obtain a second abstract syntax tree adapted to the second query language," with reference to the embodiments.
[0052] The second query language is the target language to which the query needs to be converted.
[0053] Different query languages (such as the first query language and the second query language) may have the same logical intent when expressing certain specific semantics, but they will have inherent differences in their specific syntactic expressions. The pre-specified target node is precisely the centralized mapping of these inherent differences at the level of the abstract syntax tree structure, that is, it represents the "point of divergence" between the first query language and the second query language when expressing the same semantic concept.
[0054] The transformation of the target node involves converting the syntax of the first query language into an equivalent syntax that is adapted to the second query language, while preserving the complete semantic logic of the original query.
[0055] The second abstract syntax tree obtained by transforming the target nodes in the first abstract syntax tree has internal nodes that conform to the syntactic expression format of the second query language, providing a foundation for generating high-quality second query statements based on the second abstract syntax tree.
[0056] The following describes step S130, namely "generating a second query statement of the second query language based on the second abstract syntax tree", in detail with reference to the embodiments.
[0057] After obtaining the second abstract syntax tree adapted to the second query language, a second query statement can be generated based on the second abstract syntax tree, that is, query code that can be executed on the target data platform.
[0058] For example, the Visitor Pattern can be used to traverse each node in the second abstract syntax tree. When a node is traversed, the corresponding code fragment can be "concatenated" according to the specific attributes of the node and the syntax rules of the second query language, thereby ensuring the syntactic correctness of the generated second query statement.
[0059] Continuing with the previous example, when generating the second query statement, adjustments can be made based on the specific details of the second query language to ensure that the second query statement conforms to the specifications of the second query language. For example, regarding the handling of quotation marks, KQL does not require conversion of quotation marks, while ES|QL requires converting quotation marks to backticks. Similarly, in SPL, the keyword "search" needs to be added when performing an explicit search to ensure compliance with specifications.
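As a rough illustration of visitor-based generation, the sketch below walks a tiny tree and concatenates one code fragment per node. The node classes, the `PipeGenerator` name, and the output dialect (`where`, `limit`, stages joined by `|`) are assumptions made for this example; they are not the syntax of any particular target language named in this disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Identifier:
    name: str


@dataclass
class Literal:
    value: object


@dataclass
class BinaryOp:
    op: str
    left: object
    right: object


@dataclass
class Filter:
    condition: object


@dataclass
class Limit:
    count: int


@dataclass
class Query:
    source: str
    commands: List[object]


class PipeGenerator:
    """Visitor that dispatches to visit_<ClassName> for each node."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)

    def visit_Query(self, node):
        # The data source comes first, then one fragment per command, in order.
        stages = [node.source] + [self.visit(c) for c in node.commands]
        return " | ".join(stages)

    def visit_Filter(self, node):
        return "where " + self.visit(node.condition)

    def visit_Limit(self, node):
        return f"limit {node.count}"

    def visit_BinaryOp(self, node):
        return f"{self.visit(node.left)} {node.op} {self.visit(node.right)}"

    def visit_Identifier(self, node):
        return node.name

    def visit_Literal(self, node):
        # Quoting rules are target-specific; double quotes are assumed here.
        if isinstance(node.value, str):
            return '"' + node.value + '"'
        return str(node.value)


tree = Query(
    "logs",
    [Filter(BinaryOp(">", Identifier("status"), Literal(400))), Limit(10)],
)
generated = PipeGenerator().visit(tree)
```

A generator for another target language would be a second visitor class with the same method names but different fragments, for instance different quoting or an extra leading keyword.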
[0060] In summary, based on steps S110 to S130, this disclosure generates a first abstract syntax tree based on the query statement of the source language, transforms the target nodes in the first abstract syntax tree to obtain a second abstract syntax tree adapted to the target language, and then generates the query statement of the target language based on the second abstract syntax tree. Based on this scheme, accurate and efficient automatic conversion between different query languages can be achieved, which helps to improve the flexibility and efficiency of data analysis.
[0061] In this scheme, by performing a conversion on the nodes in the first abstract syntax tree to adapt to the target language, the resulting second abstract syntax tree can accurately adapt to the specification of the target language while maintaining semantic consistency, thereby achieving high-precision query language conversion.
[0062] Current regular expression-based conversion schemes are shallow string-level conversions that cannot deeply understand complex query structures such as nested functions, parameter order variations, and aggregation logic. This solution, however, constructs an abstract syntax tree to accurately understand the complete semantics of the query statement, effectively handling complex query structures like nested functions, parameter order variations, and aggregation logic, ensuring high semantic fidelity in the converted query. Furthermore, this query language conversion method offers advantages such as high precision, high efficiency, high security, and high reliability, overcoming the shortcomings of conversion methods based on large language model translation.
[0063] In one optional embodiment of this disclosure, generating a first abstract syntax tree based on a first query statement in a first query language includes: performing lexical analysis on the first query statement to obtain lexical units; identifying the command delimiters among the lexical units and dividing the lexical units into lexical unit groups corresponding to different commands based on the command delimiters; and, based on each lexical unit group, constructing a command node and an expression subtree mounted on that command node, thereby generating the first abstract syntax tree.
[0064] Among them, lexical units are the basic syntactic elements generated by lexical analysis, such as keywords, identifiers, operators, literals, delimiters, etc.
[0065] For example, lexical parsing can be performed using a context-free grammar (CFG).
[0066] Command delimiters are specific symbols used in query statements to separate different commands. Command delimiters are generally pipe characters, but can also include other forms, such as keywords.
[0067] Query statements typically represent a pipelined data processing flow, where each command represents an independent stage of data processing, usually separated by command delimiters. Based on command delimiters, the parsed sequence of lexical units can be segmented into groups of lexical units corresponding to each command.
[0068] The first or first few lexical units in each lexical unit group are usually keywords that represent the command type, which can be used to determine the command type and then create a command node.
[0069] After creating the command node, the remaining lexical units in the lexical unit group can be parsed to construct an expression subtree that expresses the specific processing logic of the command, and then attached to the command node. After creating all command nodes and expression subtrees in the first query statement, a complete first abstract syntax tree can be obtained.
[0070] In this approach, by dividing complex query statements into separate processing stages and then parsing each stage individually, the pipeline processing logic and command sequence of the streaming query language can be accurately captured, ensuring that the generated first abstract syntax tree can completely and accurately represent the semantics and streaming structure of the first query statement. Furthermore, by parsing each processing stage separately, the complexity of rule parsing can be reduced.
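The lexing and grouping steps above might look roughly like the following. This is a minimal sketch under several assumptions: the token kinds, the regular expression, and the dictionary-based tree are invented for illustration, the pipe character is taken as the only command delimiter, and the expression subtree is left as a flat list of token texts rather than being parsed into a real subtree.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (kind, text)

TOKEN_RE = re.compile(
    r'''
    (?P<STRING>"[^"]*")
  | (?P<NUMBER>\d+)
  | (?P<PIPE>\|)
  | (?P<OP>>=|<=|==|!=|>|<)
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<WS>\s+)
    ''',
    re.VERBOSE,
)


def tokenize(text: str) -> List[Token]:
    """Lexical analysis: turn the query text into lexical units."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":  # whitespace is dropped
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def split_commands(tokens: List[Token]) -> List[List[Token]]:
    """Divide the lexical units into one group per command, at each pipe."""
    groups, current = [], []
    for tok in tokens:
        if tok[0] == "PIPE":
            groups.append(current)
            current = []
        else:
            current.append(tok)
    groups.append(current)
    return groups


def build_tree(text: str) -> Dict[str, object]:
    """Create one command node per group; the first token names the command."""
    groups = split_commands(tokenize(text))
    source = groups[0][0][1]
    commands = []
    for group in groups[1:]:
        keyword, rest = group[0][1], group[1:]
        # Placeholder for the expression subtree: a flat list of token texts.
        commands.append({"command": keyword, "expr": [t[1] for t in rest]})
    return {"source": source, "commands": commands}


tree = build_tree("logs | where status >= 500 | take 10")
```

Because each group is handled independently, a per-command expression parser only has to deal with the grammar of a single stage.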
[0071] In one alternative embodiment of this disclosure, the target node includes a function node, and transforming the target node in the first abstract syntax tree includes: adjusting the function name of the first function corresponding to the function node to the function name of the second function, where the first function is a function defined in the first query language, the second function is a function defined in the second query language, and the two functions have the same functionality; and adjusting the first parameter of the first function in the function node so that its parameter format and parameter order match those of the second parameter of the second function.
[0072] In this context, a function node is a node in the abstract syntax tree that represents a function call, and it typically contains the function name and parameter list.
[0073] Different query languages may have different names, parameter formats, and parameter order for functions with the same functionality. By adjusting the function name, parameter format, and parameter order in the function node, the second abstract syntax tree can be made to conform to the specifications of the second query language.
[0074] The first function is a function defined in the first query language. The second function is a function in the second query language that is semantically equivalent to the first function; it represents the target state after syntactic transformation. The two functions are functionally identical but may differ at the syntactic level, such as in function name, parameter format, and parameter order.
[0075] For example, the first function is a function defined in KQL with the function name "bin", and the corresponding second function has the function name "date_trunc" in ES|QL. When converting the function node, the function name "bin" in the function node can be adjusted to "date_trunc".
[0076] Parameter format refers to the rules for representing function parameter values under the specifications of a specific query language. For example, the parameter format for dates in KQL is "YYYY-MM-dd", while the date format in SPL is "%Y-%m-%d".
[0077] The parameter order refers to the sequence of parameters in the function definition. Functions with the same functionality may have different parameter orders in different query languages.
[0078] The first parameter is the argument of the first function, which can be in the form of a parameter list. By adjusting the parameter format and the order of the first parameters in the list, a parameter list conforming to the calling conventions of the second function (i.e., the second parameter) can be obtained.
[0079] For example, the first query language is KQL. In KQL, the parameter arrangement rule for the function "bin" specifies the parameter order as [timestamp expression, interval value], and the interval value should be in the format of "number + unit letter" (e.g., 1h). The second query language is ES|QL. In ES|QL, the parameter arrangement rule for the function "date_trunc" specifies the parameter order as [interval value, timestamp expression], and the interval value should be in the format of "unit English word" (e.g., hour). When converting the function node, the parameter order in the function node can be adjusted to [interval value, timestamp expression], and the interval value format can be adjusted to the form of "unit English word".
[0080] In this solution, by adjusting the function name and adapting the parameter format / order of the function node while maintaining semantic integrity, the accuracy of semantics in complex cross-language function conversion is guaranteed, thereby ensuring the semantic fidelity of the query statement.
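The "bin" to "date_trunc" example above can be sketched as a single node rewrite. The `FunctionCall` class, the `UNIT_WORDS` table, and the helper names are hypothetical. The target interval spelling used here (a count followed by a unit word, such as "1 hour") is also an assumption; the text above only states that the unit is written as an English word.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class FunctionCall:
    name: str
    args: List[str]  # arguments kept as plain strings for brevity


# Assumed mapping from unit letters to unit words.
UNIT_WORDS = {"s": "second", "m": "minute", "h": "hour", "d": "day"}


def convert_interval(value: str) -> str:
    """Rewrite "number + unit letter" (e.g. "1h") as "number unit-word"."""
    m = re.fullmatch(r"(\d+)([smhd])", value)
    if m is None:
        raise ValueError(f"unsupported interval: {value!r}")
    count, unit = m.groups()
    word = UNIT_WORDS[unit]
    return f"{count} {word}" + ("" if int(count) == 1 else "s")


def bin_to_date_trunc(node: FunctionCall) -> FunctionCall:
    """Rename the function, swap the parameter order, convert the interval."""
    if node.name != "bin":
        return node  # not a target node; leave it untouched
    timestamp_expr, interval = node.args
    return FunctionCall("date_trunc", [convert_interval(interval), timestamp_expr])


converted = bin_to_date_trunc(FunctionCall("bin", ["timestamp", "1h"]))
```

Nodes that are not pre-specified targets pass through unchanged, which keeps the rest of the tree identical between the first and second abstract syntax trees.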
[0081] In one optional embodiment of this disclosure, adjusting the function name of the first function corresponding to the function node to the function name of the second function includes: determining, based on the function name of the first function corresponding to the function node, the normalized function name corresponding to the first function; and determining, based on the normalized function name, the function name of the corresponding second function, and adjusting the function name of the first function in the function node to the function name of the second function. Adjusting the first parameter of the first function in the function node so that its parameter format and parameter order match those of the second parameter of the second function includes: determining, based on the function name of the second function, the parameter format and parameter order of the second function; and adjusting the first parameter, based on the parameter format and parameter order of the second function, to obtain the second parameter of the second function.
[0082] Among them, the normalized function name is a standardized function semantic identifier that is independent of a specific language. It is a semantic intermediate representation of the function name and is used to realize the conversion of function names between different query languages.
[0083] For example, the operation of "binning by time interval" corresponds to the function "bin" in KQL and "date_trunc" in ES|QL. These function names can be mapped to the same normalized function name, such as "TIME_BINNING".
[0084] For example, a mapping table between function names and normalized function names under different query languages can be established. When converting function names, firstly, based on the function name of the source first function and the first query language, the corresponding normalized function name is determined by looking up the mapping table. Then, based on the normalized function name and the second query language, the mapping table is consulted to determine the specific function name that should be used in the second query language, and the function name in the function node is adjusted accordingly.
[0085] For example, the function name can be stored in association with the parameter format and parameter order of that function. After determining the function name of the second function, the parameter format and parameter order of the second function can be looked up by that name. Then, the first parameter is adjusted based on the parameter format and parameter order of the second function to obtain the second parameter.
[0086] For example, the first function is "bin" in KQL, whose parameters are arranged in the order of [timestamp expression, interval value], and the interval value parameter is in the format of "number + unit letter" (e.g., 1h). The second function is "date_trunc" in ES|QL, whose parameters are arranged in the order of [interval value, timestamp expression], and the interval value is in the format of "unit English word" (e.g., hour).
[0087] In this scheme, a semantic intermediate layer is constructed by normalizing function names, establishing a one-to-one mapping between language-specific function names and normalized function names, thereby ensuring the accuracy of function name conversion. It also makes the scheme easy to extend: supporting a new query language or a new function only requires adding mapping entries.
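A mapping table of this kind could be sketched with plain dictionaries, as below. The table contents, the language keys, the role names ("timestamp", "interval"), and the function names `translate_name` and `reorder_args` are assumptions for illustration; only the "bin", "date_trunc", and "TIME_BINNING" names come from the example above. Parameter format conversion is omitted to keep the sketch short.

```python
from typing import Dict, List, Tuple

# (language, function name) -> normalized function name
TO_NORMALIZED: Dict[Tuple[str, str], str] = {
    ("KQL", "bin"): "TIME_BINNING",
    ("ES|QL", "date_trunc"): "TIME_BINNING",
}

# (language, normalized function name) -> function name, derived by inversion
FROM_NORMALIZED: Dict[Tuple[str, str], str] = {
    (lang, norm): name for (lang, name), norm in TO_NORMALIZED.items()
}

# Parameter order stored in association with each function name
PARAM_ORDER: Dict[Tuple[str, str], List[str]] = {
    ("KQL", "bin"): ["timestamp", "interval"],
    ("ES|QL", "date_trunc"): ["interval", "timestamp"],
}


def translate_name(name: str, source_lang: str, target_lang: str) -> str:
    """Source name -> normalized name -> target name, via two lookups."""
    normalized = TO_NORMALIZED[(source_lang, name)]
    return FROM_NORMALIZED[(target_lang, normalized)]


def reorder_args(args: List[str], source: Tuple[str, str],
                 target: Tuple[str, str]) -> List[str]:
    """Label each argument by its role, then emit in the target's order."""
    by_role = dict(zip(PARAM_ORDER[source], args))
    return [by_role[role] for role in PARAM_ORDER[target]]


target_name = translate_name("bin", "KQL", "ES|QL")
target_args = reorder_args(
    ["timestamp", "1h"], ("KQL", "bin"), ("ES|QL", target_name)
)
```

With this layout, adding a language means adding rows keyed by that language; no existing pairwise rule needs to change.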
[0088] In one alternative approach of this disclosure, the target node includes a date node or a node containing a date value, and transforming the target node in the first abstract syntax tree includes: converting the date value in the target node from its first date format to a second date format adapted to the second query language.
[0089] Among them, the date node is a node in the abstract syntax tree that specifically represents the date / time data type, and its core value is a date value.
[0090] A node containing a date value is not strictly a date node, but one of its fields or attributes contains a date value.
[0091] Date formats differ significantly across query languages. For example, KQL uses the format "YYYY-MM-dd," while SPL uses "%Y-%m-%d." To ensure accuracy in query language conversion, date formats need to be converted accordingly.
[0092] For example, a mapping relationship can be established between date formats of different query languages so that a second date format in a second query language can be determined based on the mapping relationship, and the date values in the abstract syntax tree can be converted to the corresponding date format.
[0093] In this solution, date values in the abstract syntax tree are automatically identified and converted to a date format that conforms to the specifications of the second query language. This enables format conversion of date values at the node level of the abstract syntax tree, thereby improving the accuracy of query statement conversion.
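One possible way to realize such a format mapping is a token table between the two format notations mentioned above. The `TOKEN_MAP` table and the function names are assumptions for this sketch, and only a handful of tokens are covered; the second helper relies on Python's standard `datetime.strptime` and `strftime` to re-render a concrete date value.

```python
import re
from datetime import datetime

# Assumed mapping from "YYYY-MM-dd"-style tokens to "%Y-%m-%d"-style tokens.
TOKEN_MAP = {
    "YYYY": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
}

# Longest tokens first so that "YYYY" is matched as a whole.
_TOKEN_RE = re.compile("|".join(sorted(TOKEN_MAP, key=len, reverse=True)))


def to_percent_format(fmt: str) -> str:
    """Translate a format string such as "YYYY-MM-dd" into "%Y-%m-%d"."""
    return _TOKEN_RE.sub(lambda m: TOKEN_MAP[m.group()], fmt)


def reformat_date(value: str, first_format: str, second_format: str) -> str:
    """Parse a date value in the first format and render it in the second."""
    parsed = datetime.strptime(value, to_percent_format(first_format))
    return parsed.strftime(to_percent_format(second_format))


converted_format = to_percent_format("YYYY-MM-dd")
converted_value = reformat_date("2026-03-17", "YYYY-MM-dd", "dd/MM/YYYY")
```

The match is case-sensitive, so "MM" (month) and "mm" (minute) stay distinct.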
[0094] In one alternative approach of this disclosure, the target node is a node containing a regular expression string, and transforming the target node in the first abstract syntax tree includes: determining whether the regular expression string of the target node is supported by the regular expression engine corresponding to the second query language; in response to the regular expression string being supported by that engine, retaining the regular expression string in the target node; and in response to the regular expression string not being supported by that engine, retaining the regular expression string in the target node and adding a corresponding prompt message.
[0095] In this context, the regular expression string is a string defined based on regular expression syntax in the query statement, commonly used for pattern matching (such as the `match` operation) or field extraction (such as the `parse` command). After the first query statement is converted into the first abstract syntax tree (AST), the regular expression string can be a function call parameter or part of an expression in a node of the AST.
[0096] The regular expression engine is the core component for parsing and executing regular expression matching. Different query languages may use different types of regular expression engines, and different engines support different syntax features.
[0097] If the regular expression string in the target node is supported by the regular expression engine corresponding to the second query language, the original regular expression string can be preserved.
[0098] When a regular expression string is not supported by the regular expression engine corresponding to the second query language, the original regular expression string can be retained, and a prompt message can be added to indicate the compatibility issue.
[0099] For example, the prompt message can clearly indicate that there is a problem with the regular expression string, which specific features are incompatible, the possible impact of the problem, and the suggested direction of modification.
[0100] In this solution, the regular expression string is checked for compatibility against the regular expression engine corresponding to the second query language, so that incompatibilities are accurately identified. The original regular expression string is preserved, and a precise prompt message is added so that the user can make compatibility adjustments manually.
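A minimal sketch of this check is given below. It assumes a target engine that rejects lookaround assertions and backreferences; that rule set, the `RegexNode` class, and the message wording are all assumptions made for illustration. The probes are textual heuristics, not a full regular expression parser.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumption: the target engine does not support these features.
# The real list depends on the engine behind the second query language.
UNSUPPORTED_FEATURES = {
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "backreference": re.compile(r"\\[1-9]"),
}


@dataclass
class RegexNode:
    """Hypothetical AST node carrying a regular expression string."""
    pattern: str
    warnings: List[str] = field(default_factory=list)


def check_regex_node(node: RegexNode) -> RegexNode:
    """Leave the pattern untouched; attach one prompt message per unsupported feature."""
    for name, probe in UNSUPPORTED_FEATURES.items():
        if probe.search(node.pattern):
            node.warnings.append(
                f"regular expression uses {name}, which the target engine may not "
                "support; the pattern was kept unchanged and needs manual review"
            )
    return node


flagged = check_regex_node(RegexNode(r"(?<=user=)\w+"))
print(flagged.pattern, flagged.warnings)
```

In both branches the pattern itself is never rewritten, matching the text above: a supported pattern passes through with no message, and an unsupported one passes through with a message attached.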
[0101] For example, Figure 2 is a schematic diagram of the structure of the query statement conversion system provided in the embodiments of this disclosure.
[0102] As shown in Figure 2, the system may specifically include: an input layer 210, a core transformation engine 220, and an output layer 230. The core transformation engine 220 specifically includes a parser 221, an abstract syntax tree generator 222, and an optimizer 223. The output layer 230 includes generators for multiple target query languages, namely, an SQL generator 231, a KQL generator 232, an ES|QL generator 233, and an AQL generator 234. The main processing flow of this system during query statement transformation is as follows: In the input layer 210, the system can accept various types of query languages as input, including: SQL, KQL, ES|QL, and AQL.
[0103] In the core transformation engine 220, the parser 221 can receive the input query language and perform preliminary parsing. The parsed result is then passed to the abstract syntax tree generator 222 to generate an abstract syntax tree (i.e., the first abstract syntax tree).
[0104] The optimizer 223 can optimize the generated abstract syntax tree, including adjusting function names, parameter formats, parameter arrangement attributes, date formats, regular expression strings, etc.
[0105] In output layer 230, the optimized abstract syntax tree (i.e., the second abstract syntax tree) is passed to the corresponding target query language generator to generate query statements in the target query language. SQL generator 231 converts the optimized abstract syntax tree into SQL query statements. KQL generator 232 converts the optimized abstract syntax tree into KQL query statements. ES|QL generator 233 converts the optimized abstract syntax tree into ES|QL query statements. AQL generator 234 converts the optimized abstract syntax tree into AQL query statements.
[0106] Finally, the system outputs the converted query statement in the target query language for further use or execution.
[0107] The system shown in Figure 2 achieves conversion between different query languages through the steps of parsing, generating an abstract syntax tree, optimizing, and regenerating. This conversion can be used in scenarios such as data migration and query language compatibility handling.
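The layering of Figure 2 can be pictured as a small dispatch function. The sketch below is hypothetical: the `convert` signature, the toy parser and optimizer, and the single `"demo"` generator are placeholders, not the components 221 to 234 themselves.

```python
from typing import Callable, Dict, List

Ast = List[str]  # stand-in for a real tree: one entry per command


def convert(query: str,
            parse: Callable[[str], Ast],
            optimize: Callable[[Ast, str], Ast],
            generators: Dict[str, Callable[[Ast], str]],
            target: str) -> str:
    """Parse, adapt the tree to the target language, then dispatch to its generator."""
    if target not in generators:
        raise ValueError(f"no generator registered for {target!r}")
    first_ast = parse(query)                  # parser + AST generator
    second_ast = optimize(first_ast, target)  # optimizer
    return generators[target](second_ast)     # output layer


def toy_parse(query: str) -> Ast:
    return [part.strip() for part in query.split("|")]


def toy_optimize(ast: Ast, target: str) -> Ast:
    # A real optimizer would rewrite function names, dates, regexes, and so on.
    return list(ast)


GENERATORS = {"demo": lambda ast: " | ".join(ast)}

print(convert("search x|where y", toy_parse, toy_optimize, GENERATORS, "demo"))
```

Keeping the generators in a registry keyed by target language means a new output language can be added without touching the parsing or optimization stages.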
[0108] For example, Figure 3 is a flowchart illustrating one specific implementation of the method provided in this disclosure.
[0109] As shown in Figure 3, the process includes the following steps: Step S310: Lexical analysis to obtain lexical units.
[0110] After the conversion process begins, the first query statement can be input. By parsing the first query statement, the lexical units can be obtained.
[0111] Step S320: Divide into multiple processing stages.
[0112] After the lexical units are identified, the command delimiters among them can be used as separators to divide the lexical unit sequence into multiple processing stages (i.e., lexical unit groups).
[0113] Step S330: Generate command node.
[0114] After dividing the process into multiple processing stages, the corresponding command parser can be called for each processing stage to generate command node 1, command node 2, etc., and finally a list of nodes can be obtained.
[0115] After generating the list of command nodes, the remaining lexical units of the processing stage can be parsed to perform finer-grained lexical and syntactic analysis, and the corresponding expression subtree can be constructed. The expression subtree is then attached to the command node to form a complete command subtree, and finally the first abstract syntax tree is constructed.
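Steps S310 to S330 can be sketched roughly as below, assuming the pipe symbol is the command delimiter. The token pattern, the `CommandNode` class, and the use of a flat argument list in place of a real expression subtree are simplifications introduced for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Simplified lexer: double-quoted strings, the pipe delimiter, or bare words.
TOKEN_RE = re.compile(r'"[^"]*"|\||[^\s|"]+')


@dataclass
class CommandNode:
    """Hypothetical command node; `args` stands in for the expression subtree."""
    name: str
    args: List[str] = field(default_factory=list)


def tokenize(query: str) -> List[str]:
    """S310: lexical analysis into lexical units."""
    return TOKEN_RE.findall(query)


def split_stages(tokens: List[str]) -> List[List[str]]:
    """S320: split the token sequence into groups at each pipe delimiter."""
    stages: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token == "|":
            if current:
                stages.append(current)
            current = []
        else:
            current.append(token)
    if current:
        stages.append(current)
    return stages


def build_command_nodes(query: str) -> List[CommandNode]:
    """S330: one command node per stage; remaining tokens become its arguments."""
    return [CommandNode(stage[0], stage[1:]) for stage in split_stages(tokenize(query))]


for node in build_command_nodes("search index=web | where status >= 500 | stats count by host"):
    print(node)
```

Because quoted strings are matched before the pipe, a pipe inside a string literal does not start a new stage. A full implementation would parse each argument list into an expression subtree as the paragraph above describes.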
[0116] Step S340: Traverse the node list based on the visitor pattern.
[0117] During the code generation phase, the node list can be traversed based on the visitor pattern, that is, the nodes in the first abstract syntax tree are traversed. For each command node, step S350 (generate code snippet) is executed to generate the code snippet corresponding to that command node. Step S360 (check whether it is the first snippet) then determines whether the snippet corresponds to the first command node. If it does, step S370 (add to the final output code) is executed directly. If it does not, a pipe symbol is inserted first, and step S370 is then executed.
[0118] Finally, all the fragments are pieced together to form a complete second query statement that conforms to the syntax of the target language.
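The traversal in steps S340 to S370 might look like the following sketch. The `CommandNode` and `PipelineGenerator` classes are hypothetical, and the renaming of `stats` to `summarize` is an assumed mapping used only to show a per-command handler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandNode:
    name: str
    args: List[str] = field(default_factory=list)

    def accept(self, visitor):
        # Dispatch to visit_<name> if the visitor defines it, else to a generic handler.
        method = getattr(visitor, f"visit_{self.name}", visitor.visit_default)
        return method(self)


class PipelineGenerator:
    """Hypothetical visitor that emits a pipe-delimited target query."""

    def visit_default(self, node: CommandNode) -> str:
        return " ".join([node.name, *node.args])

    def visit_stats(self, node: CommandNode) -> str:
        # Assumed mapping, for illustration only.
        return " ".join(["summarize", *node.args])

    def generate(self, nodes: List[CommandNode]) -> str:
        output = ""
        for index, node in enumerate(nodes):   # S340: traverse the node list
            snippet = node.accept(self)        # S350: generate the code snippet
            if index > 0:                      # S360: not the first snippet
                output += " | "                # insert the pipe symbol
            output += snippet                  # S370: add to the final output
        return output


nodes = [CommandNode("where", ["status", ">=", "500"]),
         CommandNode("stats", ["count", "by", "host"])]
print(PipelineGenerator().generate(nodes))
```

Putting the language-specific rendering in the visitor, and not in the nodes, lets one tree be rendered by different generators.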
[0119] The solution provided in this disclosure can be applied to secure data querying across data platforms. It allows queries to be initiated against any one or more backend data platforms through a unified interactive entry point. For example, analysts can use their familiar KQL to perform queries, and the system automatically converts these queries into query commands for Splunk or Elasticsearch platforms and executes them, significantly reducing the complexity of data querying in a data platform environment and the manpower requirements.
[0120] The solution provided in this disclosure can also be applied to cross-platform migration of query logic, supporting the automatic conversion of verified query statements in the original data platform into versions adapted to the target data platform, thereby improving development efficiency.
[0121] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0122] According to another embodiment, a query statement conversion apparatus is provided. Figure 4 shows a schematic diagram of the query statement conversion device according to one embodiment. As shown in Figure 4, the query statement conversion device 400 includes: an abstract syntax tree generation module 410, used to generate a first abstract syntax tree based on a first query statement of a first query language; a node conversion module 420, used to convert the target node in the first abstract syntax tree in response to the existence of a pre-specified target node in the first abstract syntax tree, so as to obtain a second abstract syntax tree adapted to the second query language; and a query statement generation module 430, used to generate a second query statement in the second query language based on the second abstract syntax tree.
[0123] As an optional approach, the target node includes a function node. When transforming the target node in the first abstract syntax tree, the node transformation module 420 specifically performs the following: Adjust the function name of the first function corresponding to the function node to the function name of the second function. The first function is a function defined in the first query language, and the second function is a function defined in the second query language. The first function and the second function are functions with the same functionality. Adjust the first parameter of the first function in the function node so that its parameter format and parameter order are adapted to the second parameter of the second function.
[0124] As an optional method, when the node conversion module 420 adjusts the function name of the first function corresponding to the function node to the function name of the second function, it is specifically used for: Based on the function name of the first function corresponding to the function node, determine the normalized function name corresponding to the first function; Based on the normalized function name, determine the corresponding function name of the second function, and adjust the function name of the first function corresponding to the function node to the function name of the second function; Adjust the first parameter of the first function in the function node so that its parameter format and order are adapted to the second parameter of the second function, including: Based on the function name of the second function, determine the parameter format and the parameter order of the second function; Based on the parameter format and parameter order of the second function, the first parameter is adjusted to become the second parameter of the second function.
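The two-stage lookup (first function name to normalized name, then normalized name to second function name) followed by parameter adjustment could be sketched as below. Every table entry, the language labels `lang_a` and `lang_b`, and the quoting rule are invented for illustration and do not describe any real query language.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lookup tables.
NORMALIZED = {("lang_a", "replace"): "string_replace"}
TARGET_NAMES = {("lang_b", "string_replace"): "REPLACE"}
# For each target function: which source argument goes in each target position.
ARG_ORDER = {("lang_b", "REPLACE"): [2, 0, 1]}


@dataclass
class FunctionNode:
    name: str
    args: List[str]


def to_single_quotes(arg: str) -> str:
    """Example format rule: render double-quoted string literals with single quotes."""
    if len(arg) >= 2 and arg[0] == arg[-1] == '"':
        return "'" + arg[1:-1] + "'"
    return arg


def convert_function(node: FunctionNode, source: str, target: str) -> FunctionNode:
    normalized = NORMALIZED[(source, node.name.lower())]  # first name -> normalized name
    target_name = TARGET_NAMES[(target, normalized)]      # normalized -> second name
    order = ARG_ORDER[(target, target_name)]              # parameter order of second function
    args = [to_single_quotes(node.args[i]) for i in order]
    return FunctionNode(target_name, args)


# Assumed source signature: replace(pattern, replacement, text)
print(convert_function(FunctionNode("replace", ['"-"', '"_"', "host"]), "lang_a", "lang_b"))
```

Routing through a normalized name means each language needs only one table to and from the normalized vocabulary, instead of one table per language pair.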
[0125] As an optional approach, the abstract syntax tree generation module 410 is specifically used for: Lexical analysis is performed on the first query statement to obtain lexical units; Identify the command delimiter in the lexical unit and divide the lexical unit into lexical unit groups corresponding to different commands based on the command delimiter; Based on each lexical unit group, command nodes are constructed, and expression subtrees mounted on the command nodes are constructed, thereby generating the first abstract syntax tree.
[0126] As an optional approach, the target node may include a date node or a node containing a date value. When transforming the target node in the first abstract syntax tree, the node transformation module 420 specifically performs the following: Convert the first date format of the date value in the target node to the second date format adapted to the second query language.
[0127] As an optional approach, the target node is a node containing a regular expression string. When the node conversion module 420 converts the target node in the first abstract syntax tree, it specifically performs the following: Determine whether the regular expression string of the target node is supported by the regular expression engine corresponding to the second query language; In response to the regular expression string being supported by the regular expression engine corresponding to the second query language, the regular expression string in the target node is preserved; If the regular expression string is not supported by the regular expression engine corresponding to the second query language, retain the regular expression string in the target node and add the corresponding prompt message.
[0128] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, for apparatus or system embodiments, since they are basically similar to method embodiments, the description is relatively simple, and relevant parts can be referred to the descriptions in the method embodiments. The apparatus and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without creative effort.
[0129] The collection, storage, use, processing, transmission, provision, and disclosure of user personal information involved in the technical solution disclosed herein comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
[0130] According to embodiments of this disclosure, this disclosure also provides an electronic device, a readable storage medium, and a computer program product.
[0131] Figure 5 shows a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and/or claimed herein.
[0132] As shown in Figure 5, device 500 includes a computing unit 501, which can perform various appropriate actions and processes based on a computer program stored in read-only memory (ROM) 502 or a computer program loaded from storage unit 508 into random access memory (RAM) 503. RAM 503 may also store various programs and data required for the operation of device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via bus 504. Input/output (I/O) interface 505 is also connected to bus 504.
[0133] Multiple components in device 500 are connected to I/O interface 505, including: input unit 506, such as keyboard, mouse, etc.; output unit 507, such as various types of monitors, speakers, etc.; storage unit 508, such as disk, optical disk, etc.; and communication unit 509, such as network card, modem, wireless transceiver, etc. Communication unit 509 allows device 500 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0134] The computing unit 501 can be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the query statement conversion method described above. For example, in some embodiments, the query statement conversion method described above can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program can be loaded and/or installed on device 500 via ROM 502 and/or communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of the query statement conversion method described above can be performed. Alternatively, in other embodiments, the computing unit 501 can be configured to perform the query statement conversion method described above by any other suitable means (e.g., by means of firmware).
[0135] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0136] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0137] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0138] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0139] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0140] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact via communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other. Servers can be cloud servers, servers in distributed systems, or servers incorporating blockchain technology.
[0141] It should be understood that steps can be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution disclosed in this disclosure can be achieved, and this is not limited herein.
[0142] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. A query statement transformation method, comprising: Based on the first query statement of the first query language, generate the first abstract syntax tree; In response to the existence of a pre-specified target node in the first abstract syntax tree, the target node in the first abstract syntax tree is transformed to obtain a second abstract syntax tree adapted to the second query language; Based on the second abstract syntax tree, a second query statement in the second query language is generated.
2. The method of claim 1, wherein, The target node includes a function node, and the transformation of the target node in the first abstract syntax tree includes: The function name of the first function corresponding to the function node is adjusted to the function name of the second function. The first function is a function defined in the first query language, and the second function is a function defined in the second query language. The first function and the second function are functions with the same functionality. The first parameter of the first function in the function node is adjusted so that its parameter format and parameter order are adapted to the second parameter of the second function.
3. The method of claim 2, wherein, The step of adjusting the function name of the first function corresponding to the function node to the function name of the second function includes: Based on the function name of the first function corresponding to the function node, determine the normalized function name corresponding to the first function; Based on the normalized function name, determine the corresponding function name of the second function, and adjust the function name of the first function corresponding to the function node to the function name of the second function; The step of adjusting the first parameter of the first function in the function node to adapt its parameter format and parameter order to the second parameter of the second function includes: Based on the function name of the second function, determine the parameter format and the parameter arrangement order of the second function; Based on the parameter format and parameter arrangement order of the second function, the first parameter is adjusted to become the second parameter of the second function.
4. The method of claim 1, wherein, Generating the first abstract syntax tree based on the first query statement of the first query language includes: Lexical analysis is performed on the first query statement to obtain lexical units; Identify the command separator in the lexical unit, and divide the lexical unit into lexical unit groups corresponding to different commands based on the command separator; Based on each of the lexical unit groups, a command node is constructed, and an expression subtree mounted on the command node is constructed, thereby generating the first abstract syntax tree.
5. The method of claim 1, wherein, The target node includes a date node or a node containing a date value, and the transformation of the target node in the first abstract syntax tree includes: The first date format of the date value in the target node is converted into a second date format adapted to the second query language.
6. The method of claim 1, wherein, The target node is a node containing a regular expression string, and the transformation of the target node in the first abstract syntax tree includes: Determine whether the regular expression string of the target node is supported by the regular expression engine corresponding to the second query language; In response to the fact that the regular expression string is supported by the regular expression engine corresponding to the second query language, the regular expression string in the target node is retained; In response to the fact that the regular expression string is not supported by the regular expression engine corresponding to the second query language, the regular expression string in the target node is retained, and corresponding prompt information is added.
7. A query statement conversion device, comprising: The abstract syntax tree generation module is used to generate a first abstract syntax tree based on the first query statement of the first query language; The node conversion module is used to convert the target node in the first abstract syntax tree in response to the existence of a pre-specified target node in the first abstract syntax tree, so as to obtain a second abstract syntax tree adapted to the second query language. The query statement generation module is used to generate a second query statement in the second query language based on the second abstract syntax tree.
8. An electronic device, comprising: At least one processor; and A memory communicatively connected to the at least one processor; wherein, The memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
9. A non-transitory computer readable storage medium having stored thereon computer instructions, wherein, The computer instructions are used to cause the computer to perform the method according to any one of claims 1-6.
10. A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-6.