Method, system and electronic device for accessing newly added fields of a data set

By generating target fields and field linked lists, and using WITH clause expressions to combine access statements, the problem of low expansion efficiency in traditional dataset management methods is solved, and dynamic expansion and efficient querying of dataset fields are realized.

CN119149567B · Active · Publication Date: 2026-06-19 · 广域铭岛数字科技有限公司 +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: 广域铭岛数字科技有限公司
Filing Date: 2024-09-05
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Traditional dataset management methods are struggling to adapt to the rapid growth in business changes and data analysis needs, resulting in low dataset expansion efficiency and impacting work efficiency.

Method used

By generating new target fields and a field linked list, and using WITH clauses to combine access statement fragments, the dataset fields can be expanded dynamically without disabling the dataset, reducing the number of expansion steps.

Benefits of technology

It simplifies the data architecture, improves the efficiency of dataset expansion, makes query statements more concise and efficient, and reduces expansion steps.

✦ Generated by Eureka AI based on patent content.

[Figure CN119149567B_ABST]

Abstract

This application relates to the field of dataset technology and discloses a method, system, and electronic device for accessing new fields in a dataset. On the server side, the application uses the original fields and existing target fields as parent fields, generates new target fields according to field generation rules, and builds a field linked list based on the field generation rules between the target fields. The access statement fragments corresponding to the target fields are then combined using WITH clauses according to the field linked list to obtain a field access statement, through which the user terminal accesses the target fields. Organizing the target fields in a field linked list keeps the number of database objects within a certain limit and greatly simplifies the data architecture, and fields can be expanded without disabling the dataset, which makes query statements more concise and efficient. The steps required for dataset expansion are therefore reduced in both respects, improving the efficiency of dataset expansion.

Description

Technical Field

[0001] This invention relates to the field of dataset technology, and in particular to a method, system, and electronic device for accessing new fields in a dataset.

Background Technology

[0002] In the current wave of digital transformation, the rapid development of information technology and the widespread application of big data have greatly boosted enterprise decision-making efficiency and business innovation capabilities. As a core production factor, the efficient management and flexible application of data have become crucial to enterprise competitiveness. However, with increasing business complexity and the growing diversity of data analysis needs, traditional dataset management methods are gradually revealing their limitations.

[0003] Traditional dataset architectures are often based on static design, meaning the field structure and data model are fixed from the outset, making it difficult to adapt to subsequent business changes and the rapid growth of data analysis needs. When projects are upgraded or business requirements change, expanding dataset fields typically involves a cumbersome process: first, the dataset must be paused to avoid data inconsistencies; then, developers need to delve into the underlying layers, modifying the dataset's SQL queries, and even uploading new data source files. This process is not only time-consuming and labor-intensive but can also lead to service interruptions, impacting business continuity. To alleviate this problem, some enterprises have attempted to build complex data warehouse systems, using multiple database tables or views for hierarchical data storage and logical processing to achieve dynamic dataset expansion. However, while this approach improves dataset flexibility to some extent, adding new database tables or views not only increases system complexity and maintenance costs but also requires fine-grained management of data permissions to ensure data security and compliance. This makes the dataset expansion process cumbersome and error-prone, and the dataset's utilization efficiency and response speed remain low.

[0004] Therefore, existing dataset expansion methods involve cumbersome steps, resulting in low dataset expansion efficiency and consequently affecting work efficiency.

Summary of the Invention

[0005] To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not intended as a general commentary, nor is it intended to identify key / important components or describe the scope of protection of these embodiments, but rather as a prelude to the detailed description that follows.

[0006] In view of the shortcomings of the prior art described above, this application discloses a method, system and electronic device for accessing new fields in a dataset, so as to improve the expansion efficiency of the dataset.

[0007] This application discloses a method for accessing newly added fields in a dataset, applied on a server-side platform connected to a user terminal. The method includes: acquiring an original dataset, the original dataset including original fields; in response to the field generation rules corresponding to the original dataset, generating new target fields from the original fields and / or existing target fields according to the field generation rules; establishing a linked list data structure based on the field generation rules between the target fields to obtain a field linked list composed of the target fields, wherein the field linked list stores access statement fragments corresponding to the target fields; and combining the access statement fragments using WITH clauses according to the field linked list to obtain a field access statement, wherein the user terminal uses the field access statement to access the target fields.

[0008] In one embodiment of this application, generating a new target field according to the original field and / or an existing target field according to the field generation rule includes at least one of the following: if the rule type corresponding to the field generation rule is a calculated column type, then the field generation rule includes a field calculation statement and a field type; the field calculation statement is executed according to the original field and / or the existing target field to obtain a calculation result; the calculation result is verified according to the field generation rule; after the verification passes, the calculation result is stored according to the field type to obtain the target field; wherein the field calculation statement is generated based on an SQL statement; if the rule type corresponding to the field generation rule is a summary column type, then the field generation rule includes a window function; the window function is executed according to the original field and / or the existing target field to obtain the target field output by the window function.
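As a non-limiting illustration of the summary column type, the following Python sketch evaluates a window function over an original field using the standard-library sqlite3 module. The table name `orders`, its columns, and the fragment text are hypothetical and are not taken from this application; window functions also assume an SQLite build of version 3.25 or later.

```python
import sqlite3

# Hypothetical original dataset; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)

# Summary-column rule: a window function evaluated over an original field.
# Each row is kept and gains the per-region total as a new target field.
summary_fragment = "SUM(amount) OVER (PARTITION BY region) AS region_total"

rows = conn.execute(
    f"SELECT region, amount, {summary_fragment} "
    "FROM orders ORDER BY region, amount"
).fetchall()

for row in rows:
    print(row)
```

Unlike an aggregate function with GROUP BY, the window function leaves the row count unchanged, so the resulting field can sit alongside the original fields.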

[0009] In one embodiment of this application, a linked list data structure is established according to the field generation rules between target fields to obtain a field linked list composed of the target fields, including: obtaining one or more business projects; classifying the target fields according to the business projects to obtain global fields and private fields, wherein the private fields include project fields corresponding to each business project; establishing a linked list data structure according to the field generation rules between global fields to obtain a global field chain, and establishing a linked list data structure according to the field generation rules between project fields to obtain project field chains corresponding to each business project; linking the global field chains to each project field chain according to the field generation rules to obtain a complete field chain.
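The chain construction described above may be sketched as follows. This is only one possible reading: the class names `ChainNode` and `CompleteFieldChain`, the field names, and the choice to keep one chain head per business project are assumptions made for illustration, and the input list is assumed to be already sorted in generation order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)
class ChainNode:
    name: str
    project: Optional[str]               # None marks a global field
    next: Optional["ChainNode"] = None


def build_chain(nodes: List[ChainNode]) -> Optional[ChainNode]:
    """Link nodes in generation order and return the head of the chain."""
    for current, following in zip(nodes, nodes[1:]):
        current.next = following
    return nodes[0] if nodes else None


def chain_names(head: Optional[ChainNode]) -> List[str]:
    """Walk a chain from its head and collect the field names."""
    names = []
    while head is not None:
        names.append(head.name)
        head = head.next
    return names


@dataclass
class CompleteFieldChain:
    global_head: Optional[ChainNode]
    project_heads: Dict[str, Optional[ChainNode]]

    def fields_for(self, project: str) -> List[str]:
        # The global chain comes first, followed by the project's own chain.
        return chain_names(self.global_head) + chain_names(
            self.project_heads.get(project)
        )


def build_complete_chain(fields: List[ChainNode]) -> CompleteFieldChain:
    global_nodes = [f for f in fields if f.project is None]
    per_project: Dict[str, List[ChainNode]] = {}
    for f in fields:
        if f.project is not None:
            per_project.setdefault(f.project, []).append(f)
    heads = {p: build_chain(nodes) for p, nodes in per_project.items()}
    return CompleteFieldChain(build_chain(global_nodes), heads)


# Hypothetical target fields for two business projects.
complete = build_complete_chain([
    ChainNode("net_price", None),
    ChainNode("margin", None),
    ChainNode("a_bonus", "project_a"),
    ChainNode("b_quota", "project_b"),
])
print(complete.fields_for("project_a"))
```

Keeping a separate head per project lets one global chain be shared by every project chain without any node needing more than one successor.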

[0010] In one embodiment of this application, the target fields are classified according to the business projects to obtain global fields and private fields, including: pre-setting usage probability information corresponding to each of the original fields, wherein the usage probability information includes the field usage probability corresponding to each of the business projects; calculating the usage probability information corresponding to the second field based on the usage probability information corresponding to the first field, wherein the first field is the parent field corresponding to the second field, and the second field is any target field; arranging the field usage probabilities corresponding to the second field in descending order, and determining a first priority probability and a second priority probability; if the difference between the first priority probability and the second priority probability is greater than or equal to a preset threshold, then the second field is determined as a project field corresponding to a pending project, wherein the pending project is the business project corresponding to the first priority probability; if the difference between the first priority probability and the second priority probability is less than the preset threshold, then the second field is determined as a global field.
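A minimal sketch of this classification is given below. The application does not state how a field's usage probabilities are derived from those of its parent fields, so averaging over the parents is an assumption made here for illustration; the function names, project names, and threshold value are likewise hypothetical.

```python
from typing import Dict, List, Optional

Probabilities = Dict[str, float]   # business project -> field usage probability


def inherit_probabilities(parents: List[Probabilities]) -> Probabilities:
    """Assumed rule: average the parents' usage probabilities per project."""
    projects = set().union(*parents)
    return {
        p: sum(parent.get(p, 0.0) for parent in parents) / len(parents)
        for p in projects
    }


def classify_field(probabilities: Probabilities, threshold: float) -> Optional[str]:
    """Return the owning project for a project field, or None for a global field."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    top_project, first = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    # A clear gap between the two highest probabilities marks a project field.
    if first - second >= threshold:
        return top_project
    return None


# Hypothetical parent fields with preset usage probabilities.
child = inherit_probabilities([
    {"project_a": 0.75, "project_b": 0.25},
    {"project_a": 1.0, "project_b": 0.0},
])
print(classify_field(child, threshold=0.5))                                  # project field
print(classify_field({"project_a": 0.5, "project_b": 0.5}, threshold=0.5))   # global field
```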

[0011] In one embodiment of this application, the access statement fragments are combined using WITH clauses according to the field linked list to obtain a field access statement, including: obtaining a target project, wherein the target project is any business project; determining the current field chain corresponding to the target project from each of the project field chains based on the matching result between the target project and the business project, and combining the global field chain and the current field chain to obtain a field chain to be accessed; and sequentially combining the access statement fragments in the field chain to be accessed using WITH clauses according to the link relationship between each target field in the field chain to be accessed, and generating a query main clause based on the access statement fragments to obtain a field access statement composed of the access statement fragments and the query main clause.

[0012] In one embodiment of this application, the access statement fragments in the chain of fields to be accessed are sequentially combined using the WITH clause expression, including: if the target field does not belong to an aggregate field, then the access statement fragments corresponding to the target field are combined; if the target field belongs to an aggregate field, then when generating the main query clause, the access statement fragments corresponding to the aggregate field are combined in the main query clause, wherein the field generation rules corresponding to the aggregate field include aggregate functions.
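The distinction between aggregate and non-aggregate fields can be illustrated with the following sketch, in which non-aggregate fragments become successive WITH entries and aggregate fragments are placed in the main query clause. The function `build_statement`, the `step_N` naming, and the `sales` table are hypothetical; the sketch also assumes that no later field uses an aggregate field as its parent.

```python
import sqlite3
from typing import List, Tuple

# (field name, SQL expression, contains an aggregate function?)
Field = Tuple[str, str, bool]


def build_statement(source: str, fields: List[Field]) -> str:
    ctes, aggregates, previous = [], [], source
    for index, (name, expression, is_aggregate) in enumerate(fields):
        if is_aggregate:
            # Aggregates collapse rows, so they are deferred to the main clause.
            aggregates.append(f"{expression} AS {name}")
            continue
        cte = f"step_{index}"
        ctes.append(
            f"{cte} AS (SELECT *, {expression} AS {name} FROM {previous})"
        )
        previous = cte
    select_list = ", ".join(aggregates) if aggregates else "*"
    main_clause = f"SELECT {select_list} FROM {previous}"
    if not ctes:
        return main_clause
    return f"WITH {', '.join(ctes)} {main_clause}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL, qty INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(2.0, 3), (4.0, 1)])

statement = build_statement("sales", [
    ("total", "price * qty", False),
    ("discounted", "total * 0.5", False),
    ("revenue", "SUM(discounted)", True),
])
revenue = conn.execute(statement).fetchone()[0]
print(statement)
print(revenue)
```

Placing the aggregate in the main clause keeps every WITH entry row-preserving, so each entry can still select all columns of the one before it.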

[0013] In one embodiment of this application, obtaining an original dataset includes: responding to a data source connection rule, the data source connection rule including a data source type and / or data source connection information, connecting to an original data source according to the data source connection information; responding to a dataset generation rule, the dataset generation rule including data source matching information, data collection information, and field storage information, matching from each of the original data sources according to the data source matching information to obtain a target data source, wherein the data source matching information includes a dataset type and / or a data source identifier; collecting data from the target data source according to the data collection information to obtain a target data column; storing the target data column according to the field storage information to obtain original fields, and generating an original dataset corresponding to the target data source according to the original fields, wherein the field storage information includes a field type and / or a field name.

[0014] In one embodiment of this application, the method further includes at least one of the following: receiving a data source connection rule sent by the user terminal, wherein the user terminal is configured to display a data source interface, allowing a user to input the data source connection rule through the data source interface; receiving a dataset generation rule sent by the user terminal, wherein the user terminal is configured to display a dataset interface, allowing a user to input the dataset generation rule through the dataset interface; and receiving a field generation rule sent by the user terminal, wherein the user terminal is configured to display a field interface, allowing a user to input the field generation rule through the field interface.

[0015] This application discloses a system for accessing new fields in a dataset, comprising: a server for acquiring an original dataset, the original dataset including original fields; generating new target fields from the original fields and / or existing target fields according to the field generation rules corresponding to the original dataset; establishing a linked list data structure according to the field generation rules between the target fields to obtain a field linked list composed of the target fields, wherein the field linked list stores access statement fragments corresponding to the target fields; combining the access statement fragments in WITH clause form according to the field linked list to obtain a field access statement; and a user terminal connected to the server, the user terminal being used to access the target fields through the field access statement.

[0016] This application discloses an electronic device, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to cause the electronic device to perform the above-described method.

[0017] The beneficial effects of this application are:

[0018] On the server side, using the original fields and existing target fields as parent fields, new target fields are generated according to field generation rules. A field linked list is then created based on the field generation rules between the target fields. Access statement fragments corresponding to the target fields are combined using WITH clauses according to this linked list to obtain the field access statement, through which the user terminal accesses the target fields. Organizing the target fields in a field linked list keeps the number of database objects at a single level, in contrast to creating multiple database tables or views for field expansion, which significantly simplifies the data architecture. Fields can also be expanded without disabling the dataset, making queries more concise and efficient. The steps involved in dataset expansion are therefore reduced in both respects, improving the efficiency of dataset expansion.

Attached Figure Description

[0019] Figure 1 is a schematic diagram of the structure of an application environment implementing a method for accessing new fields in a dataset in an embodiment of this application;

[0020] Figure 2 is a flowchart illustrating a method for accessing newly added fields in a dataset in an embodiment of this application;

[0021] Figure 3 is a flowchart illustrating a method for obtaining data source connection rules in an embodiment of this application;

[0022] Figure 4 is a schematic diagram of the structure of a data source interface in an embodiment of this application;

[0023] Figure 5 is a flowchart illustrating a method for obtaining dataset generation rules in an embodiment of this application;

[0024] Figure 6 is a schematic diagram of the structure of a dataset interface in an embodiment of this application;

[0025] Figure 7 is a flowchart illustrating a method for obtaining a field generation rule in an embodiment of this application;

[0026] Figure 8 is a schematic diagram of the structure of a field interface in an embodiment of this application;

[0027] Figure 9 is a schematic diagram of the structure of another field interface in an embodiment of this application;

[0028] Figure 10 is a schematic diagram of the structure of a complete field chain in an embodiment of this application;

[0029] Figure 11 is a schematic diagram of the structure of a new field access system for a dataset in an embodiment of the present invention;

[0030] Figure 12 is a schematic diagram of the structure of an electronic device in an embodiment of the present invention.

Detailed Implementation

[0031] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, unless otherwise specified, the following embodiments and sub-samples in the embodiments can be combined with each other.

[0032] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0033] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0034] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate for the embodiments of this disclosure described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion.

[0035] Unless otherwise stated, the term "multiple" means two or more.

[0036] In this embodiment of the disclosure, the character " / " indicates that the objects before and after it are in an "or" relationship. For example, A / B means: A or B.

[0037] The term "and / or" describes an association between objects, indicating that three relationships can exist. For example, A and / or B means: A or B, or A and B.

[0038] As shown in Figure 1, this disclosure provides an application environment for implementing a method for accessing new fields in a dataset, including a server and a user terminal, wherein the server is connected to the user terminal via a network.

[0039] The server side is used for at least one of the following: obtaining the original dataset, which includes original fields; generating new target fields from the original fields and / or existing target fields according to the field generation rules corresponding to the original dataset; establishing a linked list data structure according to the field generation rules between the target fields to obtain a field linked list composed of the target fields, wherein the field linked list stores access statement fragments corresponding to the target fields; and combining the access statement fragments in WITH clause form according to the field linked list to obtain the field access statement.

[0040] The user terminal is used to access the target field through field access statements.

[0041] As shown in Figure 2, this disclosure provides a method for accessing newly added fields in a dataset, applied on a server side, with the server side connected to a user terminal. The method includes:

[0042] Step S201: Obtain the original dataset;

[0043] The original dataset includes the original fields;

[0044] Step S202: In response to the field generation rules corresponding to the original dataset, generate new target fields according to the field generation rules based on the original fields and / or existing target fields;

[0045] Step S203: Establish a linked list data structure according to the field generation rules between each target field to obtain a field linked list composed of each target field;

[0046] The field linked list stores the access statement fragment corresponding to the target field, the field name corresponding to the target field, the field category, the actual field type, the field type specified by the user, whether it contains aggregate functions, the preceding field, and the following field.

[0047] The access statement fragment corresponding to the target field is an expression of an SQL (Structured Query Language) statement;

[0048] Step S204: According to the field linked list, combine the access statement fragments in the WITH clause expression to obtain the field access statement;

[0049] The WITH clause is a way to define temporary result sets in an SQL query. It allows developers to create in-memory result sets that can be used multiple times in subsequent queries, thereby simplifying complex query logic and improving code readability and maintainability.

[0050] The user terminal is used to access the target field through a field access statement.
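Steps S203 and S204 may be sketched as follows, purely as an illustration under stated assumptions. The `FieldNode` class mirrors the attributes listed for the field linked list, but its attribute names, the `with_<field>` naming of WITH entries, and the `readings` table are hypothetical; aggregate fields are not handled in this sketch.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class FieldNode:
    name: str                     # field name
    fragment: str                 # access statement fragment (SQL expression)
    category: str                 # field category, e.g. "calculated"
    actual_type: str              # actual field type
    declared_type: str            # field type specified by the user
    has_aggregate: bool = False   # whether the fragment contains aggregates
    prev: Optional["FieldNode"] = field(default=None, repr=False)
    next: Optional["FieldNode"] = field(default=None, repr=False)


def link(*nodes: FieldNode) -> FieldNode:
    """Doubly link the nodes in generation order and return the head."""
    for current, following in zip(nodes, nodes[1:]):
        current.next, following.prev = following, current
    return nodes[0]


def build_access_statement(head: FieldNode, source: str) -> str:
    """Walk the list and wrap each fragment in its own WITH entry."""
    ctes, previous, node = [], source, head
    while node is not None:
        cte = f"with_{node.name}"
        ctes.append(
            f"{cte} AS (SELECT *, {node.fragment} AS {node.name} FROM {previous})"
        )
        previous, node = cte, node.next
    return f"WITH {', '.join(ctes)} SELECT * FROM {previous}"


fahrenheit = FieldNode("fahrenheit", "celsius * 9.0 / 5.0 + 32.0",
                       "calculated", "REAL", "number")
is_hot = FieldNode("is_hot", "fahrenheit > 86.0",
                   "calculated", "INTEGER", "number")
statement = build_access_statement(link(fahrenheit, is_hot), "readings")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (celsius REAL)")
conn.executemany("INSERT INTO readings VALUES (?)", [(20.0,), (35.0,)])
rows = sorted(conn.execute(statement).fetchall())
print(statement)
print(rows)
```

Because each WITH entry selects from the one before it, a field such as `is_hot` can refer to its parent field `fahrenheit` by name without any new table or view being created.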

[0051] The method for adding fields to a dataset provided in this disclosure involves generating new target fields on the server side using the original fields and existing target fields as parent fields, according to field generation rules. A field linked list is then generated based on the field generation rules between the target fields. Access statement fragments corresponding to the target fields are combined using WITH clauses according to the field linked list to obtain field access statements. These field access statements provide access services to the target fields to the user terminal. This approach not only organizes the target fields using a field linked list, simplifying the data architecture by limiting the number of database objects to a single set compared to creating multiple database tables or views for field expansion, but also allows for field expansion without disabling the dataset, making query statements more concise and efficient. This reduces the steps required for dataset expansion in two ways, improving the efficiency of dataset expansion.

[0052] In some embodiments, this disclosure provides a method for accessing new fields in a dataset, comprising: obtaining an original dataset, wherein the original dataset includes original fields; in response to a field generation rule corresponding to the original dataset, determining a parent field from the basic fields, and generating a target field based on the parent field according to the field generation rule, adding the target field to the basic fields, wherein the basic fields also include the original fields; generating field reference relationships between the target fields according to the field generation rule, so as to establish a linked list data structure based on the field reference relationships, thereby obtaining a field linked list composed of each target field; and generating a field access statement containing a WITH clause according to the field reference relationships, so that a user terminal can access the target field through the field access statement.

[0053] Optionally, obtaining the original dataset includes: in response to data source connection rules, which include data source type and / or data source connection information, connecting to the original data source based on the data source connection information; in response to dataset generation rules, which include data source matching information, data collection information, and field storage information, matching from each original data source based on the data source matching information to obtain the target data source, wherein the data source matching information includes dataset type and / or data source identifier; collecting data from the target data source based on the data collection information to obtain the target data column; storing the target data column based on the field storage information to obtain the original fields, and generating the original dataset corresponding to the target data source based on the original fields, wherein the field storage information includes field type and / or field name.

[0054] In some embodiments, the original data source is the device or original medium that provides the required data, and it stores all the information needed to establish a database connection. Just as a file can be found in a file system by specifying a file name, the corresponding database connection can be looked up by providing the correct data source name. Data sources are commonly used in data analysis, data integration, business intelligence (BI), and other data-driven applications.

[0055] In some embodiments, the original data source includes relational database management systems (RDBMS), NoSQL (Not Only SQL) databases, data warehouses, file storage, application programming interfaces (APIs), sensors and Internet of Things (IoT) devices, spreadsheets, etc.

[0056] In some embodiments, relational databases include MySQL, PostgreSQL, Oracle, and SQL Server, which use tables to store structured data and support SQL for data querying and manipulation; NoSQL databases include MongoDB, Cassandra, and Couchbase, which are used to store unstructured or semi-structured data and support highly scalable and flexible data models; data warehouses include Amazon Redshift, Google BigQuery, and Snowflake, which are used to store and analyze large amounts of historical data, optimize query performance, and support complex data analysis and report generation; file storage includes CSV (Comma-Separated Values), JSON (JavaScript Object Notation), XML (Extensible Markup Language), Parquet, and other file formats, which are stored on local file systems, cloud storage (such as Amazon S3 and Google Cloud Storage), or distributed file systems (such as Hadoop HDFS); real-time data streaming includes Apache Kafka, Apache Flink, and Apache Spark Streaming, which are used to process and analyze data streams generated in real time and are suitable for applications requiring real-time decision-making and response; application programming interfaces (APIs) provide programmatic access to data from external systems or services; sensors and IoT devices, such as temperature sensors and smart home devices, generate and transmit real-time data; and spreadsheets include Microsoft Excel and Google Sheets, which are used for small-scale data storage and basic data analysis.

[0057] In some embodiments, an original dataset refers to a set of organized data, typically existing in the form of tables, files, databases, or other structured and unstructured formats. One dataset usually corresponds to one data source, while one data source can correspond to multiple datasets. An original dataset contains multiple data records or entries, each consisting of one or more fields (columns). Datasets can originate from various data sources and are used for applications such as data analysis, machine learning, and data mining. Depending on the type of data source, there will be corresponding types of datasets, such as database types, file types, and API types. For ease of understanding, a data source can be simply compared to a database, and a dataset to a table.

[0058] Optionally, the method further includes: receiving data source connection rules sent by a user terminal, wherein the user terminal is used to display a data source interface, allowing the user to input data source connection rules through the data source interface.

[0059] As shown in Figure 3, this disclosure provides a method for obtaining data source connection rules, including:

[0060] Step S301: Display the data source interface to the user;

[0061] At least a part of the data source interface is shown in Figure 4;

[0062] Step S302: The user fills in the data source type corresponding to the original data source through the data source interface;

[0063] The data sources include relational databases, NoSQL databases, data warehouses, file storage, application programming interfaces, sensors and IoT devices, spreadsheets, etc.

[0064] Step S303: The user fills in the data source connection information according to the data source type through the data source interface;

[0065] Depending on the data source type, different data source connection information needs to be filled in to identify and connect to the corresponding original data source;

[0066] If the data source type is a relational database, the data source connection information includes the database type, host address, port, database name, account and password, etc.

[0067] Step S304: Test the connectivity of the original data source based on the data source type and data source connection information;

[0068] Step S305: If the test passes, generate data source connection rules based on the data source type and data source connection information.
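Steps S304 and S305 can be sketched as below: the connection rule is generated only after the connectivity test passes. This is an illustrative sketch in which an in-memory SQLite database stands in for a relational data source; the class `DataSourceConnectionRule`, the function names, and the `database` key are assumptions rather than part of this application.

```python
import sqlite3
from contextlib import closing
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DataSourceConnectionRule:
    source_type: str
    connection_info: Dict[str, str]


def check_connectivity(source_type: str, connection_info: Dict[str, str]) -> bool:
    """Step S304: try to reach the data source. Only 'sqlite' is handled here."""
    if source_type != "sqlite":
        return False
    try:
        with closing(sqlite3.connect(connection_info["database"])) as conn:
            conn.execute("SELECT 1")
        return True
    except (sqlite3.Error, KeyError):
        return False


def create_connection_rule(
    source_type: str, connection_info: Dict[str, str]
) -> Optional[DataSourceConnectionRule]:
    """Step S305: generate the rule only if the connectivity test passed."""
    if check_connectivity(source_type, connection_info):
        return DataSourceConnectionRule(source_type, dict(connection_info))
    return None


rule = create_connection_rule("sqlite", {"database": ":memory:"})
rejected = create_connection_rule("sqlite", {})   # connection info missing
print(rule, rejected)
```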

[0069] Optionally, the method further includes: receiving dataset generation rules sent by a user terminal, wherein the user terminal is used to display a dataset interface, allowing the user to input dataset generation rules through the dataset interface.

[0070] As shown in Figure 5, an embodiment of this disclosure provides a method for obtaining dataset generation rules, including:

[0071] Step S501: Display the dataset interface to the user;

[0072] At least a portion of the dataset interface is shown in Figure 6;

[0073] Step S502: The user fills in the dataset type through the dataset interface;

[0074] The dataset types include database types, file types, Kafka types, or API types, etc.

[0075] The dataset type must be consistent with the data source type of the target data source;

[0076] Step S503: The user selects the target data source from the original data sources through the dataset interface;

[0077] The target data source is determined by identifier matching based on the user's selection;

[0078] Step S504: The user fills in the data collection information through the dataset interface;

[0079] Depending on the different dataset types, corresponding data collection information needs to be filled in;

[0080] If the dataset type is a database, the data collection information includes SQL statements, etc.

[0081] If the dataset type is a file type, the data collection information includes Excel files, CSV files, etc.

[0082] If the dataset type is Kafka, the data collection information includes the Topic, etc.

[0083] If the dataset type is API, the data collection information includes interface path information, etc.

[0084] Step S505: The user fills in the field storage information through the dataset interface;

[0085] The field storage information includes field type and / or field name, and the field type includes characters, numbers, or time, etc.
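The information collected in steps S502 to S505 could be held in a small record and checked before it is saved. The sketch below is an assumption-laden illustration: the class `DatasetRule`, the function `validate_rule`, and the key names (`sql`, `path`, `topic`, `interface_path`) are hypothetical and only mirror the items listed in [0074] to [0085].

```python
from dataclasses import dataclass, field

# Required data collection keys per dataset type, following [0080]-[0083].
# The key names are illustrative and do not come from the source.
REQUIRED_COLLECTION_KEYS = {
    "database": {"sql"},
    "file": {"path"},
    "kafka": {"topic"},
    "api": {"interface_path"},
}
FIELD_TYPES = {"character", "number", "time"}


@dataclass
class DatasetRule:
    dataset_type: str
    source_id: str                 # identifier used to match the target data source
    collection: dict               # data collection information
    fields: list = field(default_factory=list)  # (field_name, field_type) pairs


def validate_rule(rule: DatasetRule, source_type: str) -> list:
    """Return a list of problems; an empty list means the rule is acceptable."""
    if rule.dataset_type not in REQUIRED_COLLECTION_KEYS:
        return [f"unknown dataset type: {rule.dataset_type}"]
    problems = []
    # [0075]: the dataset type must agree with the data source type.
    if rule.dataset_type != source_type:
        problems.append("dataset type must match the data source type")
    missing = REQUIRED_COLLECTION_KEYS[rule.dataset_type] - set(rule.collection)
    for key in sorted(missing):
        problems.append(f"missing data collection info: {key}")
    for name, ftype in rule.fields:
        if ftype not in FIELD_TYPES:
            problems.append(f"field {name}: unsupported type {ftype}")
    return problems
```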

[0086] Optionally, the method further includes: receiving field generation rules sent by a user terminal, wherein the user terminal is used to display a field interface, allowing the user to input field generation rules through the field interface.

[0087] In some embodiments, the field name of the target field is filled in through the field interface. In the field access statement, this name appears as the characters produced by MD5 processing, so that a user-entered field name cannot violate the query engine's naming specifications.
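A minimal sketch of the MD5 processing of field names, assuming UTF-8 encoding and a hexadecimal digest. The function name `field_alias` and the `f_` prefix are assumptions of this sketch; the prefix is added only because a bare hexadecimal digest can start with a digit, which many query engines reject in an unquoted identifier.

```python
import hashlib


def field_alias(display_name: str, prefix: str = "f_") -> str:
    """Map a user-entered field name to an identifier made of [0-9a-f_] only."""
    digest = hashlib.md5(display_name.encode("utf-8")).hexdigest()
    return prefix + digest
```

The same display name always yields the same alias, so the alias can be stored with the field generation rule and mapped back to the display name when results are shown to the user.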

[0088] Optionally, a new target field is generated according to the original field and / or the existing target field according to the field generation rule, including: if the rule type corresponding to the field generation rule is a calculated column type, then the field generation rule includes a field calculation statement and a field type; the field calculation statement is executed according to the original field and / or the existing target field to obtain the calculation result; the calculation result is validated according to the field generation rule; after the validation passes, the calculation result is stored according to the field type to obtain the target field, wherein the field calculation statement is generated based on the SQL statement.

[0089] In some embodiments, the field generation rules corresponding to the calculated column type generate the target field from user-written SQL fragments. Compared with the summary column type, the calculated column type supports any SQL statement, including aggregate functions, and is therefore more flexible.

[0090] In some embodiments, the field generation rules corresponding to the calculated column type support the character, number, and time types. After writing the SQL statement, the user can select the type of the newly added field as required. If "automatic" is selected, the program derives the field type from the execution result of the SQL statement. Otherwise, the program checks whether the actual type of the execution result is consistent with the type selected by the user and, if they are inconsistent, performs a forced type conversion.

[0091] In some embodiments, due to the high flexibility of SQL statements, the field generation rules corresponding to the calculated column type can be tested manually or automatically to verify whether the written SQL statement conforms to the specifications and can be executed correctly, and the test will return the actual type of the field.
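The trial calculation and type handling described for the calculated column type can be illustrated with SQLite from the Python standard library. This is a sketch under stated assumptions: the function `trial_calculate`, the type mapping, and the CAST targets are hypothetical, SQLite has no dedicated time type (so "time" is cast to TEXT here), and the user-written expression is interpolated into the statement as-is, which is only acceptable where, as in the source, the user is allowed to write SQL.

```python
import sqlite3

# Map Python result types to the field types named in [0090].
_PY_TO_FIELD = {int: "number", float: "number", str: "character"}
# CAST targets used for forced conversion (an assumption of this sketch).
_CAST_TARGET = {"number": "REAL", "character": "TEXT", "time": "TEXT"}


def trial_calculate(conn, table, expression, declared_type="auto"):
    """Run the expression on one row and return (final_expression, field_type)."""
    try:
        row = conn.execute(f"SELECT {expression} FROM {table} LIMIT 1").fetchone()
    except sqlite3.Error as exc:
        # The SQL fragment does not execute: the trial calculation fails.
        raise ValueError(f"trial calculation failed: {exc}") from exc
    if row is None:
        raise ValueError("trial calculation needs at least one sample row")
    actual = _PY_TO_FIELD.get(type(row[0]), "character")
    if declared_type in ("auto", actual):
        return expression, actual
    # Declared and actual types differ: wrap the expression in a forced cast.
    return f"CAST({expression} AS {_CAST_TARGET[declared_type]})", declared_type
```

With "auto" the detected type is returned unchanged; with an explicit type that differs from the detected one, the returned expression carries the cast, so later statements can use it directly.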

[0092] Optionally, a new target field is generated based on the original field and / or the existing target field according to the field generation rule, including: if the rule type corresponding to the field generation rule is a summary column type, then the field generation rule includes a window function, and the window function is executed based on the original field and / or the existing target field to obtain the target field output by the window function.

[0093] In some embodiments, the field generation rules corresponding to the summary column type generate the target field through the window function's dimension, measure, sorting field, and summary method.

[0094] In some embodiments, the field generation rules corresponding to the summary column type use window functions to generate target data, which can only be numeric type, so the summary column does not need to specify the type.

[0095] In some embodiments, the field generation rules corresponding to the summary column type are strictly limited according to the window function specification and cannot be modified by the user at will. Therefore, there is no need to perform trial calculations.
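A summary column built from the four inputs (dimension, measure, sorting field, summary method) might translate into a window-function expression as sketched below. The function `summary_column_sql`, the list of allowed methods, and the sample table are assumptions; the source does not enumerate the summary methods. The example runs on SQLite, which supports window functions from version 3.25.

```python
import sqlite3

# Summary methods accepted by this sketch; the source does not list them.
ALLOWED_METHODS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def summary_column_sql(dimension, measure, sort_field, method):
    """Build a window-function expression from the four summary-column inputs."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported summary method: {method}")
    return (f"{method}({measure}) OVER "
            f"(PARTITION BY {dimension} ORDER BY {sort_field})")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("east", 1, 10.0), ("east", 2, 5.0), ("west", 1, 7.0)])

expr = summary_column_sql("region", "amount", "day", "sum")
rows = conn.execute(
    f"SELECT region, day, {expr} AS running_total "
    "FROM sales ORDER BY region, day"
).fetchall()
# One output row per input row: the window function adds a column
# without changing the number of rows in the dataset.
```

Because the user only picks from fixed inputs rather than writing free SQL, the generated expression is constrained by construction, which is consistent with the statement that summary columns need no trial calculation.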

[0096] As shown in Figure 7, an embodiment of this disclosure provides a method for obtaining field generation rules, including:

[0097] Step S701: Display the field interface to the user;

[0098] The field interface corresponding to the calculated column type is shown in Figure 8, and the field interface corresponding to the summary column type is shown in Figure 9;

[0099] Step S702: The user selects the rule type through the field interface;

[0100] The rule types include calculated column types or summary column types;

[0101] Step S703: The user fills in the field name of the target field through the field interface, and then proceeds to steps S704 and S70.

[0102] Step S704: If the rule type is a calculated column type, the user writes an SQL statement through the field interface and fills in the field type corresponding to the target field.

[0103] Step S705: Perform a trial calculation based on the entered SQL statement and field type; if the trial calculation passes, proceed to step S707.

[0104] Step S706: If the rule type is a summary column type, the user fills in the dimension, measure, sorting field and summary method of the window function through the field interface, and then proceeds to step S707.

[0105] Step S707: Save the field generation rules;

[0106] Step S708: Display the target data to the user through the field interface.

[0107] Optionally, the method further includes: establishing a linked list data structure based on the field generation rules between each target field to obtain a field linked list composed of each target field, including: obtaining one or more business projects; classifying each target field according to the business project to obtain global fields and private fields, wherein the private fields include the project fields corresponding to each business project; establishing a linked list data structure based on the field generation rules between each global field to obtain a global field chain, and establishing a linked list data structure based on the field generation rules between each project field to obtain the project field chain corresponding to each business project; linking the global field chain to each project field chain according to the field generation rules to obtain a complete field chain.

[0108] As shown in Figure 10, an embodiment of this disclosure provides a complete field chain, which includes a global field chain, project field chain 1, project field chain 2, and project field chain 3. The global field chain includes global field A, global field B, and global field C. Project field chain 1 includes project field E and project field F. Project field chain 2 includes project field G, project field H, and project field I. Project field chain 3 includes project field G and project field J. The global field C has field reference relationships with project field E and project field G, respectively.

[0109] In some embodiments, target fields are connected through field reference relationships to form one global field chain and multiple project field chains. The resulting chain is named With-Pipeline. Target fields in the global field chain are visible to all projects, while target fields in a project field chain are visible only to that project. Therefore, the same field name can be used in different project field chains, but no field name in a project field chain can be the same as a field name in the global field chain.
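The chain structure and its naming rule can be sketched as a pair of small classes. Everything here is illustrative: the class names `FieldNode`, `FieldChain` and `WithPipeline`, the method names, and the placeholder fragment strings are assumptions, and only the field names A to J and their grouping are taken from the Figure 10 description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldNode:
    name: str
    fragment: str                       # access statement fragment for the field
    next: Optional["FieldNode"] = None


class FieldChain:
    """Singly linked list of target fields, kept in generation order."""

    def __init__(self) -> None:
        self.head: Optional[FieldNode] = None
        self.tail: Optional[FieldNode] = None

    def append(self, name: str, fragment: str) -> None:
        node = FieldNode(name, fragment)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def names(self) -> List[str]:
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out


class WithPipeline:
    """One global chain plus one chain per business project."""

    def __init__(self) -> None:
        self.global_chain = FieldChain()
        self.project_chains: Dict[str, FieldChain] = {}

    def add_global(self, name: str, fragment: str) -> None:
        taken = set(self.global_chain.names())
        for chain in self.project_chains.values():
            taken.update(chain.names())
        if name in taken:
            raise ValueError(f"field name already in use: {name}")
        self.global_chain.append(name, fragment)

    def add_project(self, project: str, name: str, fragment: str) -> None:
        chain = self.project_chains.get(project, FieldChain())
        # A project field may repeat a name from another project,
        # but never a name from the global chain or its own chain.
        if name in self.global_chain.names() or name in chain.names():
            raise ValueError(f"field name already in use: {name}")
        chain.append(name, fragment)
        self.project_chains[project] = chain

    def chain_to_access(self, project: str) -> List[str]:
        chain = self.project_chains.get(project, FieldChain())
        return self.global_chain.names() + chain.names()


pipeline = WithPipeline()
for field_name in "ABC":
    pipeline.add_global(field_name, f"<fragment {field_name}>")
layout = {"project 1": "EF", "project 2": "GHI", "project 3": "GJ"}
for project, field_names in layout.items():
    for field_name in field_names:
        pipeline.add_project(project, field_name, f"<fragment {field_name}>")
```

Field G appears in both project 2 and project 3 without conflict, while an attempt to add a project field named A would be rejected because A already exists in the global chain.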

[0110] Optionally, the target fields are categorized according to business projects to obtain global fields and private fields, including: pre-setting usage probability information corresponding to each original field, wherein the usage probability information includes the field usage probability corresponding to each business project; calculating the usage probability information corresponding to the second field based on the usage probability information corresponding to the first field, wherein the first field is the parent field corresponding to the second field, and the second field is any target field; arranging the field usage probabilities corresponding to the second field in descending order, and determining the first priority probability and the second priority probability; if the difference between the first priority probability and the second priority probability is greater than or equal to a preset threshold, then the second field is determined as the project field corresponding to the undetermined project, wherein the undetermined project is the business project corresponding to the first priority probability; if the difference between the first priority probability and the second priority probability is less than the preset threshold, then the second field is determined as a global field.

[0111] In some embodiments, the usage frequency of each field in each business project is gathered with data analysis tools or from historical query records, and the usage probability of each field in each business project is calculated from it. For example, the usage probability of the "Order ID" field is 0.8 in the e-commerce project, 0.2 in the financial project, and 0.05 in the logistics project. This data is pre-stored in the database. The hierarchical relationship between fields is also defined and stored in the database for subsequent calculations; for example, "User ID" is the parent field of "Order ID" because orders are usually associated with users. For any target field (such as "Shipping Address"), the usage probability of the target field in each business project is calculated with a chosen algorithm (such as the weighted average method) from the usage probability information of its parent field (such as "Order ID"); here, the high usage probability of "Order ID" in the e-commerce project is assumed to carry over to its subordinate field "Shipping Address" in the same project. The calculated usage probabilities of the target field are sorted from largest to smallest, and the first and second values after sorting are taken as the first priority probability and the second priority probability. For example, after sorting, the usage probability of "Shipping Address" is 0.7 in the e-commerce project, which is the first priority probability, and 0.2 in the financial project, which is the second priority probability. A preset threshold (e.g., 0.3) is used to decide whether a field is specific to one business project or treated as a common field. If the difference between the first and second priority probabilities is greater than or equal to the preset threshold (here 0.7 - 0.2 = 0.5 ≥ 0.3), the target field is determined as the project field of the business project corresponding to the first priority probability (the e-commerce project). Otherwise, it is treated as a common field, that is, a global field shared by multiple business projects. Based on the classification results, the field attributes in the data model are updated to mark which fields are specific to a particular business project and which are common fields. At the same time, database views can be automatically generated or updated to facilitate user access to the relevant data.
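The threshold decision in this example can be written out as a short function. The sketch takes the per-project usage probabilities of the target field as already calculated; how they are derived from the parent field (for instance by a weighted average) is not specified in enough detail to reproduce. The function name `classify_field`, the treatment of a field with only one project (second probability taken as 0.0), and the floating-point tolerance at the boundary are assumptions.

```python
import math


def classify_field(usage: dict, threshold: float):
    """Return ("project", name) or ("global", None) for one target field."""
    if not usage:
        raise ValueError("usage probabilities are required")
    # Sort the per-project probabilities in descending order.
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    first_project, first_p = ranked[0]
    # Assumption: with a single project the second priority probability is 0.
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    gap = first_p - second_p
    # isclose guards the "equal to the threshold" case against rounding error.
    if gap >= threshold or math.isclose(gap, threshold):
        return ("project", first_project)
    return ("global", None)
```

With the values from the example, `classify_field({"e-commerce": 0.7, "financial": 0.2}, 0.3)` yields a project field for the e-commerce project, because the gap of 0.5 is not below the threshold of 0.3.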

[0112] Optionally, according to the field chain, the access statement fragments are combined using WITH clauses to obtain a field access statement, including: obtaining the target project, where the target project is any business project; determining the current field chain corresponding to the target project from the field chains of each project based on the matching result between the target project and the business project, and combining the global field chain and the current field chain to obtain the field chain to be accessed; according to the link relationship between each target field in the field chain to be accessed, the access statement fragments in the field chain to be accessed are combined sequentially using WITH clauses, and a query main clause is generated based on the access statement fragments to obtain a field access statement composed of access statement fragments and a query main clause.

[0113] In some embodiments, when any business project needs to access a target field, the chain of fields to be accessed corresponding to that business project is obtained by matching; the SQL statements corresponding to each target field on the chain of fields to be accessed are constructed in order from beginning to end based on the SQL fragments on the field reference relationship and the original field information, so as to obtain the field access statement.

[0114] In some embodiments, the field chain to be accessed consists of a global field chain and a project field chain. For example, if the project field chain corresponding to business project 1 is project field chain 1, and business project 1 needs to access the target field, the global field chain and project field chain 1 are combined to obtain the field chain to be accessed, and the SQL statement is completed according to the field reference relationship, so that project 1 can access the target field on the field chain to be accessed through the SQL statement.

[0115] Optionally, the access statement fragments in the chain of fields to be accessed are sequentially combined using the WITH clause, including: if the target field is not an aggregate field, then the access statement fragments corresponding to the target field are combined; if the target field is an aggregate field, then when generating the main query clause, the access statement fragments corresponding to the aggregate field are combined in the main query clause, wherein the field generation rules corresponding to the aggregate field include aggregate functions.

[0116] In some embodiments, because aggregate functions in SQL statements require grouping, and grouping changes the original number of data rows in the dataset and destroys its structure, the target field corresponding to an aggregate function is not combined as a WITH clause fragment. Instead, after the access statement fragments corresponding to the other target fields have been constructed, the target field corresponding to the aggregate function is added to the main query clause, so that it can be accessed normally.

[0117] In some embodiments, the target fields in the chain of fields to be accessed include a first field, a second field, a third field, and a fourth field, wherein the field generation rule corresponding to the fourth field includes an aggregate function. The field access statement is built as follows: construct the access statement fragment corresponding to the first field; construct the access statement fragment corresponding to the second field by referencing (FROM) the fragment corresponding to the first field; construct the access statement fragment corresponding to the third field by referencing the fragment corresponding to the second field; construct the main query clause by referencing the fragment corresponding to the third field; and add the access statement fragment corresponding to the fourth field to the main query clause.
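The four-field case can be sketched with SQLite as follows. This is one possible reading, not the disclosed implementation: the function `build_access_statement`, the `step_N` names for the WITH clause fragments, the sample table, and the choice to select only the aggregate fields in the main query clause when any are present are all assumptions of the sketch.

```python
import sqlite3


def build_access_statement(source_table, chain):
    """Combine fragments into one statement.

    chain: list of (field_name, expression, is_aggregate) in chain order.
    """
    ctes, aggregates, previous = [], [], source_table
    for index, (name, expression, is_aggregate) in enumerate(chain, start=1):
        if is_aggregate:
            # Aggregates would change the row count, so they are deferred
            # to the main query clause instead of becoming a WITH fragment.
            aggregates.append(f"{expression} AS {name}")
            continue
        cte_name = f"step_{index}"
        # Each fragment reads FROM the previous one and adds one field.
        ctes.append(
            f"{cte_name} AS (SELECT *, {expression} AS {name} FROM {previous})")
        previous = cte_name
    select_list = ", ".join(aggregates) if aggregates else "*"
    main_clause = f"SELECT {select_list} FROM {previous}"
    if not ctes:
        return main_clause
    return "WITH " + ", ".join(ctes) + " " + main_clause


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (price REAL, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(2.0, 3), (4.0, 1)])

chain = [
    ("amount", "price * qty", False),               # first field
    ("discounted", "amount - 1", False),            # second field, uses the first
    ("is_large", "discounted >= 4", False),         # third field, uses the second
    ("total_discounted", "SUM(discounted)", True),  # fourth field, aggregate
]
statement = build_access_statement("orders", chain)
result = conn.execute(statement).fetchall()
```

Each non-aggregate field becomes one WITH clause fragment that selects from the previous fragment, so no table or view has to be created when a field is added; only the chain grows by one node.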

[0118] As shown in Figure 11, this disclosure provides a system for accessing new fields in a dataset, including a server 1101 and a user terminal 1102.

[0119] Server 1101 is used to obtain the original dataset, which includes the original fields; in response to the field generation rules corresponding to the original dataset, it determines the parent field from the basic fields, generates the target field based on the parent field according to the field generation rules, and adds the target field to the basic fields, where the basic fields also include the original fields; it generates the field reference relationship between the target fields according to the field generation rules; and it generates the field access statement containing the WITH clause according to the field reference relationship.

[0120] User terminal 1102 connects to the server and is used to access target fields through field access statements.

[0121] The new field access system for datasets provided in this disclosure involves generating new target fields on the server side using original fields and existing target fields as parent fields, according to field generation rules. A field linked list is then generated based on the field generation rules between the target fields. Access statement fragments corresponding to the target fields are combined using WITH clauses according to the field linked list to obtain field access statements. These field access statements provide access services to the target fields to the user terminal. This approach not only organizes the target fields using a field linked list, simplifying the data architecture by keeping the number of database objects to a single level compared to creating multiple database tables or views for field expansion, but also allows for field expansion without disabling the dataset, making query statements more concise and efficient. This reduces the steps required for dataset expansion in two ways, improving the efficiency of dataset expansion.

[0122] This disclosure also provides an electronic device, including: a processor and a memory; the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the electronic device performs the above-described method.

[0123] Figure 12 shows a schematic diagram of a computer system suitable for implementing the electronic device of the embodiments of this application. It should be noted that the computer system 1200 of the electronic device shown in Figure 12 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0124] As shown in Figure 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as executing the methods described in the above embodiments, based on programs stored in Read-Only Memory (ROM) 1202 or programs loaded from storage section 1208 into Random Access Memory (RAM) 1203. Various programs and data required for system operation are also stored in the RAM 1203. The CPU 1201, ROM 1202, and RAM 1203 are interconnected via a bus 1204. An Input / Output (I / O) interface 1205 is also connected to the bus 1204.

[0125] The following components are connected to the input / output interface 1205: an input section 1206 including a keyboard, mouse, etc.; an output section 1207 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 1208 including a hard disk, etc.; and a communication section 1209 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the input / output interface 1205 as needed. Removable media 1211, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on the drive 1210 as needed so that computer programs read from them can be installed into the storage section 1208 as needed.

[0126] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program including program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 1209, and / or installed from removable medium 1211. When the computer program is executed by central processing unit (CPU) 1201, it performs various functions defined in the system of this application.

[0127] The foregoing description and accompanying drawings fully illustrate embodiments of this disclosure to enable those skilled in the art to practice them. Other embodiments may include structural, logical, electrical, procedural, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the order of operation may vary. Parts and features of some embodiments may be included in or replace parts and features of other embodiments. Moreover, the terminology used in this application is for describing embodiments only and is not intended to limit the claims. As used in the description of embodiments and claims, the singular forms "a," "an," and "the" are intended to equally include the plural forms unless the context clearly indicates otherwise. Similarly, the term "and / or" as used in this application means including one or more of the associated listed items and all possible combinations thereof. Additionally, when used in this application, the term "comprise" and its variations "comprises" and / or "comprising" refer to the presence of stated features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof. Without further limitations, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, or apparatus that includes the element. In this document, each embodiment may focus on its differences from other embodiments, and similar or identical parts of the embodiments can be cross-referenced. For methods, products, etc., disclosed in the embodiments, if they correspond to the method section disclosed in the embodiments, reference can be made to the description of the method section for the relevant parts.

[0128] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of this disclosure. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0129] The methods and products (including but not limited to devices and equipment) disclosed in the embodiments herein can be implemented in other ways. For example, the device embodiments described above are merely illustrative. For instance, the division of units may be merely a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or of other forms. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected to implement this embodiment according to actual needs. Furthermore, the functional units in the embodiments of this disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

[0130] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than that shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. In the descriptions corresponding to the flowcharts and block diagrams in the accompanying drawings, the operations or steps corresponding to different blocks may also occur in a different order than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two consecutive operations or steps may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. Each block in a block diagram and / or flowchart, and combinations of blocks in a block diagram and / or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

Claims

1. A method for accessing new fields of a data set, characterized in that the method is applied to a server side, wherein the server side is connected to a user terminal, and the method includes: obtaining the original dataset, which includes the original fields; in response to the field generation rules corresponding to the original dataset, generating new target fields from the original fields and / or existing target fields in accordance with the field generation rules; establishing a linked list data structure based on the field generation rules between each target field to obtain a field linked list composed of each target field, wherein the field linked list stores the access statement fragments corresponding to each target field; wherein establishing the linked list data structure based on the field generation rules between the target fields to obtain the field linked list composed of the target fields includes: acquiring one or more business projects; classifying the target fields according to the business projects to obtain global fields and private fields, wherein the private fields include project fields corresponding to each business project; establishing a linked list data structure based on the field generation rules between global fields to obtain a global field chain, and establishing a linked list data structure based on the field generation rules between project fields to obtain project field chains corresponding to each business project; and linking the global field chain to each project field chain according to the field generation rules to obtain a complete field chain; wherein classifying the target fields according to the business projects to obtain global fields and private fields includes: pre-setting usage probability information corresponding to each of the original fields, wherein the usage probability information includes the field usage probability corresponding to each of the business projects; calculating the usage probability information corresponding to the second field based on the usage probability information corresponding to the first field, wherein the first field is the parent field corresponding to the second field, and the second field is any target field; arranging the field usage probabilities corresponding to the second field in descending order and determining a first priority probability and a second priority probability; if the difference between the first priority probability and the second priority probability is greater than or equal to a preset threshold, then the second field is determined as a project field corresponding to a pending project, wherein the pending project is the business project corresponding to the first priority probability; if the difference between the first priority probability and the second priority probability is less than the preset threshold, then the second field is determined as a global field; and, according to the field linked list, combining the access statement fragments using WITH clauses to obtain a field access statement, wherein the user terminal is used to access the target field through the field access statement.

2. The method according to claim 1, characterized in that generating a new target field from the original field and / or the existing target field according to the field generation rules includes at least one of the following: if the rule type corresponding to the field generation rule is a calculated column type, then the field generation rule includes a field calculation statement and a field type; the field calculation statement is executed according to the original field and / or the existing target field to obtain the calculation result; the calculation result is verified according to the field generation rule; and after the verification is passed, the calculation result is stored according to the field type to obtain the target field, wherein the field calculation statement is generated based on an SQL statement; if the rule type corresponding to the field generation rule is a summary column type, then the field generation rule includes a window function, and the window function is executed based on the original field and / or the existing target field to obtain the target field output by the window function.

3. The method of claim 1, wherein combining the access statement fragments using WITH clauses according to the field linked list to obtain the field access statement includes: obtaining the target project, wherein the target project is any business project; based on the matching result between the target project and the business project, determining the current field chain corresponding to the target project from each of the project field chains, and combining the global field chain and the current field chain to obtain the field chain to be accessed; and, according to the link relationship between each target field in the chain of fields to be accessed, sequentially combining the access statement fragments in the chain of fields to be accessed using WITH clauses, and generating a query main clause based on the access statement fragments to obtain a field access statement composed of the access statement fragments and the query main clause.

4. The method of claim 3, wherein sequentially combining the access statement fragments in the chain of fields to be accessed using WITH clauses includes: if the target field is not an aggregate field, combining the access statement fragment corresponding to the target field; if the target field is an aggregate field, then, when generating the main query clause, combining the access statement fragment corresponding to the aggregate field in the main query clause, wherein the field generation rule corresponding to the aggregate field includes an aggregate function.

5. The method according to any one of claims 1 to 4, characterized in that obtaining the original dataset includes: in response to a data source connection rule, which includes a data source type and / or data source connection information, connecting to the original data source according to the data source connection information; in response to the dataset generation rules, which include data source matching information, data collection information, and field storage information, obtaining the target data source by matching from each of the original data sources according to the data source matching information, wherein the data source matching information includes dataset type and / or data source identifier; collecting data from the target data source based on the data collection information to obtain the target data column; and storing the target data column according to the field storage information to obtain the original field, and generating the original dataset corresponding to the target data source according to the original field, wherein the field storage information includes field type and / or field name.

6. The method of claim 5, further comprising at least one of the following: receiving the data source connection rule sent by a user terminal, wherein the user terminal displays a data source interface through which a user inputs the data source connection rule; receiving the dataset generation rule sent by the user terminal, wherein the user terminal displays a dataset interface through which the user inputs the dataset generation rule; and receiving the field generation rule sent by the user terminal, wherein the user terminal displays a field interface through which the user inputs the field generation rule.

7. A system for accessing newly added fields of a dataset, comprising: a server configured to: obtain an original dataset that includes original fields; in response to field generation rules corresponding to the original dataset, generate new target fields from the original fields and/or existing target fields according to the field generation rules; establish a linked list data structure according to the field generation rules between the target fields to obtain a field linked list composed of the target fields, wherein the field linked list stores the access statement fragments corresponding to the target fields; and combine the access statement fragments using WITH clauses according to the field linked list to obtain a field access statement; wherein the server obtains the field linked list composed of the target fields by: acquiring one or more business projects; classifying the target fields according to the business projects to obtain global fields and private fields, wherein the private fields include the project fields corresponding to each business project; establishing a linked list data structure according to the field generation rules between the global fields to obtain a global field chain, and establishing linked list data structures according to the field generation rules between the project fields to obtain the project field chain corresponding to each business project; and linking the global field chain to each project field chain according to the field generation rules to obtain a complete field chain; wherein the server obtains the global fields and the private fields by: presetting usage probability information for each of the original fields, wherein the usage probability information includes a field usage probability for each of the business projects; calculating the usage probability information of a second field from the usage probability information of a first field, wherein the first field is the parent field of the second field and the second field is any target field; sorting the field usage probabilities of the second field in descending order and determining a first priority probability and a second priority probability; if the difference between the first priority probability and the second priority probability is greater than or equal to a preset threshold, determining the second field to be a project field of a pending project, wherein the pending project is the business project corresponding to the first priority probability; and if the difference between the first priority probability and the second priority probability is less than the preset threshold, determining the second field to be a global field; and a user terminal connected to the server and configured to access the target fields through the field access statement.
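
The threshold test that separates global fields from project fields in claim 7 might look like the following sketch. The claim does not state how a child field's usage probabilities are calculated from those of its parent, so the averaging rule in `derive_usage` is purely an assumption, as are the function names, the treatment of a single-project case, and the sample projects.

```python
def derive_usage(parent_usages):
    """Derive a child field's per-project usage probabilities from its parents.

    Assumption for illustration: the child's probability for each project is
    the mean of its parents' probabilities for that project.
    """
    projects = parent_usages[0].keys()
    count = len(parent_usages)
    return {p: sum(usage[p] for usage in parent_usages) / count for p in projects}


def classify_field(usage, threshold):
    """Return ("project", name) or ("global", None) according to the threshold test."""
    # Sort the field usage probabilities in descending order.
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    top_project, first_priority = ranked[0]
    # Assumption: with only one business project, the second priority is taken as 0.0.
    second_priority = ranked[1][1] if len(ranked) > 1 else 0.0

    if first_priority - second_priority >= threshold:
        # One project clearly dominates: treat the field as that project's field.
        return ("project", top_project)
    # Usage is spread across projects: treat the field as global.
    return ("global", None)


def run_demo():
    """Classify two hypothetical derived fields with a threshold of 0.5."""
    dominated = derive_usage([
        {"sales": 0.8, "finance": 0.2},
        {"sales": 1.0, "finance": 0.2},
    ])
    shared = {"sales": 0.5, "finance": 0.4}
    return classify_field(dominated, 0.5), classify_field(shared, 0.5)
```

A field whose top two probabilities are close is kept in the global chain so that every project can reach it, while a field dominated by one project is placed in that project's chain.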

8. An electronic device, comprising: a processor and a memory; wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the electronic device performs the method of any one of claims 1 to 6.