An information processing method, device and system
By calculating the field similarity between the logical model and the business model, a field mapping relationship can be quickly established, solving the problem of low modeling efficiency in the ETL process and realizing the efficient construction of the data warehouse.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA GROUP HOLDING LTD
- Filing Date
- 2019-02-27
- Publication Date
- 2026-06-16
AI Technical Summary
During the ETL process, when the number of models is large, modeling efficiency is low and processing time increases exponentially with the number of models, resulting in excessively long processing time.
By obtaining the similarity of fields between the logical model and the business model, and using edit distance and semantic similarity to calculate field mapping relationships, a data warehouse can be quickly built.
It improves modeling efficiency in the ETL process, reduces the impact of increasing the number of models on time, and speeds up the construction of the data warehouse.
Figure CN116401305B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data warehousing, specifically to an information processing method, apparatus, and system.
Background Technology
[0002] The ETL (Extract-Transform-Load) process, as the core and soul of BI / DW (Business Intelligence / Data Warehouse), integrates data according to unified rules. It is responsible for transforming data from the data source to the target data warehouse and is an important step in implementing a data warehouse.
[0003] The ETL process generally includes abstracting numerous logical models from the business model, and then developing and implementing these logical models as physical models. The business model refers to the data model constructed by decomposing the company's or department's business processes to conform to the characteristics of that business. The logical model refers to abstracting the entities and relationships between entities from the business model, and designing information such as entity attributes and primary keys. The physical model refers to the concrete implementation of the logical model, designing the data warehouse architecture, and placing the data into the data warehouse.
[0004] In existing technologies, completing the ETL process requires developers to fully understand the business model and logical model, clearly define the business model needed to develop the physical model, and design specific development methods. However, when the number of models in the ETL process is large, modeling can take a long time. Moreover, the ETL process time increases exponentially with the number of models, significantly reducing modeling efficiency.
Summary of the Invention
[0005] This application provides an information processing method to improve modeling efficiency in data processing processes such as ETL.
[0006] The information processing method provided in this application includes:
[0007] Obtain the field similarity between the logical model and the business model;
[0008] Based on the field similarity between the logical model and the business model, the mapping relationship between the fields of the logical model and the business model is obtained.
[0009] Optionally, obtaining the field similarity between the logical model and the business model includes:
[0010] Obtain the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model;
[0011] Based on the edit distance and / or the semantic similarity, obtain the field similarity between the logical model and the business model.
[0012] Optionally, obtaining the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model includes:
[0013] Obtain the metadata of the logical model and the metadata of the business model;
[0014] The metadata of the logical model and the metadata of the business model are subjected to word segmentation to obtain the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model.
[0015] Based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, the edit distance between the fields in the logical model and the fields in the business model and / or the semantic similarity between the fields in the logical model and the fields in the business model are obtained.
[0016] Optionally, the step of performing word segmentation processing on the metadata of the logical model and the metadata of the business model to obtain the word segmentation results of the fields in the logical model and the fields in the business model includes:
[0017] The metadata of the logical model and the metadata of the business model are segmented into words to obtain the initial segmentation results of the logical model and the initial segmentation results of the business model.
[0018] Based on the initial word segmentation results of the logical model, punctuation marks and stop words are deleted from the metadata of the logical model to obtain the word segmentation results of the fields in the logical model. Similarly, based on the initial word segmentation results of the business model, punctuation marks and stop words are deleted from the metadata of the business model to obtain the word segmentation results of the fields in the business model.
[0019] Optionally, the metadata of the logical model includes at least one of the following:
[0020] The field names of the logical model;
[0021] Field annotations for the logical model;
[0022] The field types of the logical model.
[0023] Optionally, the metadata of the business model includes at least one of the following:
[0024] The field names of the business model;
[0025] Field annotations for the business model;
[0026] The field types of the business model.
[0027] Optionally, obtaining the edit distance between fields in the logical model and fields in the business model, and the semantic similarity between fields in the logical model and fields in the business model, based on the field segmentation results of the logical model and the field segmentation results of the business model, includes:
[0028] Based on the field segmentation results of the logical model and the field segmentation results of the business model, the edit distance algorithm is used to obtain the edit distance between the fields in the logical model and the fields in the business model.
[0029] Based on the field segmentation results of the logical model and the field segmentation results of the business model, the semantic similarity between the fields in the logical model and the fields in the business model is obtained using a thesaurus.
[0030] Optionally, obtaining the mapping relationship between fields in the logical model and the business model based on the field similarity between the logical model and the business model includes:
[0031] Based on the field similarity between the logical model and the business model, obtain the field in the business model that has the highest similarity to the field in the logical model;
[0032] Based on the fields in the logical model and the fields in the business model with the highest similarity, obtain the mapping relationship between the fields in the logical model and the business model.
[0033] Optionally, the method for obtaining the mapping relationship between fields between the logical model and the business model further includes:
[0034] Get the request to build a data warehouse;
[0035] Based on the request to establish a data warehouse, the physical model corresponding to the logical model and the business model is determined by utilizing the mapping relationship between fields between the logical model and the business model;
[0036] Based on the physical model, establish a data warehouse.
[0037] This application provides an information processing apparatus, comprising:
[0038] The similarity acquisition unit is used to acquire the field similarity between the logical model and the business model based on the edit distance between the fields in the logical model and the fields in the business model, and the semantic similarity between the fields in the logical model and the fields in the business model.
[0039] The mapping acquisition unit is used to acquire the mapping relationship between fields in the logical model and the business model based on the field similarity between the logical model and the business model.
[0040] This application provides a method for obtaining primary key information of a business model, including:
[0041] Retrieve fields and data from the business model;
[0042] Calculations are performed based on the fields and data to obtain statistical information of the fields, judgment information of specified attributes of the fields, and feature information of the fields;
[0043] Based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field, the primary key information of the business model is obtained.
[0044] Optionally, obtaining the fields and data in the business model includes:
[0045] Obtain a specified amount of data from the business model.
[0046] Optionally, the statistical information of the field includes at least one of the following statistical information:
[0047] The field's null value rate;
[0048] The repetition rate of the field;
[0049] The average length of the data in the field;
[0050] The variance of the data length of the field.
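The four statistics listed above could be computed over a sample of column values roughly as follows. This is a minimal sketch, not the method of this application: the function name `field_statistics`, the treatment of `None` and empty strings as null values, and the definition of repetition rate as one minus the ratio of distinct values are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def field_statistics(values):
    """Compute simple statistics for one field from a sample of its values."""
    total = len(values)
    if total == 0:
        raise ValueError("values must not be empty")
    # Assumption: None and the empty string both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(str(v)) for v in non_null]
    return {
        "null_rate": (total - len(non_null)) / total,
        # Assumption: share of non-null values that repeat an earlier value.
        "repetition_rate": (1 - len(set(non_null)) / len(non_null)) if non_null else 0.0,
        "avg_length": mean(lengths) if lengths else 0.0,
        "length_variance": pvariance(lengths) if lengths else 0.0,
    }


stats = field_statistics(["a1", "a2", "a2", None])
```

A field with a null rate and repetition rate of zero and a small length variance is the kind of field one would expect to behave like an identifier.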
[0051] Optionally, the specified attribute judgment information of the field includes at least one of the following:
[0052] Is the field an identification code?
[0053] Is the field a date?
[0054] Is the field link information?
[0055] Is the field a phone number?
[0056] Is the field a timestamp?
[0057] Is the field address information?
[0058] Is the field a check digit?
[0059] Is the field a monotonically increasing sequence?
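Some of the attribute judgments above can be approximated with simple pattern checks. The sketch below covers only four of them (date, link, phone number, monotonically increasing sequence). The regular expressions, the `%Y-%m-%d` date format, and the 11-digit phone pattern are illustrative assumptions; the application does not state how each judgment is made. The sketch also assumes null values have already been removed and that the values of one field are mutually comparable.

```python
import re
from datetime import datetime

URL_RE = re.compile(r"^https?://\S+$")
PHONE_RE = re.compile(r"^1\d{10}$")  # assumed: 11-digit mobile number format


def is_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (TypeError, ValueError):
        return False


def all_match(texts, predicate):
    """True only if there is at least one value and every value passes."""
    return bool(texts) and all(predicate(t) for t in texts)


def is_monotonic_increasing(values):
    """True if each value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def judge_attributes(values):
    """Return a dict of boolean judgments for one field's sampled values."""
    texts = [str(v) for v in values]
    return {
        "is_date": all_match(texts, is_date),
        "is_link": all_match(texts, lambda s: URL_RE.match(s) is not None),
        "is_phone": all_match(texts, lambda s: PHONE_RE.match(s) is not None),
        "is_monotonic_increasing": len(values) > 1 and is_monotonic_increasing(values),
    }
```

Requiring every sampled value to match is a deliberately strict choice; a tolerance threshold would be a natural variation.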
[0060] Optionally, the feature information of the field includes at least one of the following:
[0061] Is the field a number?
[0062] Does the field contain Chinese characters?
[0063] Does the field contain special symbols?
[0064] Does the field have a similar prefix or a similar suffix?
[0065] The location information of the field in the business model.
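The feature information listed above could be extracted along the following lines. This is a hedged sketch: the function name `field_features`, the Unicode range used to detect Chinese characters, and the use of a shared prefix across sampled values as the "similar prefix" feature are assumptions, and suffix detection is omitted for brevity.

```python
import os
import re

CJK_RE = re.compile(r"[\u4e00-\u9fff]")  # common CJK unified ideographs
SPECIAL_RE = re.compile(r"[^\w\s]")      # anything that is not a word char or space
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def field_features(name, values, position):
    """Describe one field by simple surface features of its sampled values."""
    texts = [str(v) for v in values]
    return {
        "name": name,
        "is_numeric": bool(texts) and all(NUMBER_RE.fullmatch(t) is not None for t in texts),
        "has_chinese": any(CJK_RE.search(t) is not None for t in texts),
        "has_special_symbols": any(SPECIAL_RE.search(t) is not None for t in texts),
        # os.path.commonprefix compares character by character, not path by path.
        "common_prefix": os.path.commonprefix(texts),
        "position": position,  # index of the field within the business model
    }
```

For example, `field_features("order_id", ["A001", "A002"], 0)` reports a shared prefix of `"A00"` and no special symbols.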
[0066] Optionally, obtaining the primary key information of the business model based on the statistical information of the field, the specified attribute judgment of the field, and the feature information of the field includes:
[0067] Based on the statistical information of the field, the determination of the specified attributes of the field, and the feature information of the field, a recommendation strategy for the primary key information of the business model is constructed;
[0068] Based on the recommendation strategy of the primary key information of the business model, candidate primary keys of the business model are obtained;
[0069] The candidate primary keys of the business model are classified to obtain the classification results of the candidate primary keys;
[0070] Based on the classification results, obtain the primary key information of the business model.
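One way to picture the recommendation and classification steps above is a filter followed by a ranking. The sketch below is an assumption-laden stand-in, not the strategy of this application: the thresholds, the scoring weights, the profile keys (`null_rate`, `repetition_rate`, `is_identifier`, `position`), and the use of a sorted score in place of a classifier are all hypothetical.

```python
def recommend_primary_keys(field_profiles, max_null_rate=0.0, max_repetition_rate=0.0):
    """Return (field_name, score) pairs for plausible primary keys, best first.

    field_profiles maps a field name to a dict with the hypothetical keys
    "null_rate", "repetition_rate", "is_identifier" and "position".
    """
    candidates = []
    for name, profile in field_profiles.items():
        # A primary key should have no nulls and no repeated values.
        if profile["null_rate"] > max_null_rate:
            continue
        if profile["repetition_rate"] > max_repetition_rate:
            continue
        score = 1.0
        if profile.get("is_identifier"):
            score += 1.0  # assumed bonus for identifier-like fields
        # Assumed heuristic: fields declared earlier are more likely to be keys.
        score += 1.0 / (1 + profile.get("position", 0))
        candidates.append((name, score))
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates
```

With such a ranking, the top candidate would be proposed as the primary key and the rest kept as alternatives for a developer to confirm.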
[0071] Optionally, the method for obtaining the primary key information of the business model further includes:
[0072] Get the request to build a data warehouse;
[0073] Based on the request to establish a data warehouse, the physical model corresponding to the logical model and the business model is determined using the primary key information of the business model;
[0074] Based on the physical model, establish a data warehouse.
[0075] This application provides an apparatus for obtaining primary key information of a business model, comprising:
[0076] The field and data acquisition unit is used to acquire fields and data from the business model.
[0077] The field information calculation unit is used to perform calculations based on the fields and data to obtain statistical information of the fields, judgment information of specified attributes of the fields, and feature information of the fields.
[0078] The primary key information calculation unit is used to obtain the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field.
[0079] This application provides a method for obtaining the relevant business model of the logical model, including:
[0080] Obtain the raw data of the business model to be processed and the raw data of the logical model to be processed;
[0081] Word segmentation is performed on the raw data of the business model to be processed and the raw data of the logical model to be processed to obtain the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed.
[0082] Based on the standard feature information of the business model to be processed and of the logical model to be processed, together with the statistical information of the standard feature information of the historical business model and of the historical logical model (both obtained by training with historical data), the statistical values of the standard feature information of the business model to be processed and of the logical model to be processed are obtained.
[0083] Based on the statistical values of the standard feature information of the business model to be processed and the statistical values of the standard feature information of the logical model to be processed, the relevance between the business model to be processed and the logical model to be processed is obtained.
[0084] Based on the relevance between the business model to be processed and the logical model to be processed, obtain the business model related to the logical model to be processed.
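The application does not say which statistics are trained on historical data or how relevance is computed from them. As one plausible reading, the sketch below swaps in a familiar technique: inverse document frequency (IDF) weights learned from historical token lists, with cosine similarity between the weighted token vectors as the relevance score. All function names and the smoothing formula are assumptions.

```python
import math
from collections import Counter


def train_idf(historical_docs):
    """Learn smoothed IDF weights from historical token lists."""
    n = len(historical_docs)
    doc_freq = Counter(tok for doc in historical_docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def weigh(tokens, idf, default_idf=1.0):
    """Turn a token list into a sparse TF-IDF vector (a dict)."""
    return {tok: count * idf.get(tok, default_idf) for tok, count in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_business_models(logical_tokens, business_models, idf, top_k=1):
    """Rank business models (name -> token list) by relevance to a logical model."""
    target = weigh(logical_tokens, idf)
    scored = [(name, cosine(target, weigh(toks, idf))) for name, toks in business_models.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

The token lists stand in for the "standard feature information" obtained by word segmentation; any other weighting scheme could be substituted without changing the overall flow.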
[0085] Optionally, the method for obtaining the relevant business model of the logical model further includes:
[0086] Obtain the single-layer lineage relationship between the historical business model and the historical logical model;
[0087] Based on the single-layer lineage relationship, obtain all lineage relationships between the historical business model and the historical logical model;
[0088] The original data of the historical business model is segmented to obtain the standard feature information of the historical business model;
[0089] Based on the original data of the historical logical model, the standard feature information of the historical business model, and all lineage relationships between the historical business model and the historical logical model, the standard feature information of the historical logical model is obtained;
[0090] Based on the standard feature information of the historical business model and the standard feature information of the historical logical model, statistical information of the standard feature information of the historical business model and statistical information of the standard feature information of the historical logical model are obtained.
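Expanding single-layer lineage into all lineage relationships, as in the steps above, amounts to a transitive closure over the "is built directly from" relation. A minimal sketch follows, assuming the single-layer lineage is available as a dictionary from each model to the set of models it reads directly; the function name and the table names in the usage line are hypothetical.

```python
def all_lineage(direct_upstream):
    """Expand single-layer lineage into every direct and indirect upstream model.

    direct_upstream maps a model name to the set of models it is built from directly.
    """
    def collect(model, seen):
        for parent in direct_upstream.get(model, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                collect(parent, seen)
        return seen

    return {model: collect(model, set()) for model in direct_upstream}


lineage = all_lineage({
    "dws_sales": {"dwd_order"},
    "dwd_order": {"ods_order", "ods_user"},
})
```

Here `lineage["dws_sales"]` contains `dwd_order` as well as the two `ods_` tables it is indirectly derived from.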
[0091] Optionally, the method for obtaining the relevant business model of the logical model further includes:
[0092] Get the request to build a data warehouse;
[0093] Based on the request to establish a data warehouse, the physical model corresponding to the logical model to be processed is determined using the business model related to the logical model to be processed.
[0094] Based on the physical model, the data warehouse is established.
[0095] This application provides an apparatus for obtaining a related business model of a logical model, comprising:
[0096] The raw data acquisition unit is used to acquire the raw data of the business model to be processed and the raw data of the logical model to be processed.
[0097] The word segmentation processing unit is used to perform word segmentation on the raw data of the business model to be processed and the raw data of the logical model to be processed, so as to obtain the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed.
[0098] The statistical value acquisition unit is used to obtain the statistical values of the standard feature information of the business model to be processed and of the logical model to be processed, based on that standard feature information together with the statistical information of the standard feature information of the historical business model and of the historical logical model, both obtained by training with historical data.
[0099] The relevance acquisition unit is used to obtain the relevance between the business model to be processed and the logical model to be processed based on the statistical values of the standard feature information of the business model to be processed and the statistical values of the standard feature information of the logical model to be processed.
[0100] The business model acquisition unit is used to acquire the relevant business model of the logical model to be processed based on the relevance between the business model to be processed and the logical model to be processed.
[0101] This application provides a method for establishing a data warehouse, including:
[0102] Obtain the business model related to the logical model;
[0103] Obtain the primary key information of the business model;
[0104] Obtain the mapping relationship between fields in the logical model and the business model;
[0105] Based on the primary key information of the business model and the mapping relationship, determine the physical model corresponding to the logical model and the business model;
[0106] Based on the physical model, establish a data warehouse corresponding to the physical model.
[0107] This application provides a data processing system, including: a primary key information acquisition module for a business model, a business model acquisition module, a field mapping relationship acquisition module, and a data warehouse establishment module;
[0108] The business model acquisition module is used to acquire business models related to the logical model;
[0109] The primary key information acquisition module of the business model is used to acquire the primary key information of the business model.
[0110] The field mapping relationship acquisition module is used to acquire the field mapping relationship between the logical model and the business model;
[0111] The data warehouse establishment module is used to determine the physical model corresponding to the logical model and the business model based on the primary key information of the business model and the mapping relationship; and to establish a data warehouse corresponding to the physical model based on the physical model.
[0112] Compared with the prior art, this application has the following advantages:
[0113] The method provided in this application for obtaining the mapping relationship between fields in a logical model and a business model obtains the metadata of the logical model and the metadata of the business model; performs word segmentation on that metadata to obtain the word segmentation results of the fields in the logical model and of the fields in the business model; obtains, from those word segmentation results, the edit distance and semantic similarity between the fields in the logical model and the fields in the business model; obtains the field similarity between the logical model and the business model from the edit distance and semantic similarity; and obtains the mapping relationship between the fields in the logical model and the business model from the field similarity.
[0114] The method provided in this application for obtaining the mapping relationship between fields in the logical model and the business model can quickly obtain the mapping relationship between fields in the logical model and the business model, thereby improving the modeling efficiency in the ETL process.
[0115] The method for obtaining primary key information of a business model provided in this application obtains the fields and data in the business model; performs calculations on the fields and data to obtain statistical information of the fields, specified attribute judgment information of the fields, and feature information of the fields; and obtains the primary key information of the business model from these three kinds of information.
[0116] The method for obtaining primary key information of a business model provided in this application can quickly obtain the primary key information of a business model, thereby improving the modeling efficiency in the ETL process.
[0117] The method for obtaining the relevant business model of a logical model provided in this application acquires the raw data of the business model to be processed and the raw data of the logical model to be processed; performs word segmentation on both to obtain the standard feature information of the business model to be processed and of the logical model to be processed; obtains the statistical values of that standard feature information based on the standard feature information itself and on the statistical information of the standard feature information of the historical business model and the historical logical model, both obtained by training with historical data; obtains the relevance between the business model to be processed and the logical model to be processed based on those statistical values; and obtains the business model related to the logical model to be processed based on that relevance.
[0118] The method for obtaining relevant business models of a logical model provided in this application can quickly obtain relevant business models of a logical model, thereby improving modeling efficiency in the ETL process.
Attached Figure Description
[0119] Figure 1 is a flowchart of a method for obtaining the mapping relationship between fields of a logical model and a business model, provided in the first embodiment of this application;
[0120] Figure 2 is a schematic diagram of an apparatus for obtaining the mapping relationship between fields of a logical model and a business model, provided in the second embodiment of this application;
[0121] Figure 3 is a flowchart of a method for obtaining primary key information of a business model, provided in the third embodiment of this application;
[0122] Figure 4 is a schematic diagram of an apparatus for obtaining primary key information of a business model, provided in the fourth embodiment of this application;
[0123] Figure 5 is a flowchart of a method for obtaining a related business model of a logical model, provided in the fifth embodiment of this application;
[0124] Figure 6 is a schematic diagram of an apparatus for obtaining a related business model of a logical model, provided in the sixth embodiment of this application;
[0125] Figure 7 is a flowchart of a method for establishing a data warehouse, provided in the seventh embodiment of this application;
[0126] Figure 8 is a schematic diagram of a data processing system, provided in the eighth embodiment of this application.
Detailed Implementation
[0127] Many specific details are set forth in the following description to provide a full understanding of this application. However, this application can be implemented in many other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this application; therefore, this application is not limited to the specific embodiments disclosed below.
[0128] The first embodiment of this application provides an information processing method. Please refer to Figure 1, which is a flowchart of the first embodiment of this application. The information processing method provided in the first embodiment is described in detail below with reference to Figure 1. The method includes the following steps:
[0129] Step S101: Obtain the field similarity between the logical model and the business model.
[0130] This step is used to obtain the field similarity between the logical model and the business model.
[0131] The process of obtaining field similarity between the logical model and the business model includes:
[0132] Obtain the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model;
[0133] Based on the edit distance and / or the semantic similarity, obtain the field similarity between the logical model and the business model.
[0134] After obtaining the edit distance between fields in the logical model and fields in the business model, and the semantic similarity between fields in the logical model and fields in the business model, the field similarity between the logical model and the business model can be obtained through weighted calculation; alternatively, the field similarity between the logical model and the business model can be obtained solely through the edit distance or solely through the semantic similarity.
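The weighted calculation described above can be sketched as follows. The application does not give the weights or say how an edit distance is turned into a similarity, so the normalization by the longer string length, the default weight of 0.5, and both function names are assumptions.

```python
def normalized_edit_similarity(distance, len_a, len_b):
    """Map an edit distance to a similarity in [0, 1] (assumed normalization)."""
    longest = max(len_a, len_b)
    return 1.0 if longest == 0 else 1.0 - distance / longest


def field_similarity(edit_sim=None, semantic_sim=None, edit_weight=0.5):
    """Combine the two similarities, or fall back to whichever one is available."""
    if edit_sim is None and semantic_sim is None:
        raise ValueError("at least one similarity is required")
    if semantic_sim is None:
        return edit_sim
    if edit_sim is None:
        return semantic_sim
    # Weighted sum; edit_weight is a tunable assumption, not a value from the source.
    return edit_weight * edit_sim + (1 - edit_weight) * semantic_sim
```

For instance, an edit similarity of 0.5 and a semantic similarity of 1.0 combine to 0.75 under equal weights.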
[0135] The step of obtaining the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model includes:
[0136] Obtain the metadata of the logical model and the metadata of the business model;
[0137] The metadata of the logical model and the metadata of the business model are subjected to word segmentation to obtain the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model.
[0138] Based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, the edit distance between the fields in the logical model and the fields in the business model and / or the semantic similarity between the fields in the logical model and the fields in the business model are obtained.
[0139] The metadata of the logical model includes at least one of the following:
[0140] The field names of the logical model;
[0141] Field annotations for the logical model;
[0142] The field types of the logical model.
[0143] The metadata of the business model includes at least one of the following:
[0144] The field names of the business model;
[0145] Field annotations for the business model;
[0146] The field types of the business model.
[0147] The step of performing word segmentation on the metadata of the logical model and the metadata of the business model to obtain the word segmentation results of the fields in the logical model and the fields in the business model includes:
[0148] The metadata of the logical model and the metadata of the business model are segmented into words to obtain the initial segmentation results of the logical model and the initial segmentation results of the business model.
[0149] Based on the initial word segmentation results of the logical model, punctuation marks and stop words are deleted from the metadata of the logical model to obtain the word segmentation results of the fields in the logical model. Similarly, based on the initial word segmentation results of the business model, punctuation marks and stop words are deleted from the metadata of the business model to obtain the word segmentation results of the fields in the business model.
[0150] In natural language processing, word segmentation is a frequently used preprocessing step. English words are naturally separated by spaces, making word segmentation straightforward. However, sometimes it's necessary to treat multiple words as a single segment, such as nouns like "New York," which need to be treated as a single word. Chinese, lacking spaces, requires special handling for word segmentation. Since word segmentation is a common technique, it will not be discussed in detail here.
[0151] Stop words refer to certain words or phrases that are automatically filtered out before or after processing natural language data (or text) in information retrieval to save storage space and improve search efficiency.
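For field names and annotations written in Latin script, the segmentation and stop-word removal described above can be sketched with regular expressions. The stop-word list and the function name `segment_field` are illustrative assumptions, and Chinese text would need a dedicated segmenter, which is not shown here.

```python
import re

# Illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}


def segment_field(text):
    """Split a field name or annotation into lowercase tokens without stop words."""
    # Insert a space at camelCase boundaries, e.g. "userOrderId" -> "user Order Id".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Split on punctuation, underscores and whitespace, which also removes them.
    tokens = re.split(r"[^0-9A-Za-z]+", spaced)
    return [t.lower() for t in tokens if t and t.lower() not in STOP_WORDS]
```

Under these assumptions, `segment_field("userOrderId")` and `segment_field("user_order_id")` both yield `["user", "order", "id"]`, so naming-convention differences no longer affect the later comparison.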
[0152] The step of obtaining the edit distance between fields in the logical model and fields in the business model, and the semantic similarity between fields in the logical model and fields in the business model, based on the field segmentation results of the logical model and the field segmentation results of the business model, includes:
[0153] Based on the field segmentation results of the logical model and the field segmentation results of the business model, the edit distance algorithm is used to obtain the edit distance between the fields in the logical model and the fields in the business model.
[0154] Based on the field segmentation results of the logical model and the field segmentation results of the business model, the semantic similarity between the fields in the logical model and the fields in the business model is obtained using a thesaurus.
[0155] Edit distance, also known as Levenshtein distance, is the minimum number of edit operations required to transform one string into another. The edit operations are replacing one character with another, inserting a character, and deleting a character. Edit distance reflects the literal (character-level) similarity between strings: the fewer operations needed to turn one string into the other, the more similar the strings are.
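Levenshtein distance is usually computed with dynamic programming. The sketch below is a standard two-row implementation; the function name is arbitrary, and applying it to token lists rather than raw strings is an assumption about how the word segmentation results might be compared.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b.

    Works on any two sequences, e.g. strings or lists of tokens.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.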
[0156] The semantic similarity between fields can be obtained by querying a thesaurus.
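The application does not describe the thesaurus or the scoring rule. One simple reading, sketched below, maps each token to a representative word of its synonym group and then measures the overlap of the two token sets with the Jaccard index. The synonym groups, the Jaccard measure, and both function names are assumptions.

```python
def build_lookup(synonym_groups):
    """Map every word to one representative of its synonym group."""
    lookup = {}
    for group in synonym_groups:
        head = sorted(group)[0]  # deterministic representative
        for word in group:
            lookup[word] = head
    return lookup


def semantic_similarity(tokens_a, tokens_b, lookup):
    """Jaccard overlap of two token lists after synonym normalization."""
    a = {lookup.get(t, t) for t in tokens_a}
    b = {lookup.get(t, t) for t in tokens_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical thesaurus entries, for illustration only.
lookup = build_lookup([
    {"customer", "client", "buyer"},
    {"amount", "total", "sum"},
])
```

With this lookup, the token lists `["client", "id"]` and `["customer", "id"]` score 1.0 even though their edit distance is not zero, which is the gap semantic similarity is meant to close.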
[0157] Step S102: Based on the field similarity between the logical model and the business model, obtain the mapping relationship between the fields of the logical model and the business model.
[0158] This step is used to obtain the mapping relationship between fields in the logical model and the business model based on the field similarity between the logical model and the business model.
[0159] The step of obtaining the mapping relationship between fields in the logical model and the business model based on the field similarity between the logical model and the business model includes:
[0160] Based on the field similarity between the logical model and the business model, obtain the field in the business model that has the highest similarity to the field in the logical model;
[0161] Based on the fields in the logical model and the fields in the business model with the highest similarity, obtain the mapping relationship between the fields in the logical model and the business model.
[0162] The method provided in this embodiment can perform intelligent mapping calculations for each pair of fields in the logical model and the business model, and automatically recommend field B in the business model C2 that is most similar to field A in the logical model C1.
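The recommendation of the most similar business-model field can be sketched as follows. This is only one possible reading: the normalization of edit distance to a 0..1 similarity, the equal weights, the tiny in-memory `THESAURUS`, and all function names are assumptions introduced for illustration, not details given in this application.

```python
def edit_distance(a, b):
    """Levenshtein distance (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical thesaurus: unordered word pairs mapped to a semantic score.
THESAURUS = {frozenset({"buyer", "customer"}): 0.9}


def semantic_similarity(a, b):
    """Look the pair up in the thesaurus; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return THESAURUS.get(frozenset({a, b}), 0.0)


def field_similarity(a, b, w_edit=0.5, w_sem=0.5):
    """Weighted combination of the two measures (weights are assumed)."""
    return w_edit * edit_similarity(a, b) + w_sem * semantic_similarity(a, b)


def map_fields(logical_fields, business_fields):
    """For each logical-model field, pick the most similar business-model field."""
    return {
        lf: max(business_fields, key=lambda bf: field_similarity(lf, bf))
        for lf in logical_fields
    }
```

With logical fields `customer` and `order_id` and business fields `buyer`, `orderid`, and `amount`, this sketch maps `customer` to `buyer` (through the thesaurus) and `order_id` to `orderid` (through edit distance).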
[0163] The method for obtaining the mapping relationship between fields in the logical model and the business model further includes:
[0164] Get the request to build a data warehouse;
[0165] Based on the request to establish a data warehouse, the physical model corresponding to the logical model and the business model is determined by utilizing the mapping relationship between fields between the logical model and the business model;
[0166] Based on the physical model, establish a data warehouse.
[0167] The above steps describe a method for establishing a data warehouse using the mapping relationship between fields in the logical model and the business model. First, a request to establish a data warehouse is obtained; then, based on the request, the physical model corresponding to the logical model and the business model is determined using the mapping relationship between fields in the logical model and the business model; finally, the data warehouse is established based on the physical model.
[0168] In the above embodiments, an information processing method is provided; correspondingly, this application also provides an information processing apparatus. Please refer to Figure 2, which is a schematic diagram of an embodiment of an information processing apparatus according to this application. Since this embodiment, namely the second embodiment, is basically similar to the method embodiment, it is described simply; relevant details can be found in the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
[0169] An information processing apparatus according to this embodiment includes:
[0170] The similarity acquisition unit 201 is used to acquire the field similarity between the logical model and the business model based on the edit distance between the fields in the logical model and the fields in the business model, and the semantic similarity between the fields in the logical model and the fields in the business model.
[0171] The mapping acquisition unit 202 is used to acquire the mapping relationship between fields in the logical model and the business model based on the field similarity between the logical model and the business model.
[0172] In this embodiment, the similarity acquisition unit is specifically used to: after acquiring the edit distance between the fields in the logical model and the fields in the business model, and the semantic similarity between the fields in the logical model and the fields in the business model, obtain the field similarity between the logical model and the business model through weighted calculation.
[0173] In this embodiment, the similarity acquisition unit is further configured to: acquire metadata of the logical model and metadata of the business model; perform word segmentation processing on the metadata of the logical model and the metadata of the business model to obtain word segmentation results of fields in the logical model and word segmentation results of fields in the business model; and obtain the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model based on the word segmentation results of fields in the logical model and the word segmentation results of fields in the business model.
[0174] The third embodiment of this application provides a method for obtaining primary key information of a business model. Please refer to Figure 3, which is a schematic diagram of the third embodiment of this application. The method is described in detail below in conjunction with Figure 3. The method includes the following steps:
[0175] Step S301: Obtain the fields and data in the business model.
[0176] This step is used to obtain the fields and data in the business model.
[0177] The process of obtaining fields and data from the business model includes:
[0178] Obtain a specified amount of data from the business model.
[0179] For each business model, a fixed number of field data records is randomly selected from the model, for example 10,000. If the business model contains fewer records than this fixed number, all records of the business model are retrieved.
[0180] Step S302: Calculate based on the fields and data to obtain statistical information of the fields, judgment information of specified attributes of the fields, and feature information of the fields.
[0181] This step is used to perform calculations based on the fields and data to obtain statistical information of the fields, judgment information of specified attributes of the fields, and feature information of the fields.
[0182] The specified attribute judgment information of the field includes at least one of the following:
[0183] Is the field an identification code?
[0184] Is the field a date?
[0185] Is the field link information?
[0186] Is the field a phone number?
[0187] Is the field a timestamp?
[0188] Is the field address information?
[0189] Is the field a check digit?
[0190] Is the field a monotonically increasing sequence?
[0191] The feature information of the field includes at least one of the following:
[0192] Does the field consist entirely of digits?
[0193] Does the field contain Chinese characters?
[0194] Does the field contain special symbols?
[0195] Does the field have a similar prefix or a similar suffix?
[0196] The location information of the field in the business model.
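To make the three kinds of field information more concrete, the sketch below computes a few of them for a single field from its sampled values. The specific checks (ISO-style dates, `http`/`https` links, ASCII digits), the dictionary keys, and the name `profile_field` are assumptions for illustration; the application lists the categories but not how each is computed.

```python
import os
import re

ASCII_DIGITS = re.compile(r"[0-9]+")
CHINESE = re.compile(r"[\u4e00-\u9fff]")
SPECIAL = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff]")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed date format


def profile_field(values, position):
    """Return statistics, attribute judgments, and features for one field."""
    texts = [str(v) for v in values if v not in (None, "")]
    total = len(values)
    all_digits = bool(texts) and all(ASCII_DIGITS.fullmatch(t) for t in texts)
    numbers = [int(t) for t in texts] if all_digits else []
    return {
        # Statistical information
        "null_ratio": (total - len(texts)) / total if total else 0.0,
        "distinct_ratio": len(set(texts)) / len(texts) if texts else 0.0,
        # Specified attribute judgments (a subset of those listed above)
        "is_date": bool(texts) and all(ISO_DATE.fullmatch(t) for t in texts),
        "is_link": bool(texts)
        and all(t.startswith(("http://", "https://")) for t in texts),
        "is_monotonic_increasing": all_digits
        and all(a < b for a, b in zip(numbers, numbers[1:])),
        # Feature information
        "all_digits": all_digits,
        "contains_chinese": any(CHINESE.search(t) for t in texts),
        "contains_special": any(SPECIAL.search(t) for t in texts),
        "has_common_prefix": len(texts) > 1
        and bool(os.path.commonprefix(texts)),
        "position": position,
    }
```

A field whose sampled values are `1001`, `1002`, `1003` would be reported as all digits, fully distinct, and monotonically increasing, which are properties that make it a plausible primary key candidate.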
[0197] Step S303: Obtain the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field.
[0198] This step is used to obtain the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field.
[0199] The step of obtaining the primary key information of the business model based on the statistical information of the field, the specified attribute judgment of the field, and the feature information of the field includes:
[0200] Based on the statistical information of the field, the determination of the specified attributes of the field, and the feature information of the field, a recommendation strategy for the primary key information of the business model is constructed;
[0201] Based on the recommendation strategy of the primary key information of the business model, candidate primary keys of the business model are obtained;
[0202] The candidate primary keys of the business model are classified to obtain the classification results of the candidate primary keys;
[0203] Based on the classification results, obtain the primary key information of the business model.
[0204] The primary key information of the business model can be a single-field primary key or a composite-field primary key.
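One possible recommendation strategy is sketched below. It assumes that a candidate primary key is any field, or small combination of fields, whose sampled values contain no nulls and are unique across all sampled rows, and that candidates are then classified as single-field or composite-field. Both the uniqueness criterion and the names `recommend_primary_keys` and `classify_candidates` are assumptions for this example; the application does not fix the strategy.

```python
from itertools import combinations


def recommend_primary_keys(rows, max_width=2):
    """Return field-name tuples that uniquely identify every sampled row.

    rows is a list of dicts sharing the same keys. Combinations that
    contain an already accepted key are skipped, so results are minimal.
    """
    if not rows:
        return []
    fields = list(rows[0].keys())
    candidates = []
    for width in range(1, max_width + 1):
        for combo in combinations(fields, width):
            if any(set(found) <= set(combo) for found in candidates):
                continue  # a smaller key already covers this combination
            keys = [tuple(row[f] for f in combo) for row in rows]
            if any(value is None for key in keys for value in key):
                continue  # primary key fields may not be null
            if len(set(keys)) == len(rows):
                candidates.append(combo)
    return candidates


def classify_candidates(candidates):
    """Split candidates into single-field and composite-field primary keys."""
    return {
        "single": [c for c in candidates if len(c) == 1],
        "composite": [c for c in candidates if len(c) > 1],
    }
```

Because the check runs on sampled data only, a candidate found this way is a recommendation to be confirmed, not a guarantee of uniqueness over the full table.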
[0205] The method for obtaining the primary key information of the business model provided in this embodiment, with the help of big data analysis technology, can automatically recommend primary keys for each business model in the operational data store (ODS) layer of the data warehouse architecture.
[0206] The method for obtaining the primary key information of the business model further includes:
[0207] Get the request to build a data warehouse;
[0208] Based on the request to establish a data warehouse, the physical model corresponding to the logical model and the business model is determined using the primary key information of the business model;
[0209] Based on the physical model, establish a data warehouse.
[0210] By adopting the above steps, the primary key information of the obtained business model can be used in the process of building a data warehouse, thereby reducing the workload of designing and developing solutions and implementing solutions in the ETL process.
[0211] In the above embodiments, a method for obtaining primary key information of a business model is provided. Correspondingly, this application also provides an apparatus for obtaining primary key information of a business model. Please refer to Figure 4, which is a flowchart illustrating an embodiment of an apparatus for obtaining primary key information of a business model according to this application. Since this embodiment, namely the fourth embodiment, is basically similar to the method embodiment, it is described simply; relevant details can be found in the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
[0212] This embodiment of an apparatus for obtaining primary key information of a business model includes:
[0213] The field and data acquisition unit 401 is used to acquire the fields and data in the business model;
[0214] The field information calculation unit 402 is used to obtain statistical information of the field, specified attribute judgment information of the field, and feature information of the field based on the fields and data.
[0215] The primary key information calculation unit 403 is used to obtain the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field.
[0216] In this embodiment, the field and data acquisition unit is specifically used to: acquire a specified amount of field data in the business model.
[0217] In this embodiment, the primary key information calculation unit is specifically used to: construct a recommendation strategy for the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field;
[0218] Based on the recommendation strategy of the primary key information of the business model, candidate primary keys of the business model are obtained;
[0219] The candidate primary keys of the business model are classified to obtain the classification results of the candidate primary keys;
[0220] Based on the classification results, obtain the primary key information of the business model.
[0221] The fifth embodiment of this application provides a method for obtaining the relevant business model of the logical model. Please refer to Figure 5, which is a schematic diagram of the fifth embodiment of this application. The method is described in detail below in conjunction with Figure 5. This embodiment recommends the top N most relevant business models for a logical model and can provide relevance scores between the models. The method provided in this embodiment employs a machine learning algorithm and consists of a training part and a recommendation part.
[0222] The implementation of the method includes the following steps:
[0223] Step S501: Obtain the raw data of the business model to be processed and the raw data of the logical model to be processed.
[0224] This step is used to obtain the raw data of the business model to be processed and the raw data of the logical model to be processed.
[0225] This step belongs to the recommendation part of this embodiment. First, the raw data of the business model to be processed and the raw data of the logical model to be processed are collected. Then, based on the analysis of the training data, a business model is recommended for the logical model.
[0226] Step S502: Perform word segmentation on the original data of the business model to be processed and the original data of the logic model to be processed to obtain the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed.
[0227] This step is used to perform word segmentation on the original data of the business model to be processed and the original data of the logic model to be processed, so as to obtain the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed.
[0228] This step belongs to the recommendation part of this embodiment. In this embodiment, the standard feature information of the business model to be processed can be feature words representing the business model to be processed, and the standard feature information of the logical model to be processed can be feature words representing the logical model to be processed.
[0229] Step S503: Based on the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed, and according to the statistical information of the standard feature information of the historical business model obtained by training with historical data and the statistical information of the standard feature information of the historical logic model obtained by training with historical data, obtain the statistical value of the standard feature information of the business model to be processed and the statistical value of the standard feature information of the logic model to be processed.
[0230] This step is used to obtain the statistical values of the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed, based on the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed, and according to the statistical information of the standard feature information of the historical business model obtained by training with historical data and the statistical information of the standard feature information of the historical logic model obtained by training with historical data.
[0231] The statistical information of the historical logical model includes the term frequency (TF), inverse document frequency (IDF), and TF-IDF (Term Frequency-Inverse Document Frequency) for each word in the standard feature information of the historical logical model. The calculation methods for TF, IDF, and TF-IDF are as follows:
[0232] tf = (number of times the word appears in the document) / (total number of words in the document);
[0233] idf = log(total number of documents in the corpus / (number of documents containing the word + 1));
[0234] tf-idf = tf*idf.
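The three formulas above can be sketched in code as follows. Here each "document" is assumed to be the list of feature words of one model, and the `+ 1` in the idf formula is read as belonging to the denominator (the common smoothed form that avoids division by zero). The function name `tf_idf` is chosen for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf per word for each document.

    documents is a list of token lists (one per model). Returns a list of
    {word: tf-idf} dicts in the same order.
    """
    n_docs = len(documents)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    results = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        results.append({
            word: (count / total) * math.log(n_docs / (doc_freq[word] + 1))
            for word, count in counts.items()
        })
    return results
```

Note that under this reading a word that occurs in every document gets a slightly negative idf, since the ratio inside the logarithm falls below 1.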
[0235] Step S504: Based on the statistical values of the standard feature information of the business model to be processed and the statistical values of the standard feature information of the logic model to be processed, obtain the correlation between the business model to be processed and the logic model to be processed.
[0236] This step is used to obtain the correlation between the business model to be processed and the logic model to be processed based on the statistical values of the standard feature information of the business model to be processed and the statistical values of the standard feature information of the logic model to be processed.
[0237] Based on the statistical values of the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed, the correlation between the business model to be processed and the logic model to be processed can be statistically obtained.
[0238] Step S505: Based on the relevance between the business model to be processed and the logic model to be processed, obtain the business model related to the logic model to be processed.
[0239] This step is used to obtain the business model related to the logic model to be processed based on the relevance between the business model to be processed and the logic model to be processed.
[0240] Based on the relevance between the business model to be processed and the logical model to be processed, obtain the most relevant business models of the logical model to be processed, and give specific relevance scores between the business model to be processed and the logical model to be processed.
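The application does not state which measure turns the statistical values into a relevance score. As a hedged illustration, the sketch below assumes that each model is represented by a `{word: weight}` vector (for example, TF-IDF values) and uses cosine similarity, a common choice, to rank candidate business models and return the top N with their scores. The names `cosine` and `top_n_business_models` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n_business_models(logical_vec, business_vecs, n=3):
    """Rank business models by relevance to one logical model.

    business_vecs maps a business model name to its vector. Returns up to
    n (name, score) pairs, most relevant first.
    """
    scored = [(name, cosine(logical_vec, vec))
              for name, vec in business_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

A caller would pass the vector of the logical model to be processed together with the vectors of all candidate business models, and present the returned scores alongside the recommended models.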
[0241] The method for obtaining the relevant business model of the logical model further includes:
[0242] Obtain the single-layer lineage relationship between the historical business model and the historical logical model;
[0243] Based on the single-layer lineage relationship, obtain all lineage relationships between the historical business model and the historical logical model;
[0244] The original data of the historical business model is segmented to obtain the standard feature information of the historical business model;
[0245] Based on the original data of the historical logical model, the standard feature information of the historical business model, and all lineage relationships between the historical business model and the historical logical model, the standard feature information of the logical model is obtained;
[0246] Based on the standard feature information of the historical business model and the standard feature information of the logical model, statistical information of the standard feature information of the historical business model and statistical information of the standard feature information of the historical logical model are obtained.
[0247] These steps belong to the training part of this embodiment. The single-layer lineage relationship means that if model 1 directly depends on model 2, then model 1 and model 2 have a single-layer lineage relationship, with model 1 descending from model 2. In the metadata of data development systems such as DataWorks, the single-layer dependencies between models, i.e., the single-layer lineage relationships, are preserved.
[0248] Based on the single-layer lineage relationship, obtain all lineage relationships between the historical business model and the historical logical model, including:
[0249] Based on graph model theory, a lineage propagation algorithm is constructed to calculate all lineage relationships between the two models.
[0250] Using the relationships between nodes in the graph model, together with their propagation and directionality, the algorithm calculates the maximal directed connected subgraphs of the graph model; the lineage propagation algorithm is built on this calculation.
[0251] The specific implementation of the lineage propagation algorithm includes: calculating the maximal directed connected subgraphs from the single-layer lineage relationships between nodes; for each connected subgraph, calculating the other nodes that each node depends on; and thereby calculating the multi-layer lineage relationships that each model depends on.
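The effect of lineage propagation can be sketched as a reachability computation over the directed dependency graph. This simplified example skips the explicit connected-subgraph step and directly collects, for each model, every model it depends on at any depth; the edge representation and the name `all_lineage` are assumptions for illustration.

```python
from collections import defaultdict


def all_lineage(single_layer):
    """Expand single-layer lineage into multi-layer lineage.

    single_layer is an iterable of (model, upstream) pairs, meaning that
    model directly depends on upstream. Returns {model: set of all
    upstream models reachable through any number of layers}.
    """
    direct = defaultdict(set)
    for model, upstream in single_layer:
        direct[model].add(upstream)

    result = {}
    for model in list(direct):
        seen = set()
        stack = list(direct[model])
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue  # already visited; also guards against cycles
            seen.add(node)
            stack.extend(direct.get(node, ()))
        result[model] = seen
    return result
```

For instance, if `dws_order` depends on `dwd_order` and `dwd_order` depends on `ods_order`, the result for `dws_order` contains both `dwd_order` and `ods_order`.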
[0252] The method for obtaining the relevant business model of the logical model further includes:
[0253] Get the request to build a data warehouse;
[0254] Based on the request to establish a data warehouse, the physical model corresponding to the logical model to be processed is determined using the business model related to the logical model to be processed.
[0255] Based on the physical model, the data warehouse is established.
[0256] The above steps describe a usage scenario for the obtained business model related to the logical model.
[0257] In the above embodiments, a method for obtaining the relevant business model of the logical model is provided. Correspondingly, this application also provides an apparatus for obtaining the relevant business model of the logical model. Please refer to Figure 6, which is a flowchart illustrating an embodiment of an apparatus for obtaining a relevant business model of a logical model according to this application. Since this embodiment, namely the sixth embodiment, is basically similar to the method embodiment, it is described simply; relevant details can be found in the description of the method embodiment. The apparatus embodiment described below is merely illustrative.
[0258] This embodiment of an apparatus for obtaining a related business model of a logical model includes:
[0259] The raw data acquisition unit 601 is used to acquire the raw data of the business model to be processed and the raw data of the logical model to be processed.
[0260] The word segmentation processing unit 602 is used to perform word segmentation processing on the original data of the business model to be processed and the original data of the logic model to be processed, so as to obtain the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed.
[0261] The statistical value acquisition unit 603 is used to obtain the statistical value of the standard feature information of the business model to be processed and the statistical value of the standard feature information of the logic model to be processed based on the standard feature information of the business model to be processed and the standard feature information of the logic model to be processed, and based on the statistical information of the standard feature information of the historical business model obtained by training with historical data and the statistical information of the standard feature information of the historical logic model obtained by training with historical data.
[0262] The relevance acquisition unit 604 is used to obtain the relevance between the business model to be processed and the logic model to be processed based on the statistical values of the standard feature information of the business model to be processed and the statistical values of the standard feature information of the logic model to be processed.
[0263] The business model acquisition unit 605 is used to acquire the relevant business model of the logic model to be processed based on the relevance between the business model to be processed and the logic model to be processed.
[0264] The seventh embodiment of this application provides a method for establishing a data warehouse. Please refer to Figure 7, which is a flowchart illustrating a method for establishing a data warehouse. The method includes:
[0265] Step S701: Obtain the business model related to the logical model.
[0266] This step is used to obtain the business model related to the logical model.
[0267] For this step, please refer to the relevant section of the fifth embodiment of this application.
Step S702: Obtain the primary key information of the business model.
[0268] This step is used to obtain the primary key information of the business model.
[0269] Step S703: Obtain the mapping relationship between the fields of the logical model and the business model.
[0270] This step is used to obtain the mapping relationship between the fields of the logical model and the business model.
[0271] For details on this step, please refer to the relevant sections of the first embodiment of this application.
[0272] Step S704: Determine the physical model corresponding to the logical model and the business model based on the primary key information of the business model and the mapping relationship.
[0273] This step is used to determine the physical model corresponding to the logical model and the business model based on the primary key information of the business model and the mapping relationship.
[0274] Step S705: Based on the physical model, establish a data warehouse corresponding to the physical model.
[0275] This step is used to establish a data warehouse corresponding to the physical model.
[0276] The eighth embodiment of this application provides a data processing system. Please refer to Figure 8, which is a schematic diagram of a data processing system. The system includes: a primary key information acquisition module 801 for a business model, a business model acquisition module 803, a field mapping relationship acquisition module 802, and a data warehouse establishment module 804;
[0277] The business model acquisition module is used to acquire business models related to the logical model;
[0278] The primary key information acquisition module of the business model is used to acquire the primary key information of the business model.
[0279] The field mapping relationship acquisition module is used to acquire the field mapping relationship between the logical model and the business model;
[0280] The data warehouse establishment module is used to determine the physical model corresponding to the logical model and the business model based on the primary key information of the business model and the mapping relationship; and to establish a data warehouse corresponding to the physical model based on the physical model.
[0281] Since this embodiment is a system embodiment corresponding to the seventh embodiment, the description is relatively simple. For relevant parts, please refer to the description in the seventh embodiment.
[0282] The ninth embodiment of this application provides an electronic device, which includes: a processor; and a memory for storing a computer program. After the device runs the computer program through the processor, it executes an information processing method provided in the first embodiment of this application, or executes a method for obtaining primary key information of a business model provided in the third embodiment of this application, or executes a method for obtaining related business models of a logical model provided in the fifth embodiment of this application, or executes a method for establishing a data warehouse provided in the seventh embodiment of this application.
[0283] The tenth embodiment of this application provides a computer storage medium storing a computer program that is executed by a processor to perform an information processing method provided in the first embodiment of this application, or a method for obtaining primary key information of a business model provided in the third embodiment of this application, or a method for obtaining related business models of a logical model provided in the fifth embodiment of this application, or a method for establishing a data warehouse provided in the seventh embodiment of this application.
[0284] Although this application discloses preferred embodiments as described above, it is not intended to limit this application. Any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of this application. Therefore, the scope of protection of this application should be determined by the scope defined in the claims of this application.
[0285] In a typical configuration, a computing device includes one or more CPUs, input / output interfaces, network interfaces, and memory.
[0286] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0287] 1. Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
[0288] 2. Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
Claims
1. An information processing method, characterized in that it includes:
obtaining at least one business model related to the logical model to be processed;
obtaining the field similarity between the logical model and the business model;
based on the field similarity between the logical model and the business model, obtaining the mapping relationship between the fields of the logical model and the business model;
wherein obtaining at least one business model related to the logical model to be processed includes:
obtaining the raw data of the business model to be processed and the raw data of the logical model to be processed;
performing word segmentation on the raw data of the business model to be processed and the raw data of the logical model to be processed to obtain the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed;
based on the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed, and based on the statistical information of the standard feature information of the historical business model and the statistical information of the standard feature information of the historical logical model obtained by training with historical data, calculating the relevance between the business model to be processed and the logical model to be processed;
based on the relevance, selecting at least one business model related to the logical model to be processed from multiple candidate business models.
2. The information processing method according to claim 1, characterized in that, The process of obtaining field similarity between the logical model and the business model includes: Obtain the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model; Based on the edit distance and / or the semantic similarity, obtain the field similarity between the logical model and the business model.
3. The information processing method according to claim 2, characterized in that, The step of obtaining the edit distance between fields in the logical model and fields in the business model and / or the semantic similarity between fields in the logical model and fields in the business model includes: Obtain the metadata of the logical model and the metadata of the business model; The metadata of the logical model and the metadata of the business model are subjected to word segmentation to obtain the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model. Based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, the edit distance between the fields in the logical model and the fields in the business model and / or the semantic similarity between the fields in the logical model and the fields in the business model are obtained.
4. The information processing method according to claim 3, characterized in that performing word segmentation on the metadata of the logical model and the metadata of the business model to obtain the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model comprises: segmenting the metadata of the logical model and the metadata of the business model into words to obtain initial word segmentation results of the logical model and initial word segmentation results of the business model; deleting, based on the initial word segmentation results of the logical model, punctuation marks and stop words from the metadata of the logical model to obtain the word segmentation results of the fields in the logical model; and deleting, based on the initial word segmentation results of the business model, punctuation marks and stop words from the metadata of the business model to obtain the word segmentation results of the fields in the business model.
5. The information processing method according to claim 3, characterized in that the metadata of the logical model includes at least one of the following: the field names of the logical model; the field annotations of the logical model; the field types of the logical model; and the metadata of the business model includes at least one of the following: the field names of the business model; the field annotations of the business model; the field types of the business model.
6. The information processing method according to claim 3, characterized in that obtaining the edit distance between fields in the logical model and fields in the business model and/or the semantic similarity between fields in the logical model and fields in the business model, based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, comprises: obtaining, based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, the edit distance between the fields in the logical model and the fields in the business model using an edit distance algorithm; and/or obtaining, based on the word segmentation results of the fields in the logical model and the word segmentation results of the fields in the business model, the semantic similarity between the fields in the logical model and the fields in the business model using a thesaurus.
7. The information processing method according to claim 1, characterized in that obtaining the mapping relationship between the fields of the logical model and the fields of the business model based on the field similarity between the logical model and the business model comprises: obtaining, based on the field similarity between the logical model and the business model, the field in the business model that has the highest similarity to a field in the logical model; and obtaining, based on the field in the logical model and the field in the business model with the highest similarity to it, the mapping relationship between the fields of the logical model and the fields of the business model.
8. The information processing method according to claim 1, characterized in that the method further comprises: obtaining fields and data from the business model; performing calculations based on the fields and data to obtain statistical information of the fields, specified attribute judgment information of the fields, and/or feature information of the fields; and obtaining, based on the statistical information of the fields, the specified attribute judgment information of the fields, and/or the feature information of the fields, the primary key information of the business model.
9. The information processing method according to claim 8, characterized in that the statistical information of the field includes at least one of the following: the null value rate of the field; the repetition rate of the field; the average length of the data in the field; the variance of the data length of the field; the specified attribute judgment information of the field includes at least one of the following: whether the field is an identification code; whether the field is a date; whether the field is link information; whether the field is a phone number; whether the field is a timestamp; whether the field is address information; whether the field is a check digit; whether the field is a monotonically increasing sequence; and the feature information of the field includes at least one of the following: whether the field is a number; whether the field contains Chinese characters; whether the field contains special symbols; whether the field has a similar prefix or a similar suffix; the position information of the field in the business model.
10. The information processing method according to claim 8, characterized in that obtaining the primary key information of the business model based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field comprises: constructing, based on the statistical information of the field, the specified attribute judgment information of the field, and the feature information of the field, a recommendation strategy for the primary key information of the business model; obtaining candidate primary keys of the business model based on the recommendation strategy; classifying the candidate primary keys of the business model to obtain classification results of the candidate primary keys; and obtaining, based on the classification results, the primary key information of the business model.
11. An information processing device, characterized in that it comprises: a similarity acquisition unit, used to obtain the field similarity between the logical model and the business model; a mapping acquisition unit, used to obtain the mapping relationship between the fields of the logical model and the fields of the business model based on the field similarity between the logical model and the business model; a raw data acquisition unit, used to obtain the raw data of the business model to be processed and the raw data of the logical model to be processed; a word segmentation processing unit, used to perform word segmentation on the raw data of the business model to be processed and the raw data of the logical model to be processed, so as to obtain the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed; a relevance acquisition unit, used to calculate the relevance between the business model to be processed and the logical model to be processed based on the standard feature information of the business model to be processed and the standard feature information of the logical model to be processed, and based on statistical information of the standard feature information of historical business models and statistical information of the standard feature information of historical logical models obtained by training with historical data; and a business model acquisition unit, used to select, based on the relevance, at least one business model related to the logical model to be processed from multiple candidate business models.
12. The information processing device according to claim 11, characterized in that it further comprises: a field and data acquisition unit, used to obtain fields and data from the business model; a field information calculation unit, used to perform calculations based on the fields and data to obtain statistical information of the fields, specified attribute judgment information of the fields, and feature information of the fields; and a primary key information calculation unit, used to obtain the primary key information of the business model based on the statistical information of the fields, the specified attribute judgment information of the fields, and the feature information of the fields.
13. An electronic device, characterized in that it comprises: a processor; and a memory for storing a computer program which, when run by the processor, performs the method according to any one of claims 1-10.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed by a processor, performs the method according to any one of claims 1-10.
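
For illustration only, the field matching recited in claims 2, 6 and 7 can be sketched as follows: compute an edit distance between each logical-model field and each business-model field, turn it into a similarity score, and map each logical field to the business field with the highest score. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`edit_distance`, `field_similarity`, `map_fields`), the normalization of distance into a 0-to-1 similarity, and the sample field names are all hypothetical. The thesaurus-based semantic similarity branch and the word segmentation step are omitted.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer name."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def map_fields(logical: List[str], business: List[str]) -> Dict[str, str]:
    """Map each logical field to its most similar business field."""
    if not business:
        return {}  # nothing to map onto
    return {
        lf: max(business, key=lambda bf: field_similarity(lf, bf))
        for lf in logical
    }


if __name__ == "__main__":
    # Hypothetical field names, chosen only to exercise the sketch.
    print(map_fields(["user_id", "order_date"],
                     ["usr_id", "ord_date", "amount"]))
```

A combined score, for example a weighted sum of the edit-distance similarity and a semantic similarity, could replace `field_similarity` without changing `map_fields`. The claims do not specify such a weighting.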