Index building methods, apparatus, computer equipment, and storage media

By employing an index building method characterized by high availability, high fault tolerance, high cohesion and low coupling, and high scalability, the problem of low index building efficiency in e-commerce platforms has been solved, enabling fast and reliable product index generation and improving user experience and search efficiency.

CN115687350B · Active · Publication Date: 2026-06-30 · VIPSHOP (GUANGZHOU) SOFTWARE CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: VIPSHOP (GUANGZHOU) SOFTWARE CO LTD
Filing Date: 2022-10-31
Publication Date: 2026-06-30


Abstract

This application relates to an index construction method, apparatus, computer device, and storage medium. The method includes: receiving an index construction request; obtaining relevant data information of the target customer for index construction based on the target customer corresponding to the index construction request; performing cluster analysis on the relevant data information to obtain classified data; configuring a construction environment; concatenating the classified data according to preset dimensions based on the construction environment to generate wide table data; formatting the wide table data with an index and writing it into a corresponding index library to form a first data table; and updating the wide table data in the index library using an incremental update mechanism. The indexing mechanism constructed in this application ensures the availability, fault tolerance, and robustness of the product index, greatly improving construction efficiency. In addition, the switching mechanism constructed in this application allows for a smooth and lossless switch when system anomalies occur, improving the efficiency of search data construction and significantly enhancing the user experience.

Description

Technical Field

[0001] This application relates to the technical field of data search, and in particular to an index building method, apparatus, computer device, and storage medium.

Background Technology

[0002] Current research indicates that there is no mature, readily available method for quickly building product indexes together with a robust disaster recovery system. Existing ETL (Extract-Transform-Load, a data warehouse technology) implementations are relatively simple, draw on limited data sources, and do not reach into actual e-commerce operations. Search scenarios, however, are far more complex and rely on more than a dozen systems, such as PDC (product system), sales (sales system), PMS (price management system), PTP (product tagging system), GOS (operation management system), category (category management system), big data, ABT (traffic distribution system), USP (customer system), VDE (product tagging system), brandstore (brand management system), and any new third-party systems to be integrated. Rapid index building also depends on the performance of these systems, and with so many systems involved, high performance cannot be guaranteed for all of them.

[0003] In addition, search is a major traffic entry point for a large e-commerce platform, and building product data (the product index) is the cornerstone of the entire search service. Every day, search serves millions of users and tens of millions of search terms. If product data is not built in a timely manner, users cannot find newly launched products directly through search and cannot see the most popular products right away, which may lead to user churn and directly cause lost sales.

[0004] Therefore, there is an urgent need for an index construction method, apparatus, computer device, and storage medium that is highly available, highly fault-tolerant, highly cohesive, loosely coupled, and highly scalable.

Summary of the Invention

[0005] Therefore, it is necessary to provide a highly available, highly fault-tolerant, highly cohesive, loosely coupled, and highly scalable index construction method, apparatus, computer equipment, and storage medium to address the aforementioned technical problems.

[0006] On the one hand, an index construction method is provided, the method comprising:

[0007] Step A: Receive the index building request;

[0008] Step B: Obtain relevant data information for index construction from the target customer corresponding to the index construction request;

[0009] Step C: Perform cluster analysis on the relevant data information to obtain the classified data;

[0010] Step D: Configure the building environment, and based on the building environment, concatenate the classified data according to preset dimensions to generate wide table data;

[0011] Step E: Format the wide table data with an index and write it into the corresponding index library to form the first data table, and update the wide table data in the index library with an incremental update mechanism.

[0012] In one embodiment, the method further includes: clearing all data in the original second data table; detecting whether the first data table meets a preset synchronization standard; if it meets the preset synchronization standard, synchronizing all data in the first data table to the second data table to form a new second data table; and connecting the first data table and the second data table based on a preset switching mechanism to form a one-to-one correspondence.

[0013] In one embodiment, the preset switching mechanism includes: obtaining target data of the first data table from the background server and extracting the index parameters of the target data; and comparing the index parameters with a preset threshold: if the index parameters are higher than or equal to the preset threshold, switching the first data table corresponding to the index to the second data table; if the index parameters are lower than the preset threshold, not switching.

[0014] In one embodiment, obtaining relevant data information for building the index from the target customer corresponding to the index building request includes: the index building request being a reference table identifier for wide table creation; obtaining the corresponding target customer based on the reference table identifier; and obtaining relevant data information for building the index from the corresponding target customer, wherein the relevant data information is a field associated with the reference table identifier.

[0015] In one embodiment, the clustering analysis algorithm used for clustering the relevant data information includes at least one of the following: a partition-based clustering algorithm, a density-based spatial clustering algorithm, and a Gaussian mixture model.

[0016] In one embodiment, configuring the construction environment and using the construction environment to concatenate the categorized data according to a preset dimension to generate wide table data includes: configuring the data source and establishing a wide table construction environment based on the data source and preset configuration content; using the wide table construction environment and an asynchronous programming mechanism to concatenate the categorized data according to the same preset dimension, merging multiple data streams into one; and performing two-dimensional table concatenation on the merged data stream, completing the attribute values of each data stream to obtain the wide table data.

[0017] In one embodiment, indexing and formatting the wide table data, writing it into a corresponding index library to form a first data table, and updating the wide table data in the index library using an incremental update mechanism includes: obtaining the splicing time of each data stream in the wide table data; stratifying the storage database according to the splicing time, the strata being currently in use, currently under construction, and previously constructed; writing the splicing time into a third data table, and indexing and formatting the data stream corresponding to the splicing time; writing the index-formatted data stream into the corresponding index library to form the first data table; and, if a new data stream appears, updating the first data table in the index library using an incremental update mechanism.

[0018] On the other hand, an index building apparatus is provided, the apparatus comprising:

[0019] The data receiving module is used to receive index building requests;

[0020] The index information acquisition module is used to acquire relevant data information of the target customer for index construction based on the target customer corresponding to the index construction request;

[0021] The classification module is used to perform cluster analysis on the relevant data information to obtain classified data;

[0022] The wide table data generation module is used to configure the construction environment and, based on the construction environment, splice the classified data according to preset dimensions to generate wide table data;

[0023] The index generation module is used to format the wide table data into an index and write it into the corresponding index library to form a first data table, and to update the wide table data in the index library using an incremental update mechanism.

[0024] In another aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the following steps:

[0025] Step A: Receive the index building request;

[0026] Step B: Obtain relevant data information for index construction from the target customer corresponding to the index construction request;

[0027] Step C: Perform cluster analysis on the relevant data information to obtain the classified data;

[0028] Step D: Configure the building environment, and based on the building environment, concatenate the classified data according to preset dimensions to generate wide table data;

[0029] Step E: Format the wide table data with an index and write it into the corresponding index library to form the first data table, and update the wide table data in the index library with an incremental update mechanism.

[0030] In another aspect, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, performs the following steps:

[0031] Step A: Receive the index building request;

[0032] Step B: Obtain relevant data information for index construction from the target customer corresponding to the index construction request;

[0033] Step C: Perform cluster analysis on the relevant data information to obtain the classified data;

[0034] Step D: Configure the building environment, and based on the building environment, concatenate the classified data according to preset dimensions to generate wide table data;

[0035] Step E: Format the wide table data with an index and write it into the corresponding index library to form the first data table, and update the wide table data in the index library with an incremental update mechanism.

[0036] The aforementioned index building method, apparatus, computer equipment, and storage medium include the following steps: receiving an index building request; obtaining relevant data information of the target customer for index building based on the target customer corresponding to the index building request; performing cluster analysis on the relevant data information to obtain classified data; configuring a building environment; splicing the classified data according to preset dimensions based on the building environment to generate wide table data; formatting the wide table data with an index and writing it into the corresponding index library to form a first data table; and updating the wide table data in the index library with an incremental update mechanism. The indexing mechanism constructed in this application ensures the availability, fault tolerance, and robustness of the product index, and reduces the construction time from the current hour level to the minute level, greatly improving the construction efficiency. In addition, the switching mechanism constructed in this application can switch seamlessly when the system malfunctions, improving the efficiency of search data construction and significantly enhancing the user experience.

Attached Figure Description

[0037] Figure 1 is a diagram illustrating the application environment of an index construction method in one embodiment;

[0038] Figure 2 is a flowchart illustrating an index construction method in one embodiment;

[0039] Figure 3 is a structural block diagram of an index building apparatus in one embodiment;

[0040] Figure 4 is an internal structural diagram of a computer device in one embodiment.

Detailed Implementation

[0041] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0042] The index building method provided in this application can be applied to the application environment shown in Figure 1, in which terminal 102 communicates with a data processing platform located on server 104 via a network. Terminal 102 can be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. Server 104 can be a standalone server or a server cluster consisting of multiple servers.

[0043] Example 1

[0044] In one embodiment, as shown in Figure 2, an index construction method is provided. Taking the application of the method to the terminal in Figure 1 as an example, the method includes the following steps:

[0045] S1: Receive index building requests.

[0046] It should be noted that the index building request is a reference table identifier for wide table creation. Based on the reference table identifier, the reference format of the corresponding first data table can be quickly found, thereby improving the index building efficiency. The first data table is the official table.

[0047] S2: Obtain relevant data information of the target customer for building the index based on the target customer corresponding to the index building request.

[0048] It should be noted that this step specifically includes:

[0049] The corresponding target customer is obtained based on the reference table identifier. The corresponding target customer may be an e-commerce platform. The e-commerce platform and the reference table identifier have a pre-generated mapping relationship. The e-commerce platform may specifically include other systems used to build the e-commerce platform, such as PDC (product system), sales (sales system), PMS (price management system), PTP (product tagging system), GOS (operation management system), category (category management system), big data, ABT (traffic distribution system), USP (customer system), VDE (product tagging system), and brandstore (brand management system).

[0050] Based on the corresponding target customer, relevant data information for building an index is obtained. The relevant data information is the field associated with the reference table identifier. The relevant data information for building the index may be product data.

[0051] Furthermore, in this embodiment, the associated fields refer to the field names of the data source table of the reference table and the field content associated with those field names.
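The lookup chain in S2 (reference table identifier, then target customer, then associated fields) can be illustrated with a minimal Python sketch. The identifier "product_wide_v1", the customer name, the field names, and the in-memory dictionaries are all hypothetical stand-ins chosen for illustration; the source does not specify how the pre-generated mapping is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceTable:
    customer: str   # target customer mapped to this identifier
    fields: tuple   # field names of the reference table's data source


# Hypothetical pre-generated mapping: reference table identifier -> customer and fields.
REFERENCE_TABLES = {
    "product_wide_v1": ReferenceTable("ecommerce_platform", ("spu_id", "brand", "price")),
}

# Hypothetical source records held by each target customer.
CUSTOMER_DATA = {
    "ecommerce_platform": [
        {"spu_id": 1, "brand": "A", "price": 9.9, "internal_note": "x"},
        {"spu_id": 2, "brand": "B", "price": 19.9, "internal_note": "y"},
    ],
}


def fetch_relevant_data(table_id):
    """Resolve the target customer from the identifier, then keep only associated fields."""
    ref = REFERENCE_TABLES[table_id]
    rows = CUSTOMER_DATA[ref.customer]
    return [{name: row[name] for name in ref.fields} for row in rows]
```

Calling fetch_relevant_data("product_wide_v1") returns only the fields named by the reference table, so fields such as the hypothetical internal_note never reach the later clustering and splicing steps.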

[0052] S3: Perform cluster analysis on the relevant data information to obtain the classified data.

[0053] It should be noted that clustering analysis algorithms can be partition-based clustering algorithms (such as the K-Means algorithm), density-based spatial clustering algorithms (such as the DBSCAN algorithm), and Gaussian mixture models (GMM).

[0054] Taking the K-Means algorithm as an example:

[0055] (1) Select k objects from the relevant data information as initial cluster centers;

[0056] (2) Calculate the distance from each cluster object to the cluster center to divide the cluster;

[0057] (3) Calculate each cluster center again;

[0058] (4) Calculate the standard measure function; stop when the maximum number of iterations is reached, otherwise repeat steps (2) and (3).
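Steps (1) to (4) can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in this application: the function names, the squared Euclidean distance, and the early exit when cluster assignments stop changing (used here in place of the unspecified standard measure function) are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # (1) select k objects as initial cluster centers
    labels = None
    for _ in range(max_iter):         # (4) stop once the maximum number of iterations is reached
        # (2) assign each object to the cluster with the nearest center
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:      # assumed early exit: assignments no longer change
            break
        labels = new_labels
        # (3) recompute each cluster center as the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels
```

For example, kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2) places the first two points in one cluster and the last two in the other.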

[0059] S4: Configure the building environment, and based on the building environment, splice the classified data according to preset dimensions to generate wide table data.

[0060] It should be noted that the configuration steps for building the environment include:

[0061] Configure the data source and establish a wide table building environment based on the data source and the preset configuration content. The data source can be different node contents in the data table, such as the master node. The preset configuration content is the data storage path and the data configuration file preset by the target customer.

[0062] Based on the wide table construction environment, the classified data is spliced together according to the same preset dimension using an asynchronous programming mechanism, and multiple data streams are merged into one. The preset dimension can be a time dimension.

[0063] The merged data streams are then concatenated into a two-dimensional table to complete the attribute values of each data stream, resulting in the wide table data.
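The asynchronous splicing described for S4 can be sketched with Python's asyncio. The stream names, the "ts" key standing for the time dimension, the simulated delays, and the use of None for attribute values that a stream does not supply are assumptions made for illustration only.

```python
import asyncio


async def fetch_stream(name, rows, delay):
    """Stand-in for an asynchronous call to one upstream system."""
    await asyncio.sleep(delay)
    return name, rows


async def build_wide_table(sources, key, columns):
    # Fetch all data streams concurrently.
    results = await asyncio.gather(*(fetch_stream(n, r, d) for n, r, d in sources))
    # Merge multiple data streams into one, keyed by the preset dimension.
    merged = {}
    for _, rows in results:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    # Concatenate into a two-dimensional table, completing missing attribute values.
    return [
        {col: record.get(col) for col in columns}
        for _, record in sorted(merged.items())
    ]


# Hypothetical data streams: (name, rows, simulated latency in seconds).
SOURCES = [
    ("price", [{"ts": 1, "price": 9.9}, {"ts": 2, "price": 19.9}], 0.02),
    ("brand", [{"ts": 1, "brand": "A"}], 0.01),
]

if __name__ == "__main__":
    print(asyncio.run(build_wide_table(SOURCES, key="ts", columns=("ts", "price", "brand"))))
```

Because the streams are awaited together, one slow upstream system delays the build by at most its own latency instead of adding to the latency of every other system.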

[0064] S5: The wide table data is indexed and formatted, and written into the corresponding index library to form the first data table, and the wide table data in the index library is updated using an incremental update mechanism.

[0065] It should be noted that the specific steps for S4 to S5 are as follows:

[0066] Obtain the splicing time of each data stream in the wide table data, and layer the storage database according to the splicing time. The layers are: currently in use, currently under construction, and previously constructed.

[0067] The splicing time is written into the third data table, and the data stream corresponding to the splicing time is indexed and formatted. The third data table is a Rename intermediate table used to store the process data of index building.

[0068] The formatted data stream is written into the corresponding index library to form the first data table. If a new data stream appears, the first data table in the index library is updated using an incremental update mechanism. For example, the incremental update mechanism can update the wide table data in the index library by consuming VDP (data message service) messages that update brand information, sales information, PMS discount information, SKU size and color, vendor_SKU, category attributes, PDC product attributes, regional service visibility, PTP tags, VDE tags, etc.
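One way to picture the three layers and the incremental update is the following Python sketch. The class name, the in-memory dictionaries, and the promote step are assumptions; the source states only that storage is layered by splicing time and that new data streams are applied incrementally to the first data table.

```python
class LayeredIndexStore:
    """Hypothetical holder for the three layers, each tagged with its splicing time."""

    def __init__(self):
        self.in_use = None     # layer currently in use (the first data table)
        self.building = None   # layer currently under construction
        self.previous = None   # layer from the previous construction

    def start_build(self, splice_time):
        """Open a new layer under construction for the given splicing time."""
        self.building = {"splice_time": splice_time, "docs": {}}
        return self.building

    def promote(self):
        """Make the finished build current and keep the old current layer as previous."""
        self.previous, self.in_use, self.building = self.in_use, self.building, None

    def apply_increment(self, doc_id, changed_fields):
        """Incremental update: overwrite only the fields carried by the new message."""
        self.in_use["docs"].setdefault(doc_id, {}).update(changed_fields)
```

In this sketch, a message that changes only a price would call apply_increment(doc_id, {"price": ...}) and leave every other attribute of the document untouched, which is what lets updates avoid a full rebuild.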

[0069] Furthermore, a second data table is constructed in the database as a backup table. This allows for automatic and rapid switching to the second data table when the first data table becomes unavailable, avoiding unnecessary economic losses and improving user experience. The construction process of the second data table includes:

[0070] Clear all data in the existing second data table;

[0071] Check whether the first data table meets the preset synchronization standard. The preset synchronization standard consists of checks such as the quantity of goods (spu / mid); if the checks pass, the first data table meets the preset synchronization standard.

[0072] If the preset synchronization standard is met, all data in the first data table will be synchronized to the second data table to form a new second data table.

[0073] The first data table and the second data table are connected based on a preset switching mechanism to form a one-to-one correspondence.
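The construction of the second data table can be sketched as below. Representing both tables as dictionaries keyed by product identifier and reducing the preset synchronization standard to a minimum product count (min_spu_count) are assumptions made for illustration.

```python
def rebuild_backup(official, backup, min_spu_count):
    """Rebuild the backup (second) table from the official (first) table.

    Returns True if the official table met the synchronization standard
    and was copied, False otherwise.
    """
    backup.clear()                        # clear all data in the existing second data table
    if len(official) < min_spu_count:     # assumed standard: enough products in the first table
        return False
    # Synchronize all data, copying each row so the two tables stay independent.
    backup.update({doc_id: dict(row) for doc_id, row in official.items()})
    return True
```

The sketch follows the step order given above, so a failed check leaves the second table empty rather than holding stale data.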

[0074] Specifically, the preset switching mechanism includes:

[0075] Obtain the target data of the first data table in the backend server, and extract the index parameters of the target data. The backend server can be a monitoring panel that directly displays key data of the system, such as the document count of each index, the health status of the wide table job, and the recall count of the top N hot words.

[0076] Compare the index parameter with the preset threshold:

[0077] If the index parameter is higher than or equal to the preset threshold, then the first data table corresponding to the index is switched to the second data table;

[0078] If the index parameter is lower than a preset threshold, no switching will be performed;

[0079] For example, search hot keyword monitoring can retrieve the top 200 hot keywords; if the number of products recalled for any of these hot keywords is lower than a preset threshold, the index will not be switched. Likewise, under the index document number threshold, if the number of documents in the new index is lower than a preset threshold, the index will not be switched.
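The threshold comparison can be sketched as a single gate function. The parameter names and the choice to require every monitored hot keyword to meet its recall threshold are assumptions; the source specifies only that parameters below a preset threshold block the switch.

```python
def should_switch(doc_count, hot_word_recall, min_docs, min_recall):
    """Return True only if every monitored index parameter meets its preset threshold.

    doc_count:       number of documents in the new index
    hot_word_recall: mapping of hot keyword -> number of products recalled
    """
    if doc_count < min_docs:   # index document number threshold
        return False
    # Hot keyword monitoring: a single under-recalled keyword blocks the switch.
    return all(count >= min_recall for count in hot_word_recall.values())
```

For instance, should_switch(1000, {"dress": 50, "shoes": 80}, min_docs=500, min_recall=10) allows the switch, while lowering any one recall count below 10 blocks it.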

[0080] Furthermore, this switching mechanism is applied to disaster recovery tools. When a problem is found in the official table, or when monitoring indicators issue an alarm, the index can be automatically or manually switched to the backup table through the disaster recovery tool.

[0081] The above-mentioned index construction method includes: receiving an index construction request; obtaining relevant data information of the target customer for index construction based on the target customer corresponding to the index construction request; performing cluster analysis on the relevant data information to obtain classified data; configuring a construction environment; splicing the classified data according to a preset dimension based on the construction environment to generate wide table data; formatting the wide table data into an index and writing it into the corresponding index library to form a first data table; and updating the wide table data in the index library with an incremental update mechanism. The index mechanism constructed in this application ensures the availability, fault tolerance, and robustness of the product index, and reduces the construction time from the existing hour level to the minute level, greatly improving the construction efficiency. In addition, the switching mechanism constructed in this application can seamlessly switch when the system encounters an anomaly, improving the efficiency of search data construction and significantly enhancing the user experience.

[0082] It should be understood that, although the steps in the flowchart of Figure 2 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in that order. Unless otherwise specified herein, there is no strict order in which these steps are executed, and they can be performed in other orders. Moreover, at least some of the steps in Figure 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and can be executed at different times. Nor are they necessarily executed sequentially; they can be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

[0083] Example 2

[0084] In one embodiment, as shown in Figure 3, an index building device is provided, including: a data receiving module, an index information acquisition module, a classification module, a wide table data generation module, and an index generation module, wherein:

[0085] The data receiving module is used to receive index building requests;

[0086] The index information acquisition module is used to acquire relevant data information of the target customer for index construction based on the target customer corresponding to the index construction request;

[0087] The classification module is used to perform cluster analysis on the relevant data information to obtain classified data;

[0088] The wide table data generation module is used to configure the construction environment and, based on the construction environment, splice the classified data according to preset dimensions to generate wide table data;

[0089] The index generation module is used to format the wide table data into an index and write it into the corresponding index library to form a first data table, and to update the wide table data in the index library using an incremental update mechanism.

[0090] The index building device further includes a data table switching module, which is used to clear all data in the original second data table; detect whether the first data table meets the preset synchronization standard; if it meets the preset synchronization standard, synchronize all data in the first data table to the second data table to form a new second data table; and connect the first data table and the second data table based on the preset switching mechanism to form a one-to-one correspondence.

[0091] The preset switching mechanism includes:

[0092] Obtain the target data from the first data table in the backend server, and extract the index parameters of the target data;

[0093] Compare the index parameter with the preset threshold:

[0094] If the index parameter is higher than or equal to the preset threshold, then the first data table corresponding to the index is switched to the second data table;

[0095] If the index parameter is lower than a preset threshold, no switching will be performed.

[0096] In a preferred embodiment of the present invention, the index information acquisition module is specifically used for:

[0097] The index building request is a reference table identifier for wide table creation;

[0098] The corresponding target customer is obtained based on the reference table identifier;

[0099] Based on the corresponding target customer, relevant data information for building the index is obtained, and the relevant data information is the field associated with the reference table identifier.

[0100] In a preferred embodiment of the present invention, the wide table data generation module is specifically used for:

[0101] Configure the data source and establish a wide table building environment based on the data source and the preset configuration content;

[0102] Based on the wide table construction environment, the classified data is spliced together according to the same preset dimensions using an asynchronous programming mechanism, merging multiple data streams into one.

[0103] The merged data streams are then concatenated into a two-dimensional table to complete the attribute values of each data stream, resulting in the wide table data.

[0104] In a preferred embodiment of the present invention, the index generation module is specifically used for:

[0105] Obtain the splicing time of each data stream in the wide table data, and layer the storage database according to the splicing time. The layers are: currently in use, currently under construction, and previously constructed.

[0106] Write the splicing time into the third data table, and index and format the data stream corresponding to the splicing time;

[0107] The formatted data stream is written into the corresponding index library to form a first data table. If a new data stream appears, the first data table in the index library is updated using an incremental update mechanism.

[0108] For specific limitations regarding the index building apparatus, please refer to the limitations on the index building method above, which will not be repeated here. Each module in the aforementioned index building apparatus can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
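As one illustration of a software-only implementation, the five modules can be modeled as interchangeable callables composed into a pipeline. The class name, the field names, and the callable signatures are hypothetical; the source does not prescribe any particular interface between modules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class IndexBuildingApparatus:
    """Hypothetical composition of the five modules as plain callables."""

    receive: Callable[[Any], str]               # data receiving module
    acquire: Callable[[str], list]              # index information acquisition module
    classify: Callable[[list], list]            # classification module
    generate_wide_table: Callable[[list], list] # wide table data generation module
    generate_index: Callable[[list], dict]      # index generation module

    def run(self, request):
        """Pass the request through each module in turn and return the built index."""
        table_id = self.receive(request)
        data = self.acquire(table_id)
        classified = self.classify(data)
        wide_rows = self.generate_wide_table(classified)
        return self.generate_index(wide_rows)
```

Keeping each module behind its own callable means one module can be replaced (for example, a different clustering algorithm in classify) without touching the others, which is one reading of the high cohesion and low coupling goal stated earlier.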

[0109] Example 3

[0110] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in Figure 4. The computer device includes a processor, memory, network interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements an index building method. The display screen can be a liquid crystal display (LCD) or an e-ink display. The input devices can be a touch layer covering the display screen, buttons, a trackball, or a touchpad mounted on the computer device casing, or an external keyboard, touchpad, or mouse.

[0111] Those skilled in the art will understand that the structure shown in Figure 4 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.

[0112] In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to perform the following steps:

[0113] Step A: Receive the index building request;

[0114] Step B: Obtain relevant data information for index construction from the target customer corresponding to the index construction request;

[0115] Step C: Perform cluster analysis on the relevant data information to obtain the classified data;

[0116] Step D: Configure the building environment, and based on the building environment, concatenate the classified data according to preset dimensions to generate wide table data;

[0117] Step E: Format the wide table data with an index and write it into the corresponding index library to form the first data table, and update the wide table data in the index library with an incremental update mechanism.

[0118] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0119] Clear all data in the existing second data table;

[0120] Check whether the first data table conforms to the preset synchronization standard;

[0121] If the preset synchronization standard is met, all data in the first data table will be synchronized to the second data table to form a new second data table.

[0122] The first data table and the second data table are connected based on a preset switching mechanism to form a one-to-one correspondence.

[0123] The preset switching mechanism includes:

[0124] Obtain the target data from the first data table in the backend server, and extract the index parameters of the target data;

[0125] Compare the index parameter with the preset threshold:

[0126] If the index parameter is higher than or equal to the preset threshold, then the first data table corresponding to the index is switched to the second data table;

[0127] If the index parameter is lower than a preset threshold, no switching will be performed.

[0128] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0129] The index building request is a reference table identifier for wide table creation;

[0130] The corresponding target customer is obtained based on the reference table identifier;

[0131] Based on the corresponding target customer, relevant data information for building the index is obtained, and the relevant data information is the field associated with the reference table identifier.
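A minimal sketch of this lookup is given below, assuming that the mapping from reference table identifier to target customer, and from identifier to associated fields, is held in simple registries. The registry names, the identifier `product_wide`, and all field values are invented for illustration.

```python
# Hypothetical registries; every identifier and value below is illustrative.
TABLE_TO_CUSTOMER = {"product_wide": "customer_a"}
TABLE_FIELDS = {"product_wide": ["sku", "title", "price"]}
CUSTOMER_DATA = {
    "customer_a": {
        "sku": "S1",
        "title": "Shirt",
        "price": 19.9,
        "internal_note": "not part of the index",
    },
}


def resolve_index_request(table_id):
    """Map a reference table identifier to its customer and associated fields."""
    customer = TABLE_TO_CUSTOMER[table_id]   # identifier -> target customer
    wanted = TABLE_FIELDS[table_id]          # fields tied to the identifier
    source = CUSTOMER_DATA[customer]
    # Keep only the fields associated with the reference table identifier.
    return customer, {name: source[name] for name in wanted}


customer, fields = resolve_index_request("product_wide")
```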

[0132] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0133] Configure the data source and establish a wide table building environment based on the data source and the preset configuration content;

[0134] Based on the wide table construction environment, the classified data is spliced together according to the same preset dimension using an asynchronous programming mechanism, merging multiple data streams into one;

[0135] The merged data stream is then spliced into a two-dimensional table, completing the attribute values of each data stream, to obtain the wide table data.
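The splicing described above can be illustrated with the following sketch, which uses Python's `asyncio` as one possible asynchronous programming mechanism; the embodiment does not name a specific one. The time dimension `ts` follows claim 1, which states that the preset dimension is the time dimension. The stream names, row contents, and simulated delays are assumptions made for the example.

```python
import asyncio


async def read_stream(name, rows, delay):
    """Simulate fetching one classified data stream (hypothetical source)."""
    await asyncio.sleep(delay)
    return name, rows


async def build_wide_table(streams, dimension, columns):
    """Fetch all streams concurrently and join them on one shared dimension."""
    results = await asyncio.gather(
        *(read_stream(name, rows, delay) for name, rows, delay in streams)
    )
    merged = {}
    for _name, rows in results:
        for row in rows:
            # Rows sharing a dimension value collapse into one wide row.
            merged.setdefault(row[dimension], {}).update(row)
    # Complete missing attribute values so every row has every column.
    return [
        {col: merged[key].get(col) for col in columns}
        for key in sorted(merged)
    ]


# Two hypothetical streams keyed by the time dimension "ts".
streams = [
    ("price", [{"ts": 1, "price": 10}, {"ts": 2, "price": 12}], 0.02),
    ("stock", [{"ts": 1, "stock": 5}], 0.01),
]
wide = asyncio.run(build_wide_table(streams, "ts", ["ts", "price", "stock"]))
print(wide)
# [{'ts': 1, 'price': 10, 'stock': 5}, {'ts': 2, 'price': 12, 'stock': None}]
```

`asyncio.gather` returns results in the order the awaitables were passed, so the merge is deterministic regardless of which stream finishes first. Attribute values absent from a stream are completed with `None` here; a real system would presumably apply its own default.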

[0136] In one embodiment, the processor, when executing a computer program, also performs the following steps:

[0137] Obtain the splicing time of each data stream in the wide table data, and layer the storage database according to the splicing time, the layers comprising the data currently in use, the data currently under construction, and the data from the previous construction;

[0138] Write the splicing time into the third data table, and index and format the data stream corresponding to the splicing time;

[0139] The formatted data stream is written into the corresponding index library to form a first data table. If a new data stream appears, the first data table in the index library is updated using an incremental update mechanism.
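One possible reading of the layering and incremental update steps is sketched below. It assumes that each layer is identified by the splicing time of the build it holds, that the third data table is an append-only log of process data, and that an incremental update is an upsert of only the rows in the new data stream. The class name, method names, and row layout are hypothetical.

```python
class LayeredIndexStore:
    """Hypothetical store with three layers identified by splicing time."""

    def __init__(self):
        self.layers = {"in_use": None, "building": None, "previous": None}
        self.process_log = []   # stands in for the third data table
        self.first_table = {}   # index library contents, keyed by row id

    def start_build(self, splice_time, rows):
        # Record the splicing time as process data, then write the full build.
        self.process_log.append({"splice_time": splice_time, "event": "build"})
        self.layers["building"] = splice_time
        self.first_table = {row["id"]: dict(row) for row in rows}

    def promote(self):
        # The finished build becomes current; the old current is kept as previous.
        self.layers["previous"] = self.layers["in_use"]
        self.layers["in_use"] = self.layers["building"]
        self.layers["building"] = None

    def apply_increment(self, splice_time, rows):
        # Incremental update: upsert only the rows of the new data stream.
        self.process_log.append({"splice_time": splice_time, "event": "increment"})
        for row in rows:
            self.first_table.setdefault(row["id"], {}).update(row)


store = LayeredIndexStore()
store.start_build(100, [{"id": "a", "price": 1}])
store.promote()
store.start_build(200, [{"id": "a", "price": 2}, {"id": "b", "price": 3}])
store.promote()
store.apply_increment(210, [{"id": "b", "price": 4}, {"id": "c", "price": 5}])
```

After these calls the build from time 200 is in use, the build from time 100 is retained as the previous construction, and only rows `b` and `c` were touched by the increment.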

[0140] Example 4

[0141] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, the computer program performing the following steps when executed by a processor:

[0142] Step A: Receive the index building request;

[0143] Step B: Obtain relevant data information for index construction from the target customer corresponding to the index construction request;

[0144] Step C: Perform cluster analysis on the relevant data information to obtain the classified data;

[0145] Step D: Configure the building environment, and based on the building environment, concatenate the classified data according to preset dimensions to generate wide table data;

[0146] Step E: Index and format the wide table data, write it into the corresponding index library to form the first data table, and update the wide table data in the index library using an incremental update mechanism.

[0147] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0148] Clear all data in the existing second data table;

[0149] Check whether the first data table conforms to the preset synchronization standard;

[0150] If the preset synchronization standard is met, all data in the first data table will be synchronized to the second data table to form a new second data table.

[0151] The first data table and the second data table are connected based on a preset switching mechanism to form a one-to-one correspondence.

[0152] The preset switching mechanism includes:

[0153] Obtain the target data from the first data table in the backend server, and extract the index parameter of the target data;

[0154] Compare the index parameter with a preset threshold;

[0155] If the index parameter is higher than or equal to the preset threshold, then the first data table corresponding to the index is switched to the second data table;

[0156] If the index parameter is lower than the preset threshold, no switching is performed.

[0157] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0158] The index building request is a reference table identifier for wide table creation;

[0159] The corresponding target customer is obtained based on the reference table identifier;

[0160] Based on the corresponding target customer, relevant data information for building the index is obtained, and the relevant data information is the field associated with the reference table identifier.

[0161] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0162] Configure the data source and establish a wide table building environment based on the data source and the preset configuration content;

[0163] Based on the wide table construction environment, the classified data is spliced together according to the same preset dimension using an asynchronous programming mechanism, merging multiple data streams into one;

[0164] The merged data stream is then spliced into a two-dimensional table, completing the attribute values of each data stream, to obtain the wide table data.

[0165] In one embodiment, when the computer program is executed by a processor, it also performs the following steps:

[0166] Obtain the splicing time of each data stream in the wide table data, and layer the storage database according to the splicing time, the layers comprising the data currently in use, the data currently under construction, and the data from the previous construction;

[0167] Write the splicing time into the third data table, and index and format the data stream corresponding to the splicing time;

[0168] The formatted data stream is written into the corresponding index library to form a first data table. If a new data stream appears, the first data table in the index library is updated using an incremental update mechanism.

[0169] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0170] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0171] The embodiments described above merely illustrate several implementations of this application. Although the descriptions are relatively specific and detailed, they should not be construed as limiting the patent scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application should be determined by the appended claims.

Claims

1. An index construction method, characterized in that the method comprises: receiving an index construction request; obtaining, according to the target customer corresponding to the index construction request, relevant data information of the target customer for index construction; performing cluster analysis on the relevant data information to obtain classified data; configuring a construction environment and, based on the construction environment, splicing the classified data according to preset dimensions to generate wide table data; and indexing and formatting the wide table data, writing it into a corresponding index library to form a first data table, and updating the wide table data in the index library using an incremental update mechanism; wherein configuring the construction environment and, based on the construction environment, splicing the classified data according to preset dimensions to generate the wide table data comprises: configuring a data source and establishing a wide table construction environment based on the data source and preset configuration content; based on the wide table construction environment, splicing the classified data according to the same preset dimension using an asynchronous programming mechanism so that multiple data streams are merged into one, wherein the preset dimension is the time dimension; and splicing the merged data stream into a two-dimensional table to complete the attribute values of each data stream, obtaining the wide table data; and wherein indexing and formatting the wide table data, writing it into the corresponding index library to form the first data table, and updating the wide table data in the index library using the incremental update mechanism comprises: obtaining the splicing time of each data stream in the wide table data, and layering the storage database according to the splicing time, the layers comprising the data currently in use, the data currently under construction, and the data from the previous construction; writing the splicing time into a third data table, and indexing and formatting the data stream corresponding to the splicing time, the third data table being used to store process data of the index construction; and writing the formatted data stream into the corresponding index library to form the first data table and, if a new data stream appears, updating the first data table in the index library using the incremental update mechanism.

2. The index construction method according to claim 1, characterized in that the method further comprises: clearing all data in the existing second data table; checking whether the first data table conforms to a preset synchronization standard; and, if the preset synchronization standard is met, synchronizing all data in the first data table to the second data table to form a new second data table; wherein the first data table and the second data table are connected based on a preset switching mechanism to form a one-to-one correspondence.

3. The index construction method according to claim 2, characterized in that the preset switching mechanism comprises: obtaining the target data from the first data table in the backend server, and extracting the index parameter of the target data; comparing the index parameter with a preset threshold; if the index parameter is higher than or equal to the preset threshold, switching the first data table corresponding to the index to the second data table; and, if the index parameter is lower than the preset threshold, performing no switching.

4. The index construction method according to claim 1, characterized in that obtaining the relevant data information of the target customer for index construction according to the target customer corresponding to the index construction request comprises: the index construction request being a reference table identifier for wide table creation; obtaining the corresponding target customer based on the reference table identifier; and obtaining, based on the corresponding target customer, the relevant data information for index construction, the relevant data information being the field associated with the reference table identifier.

5. The index construction method according to claim 1, characterized in that the cluster analysis algorithm used to perform cluster analysis on the relevant data information includes at least one of the following: a partition-based clustering algorithm, a density-based spatial clustering algorithm, and a Gaussian mixture model.

6. An index construction apparatus for implementing the index construction method according to any one of claims 1 to 5, characterized in that the apparatus comprises: a data receiving module, used to receive an index construction request; an index information acquisition module, used to obtain relevant data information of the target customer for index construction according to the target customer corresponding to the index construction request; a classification module, used to perform cluster analysis on the relevant data information to obtain classified data; a wide table data generation module, used to configure a construction environment and, based on the construction environment, splice the classified data according to preset dimensions to generate wide table data; and an index generation module, used to index and format the wide table data, write it into a corresponding index library to form a first data table, and update the wide table data in the index library using an incremental update mechanism.

7. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 5.