A follow-on transaction data processing method and device based on big data
By constructing a buy-side transaction queue and matching it with sell-side transactions, calculating the profit of each transaction and storing the record, the problem of low processing efficiency of co-investment transaction data is solved, achieving efficient and accurate processing of co-investment transaction data, and supporting the operation and maintenance and reconciliation of the financial transaction data management platform.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CSC FINANCIAL CO LTD
- Filing Date
- 2026-03-10
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for processing co-investment transaction data are inefficient, cannot meet the need for daily updates, and cannot achieve accurate matching, transaction-by-transaction accounting, and statistical analysis, thus affecting the accuracy of platform operation management and customer reconciliation.
Using a big data-based processing method, the system acquires the platform's full follow-up investment transaction data, constructs a follow-buy transaction queue, and generates buy-sell transaction pairs by matching each follow-sell transaction against that queue with first-bought, first-sold (first-in, first-out) logic. The system calculates the profit of each transaction and stores the profit records according to a distributed logic and a daily update strategy.
It achieves efficient daily processing of terabyte-level follow-up investment transaction data, realizes accurate matching of follow-up investment transaction flows, calculates historical returns for each transaction, and integrates transaction data statistical analysis, thereby improving data processing efficiency and accuracy, and providing reliable operation and maintenance support and reconciliation services for financial transaction data management platforms.
Figure CN122243635A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of financial transaction data management technology, and in particular to a method and apparatus for processing follow-on investment transaction data based on big data.
Background Technology
[0002] Copy trading transaction data, or simply copy trading data, refers to the financial transaction data generated by a financial management platform while processing copy trading transactions. Copy trading data is the core basis for evaluating and selecting trading signal providers. As a form of financial business, copy trading requires historical return accounting and data matching statistics for each transaction, based on the transaction flows formed when clients buy and sell according to underlying asset signals. This supports the platform's refined data management, client reconciliation, and operational statistics.
[0003] To meet the application requirements of co-investment trading, data processing can be performed based on profit and loss calculation methods commonly used in the financial trading field. The core logic of profit and loss calculation is to subtract the purchase cost and transaction-related fees from the realized income; that is, the profit and loss of a single transaction is calculated as "(selling price - holding price) × transaction quantity - total fees". For data processing, distributed computing programming models such as MapReduce, used for parallel processing of large-scale datasets, can be employed to achieve distributed co-investment trading data processing.
[0004] However, the general profit and loss calculation method is not adapted to the co-investment scenario: it cannot display matching details, and its profit calculation dimensions cannot be adjusted. Furthermore, distributed computing programming models such as MapReduce suffer from frequent disk I/O and low efficiency when processing terabytes of co-investment transaction data, failing to meet the requirements of daily updates. In addition, it is difficult to achieve accurate matching, transaction-by-transaction accounting, and statistical analysis, affecting the accuracy of platform operation management and customer reconciliation.
Summary of the Invention
[0005] In view of this, embodiments of this application provide a method and apparatus for processing follow-up investment transaction data based on big data, so as to solve the problems of low efficiency and accuracy in processing follow-up investment transaction data.
[0006] According to a first aspect of this application, a method for processing follow-up investment transaction data based on big data is provided, the method comprising: obtaining the platform's full follow-up investment transaction data, which includes follow-buy transaction data and follow-sell transaction data; constructing a follow-buy transaction queue based on the follow-buy transaction data, the queue including multiple follow-buy transactions sorted by follow-buy timestamp; for each follow-sell transaction in the follow-sell transaction data, performing transaction matching against the follow-buy transaction queue using first-bought, first-sold (first-in, first-out) logic to generate a buy-sell transaction pair; calculating the single-transaction profit of the buy-sell transaction pair and generating a profit record based on the single-transaction profit, the profit record including the total profit obtained by aggregating the single-transaction profits with either the follow-sell transaction or the follow-buy transaction as the aggregation dimension; and storing the profit records according to distributed logic and a daily update strategy.
[0007] In some embodiments, obtaining the platform's full follow-up investment transaction data includes: obtaining the recommended target signal associated with the follow-up investment transaction data; classifying the follow-up investment transaction data according to the recommended target signal to determine the type of the target; setting a flow matching strategy according to the type of the target, the flow matching strategy being used to perform flow matching; reading the follow-up investment operation markers from the platform's full transaction data; and, based on the follow-up investment operation markers, filtering the follow-buy transactions out of the follow-up investment transaction data to obtain the follow-buy transaction data.
[0008] In some embodiments, setting a flow matching strategy according to the type of the target includes: reading the type of the target, which is one of a first type, a second type, and a third type, where the first type indicates that the target has no buy or sell direction, the second type indicates that a buy or sell direction is added, and the third type indicates that a trading time is added; and setting the flow matching method based on the type of the target. Specifically, when the target is of the first type, the flow matching method is set to filter transaction flows within the recommended valid time period; when the target is of the second type, the flow matching method is set to match the time and the same buy/sell direction; when the target is of the third type, the flow matching method is set to match the time and the buy/sell direction, and to require that the customer's transaction time be later than the time of the recommended target. The flow matching strategy is generated based on the flow matching method.
[0009] In some embodiments, performing transaction matching against the follow-buy transaction queue for each follow-sell transaction in the follow-sell transaction data, using first-bought, first-sold logic to generate a buy-sell transaction pair, includes: determining the head of the follow-buy transaction queue based on the follow-buy timestamp, according to the first-bought, first-sold logic; and, starting from the head of the queue, matching follow-buy transactions in sequence against each follow-sell transaction to generate the buy-sell transaction pair. The objective data associated with the buy-sell transaction pair includes at least one of the following: transaction price, transaction quantity, and transaction fee.
[0010] In some embodiments, matching follow-buy transactions in sequence against each follow-sell transaction, starting from the head of the queue, to generate the buy-sell transaction pair includes: counting the follow-buy quantity from the follow-buy transaction data, the follow-buy quantity including the quantity of each single follow-buy transaction; counting the follow-sell quantity from the follow-sell transaction data; if the quantity of a single follow-buy transaction is greater than the follow-sell quantity, splitting the follow-buy transaction into a matched portion and a remaining portion, generating the buy-sell transaction pair from the matched portion, and storing the remaining portion in the follow-buy transaction queue; and if the quantity of a single follow-buy transaction is less than or equal to the follow-sell quantity, performing a full match on the follow-buy transaction to generate the buy-sell transaction pair.
[0011] In some embodiments, calculating the single-transaction profit of the buy-sell transaction pair includes: traversing the buy-sell transaction pairs to obtain the matched quantity and the value information, the value information including the follow-sell price, the follow-buy price, the follow-buy commission, and the follow-sell commission; calculating the difference between the follow-sell price and the follow-buy price to obtain the single-transaction price difference; calculating the single-transaction income value as the product of the single-transaction price difference and the matched quantity; calculating the sum of the follow-buy commission and the follow-sell commission to obtain the single-transaction expenditure value; and calculating the single-transaction profit as the difference between the single-transaction income value and the single-transaction expenditure value.
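The calculation in [0011] can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the class and field names are hypothetical, and the commission amounts are made-up sample values, since the specification gives no fee figures.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TradePair:
    """One matched buy-sell pair; all field names are illustrative."""
    matched_qty: int
    buy_price: Decimal
    sell_price: Decimal
    buy_fee: Decimal   # follow-buy commission (sample value, not from the source)
    sell_fee: Decimal  # follow-sell commission (sample value, not from the source)


def single_profit(pair: TradePair) -> Decimal:
    # Price difference per unit, then income for the matched quantity.
    price_diff = pair.sell_price - pair.buy_price
    income = price_diff * pair.matched_qty
    # Expenditure is the sum of both commissions.
    expense = pair.buy_fee + pair.sell_fee
    return income - expense


pair = TradePair(
    matched_qty=500,
    buy_price=Decimal("1.2"),
    sell_price=Decimal("1.5"),
    buy_fee=Decimal("1.00"),
    sell_fee=Decimal("1.50"),
)
profit = single_profit(pair)
```

`Decimal` is used here instead of binary floating point so that prices such as 1.2 and 1.5 are represented exactly; that choice is an assumption of the sketch.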
[0012] In some embodiments, generating a profit record based on the single-transaction profit includes: setting an aggregation dimension, which is either the follow-sell transaction or the follow-buy transaction; aggregating the single-transaction profits one by one according to the aggregation dimension to generate the total profit; obtaining the record information associated with the single-transaction profit, the record information including at least one of customer identifier, target code, and transaction timestamp; and generating the profit record based on the record information and the total profit.
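A minimal sketch of the aggregation in [0012], assuming each single-transaction profit is carried in a row together with the identifiers of its follow-sell and follow-buy transactions. The row layout, identifiers, and profit values are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Row layout (hypothetical):
# (customer_id, target_code, sell_flow_id, buy_flow_id, single_profit)
rows = [
    ("A", "X", "S1", "B1", Decimal("147.50")),
    ("A", "X", "S1", "B2", Decimal("97.50")),
]


def aggregate(rows, dimension):
    """Sum single-transaction profits by follow-sell or follow-buy transaction."""
    key_index = {"sell": 2, "buy": 3}[dimension]
    totals = defaultdict(Decimal)  # Decimal() is zero
    for row in rows:
        totals[(row[0], row[1], row[key_index])] += row[4]
    return dict(totals)


by_sell = aggregate(rows, "sell")  # one total per follow-sell transaction
by_buy = aggregate(rows, "buy")    # one total per follow-buy transaction
```

With the follow-sell transaction as the dimension, both pairs collapse into one total for `S1`; with the follow-buy transaction as the dimension, each buy keeps its own total.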
[0013] In some embodiments, storing the profit records according to distributed logic and a daily update strategy includes: obtaining the customer's unique identifier and the transaction timestamp corresponding to the profit record; generating a composite partition key based on the customer's unique identifier and the transaction timestamp; and, based on the composite partition key, storing the profit records in partitions using a distributed file system. Distributed matching is performed through a big data processing core, and profit aggregation is performed through big data structured queries.
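One way a composite partition key could be derived is sketched below. The specification does not say how the customer identifier and timestamp are combined; the date-plus-hash-bucket layout, the bucket count, and the function name are all assumptions of this sketch, chosen so that the number of partitions stays bounded even with millions of customers.

```python
import zlib
from datetime import datetime


def partition_key(customer_id: str, trade_ts: datetime, buckets: int = 16) -> str:
    """Build a date + customer-bucket key (hypothetical layout)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(customer_id.encode("utf-8")) % buckets
    return f"dt={trade_ts:%Y%m%d}/bucket={bucket:02d}"


key = partition_key("A", datetime(2025, 11, 19, 14, 30))
```

Records for the same customer on the same day always land in the same partition, so a daily update only rewrites the partitions for that date.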
[0014] In some embodiments, after storing the profit records according to distributed logic and a daily update strategy, the method further includes: obtaining resource scheduling time information, which includes the task execution time and task end time set according to a preset idle time period; creating a resource coordination queue based on the resource scheduling time information, the resource coordination queue being a dedicated big data processing queue created based on a preset capacity scheduling priority; defining the resources of a single node and obtaining the total resources of the cluster; requesting application resources based on the single-node resources and the total cluster resources; and performing dynamic resource allocation according to the application resources, so that capacity is automatically scaled up or down based on the data volume of the resource coordination queue.
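The scaling rule in [0014] can be illustrated with a small sizing function. This is a sketch under stated assumptions: the executor shape (4 cores, 32 GB), the 25 GB-per-executor ratio, and the floor of 2 are invented tuning values, and the function does not call any real scheduler API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    cores: int
    memory_gb: int


def executors_needed(data_gb: float, node: Node, node_count: int,
                     exec_cores: int = 4, exec_mem_gb: int = 32,
                     gb_per_executor: float = 25.0, floor: int = 2) -> int:
    """Scale the executor count with data volume, capped by cluster capacity."""
    # Executors that fit on one node, limited by both cores and memory.
    per_node = min(node.cores // exec_cores, node.memory_gb // exec_mem_gb)
    ceiling = per_node * node_count
    wanted = math.ceil(data_gb / gb_per_executor)
    return max(floor, min(wanted, ceiling))


# Sample cluster shape: 10 nodes, 32 cores and 256 GB each.
node = Node(cores=32, memory_gb=256)
daily = executors_needed(500, node, node_count=10)
```

The request grows with the queue's data volume until it reaches what the cluster can hold, and never drops below the floor, which is the scale-up and scale-down behavior the paragraph describes.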
[0015] According to a second aspect of this application, a data processing device for follow-up investment transactions based on big data is provided, the device comprising: a data collection and filtering module, used to acquire the platform's full follow-up investment transaction data, including follow-buy transaction data and follow-sell transaction data; a queue management module, used to construct a follow-buy transaction queue based on the follow-buy transaction data, the queue including multiple follow-buy transactions sorted by follow-buy timestamp; a transaction matching module, used to perform transaction matching against the follow-buy transaction queue for each follow-sell transaction in the follow-sell transaction data, using first-bought, first-sold logic, so as to generate buy-sell transaction pairs; a profit calculation and aggregation module, used to calculate the single-transaction profit of each buy-sell transaction pair and to generate a profit record based on the single-transaction profit, the profit record including the total profit obtained by aggregating the single-transaction profits with either the follow-sell transaction or the follow-buy transaction as the aggregation dimension; and a big data processing module, used to store the profit records according to distributed logic and a daily update strategy.
[0016] According to a third aspect of this application, a computer device is provided, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor executes the program to implement the above-described big data-based follow-up investment transaction data processing method.
[0017] According to a fourth aspect of this application, a storage medium is provided that stores a computer program thereon, which, when executed by a processor, implements the above-described method for processing follow-up investment transaction data based on big data.
[0018] By employing the above technical solutions, this application provides a method and apparatus for processing follow-up investment transaction data based on big data. After acquiring the platform's full follow-up investment transaction data, the method constructs a follow-buy transaction queue based on the follow-buy transaction data. For each follow-sell transaction in the follow-sell transaction data, it performs transaction matching against the follow-buy transaction queue using first-in, first-out logic to generate buy-sell transaction pairs. It then calculates the single-transaction profit of each buy-sell transaction pair, generates profit records based on these single-transaction profits, and stores the profit records according to distributed logic and a daily update strategy. This method can achieve efficient daily processing of terabyte-level follow-up investment transaction data, integrating accurate matching of follow-up investment transaction flows, historical profit calculation for each transaction, and statistical analysis of transaction data, thereby improving the efficiency and accuracy of follow-up investment transaction data processing. It can also provide reliable operation and maintenance support and reconciliation services for financial transaction data management platforms.
[0019] The above description is only an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, specific embodiments of this application are given below.
Attached Figure Description
[0020] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
Figure 1 is a schematic diagram of the big data-based follow-up investment transaction data processing method provided in an embodiment of this application;
Figure 2 is a schematic diagram illustrating the entire process of co-investment transaction data processing provided in an embodiment of this application;
Figure 3 is a schematic diagram of the process for generating buy-sell transaction pairs provided in an embodiment of this application;
Figure 4 is a schematic diagram of the process for storing profit records provided in an embodiment of this application;
Figure 5 is a schematic diagram of the resource scheduling process provided in an embodiment of this application;
Figure 6 is a schematic diagram of the structure of a big data-based follow-up investment transaction data processing device provided in an embodiment of this application.
Detailed Implementation
[0021] The present application will be described in detail below with reference to the accompanying drawings and embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in the embodiments of the present application can be combined with each other.
[0022] In this embodiment, the big data-based copy trading data processing method and apparatus can be applied to the matching and calculation of copy trading transaction data in the financial field. Copy trading is a financial business activity whereby a user follows a major investor, investing in the same asset under the same conditions and price. The series of transaction data generated by copy trading is referred to as copy trading data.
[0023] Co-investment transaction data, also known as co-investment transaction flow data or co-investment data, can be processed by financial management platforms, including generation, transformation, storage, matching, and accounting. Co-investment transaction data is the core basis for evaluating and selecting trading signal providers. Since co-investment is a financial business model, the processing of its generated co-investment transaction data requires analyzing the transaction flow formed by clients buying and selling based on target signals. Therefore, financial management platforms can conduct historical return accounting and data matching statistics on a transaction-by-transaction basis to support refined data management, client reconciliation, and operational statistics.
[0024] To meet the application requirements of co-investment trading, in some embodiments, co-investment trading data processing can be based on profit and loss calculation methods commonly used in the financial trading field. The core logic of profit and loss calculation is to subtract the purchase cost and transaction-related fees from the realized income; that is, the profit and loss of a single transaction is calculated as "(selling price - holding price) × transaction quantity - total fees". Furthermore, in terms of data processing, distributed computing programming models such as MapReduce, which are used for parallel processing of large-scale datasets, can be adopted to achieve distributed co-investment trading data processing.
[0025] However, the general profit and loss calculation method is not adapted to the co-investment scenario: it cannot display matching details, and its profit calculation dimensions cannot be adjusted. Furthermore, distributed computing programming models such as MapReduce suffer from frequent disk I/O and low efficiency when processing terabytes of co-investment transaction data, failing to meet the requirements of daily updates. In addition, it is difficult to achieve accurate matching, transaction-by-transaction accounting, and statistical analysis, affecting the accuracy of platform operation management and customer reconciliation.
[0026] To address the issues of low efficiency and accuracy in processing co-investment transaction data, some embodiments of this application provide a big data-based co-investment transaction data processing method. This method, based on a big data processing framework, filters out co-investment transaction flows, then uses First In First Out (FIFO) logic to split and match buy and sell flows, calculates and aggregates the returns for each buy and sell transaction pair, and finally utilizes distributed storage and scheduling to achieve daily updates of terabyte-level data, thereby improving the efficiency and accuracy of co-investment transaction data processing.
[0027] It should be noted that the processing of the co-investment transaction data is performed by the financial management platform, which is used to match and calculate the transaction flow data related to the co-investment transactions to generate profit record data that can be stored in the database. The entire co-investment transaction data processing process only involves pure data processing. The method, apparatus, and financial management platform built based on the method do not participate in investment decisions, nor do they generate any commercial schemes that affect trading behavior.
[0028] The method described can be applied to a financial transaction data management platform (hereinafter referred to as the financial management platform). For example, the financial transaction data management platform is deployed on a 10-node Spark cluster, with each node configured with 32 CPU cores, 256GB of memory, and 2TB of SSD hard disk. HDFS is used as the distributed storage system, with a total storage capacity of 200TB. The database uses HBase to store structured historical return results for each transaction, and Grafana is used for front-end visualization to generate transaction data operation and maintenance reports. The transaction flow accessed by the platform consists of completed stock-type target transaction data, generating an average of 500GB of follow-investment transaction data per day, including follow-buy and follow-sell transaction flow data from more than 3 million clients. All data are objective transaction records and do not involve any investment prediction or decision-making related information.
[0029] The method can be applied to electronic devices that establish a communication connection with a financial management platform and have data processing capabilities. These electronic devices include, but are not limited to, computers, servers, mobile terminals, smart wearable devices, and industrial control machines. For ease of description, this application embodiment uses a financial management platform as the executing entity of the method. It should be understood that the method can also be applied to other types of executing entities, which are not illustrated in this application embodiment. As shown in Figure 1, the method includes:
S101. Obtain the platform's full follow-up investment transaction data.
[0030] When processing co-investment transaction data, the financial management platform can first obtain the platform's full co-investment transaction flow data. This full co-investment transaction flow data refers to all co-investment transaction data generated within the set data processing period.
[0031] The follow-up investment transaction data includes follow-buy transaction data and follow-sell transaction data. The follow-buy transaction data is a dataset composed of multiple follow-buy transactions, and each follow-buy transaction is marked with a timestamp of the follow-buy transaction operation, i.e., the follow-buy timestamp. Similarly, the follow-sell transaction data is a dataset composed of multiple follow-sell transactions, and each follow-sell transaction is marked with a timestamp of the follow-sell transaction operation, i.e., the follow-sell timestamp.
[0032] The financial management platform can retrieve all follow-up investment transaction data from a database storing such transaction records. To do this, the platform can determine the data processing period based on the current time and generate a data retrieval request accordingly, which is then sent to the database. Upon receiving the request, the database reads the data processing period and returns all follow-up investment transaction data within that period to the platform. After retrieving the full follow-up investment transaction data, the platform can further filter the transaction data. In some embodiments, when retrieving the full follow-up investment transaction data, it can also obtain signals of recommended investment targets associated with the data. The data is then categorized according to these signals to determine the target type. Finally, a transaction matching strategy is set based on the target type to execute the transaction matching process.
[0033] When setting the transaction matching strategy based on the type of target, the target types obtained from the classification can be read first. These target types include a first type, a second type, and a third type. The first type indicates that the target has no buy or sell direction; the second type indicates that a buy or sell direction is added; and the third type indicates that a transaction time is added.
[0034] Then, based on the type of target, a transaction matching method is set, thereby generating a transaction matching strategy. Specifically, when the target is of the first type, the transaction matching method is set to filter transaction flows within the recommended valid time period; when the target is of the second type, the transaction matching method is set to match the time and the same buy/sell direction; when the target is of the third type, the transaction matching method is set to match the time and the buy/sell direction, and to require that the customer's transaction time be later than the time of the recommended target.
[0035] For example, a financial management platform can have a built-in data collection and filtering module, which includes a data access unit and a transaction flow filtering unit. After obtaining the platform's full transaction flow data and the recommended target signals through the data access unit, the financial management platform can classify the targets into multiple target types. A target of the first type has no buy/sell direction; in this case, any transaction flow within the recommended valid time period is matched. A target of the second type adds a buy/sell direction; it is matched against transaction flows with the same time and the same buy/sell direction. A target of the third type adds a transaction time; it is matched against transaction flows with the same time and buy/sell direction whose customer transaction time is later than the time of the recommended target.
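The three matching rules can be sketched as a single predicate. The type and field names below are hypothetical, and two points are interpretations rather than statements from the specification: "matching the time" is read as "the transaction falls inside the signal's valid time period", and the third type is read as additionally requiring the customer's transaction time to be strictly later than the signal time.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class TargetType(Enum):
    NO_DIRECTION = 1     # first type: no buy/sell direction
    WITH_DIRECTION = 2   # second type: buy/sell direction added
    WITH_TRADE_TIME = 3  # third type: transaction time added


@dataclass(frozen=True)
class Signal:
    target_type: TargetType
    valid_from: datetime
    valid_to: datetime
    direction: Optional[str] = None
    signal_time: Optional[datetime] = None


@dataclass(frozen=True)
class Flow:
    trade_time: datetime
    direction: str


def matches(signal: Signal, flow: Flow) -> bool:
    in_window = signal.valid_from <= flow.trade_time <= signal.valid_to
    if signal.target_type is TargetType.NO_DIRECTION:
        return in_window
    same_direction = flow.direction == signal.direction
    if signal.target_type is TargetType.WITH_DIRECTION:
        return in_window and same_direction
    # Third type: the customer must also trade after the signal time.
    return in_window and same_direction and flow.trade_time > signal.signal_time


day = datetime(2025, 11, 19)
signal = Signal(TargetType.WITH_TRADE_TIME, day.replace(hour=9), day.replace(hour=15),
                direction="buy", signal_time=day.replace(hour=10))
early = Flow(day.replace(hour=9, minute=30), "buy")
late = Flow(day.replace(hour=10, minute=30), "buy")
```

Here `early` is rejected because it precedes the signal time, while `late` is accepted.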
[0036] After classifying the targets, the system can read the follow-investment operation tags from the platform's full transaction data and filter out the follow-buy transactions from the follow-investment transaction data to obtain the follow-buy transaction data. The tag content of the follow-investment operation tag has a unique mapping relationship with the follow-buy operation.
[0037] For example, at 0:00 AM, the data collection and filtering module uses the DataX tool to obtain the full transaction data of the previous day. The full transaction data is in ORC format and includes objective fields such as unique customer identifier, transaction timestamp, transaction type, asset code, transaction quantity, transaction price, and follow-up investment operation flag. The transaction flow filtering unit applies the purely technical filtering conditions of "transaction time" and "transaction type", combined with the unique customer identifiers of customers who have signed up for the follow-up investment service, to extract the follow-buy transaction flows from the full data and store them in the "/logasset/20251119" directory in HDFS. This step is only a technical filtering operation on the transaction flow and does not involve any investment value judgment or evaluation of transaction behavior.
[0038] After the financial management platform obtains all follow-up investment transaction data through the data access unit, it can use the transaction filtering unit to filter the data. During the filtering process, the filtering unit can read the follow-up operation markers in the follow-up investment transaction data. Since the marker content can have a unique mapping relationship with the follow-buy operation, it can determine whether a follow-up investment transaction is a follow-buy transaction based on the follow-up operation marker, thus enabling the filtering of follow-buy transactions within the follow-up investment transaction data. Multiple filtered follow-buy transactions can also form a dataset, i.e., follow-buy transaction data.
[0039] Similarly, follow-sell transactions can be filtered from the follow-up investment transaction data based on follow-sell operation markers. That is, if the content of a marker has a unique mapping relationship with the follow-sell operation, then a transaction carrying that marker can be identified as a follow-sell transaction, which allows follow-sell transactions to be filtered out of the follow-up investment transaction data. The filtered follow-sell transactions together form a dataset, i.e., the follow-sell transaction data.
[0040] After the follow-buy and follow-sell transaction data have been filtered out by follow-up investment operation marker, the remaining transaction data without such a marker can be identified as independent transactions. Since independent transactions do not participate in the matching and calculation of follow-up investment transaction data, filtering by marker excludes them, which reduces the amount of data to process and prevents independent transaction data from interfering with the processing results.
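The marker-based filtering described above can be sketched as a three-way split. The marker values `"FB"` and `"FS"` and the `follow_flag` field name are hypothetical; the specification only states that the marker content maps uniquely to the follow-buy or follow-sell operation.

```python
FOLLOW_BUY = "FB"   # hypothetical marker value for a follow-buy operation
FOLLOW_SELL = "FS"  # hypothetical marker value for a follow-sell operation


def split_flows(flows):
    """Separate follow-buy, follow-sell, and independent transactions."""
    buys, sells, independent = [], [], []
    for flow in flows:
        flag = flow.get("follow_flag")
        if flag == FOLLOW_BUY:
            buys.append(flow)
        elif flag == FOLLOW_SELL:
            sells.append(flow)
        else:
            # No follow-up investment marker: excluded from matching.
            independent.append(flow)
    return buys, sells, independent


flows = [
    {"id": 1, "follow_flag": "FB"},
    {"id": 2, "follow_flag": "FS"},
    {"id": 3},                       # independent transaction
]
buys, sells, independent = split_flows(flows)
```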
[0041] S102. Construct a follow-buy transaction queue based on the follow-buy transaction data.
[0042] After obtaining the full volume of follow-up investment transaction data from the platform, the follow-buy transaction data can be split into transaction flows. That is, a follow-buy transaction queue can be constructed based on the follow-buy transaction data. The follow-buy transaction queue includes multiple follow-buy transactions sorted by follow-buy timestamp.
[0043] As shown in Figure 2, after acquiring the platform's full follow-up investment transaction data, the financial management platform can iterate through the follow-buy transaction data to determine the follow-buy timestamp of each transaction. It then constructs the follow-buy transaction queue in ascending order of follow-buy timestamp, sorting the transactions from earliest to latest. Transactions with earlier follow-buy timestamps are placed closer to the head of the queue, and those with later timestamps closer to the tail, which completes the transaction flow splitting. The follow-buy transaction queue formed in this way constrains the order of the subsequent buy and sell matching.
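A minimal sketch of the queue construction, assuming one queue per customer and target. Grouping by target code as well as by customer is an assumption of this sketch, and the field names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime


def build_buy_queues(buy_flows):
    """Group follow-buy flows and sort each group by timestamp, earliest first."""
    groups = defaultdict(list)
    for flow in buy_flows:
        groups[(flow["customer_id"], flow["target_code"])].append(flow)
    # deque gives O(1) removal at the head, which FIFO matching relies on.
    return {key: deque(sorted(group, key=lambda f: f["ts"]))
            for key, group in groups.items()}


buy_flows = [
    {"customer_id": "A", "target_code": "X", "ts": datetime(2025, 11, 19, 11, 0), "qty": 600},
    {"customer_id": "A", "target_code": "X", "ts": datetime(2025, 11, 19, 9, 30), "qty": 500},
]
queues = build_buy_queues(buy_flows)
head = queues[("A", "X")][0]  # the earliest follow-buy sits at the head
```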
[0044] For example, a financial management platform can have a built-in transaction matching module. This module can read follow-up investment records from HDFS, perform technical partitioning based on the customer's unique identifier, and establish a follow-buy queue sorted by transaction timestamp in ascending order. It also reads the sorted follow-sell records and performs purely technical matching on each record. Take customer A's follow-sell record as an example: customer A sold 1000 shares of target X at a price of 1.5 yuan/share at 14:30 on November 19, 2025. There are two objective records in the customer's follow-buy queue: 500 shares at 1.2 yuan/share at 09:30 on November 19, 2025, and 600 shares at 1.3 yuan/share at 11:00 on November 19, 2025. The matching unit first fully matches the 500-share buy record from 09:30 against the sell record and removes that buy record from the queue. It then matches the remaining 500 shares of the sell record against the buy record from 11:00, which is split into "500 matched shares" and "100 remaining shares". At this point, processing of the sell record is complete, and the buy queue contains only the 100-share remainder of the 11:00 buy record. Two sets of objective data for buy-sell transaction pairs are generated: (buy 500 shares at 09:30, sell 500 shares at 14:30) and (buy 500 shares at 11:00, sell 500 shares at 14:30). Only objective data such as the transaction fees, transaction amount, and transaction price of the two pairs are associated. As before, splitting the transaction flow involves no evaluation of investment ability or trading strategy.
[0045] S103. For each follow-sell transaction in the follow-sell transaction data, perform transaction matching against the follow-buy transaction queue using first-bought, first-sold (FIFO) logic to generate buy-sell transaction pairs.
[0046] After the follow-buy transaction queue is constructed, buy-sell matching can be performed against it. Buy-sell matching identifies the buy and sell transactions for the same asset (or the same user) in the follow-buy and follow-sell transaction data and associates them as buy-sell transaction pairs. Therefore, for each follow-sell transaction in the follow-sell transaction data, FIFO logic can be used to match it against the follow-buy transaction queue and generate buy-sell transaction pairs.
[0047] To generate buy-sell transaction pairs, in some embodiments the head of the follow-buy transaction queue is first determined from the follow-buy timestamps according to the FIFO logic. Starting from the head of the queue, follow-buy transactions are then matched in sequence against each follow-sell transaction to generate buy-sell transaction pairs.
[0048] As shown in Figure 3, when matching each follow-sell transaction against follow-buy transactions to generate buy-sell transaction pairs, the follow-buy quantity can be read from the follow-buy transaction data and the follow-sell quantity from the follow-sell transaction data. The follow-buy quantity includes the single-transaction follow-buy quantity, i.e., the quantity contained in a single follow-buy transaction.
[0049] Next, the single-transaction follow-buy quantity is compared with the follow-sell quantity. If the single-transaction follow-buy quantity is greater than the follow-sell quantity, the follow-buy transaction is split into a matched portion and a remaining portion; a buy-sell transaction pair is generated from the matched portion, and the remaining portion is kept in the follow-buy transaction queue. If the single-transaction follow-buy quantity is less than or equal to the follow-sell quantity, the follow-buy transaction is matched in full to generate a buy-sell transaction pair.
[0050] For example, after constructing the follow-buy transaction queue, the financial management platform can match follow-buy transactions against each follow-sell transaction, starting from the head of the queue in follow-buy time order. During matching, the single-transaction follow-buy quantity is compared with the follow-sell quantity. If the follow-buy quantity exceeds the follow-sell quantity, the follow-buy transaction is split into a "matched portion" and a "remaining portion". The matched portion forms a buy-sell transaction pair with the follow-sell transaction, while the remaining portion is retained in the follow-buy transaction queue for subsequent matching.
[0051] Correspondingly, if the comparison shows that the single-transaction follow-buy quantity is less than or equal to the follow-sell quantity, the follow-buy transaction is matched in full. After the full match, the next follow-buy transaction in the queue is matched, and so on until the follow-sell quantity is completely matched, thereby generating the buy-sell transaction pairs.
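As a minimal sketch of the matching and splitting logic described above, and not a definitive implementation, the following Python code matches one follow-sell transaction against a follow-buy queue from the head onward. The names Buy, Pair, and match_sell are hypothetical. Returning any unmatched sell quantity when the queue runs out is an assumption, since that case is not described here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buy:
    time: str
    quantity: int
    price: float


@dataclass
class Pair:
    buy_time: str
    sell_time: str
    quantity: int
    buy_price: float
    sell_price: float


def match_sell(buy_queue, sell_time, sell_quantity, sell_price):
    """Match one follow-sell transaction against the queue, earliest buy first."""
    pairs = []
    remaining = sell_quantity
    while remaining > 0 and buy_queue:
        head = buy_queue[0]
        matched = min(head.quantity, remaining)
        pairs.append(Pair(head.time, sell_time, matched, head.price, sell_price))
        remaining -= matched
        if head.quantity > matched:
            # Partial match: the remaining portion stays at the head of the queue.
            head.quantity -= matched
        else:
            # Full match: the buy transaction is consumed and leaves the queue.
            buy_queue.popleft()
    # Assumption: any quantity that could not be matched is returned to the caller.
    return pairs, remaining


# Customer A's data from the example: two buys, then a 1000-share sell at 14:30.
buy_queue = deque([Buy("09:30", 500, 1.2), Buy("11:00", 600, 1.3)])
pairs, unmatched = match_sell(buy_queue, "14:30", 1000, 1.5)
for pair in pairs:
    print(pair)
print("left in queue:", list(buy_queue))
```

Run on customer A's data, the sketch yields two pairs of 500 shares each and leaves a single 100-share remainder of the 11:00 buy in the queue, in line with the example given earlier.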
[0052] After buy-sell transaction pairs are generated, objective data can be associated with them. This objective data includes at least one of the following: transaction price, transaction quantity, and transaction fee. For example, the objective data related to the buy and sell transactions can be extracted from the platform's full follow-up investment transaction data and associated with the buy-sell transaction pairs for subsequent profit calculation and profit aggregation.
[0053] S104. Calculate the profit per transaction for the buy-sell pair and generate a profit record based on the profit per transaction.
[0054] After buy-sell transaction pairs are generated through buy-sell matching, historical profit can be calculated for each pair, i.e., the single-transaction profit of each buy-sell transaction pair is calculated, and the data can be aggregated, i.e., profit records are generated from the single-transaction profits. These profit records include the total profit obtained by aggregating the single-transaction profits with either the follow-sell transaction or the follow-buy transaction as the aggregation dimension.
[0055] To calculate historical profit, in some embodiments the buy-sell transaction pairs are traversed to obtain the matching quantity and the value information of each pair. The value information includes the sell price, buy price, buy fee, and sell fee.
[0056] Next, the difference between the sell price and the buy price is calculated to obtain the single-transaction price difference. The single-transaction income value is then calculated as the product of the single-transaction price difference and the matching quantity. The sum of the buy fee and the sell fee is calculated to obtain the single-transaction expenditure value. Finally, the difference between the single-transaction income value and the single-transaction expenditure value is calculated to obtain the single-transaction profit.
[0057] For example, when historical profit is calculated transaction by transaction, the single-transaction profit TPs can be calculated using the following formula: TPs = (Ps - Pp) × N - (HFp + HFs), where TPs is the single-transaction profit; Ps is the sell price, i.e., the transaction price of the follow-sell transaction; Pp is the buy price, i.e., the transaction price of the follow-buy transaction; N is the matching quantity, i.e., the quantity matched in a single buy-sell transaction pair; HFp is the buy fee, i.e., the objective cost incurred by the follow-buy transaction, such as commission and stamp duty; and HFs is the sell fee, i.e., the objective cost incurred by the follow-sell transaction.
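The formula above can be expressed as a short Python function, given here only as an illustrative sketch. The function name single_profit and the fee amounts in the usage lines are hypothetical. Decimal is used rather than floating point so that monetary amounts are exact, which is a choice made for the sketch.

```python
from decimal import Decimal


def single_profit(sell_price, buy_price, quantity, buy_fee, sell_fee):
    """TPs = (Ps - Pp) * N - (HFp + HFs)."""
    income = (sell_price - buy_price) * quantity   # price difference times matched quantity
    expenditure = buy_fee + sell_fee               # fees on both sides of the pair
    return income - expenditure


# Hypothetical fee amounts, chosen only to exercise the formula.
profit = single_profit(
    sell_price=Decimal("1.5"),
    buy_price=Decimal("1.2"),
    quantity=500,
    buy_fee=Decimal("2.00"),
    sell_fee=Decimal("3.00"),
)
print(profit)
```

With these inputs the income value is 150 and the expenditure value is 5, so the single-transaction profit is 145.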
[0058] After historical profit has been calculated transaction by transaction, the single-transaction profits can be aggregated to generate profit records. In some embodiments, an aggregation dimension is first set based on the follow-sell or follow-buy transaction data, and the single-transaction profits are then aggregated along that dimension to obtain the total profit. The record information associated with each single-transaction profit is then obtained, and the profit record is generated from the record information and the total profit. The record information includes at least one of the following: customer identifier, target code, and transaction timestamp.
[0059] For example, a financial management platform can have a built-in historical profit calculation and aggregation module, which calculates the profit of the two transaction pairs mentioned above as follows. First pair: (1.5 - 1.2) × 500 - (1.2 × 500 × 0.1% + 1.5 × 500 × 0.1% + 0.02 × 500) = 150 - (0.6 + 0.75 + 10) = 138.65 yuan. Second pair: (1.5 - 1.3) × 500 - (1.3 × 500 × 0.1% + 1.5 × 500 × 0.1% + 0.02 × 500) = 100 - (0.65 + 0.75 + 10) = 88.6 yuan. The data aggregation unit then uses the follow-sell transaction as the aggregation dimension to obtain the total historical profit, 138.65 + 88.6 = 227.25 yuan, associates it with objective information such as the customer's unique identifier, the target code, and the transaction date, and stores it in the "follow_profit" table in HBase. This aggregation is a pure data summary operation and does not involve any portfolio analysis or product recommendations.
[0060] After calculating the profit of each transaction through historical profit accounting, the financial management platform aggregates the total profit along the follow-sell or follow-buy transaction dimension, and records information such as the customer identifier, target code, and transaction timestamp together with the aggregated result to form the profit record.
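The worked example above can be reproduced with the following Python sketch. The fee terms are taken from the arithmetic of the example (0.1% of the transaction amount on each side plus 0.02 yuan per share). The application does not name these components, so the constant names below are assumptions, as are the identifier "S1" and the dictionary layout.

```python
from decimal import Decimal

AMOUNT_RATE = Decimal("0.001")   # 0.1% of transaction amount, as used in the example
PER_SHARE = Decimal("0.02")      # 0.02 yuan per share, as used in the example


def pair_profit(buy_price, sell_price, quantity):
    """Profit of one buy-sell pair with the fee terms of the worked example."""
    fees = (
        buy_price * quantity * AMOUNT_RATE
        + sell_price * quantity * AMOUNT_RATE
        + PER_SHARE * quantity
    )
    return (sell_price - buy_price) * quantity - fees


# The two pairs generated for customer A's follow-sell transaction ("S1" is hypothetical).
pairs = [
    {"sell_id": "S1", "buy_price": Decimal("1.2"), "sell_price": Decimal("1.5"), "quantity": 500},
    {"sell_id": "S1", "buy_price": Decimal("1.3"), "sell_price": Decimal("1.5"), "quantity": 500},
]

# Aggregate single-transaction profits with the follow-sell transaction as the dimension.
totals = {}
for pair in pairs:
    profit = pair_profit(pair["buy_price"], pair["sell_price"], pair["quantity"])
    totals[pair["sell_id"]] = totals.get(pair["sell_id"], Decimal("0")) + profit

# Attach the objective record information to the aggregated total.
record = {
    "customer_id": "A",
    "target_code": "X",
    "trade_date": "2025-11-19",
    "total_profit": totals["S1"],
}
print(record)
```

The two pairs evaluate to 138.65 and 88.6 yuan, and the aggregated total is 227.25 yuan, matching the figures in the example.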
[0061] S105. Store the profit records according to distributed logic and a daily update strategy.
[0062] After generating profit records, the financial management platform can store them according to distributed logic and a daily update strategy, enabling big data processing and daily updates. Distributed logic means that the system's business processing rules, calculation processes, or decision-making algorithms do not run on a single server but are distributed across multiple computer nodes that work collaboratively to complete the overall task.
[0063] A daily update strategy means that the system's data, models, or rules are not changed in real time but are refreshed once a day. That is, at a fixed time each day, the new data generated within the past 24 hours is added incrementally.
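One possible reading of the daily update strategy is sketched below in Python: at each scheduled run, only the records from the preceding 24 hours are selected for incremental processing. The function name daily_increment, the record layout, and the half-open window boundaries are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def daily_increment(records, run_time):
    """Select the records whose timestamp falls in the 24 hours before run_time."""
    start = run_time - timedelta(hours=24)
    # Assumption: the window includes its start and excludes the run time itself.
    return [r for r in records if start <= r["timestamp"] < run_time]


# Hypothetical records; only the second falls inside the window of the run below.
records = [
    {"id": 1, "timestamp": datetime(2025, 11, 18, 23, 0)},
    {"id": 2, "timestamp": datetime(2025, 11, 19, 14, 30)},
]
batch = daily_increment(records, run_time=datetime(2025, 11, 20, 0, 30))
print([r["id"] for r in batch])
```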
[0064] In some embodiments, when storing profit records according to distributed logic and a daily update strategy, the customer's unique identifier and the transaction timestamp corresponding to each profit record can be obtained first, and a composite partition key can be generated from them. Then, based on the composite partition key, a distributed file system is used to partition and store the profit records. The distributed file system achieves distributed matching through a big data processing core and profit aggregation through big data structured queries.
[0065] For example, as shown in Figure 4, to achieve big data processing and daily updates, the financial management platform can use the Hadoop Distributed File System (HDFS) to partition and store the profit record data according to a composite partition key formed from the customer's unique identifier and the transaction date. Furthermore, during data storage, distributed matching can be achieved through Spark Core. The number of partitions of the Resilient Distributed Dataset (RDD) in Spark Core is set to 200 to meet the parallelism requirements when calculating profit records in the cluster.
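A composite partition key of the kind described above might be derived as in the following Python sketch. The function name composite_partition_key and the "name=value" directory-style layout are assumptions chosen for illustration. The application specifies only that the key combines the customer's unique identifier with the transaction date.

```python
from datetime import datetime


def composite_partition_key(customer_id, transaction_time):
    """Combine the customer identifier and the transaction date into one key."""
    # Assumption: a directory-style layout with one level per key component.
    return f"customer_id={customer_id}/trade_date={transaction_time:%Y-%m-%d}"


key = composite_partition_key("A", datetime(2025, 11, 19, 14, 30))
print(key)
```

Because the key uses only the date part of the timestamp, all of a customer's profit records for one trading day fall into the same partition.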
[0066] After storing the profit record data, the financial management platform can also use a structured query engine such as Spark SQL to perform profit aggregation. A temporary view can be created by running "df.createOrReplaceTempView("trades")", and the following query can then be executed against it: "SELECT customer_id, trade_date, SUM(profit) AS total_profit, COUNT(*) AS trade_count FROM trades GROUP BY customer_id, trade_date". Window functions of the form "OVER (PARTITION BY customer_id ORDER BY trade_date)" are then used to calculate cumulative profit, rankings, and similar measures. Finally, the aggregation results are cached by executing "persist(StorageLevel.MEMORY_AND_DISK)". In some embodiments, the profit record data can also be displayed, for example by generating objective statistical reports. In this way, the financial management platform, building on big data frameworks such as Spark and on FIFO matching logic, achieves accurate matching of follow-up investment transactions, transaction-by-transaction profit calculation, and daily updates of terabyte-level data. The processing results can then be displayed to operations personnel through Grafana.
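The aggregation query and the window function described above can be tried out without a cluster using the sketch below. It substitutes Python's built-in sqlite3 module for Spark SQL purely so that the SQL is runnable locally. This substitution, the common table expression named daily, and the third data row (dated 2025-11-20) are assumptions made for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (customer_id TEXT, trade_date TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("A", "2025-11-19", 138.65),   # first pair from the worked example
        ("A", "2025-11-19", 88.60),    # second pair from the worked example
        ("A", "2025-11-20", 10.00),    # hypothetical extra row to show accumulation
    ],
)

query = """
WITH daily AS (
    SELECT customer_id, trade_date,
           SUM(profit) AS total_profit,
           COUNT(*)    AS trade_count
    FROM trades
    GROUP BY customer_id, trade_date
)
SELECT customer_id, trade_date, total_profit, trade_count,
       SUM(total_profit) OVER (
           PARTITION BY customer_id ORDER BY trade_date
       ) AS cumulative_profit
FROM daily
ORDER BY customer_id, trade_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first stage groups by customer and trade date, and the window function then accumulates each customer's daily totals in date order.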
[0067] For example, a financial management platform can have a built-in big data processing module. The computing scheduling unit of this module starts a Spark job at 0:30 AM, which uses an RDD with 200 partitions in Spark Core to perform distributed matching of 3 million follow-buy records and 1.8 million follow-sell records. The task parallelism is set to 320, and the technical matching of all transaction pairs is completed in 45 minutes. Spark SQL then performs the arithmetic calculation and data aggregation of historical profit for each transaction, generating 1.8 million historical profit records, one per follow-sell record, which takes 30 minutes. All computing tasks are completed by 2:00 AM, and the results are synchronized to HBase and the front-end visualization system. Before 9:00 AM that day, the platform operators can view the previous day's transaction data statistics report through Grafana.
[0068] By applying the technical solutions of the above embodiments, the big data-based follow-up investment transaction data processing method can achieve efficient daily processing of terabyte-level follow-up investment transaction data, combining accurate matching of follow-up investment transactions, historical profit calculation for each transaction, and statistical analysis of transaction data. This improves the efficiency and accuracy of follow-up investment transaction data processing, solves the problem that matching details cannot be displayed and profit calculation dimensions cannot be adjusted in follow-up investment scenarios, and provides reliable operation and maintenance support and customer reconciliation services for financial transaction data management platforms.
[0069] In some embodiments, as a refinement and extension of the specific implementation of the above embodiments, and to fully illustrate the specific implementation process of this embodiment, this application also provides a further method for processing follow-up investment transaction data based on big data. This method differs from the above embodiments in that resource scheduling can be performed based on a Yet Another Resource Negotiator (YARN) cluster configuration. As shown in Figure 5, the method includes:
S201. Obtain resource scheduling time information;
S202. Create a resource coordination queue based on the resource scheduling time information;
S203. Define the resources of a single node and obtain the total resources of the cluster;
S204. Request application resources based on the resources of a single node and the total resources of the cluster;
S205. Perform dynamic resource allocation according to the application resources, and automatically expand or shrink the resource coordination queue based on the amount of data.
[0070] In order to achieve resource scheduling, after the financial management platform stores the revenue records according to distributed logic and daily update strategy, it can first obtain resource scheduling time information, which includes the task execution time and task end time set according to the preset idle time period.
[0071] A resource coordination queue is then created based on the resource scheduling time information. This resource coordination queue is a dedicated big data processing queue created based on a preset capacity scheduling priority. Then, by defining single-node resources and obtaining the total cluster resources, application resources are requested based on the single-node resources and the total cluster resources. Dynamic resource allocation is then performed according to the application resources to automatically scale up or down based on the data volume in the resource coordination queue.
[0072] For example, a financial management platform can utilize a YARN cluster for resource scheduling. This YARN cluster is typically a Spark cluster with 10 nodes, each with 32 cores and 256GB of memory. Tasks can then be automatically executed via the YARN cluster at midnight each day, completing the previous day's data processing and synchronizing it to HBase before 9:00 AM.
[0073] When performing resource scheduling, a dedicated YARN queue named spark-trading can be created by setting the capacity scheduling priority. The single-node resources are then defined by setting "yarn.nodemanager.resource.cpu-vcores=32" and "yarn.nodemanager.resource.memory-mb=262144" (256GB). Next, the total cluster resources are obtained, i.e., 10 nodes × 32 cores = 320 vcores and 10 nodes × 256GB = 2.56TB of memory. Spark application resources are then requested using "--num-executors 20 --executor-cores 8 --executor-memory 96g --driver-memory 32g". Finally, "spark.dynamicAllocation.enabled=true" is set to enable dynamic resource allocation, allowing automatic scaling based on data volume. The resulting resource allocation strategy is 20 executors × 8 cores = 160 cores for parallel computing, with the remaining resources reserved for daemons such as NodeManager and DataNode.
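The resource arithmetic above can be checked with the short Python sketch below. The constant names are hypothetical, and the memory total ignores any per-executor overhead that the resource manager may add on top of the requested executor memory, which is a simplifying assumption.

```python
# Cluster definition from the example: 10 nodes, 32 vcores and 256 GB each.
NODES = 10
VCORES_PER_NODE = 32
MEMORY_GB_PER_NODE = 256

# Spark application request from the example.
NUM_EXECUTORS = 20
EXECUTOR_CORES = 8
EXECUTOR_MEMORY_GB = 96
DRIVER_MEMORY_GB = 32

total_vcores = NODES * VCORES_PER_NODE
total_memory_gb = NODES * MEMORY_GB_PER_NODE

requested_cores = NUM_EXECUTORS * EXECUTOR_CORES
# Assumption: per-executor memory overhead is not counted here.
requested_memory_gb = NUM_EXECUTORS * EXECUTOR_MEMORY_GB + DRIVER_MEMORY_GB

print("cluster:", total_vcores, "vcores,", total_memory_gb, "GB")
print("request:", requested_cores, "cores,", requested_memory_gb, "GB")
print("cores left for daemons and other work:", total_vcores - requested_cores)
```

Under these figures the request uses 160 of the 320 vcores and 1952GB of the 2560GB of memory, which leaves headroom for the node daemons mentioned above.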
[0074] By applying the technical solutions of the above embodiments, the big data-based follow-up investment transaction data processing method can split and match buy and sell transactions precisely through FIFO logic, thereby improving data matching accuracy. In testing, the matching accuracy for a single transaction reached over 99.9%. The method also supports detailed display of matching information, which facilitates adjustment of profit calculation dimensions and accurate reconciliation. In addition, the method achieves daily updates of terabyte-level data based on the Spark framework, improving data processing efficiency: its computation time can be reduced by a factor of more than 16 compared to MapReduce technology, and it can efficiently process 500GB of follow-up investment data daily.
[0075] The method also possesses strong data management capabilities. By generating transaction-by-transaction revenue records and objective statistical reports, covering revenue distribution at the customer level and matching completion rate at the target level, it enhances the platform's refined data management level and supports operational optimization and customer reconciliation. Furthermore, it is adaptable to various co-investment transaction flow scenarios, supporting configuration by objective parameters such as transaction type and customer identifier, improving scenario adaptability and flexibility. In addition, the method employs HDFS distributed storage and partition management technology to improve data security and ensure the secure storage and efficient querying of terabyte-level transaction data, meeting financial data compliance requirements.
[0076] In some embodiments, as a specific implementation of the big data-based follow-up investment transaction data processing method described in the above embodiments, this application also provides a big data-based follow-up investment transaction data processing device. As shown in Figure 6, the device includes:
The data collection and filtering module, which is used to acquire the platform's full follow-up investment transaction data, including follow-buy transaction data and follow-sell transaction data.
The queue management module, which is used to construct a follow-buy transaction queue based on the follow-buy transaction data. The follow-buy transaction queue includes multiple follow-buy transactions sorted by follow-buy timestamp.
The transaction matching module, which is used to perform, for each follow-sell transaction in the follow-sell transaction data, transaction matching against the follow-buy transaction queue using FIFO logic, so as to generate buy-sell transaction pairs.
The profit calculation and aggregation module, which is used to calculate the single-transaction profit of each buy-sell transaction pair and to generate a profit record based on the single-transaction profit. The profit record includes the total profit obtained by aggregating the single-transaction profits with the follow-sell transaction or the follow-buy transaction as the aggregation dimension.
The big data processing module, which is used to store the profit records according to distributed logic and a daily update strategy.
[0077] In some embodiments, the functional modules of the big data-based follow-up investment transaction data processing device may further include functional units. For example, the data collection and filtering module may include a data access unit and a transaction filtering unit; the transaction matching module may include a splitting and matching unit; the profit calculation and aggregation module may include a profit calculation unit and a data aggregation unit; and the big data processing module may include a distributed storage unit, an HDFS unit, a YARN computing scheduling unit, a Spark task execution unit, and so on.
[0078] In addition, the device may also include an operation and maintenance data statistics module. The hardware architecture of the operation and maintenance data statistics module consists of a processor and a memory. The memory is used to store drivers for the processor to call, so as to realize operation and maintenance data statistics.
[0079] By employing the above technical solutions, this application provides a method and apparatus for processing follow-up investment transaction data based on big data. After acquiring the platform's full follow-up investment transaction data, the method constructs a follow-buy transaction queue based on the follow-buy transaction data. For each follow-sell transaction in the follow-sell transaction data, it performs transaction matching against the follow-buy transaction queue using first-in, first-out logic to generate buy-sell transaction pairs. It then calculates the single-transaction profit of each buy-sell transaction pair, generates profit records based on these single-transaction profits, and stores the profit records according to distributed logic and a daily update strategy. The method can achieve efficient daily processing of terabyte-level follow-up investment transaction data, integrating accurate matching of follow-up investment transactions, historical profit calculation for each transaction, and statistical analysis of transaction data, thereby improving the efficiency and accuracy of follow-up investment transaction data processing and providing reliable operation and maintenance support and reconciliation services for financial transaction data management platforms.
[0080] It should be noted that other corresponding descriptions of the functional units involved in the big data-based follow-up investment transaction data processing device provided in this application embodiment can be found in the corresponding descriptions in the big data-based follow-up investment transaction data processing method provided in the above embodiments, and will not be repeated here.
[0081] This application also provides a computer device, specifically a personal computer, server, network device, etc. The computer device includes a bus, processor, memory, and communication interface, and may also include input / output interfaces and a display device. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device stores location information. The network interface of the computer device is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements the steps in the various method embodiments.
[0082] Those skilled in the art will understand that the structure of the computer device described above is only a partial structure related to the solution of this application, and does not constitute a limitation on the computer device to which the solution of this application is applied. A specific computer device may include more or fewer components, or combine certain components, or have different component arrangements.
[0083] In one embodiment, a computer-readable storage medium is also provided, which may be non-volatile or volatile, and a computer program is stored thereon, which, when executed by a processor, implements the steps in the above method embodiments.
[0084] In one embodiment, a computer program product is also provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0085] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.
[0086] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When the computer program is executed, it can include the processes of the embodiments of the above methods.
[0087] Any references to memory, database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc.
[0088] Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
[0089] The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, distributed databases based on blockchain. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, graphics processors, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc.
[0090] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0091] The embodiments described above are merely examples of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these modifications and improvements all fall within the protection scope of this application.
Claims
1. A method for processing follow-up investment transaction data based on big data, characterized in that the method includes: obtaining the platform's full follow-up investment transaction data, which includes follow-buy transaction data and follow-sell transaction data; constructing a follow-buy transaction queue based on the follow-buy transaction data, the follow-buy transaction queue including multiple follow-buy transactions sorted by follow-buy timestamp; for each follow-sell transaction in the follow-sell transaction data, performing transaction matching against the follow-buy transaction queue using first-bought, first-sold logic to generate buy-sell transaction pairs; calculating the single-transaction profit of each buy-sell transaction pair and generating a profit record based on the single-transaction profit, the profit record including the total profit obtained by aggregating the single-transaction profits with the follow-sell transaction or the follow-buy transaction as the aggregation dimension; and storing the profit records according to distributed logic and a daily update strategy.
2. The method according to claim 1, characterized in that obtaining the platform's full follow-up investment transaction data includes: obtaining the recommended target signal associated with the follow-up investment transaction data; classifying the follow-up investment transaction data according to the recommended target signal to determine the type of the target; setting a transaction matching strategy according to the type of the target, the transaction matching strategy being used to perform transaction matching; reading follow-up investment operation markers from the platform's full transaction data; and, based on the follow-up investment operation markers, selecting follow-buy transactions from the follow-up investment transaction data to obtain the follow-buy transaction data.
3. The method according to claim 2, characterized in that setting a transaction matching strategy according to the type of the target includes: reading the type of the target, which is one of a first type, a second type, and a third type, wherein the first type indicates a target with no buy or sell direction, the second type indicates a target to which a buy or sell direction is added, and the third type indicates a target to which a trading time is further added; setting a transaction matching method based on the type of the target, wherein when the type of the target is the first type, the transaction matching method is set to filter transactions within the recommended valid time period; when the type of the target is the second type, the transaction matching method is set to match on time and on the same buy/sell direction; and when the type of the target is the third type, the transaction matching method is set to match on time, on buy/sell direction, and on the customer's transaction time being after the recommended target time; and generating the transaction matching strategy based on the transaction matching method.
4. The method according to claim 1, characterized in that, for each follow-sell transaction in the follow-sell transaction data, performing transaction matching against the follow-buy transaction queue using first-bought, first-sold logic to generate buy-sell transaction pairs includes: determining the head of the follow-buy transaction queue based on the follow-buy timestamps according to the first-bought, first-sold logic; and, starting from the head of the queue, matching follow-buy transactions in sequence against each follow-sell transaction to generate the buy-sell transaction pairs; wherein the objective data associated with the buy-sell transaction pairs includes at least one of the following: transaction price, transaction quantity, and transaction fee.
5. The method according to claim 4, characterized in that, starting from the head of the queue, matching follow-buy transactions in sequence against each follow-sell transaction to generate the buy-sell transaction pairs includes: reading the follow-buy quantity from the follow-buy transaction data, the follow-buy quantity including the single-transaction follow-buy quantity; reading the follow-sell quantity from the follow-sell transaction data; if the single-transaction follow-buy quantity is greater than the follow-sell quantity, splitting the follow-buy transaction into a matched portion and a remaining portion, generating the buy-sell transaction pair based on the matched portion, and storing the remaining portion in the follow-buy transaction queue; and if the single-transaction follow-buy quantity is less than or equal to the follow-sell quantity, performing a full match on the follow-buy transaction to generate the buy-sell transaction pair.
6. The method according to claim 1, characterized in that calculating the single-transaction profit of the buy-sell transaction pair includes: traversing the buy-sell transaction pairs to obtain the matched quantity and value information, the value information including the sell price, buy price, buy commission, and sell commission; calculating the difference between the sell price and the buy price to obtain the single-transaction price difference; calculating the single-transaction income value as the product of the single-transaction price difference and the matched quantity; calculating the sum of the follow-buy commission and the follow-sell commission to obtain the single-transaction expenditure value; and calculating the single-transaction profit as the difference between the single-transaction income value and the single-transaction expenditure value.
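The arithmetic of claim 6 reduces to profit = (sell price - buy price) x matched quantity - (buy commission + sell commission). A minimal sketch is shown below. The function name is hypothetical, `Decimal` is used here only to avoid floating-point rounding on monetary values, and the commissions are assumed to be the amounts already attributable to this pair (the claim does not say how a commission is apportioned when a buy is split).

```python
from decimal import Decimal


def single_transaction_profit(sell_price: Decimal, buy_price: Decimal,
                              matched_qty: int,
                              buy_commission: Decimal,
                              sell_commission: Decimal) -> Decimal:
    """Profit of one buy-sell pair, following the steps listed in claim 6."""
    price_diff = sell_price - buy_price             # single-transaction price difference
    income = price_diff * matched_qty               # single-transaction income value
    expenditure = buy_commission + sell_commission  # single-transaction expenditure value
    return income - expenditure                     # single-transaction profit


profit = single_transaction_profit(
    Decimal("12.00"), Decimal("10.00"), 100, Decimal("5.00"), Decimal("6.00"))
print(profit)  # 189.00
```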
7. The method according to claim 1, characterized in that generating a profit record based on the single-transaction profit includes: setting an aggregation dimension, the aggregation dimension being set based on the follow-sell transaction flow or the follow-buy transaction flow; aggregating the single-transaction profits item by item according to the aggregation dimension to generate the total profit; obtaining the record information associated with the single-transaction profit, the record information including at least one of a customer identifier, a target code, and a transaction timestamp; and generating the profit record based on the record information and the total profit.
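The aggregation step of claim 7 can be illustrated with a small sketch that sums single-transaction profits per follow-sell flow or per follow-buy flow. The dictionary keys (`sell_flow_id`, `buy_flow_id`, `customer_id`, `target_code`, `trade_ts`, `profit`) are hypothetical field names, and keeping the record information of the first item seen in each group is an assumption, since the claim does not say which item supplies it.

```python
def aggregate_profits(single_profits, dimension="sell_flow_id"):
    """Sum single-transaction profits per flow identifier.

    `dimension` is either "sell_flow_id" or "buy_flow_id" (hypothetical names).
    """
    if dimension not in ("sell_flow_id", "buy_flow_id"):
        raise ValueError(f"unsupported aggregation dimension: {dimension}")
    records = {}
    for item in single_profits:
        key = item[dimension]
        # Assumption: record information is taken from the first item in a group.
        record = records.setdefault(key, {
            dimension: key,
            "customer_id": item["customer_id"],
            "target_code": item["target_code"],
            "trade_ts": item["trade_ts"],
            "total_profit": 0,
        })
        record["total_profit"] += item["profit"]
    return list(records.values())


sample = [
    {"sell_flow_id": "S1", "buy_flow_id": "B1", "customer_id": "C1",
     "target_code": "600000", "trade_ts": "2026-01-06", "profit": 200},
    {"sell_flow_id": "S1", "buy_flow_id": "B2", "customer_id": "C1",
     "target_code": "600000", "trade_ts": "2026-01-06", "profit": 20},
]
by_sell = aggregate_profits(sample, "sell_flow_id")
```

With the sample above, aggregating by follow-sell flow produces one record with a total profit of 220, while aggregating by follow-buy flow produces two records.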
8. The method according to claim 1, characterized in that storing the profit records according to distributed logic and a daily update strategy includes: obtaining the customer's unique identifier and the transaction timestamp corresponding to the profit record; generating a composite partition key based on the customer's unique identifier and the transaction timestamp; and, based on the composite partition key, storing the profit records in partitions using a distributed file system, wherein the distributed file system achieves distributed matching through a big data processing core and profit aggregation through big data structured queries.
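A composite partition key of the kind described in claim 8 could be built as sketched below. Everything here is illustrative: the function names, the choice to truncate the timestamp to a calendar day (made to line up with the daily update strategy), and the `key=value` directory layout are assumptions, not details given in the application.

```python
from collections import defaultdict
from datetime import datetime


def composite_partition_key(customer_id: str, trade_ts: datetime) -> str:
    """Combine the customer identifier with the trade date (day granularity assumed)."""
    return f"{customer_id}|{trade_ts.strftime('%Y%m%d')}"


def partition_path(base_dir: str, customer_id: str, trade_ts: datetime) -> str:
    """Illustrative key=value directory layout for one partition."""
    return (f"{base_dir}/customer_id={customer_id}"
            f"/trade_date={trade_ts.strftime('%Y-%m-%d')}")


def group_by_partition(records):
    """Bucket profit records by composite key before writing them out."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[composite_partition_key(rec["customer_id"], rec["trade_ts"])].append(rec)
    return dict(buckets)
```

Because each day's records land under their own key, a daily run under this layout only needs to rewrite the partitions for the dates it processed.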
9. The method according to claim 1, characterized in that, after storing the profit records according to distributed logic and a daily update strategy, the method further includes: obtaining resource scheduling time information, which includes a task execution time and a task end time set according to a preset idle time period; creating a resource coordination queue based on the resource scheduling time information, the resource coordination queue being a dedicated big data processing queue created based on a preset capacity scheduling priority; defining the resources of a single node and obtaining the total resources of the cluster; requesting application resources based on the single-node resources and the total cluster resources; and performing dynamic resource allocation according to the application resources, so as to automatically scale up or down based on the data volume of the resource coordination queue.
10. A data processing device for follow-up investment transactions based on big data, characterized in that the device includes: a data collection and filtering module, used to acquire all follow-up investment transaction data on the platform, including follow-buy transaction data and follow-sell transaction data; a queue management module, used to construct a follow-buy transaction queue based on the follow-buy transaction data, the follow-buy transaction queue including multiple follow-buy transactions sorted by follow-buy timestamp; a transaction matching module, used to match each follow-sell transaction in the follow-sell transaction data against the follow-buy transaction queue using first-buy-first-sell logic, so as to generate buy-sell transaction pairs; a profit calculation and aggregation module, used to calculate the single-transaction profit of each buy-sell transaction pair and to generate a profit record based on the single-transaction profit, the profit record including the total profit obtained by aggregating the single-transaction profits with the follow-sell transaction flow or the follow-buy transaction flow as the aggregation dimension; and a big data processing module, used to store the profit records according to distributed logic and a daily update strategy.