A process multidimensional data processing system
By generating standardized event sequences and constructing process closure templates, the system identifies and corrects fragmentation in multi-source process event records. This addresses the process-level fragmentation left unresolved by existing technologies and improves both the reliability of data processing and the accuracy of analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- HANGZHOU FOCUS TECHNOLOGY CO LTD
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies lack a quality constraint mechanism for the process closure relationships of multi-source process event records. As a result, the basis for process review is distorted, the basis for accountability investigation is skewed, anomaly identification results are inaccurate, and broken records lacking closure constraints enter training samples, which undermines the reliability of subsequent analysis results.
A process multidimensional data processing system is provided. A sequence modeling module generates standardized event sequences and constructs process closure templates. A chain position matching module identifies mandatory, alternative, and prohibited bridging chain positions. A break identification module performs cross-source verification and time-series penetration verification. A sample discrimination module constructs a purified sample set and a broken sample set. Finally, the system generates the process multidimensional data processing results.
The system identifies and corrects breaks in the process-level continuity of multi-source process event records, clarifies the process basis for subsequent analysis, reduces the probability that broken data flows downstream, and improves the accuracy of accountability investigation and anomaly identification.

Figure: abstract drawing (CN122196003A)
Abstract
Description
Technical Field
[0001] This application relates to the field of data processing technology and, more specifically, to a process multidimensional data processing system.
Background Technology
[0002] In the field of multi-source business process data governance, existing technologies typically verify field integrity, format standardization, coding consistency, the correctness of primary and foreign key relationships, and the correctness of standard mappings, so as to ensure the availability of table structures, field contents, and source relationships at the storage level. Patent document US20130055042A1 discloses a data quality analysis and management scheme that performs quality checks for integrity, consistency, standardization, referential completeness, and duplication, showing that existing technologies can already check structured data at the field and record levels. Patent document US20100070463A1 discloses a data source tracking and management scheme that incorporates the ancestor objects, workflow parameters, and intermediate results of data objects into source information management for error source tracing and workflow quality control, showing that existing technologies recognize the role of source information, metadata, and derivation relationships in assessing data credibility. Patent document US12326798B2 discloses a scheme that generates causal process models from process event logs and outputs difference views, showing that existing technologies can already identify causal execution relationships between activities and detect process differences from event logs.
[0003] However, these existing technologies still focus mainly on verification and analysis at the field, record, source, or event-pair level, and pay little attention to whether multiple process events form a traceable, verifiable, and back-referenceable closed relationship around the same process fact. For multidimensional process data, reliability depends not only on whether a single process record is complete, but also on whether its associated preceding, subsequent, and cross-source events form a continuous process expression. A process record with complete fields, complete timestamps, and a clear source identifier does not necessarily mean that the corresponding process is ready to enter a subsequent high-reliability analysis stage.
[0004] In practice, the following situations occur easily: an execution record contains an execution time, an execution subject, and a result marker, but lacks the corresponding application, authorization, pre-verification, or result confirmation record; an abnormal result record contains a result time, result content, and a source identifier, but lacks the corresponding delivery, viewing, response, review, or handling record; or event records from multiple sources exist individually but are broken in time sequence, corroboration, or back-reference, and cannot form a continuous closed expression around the same process fact. When only field-level or source-level quality rules are applied, such records may still be judged qualified and flow into subsequent statistical analysis, rule mining, or model training.
[0005] Because existing technologies lack a quality constraint mechanism oriented towards process closure relationships, the process results output by the system easily appear complete at the field level while being fragmented at the process level. This distorts the basis for process review, skews the basis for accountability investigation, makes anomaly identification inaccurate, and lets fragmented records without closure constraints enter training samples. Consequently, although subsequent analysis rests on a formally complete dataset, the corresponding process facts do not form a closed expression, and the dataset cannot provide reliable input for high-reliability process analysis.
[0006] Therefore, a process multidimensional data processing system is needed that standardizes multi-source process event records and establishes process closure constraints around the preceding, subsequent, corroboration, and back-reference relationships between events. It should identify missing chain positions, chain position conflicts, cross-source corroboration mismatches, and broken back-reference relationships, and then perform broken chain isolation, missing chain position supplementation, and training sample purification accordingly, so as to output process multidimensional data processing results subject to process closure constraints.
Summary of the Invention
[0007] To overcome the above defects of the prior art and achieve the above objectives, this application provides the following technical solution. In a first aspect, this application discloses a process multidimensional data processing system, comprising: a sequence modeling module, which obtains a multi-source process event record set corresponding to a target object, generates a standardized event sequence from the results of field parsing, event type mapping, time semantic normalization, and source identifier binding, and constructs a process closure template from the precedence, subsequent, and corroboration relationships of event types to generate a template chain position rule set; a chain position matching module, which takes the standardized event sequence and the template chain position rule set as input, performs chain position matching, determines mandatory chain positions, alternative chain positions, and prohibited bridging chain positions, and generates a candidate closed chain set and a chain position coverage result; a break identification module, which takes the candidate closed chain set and the chain position coverage result as input, performs cross-source corroboration verification, time-series penetration verification, and back-reference consistency verification, and generates a closure verification result and a broken chain set; and a sample discrimination module, which takes the closure verification result and the broken chain set as input, constructs a purified sample set and a broken sample set, trains a closure discrimination model, outputs a chain-level closure result and a break type result, and generates a missing chain position supplementation path from the chain-level closure result, the break type result, and the template chain position rule set.
The system further comprises a result output module, which takes the chain-level closure result, the break type result, and the missing chain position supplementation path as input, performs broken chain isolation, sample admission filtering, and template chain position rule correction, and generates the process multidimensional data processing results.
[0008] Compared with related technologies, this application has the following advantages. First, the multi-source process event record set corresponding to the target object undergoes field parsing, event type mapping, time semantic normalization, and source identifier binding to generate a standardized event sequence. A process closure template is then constructed from the precedence, subsequent, and corroboration relationships of event types to generate a template chain position rule set. On this basis, chain position matching yields a candidate closed chain set and a chain position coverage result, after which cross-source corroboration verification, time-series penetration verification, and back-reference consistency verification are performed in sequence on the candidate closed chain set to form a closure verification result and a broken chain set. This application thus advances the existing data governance approach, which focuses on field integrity, format standardization, and source relationships, to a constraint-based approach oriented towards the continuity of process events. It can identify not only whether a single record is complete, but also process-level problems such as missing preceding events, mismatched corroboration relationships, abnormal sequence relationships, and broken back-reference relationships. This reduces cases in which data are complete at the field level but broken at the process level, and provides subsequent analysis with clearer process evidence.
[0009] Furthermore, the closure verification result and the broken chain set are used as input to construct a purified sample set and a broken sample set for training the closure discrimination model, which outputs a chain-level closure result and a break type result. A missing chain position supplementation path is generated from the chain-level closure result, the break type result, and the template chain position rule set. Broken chain isolation, sample admission filtering, and template chain position rule correction are then performed to generate the process multidimensional data processing results. This application can therefore not only identify and distinguish broken objects, but also turn the break position, break type, and supplementation order into a reusable basis for rule correction. The processing results thus contain directly usable valid objects, isolated broken objects, and a corrected template chain position rule set. This reduces the probability that broken data continues to flow into statistical analysis, rule mining, and model training, and brings subsequent process review, accountability investigation, anomaly identification, and training sample construction closer to the actual process.
Attached Figure Description
[0010] Figure 1 is a flowchart illustrating the steps of the process multidimensional data processing system provided in this application;
Figure 2 is a flowchart of the multidimensional data processing provided in this application.
Detailed Implementation
[0011] The technical solutions of the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of protection of this application.
Example
[0012] As shown in Figure 1, this embodiment provides a process multidimensional data processing system, including a sequence modeling module, a chain position matching module, a break identification module, a sample discrimination module, and a result output module, with the modules connected by wired and/or wireless means.
[0013] The sequence modeling module obtains the multi-source process event record set corresponding to the target object, generates a standardized event sequence from the results of field parsing, event type mapping, time semantic normalization, and source identifier binding, and constructs process closure templates from the precedence, subsequent, and corroboration relationships of event types to generate a template chain position rule set.
[0014] In a specific implementation, the method for generating the standardized event sequence includes: determining the target object, which delimits the scope of data covered by one multidimensional data processing run. The target object can be any of a business object, a business matter, a task handling process, a work order, a production batch, a user, an order, or a medical treatment matter, as long as the subsequent process event records all belong to the same target object. The purpose is to establish a unified processing boundary.
[0015] A multi-source process event record set is then acquired for the target object. This set is a collection of process event records, drawn from at least two data sources, that relate to the target object. Each process event record includes one or more of the following: a record identifier, a target object identifier, an original event description, a record generation time, and source information. Data sources can be business record tables, log record tables, approval record tables, execution record tables, result record tables, feedback record tables, message record tables, manual entry records, or equipment collection records. Acquisition first filters by target object identifier, then performs supplementary searches by time range, business scope, or source scope, and finally merges the search results into the multi-source process event record set. The purpose is to collect the process traces of the same target object from different data sources.
[0016] Field parsing is performed on each process event record in the multi-source process event record set to generate field parsing results, which are unified field expressions extracted and reorganized from the process event records. Field parsing results include one or more of the following: an event subject field, an event action field, an event object field, an event time field, an event source field, and an event status field. For structured records, content is extracted directly by field name. For semi-structured or text records, content is segmented and extracted using preset delimiters, fixed text positions, keyword-adjacent regions, or field templates. When different data sources use different field names with the same meaning, the corresponding content is merged into the same field category. The purpose is to organize the original process event records into comparable field expressions.
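The field parsing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the alias table `FIELD_ALIASES`, the function `parse_fields`, and the `key=value;` layout assumed for text records are all hypothetical. The sketch shows the two parsing paths (by field name for structured records, by delimiter for text records) and the merging of differently named fields that share one meaning.

```python
from typing import Dict, Union

# Hypothetical alias table: source-specific field names -> unified field categories.
FIELD_ALIASES: Dict[str, str] = {
    "actor": "event_subject", "operator": "event_subject",
    "action": "event_action", "op": "event_action",
    "item": "event_object", "target": "event_object",
    "time": "event_time", "ts": "event_time",
    "source": "event_source", "system": "event_source",
    "status": "event_status", "state": "event_status",
}


def parse_fields(record: Union[dict, str], delimiter: str = ";") -> Dict[str, str]:
    """Return unified field expressions for one process event record."""
    if isinstance(record, str):
        # Text record: assumed (for illustration) to hold "key=value" pairs.
        pairs = [part.split("=", 1) for part in record.split(delimiter) if "=" in part]
        raw = {key.strip(): value.strip() for key, value in pairs}
    else:
        # Structured record: read directly by field name.
        raw = dict(record)
    parsed: Dict[str, str] = {}
    for name, value in raw.items():
        category = FIELD_ALIASES.get(name.lower())
        # Differently named fields with the same meaning merge into one category;
        # the first value seen for a category wins.
        if category is not None and category not in parsed:
            parsed[category] = str(value)
    return parsed
```

For example, `parse_fields("actor=lab; action=generate; state=final")` yields the subject, action, and status categories even though the text record uses its own field names.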
[0017] Event type mapping is performed on the field parsing results to generate event type mapping results, which record the correspondence between each process event record and a preset event type set. The preset event type set unifies different descriptions of the same kind of event across data sources and includes one or more of the following: the trigger, application, verification, execution, result, feedback, confirmation, and disposal event types. During mapping, the event action field and event status field are extracted first and then combined with the event object field to determine the event semantics. If the event action field means initiating, submitting, or registering, the record is mapped to the application event type; if it means executing, completing, or implementing, to the execution event type; if it means generating, returning, or obtaining, to the result event type; and if it means checking, reviewing, or verifying, to the verification or confirmation event type. When a record corresponds to several event types at once, a single event type is determined in priority order: first the event action field, then the event status field, and finally the event object field. The purpose is to form unified event categories.
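A possible sketch of these mapping rules, assuming keyword stems stand in for the "meaning" of the event action field. The stem lists, the feedback and disposal stems, and the tie-breaking tables `STATUS_HINTS` and `OBJECT_HINTS` are hypothetical; the sketch only mirrors the stated priority order (action field, then status field, then object field) and otherwise falls back to the order of the preset event type set.

```python
from typing import Dict, List, Optional

# Hypothetical keyword stems per event type, listed in the order of the preset event type set.
# The feedback and disposal stems are assumptions; the text spells out only the first four rules.
ACTION_STEMS: Dict[str, List[str]] = {
    "application": ["initiat", "submit", "regist", "appl"],
    "execution": ["execut", "complet", "implement"],
    "result": ["generat", "return", "obtain"],
    "verification": ["check", "review", "verif"],
    "confirmation": ["check", "review", "verif"],
    "feedback": ["send", "notif"],
    "disposal": ["handl", "dispos"],
}
# Hypothetical hints used only to break ties between candidate types.
STATUS_HINTS: Dict[str, str] = {"confirmed": "confirmation", "verified": "verification"}
OBJECT_HINTS: Dict[str, str] = {"final report": "confirmation"}


def _narrow(candidates: List[str], value: str, hints: Dict[str, str]) -> List[str]:
    hinted = hints.get(value.lower())
    return [hinted] if hinted in candidates else candidates


def map_event_type(fields: Dict[str, str]) -> Optional[str]:
    action = fields.get("event_action", "").lower()
    # Priority 1: the event action field proposes candidate types.
    candidates = [t for t, stems in ACTION_STEMS.items() if any(s in action for s in stems)]
    # Priorities 2 and 3: the event status field, then the event object field, narrow ties.
    candidates = _narrow(candidates, fields.get("event_status", ""), STATUS_HINTS)
    candidates = _narrow(candidates, fields.get("event_object", ""), OBJECT_HINTS)
    return candidates[0] if candidates else None
```

A "review" action alone maps to the verification event type; adding the status "confirmed" shifts it to the confirmation event type.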
[0018] Time semantic normalization is performed on the event type mapping results and field parsing results to generate a time semantic normalization result, obtained by converting different time expressions into a unified time expression. The result includes one or more of the following: a standard time point, a time interval start point, a time interval end point, a time sequence marker, and a time precision marker. During normalization, the event occurrence time, result formation time, record generation time, supplementary entry time, and synchronization time are identified first. When a process event record contains several time fields, the standard time point is chosen in the order event occurrence time, result formation time, record generation time, synchronization time. A relative time expression is converted to an absolute one using the reference time point of the target object. When the time content is accurate only to the date, a time precision marker is added. The tolerance used in time conversion and time sorting is set according to the duration of the target object's process and the interval at which records are formed. The purpose is to establish a unified time reference.
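A hedged sketch of choosing the standard time point. It assumes hypothetical field names (`occurrence_time`, `result_time`, `record_time`, `sync_time`), a `%Y-%m-%d %H:%M` timestamp format, and an invented relative format `+Nmin` measured from the target object's reference time point; tolerance handling is omitted.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Order in which time fields are tried when choosing the standard time point.
TIME_PRIORITY = ["occurrence_time", "result_time", "record_time", "sync_time"]


def normalize_time(times: Dict[str, str], reference: datetime) -> Tuple[Optional[datetime], str]:
    """Return (standard time point, precision marker) for one record."""
    for key in TIME_PRIORITY:
        raw = times.get(key)
        if not raw:
            continue
        # Hypothetical relative format "+Nmin", measured from the object's reference time.
        if raw.startswith("+") and raw.endswith("min"):
            return reference + timedelta(minutes=int(raw[1:-3])), "minute"
        try:
            return datetime.strptime(raw, "%Y-%m-%d %H:%M"), "minute"
        except ValueError:
            # Date-only content: keep the date and flag the coarser precision.
            return datetime.strptime(raw, "%Y-%m-%d"), "day"
    return None, "missing"
```

When both an occurrence time and a record generation time are present, the occurrence time wins regardless of dictionary order, because the loop follows `TIME_PRIORITY`.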
[0019] Source identifier binding is performed on the field parsing results, event type mapping results, and time semantic normalization results to generate source identifier binding results, which record the correspondence between each process event record and the attributes of its data source. The binding results include one or more of the following: a source category marker, a source original record marker, a source generation method marker, and a source responsibility location marker. The source category marker distinguishes manually entered, automatically collected, rule-generated, and externally imported sources. The source generation method marker distinguishes original generation, synchronous copying, summary derivation, and supplementary correction. The source responsibility location marker indicates where the process event record sits in its data source. During binding, the event source field is read first and then combined with the source table name, record path, generation marker, or synchronization marker to determine the source category marker, source generation method marker, and source responsibility location marker. The purpose is to provide a unified source basis for subsequently identifying corroboration relationships.
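One way to express source identifier binding is a small record type plus lookup tables. The table names, marker codes, and the `bind_source` function below are hypothetical placeholders for whatever source metadata a deployment actually has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceBinding:
    category: str  # manual / automatic / rule / external
    method: str    # original / sync_copy / derived / correction
    location: str  # where the record sits in its data source


# Hypothetical lookup tables; real ones would come from source metadata.
CATEGORY_BY_TABLE = {"manual_entry": "manual", "device_log": "automatic",
                     "rule_output": "rule", "import_batch": "external"}
METHOD_BY_MARKER = {"orig": "original", "sync": "sync_copy", "agg": "derived", "fix": "correction"}


def bind_source(event_source: str, table: str, marker: str, record_path: str) -> SourceBinding:
    # The table name decides the category, the generation/sync marker decides the method,
    # and the event source field plus record path locate the record.
    category = CATEGORY_BY_TABLE.get(table, "unknown")
    method = METHOD_BY_MARKER.get(marker, "original")
    return SourceBinding(category, method, f"{event_source}/{table}/{record_path}")
```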
[0020] Taking the field parsing results, event type mapping results, time semantic normalization results, and source identifier binding results as input, each process event record is reorganized into a unified record structure to generate a standardized event sequence: a sequence of event records expressed under unified field conventions and arranged in chronological order. Each standardized event record includes at least the target object identifier, event type, standard time point or time interval, source identifier, event status, and original record reference information. During generation, all standardized event records of the same target object are collected and sorted by standard time point; ties are broken by the order of event types in the preset event type set, and any remaining ties by source generation method. The sorted records are written into a single sequence expression to obtain the standardized event sequence. The purpose is to turn multi-source process event records into direct input for the subsequent construction of the process closure template.
[0021] A sequence integrity check is performed on the standardized event sequence, checking whether any standardized event record lacks a target object identifier, event type, standard time point, or source identifier. Where items are missing, the missing field positions of the record are marked and the record is retained in the sequence. The purpose is to locate the information that must be supplemented when the process closure template is subsequently constructed.
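The three-level ordering (standard time point, then event type order, then source generation method) can be written as a single tuple sort key. The sketch assumes that the order in which [0017] lists the preset event types is the tie-breaking order and that generation methods rank original, synchronous copy, derived, correction; both orderings and all names are illustrative assumptions.

```python
from datetime import datetime
from typing import List, NamedTuple

# Preset event type set in the order listed in [0017]; used as a tie-breaker (assumption).
EVENT_TYPE_ORDER = ["trigger", "application", "verification", "execution",
                    "result", "feedback", "confirmation", "disposal"]
# Hypothetical ranking of source generation methods for the final tie-breaker.
METHOD_ORDER = ["original", "sync_copy", "derived", "correction"]


class StdEvent(NamedTuple):
    object_id: str
    event_type: str
    time: datetime
    source_id: str
    method: str
    status: str
    ref: str


def build_sequence(records: List[StdEvent], object_id: str) -> List[StdEvent]:
    same_object = [r for r in records if r.object_id == object_id]
    return sorted(
        same_object,
        key=lambda r: (r.time, EVENT_TYPE_ORDER.index(r.event_type), METHOD_ORDER.index(r.method)),
    )
```

Python tuples compare element by element, so the second and third key components only matter when the earlier ones are equal, which is exactly the tie-breaking rule described.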
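A minimal sketch of the sequence integrity check, assuming records are plain dictionaries with hypothetical keys. Incomplete records are kept and annotated rather than dropped, matching the text.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("object_id", "event_type", "time", "source_id")


def integrity_check(sequence: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    checked = []
    for record in sequence:
        # Record which required fields are absent or empty; never drop the record.
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        checked.append({**record, "missing_fields": missing})
    return checked
```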
[0022] In a specific implementation, the method for generating the template chain position rule set includes: taking the standardized event sequence as input and extracting its event type set, i.e., the deduplicated set of all event types in the sequence, and then counting the occurrence positions, occurrence frequency, and adjacent event types of each event type in the sequence. The purpose is to provide a unified event type input for relationship construction.
[0023] Based on the event type set and the standardized event sequence, precedence relationships between event types are constructed. A precedence relationship means that one event type occurs before another in the process sequence. During construction, adjacent or near-adjacent pairs of standardized event records of the same target object are selected, and the correspondence between the event types of the earlier and later records is counted. When one event type precedes another multiple times within a preset time window, and no reverse combination conflicts with the event semantics, the earlier event type is recorded as a precedence event type of the later one. The preset time window is set according to the event interval distribution in the standardized event sequence, the source synchronization delay, and the duration of the target object's process. The purpose is to determine the ordering constraints between event types.
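The event type statistics in this step (deduplicated types, occurrence positions, frequency, and adjacent types) map naturally onto Python's `collections.Counter` and `defaultdict`. The function name and return layout are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def event_type_profile(sequence: List[str]):
    frequency = Counter(sequence)
    positions: Dict[str, List[int]] = defaultdict(list)
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for i, etype in enumerate(sequence):
        positions[etype].append(i)
        # Adjacent event types on both sides of each occurrence.
        if i > 0:
            neighbours[etype].add(sequence[i - 1])
        if i + 1 < len(sequence):
            neighbours[etype].add(sequence[i + 1])
    event_types = list(dict.fromkeys(sequence))  # deduplicated, first-seen order
    return event_types, dict(frequency), dict(positions), dict(neighbours)
```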
[0024] Subsequent relationships between event types are constructed from the precedence relationships. A subsequent relationship means that one event type occurs after another in the process sequence. During construction, each precedence relationship is read in reverse: if event type A is a precedence event type of event type B, then B is taken as a candidate subsequent event type of A. It is then checked whether B can form a continuous process segment together with A; if it cannot, the pair is not written into the subsequent relationships. The purpose is to form ordering constraints that can be traced back to preceding events and extended to subsequent events.
[0025] Corroboration relationships between event types are constructed from the source identifiers and field parsing results in the standardized event sequence. A corroboration relationship means that two event types from different data sources or different record locations attest to the same process fact. During construction, pairs of standardized event records with different source identifiers are selected, and their event object fields, standard time points or time intervals, event status fields, and original record reference information are compared. An event type pair is judged to be in a corroboration relationship when the event object fields are identical or one contains the other, the time expressions fall within a preset proximity range, the event status fields do not conflict semantically, and the original record reference information is not mutually exclusive. The preset proximity range is set according to the time precision marker, the source generation method marker, and the interval distribution of event records. The purpose is to form mutual-proof constraints across multi-source events.
[0026] A process closure template is constructed from the precedence, subsequent, and corroboration relationships. A process closure template is a template object that arranges multiple event types around the same process fact, by position and constraint relationships, into a process expression that can be closed. During construction, a core event type is first selected as the template center. The precedence event types of the core event type are then arranged as pre-positions, its subsequent event types as post-positions, and the event types that form corroboration relationships with the core event type, a pre-position, or a post-position as corroboration positions. The resulting process closure template integrates the discrete relationships into a templated expression for subsequent chain position matching.
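The inversion of precedence relationships into subsequent relationships can be sketched as below. The continuity test is left as a caller-supplied predicate because the text does not define it, and `forward_chain` is a hypothetical helper showing how the result lets a chain be extended step by step.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def subsequent_relations(
    precedence: Set[Tuple[str, str]],
    is_continuous: Callable[[str, str], bool],
) -> Dict[str, Set[str]]:
    """Invert (before, after) precedence pairs into before -> {after, ...}."""
    successors: Dict[str, Set[str]] = defaultdict(set)
    for before, after in precedence:
        # Keep the pair only if it forms a continuous process segment.
        if is_continuous(before, after):
            successors[before].add(after)
    return dict(successors)


def forward_chain(start: str, successors: Dict[str, Set[str]]) -> List[str]:
    """Extend a chain while each step has exactly one subsequent event type."""
    chain = [start]
    while True:
        options = successors.get(chain[-1], set())
        if len(options) != 1:
            break
        (step,) = options
        if step in chain:  # stop on cycles
            break
        chain.append(step)
    return chain
```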
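A sketch of the pairwise corroboration test, with each condition from the text as one boolean. The `Rec` type, the substring test used for "inclusion", the `CONFLICTING_STATUS` table, and the `exclusive_refs` set are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Rec:
    event_type: str
    source_id: str
    obj: str
    time: datetime
    status: str
    ref: str


# Hypothetical status pairs that contradict each other.
CONFLICTING_STATUS: FrozenSet[FrozenSet[str]] = frozenset(
    {frozenset({"success", "failed"}), frozenset({"sent", "cancelled"})}
)


def corroborates(a: Rec, b: Rec, proximity: timedelta,
                 exclusive_refs: FrozenSet[Tuple[str, str]] = frozenset()) -> bool:
    if a.source_id == b.source_id:
        return False  # corroboration needs two different sources
    objects_ok = a.obj == b.obj or a.obj in b.obj or b.obj in a.obj
    time_ok = abs(a.time - b.time) <= proximity
    status_ok = frozenset({a.status, b.status}) not in CONFLICTING_STATUS
    refs_ok = (a.ref, b.ref) not in exclusive_refs and (b.ref, a.ref) not in exclusive_refs
    return objects_ok and time_ok and status_ok and refs_ok
```

With the result record B03 and the notification record C07 from the clinical example later in this description, the three-minute gap passes a ten-minute proximity range but fails a two-minute one.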
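Template construction around a core event type can be sketched as graph reachability over the precedence relationships. Treating all upstream types as pre-positions and all downstream types as post-positions is an interpretation, as are the function `build_closure_template` and the hypothetical `read_receipt` event type used in the check below.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def _reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """All event types reachable from start along the given edges."""
    seen: Set[str] = set()
    frontier = [start]
    while frontier:
        for nxt in edges.get(frontier.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    seen.discard(start)
    return seen


def build_closure_template(core: str, precedence: Set[Tuple[str, str]],
                           corroboration: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    forward: Dict[str, Set[str]] = defaultdict(set)
    backward: Dict[str, Set[str]] = defaultdict(set)
    for before, after in precedence:
        forward[before].add(after)
        backward[after].add(before)
    pre = _reachable(core, backward)    # pre-positions
    post = _reachable(core, forward)    # post-positions
    anchored = {core} | pre | post
    # Corroboration positions: partners of the core, a pre-position, or a post-position.
    corroborating = {b for a, b in corroboration if a in anchored}
    corroborating |= {a for a, b in corroboration if b in anchored}
    corroborating.discard(core)
    return {"center": {core}, "pre": pre, "post": post, "corroboration": corroborating}
```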
[0027] Within the process closure template, mandatory chain positions, alternative chain positions, and prohibited bridging chain positions are identified to generate the template chain position rule set. A mandatory chain position expresses the target process; without it, a complete process expression cannot be formed. An alternative chain position can stand in for another position under a different data source, recording method, or business branch. A prohibited bridging chain position is a combination of positions in the template that must not be connected directly. During identification, each position is first checked for having both precedence and subsequent relationships. A position that has continuous support and whose absence would prevent completion of the process expression is identified as a mandatory chain position. When two positions carry the same process meaning but differ in source category marker or source generation method marker, each is identified as an alternative chain position of the other. When an intermediate position between two positions acts as a connector, so that directly connecting the two would break the precedence or subsequent relationship, the combination is identified as a prohibited bridging chain position. The purpose is to refine the process closure template into rules for subsequent chain position matching.
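A deliberately simplified classification sketch. It assumes the template is a list of positions in process order, each tagged with a process meaning and a source category. A position is treated as mandatory when no other position shares its meaning, two positions sharing a meaning with different source categories are treated as alternatives, and every pair of meanings that skips an intermediate meaning is treated as a prohibited bridge. The source generation method marker is ignored here for brevity, and all names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Position = Tuple[str, str, str]  # (position name, process meaning, source category)


def classify_positions(
    template: List[Position],
) -> Tuple[Set[str], Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    # Distinct process meanings in template order (pre-positions ... post-positions).
    order: List[str] = []
    for _, meaning, _ in template:
        if meaning not in order:
            order.append(meaning)
    by_meaning: Dict[str, List[Position]] = {m: [p for p in template if p[1] == m] for m in order}
    # Mandatory: the only position carrying its meaning, so nothing can stand in for it.
    mandatory = {ps[0][0] for ps in by_meaning.values() if len(ps) == 1}
    # Alternative: same meaning, different source category.
    alternatives = {(a[0], b[0]) for ps in by_meaning.values()
                    for a, b in combinations(ps, 2) if a[2] != b[2]}
    # Prohibited bridging: meanings separated by at least one connector in between.
    prohibited = {(order[i], order[j]) for i in range(len(order)) for j in range(i + 2, len(order))}
    return mandatory, alternatives, prohibited
```

When a meaning has alternatives, the meaning itself is still required, but either of its positions satisfies it, which is why neither alternative appears in the mandatory set.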
[0028] In one embodiment, the template chain position rule set is not generated directly from the standardized event sequence processed in the current round, but is initialized from at least one of the following: historical closed process samples, business process configuration items, or manually verified samples. Processing of the current round's standardized event sequence is used only to adapt, filter, or partially modify the existing rule set, and the candidate closed chains to be judged in the current round are not used as a basis for initializing the template in that round. This avoids a circular dependency between the template chain position rule set and the samples being judged.
[0029] The mandatory, alternative, and prohibited bridging chain positions of each process closure template are collected by template identifier to generate the template chain position rule set, i.e., the rules used for subsequent chain position matching of standardized event sequences. Each rule includes one or more of the following: a template identifier, a chain position name, a chain position type, allowed connection positions, prohibited connection positions, and corroboration position requirements. After generation, the rule set is associated with the standardized event sequence to provide direct input for subsequent chain position matching.
[0030] A rule consistency check is performed on the template chain position rule set, checking whether the same position under the same template identifier is marked as both a mandatory and an alternative chain position, or whether the same position combination appears among both the allowed and the prohibited connection positions. When a conflict exists, the conflicting rules are corrected in the order of precedence relationships, subsequent relationships, and corroboration relationships, and the corrected rules are written back to the rule set so that it follows a single, consistent rule standard.
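The rule record of [0029] and the consistency check of [0030] can be sketched together. `ChainRule`, its `basis` field, and the choice to keep a doubly listed combination prohibited are assumptions; the sketch only illustrates resolving a double marking by the stated priority of precedence, subsequent, and corroboration relationships.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Correction order from the text: precedence, then subsequent, then corroboration.
BASIS_PRIORITY = {"precedence": 0, "subsequent": 1, "corroboration": 2}


@dataclass(frozen=True)
class ChainRule:
    template_id: str
    position: str
    position_type: str                    # "mandatory" or "alternative"
    allowed: frozenset = frozenset()      # positions this one may connect to directly
    prohibited: frozenset = frozenset()   # positions this one must not connect to directly
    corroboration_required: bool = False
    basis: str = "precedence"             # relation the rule was derived from (assumption)


def resolve_conflicts(rules: List[ChainRule]) -> List[ChainRule]:
    best: Dict[Tuple[str, str], ChainRule] = {}
    for rule in rules:
        key = (rule.template_id, rule.position)
        kept = best.get(key)
        # Same position marked twice: keep the rule backed by the higher-priority relation.
        if kept is None or BASIS_PRIORITY[rule.basis] < BASIS_PRIORITY[kept.basis]:
            best[key] = rule
    resolved = []
    for rule in best.values():
        # A combination listed as both allowed and prohibited stays prohibited (assumption).
        overlap = rule.allowed & rule.prohibited
        resolved.append(replace(rule, allowed=rule.allowed - overlap))
    return resolved
```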
[0031] To make the generation process of standardized event sequences and template chain rule sets easier to understand, the following example uses data from the clinical field: Taking a critical value testing scenario, and using the same inpatient as the target, the outpatient system's record A01 shows a test request time of 08:10 on March 5, 2026, and the requested item is a serum potassium test. The test system's record B03 shows a result formation time of 09:02 on March 5, 2026, and a serum potassium value of 2.8 mmol / L. The message system's record C07 shows a notification time of 09:05 on March 5, 2026, and the notification recipient is the attending physician. The medical record's record D11 shows a treatment record creation time of 09:18 on March 5, 2026, and the treatment content is a potassium supplementation order. Based on the fields... The parsed event action fields are application, generation, sending, and handling, respectively. Based on the event type mapping, they are determined as application event type, result event type, feedback event type, and handling event type. Based on the time semantics, the standard time points 08:10, 09:02, 09:05, and 09:18 are obtained. Based on the source identifier binding, the source category tags are business record table, result record table, message record table, and manual entry record, respectively, thus forming a standardized event sequence arranged in chronological order. Furthermore, a process closure template is constructed in which the application event type precedes the result event type, the result event type precedes the feedback event type, the feedback event type precedes the handling event type, and the result event type and the feedback event type form a corroborating relationship. A template chain rule set including the application position, result position, feedback position, and handling position is generated.
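The worked example can be replayed in a few lines to confirm that the four records sort into the order the template requires. The mapping table and helper names are illustrative; the record identifiers, source categories, action fields, and times are taken from the example.

```python
from datetime import datetime
from typing import Dict, List

# Event action field -> event type, as stated in the worked example.
ACTION_TO_TYPE = {"application": "application", "generation": "result",
                  "sending": "feedback", "handling": "handling"}
TEMPLATE_ORDER = ["application", "result", "feedback", "handling"]

# (record, source category, event action field, standard time point), deliberately unsorted.
RECORDS = [
    ("D11", "manual entry record", "handling", "2026-03-05 09:18"),
    ("A01", "business record table", "application", "2026-03-05 08:10"),
    ("C07", "message record table", "sending", "2026-03-05 09:05"),
    ("B03", "result record table", "generation", "2026-03-05 09:02"),
]


def standardize(records) -> List[Dict]:
    events = [
        {"ref": ref, "source": src, "type": ACTION_TO_TYPE[action],
         "time": datetime.strptime(ts, "%Y-%m-%d %H:%M")}
        for ref, src, action, ts in records
    ]
    return sorted(events, key=lambda e: e["time"])


def precedence_satisfied(sequence: List[Dict], order: List[str]) -> bool:
    """True when every template type is present and appears in template order."""
    index = {e["type"]: i for i, e in enumerate(sequence)}
    if any(t not in index for t in order):
        return False
    return all(index[a] < index[b] for a, b in zip(order, order[1:]))


sequence = standardize(RECORDS)
```

After sorting, the sequence reads A01, B03, C07, D11, and the three-minute gap between B03 and C07 is the interval a corroboration check between the result and feedback positions would examine.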
[0032] The chain position matching module takes the standardized event sequence and the template chain position rule set as input, performs chain position matching, determines the mandatory chain positions, alternative chain positions and prohibited bridging chain positions, and generates a candidate closed chain set and chain position coverage results.
[0033] In a specific implementation, the candidate closed chain set is generated as follows. The standardized event sequence and the template chain position rule set are taken as common inputs. The standardized event sequence supplies standardized event records arranged under unified field definitions and in time order, so that these records and the template chain position rules enter the same step together and form the input object for chain position matching. Template assignment is then performed on the standardized event sequence according to the template identifiers in the template chain position rule set, generating template candidate event groups. A template candidate event group is a combination of standardized event records that correspond to the same template identifier and match the template chain position rule set. During assignment, the chain position names and chain position types in the rule set are read first; standardized event records whose event type matches a chain position, or that can fill an alternative chain position, are then selected from the standardized event sequence to form the template candidate event group under the corresponding template identifier. The standard time point, source identifier, event status and original record back-reference information of each selected record are retained. The purpose is to organize the standardized event records belonging to different process closure templates separately, so that subsequent chain position matching can proceed template by template.
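A minimal sketch of template assignment, assuming each template's rules are held in a hypothetical `{position_name: accepted_event_types}` dictionary. Records are appended whole, so their time point, source identifier, status and back-reference fields survive the grouping unchanged.

```python
from collections import defaultdict


def assign_templates(sequence, rule_sets):
    """Group standardized event records into template candidate event groups.

    sequence: standardized records (dicts) with at least an "event_type" key;
        the time point, source identifier, event status and back-reference
        fields are carried along unchanged.
    rule_sets: {template_id: {position_name: set of event types that may fill
        the position directly or as an alternative}}  (hypothetical layout)
    """
    groups = defaultdict(list)
    for template_id, positions in rule_sets.items():
        # Every event type that can occupy some position of this template.
        accepted = set().union(*positions.values())
        for record in sequence:
            if record["event_type"] in accepted:
                groups[template_id].append(record)  # whole record is retained
    return dict(groups)


rule_sets = {"critical_value": {"application": {"application"}, "result": {"result"},
                                "feedback": {"feedback"}, "handling": {"handling"}}}
sequence = [{"id": "A01", "event_type": "application", "time": "08:10"},
            {"id": "B03", "event_type": "result", "time": "09:02"}]
print(assign_templates(sequence, rule_sets))
```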
[0034] Taking a template candidate event group as input, chain position matching is performed to determine the mandatory chain positions, alternative chain positions and prohibited bridging chain positions. Chain position matching maps the standardized event records in the template candidate event group to the corresponding chain positions in the template chain position rule set. During matching, an initial chain position is first assigned from the correspondence between event type and chain position name, and the assignment is then verified against the source identifier, event status and original record back-reference information. A standardized event record that corresponds to a mandatory chain position is marked as a mandatory chain position record. A standardized event record that corresponds to an alternative chain position is marked as an alternative chain position record, and the name of the position it replaces is recorded. When two matched positions form a prohibited connection position combination, they are marked as a prohibited bridging chain position combination, and a direct connection between them is prohibited. The purpose is to turn discrete standardized event records into a candidate process position sequence that can be organized by template position.
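The type-based first pass of chain position matching might look like the following. The `positions` layout, the field names and the `callback` alternative position are assumptions; the follow-up verification against source identifier, event status and back-reference information is deliberately left out of this sketch.

```python
def match_positions(group, positions, prohibited):
    """Initial type-based chain position matching for one candidate event group.

    positions: {name: {"type": "mandatory" | "alternative",
                       "event_types": set, "replaces": name or None}}
    prohibited: set of (a, b) pairs that must not be connected directly.
    """
    matched = []
    for record in group:
        for name, spec in positions.items():
            if record["event_type"] in spec["event_types"]:
                entry = {"record": record["id"], "position": name, "kind": spec["type"]}
                if spec["type"] == "alternative":
                    entry["replaces"] = spec["replaces"]  # keep the replaced position name
                matched.append(entry)
                break  # first matching position wins in this sketch
    present = {m["position"] for m in matched}
    # Prohibited combinations whose two positions both occur in this group.
    forbidden = {(a, b) for a, b in prohibited if a in present and b in present}
    return matched, forbidden


positions = {
    "result": {"type": "mandatory", "event_types": {"result"}, "replaces": None},
    "callback": {"type": "alternative", "event_types": {"callback"}, "replaces": "feedback"},
    "handling": {"type": "mandatory", "event_types": {"handling"}, "replaces": None},
}
group = [{"id": "B03", "event_type": "result"},
         {"id": "P1", "event_type": "callback"},   # hypothetical record
         {"id": "D11", "event_type": "handling"}]
matched, forbidden = match_positions(group, positions, {("result", "handling")})
print(matched, forbidden)
```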
[0035] Taking the template candidate event group that has completed chain position matching as input, position joining is performed to generate closed candidate paths. A closed candidate path is the position arrangement obtained by joining positions according to the allowed connection positions in the template chain position rule set and according to chronological order. During joining, a mandatory chain position record is first used as the anchor position. Standardized event records that precede the anchor in time and belong to allowed connection positions are then selected before it, and records that follow the anchor in time and belong to allowed connection positions are selected after it, forming a preceding position sequence and a following position sequence. If several connectable standardized event records exist for the same position, they are ranked by proximity of the standard time point, degree of corroboration of the source identifier, and consistency of the original record back-reference information. The highest-ranked record enters the current closed candidate path, and the remaining records are used to form parallel closed candidate paths. The time-proximity criterion is set according to the event interval distribution in the standardized event sequence, the degree of corroboration is set according to the corroboration position requirements in the template chain position rule set, and back-reference consistency is set according to whether the original record back-reference information points to the same process fact. The purpose is to organize the matched positions into path objects that can be verified against the template chain position rule set in later steps.
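One way to realize the ranking and path joining, sketched under stated assumptions: time points are minutes since midnight, proximity is measured against the anchor's time point, corroboration is a simple count and back-reference consistency is a boolean. The field names (`minute`, `corroboration`, `same_fact`) and the record `C09` are hypothetical.

```python
def rank_candidates(candidates, anchor_minute):
    """Order connectable records for one position, best first.

    Sort keys follow the order in [0035]: closeness of the standard time point
    to the anchor, then corroboration degree (higher first), then
    back-reference consistency (True first).
    """
    return sorted(
        candidates,
        key=lambda c: (abs(c["minute"] - anchor_minute), -c["corroboration"], not c["same_fact"]),
    )


def build_paths(anchor, slots):
    """Build the current closed candidate path and its parallel paths.

    anchor: the mandatory chain position record used as the anchor position.
    slots: [(position_name, candidates)] for positions before and after the
        anchor; candidates are assumed to be pre-filtered to allowed
        connection positions on the correct side of the anchor in time.
    """
    primary = {anchor["position"]: anchor["id"]}
    ranked = {}
    for name, candidates in slots:
        ranked[name] = rank_candidates(candidates, anchor["minute"])
        if ranked[name]:
            primary[name] = ranked[name][0]["id"]
    parallel = []
    for name, candidates in ranked.items():
        for runner_up in candidates[1:]:
            path = dict(primary)
            path[name] = runner_up["id"]  # swap in one runner-up at this position
            parallel.append(path)
    return primary, parallel


anchor = {"id": "B03", "position": "result", "minute": 542}  # 09:02
slots = [
    ("application", [{"id": "A01", "minute": 490, "corroboration": 1, "same_fact": True}]),
    ("feedback", [
        {"id": "C07", "minute": 545, "corroboration": 2, "same_fact": True},
        {"id": "C09", "minute": 545, "corroboration": 1, "same_fact": True},  # hypothetical
    ]),
]
primary, parallel = build_paths(anchor, slots)
print(primary)   # {'result': 'B03', 'application': 'A01', 'feedback': 'C07'}
print(parallel)  # [{'result': 'B03', 'application': 'A01', 'feedback': 'C09'}]
```

Sorting on a tuple key applies the three criteria lexicographically, so corroboration only decides between records that are equally close in time.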
[0036] Path filtering is performed on the closed candidate paths to generate the candidate closed chain set, that is, the set of closed candidate paths retained after verification against the template chain position rule set. During filtering, each closed candidate path is checked, in turn, for at least one mandatory chain position record, for whether each pair of adjacent positions is an allowed connection, and for any prohibited bridging combination. A closed candidate path that contains a mandatory chain position record, whose adjacent positions all pass the allowed connection check, and that contains no prohibited bridging combination is added to the candidate closed chain set. A closed candidate path that lacks a mandatory chain position record, contains a prohibited bridging combination, or has adjacent positions that fail the allowed connection check is marked as an unselected path, and the reason for failure is recorded. The purpose is to converge the position arrangements into input objects for the subsequent cross-source verification, time-series penetration verification and back-reference consistency verification. Numbering and position write-back are then performed on the candidate closed chain set: each candidate closed chain is assigned a unique closed chain identifier, and a correspondence is established between each position in the chain and its standardized event record. After write-back, each candidate closed chain includes at least a closed chain identifier, a template identifier, a position order, a standardized event record set, a source identifier set and an original record back-reference information set, so that the candidate closed chain set can be used directly as input for the next step.
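A sketch of path filtering and identifier assignment, assuming paths are lists of position names and the rules are plain Python sets of `(from, to)` pairs. The reason strings and the `L001` identifier format are illustrative choices, the latter borrowed from the worked example in [0055].

```python
def filter_paths(paths, mandatory, allowed, prohibited):
    """Split closed candidate paths into kept paths and unselected paths.

    paths: lists of position names in chronological order.
    mandatory: set of mandatory chain positions.
    allowed / prohibited: sets of (from, to) position pairs.
    """
    kept, unselected = [], []
    for path in paths:
        adjacent = list(zip(path, path[1:]))
        reasons = []
        if not any(p in mandatory for p in path):
            reasons.append("no mandatory chain position record")
        if any(pair not in allowed for pair in adjacent):
            reasons.append("adjacent positions not an allowed connection")
        if any(pair in prohibited for pair in adjacent):
            reasons.append("prohibited bridging combination")
        if reasons:
            unselected.append((path, reasons))  # failure reasons are recorded
        else:
            kept.append(path)
    return kept, unselected


def number_chains(kept, template_id):
    """Assign closed chain identifiers; the 'L001' format mirrors the example."""
    return {
        f"L{i:03d}": {"template_id": template_id, "positions": path}
        for i, path in enumerate(kept, start=1)
    }


MANDATORY = {"result"}
ALLOWED = {("application", "result"), ("result", "feedback"), ("feedback", "handling")}
PROHIBITED = {("application", "handling")}
kept, unselected = filter_paths(
    [["application", "result", "feedback", "handling"], ["application", "handling"]],
    MANDATORY, ALLOWED, PROHIBITED)
print(number_chains(kept, "critical_value"))
print(unselected)
```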
[0037] In a specific implementation, the chain position coverage results are generated as follows. The candidate closed chain set and the template chain position rule set are taken as common inputs. The candidate closed chain set provides the candidate closed chains that have passed path filtering, while the template chain position rule set provides template identifiers, chain position names, chain position types, allowed connection positions, prohibited connection positions and corroboration position requirements. The purpose is to determine, from the candidate closed chains, which chain positions are occupied and which are missing. For each candidate closed chain, the template identifier is extracted and the template position list for that identifier is read from the template chain position rule set. The template position list is the combination of all chain position names, chain position types, allowed connection positions, prohibited connection positions and corroboration position requirements under that template identifier. Each position in the template position list is then compared one by one with the actual positions in the candidate closed chain to establish a correspondence between template positions and actual positions.
[0038] Based on the template position list and the actual positions in the candidate closed chain, a position occupancy judgment is performed to generate a chain position occupancy result, which records whether each template position in the candidate closed chain has a corresponding standardized event record. If a template position has a corresponding standardized event record in the candidate closed chain, it is marked as a covered position; if it has none, it is marked as an uncovered position; if it is occupied by an alternative chain position record, the alternative source and the name of the replaced position are recorded with the covered position. The purpose is to make explicit how the candidate closed chain actually covers the template positions.
[0039] Based on the chain position occupancy result, a coverage completeness judgment is performed to generate the chain position coverage results, which summarize the coverage status of each template position in the candidate closed chain. The numbers of covered positions, uncovered positions and alternative-covered positions are counted first. It is then checked whether every mandatory chain position appears among the covered or alternative-covered positions. If all mandatory chain positions appear, their coverage status is recorded in the chain position coverage results; if any mandatory chain position is missing, the missing position is recorded there instead. Prohibited bridging chain position combinations are reviewed at the same time: if two adjacent actual positions in the candidate closed chain form a prohibited connection position combination, the bridging occupancy is recorded in the chain position coverage results. The purpose is to convert the position occupancy status into a result object that directly reflects the completeness of template coverage. A coverage detail table is then generated from the chain position coverage results, recording template position coverage for each candidate closed chain. Each entry in the coverage detail table includes one or more of the following: closed chain identifier, template identifier, position name, chain position type, coverage status, corresponding standardized event record identifier, alternative source marker and bridging occupancy marker. The closed chain identifier is written into each entry so that the coverage detail table can point back to its candidate closed chain. The purpose is to refine the chain position coverage results into position-level input objects that later verification steps can use directly.
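The occupancy judgment, the completeness judgment and the coverage detail rows can be combined into one function. This is a hedged sketch: the `callback` alternative position, the record id `H40` and all dictionary keys are hypothetical.

```python
def chain_coverage(chain_id, template_positions, chain_positions, prohibited):
    """Occupancy, completeness and detail rows for one candidate closed chain.

    template_positions: {name: {"type": "mandatory" | "alternative",
                                "replaces": name or None}}  (hypothetical layout)
    chain_positions: {name: record_id} in chain order (dicts keep insertion order)
    prohibited: set of (a, b) position pairs that must not be adjacent
    """
    order = list(chain_positions)
    bridging = [pair for pair in zip(order, order[1:]) if pair in prohibited]
    bridged_names = {name for pair in bridging for name in pair}

    rows, replaced_by_alternative = [], set()
    for name, spec in template_positions.items():
        record = chain_positions.get(name)
        status, alt_source = ("covered" if record else "uncovered"), None
        if record and spec["type"] == "alternative":
            status, alt_source = "alternative_covered", spec["replaces"]
            replaced_by_alternative.add(spec["replaces"])
        rows.append({
            "chain_id": chain_id, "position": name, "type": spec["type"],
            "status": status, "record_id": record,
            "alternative_for": alt_source, "bridging": name in bridged_names,
        })

    # Mandatory positions neither present nor stood in for by an alternative.
    missing = [name for name, spec in template_positions.items()
               if spec["type"] == "mandatory"
               and name not in chain_positions
               and name not in replaced_by_alternative]
    summary = {
        "chain_id": chain_id,
        "covered": sum(r["status"] == "covered" for r in rows),
        "uncovered": sum(r["status"] == "uncovered" for r in rows),
        "alternative_covered": sum(r["status"] == "alternative_covered" for r in rows),
        "missing_mandatory": missing,
        "bridging": bridging,
    }
    return summary, rows


TEMPLATE_POSITIONS = {
    "application": {"type": "mandatory", "replaces": None},
    "result": {"type": "mandatory", "replaces": None},
    "feedback": {"type": "mandatory", "replaces": None},
    "callback": {"type": "alternative", "replaces": "feedback"},  # hypothetical
    "handling": {"type": "mandatory", "replaces": None},
}
PROHIBITED = {("application", "handling")}

# Chain L002 from [0055]: no feedback record ("H40" is a hypothetical id).
summary, rows = chain_coverage(
    "L002", TEMPLATE_POSITIONS,
    {"application": "A01", "result": "B03", "handling": "H40"}, PROHIBITED)
print(summary["missing_mandatory"])  # ['feedback']
```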
[0040] A coverage order check is performed on the chain position coverage results to determine whether the order of the actual positions in the candidate closed chain is consistent with the order of allowed connection positions in the template position list. Following the position order in the candidate closed chain, each pair of adjacent actual positions is compared with the allowed connection positions. If the adjacent pair is an allowed connection, the corresponding coverage status is retained; if it is not, the order offset position is recorded in the chain position coverage results. The purpose is to make the chain position coverage results reflect not only whether each position appears but also whether the positions occur in the order the template requires.
[0041] The chain position coverage results are output and bound to the candidate closed chain set: the chain position coverage results for each candidate closed chain are bound to it through the closed chain identifier, so that each candidate closed chain corresponds to one set of chain position coverage results. The purpose is to provide paired inputs, the candidate closed chain set and the chain position coverage results, for the cross-source verification, time-series penetration verification and back-reference consistency verification in the next step.
[0042] The break identification module takes the candidate closed chain set and the chain position coverage results as input, performs cross-source verification, time-series penetration verification and back-reference consistency verification, and generates closure verification results and a broken chain set.
[0043] In a specific implementation, the closure verification results are generated as follows. The candidate closed chain set and the chain position coverage results are taken as common inputs. The candidate closed chain set provides candidate closed chains that have completed position joining and path filtering, and the chain position coverage results provide, for each candidate closed chain, the mandatory chain position coverage status, alternative coverage status, bridging occupied positions and order offset positions. The purpose is to bring both the standardized event record content and the position coverage status of each candidate closed chain into the verification step. For each candidate closed chain, the corresponding chain position coverage result is read to generate a closed chain to be verified, that is, a candidate closed chain that carries a closed chain identifier, template identifier, position order, standardized event record set, source identifier set and original record back-reference information set, and that is linked to its chain position coverage result. This provides a unified verification object for the subsequent cross-source verification, time-series penetration verification and back-reference consistency verification.
[0044] Cross-source verification is performed on the closed chain to be verified to generate a cross-source verification result. Cross-source verification checks whether the standardized event records covered by the corroboration position requirements of the closed chain to be verified corroborate one another. Pairs of standardized event records with different source identifiers are first extracted from the closed chain to be verified. The event object field, event status field, standard time point or time interval, and original record back-reference information of each pair are then compared. A pair is marked as a corroborated record pair when the event object fields are identical or one includes the other, the event status fields have no semantic conflict, the standard time points or time intervals fall within a preset proximity range, and the original record back-reference information contains no mutually exclusive relationship. If any of these conditions is not met, the pair is marked as a failed corroboration record pair. The preset proximity range is set according to the time precision marker, the source generation method marker and the event interval distribution. The numbers of corroborated record pairs and failed corroboration record pairs in each closed chain to be verified are then counted to form the cross-source verification result. The purpose is to identify whether the multi-source events in the same candidate closed chain mutually attest to the same process fact.
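A simplified sketch of cross-source verification. The conflicting-status table, the set-inclusion test for event objects, the fixed time window and the treatment of back-references as a single root identifier (`ref`) are all assumptions standing in for details the source leaves open; every field name is hypothetical.

```python
from itertools import combinations

# Hypothetical table of status values treated as semantically conflicting.
CONFLICTING_STATUSES = {frozenset({"cancelled", "completed"}), frozenset({"revoked", "sent"})}


def pair_corroborates(a, b, window_minutes):
    """The four conditions of [0044] for one record pair (simplified)."""
    objects_ok = a["objects"] <= b["objects"] or b["objects"] <= a["objects"]  # equal or inclusion
    status_ok = frozenset({a["status"], b["status"]}) not in CONFLICTING_STATUSES
    time_ok = abs(a["minute"] - b["minute"]) <= window_minutes
    # Simplification: back-references are mutually exclusive when both name
    # a root fact and the roots differ.
    refs_ok = a["ref"] is None or b["ref"] is None or a["ref"] == b["ref"]
    return objects_ok and status_ok and time_ok and refs_ok


def cross_source_check(records, corroboration_pairs, window_minutes):
    """Count corroborated and failed pairs among records from different
    sources whose positions form a required corroboration pair."""
    result = {"corroborated": 0, "failed": 0, "failed_pairs": []}
    for a, b in combinations(records, 2):
        positions = {(a["position"], b["position"]), (b["position"], a["position"])}
        if a["source"] == b["source"] or not positions & corroboration_pairs:
            continue
        if pair_corroborates(a, b, window_minutes):
            result["corroborated"] += 1
        else:
            result["failed"] += 1
            result["failed_pairs"].append((a["id"], b["id"]))
    return result


b03 = {"id": "B03", "position": "result", "source": "result record table",
       "objects": {"patient", "serum potassium"}, "status": "completed", "minute": 542, "ref": "A01"}
c07 = {"id": "C07", "position": "feedback", "source": "message record table",
       "objects": {"patient"}, "status": "sent", "minute": 545, "ref": "A01"}
print(cross_source_check([b03, c07], {("result", "feedback")}, window_minutes=10))
# {'corroborated': 1, 'failed': 0, 'failed_pairs': []}
```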
[0045] Time-series penetration verification is performed on the closed chain to be verified to generate a time-series penetration verification result. The check determines whether the position order in the closed chain to be verified departs from the allowed connection position order in the template chain position rule set, and whether it skips a mandatory or alternative chain position that should lie between two positions. The template position list is first read from the template chain position rule set using the template identifier. Adjacent actual positions are then compared with the allowed connection positions one by one, following the position order in the closed chain to be verified. When an adjacent pair is an allowed connection and no mandatory or alternative chain position that should appear between them is absent, the pair is marked as a time-series passing position pair. When an adjacent pair is not an allowed connection, or a mandatory or alternative chain position that should appear between them is absent, the pair is marked as a time-series penetrating position pair. The numbers of passing and penetrating position pairs in each closed chain to be verified are then counted to form the time-series penetration verification result. The purpose is to identify whether the position connections in the candidate closed chain unfold in the order specified by the template chain position rule set.
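A minimal sketch of the time-series penetration check, assuming the template position list is an ordered Python list and allowed connections are `(from, to)` tuples; `penetration_check` and its parameters are hypothetical names. Applied to chain L002 described in [0055], it flags the jump from the result position straight to the handling position.

```python
def penetration_check(chain_order, template_order, allowed, required):
    """Classify adjacent position pairs as passing or penetrating.

    chain_order: positions of the closed chain to be verified, in order.
    template_order: the template position list, in template order.
    allowed: set of allowed (from, to) connections.
    required: mandatory/alternative positions that must not be skipped.
    """
    index = {p: i for i, p in enumerate(template_order)}
    passing, penetrating = [], []
    for a, b in zip(chain_order, chain_order[1:]):
        # Required positions that the template places strictly between a and b.
        skipped = [p for p in template_order[index[a] + 1:index[b]] if p in required]
        if (a, b) in allowed and not skipped:
            passing.append((a, b))
        else:
            penetrating.append((a, b))
    return {"passing": len(passing), "penetrating": len(penetrating),
            "penetrating_pairs": penetrating}


TEMPLATE_ORDER = ["application", "result", "feedback", "handling"]
ALLOWED = {("application", "result"), ("result", "feedback"), ("feedback", "handling")}
print(penetration_check(["application", "result", "handling"], TEMPLATE_ORDER, ALLOWED, {"feedback"}))
# {'passing': 1, 'penetrating': 1, 'penetrating_pairs': [('result', 'handling')]}
```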
[0046] Back-reference consistency verification is performed on the closed chain to be verified to generate a back-reference consistency verification result. This verification checks whether the original record back-reference information of the standardized event records in the closed chain to be verified points to the same process fact or the same set of process facts. The original record back-reference information set of the closed chain to be verified is first extracted, and the record source, record identifier, parent record identifier, derived record identifier and referenced record identifier in that set are then compared.
[0047] In one embodiment, the original record back-reference information includes at least one of the following: original record identifier, business serial number, parent record identifier, derived record identifier and referenced record identifier. When a data source does not provide a parent record identifier, derived record identifier or referenced record identifier, an alternative back-reference key is constructed from the target object identifier, the business serial number, the combination of key object fields and the proximity of standard time points, and back-reference consistency verification is performed on that alternative key.
[0048] When the back-reference information of every original record traces back to the same original process fact, or to consecutive derived records within the same set of original process facts, the closed chain to be verified is marked as back-reference consistent. When the back-reference information traces back to different process facts, or the back-reference path is interrupted, the closed chain to be verified is marked as back-reference inconsistent. The back-reference consistency verification result is then formed. The purpose is to identify whether every standardized event record in the candidate closed chain originates from the evolution of the same process fact.
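Paragraphs [0046] to [0048] can be sketched as follows. The parent-link tracing, the structure of the fallback key and the consecutive-gap reading of "time point proximity" are assumptions, and `trace_root`, `alternative_key` and `back_reference_check` are hypothetical names. For simplicity the sketch switches to the fallback key for the whole chain whenever any record lacks a parent link, and it assumes every record then carries the fallback fields.

```python
def trace_root(record_id, parents):
    """Follow parent links upward. Returns (root, interrupted)."""
    seen = set()
    current = record_id
    while parents.get(current) is not None:
        if current in seen:          # cycle: treat as an interrupted path
            return current, True
        seen.add(current)
        parent = parents[current]
        if parent not in parents:    # parent referenced but absent from the chain's set
            return parent, True
        current = parent
    return current, False


def alternative_key(record):
    """Hypothetical fallback key built from the fields listed in [0047]."""
    return (record["target"], record["serial"], tuple(sorted(record["key_fields"])))


def back_reference_check(records, window_minutes):
    """Return "consistent" or "inconsistent" for one closed chain to be verified."""
    if all("parent" in r for r in records):
        parents = {r["id"]: r["parent"] for r in records}
        traced = [trace_root(r["id"], parents) for r in records]
        roots = {root for root, _ in traced}
        interrupted = any(flag for _, flag in traced)
        return "consistent" if len(roots) == 1 and not interrupted else "inconsistent"
    # Fallback: one shared alternative key and consecutive time points
    # no further apart than the window.
    keys = {alternative_key(r) for r in records}
    minutes = sorted(r["minute"] for r in records)
    close = all(b - a <= window_minutes for a, b in zip(minutes, minutes[1:]))
    return "consistent" if len(keys) == 1 and close else "inconsistent"


chain = [{"id": "A01", "parent": None}, {"id": "B03", "parent": "A01"},
         {"id": "C07", "parent": "B03"}, {"id": "D11", "parent": "C07"}]
print(back_reference_check(chain, window_minutes=30))  # consistent
```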
[0049] Taking the cross-source verification result, time-series penetration verification result, back-reference consistency verification result and chain position coverage results as input, a closure status summary is performed to generate the closure verification result. The closure status summary merges the multiple verification results for the same closed chain to be verified. The mandatory chain position coverage status, alternative coverage status, bridging occupied positions and order offset positions are first read from the chain position coverage results. Then the numbers of corroborated and failed corroboration record pairs are read from the cross-source verification result, the numbers of time-series passing and penetrating position pairs from the time-series penetration verification result, and the back-reference consistency flag from the back-reference consistency verification result. The closure verification result records a closure pass status when mandatory chain position coverage is complete, the number of failed corroboration record pairs is zero, the number of time-series penetrating position pairs is zero, no back-reference inconsistency flag exists, no bridging occupied position exists and no order offset position exists. If any of these conditions fails, the closure verification result records a closure fail status and also records the failed positions, failed record pairs, back-reference interruption positions, bridging occupied positions or order offset positions. The purpose is to integrate the scattered verification results into a unified result object for the subsequent construction of the purified sample set and the broken sample set.
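The summary rule reduces to a conjunction of six conditions. A hedged Python sketch follows; `summarize_closure` and the key names of its inputs are hypothetical.

```python
def summarize_closure(chain_id, coverage, cross_source, timing, back_reference):
    """Merge per-check results into one closure verification result.

    coverage: {"missing_mandatory": [...], "bridging": [...], "order_offsets": [...]}
    cross_source: {"failed": int, "failed_pairs": [...]}
    timing: {"penetrating": int, "penetrating_pairs": [...]}
    back_reference: "consistent" or "inconsistent"
    """
    failures = {}
    if coverage["missing_mandatory"]:
        failures["missing_positions"] = coverage["missing_mandatory"]
    if cross_source["failed"]:
        failures["failed_record_pairs"] = cross_source["failed_pairs"]
    if timing["penetrating"]:
        failures["penetrating_pairs"] = timing["penetrating_pairs"]
    if back_reference != "consistent":
        failures["back_reference_interrupted"] = True
    if coverage["bridging"]:
        failures["bridging_positions"] = coverage["bridging"]
    if coverage["order_offsets"]:
        failures["order_offsets"] = coverage["order_offsets"]
    # Pass only when none of the six failure conditions was recorded.
    status = "closure_fail" if failures else "closure_pass"
    return {"chain_id": chain_id, "status": status, **failures}


ok = summarize_closure(
    "L001",
    {"missing_mandatory": [], "bridging": [], "order_offsets": []},
    {"failed": 0, "failed_pairs": []},
    {"penetrating": 0, "penetrating_pairs": []},
    "consistent",
)
print(ok)  # {'chain_id': 'L001', 'status': 'closure_pass'}
```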
[0050] Result numbering and binding are performed on the closure verification results: each closure verification result is assigned a unique verification result identifier, which is bound to the corresponding closed chain identifier. After binding, each candidate closed chain corresponds to one closure verification result, so the closure verification results can be used directly as input for constructing the purified sample set and the broken sample set in the next step.
[0051] In a specific implementation, the broken chain set is generated as follows. The closure verification results and the candidate closed chain set are taken as common inputs. The closure verification results provide the closure pass or closure fail status of each candidate closed chain, and the candidate closed chain set provides the closed chain identifier, template identifier, position order, standardized event record set, source identifier set and original record back-reference information set of each candidate closed chain. The purpose is to select, from the candidate closed chains that have completed closure verification, those that failed it.
[0052] Based on the closure fail status in the closure verification results, the corresponding candidate closed chains are selected to generate a broken candidate chain set, that is, the set of candidate closed chains whose closure verification result is closure fail. The closed chain identifier in each closure verification result is read first, the candidate closed chain with the same identifier is extracted from the candidate closed chain set, and the extracted chains are collected into the broken candidate chain set. The purpose is to separate the chains that failed closure from the candidate closed chain set to support subsequent break attribution. For each broken candidate chain, break position extraction is then performed to generate a break position result. Break position extraction takes from the closure verification result the specific positions or record pairs that caused the closure failure: the failed positions, failed record pairs, back-reference interruption positions, bridging occupied positions and order offset positions are read and written into the break position result. The purpose is to localize the cause of each closure failure to specific positions or record pairs of the broken candidate chain.
[0053] Break type determination is performed with the break position result as input to generate a break type result. Break type determination classifies the source of the break in a broken candidate chain according to its break position result. If the break position result includes a missing mandatory chain position, the broken candidate chain is marked as a coverage break type; if it includes a failed corroboration record pair, as a verification break type; if it includes a time-series penetrating position pair or an order offset position, as a time-series penetration break type; if it includes a back-reference interruption position, as a back-reference break type; and if it includes a bridging occupied position, as a bridging break type. When the same broken candidate chain has several break sources, the main break type is determined in the priority order: missing mandatory chain position, bridging occupied position, time-series penetrating position pair, failed corroboration record pair, back-reference interruption position. The remaining break types are retained as additional break types. The purpose is to turn each broken candidate chain into a break object that can be classified and managed.
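Break position extraction and the priority rule for the main break type might be sketched as follows. The key names and type labels are hypothetical, and the sketch assumes a closure verification result that stores each failure kind under its own key.

```python
# Priority for choosing the main break type, in the order given in [0053].
# Each entry: (key in the closure verification result, break type name).
PRIORITY = [
    ("missing_positions", "coverage"),
    ("bridging_positions", "bridging"),
    ("penetrating_pairs", "time_series_penetration"),
    ("order_offsets", "time_series_penetration"),
    ("failed_record_pairs", "verification"),
    ("back_reference_interrupted", "back_reference"),
]


def extract_break_positions(closure_result):
    """Collect the failure entries that caused a closure fail status."""
    return {key: closure_result[key] for key, _ in PRIORITY if closure_result.get(key)}


def determine_break_types(closure_result):
    """Return (main_type, additional_types); (None, []) for a passing chain."""
    found = []
    for key, break_type in PRIORITY:
        if closure_result.get(key) and break_type not in found:
            found.append(break_type)
    return (found[0], found[1:]) if found else (None, [])


l002 = {"chain_id": "L002", "status": "closure_fail",
        "missing_positions": ["feedback"], "order_offsets": [("result", "handling")]}
print(determine_break_types(l002))  # ('coverage', ['time_series_penetration'])
```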
[0054] Broken chain writing is performed with the broken candidate chain set, break position results and break type results as input to generate the broken chain set, that is, the set of candidate closed chains annotated with break positions and break types. For each broken candidate chain, the closed chain identifier, template identifier, position order, standardized event record set, source identifier set, original record back-reference information set, break position result and break type result are written into a single broken chain record, and all broken chain records are aggregated to form the broken chain set. The purpose is to fix the broken candidate chains and their break causes as direct inputs for the subsequent construction of the broken sample set. A set integrity check is then performed on the broken chain set to verify that each broken chain record includes the closed chain identifier, template identifier, break position result and break type result. If any item is missing, it is filled in from the corresponding closure verification result, and the broken chain set is rewritten so that it can be used directly as input for constructing the broken sample set in the next step.
[0055] To make the basis of the closure verification results and the broken chain set more concrete, the example of the candidate closed chains above is continued. In the candidate closed chain with closed chain identifier L001, the serum potassium result of record B03 (2.8 mmol/L) and the notification content of record C07 (including the critical value sending time of 09:05) form a corroborated record pair. The application position at 08:10, result position at 09:02, feedback position at 09:05 and handling position at 09:18 all follow the allowed connection position order specified by the template chain position rule set, and the original record back-reference information of A01, B03, C07 and D11 all traces back to the same test request fact. The closure verification result for L001 therefore records a closure pass status, with 0 failed positions, 0 failed record pairs, 0 back-reference interruption positions and 0 bridging occupied positions. Another candidate closed chain, with closed chain identifier L002, contains only the application position at 08:10, the result position at 09:02 and a handling position at 10:40. It lacks a feedback position, and there is an order offset between the result position and the handling position. The closure verification result for L002 therefore records a closure fail status, and two break items, a missing feedback position and a time-series order offset, are written into the broken chain set.
[0056] The sample discrimination module takes the closure verification results and the broken chain set as input, constructs a purified sample set and a broken sample set, and trains a closure discrimination model. The closure discrimination model outputs chain-level closure results and break type results, and a missing chain position filling path is generated from the chain-level closure results, the break type results and the template chain position rule set.
[0057] In a specific implementation, the purified sample set and the broken sample set are constructed as follows. The closure verification results and the broken chain set are taken as common inputs. The closure verification results provide, for each candidate closed chain, the closure pass or fail status, failed positions, failed record pairs, back-reference interruption positions, bridging occupied positions and order offset positions; the broken chain set provides the break position result and break type result of each broken chain. The purpose is to bring closure status information and break attribution information into the sample construction step together. The closure verification results and the broken chain set are then organized by closed chain identifier into a sample candidate record set, that is, the set of candidate closed chain records grouped by closed chain identifier. During this organization, the closed chain identifier in each closure verification result is read first, and the broken chain record with the same closed chain identifier is extracted from the broken chain set. If the closure verification result is closure pass, the candidate closed chain record is written into the closed candidate record group; if it is closure fail, the candidate closed chain record and its broken chain record are written together into the broken candidate record group. Candidate closed chains with different closure statuses are thus organized separately.
[0058] Sample admission screening is performed on the closed candidate record group to generate purified candidate samples. Sample admission screening checks whether each candidate closed chain in the closed candidate record group has the record integrity and position integrity required to enter the purified sample set. The system first checks that the corresponding closure verification result records a closure pass status, then checks that the candidate closed chain contains the closed chain identifier, template identifier, position order, standardized event record set, source identifier set and original record back-reference information set, and finally reads the mandatory chain position coverage status and alternative coverage status from the chain position coverage results to confirm that every mandatory chain position is covered by a covered position or an alternative-covered position. When all of these checks pass, the candidate closed chain is recorded as a purified candidate sample. The purpose is to ensure that every purified candidate sample expresses a complete closure.
[0059] Sample field reorganization is performed on the clean candidate samples to generate the clean sample set. Sample field reorganization converts the position, coverage, and verification information in each clean candidate sample into unified sample fields. During reorganization, each clean candidate sample is written with the closure chain identifier, template identifier, position order, required chain position coverage status, alternative coverage status, number of verified record pairs, number of unverified record pairs, number of time-series passed position pairs, number of time-series penetrated position pairs, retracement consistency flag, retracement inconsistency flag, and closure pass status. All clean candidate samples are then aggregated into the clean sample set. The purpose is to turn the successfully closed candidate closure chains into positive sample inputs for training the closure discrimination model.
[0060] Break sample screening is performed on each record in the broken candidate record group to generate broken candidate samples. During screening, the system first checks that the corresponding closure check result is the closure fail status, and then reads the break position result and the break type result from the corresponding broken chain record. When the closure fail status is present and the break position result, break type result, closure chain identifier, and template identifier have all been written, the candidate closure chain is recorded as a broken candidate sample. This ensures that each broken candidate sample has a directly identifiable break source. Sample field reorganization is then performed on the broken candidate samples to generate the broken sample set: each broken candidate sample is written with the closure chain identifier, template identifier, position order, failed positions, failed record pairs, retracement interruption positions, bridging occupied positions, sequence offset positions, break position result, break type result, and closure fail status. All broken candidate samples are aggregated into the broken sample set. The purpose is to turn the candidate closure chains that failed to close into broken sample inputs for training the closure discrimination model.
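Break sample screening and field reorganization can be combined in one pass, as in the following sketch. The field names in `BROKEN_SAMPLE_FIELDS` are hypothetical English renderings of the fields listed above; projecting every sample onto the same tuple of fields is one simple way to obtain the "unified sample fields" the text asks for.

```python
# Hypothetical field names, in the order the paragraph lists them.
BROKEN_SAMPLE_FIELDS = (
    "chain_id", "template_id", "position_order",
    "failed_positions", "failed_record_pairs",
    "retracement_interruptions", "bridging_positions", "sequence_offsets",
    "break_positions", "break_type", "closure_status",
)


def build_broken_sample_set(broken_group):
    samples = []
    for cand in broken_group:
        # Screening: require the closure fail status ...
        if cand.get("closure_status") != "fail":
            continue
        # ... and an identifiable break source plus both identifiers.
        keys = ("break_positions", "break_type", "chain_id", "template_id")
        if not all(cand.get(k) for k in keys):
            continue
        # Reorganization: same field order for every broken sample;
        # absent optional fields become None.
        samples.append({f: cand.get(f) for f in BROKEN_SAMPLE_FIELDS})
    return samples
```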
[0061] A sample consistency check is performed on the clean sample set and the broken sample set. The check verifies that the sample field definitions in the two sets are consistent and covers at least the template identifier field, position order field, coverage status field, verification result field, and label field. When the same field is expressed differently in the two sets, the field definitions are unified in the order closure chain identifier, template identifier, position order, coverage status, verification result, and label result, and the unified fields are written back into both sets. This provides a uniform sample representation for subsequent training of the closure discrimination model.
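One way to read the consistency check is as "rename aliased fields, then impose one canonical field order". The sketch below follows that reading. The alias table is entirely hypothetical; in practice it would come from however the two sample sets were produced.

```python
CANONICAL_ORDER = ("chain_id", "template_id", "position_order",
                   "coverage_status", "verification_result", "label")

# Hypothetical aliases: different spellings of the same field in the two sets.
ALIASES = {"closure_chain_id": "chain_id", "tpl_id": "template_id",
           "positions": "position_order", "closure_status": "label"}


def unify_sample(sample):
    # Map each aliased field name onto its canonical name.
    renamed = {ALIASES.get(k, k): v for k, v in sample.items()}
    # Canonical fields come first, in the fixed order; other fields follow.
    head = {k: renamed.get(k) for k in CANONICAL_ORDER}
    tail = {k: v for k, v in renamed.items() if k not in head}
    return {**head, **tail}


def unify_sets(clean, broken):
    # Rewrite both sets with the same field definitions.
    return [unify_sample(s) for s in clean], [unify_sample(s) for s in broken]
```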
[0062] In a specific implementation, the closure discrimination model is trained, and the chain-level closure result, break type result, and missing chain position completion path are output, as follows. A training sample table is constructed with the clean sample set and the broken sample set as common inputs. The training sample table is the table obtained by merging the two sets under the unified field standard. During construction, each sample in the clean sample set is first written into the training sample table with its closure status field marked "closed successfully". Each sample in the broken sample set is then written with its closure status field marked "failed to close", and its break type result is written into the break type field. This forms a single training data source for the closure discrimination model.
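Merging the two sets into one labeled table is straightforward. The sketch encodes "closed successfully" as 1 and "failed to close" as 0, matching the labels used in the worked example in [0069]; the column names `closure_label` and `break_type_label` are assumptions.

```python
def build_training_table(clean_samples, broken_samples):
    table = []
    for s in clean_samples:
        # Clean samples: closed successfully, no break type.
        table.append({**s, "closure_label": 1, "break_type_label": None})
    for s in broken_samples:
        # Broken samples: failed to close; carry their break type result.
        table.append({**s, "closure_label": 0,
                      "break_type_label": s.get("break_type")})
    return table
```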
[0063] Training features are extracted from the training sample table to generate a training feature table. Training features are the sample fields that characterize the closure state and break source of a candidate closure chain. They include at least the number of positions, the required chain position coverage status, the alternative coverage status, the number of verified record pairs, the number of unverified record pairs, the number of time-series passed position pairs, the number of time-series penetrated position pairs, the retracement consistency flag, the retracement inconsistency flag, the number of bridging positions, and the number of sequence offset positions. After extraction, the training features are written into the training feature table in sample order. The purpose is to convert the sample contents into feature inputs for training the closure discrimination model.
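A fixed feature order keeps every row of the training feature table aligned. In this sketch the feature names are hypothetical English labels for the features listed above, booleans are converted to 0/1, and missing values default to 0 (an assumption; the source does not say how gaps are handled).

```python
FEATURE_NAMES = (
    "position_count", "required_coverage_complete", "alternative_coverage",
    "verified_pairs", "unverified_pairs",
    "temporal_pass_pairs", "temporal_penetrated_pairs",
    "retracement_consistent", "retracement_inconsistent",
    "bridging_positions", "sequence_offset_positions",
)


def extract_features(row):
    # Missing or falsy values become 0.0 so every row has the same length;
    # booleans become 1.0 / 0.0.
    return [float(row.get(name, 0) or 0) for name in FEATURE_NAMES]


def build_feature_table(training_table):
    # One feature vector per sample, in sample order.
    return [extract_features(row) for row in training_table]
```

Fed with the L001 and L002 values from [0069], the first row starts with 4.0 positions and complete coverage (1.0), and the second row has one time-series penetrated position pair.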
[0064] The closure discrimination model is trained with the training feature table as input; the model discriminates the closure state and the break type of a candidate closure chain. During training, the closure status field is first read from the training sample table as the closure discrimination label, and the break type field of each failed-closure sample is then read as the break discrimination label. The training features in the training feature table are fed into the training process to obtain a model that discriminates both closure state and break type. Training stops according to the number of changed labels, the number of misclassified records, and the change in the distribution of each label between two adjacent training rounds. The aim is to obtain a discrimination model that can identify both the closure state and the break source of a candidate closure chain.
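The stopping rule compares two adjacent training rounds on three quantities. The source does not say whether all three or any one must be small, nor give thresholds; this sketch assumes all three must stay within illustrative thresholds, and the function name `should_stop` is hypothetical.

```python
from collections import Counter


def should_stop(prev_preds, curr_preds, prev_errors, curr_errors,
                max_label_changes=0, max_error_delta=0, max_dist_shift=0):
    """Hypothetical stopping rule comparing two adjacent training rounds.

    prev_preds / curr_preds: predicted labels per sample in each round.
    prev_errors / curr_errors: number of misclassified records per round.
    Thresholds are illustrative assumptions, not values from the source.
    """
    # Number of samples whose predicted label changed between rounds.
    label_changes = sum(p != c for p, c in zip(prev_preds, curr_preds))
    # Change in the number of misclassified records.
    error_delta = abs(curr_errors - prev_errors)
    # Largest change in the count of any single label.
    prev_dist, curr_dist = Counter(prev_preds), Counter(curr_preds)
    labels = set(prev_dist) | set(curr_dist)
    dist_shift = max((abs(curr_dist[l] - prev_dist[l]) for l in labels),
                     default=0)
    return (label_changes <= max_label_changes
            and error_delta <= max_error_delta
            and dist_shift <= max_dist_shift)
```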
[0065] The candidate closure chains to be judged are then reorganized into a feature table to be judged. These are candidate closure chains that must be judged in the current round but have not yet been written into either the clean sample set or the broken sample set. During feature reorganization, judgment features with the same names as the training features are extracted, following the field definitions of the training feature table, from the chain position coverage result, cross-source verification result, time-series penetration verification result, and retracement consistency verification result of each candidate closure chain, and are written into the feature table by closure chain identifier. This keeps the candidate closure chains consistent with the input definition of the closure discrimination model. The closure discrimination model is then called with the feature table to be judged as input and outputs the chain-level closure result and the break type result. The chain-level closure result is the closure success or failure result for each candidate closure chain; the break type result is the coverage, confirmation, temporal, retracement, or bridging break type assigned to each candidate closure chain that failed to close. When the model is called, the judgment features are first input to obtain the closure status for each closure chain identifier, and break type results are then output for the identifiers that failed to close. This consolidates position-level and record-pair-level verification content into chain-level judgment results.
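The two-stage call (closure status first, break type only for failures) can be sketched independently of any particular model. Here the model is any object exposing two hypothetical methods, `predict_closure` and `predict_break_type`; `RuleStub` is a toy rule-based stand-in used only to exercise the flow, not a trained closure discrimination model.

```python
def judge_chains(feature_table, model):
    """feature_table: dict chain_id -> features (same definitions as training).

    model: any object with hypothetical predict_closure(features) -> bool
    and predict_break_type(features) -> str methods.
    """
    closure_results, break_types = {}, {}
    for chain_id, features in feature_table.items():
        closed = model.predict_closure(features)
        closure_results[chain_id] = "pass" if closed else "fail"
        # Break types are produced only for chains that failed to close.
        if not closed:
            break_types[chain_id] = model.predict_break_type(features)
    return closure_results, break_types


class RuleStub:
    """Toy stand-in for a trained model, for illustration only."""

    def predict_closure(self, features):
        # Closed if coverage is complete and no position pair is penetrated.
        return (features["required_coverage_complete"] == 1
                and features["temporal_penetrated_pairs"] == 0)

    def predict_break_type(self, features):
        if features["required_coverage_complete"] != 1:
            return "coverage"
        return "temporal"
```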
[0066] A missing chain position completion path is generated with the chain-level closure result, break type result, template chain position rule set, and candidate closure chains to be judged as common inputs. The missing chain position completion path is the order in which missing positions are to be looked up and filled for each candidate closure chain that failed to close. During generation, the candidate closure chains that failed to close are first selected from the chain-level closure result, and their break type results are read. When the break type result is the coverage break type, the name of the missing chain position is determined from the required chain positions and alternative chain positions in the template chain position rule set; then, taking the previous and next covered positions in the candidate closure chain as boundary positions, a position completion order is generated that runs from the previous covered position to the missing chain position and from the missing chain position to the next covered position. When the break type result is the confirmation break type, a cross-source completion order is generated according to the confirmation position requirements. When the break type result is the temporal break type, the completion order is corrected according to the order given by the allowed connection positions. When the break type result is the retracement break type, a retracement supplementation order is generated from the retracement information of the original records. When the break type result is the bridging break type, an intermediate position completion order is generated from the combination of prohibited connection positions. The completion orders of all candidate closure chains that failed to close together form the missing chain position completion path, which provides a directly executable path input for subsequent template chain position rule correction and broken chain isolation.
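For the coverage break type, the completion order is bounded by the nearest covered positions on either side of each gap. The sketch below implements that case and simply passes through a precomputed order (the hypothetical `hint` argument) for the other break types, whose inputs the text names but does not specify. It assumes `template_positions` lists, in order, the positions a closed chain must occupy.

```python
def completion_order(break_type, template_positions, covered, hint=None):
    """Return a position completion order for one failed chain.

    template_positions: ordered chain positions from the template rule set.
    covered: set of positions the candidate chain already occupies.
    hint: hypothetical extra input for non-coverage break types (for example
    the confirmation position requirements or allowed connection order).
    """
    if break_type == "coverage":
        order = []
        for i, pos in enumerate(template_positions):
            if pos in covered:
                continue
            # Bound the missing position by its previous and next covered positions.
            prev = next((p for p in reversed(template_positions[:i]) if p in covered), None)
            nxt = next((p for p in template_positions[i + 1:] if p in covered), None)
            order.append([p for p in (prev, pos, nxt) if p is not None])
        return order
    # Other break types: the order comes from the corresponding rule input.
    return [list(hint or [])]


def completion_path(failed_chains):
    # failed_chains: chain_id -> keyword arguments for completion_order.
    return {cid: completion_order(**args) for cid, args in failed_chains.items()}
```

With the positions of the worked example (application, result, feedback, disposal) and feedback missing, the coverage case yields the single segment result, feedback, disposal.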
[0067] The chain-level closure result, break type result, and missing chain position completion path are then bound. Result binding establishes, by closure chain identifier, the correspondence among the chain-level closure result, break type result, and missing chain position completion path of each candidate closure chain. After binding, every candidate closure chain has a chain-level closure result, and every candidate closure chain that failed to close additionally has a break type result and a missing chain position completion path. The outputs can therefore be used directly as inputs to the subsequent broken chain isolation, sample admission filtering, and template chain position rule correction steps.
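Binding by closure chain identifier amounts to a join keyed on `chain_id`. In this sketch the three outputs are assumed to be dicts keyed by identifier, with break types and completion paths present only for failed chains; the function and key names are hypothetical.

```python
def bind_results(closure_results, break_types, completion_paths):
    """Bind the three outputs per closure chain identifier.

    closure_results: chain_id -> "pass" / "fail"; break_types and
    completion_paths are keyed by chain_id for failed chains only.
    """
    bound = {}
    for chain_id, status in closure_results.items():
        entry = {"closure_status": status}
        if status == "fail":
            # Only failed chains carry a break type and a completion path.
            entry["break_type"] = break_types.get(chain_id)
            entry["completion_path"] = completion_paths.get(chain_id)
        bound[chain_id] = entry
    return bound
```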
[0068] In a specific implementation, training data for the closure discrimination model can be obtained as follows. Training data are collected from the clean sample set and the broken sample set described above: each sample in the clean sample set is treated as a closure-pass sample, and each sample in the broken sample set as a closure-fail sample. Each broken sample is further labeled with its missing chain positions and the corresponding completion order, yielding a training feature table together with closure state labels, break type labels, and completion path labels. Training can be supervised: the training feature table is the model input, and the closure state labels, break type labels, and completion path labels are joint training targets used to iteratively update the parameters of the closure discrimination model. The model can use multi-task learning. In one implementation it is a multi-layer feedforward neural network whose input layer receives the training features and whose output layer comprises a closure state output, a break type output, and a completion path output. The completion path output gives the name of the missing chain position and the start, intermediate, and end positions of the completion. The error function can be a weighted sum of a closure state error term, a break type error term, and a completion path error term. Training can be considered converged when any one of the following holds: the change in the total error between two adjacent training rounds is below a preset error-change threshold; the change in the number of path prediction errors on the validation samples is below a preset record-count threshold; or the change in the number of output labels between two adjacent training rounds is below a preset count threshold.
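The weighted error function and the "any one of" convergence test can be written out directly. The total error is taken as E = w_c·E_closure + w_b·E_break + w_p·E_path; the weights and thresholds below are illustrative assumptions, since the source gives no values, and the network itself is not modeled here.

```python
def total_error(closure_err, break_err, path_err, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task error terms; weights are assumptions."""
    w_c, w_b, w_p = weights
    return w_c * closure_err + w_b * break_err + w_p * path_err


def converged(prev_total, curr_total, prev_path_errors, curr_path_errors,
              prev_label_count, curr_label_count,
              error_tol=1e-3, path_tol=1, label_tol=1):
    # Any one of the three conditions listed in the text ends training.
    return (abs(curr_total - prev_total) < error_tol
            or abs(curr_path_errors - prev_path_errors) < path_tol
            or abs(curr_label_count - prev_label_count) < label_tol)
```

Using an "any one of" test means training can stop early on whichever signal settles first; requiring all three would be the stricter alternative.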
[0069] To illustrate how the clean sample set, broken sample set, chain-level closure result, break type result, and missing chain position completion path are generated, the foregoing example is continued. When the closure check result above is fed into the sample discrimination module, closure chain L001 is written into the clean sample set with closure pass label 1, and closure chain L002 is written into the broken sample set with closure fail label 0 and break type label "temporal break type". The following training features are extracted from L001 and L002 respectively: number of positions (4 and 3), required chain position coverage status (complete and missing), number of verified record pairs (1 and 0), number of time-series penetrated position pairs (0 and 1), and retracement consistency flag (1 and 0). After the closure discrimination model has been trained, features defined in the same way are extracted from the newly arrived candidate closure chain L003 and input to the model, which returns the chain-level closure result "closure failed" and the break type result "coverage break type". Based on the template chain position rule set, the missing chain position completion path is generated as: application position (1st), result position (2nd), feedback position (3rd, to be filled), disposal position (4th).
[0070] The result output module takes the chain-level closure result, break type result, and missing chain position completion path as input, performs broken chain isolation, sample admission filtering, and template chain position rule correction, and generates the process multidimensional data processing result.
[0071] In a specific implementation, as shown in Figure 2, the process multidimensional data processing result is generated as follows. The chain-level closure result, break type result, and missing chain position completion path are taken as common inputs. The chain-level closure result indicates whether each candidate closure chain has the closure pass status or the closure fail status; the break type result indicates the break source of each candidate closure chain that failed to close; and the missing chain position completion path indicates the order in which the missing positions of each failed chain are to be looked up. Together they bring the closure judgment, break attribution, and completion guidance into the post-processing steps. These results are organized by closure chain identifier to generate a set of closure chains to be processed, i.e., the set of candidate closure chain processing objects grouped by closure chain identifier. During organization, the closure chain identifier and closure status are first read from the chain-level closure result, and the break type result and missing chain position completion path records with the same identifier are then written into the same processing object. This merges all results belonging to one candidate closure chain into a single processing object.
[0072] Broken chain isolation is performed on the set of closure chains to be processed, generating an isolated closure chain set and a retained closure chain set. Broken chain isolation separates the candidate closure chains that failed to close from the set according to the chain-level closure result. During execution, the closure status of each processing object is read. Objects whose status is "failed to close" are written into the isolated closure chain set together with their break type results and missing chain position completion paths, and objects whose status is "closed successfully" are written into the retained closure chain set. This separates the processing objects that cannot be output directly from those that can be retained directly.
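Broken chain isolation is a partition of the processing objects on their closure status. The sketch assumes each processing object is a dict with a hypothetical `closure_status` key and that break type and completion path were already bound into it, so they travel with the object into the isolated set.

```python
def isolate(pending):
    """pending: chain_id -> processing object with a "closure_status" key."""
    isolated, retained = {}, {}
    for chain_id, obj in pending.items():
        if obj["closure_status"] == "fail":
            # Isolated objects keep their break type and completion path.
            isolated[chain_id] = obj
        else:
            retained[chain_id] = obj
    return isolated, retained
```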
[0073] Sample admission filtering is performed on the retained closure chain set to generate an admitted closure chain set. Sample admission filtering checks whether each processing object in the retained set satisfies the record conditions for being written into the process multidimensional data processing result. During filtering, the system first checks that the processing object has the closure pass status, and then checks that it contains the closure chain identifier, template identifier, position order, standardized event record set, source identifier set, and original record retracement information set. A processing object that passes all of these checks is written into the admitted closure chain set. A processing object that fails at least one check is excluded from the admitted closure chain set, and the reason for exclusion is written into the isolated closure chain set. This ensures that all processing objects written into the process multidimensional data processing result meet a uniform admission criterion.
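The filtering step differs from the earlier sample admission screening in that rejected objects are not dropped but moved to the isolated set with a recorded reason. The sketch below mutates the `isolated` dict in place for that purpose; field names and the reason format (`missing:<field>`) are hypothetical.

```python
REQUIRED = ("chain_id", "template_id", "position_order",
            "event_records", "source_ids", "retracement_info")


def admission_filter(retained, isolated):
    """Return the admitted set; rejected objects are added to `isolated` in place."""
    admitted = {}
    for chain_id, obj in retained.items():
        reasons = []
        if obj.get("closure_status") != "pass":
            reasons.append("closure_status")
        # Record every missing or empty mandatory field as a reason.
        reasons += [f"missing:{f}" for f in REQUIRED if not obj.get(f)]
        if reasons:
            # Rejected objects move to the isolated set with their reasons.
            isolated[chain_id] = {**obj, "removal_reasons": reasons}
        else:
            admitted[chain_id] = obj
    return admitted
```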
[0074] Completion positions are extracted from the isolated closure chain set to generate a rule correction candidate set, i.e., candidate rule information extracted from the isolated closure chain set for correcting the template chain position rule set. During extraction, the break type result and the missing chain position completion path are first read from each isolated processing object. For the coverage break type, the missing chain position name, the previous covered position, and the next covered position are extracted to form a required chain position fill-in candidate rule or an alternative chain position fill-in candidate rule. For the confirmation break type, the source identifiers of the failed record pairs and the confirmation position requirements are extracted to form a confirmation position correction rule. For the temporal break type, the sequence offset positions and the allowed connection positions are extracted to form an allowed connection position correction rule. For the retracement break type, the retracement interruption positions and the corresponding original record retracement information are extracted to form a retracement verification and supplementation rule. For the bridging break type, the bridging occupied positions and the intermediate missing positions are extracted to form a prohibited connection position correction rule. The purpose is to convert break causes and retracement information into a basis for correcting the template chain position rules.
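The per-break-type extraction is naturally table-driven. The sketch maps each break type to the fields the paragraph names and to a rule kind; all key names are hypothetical. Objects isolated only by admission filtering carry no break type and are skipped.

```python
# Fields to extract for each break type; all names are hypothetical.
EXTRACT_BY_TYPE = {
    "coverage": ("missing_position", "prev_covered", "next_covered"),
    "confirmation": ("failed_sources", "confirmation_requirement"),
    "temporal": ("sequence_offsets", "allowed_connections"),
    "retracement": ("retracement_interruptions", "retracement_info"),
    "bridging": ("bridging_positions", "intermediate_missing"),
}
RULE_KIND = {
    "coverage": "position_fill_in",
    "confirmation": "confirmation_position_correction",
    "temporal": "allowed_connection_correction",
    "retracement": "retracement_supplement",
    "bridging": "prohibited_connection_correction",
}


def rule_candidates(isolated):
    out = []
    for chain_id, obj in isolated.items():
        btype = obj.get("break_type")
        if btype not in EXTRACT_BY_TYPE:
            continue  # e.g. objects isolated only by admission filtering
        fields = {k: obj.get(k) for k in EXTRACT_BY_TYPE[btype]}
        out.append({"chain_id": chain_id, "template_id": obj.get("template_id"),
                    "kind": RULE_KIND[btype], **fields})
    return out
```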
[0075] Template chain position rule correction is performed with the rule correction candidate set and the template chain position rule set as input, generating a corrected template chain position rule set. Template chain position rule correction adds, deletes, or replaces chain position names, chain position types, allowed connection positions, prohibited connection positions, and confirmation position requirements in the template chain position rule set according to the rule correction candidate set. During execution, the rule correction candidate set is first grouped by template identifier, and the candidate rule information under each template identifier is read item by item. Candidate rule information that does not conflict with existing rules in the template chain position rule set is added to the rule set. When candidate rule information refers to the same position as an existing rule but expresses a different constraint, the existing rule is replaced or reordered, using the preceding and following position relationships and the break type recorded in the missing chain position completion path as the basis for correction. After correction, a consistency check is performed on the chain position names, chain position types, allowed connection positions, prohibited connection positions, and confirmation positions under each template identifier. When a position is both an allowed and a prohibited connection position, or both a required and an alternative chain position, the conflict is resolved in the order of preceding relationship, following relationship, and confirmation relationship, and the resolved rules are written back into the corrected template chain position rule set. The aim is to keep correcting the template chain position rule set with the break information in the isolated closure chain set.
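The grouping, add-or-replace, and conflict-detection parts of this step can be sketched as follows. This is a simplification under stated assumptions: each position's rule is modeled as a dict of boolean flags (`allowed`, `prohibited`, `required`, `alternative`), a candidate for an existing position simply replaces it, and reordering and the precedence-based conflict resolution are not modeled.

```python
from collections import defaultdict


def apply_candidates(rule_set, candidates):
    """rule_set: template_id -> {position: rule dict}. Returns a new rule set."""
    # Work on copies so the input rule set is left unchanged.
    corrected = {t: dict(rules) for t, rules in rule_set.items()}
    # Group candidate rules by template identifier.
    by_template = defaultdict(list)
    for cand in candidates:
        by_template[cand["template_id"]].append(cand)
    for template_id, cands in by_template.items():
        rules = corrected.setdefault(template_id, {})
        for cand in cands:
            # New positions are added; existing positions are replaced.
            rules[cand["position"]] = cand["rule"]
    return corrected


def find_conflicts(rules):
    """Positions marked both allowed and prohibited, or both required and alternative."""
    conflicts = []
    for pos, rule in rules.items():
        if rule.get("allowed") and rule.get("prohibited"):
            conflicts.append((pos, "allowed/prohibited"))
        if rule.get("required") and rule.get("alternative"):
            conflicts.append((pos, "required/alternative"))
    return conflicts
```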
[0076] The admitted closure chain set, the isolated closure chain set, and the corrected template chain position rule set are taken as input, and the execution results are summarized to generate the process multidimensional data processing result. During summarization, each processing object in the admitted closure chain set is first written into the valid closure chain result area. Each processing object in the isolated closure chain set is then written into the isolated closure chain result area, with its break type result and missing chain position completion path written into the isolation reason field. Finally, the corrected template chain position rule set is written into the rule correction result area. The valid closure chain result area, the isolated closure chain result area, and the rule correction result area together form the process multidimensional data processing result, which includes at least one of the following: valid closure chain results, isolated closure chain results, break type results, missing chain position completion paths, and the corrected template chain position rule set. The aim is to output, at the same time, closure chain objects that can be used directly, isolated broken objects, and rule correction results that can be called later.
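The summarization step assembles three result areas. The sketch below uses hypothetical area names and stores the break type and completion path of each isolated object under an `isolation_reason` field, as the paragraph describes.

```python
def summarize(admitted, isolated, corrected_rules):
    result = {
        "valid_closure_chains": list(admitted.values()),
        "isolated_closure_chains": [],
        "rule_corrections": corrected_rules,
    }
    for chain_id, obj in isolated.items():
        # Break type and completion path go into the isolation reason field.
        reason = {"break_type": obj.get("break_type"),
                  "completion_path": obj.get("completion_path")}
        result["isolated_closure_chains"].append(
            {"chain_id": chain_id, "isolation_reason": reason})
    return result
```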
[0077] A result integrity check is performed on the process multidimensional data processing result. The check verifies that the result contains the closure chain identifiers, closure statuses, template identifiers, and position orders, as well as the template identifiers of the corrected template chain position rule set. When any result item lacks one of these contents, the missing content is filled in from the admitted closure chain set, the isolated closure chain set, or the corrected template chain position rule set and written back into the result. This ensures that the process multidimensional data processing result can be used directly as a unified output object for subsequent process analysis, sample reuse, or further rule correction.
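The fill-in part of the integrity check can be sketched as a lookup across the source sets in a fixed order. The sketch assumes each result item is a dict that at least carries its `chain_id`, and that the source sets are dicts keyed by closure chain identifier; checking the template identifiers of the corrected rule set is not modeled.

```python
REQUIRED_ITEMS = ("chain_id", "closure_status", "template_id", "position_order")


def complete_items(items, *sources):
    """Fill missing required fields in each result item from the source sets.

    items: list of dicts; sources: dicts keyed by chain_id (for example the
    admitted and isolated closure chain sets), searched in the given order.
    """
    for item in items:
        for field in REQUIRED_ITEMS:
            if item.get(field) is not None:
                continue
            for src in sources:
                value = src.get(item.get("chain_id"), {}).get(field)
                if value is not None:
                    # Write the first value found back into the result item.
                    item[field] = value
                    break
    return items
```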
[0078] The above description is merely a specific embodiment of this application, and the scope of protection of this application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be determined by the scope of protection of the claims.
[0079] Finally, it should be noted that the above description is only a preferred embodiment of this application and is not intended to limit it. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of this application shall fall within the scope of protection of this application.
Claims
1. A process multidimensional data processing system, characterized in that it comprises: a sequence modeling module, which obtains a multi-source process event record set corresponding to a target object, generates a standardized event sequence based on field parsing, event type mapping, time semantic adjustment, and source identifier binding results, constructs a process closure template based on the preceding, following, and corroboration relationships of event types, and generates a template chain position rule set; a chain position matching module, which takes the standardized event sequence and the template chain position rule set as input, performs chain position matching, determines the necessary chain positions, alternative chain positions, and prohibited cross-link chain positions, and generates a candidate closure chain set and a chain position coverage result; a break identification module, which takes the candidate closure chain set and the chain position coverage result as input, performs cross-source verification, time-series penetration verification, and retracement consistency verification, and generates a closure check result and a broken chain set; a sample discrimination module, which takes the closure check result and the broken chain set as input, constructs a clean sample set and a broken sample set, trains a closure discrimination model, outputs a chain-level closure result and a break type result, and generates a missing chain position completion path based on the chain-level closure result, the break type result, and the template chain position rule set; and a result output module, which takes the chain-level closure result, the break type result, and the missing chain position completion path as input, performs broken chain isolation, sample admission filtering, and template chain position rule correction, and generates a process multidimensional data processing result.
2. The process multidimensional data processing system according to claim 1, characterized in that the method for generating the standardized event sequence includes: identifying the target object; obtaining the multi-source process event record set for the target object; performing field parsing on each process event record in the multi-source process event record set to generate field parsing results; performing event type mapping with the field parsing results as input to generate event type mapping results; performing time semantic adjustment with the event type mapping results and the field parsing results as input to generate time semantic adjustment results; performing source identifier binding with the field parsing results, event type mapping results, and time semantic adjustment results as input to generate source identifier binding results; and reorganizing each process event record with the field parsing results, event type mapping results, time semantic adjustment results, and source identifier binding results as input to generate the standardized event sequence.
3. The process multidimensional data processing system according to claim 2, characterized in that the method for generating the template chain position rule set includes: extracting an event type set with the standardized event sequence as input; constructing preceding relationships of event types based on the event type set and the standardized event sequence; constructing following relationships of event types based on the preceding relationships; constructing corroboration relationships of event types based on the source identifiers and field parsing results in the standardized event sequence; constructing a process closure template based on the preceding, following, and corroboration relationships; and determining the necessary chain positions, alternative chain positions, and prohibited cross-link chain positions in the process closure template to generate the template chain position rule set.
4. The process multidimensional data processing system according to claim 3, characterized in that the method for generating the candidate closure chain set and the chain position coverage result includes: taking the standardized event sequence and the template chain position rule set as input, assigning the standardized event sequence to templates according to template identifiers to generate template candidate event groups; performing chain position matching with the template candidate event groups as input to determine the necessary chain positions, alternative chain positions, and prohibited cross-link chain positions; connecting the positions of the template candidate event groups that have completed chain position matching to generate closure candidate paths; filtering the closure candidate paths to generate the candidate closure chain set; numbering the candidate closure chains and writing back their positions to generate closure chain identifiers; and, with the candidate closure chain set and the template chain position rule set as input, performing position occupancy judgment and coverage integrity judgment according to the template position list to generate the chain position coverage result.
5. The process multidimensional data processing system according to claim 4, characterized in that the method for generating the closure check result includes: taking the candidate closure chain set and the chain position coverage result as input, reading the chain position coverage result of each candidate closure chain, and generating closure chains to be verified; performing cross-source verification on the closure chains to be verified to generate a cross-source verification result; performing time-series penetration verification on the closure chains to be verified to generate a time-series penetration verification result; performing retracement consistency verification on the closure chains to be verified to generate a retracement consistency verification result; and summarizing the closure state with the cross-source verification result, time-series penetration verification result, retracement consistency verification result, and chain position coverage result as input to generate the closure check result.
6. The process multidimensional data processing system according to claim 5, characterized in that the method for generating the broken chain set includes: taking the closure check result and the candidate closure chain set as input, selecting the candidate closure chains that have the closure fail status in the closure check result, and generating a candidate broken chain set; extracting the break positions of each candidate broken chain in the candidate broken chain set to generate a break position result; determining the break type based on the break position result to generate a break type result; and writing broken chains based on the candidate broken chain set, the break position result, and the break type result to generate the broken chain set.
7. The process multidimensional data processing system according to claim 5 or 6, characterized in that the method for outputting the chain-level closure result and the break type result includes: organizing the closure check result and the broken chain set by closure chain identifier to generate a sample candidate record set; performing sample admission screening and sample field reorganization on the candidate closure chain records in the sample candidate record set that have the closure pass status to generate the clean sample set; performing break sample screening and sample field reorganization on the candidate closure chain records in the sample candidate record set that have the closure fail status to generate the broken sample set; performing a sample consistency check on the clean sample set and the broken sample set; training the closure discrimination model with the clean sample set and the broken sample set as input; and calling the closure discrimination model with the candidate closure chains to be judged as input to output the chain-level closure result and the break type result.
8. The process multidimensional data processing system according to claim 7, characterized in that the method for generating the missing chain position completion path includes: selecting, from the chain-level closure results, the candidate closure chains whose closure failed; reading the corresponding break type results; when the break type result is the coverage break type, determining the names of the missing chain positions based on the required and alternative chain positions in the template chain position rule set, and generating position completion orders; when the break type result is the confirmation, temporal, back-pointing, or bridging break type, generating the corresponding completion orders based on, respectively, the confirmation position requirements, the allowed connection positions, the back-pointing information of the original records, or the prohibited connection positions; and summarizing the completion orders to form the missing chain position completion path.
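The break-type dispatch in this claim can be sketched as one function per failed chain. The `TemplateRules` shape, the tuple format of each completion order, and the break type labels are hypothetical; the claim names the rule components (required and alternative positions, confirmation requirements, allowed and prohibited connections, back-pointing information) but not their representation.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateRules:
    # Hypothetical shape of the template chain position rule set.
    required: list
    alternatives: list = field(default_factory=list)    # groups; one member per group is needed
    confirmation: dict = field(default_factory=dict)    # position -> position that must confirm it
    allowed_links: set = field(default_factory=set)     # permitted (from, to) transitions
    prohibited_links: set = field(default_factory=set)  # forbidden (from, to) transitions


def completion_orders(positions, break_type, rules, back_refs=None):
    """Return completion orders (tuples) for one failed chain, keyed on its break type."""
    present = set(positions)
    links = list(zip(positions, positions[1:]))
    if break_type == "coverage":
        orders = [("add", p) for p in rules.required if p not in present]
        orders += [("add_one_of", tuple(g)) for g in rules.alternatives
                   if not (present & set(g))]
        return orders
    if break_type == "confirmation":
        return [("confirm", p, c) for p, c in rules.confirmation.items()
                if p in present and c not in present]
    if break_type == "temporal":
        return [("relink", a, b) for a, b in links if (a, b) not in rules.allowed_links]
    if break_type == "back_pointing":
        # Original records' back-pointing information: position -> upstream position.
        back_refs = back_refs or {}
        return [("restore_back_pointer", b, a) for a, b in links if back_refs.get(b) != a]
    if break_type == "bridging":
        return [("remove_link", a, b) for a, b in links if (a, b) in rules.prohibited_links]
    raise ValueError(f"unknown break type: {break_type}")


def completion_path(failed_chains, rules):
    """failed_chains: chain_id -> (positions, break_type, back_refs or None)."""
    return {cid: completion_orders(pos, bt, rules, refs)
            for cid, (pos, bt, refs) in failed_chains.items()}
```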
9. The process multidimensional data processing system according to claim 8, characterized in that the method for generating the multidimensional data processing result includes: organizing the chain-level closure results, the break type results, and the missing chain position completion paths by closure chain identifier to generate a set of closure chains to be processed; isolating the broken chains in the set of closure chains to be processed to generate an isolated closure chain set and a retained closure chain set; performing sample admission filtering on the retained closure chain set to generate an admitted closure chain set; extracting the completion positions from the isolated closure chain set to generate a rule correction candidate set; performing template chain position rule correction with the rule correction candidate set and the template chain position rule set as input to generate a corrected template chain position rule set; and summarizing the admitted closure chain set, the isolated closure chain set, and the corrected template chain position rule set to generate the multidimensional data processing result.
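A sketch of the result-assembly flow, assuming each closure chain record carries its closure flag, break type, and completion orders. The admission filter and the rule correction step are passed in as functions because the claim leaves their details to other steps; the `min_support` threshold (only completion orders that recur across isolated chains become rule correction candidates) is an added assumption, not taken from the claims.

```python
from collections import Counter


def process_results(chains, rules, admit, correct, min_support=2):
    """
    chains:  closure chain id -> {"closed": bool, "break_type": str | None,
                                  "completion": [hashable completion order, ...]}
    admit:   predicate implementing the sample admission filter (assumed pluggable)
    correct: function(rules, candidates) -> corrected rules (assumed pluggable)
    """
    # Isolate broken chains; everything else is retained.
    isolated = {cid: c for cid, c in chains.items() if not c["closed"]}
    retained = {cid: c for cid, c in chains.items() if c["closed"]}
    admitted = {cid: c for cid, c in retained.items() if admit(c)}
    # Rule correction candidates: completion orders recurring across isolated chains.
    support = Counter((c["break_type"], order)
                      for c in isolated.values() for order in c["completion"])
    candidates = [key for key, n in support.items() if n >= min_support]
    return {
        "admitted": admitted,
        "isolated": isolated,
        "rules": correct(rules, candidates),
        "candidates": candidates,
    }
```

Requiring repeated support keeps a single anomalous chain from rewriting the template; setting `min_support=1` would reproduce a literal "every completion position is a candidate" reading.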
10. The process multidimensional data processing system according to claim 9, characterized in that the template chain position rule correction includes: when the break type result is the coverage break type, adding or replacing required or alternative chain positions; when the break type result is the confirmation break type, correcting the confirmation position requirements; when the break type result is the temporal break type, correcting the allowed connection positions; when the break type result is the bridging break type, correcting the prohibited connection positions; and when the break type result is the back-pointing break type, correcting the supplementary back-pointing check rules.
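The per-break-type corrections listed in this claim map naturally onto a dispatch over rule set components. In the sketch below, the dict-based rule set, the flattening of alternatives into a single set, and the payload shape for each break type are hypothetical choices made for illustration. The function returns a corrected copy so the original template remains available for comparison.

```python
import copy


def correct_rules(rules, candidates):
    """
    rules: hypothetical dict with keys "required" (set), "alternatives" (set),
           "confirmation" (dict), "allowed_links" (set), "prohibited_links" (set),
           "back_pointing_rules" (list).
    candidates: list of (break_type, payload) tuples; payload shapes are assumptions.
    Returns a corrected copy; the input rule set is left unchanged.
    """
    corrected = copy.deepcopy(rules)
    for break_type, payload in candidates:
        if break_type == "coverage":
            if payload[0] == "replace":                  # ("replace", old, new_position)
                _, old, new_position = payload
                for key in ("required", "alternatives"):
                    if old in corrected[key]:
                        corrected[key].discard(old)
                        corrected[key].add(new_position)
            else:                                        # ("required" | "alternatives", position)
                key, position = payload
                corrected[key].add(position)
        elif break_type == "confirmation":               # (position, confirming_position)
            position, confirming = payload
            corrected["confirmation"][position] = confirming
        elif break_type in ("temporal", "bridging"):     # ("add" | "remove", (from, to))
            key = "allowed_links" if break_type == "temporal" else "prohibited_links"
            op, link = payload
            if op == "add":
                corrected[key].add(tuple(link))
            else:
                corrected[key].discard(tuple(link))
        elif break_type == "back_pointing":              # a supplementary check rule
            corrected["back_pointing_rules"].append(payload)
        else:
            raise ValueError(f"unknown break type: {break_type}")
    return corrected
```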