A context verification method, apparatus, device, and medium for claims documents
A comprehensive risk score is calculated for a claims document by encapsulating its data, performing spatiotemporal alignment, and constructing a semantic graph. This addresses the inability of existing technologies to verify the consistency between document content and the claims event, enabling timely identification of inconsistent documents and improving the accuracy of claims review.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
- Filing Date: 2026-03-06
- Publication Date: 2026-06-12
AI Technical Summary
Existing claims document review technology cannot effectively verify the consistency between document content and the context of the claims event, making it difficult to identify documents that appear authentic but do not match the claims event in a timely manner.
The claims document to be verified is acquired and encapsulated into a data packet, from which a structured element set is extracted. A spatiotemporal consistency tensor is constructed through spatiotemporal alignment, and a semantic graph is constructed from the claims event. The spatiotemporal contradiction score and the semantic matching degree are then fused into a comprehensive risk score, on which a threshold determination is performed.
It enables timely identification of claim documents that appear authentic but do not match the actual claim events, thereby improving the accuracy and reliability of claim review.
Figure CN122199160A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data analysis technology, and in particular to a method, apparatus, device, and medium for context verification of claims documents.
Background Technology
[0002] In insurance claims, claims review typically involves processing various types of supporting documents, such as medical invoices, medical reports, and transportation receipts. In medical reimbursement and auto insurance claims, the review process often relies on document image recognition to extract text information, verifying the completeness of elements based on fixed templates, and detecting whether the document has been directly edited using image metadata or pixel features. While these methods can, to some extent, check the readability and apparent authenticity of a single document, their focus is primarily on the document's own textual and image features.
[0003] Existing technology lacks the ability to automatically verify the consistency between the content of a claims document and the context of the claims event. As a result, document content that does not match the claims event is difficult to identify promptly, even when the document appears authentic and complete. For example, in a fintech risk control scenario, a receipt might show a consultation time of 9:00 AM at location A, while auxiliary records from adjacent time periods (such as location tracking and operation logs) show the user at location B, far from location A. Existing template-based verification might still approve the receipt. Similarly, in a healthcare reimbursement scenario, a user might submit a pharmacy receipt to claim reimbursement for a medical examination or treatment. Existing systems might allow the reimbursement once the receipt's authenticity and the completeness of its fields are confirmed, without providing a reliable conclusion about the consistency between the document content and the claims event.
Summary of the Invention
[0004] This invention provides a method, apparatus, device, and medium for context verification of claims documents, in order to solve the technical problem that existing claims document review technologies cannot automatically verify the consistency between the content of claims documents and the context of claims events, resulting in difficulty in timely identifying claims documents that appear authentic but do not match the claims events.
[0005] In a first aspect, a method for context verification of claims documents is provided, including: obtaining the claims document to be verified and encapsulating its data to obtain a data packet to be verified; performing element extraction on the data packet to be verified to obtain a structured element set; performing spatiotemporal alignment on the structured element set, constructing a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculating a spatiotemporal contradiction score based on the spatiotemporal consistency tensor; constructing a semantic graph using the claims event in the data packet to be verified to obtain a semantic matching degree; fusing the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and performing a threshold determination on the comprehensive risk score to obtain the context verification result of the claims document to be verified.
[0006] In a second aspect, a context verification apparatus for claims documents is provided, including: a data acquisition module, used to obtain the claims document to be verified and encapsulate its data to obtain a data packet to be verified; an element extraction module, used to perform element extraction on the data packet to be verified to obtain a structured element set; a spatiotemporal alignment module, used to perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculate a spatiotemporal contradiction score based on the spatiotemporal consistency tensor; a semantic graph construction module, used to construct a semantic graph using the claims event in the data packet to be verified and obtain a semantic matching degree; a data fusion module, used to fuse the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and a result output module, used to perform a threshold determination on the comprehensive risk score and obtain the context verification result of the claims document to be verified.
[0007] Thirdly, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the aforementioned claim document context verification method.
[0008] Fourthly, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the aforementioned claims document context verification method.
[0009] The aforementioned method, apparatus, device, and medium for context verification of claims documents obtain the claims document to be verified and encapsulate its data to obtain a data packet to be verified; perform element extraction on the data packet to obtain a structured element set; perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculate a spatiotemporal contradiction score based on the spatiotemporal consistency tensor; construct a semantic graph using the claims event in the data packet to be verified to obtain a semantic matching degree; fuse the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and perform a threshold determination on the comprehensive risk score to obtain the context verification result of the claims document to be verified. Existing claims document review technologies cannot automatically verify the consistency between the content of a claims document and the context of the claims event. By performing a threshold determination on the calculated comprehensive risk score, this invention allows claims documents that appear genuine but do not match the claims event to be identified in a timely manner.
Attached Figure Description
[0010] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0011] Figure 1 is a schematic diagram of an application environment for a context verification method for claims documents according to an embodiment of the present invention;
Figure 2 is a flowchart of a method for context verification of claims documents according to an embodiment of the present invention;
Figure 3 is a detailed implementation flow diagram of step S10 in Figure 2;
Figure 4 is a detailed implementation flow diagram of step S20 in Figure 2;
Figure 5 is a detailed implementation flow diagram of step S30 in Figure 2;
Figure 6 is a detailed implementation flow diagram of step S40 in Figure 2;
Figure 7 is a detailed implementation flow diagram of step S50 in Figure 2;
Figure 8 is a detailed implementation flow diagram of step S60 in Figure 2;
Figure 9 is a schematic diagram of a context verification apparatus for claims documents according to an embodiment of the present invention;
Figure 10 is a schematic diagram of the structure of a computer device according to an embodiment of the present invention;
Figure 11 is another schematic diagram of the structure of a computer device according to an embodiment of the present invention.
Detailed Implementation
[0012] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0013] The context verification method for claims documents provided in this embodiment of the invention can be applied in the application environment shown in Figure 1, in which the client communicates with the server via a network. The server can obtain the claims document to be verified from the client and encapsulate its data to obtain a data packet to be verified; perform element extraction on the data packet to obtain a structured element set; perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculate a spatiotemporal contradiction score based on the spatiotemporal consistency tensor; construct a semantic graph using the claims event in the data packet to be verified to obtain a semantic matching degree; fuse the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and perform a threshold determination on the comprehensive risk score to obtain the context verification result of the claims document to be verified. Because existing claims document review technologies cannot automatically verify the consistency between the content of a claims document and the context of the claims event, this threshold determination allows claims documents that appear genuine but do not match the claims event to be identified in a timely manner. The client can be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device. The server can be implemented as a standalone server or as a server cluster consisting of multiple servers. The invention will now be described in detail through specific embodiments.
[0014] As shown in Figure 2, which is a flowchart of a method for context verification of claims documents provided in an embodiment of the present invention, the method includes the following steps: S10: Obtain the claims document to be verified and encapsulate its data to obtain a data packet to be verified.
[0015] The system retrieves the claims documents to be verified. These documents can be uploaded from the client, stored and returned from the business system, or synchronously imported from external cooperation channels. After retrieval, the context verification device performs data encapsulation processing on the claims documents to be verified, incorporating the document content and its associated attribute information into the same data object, and configuring traceable identification information for subsequent processing to obtain the data package to be verified.
[0016] S20: Extract elements from the data packet to be verified to obtain a structured element set.
[0017] The data packet to be verified is processed by extracting elements. Based on the key fields required for claims review, elements are identified and extracted from the data packet to be verified to obtain a structured element set.
[0018] S30: Perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculate the spatiotemporal contradiction score based on the spatiotemporal consistency tensor.
[0019] Spatiotemporal alignment is performed on the structured element set, and a spatiotemporal consistency tensor is constructed in conjunction with the data packet to be verified. Anchor point information that can be used for spatiotemporal analysis is collected from the structured element set, and auxiliary spatiotemporal records related to the time range of the anchor points are parsed from the data packet to be verified. The two are matched and aligned in the time and location dimensions to obtain an aligned record set. On this basis, the context verification device maps the aligned spatiotemporal difference information to the spatiotemporal consistency tensor and calculates, from the tensor, a spatiotemporal contradiction score that characterizes the degree of spatiotemporal conflict.
[0020] S40: Construct a semantic graph using the claims events in the data packet to be verified to obtain the semantic matching degree.
[0021] Semantic parsing is performed on the claims event information to abstract and express the core entities involved in the event and the relationships between them. This abstract expression is then written into the data structure of the semantic graph. Subsequently, document semantic fragments corresponding to the semantic graph node categories are extracted from the data packets to be verified. The document semantic fragments are then compared and matched with the entities in the semantic graph to obtain the semantic matching degree, which is used to characterize the consistency between the document content and the semantic context of the claims event.
[0022] S50: Fuse the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score.
[0023] The spatiotemporal contradiction score and the semantic matching degree are fused to obtain a comprehensive risk score. The two scores are first numerically aligned and scaled, and then combined according to a preset fusion strategy. This expresses the risk information of the spatiotemporal and semantic dimensions in the same scoring space, and the resulting comprehensive risk score serves as the input for the subsequent determination.
[0024] S60: Perform a threshold determination on the comprehensive risk score to obtain the context verification result of the claims document to be verified.
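As a non-limiting illustration, the fusion described in step S50 can be sketched as follows. The text above only states that the two scores are aligned, scaled, and combined by a preset fusion strategy; the weighted sum, the assumed [0, 1] score ranges, the inversion of the semantic matching degree, and the names `clamp01` and `fuse_risk` are all assumptions made for this sketch.

```python
def clamp01(x: float) -> float:
    """Keep a score inside [0, 1] (assumed common scoring space)."""
    return max(0.0, min(1.0, x))


def fuse_risk(contradiction: float, semantic_match: float,
              w_spatiotemporal: float = 0.5, w_semantic: float = 0.5) -> float:
    """Combine both scores into one risk score in [0, 1].

    A high semantic matching degree is taken to mean low risk, so it is
    inverted before fusion. The weights are hypothetical and assumed to sum to 1.
    """
    spatiotemporal_risk = clamp01(contradiction)
    semantic_risk = 1.0 - clamp01(semantic_match)
    return w_spatiotemporal * spatiotemporal_risk + w_semantic * semantic_risk
```

Other fusion strategies, such as taking the maximum of the two risks, would fit the same interface.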
[0025] The system retrieves the risk threshold that matches the current business scenario from the threshold configuration, compares the comprehensive risk score with the risk threshold, outputs the corresponding judgment conclusion identifier, and encapsulates the judgment conclusion identifier and comprehensive risk score in a structured manner to obtain the context verification result, which is then used by the claims review module for subsequent processing and record management.
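As a non-limiting illustration, the threshold determination of step S60 might look like the sketch below. The scenario names, threshold values, conclusion identifiers, and the convention that a score at or above the threshold indicates a mismatch are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass

# Hypothetical per-scenario thresholds; real values would come from configuration.
RISK_THRESHOLDS = {"healthcare": 0.6, "auto_insurance": 0.7}


@dataclass
class VerificationResult:
    conclusion: str     # judgment conclusion identifier
    risk_score: float   # comprehensive risk score
    threshold: float    # threshold that was applied


def judge(risk_score: float, scenario: str) -> VerificationResult:
    """Compare the score with the scenario threshold and package the result."""
    threshold = RISK_THRESHOLDS[scenario]
    if risk_score >= threshold:
        conclusion = "CONTEXT_MISMATCH"
    else:
        conclusion = "CONTEXT_CONSISTENT"
    return VerificationResult(conclusion, risk_score, threshold)
```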
[0026] As shown in Figure 3, step S10 specifically includes the following steps: S101: Issue a document collection instruction for the claims document to be verified, perform multi-source collection processing, and obtain the original document set.
[0027] Upon receiving a verification request from the claims processing side, a document collection instruction is generated for the claims documents to be verified. This instruction describes the task identifier, claims application identifier, set of document types to be collected, and set of collection sources for this collection task. It also carries access credentials or authorization markers to enable multi-source collection processing of documents from different sources. Multi-source collection processing can cover client upload channels, business system retention channels, and external collaboration channels: in healthcare scenarios, collection sources may include user-uploaded medical invoice images, electronic copies of hospital-issued diagnostic certificates or examination reports, and pharmacy purchase receipt images; in fintech-related insurance claims scenarios, collection sources may include vehicle insurance damage assessment reports, repair lists, images of traffic accident determination materials, electronic invoices or electronic vouchers, etc. After scheduling the document collection instruction, the executing entity retrieves or receives document files and accompanying information from each collection source and summarizes the collection results to obtain the original document set. Each document in the original document set retains its original file format and original content carrier, and may carry accompanying information such as channel identifiers and submission batch identifiers.
[0028] S102: Convert the original document set into uniformly encoded document content data, and configure page order information on the document content data to obtain standardized document content.
[0029] The original document set is standardized by converting documents with different formats, character sets, and page representations into uniformly encoded document content data and configuring page order information for the document content data to obtain standardized document content. Specifically, for text files, the executing entity can uniformly convert character encoding to a preset encoding and standardize line breaks, full-width and half-width characters, and invisible control characters to ensure consistent byte representation of text content during storage and parsing. For image or scanned document files, the executing entity can first perform image decoding and orientation correction, mapping each page of image to a page-level content unit, and recording basic attributes such as image resolution and color mode in the page-level content unit. For multi-page container files, the executing entity can split them into page-level content units and write page order information according to the splitting order to avoid page order misalignment during subsequent parsing. In healthcare settings, invoices and examination reports are often submitted as multi-page scanned copies, with page number information used to mark the sequential relationship between the report's first page, detail pages, and attachment pages. In fintech settings, traffic accident reports and damage assessment materials are often submitted as a combination of multiple documents, with page number information and the page order within each document used together to indicate the reading order of the materials. After the above conversion, the standardized document content can be organized into a traversable sequence of content based on page order.
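As a non-limiting illustration, the text-file branch of step S102 can be sketched with the Python standard library. The function name `normalize_text`, the use of Unicode NFKC folding for full-width and half-width characters, and the choice to keep tabs and newlines are assumptions for this sketch.

```python
import unicodedata


def normalize_text(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode a text document and bring it to one canonical representation."""
    text = raw.decode(source_encoding, errors="replace")
    # NFKC folds full-width letters, digits, and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Standardize line breaks to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control and format characters (categories Cc, Cf) except newline and tab.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
```

The returned string can then be written out in the preset encoding, so that identical content yields identical bytes.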
[0030] S103: Perform metadata extraction processing on the standardized document content, extracting document type identifier, source identifier, collection time identifier and page sequence identifier from the standardized document content, and integrating them to obtain document metadata.
[0031] Metadata extraction is performed on standardized document content to extract document type identifiers, source identifiers, collection time identifiers, and page sequence identifiers, which are then integrated to obtain document metadata. The document type identifier characterizes the document's category attribute in claims processing, such as medical invoices, medical certificates, test reports, prescriptions, or medication lists in healthcare scenarios, and loss assessment reports, repair invoices, accident determination materials, and travel or transportation vouchers in fintech-related insurance claims scenarios. The source identifier characterizes the channel through which the document enters the system, such as client upload, feedback from partner institutions, or synchronization with business systems. The collection time identifier records the time when the executing entity completes collection or receipt, supporting subsequent batch location of materials by time dimension. The page sequence identifier solidifies the page sorting results of the standardized document content. During the extraction process, the executing entity can comprehensively utilize the channel information carried in the collection instruction, the accompanying information of the original document set, the page sequence information of the standardized document content, and the file header information to complete field filling and consistency verification, and organize each field into document metadata according to a preset structure.
[0032] S104: The standardized document content is segmented based on preset segmentation rules, and a hash digest operation is performed on the segmentation results to obtain document fingerprint data.
[0033] Standardized document content is segmented based on preset segmentation rules, and a hash digest operation is performed on the segmentation results to obtain document fingerprint data. The preset segmentation rules divide the standardized document content into multiple stable content segments, facilitating granular summary calculations. Segmentation rules can divide the content into page-level segments according to page order, or further divide the page-level segments into finer-grained segments by paragraphs, heading blocks, table blocks, or fixed-length windows. A hash digest operation is performed on each content segment to obtain a segment digest value, and the segment digest value, along with the corresponding segment number, page number, and segment boundary position, are recorded. To adapt to common scenarios in healthcare, such as detailed invoice pages and report attachment pages, the segmentation rules can segment detailed and non-detailed areas separately to ensure that the summary calculation covers key areas. To adapt to scenarios in fintech where multiple materials are submitted together, the segmentation rules can first segment different materials according to material boundaries, and then segment them within each material according to page order, thus ensuring that the fingerprint data can express the content structure across documents. After the hash digest operation is completed, the digest values of each fragment are summarized and organized according to the fragment number to obtain the document fingerprint data. The document fingerprint data can be used for subsequent data-level comparisons of document content consistency, duplicate submissions, or abnormal replacements.
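As a non-limiting illustration, the fixed-length-window variant of step S104 can be sketched as follows. SHA-256, the window size, and the record field names are assumptions; the embodiment does not name a specific hash function.

```python
import hashlib


def fingerprint(pages: list[str], window: int = 200) -> list[dict]:
    """Split each page into fixed-length windows and hash every segment."""
    records = []
    segment_no = 0
    for page_no, page_text in enumerate(pages, start=1):
        for start in range(0, len(page_text), window):
            segment = page_text[start:start + window]
            records.append({
                "segment_no": segment_no,
                "page_no": page_no,
                "start": start,                 # segment boundary positions
                "end": start + len(segment),
                "digest": hashlib.sha256(segment.encode("utf-8")).hexdigest(),
            })
            segment_no += 1
    return records
```

Comparing two fingerprints segment by segment shows which page and offset range changed, which supports the duplicate-submission and replacement checks mentioned above.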
[0034] S105: Encapsulate the document metadata and the document fingerprint data to obtain the data packet to be verified.
[0035] Document metadata and document fingerprint data are encapsulated to obtain a data package to be verified. During encapsulation, a data package identifier, a claim application identifier, and a version identifier can be configured for the data package. Structured records of document metadata and document fingerprint data are written into the data package, while retaining the association index with standardized document content. This allows subsequent element extraction and contextual verification to complete data reading, result writing back, and tracking within the same data package. In healthcare scenarios, the data package to be verified can carry multiple medical documents for the same claim, describing the document combination relationship through document type identifiers and page sequence identifiers. In fintech scenarios, the data package to be verified can carry multiple source vouchers for auto insurance or accident insurance claims, describing the source and submission batch of the materials through source identifiers and collection time identifiers. After encapsulation, the data package to be verified serves as the input carrier for subsequent processes in the next stage of processing.
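As a non-limiting illustration, the data package of step S105 could be represented by a structure like the one below. The class name, field names, and the use of a random UUID as the package identifier are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PendingVerificationPacket:
    claim_id: str                  # claim application identifier
    metadata: list[dict]           # one metadata record per document
    fingerprints: list[dict]       # segment digests from the fingerprint step
    content_index: dict[str, str]  # document id -> location of standardized content
    version: int = 1               # version identifier
    packet_id: str = field(default_factory=lambda: uuid.uuid4().hex)
```

Keeping only an index to the standardized content, not the content itself, is one way to let later stages read and write back through the same object.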
[0036] As shown in Figure 4, step S20 specifically includes the following steps: S201: Load the content of the data packet to be verified to obtain a document content sequence.
[0037] The system reads data objects related to the document content from the data packet to be verified and completes the content loading process to obtain the document content sequence. Specifically, it first reads the document identifier and page sequence information from the data packet to be verified, and then arranges the content of multiple pages in order according to the page sequence information. When the data packet to be verified contains multiple claim materials, each material can be arranged separately according to the document type identifier and page sequence information before being summarized and output, so that the document content sequence remains logically traversable and locatable. For healthcare scenarios, the document content sequence may include medical invoice pages, diagnosis certificate pages, examination report pages, medication list pages, etc.; for fintech-related insurance claim scenarios, the document content sequence may include accident determination material pages, damage assessment list pages, repair invoice pages, travel or transportation voucher pages, etc. During the loading process, unified content reading and decoding can be performed on different content carriers, so that subsequent page layout processing and sequence annotation processing are oriented towards a consistent data structure.
[0038] S202: The document content sequence is divided into several page blocks, and the coordinate information of each page block is recorded. The page block set is then obtained by summarizing the information.
[0039] The document content sequence is segmented into several page blocks, and the coordinate information of each page block is recorded, resulting in a page block set. The document content sequence can be traversed page by page, and page analysis is performed on each page, identifying page areas such as title areas, body text areas, tables, invoice details areas, seal areas, header and footer areas, and dividing each area into a page block. For each page block, its position parameters in the page coordinate system are recorded, such as the coordinates of the top left corner, bottom right corner, or width and height parameters, and the page number, page block number, and basic layout type can also be recorded. In healthcare scenarios, examination reports often include tabular indicator areas and conclusion description areas, while invoices often include invoice element areas and cost details areas; in fintech-related scenarios, damage assessment materials and repair lists often include item list areas, total amount areas, and unit information areas. Through page segmentation and coordinate recording, the page block set can support content rearrangement and position backtracking in subsequent stages.
[0040] S203: Reorder the reading order of the page block set, sort and concatenate each page block according to the coordinate information to obtain a continuous text stream.
[0041] The reading order of the page block set is rearranged by sorting and concatenating the blocks according to coordinate information to generate a continuous text stream. This can be achieved by first grouping the blocks by page number, and then sorting them within each page according to a preset reading order rule, such as ascending vertical coordinates as the primary order and ascending horizontal coordinates as the secondary order. For two-column or multi-column layouts, column area recognition results can be incorporated, grouping the blocks by column area number first, and then performing vertical sorting within each column area to reduce semantic breaks caused by cross-column reading. After sorting, the text content corresponding to each block is concatenated sequentially, and separators or paragraph marks are inserted at the block boundaries to obtain a continuous text stream. This continuous text stream can more stably express the sequential relationship between report items, conclusion descriptions, and cost details in healthcare scenarios; and in fintech-related scenarios, it can more stably express the sequential relationship between accident descriptions, damage assessment item lists, and summaries of amounts.
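As a non-limiting illustration, the reading-order rule of step S203 can be sketched as a single sort. The `PageBlock` fields and the assumption that the vertical coordinate grows downward are hypothetical; column numbers are assumed to come from a prior column area recognition step.

```python
from dataclasses import dataclass


@dataclass
class PageBlock:
    page_no: int
    column_no: int  # 0 for single-column pages
    x: float        # left edge of the block
    y: float        # top edge of the block; y grows downward
    text: str


def to_text_stream(blocks: list[PageBlock], separator: str = "\n") -> str:
    """Order blocks by page, column, vertical, then horizontal position."""
    ordered = sorted(blocks, key=lambda b: (b.page_no, b.column_no, b.y, b.x))
    return separator.join(b.text for b in ordered)
```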
[0042] S204: Configure the statement position index in the continuous text stream to obtain an indexed statement set.
[0043] Configure statement position indexes for continuous text streams to obtain indexed statement sets. Specifically, statement segmentation can be performed on the continuous text stream. Segmentation rules can combine punctuation marks, line breaks, separators, and page block boundary markers to split the continuous text stream into multiple statement units. For each statement unit, the execution entity assigns a statement number and records the start and end offset positions of the statement in the continuous text stream, while retaining the mapping information of the corresponding page number, page block number, and coordinate range. Through the above index configuration, the indexed statement set supports subsequent model element recognition at the statement granularity and also supports tracing the recognition results back to the original page position. In healthcare scenarios, statement position indexes can cover statements such as consultation time, department name, judgment description, and cost items; in fintech-related scenarios, statement position indexes can cover statements such as accident time, accident location, repair items, and total amount.
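As a non-limiting illustration, the statement segmentation and offset indexing of step S204 can be sketched with a regular expression. The punctuation set is an assumption; the ASCII period is deliberately excluded so that amounts such as 120.50 are not split, and the page and block mappings are omitted for brevity.

```python
import re

# A statement runs up to sentence-ending punctuation or a line break.
_STATEMENT = re.compile(r"[^。！？!?;；\n]+[。！？!?;；]?")


def index_statements(stream: str) -> list[dict]:
    """Split the text stream into statements and record their offsets."""
    statements = []
    for match in _STATEMENT.finditer(stream):
        if not match.group().strip():
            continue  # skip whitespace-only spans
        statements.append({
            "statement_no": len(statements),
            "start": match.start(),  # offsets into the continuous text stream
            "end": match.end(),
            "text": match.group(),
        })
    return statements
```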
[0044] S205: Input the indexed statement set into a preset sequence labeling model to obtain a candidate element fragment set, and configure element type labels and fragment position indexes for the candidate element fragments to obtain a labeled candidate fragment set.
[0045] The indexed statement set is input into a preset sequence labeling model to obtain a candidate element fragment set. Element type labels and fragment position indexes are then configured for each candidate element fragment, resulting in a labeled candidate fragment set. Each statement unit can be segmented into words or characters to obtain a symbol sequence, with index information such as statement number, page number, and page block number supplied as feature fields. The sequence labeling model outputs a label for each position of the symbol sequence. The labels can take the form of start markers, internal markers, and non-element markers, which together characterize the boundaries of element fragments within the statement. Adjacent positions carrying markers of the same element type are aggregated into candidate element fragments, and an element type label and fragment position index are recorded for each candidate element fragment. The fragment position index can include the statement number to which the fragment belongs, the start and end positions of the fragment within the statement, the start and end offsets of the fragment in the continuous text stream, and the corresponding page number and page block coordinate range. Element type labels can cover time-based elements, location-based elements, cost-based elements, institution-based elements, number-based elements, and event description-based elements. In the healthcare context, institution-based elements can correspond to hospital names or department names, and number-based elements can correspond to invoice numbers or medical record numbers. In the fintech context, institution-based elements can correspond to repair shops or issuing entities, and number-based elements can correspond to damage assessment report numbers or invoice numbers. The labeled candidate fragment set preserves the correspondence between text content and position indexes in its data structure.
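As a non-limiting illustration, the aggregation of per-position labels into candidate fragments in step S205 can be sketched as follows. The sketch assumes the common `B-`/`I-`/`O` tag encoding for the start, internal, and non-element markers; the tag names and the function `decode_bio` are hypothetical, and the model that produces the tags is not shown.

```python
def decode_bio(tokens: list[str], tags: list[str]) -> list[dict]:
    """Aggregate B-/I-/O tags into candidate element fragments."""
    fragments, current = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if current:
                fragments.append(current)
            current = {"type": tag[2:], "start": i, "end": i + 1}
        elif tag.startswith("I-") and current and current["type"] == tag[2:]:
            current["end"] = i + 1  # extend the open fragment
        else:
            # "O", or an I- tag that does not continue the open fragment
            if current:
                fragments.append(current)
            current = None
    if current:
        fragments.append(current)
    for frag in fragments:
        frag["text"] = "".join(tokens[frag["start"]:frag["end"]])
    return fragments
```

The `start` and `end` positions are relative to the statement; adding the statement's stream offset gives the position in the continuous text stream.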
[0046] S206: Convert the values, units, and formats of the labeled candidate fragments according to the preset normalization mapping table to obtain a set of element value pairs.
[0047] The system converts the values, units, and formats of labeled candidate fragments according to a pre-defined normalization mapping table, resulting in a set of element value pairs. It can apply corresponding normalization rules to different element types. For example, for date and time elements, it converts various expressions into a unified date format and fills in missing fields; for location elements, it standardizes multi-granular expressions such as province, city, district, street address, and institution address into a unified address field structure; for monetary elements, it converts Chinese capital amounts, amounts with thousands separators, and amounts in different currencies or units of account into a unified numerical expression and standardizes the currency unit label; for unit of measurement elements, it converts drug specifications, dosage units, and examination item units into standard units within a pre-defined unit set. The normalization mapping table can include mappings for synonymous units, common date formats, common monetary symbols, and Chinese numerals to Arabic numerals. In healthcare scenarios, normalization processing can cover total costs, individual costs, medication dosages, and examination indicator units; in fintech-related scenarios, it can cover repair labor costs, parts costs, total amounts, and invoice amounts. The normalized standard values are paired with the original fragment values and recorded, while retaining the fragment location index and feature type label, to generate a set of feature value pairs.
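A minimal sketch of date and amount normalization is shown below. The format list, the currency symbol table, and the function names are illustrative assumptions standing in for the preset normalization mapping table; conversion of Chinese capital amounts is omitted.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping-table entries; a real table would be configured per scenario.
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y年%m月%d日"]
CURRENCY_SYMBOLS = {"¥": "CNY", "￥": "CNY", "$": "USD"}

def normalize_date(raw: str) -> str:
    """Convert a date written in one of the known formats to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

def normalize_amount(raw: str, default_currency: str = "CNY"):
    """Strip a leading currency symbol and thousands separators."""
    text = raw.strip()
    currency = default_currency
    for symbol, code in CURRENCY_SYMBOLS.items():
        if text.startswith(symbol):
            currency, text = code, text[len(symbol):]
            break
    return Decimal(text.replace(",", "").strip()), currency

def to_value_pair(raw: str, standard):
    """Pair the normalized value with the original fragment value."""
    return {"standard": standard, "original": raw}
```

Using `Decimal` rather than `float` keeps monetary values exact, which matters when totals are later compared against itemized costs.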
[0048] S207: Organize the feature type labels and fragment location indexes into a structured feature set based on the feature value pair set.
[0049] The feature type labels and fragment location indexes are organized into a structured feature set based on the feature value pair set. Feature value pairs can be merged by feature type, organizing multiple records of the same feature type into a list structure. For each record, the structured fields can include at least a standard value field, an original value field, a page number field, a statement number field, a continuous text flow offset field, and a page block coordinate field to support subsequent feature location, filtering, and consistency verification. When multiple candidate records exist for the same feature type, they can be organized based on the page area type, occurrence position priority, and record completeness of the fragment location index. For example, records located in the ticket feature area, summary area, or conclusion area are prioritized, and multiple duplicate records are retained to support subsequent comprehensive judgment. In the healthcare scenario, the structured feature set can simultaneously include feature records such as consultation time, consultation location, medical institution name, total cost, cost details, and diagnosis description; in the fintech scenario, the structured feature set can simultaneously include feature records such as accident time, accident location, issuing unit, damage assessment item, total amount, and invoice number.
[0050] As shown in Figure 5, step S30 specifically includes the following steps: S301: Organize the structured feature set into spatiotemporal anchor points to extract multiple anchor point entries, and configure an index and the corresponding time, location, and cost for each anchor point entry to obtain a spatiotemporal anchor point sequence.
[0051] The structured feature set is organized into spatiotemporal anchor points to extract multiple anchor point entries, and each anchor point entry is assigned an index and the corresponding time, location, and cost, yielding a spatiotemporal anchor point sequence. Records combining time and location elements are selected from the structured feature set and linked according to their position indices on the same page, within the same paragraph, or in adjacent sentences, yielding several candidate anchor point entries. For each candidate anchor point entry, an anchor point index $i$ is written, along with the time $t_i$, location $l_i$, and cost $c_i$ of that entry. Here, $t_i$ represents the time value corresponding to the i-th spatiotemporal anchor point extracted from the claim document to be verified, such as the time of a medical visit, examination, or accident; $l_i$ represents the location value corresponding to the i-th spatiotemporal anchor point, such as a hospital address, pharmacy address, accident location, or repair shop address; $c_i$ represents the cost value associated with the i-th spatiotemporal anchor point, such as examination fees, medication costs, repair costs, or transportation costs. Multiple spatiotemporal anchor point entries in the same document can be sorted chronologically, and the sorted set of anchor point entries is output as the spatiotemporal anchor point sequence $\{(i, t_i, l_i, c_i)\}$.
[0052] S302: Based on the time range in the spatiotemporal anchor sequence, perform auxiliary spatiotemporal record parsing on the data packet to be verified to obtain an auxiliary spatiotemporal sequence.
[0053] Auxiliary spatiotemporal records are parsed based on the time range of the spatiotemporal anchor sequence to obtain an auxiliary spatiotemporal sequence. Auxiliary record entry information associated with the claim application is read from the data packet to be verified, and spatiotemporal data related to location or operation is parsed within the user-authorized scope. Auxiliary records can come from the behavior logs retained by the vehicle owner service client, health service client, or business acceptance terminal during the claim process. A query time window is constructed using the minimum and maximum times of the spatiotemporal anchor sequence, and auxiliary spatiotemporal records are extracted within that window to obtain an auxiliary spatiotemporal sequence $\{(t_j^{aux}, l_j^{aux})\}$ containing time and location fields, where $t_j^{aux}$ represents the recording time of the j-th auxiliary spatiotemporal record and $l_j^{aux}$ represents its recording location. When the auxiliary record is a positioning trajectory, $l_j^{aux}$ can be latitude and longitude coordinates; when the auxiliary record is a business outlet visit or online operation record, $l_j^{aux}$ can be a network address or access point location description. To improve alignment stability, address-type values of $l_j^{aux}$ can be geocoded into latitude and longitude coordinates, and all coordinates are unified under the same coordinate reference.
[0054] S303: Perform spatiotemporal alignment processing on the spatiotemporal anchor sequence and the auxiliary spatiotemporal sequence to obtain an aligned record pair sequence.
[0055] The spatiotemporal anchor sequence is spatiotemporally aligned with the auxiliary spatiotemporal sequence to obtain an aligned record pair sequence. For each anchor entry with time $t_i$ and location $l_i$ in the spatiotemporal anchor sequence, the set of records that meet the preset time neighborhood condition is retrieved from the auxiliary spatiotemporal sequence, and the record in that set with the minimum time difference from $t_i$, or whose time period covers $t_i$, is selected as the matching record, giving the auxiliary record $(t_i', l_i')$ corresponding to the anchor entry, where $t_i'$ is the auxiliary recording time aligned with $t_i$ and $l_i'$ is the auxiliary recording location aligned with $l_i$. After anchor-by-anchor alignment is complete, the sequence of aligned record pairs is output. In the healthcare context, $t_i$ can correspond to the time on hospital visit or examination receipts, and $(t_i', l_i')$ can correspond to location trajectory points within the same time period; in fintech-related scenarios, $t_i$ can correspond to the time recorded on the accident investigation report or repair invoice, and $(t_i', l_i')$ can correspond to the location and operation time related to accident reporting, investigation, or travel.
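The nearest-in-time alignment can be sketched as below. This is an illustration under stated assumptions: times are numeric timestamps in seconds, the time neighborhood condition is a symmetric window `window_s`, and the tuple layouts and the name `align` are hypothetical.

```python
from typing import List, Optional, Tuple

Location = Tuple[float, float]                  # (latitude, longitude)
Anchor = Tuple[int, float, Location, float]     # (index, time, location, cost)
AuxRecord = Tuple[float, Location]              # (time, location)

def align(anchors: List[Anchor], aux: List[AuxRecord],
          window_s: float = 3600.0) -> List[Tuple[Anchor, Optional[AuxRecord]]]:
    """Pair each anchor with the auxiliary record closest in time within the window."""
    pairs = []
    for anchor in anchors:
        anchor_time = anchor[1]
        nearby = [rec for rec in aux if abs(rec[0] - anchor_time) <= window_s]
        # None marks an anchor with no auxiliary record in its time neighborhood.
        match = min(nearby, key=lambda rec: abs(rec[0] - anchor_time)) if nearby else None
        pairs.append((anchor, match))
    return pairs

anchors = [(0, 1000.0, (22.5, 114.0), 80.0), (1, 50000.0, (22.5, 114.0), 20.0)]
aux = [(400.0, (22.6, 114.1)), (1200.0, (22.5, 114.05))]
pairs = align(anchors, aux, window_s=600.0)
```

Keeping unmatched anchors in the output with `None`, rather than dropping them, lets later steps distinguish "no auxiliary evidence" from "evidence that agrees".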
[0056] S304: Perform spatial difference calculation on the aligned record pair sequence to obtain a distance sequence.
[0057] Spatial difference calculation is performed on the aligned record pair sequence to obtain a distance sequence. Specifically, the executing entity converts the document location $l_i$ and the aligned auxiliary location $l_i'$ into a uniform spatial coordinate representation and then computes $d_i = \mathrm{Distance}(l_i, l_i')$ as the spatial difference result of the i-th anchor point, yielding the distance sequence $\{d_i\}$, where Distance(·) represents the geographic distance function. When $l_i$ and $l_i'$ are latitude and longitude coordinates, Distance(·) can be calculated as spherical distance or projected plane distance; when $l_i$ is address text, $l_i$ is first geocoded to obtain latitude and longitude, and Distance(·) is then calculated together with $l_i'$. To support subsequent risk characterization, a unit marker such as meter or kilometer can be retained for each $d_i$, and the coordinate source identifiers involved in the calculation are recorded to ensure that the distance sequence is traceable.
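For the spherical distance option, one common choice is the haversine formula. The sketch below assumes both locations are already (latitude, longitude) pairs in degrees under the same coordinate reference; the function name and the mean Earth radius constant are choices made for this illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
```

Returning meters throughout and attaching the unit marker at the record level keeps the distance sequence in one consistent unit.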
[0058] S305: Perform time span calculation processing on the aligned record pair sequence and the distance sequence to obtain a time span sequence.
[0059] Time span calculation is performed on the aligned record pair sequence and the distance sequence to obtain a time span sequence. For each anchor point, $\Delta t_i = \mathrm{TimeSpan}(t_i, t_i')$ is computed as the time span result, yielding the time span sequence $\{\Delta t_i\}$, where TimeSpan(·) represents the time span function. It can be taken as $|t_i - t_i'|$; alternatively, the minimum reachable span between two time periods can be taken, or the difference between the center points of the time periods can be used. When the auxiliary record is a discrete trajectory point, the executing entity can set $t_i'$ to the time of the trajectory point nearest to $t_i$ to ensure that TimeSpan(·) is computable. All $\Delta t_i$ values are converted to a common unit of seconds or minutes, and default handling rules are set for outliers; for example, when $\Delta t_i$ is 0, a preset minimum time span is used to avoid division by zero.
[0060] S306: Based on the distance sequence and time span sequence, perform tensor construction processing to obtain a spatiotemporal consistency tensor, and perform spatiotemporal contradiction score calculation processing based on the spatiotemporal consistency tensor to obtain a spatiotemporal contradiction score.
[0061] Tensor construction is performed based on the distance sequence and the time span sequence to obtain the spatiotemporal consistency tensor $T_{STC}$, and the spatiotemporal contradiction score $S_{Conflict}$ is calculated based on $T_{STC}$. Specifically, the feature vector of each anchor point is denoted $v_i = (d_i, \Delta t_i, c_i)$, and the feature vectors are stacked according to anchor index $i$ to obtain $T_{STC}$. $T_{STC}$ carries the spatiotemporal difference information and cost information of multiple anchor points, where $d_i$ comes from the distance sequence, $\Delta t_i$ comes from the time span sequence, and the cost field $c_i$ comes from the spatiotemporal anchor sequence. The spatiotemporal contradiction score can be calculated using the following formula:
$$S_{Conflict} = \sum_i \frac{\mathrm{Distance}(l_i, l_i')}{\mathrm{TimeSpan}(t_i, t_i')}$$
where $S_{Conflict}$ is the spatiotemporal contradiction score; $\sum_i$ indicates summation over all anchor indices $i$; $\mathrm{Distance}(l_i, l_i')$ is the geographic distance between the document location $l_i$ of the i-th anchor point and the aligned auxiliary location $l_i'$, corresponding to $d_i$ in the distance sequence; and $\mathrm{TimeSpan}(t_i, t_i')$ is the time span between the document time $t_i$ of the i-th anchor point and the aligned auxiliary time $t_i'$, corresponding to $\Delta t_i$ in the time span sequence. $S_{Conflict}$ serves as a summary measure of the degree of conflict among multiple anchor points, while the single-anchor terms $d_i / \Delta t_i$ are retained during the calculation to support subsequent identification of the specific source of an anomaly. The executing entity can also introduce a distance threshold $D_{th}$ to judge the distance of a single anchor point: when $d_i > D_{th}$, the anchor point is marked as a distance anomaly anchor point, and the distance anomaly mark is written, together with the corresponding anchor index, into the extended fields or accompanying records of $T_{STC}$, so that subsequent processes can trace back to the anchor entry that triggered the exception when referencing the spatiotemporal contradiction score. In healthcare scenarios, $D_{th}$ can be set according to the reachable distance of the medical treatment scenario; for example, different thresholds can be used for medical treatment within the same city and across cities. In fintech-related scenarios, $D_{th}$ can be set according to the geographical distribution of the accident investigation and repair process; for example, different thresholds can be used for the reachable distance between the accident site and the repair outlet. Finally, the executing entity outputs the spatiotemporal consistency tensor $T_{STC}$ and the spatiotemporal contradiction score $S_{Conflict}$ as input for subsequent fusion calculations.
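The tensor construction, score summation, and distance anomaly marking can be sketched together as follows. The sketch assumes the score sums the per-anchor ratio of distance to time span, an interpretation suggested by the division-by-zero safeguard described for the time span step; the function name, the plain-list tensor, and the parameter names are hypothetical.

```python
def conflict_score(distances_m, spans_s, costs, d_th_m, min_span_s=1.0):
    """Return (tensor rows, summed score, indices of distance-anomaly anchors)."""
    tensor, flagged, score = [], [], 0.0
    for i, (d, dt, c) in enumerate(zip(distances_m, spans_s, costs)):
        dt = max(dt, min_span_s)      # preset minimum span avoids division by zero
        tensor.append((d, dt, c))     # one feature vector per anchor index
        score += d / dt               # per-anchor term, assumed to be distance / span
        if d > d_th_m:
            flagged.append(i)         # traceable back to the anchor entry
    return tensor, score, flagged

tensor, score, flagged = conflict_score(
    distances_m=[1000.0, 50000.0],
    spans_s=[500.0, 0.0],
    costs=[80.0, 300.0],
    d_th_m=20000.0,
    min_span_s=100.0,
)
```

In this example the second anchor is both far from its auxiliary record and nearly simultaneous with it, so it dominates the sum and is also flagged against the distance threshold.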
[0062] As shown in Figure 6, step S40 specifically includes the following steps: S401: Perform event element parsing processing on the claim event information in the data packet to be verified, extract the attribute values of name element, claim method element, and product type element, and integrate them to obtain the event semantic element set.
[0063] The claims event information in the data packet to be verified is parsed to extract attribute values for name, claims method, and product type elements, which are then integrated to obtain an event semantic element set. Event fields such as claims application registration information, policy or product information, and accident or medical treatment descriptions are read from the data packet to be verified. These event fields are segmented, parsed, and mapped, and attribute values that represent the core semantics of the event are written into the event semantic element set. The name element represents the core object of the claims event; for example, in a healthcare scenario, it could be a diagnosis name or medical event name, while in a fintech-related scenario, it could be a risk event name or accident event name. The claims method element represents the handling or claims method of the event, such as outpatient reimbursement, inpatient reimbursement, medication reimbursement, online claims, or offline claims. The product type element represents the product or insurance type corresponding to the claims, such as medical insurance, critical illness insurance, accident insurance, or car insurance. Each item in the event semantic element set retains its original field value and field identifier.
[0064] S402: Perform semantic normalization processing based on the event semantic element set to obtain a standardized event semantic set.
[0065] Semantic normalization is performed on the event semantic element set to obtain a standardized event semantic set. Synonym merging, ambiguity resolution, and category mapping are performed on name elements, claims method elements, and product type elements respectively to reduce semantic bias caused by different reporting methods and expression habits. For name elements, colloquial descriptions, abbreviations, and aliases can be mapped to standard terms, and a synonym set is retained to cover common variations. For claims method elements, multiple expressions can be mapped to a unified method enumeration set. For product type elements, product names and insurance type names can be mapped to a unified product type set. The standardized event semantic set can be represented as a data structure composed of several standard entity items and standard relation items. Standard entity items contain entity types and standard entity values, and standard relation items contain relation types and entity identifiers at both ends of the relation, providing direct input for subsequent graph construction.
[0066] S403: Write the entity items in the standardized event semantic set into the corresponding node set, and write the relation items in the standardized event semantic set into the corresponding edge set to obtain a semantic knowledge graph.
[0067] The entity items in the standardized event semantic set are written into the corresponding node set, and the relation items in the standardized event semantic set are written into the corresponding edge set, resulting in a semantic knowledge graph (SKG). This SKG is used to represent the contextual semantic structure of a claims event. Its node set includes at least node categories such as disease name nodes, treatment method nodes, and claims product type nodes. The edge set represents the semantic relationships between nodes, such as the diagnosis-treatment relationship between a name node and a treatment method node, the protection relationship between a name node and a claims product type node, and the claims relationship between a claims method node and a claims product type node. For example, in a healthcare scenario, the node set can include nodes such as "leg fracture," "orthopedic treatment or surgery," and "medical insurance," and the edge set can include relationships such as "name-treatment method" and "name-product type." In a fintech scenario, the node set can include nodes such as "accident name," "handling method," and "product type," and the edge set can include relationships such as "accident name-handling method" and "accident name-product type." Thus, the SKG uses a structured representation of node and edge sets to carry the semantic context of a claims event.
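Writing entity items into a node set and relation items into an edge set can be sketched with a small container. The class name `SemanticGraph`, the identifier scheme, and the relation labels below are illustrative assumptions rather than structures defined by the source.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> (category, standard value)
    edges: set = field(default_factory=set)    # (relation, source_id, target_id)

    def add_entity(self, node_id, category, value):
        self.nodes[node_id] = (category, value)

    def add_relation(self, relation, source_id, target_id):
        # Reject edges whose endpoints were never written as entity items.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.add((relation, source_id, target_id))

def build_graph(entities, relations):
    graph = SemanticGraph()
    for entity in entities:
        graph.add_entity(*entity)
    for relation in relations:
        graph.add_relation(*relation)
    return graph

graph = build_graph(
    entities=[("n1", "name", "leg fracture"),
              ("n2", "treatment", "orthopedic surgery"),
              ("n3", "product", "medical insurance")],
    relations=[("name-treatment", "n1", "n2"), ("name-product", "n1", "n3")],
)
```

Storing the category alongside each node value is what later allows candidate phrases to be matched only against nodes of the same category.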
[0068] S404: Based on the node categories in the semantic knowledge graph, extract candidate phrases corresponding to the node categories to obtain a set of document semantic fragments.
[0069] Based on the node categories in the SKG, candidate phrases corresponding to the node categories are extracted to obtain a document semantic fragment set. The document text content in the data package to be verified is read, and phrase extraction and category aggregation are performed on it. First, candidate phrase extraction rules can be configured according to node categories: for example, suspected disease or diagnosis phrases can be extracted for name node categories, suspected examination, treatment, medication, or surgery phrases for treatment method node categories, and suspected insurance or product phrases for product type node categories. Second, the extracted phrases are recorded together with their position markers in the document so that matching results can later be traced back to specific document fragments. The document semantic fragment set can be grouped and stored by node category, such as treatment phrase groups and product phrase groups, so that subsequent matching calculations are completed within the same semantic space.
[0070] S405: Perform semantic matching processing using the document semantic fragment set and the semantic knowledge graph, and select the node with the highest similarity as the matching node to obtain a matching result set.
[0071] Semantic matching is performed between the document semantic fragment set and the SKG, and the node with the highest similarity is selected as the matching node to obtain a matching result set. Specifically, each document semantic fragment is mapped to a semantic representation vector, and the standard entity value of each node in the SKG is mapped to a node semantic representation vector. Then, the execution subject calculates the similarity score for each document semantic fragment within the node set of its corresponding node category, selects the node with the highest score as the matching node, and records the identifier of the matching node, the similarity score, and the document fragment position marker to obtain the matching result set. Taking the healthcare scenario as an example, when the semantics of the claim event is a leg fracture and the corresponding treatment method node is orthopedic treatment, but the document semantic fragment contains phrases related to cardiovascular drugs, the similarity score between the document semantic fragment and the disease name node and the treatment method node will be low, and the matching result set will show low similarity or match irrelevant nodes. Taking the fintech scenario as an example, when the claim event is vehicle accident repair, but the document semantic fragment is mainly the content of consumption receipts unrelated to the accident, the similarity score between the document semantic fragment and nodes such as accident handling methods and repair items will be low, and the matching result set will also show low similarity.
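The select-the-most-similar-node step can be sketched as below. As a stand-in for a real semantic representation model, this illustration uses bag-of-words vectors with cosine similarity; that substitution, and the names `embed`, `cosine`, and `best_match`, are assumptions made only to keep the example self-contained.

```python
import math
from collections import Counter

def embed(text):
    """Toy representation: word counts (a real system would use a semantic encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(fragment, nodes):
    """nodes: {node_id: standard value} for one node category. Returns (node_id, score)."""
    if not nodes:
        raise ValueError("no candidate nodes in this category")
    fragment_vec = embed(fragment)
    score, node_id = max((cosine(fragment_vec, embed(v)), nid) for nid, v in nodes.items())
    return node_id, score

name_nodes = {"n1": "leg fracture", "n2": "ankle sprain"}
matched_id, matched_score = best_match("fracture of the left leg", name_nodes)
_, unrelated_score = best_match("cardiovascular tablets", name_nodes)
```

The unrelated fragment still receives a "best" node, but with a similarity of zero, which mirrors the case where the matching result set shows low similarity or an irrelevant node.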
[0072] S406: Summarize the semantic similarity in the matching result set according to the preset aggregation rules to obtain the semantic matching degree.
[0073] The semantic similarity scores in the matching result set are aggregated according to preset aggregation rules to obtain the semantic matching degree. The similarity score is read for each matching record in the matching result set and aggregated according to node category or fragment importance. Preset aggregation rules can include mean aggregation, weighted mean aggregation, or quantile aggregation, and different weights can be configured for name-based matching, treatment method-based matching, and product type-based matching to reflect the contribution of different semantic dimensions to the contextual consistency judgment. After aggregation, the executing entity outputs the semantic matching degree $S_{Semantic}$, which is retained together with key records from the matching result set to support subsequent fusion calculations and anomaly localization. In healthcare scenarios, $S_{Semantic}$ characterizes the semantic consistency between the content of a bill or report and the disease or treatment method; in fintech-related scenarios, $S_{Semantic}$ characterizes the semantic consistency between the content of the voucher and the type and handling method of the accident.
[0074] This semantic matching process can be expressed by the following formula:
$$S_{Semantic} = \mathrm{Match}(\mathrm{DocContent}, \mathrm{SKG})$$
where $S_{Semantic}$ is the semantic matching degree; Match(·) represents the semantic matching calculation function, used to calculate the degree of matching between the document semantics and the claim event semantics; DocContent represents the semantic representation of the claim document to be verified, which can be extracted and summarized from the document text content; and SKG represents the semantic knowledge graph constructed from the claims event. Furthermore, Match(·) can be decomposed into calculating the similarity between each document semantic fragment and the graph nodes and selecting the node with the highest similarity: for each document semantic fragment, its similarity with the candidate nodes is calculated, and the node with the highest similarity is selected as the matching node to obtain the single-fragment matching result.
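Of the aggregation rules listed, weighted mean aggregation by node category can be sketched as follows. The category names and weight values are illustrative assumptions; the function name `aggregate` is hypothetical.

```python
def aggregate(matches, weights):
    """Weighted mean of similarity scores.

    matches: list of (node category, similarity score) from the matching result set.
    weights: {node category: weight}; categories without a weight contribute nothing.
    """
    total_weight = sum(weights.get(category, 0.0) for category, _ in matches)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights.get(category, 0.0) * sim for category, sim in matches)
    return weighted / total_weight

matching_degree = aggregate(
    matches=[("name", 0.8), ("treatment", 0.4), ("product", 1.0)],
    weights={"name": 0.5, "treatment": 0.25, "product": 0.25},
)
```

Dividing by the weight actually used, rather than by a fixed constant, keeps the result in the same range as the input similarities even when some categories have no matching records.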
[0075] As shown in Figure 7, step S50 specifically includes the following steps: S501: Perform a score loading process on the spatiotemporal contradiction score and the semantic matching degree to obtain a score input record.
[0076] The spatiotemporal contradiction score and the semantic matching degree are loaded to obtain the score input record. The spatiotemporal contradiction score $S_{Conflict}$ and the semantic matching degree $S_{Semantic}$ are read, configured with the corresponding score field identifiers and data type identifiers, and written into the score input record. $S_{Conflict}$ characterizes the degree of contradiction between the document's spatiotemporal anchor points and the auxiliary spatiotemporal records; $S_{Semantic}$ characterizes the degree of matching between the semantics of the document content and the semantic knowledge graph SKG. During loading, boundary checks and missing value completion can be performed on the score values: when $S_{Conflict}$ exceeds the preset range, truncation or normalization mapping is performed; when $S_{Semantic}$ falls outside the [0,1] range, it is clipped back to that range; if any score is missing, a default value is written and the missing-value flag field is written at the same time.
[0077] S502: Perform a subtraction operation on the semantic matching degree to obtain a semantic difference item, and write the semantic difference item back to the scoring input record to obtain an extended scoring record.
[0078] A subtraction operation is performed on the semantic matching degree to obtain a semantic difference item, which is written back to the scoring input record to obtain an extended scoring record. The semantic difference item is calculated from the semantic matching degree $S_{Semantic}$ as $D_{Semantic} = 1 - S_{Semantic}$, where $D_{Semantic}$ indicates the degree of difference between the semantics of the document content and the semantics of the claims event; a larger value indicates a greater difference, and a smaller value indicates a smaller difference. The semantic difference field is written to the scoring input record, and the mapping relationship between the original $S_{Semantic}$ field and the $D_{Semantic}$ field is retained, yielding the extended scoring record. In healthcare scenarios, when the semantic set of a claim event contains the disease name leg fracture and the document content contains semantic fragments related to cardiovascular drugs, $S_{Semantic}$ can take a low value in the matching calculation, making $D_{Semantic}$ relatively high; in fintech-related scenarios, when the semantic set of the claims event includes vehicle accident repairs and the document content mainly consists of descriptions of consumer receipts unrelated to the accident, $S_{Semantic}$ may likewise be low, making $D_{Semantic}$ high. The above process only describes numerical generation and field writing, and does not change the independence of the subsequent threshold determination.
[0079] S503: The extended rating records are concatenated into a behavior feature vector according to a preset field mapping rule, and the behavior feature vector is subjected to historical pattern comparison processing to obtain behavior matching records.
[0080] The extended rating records are concatenated into a behavioral feature vector according to a preset field mapping rule, and the behavioral feature vector is subjected to historical pattern comparison to obtain behavioral matching records. The preset field mapping rule configures the feature field set and field order; the $S_{Conflict}$ field, the $S_{Semantic}$ field, and the semantic difference field of the extended rating record, together with the basic statistical fields associated with the claim application, are then written to the corresponding positions in the feature vector to obtain the behavior feature vector FeatureVec. The basic statistical fields can be read from the data packet to be verified or from business-side event information and written in a standardized manner, such as the number of documents, the number of pages, the combination of document types, the number of submission batches, the total cost range marker, the claim method enumeration marker, and the product type enumeration marker, so that the behavior feature vector is compatible with the differing data distributions of healthcare and fintech-related scenarios. Then, the historical pattern comparison component is called to read the historical pattern vector set PatternSet that matches the current business scenario from the historical behavior sample library, and similarity or distance calculation is performed between FeatureVec and PatternSet to obtain the behavior matching degree BehaviorMatch and its accompanying information. BehaviorMatch characterizes the similarity between the current behavior feature vector and historical fraud behavior patterns, and its value range can be normalized to [0,1] or another preset interval according to preset mapping rules; the accompanying information can include the most similar pattern identifier, the similarity score sequence, or the matching confidence, for subsequent auditing and backtracking.
In healthcare scenarios, PatternSet can be grouped and maintained according to sub-types such as outpatient, inpatient, and medication purchase; in fintech-related scenarios, PatternSet can be grouped and maintained according to sub-types such as car insurance and accident insurance. Before comparison, the executing entity selects the matching group and then performs the comparison calculation.
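The comparison of FeatureVec against a scenario-specific PatternSet can be sketched as below. The sketch assumes cosine similarity clipped to [0,1] as the similarity calculation and takes the highest-scoring pattern; the grouping keys, pattern identifiers, and vectors are made-up illustrative values.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def behavior_match(feature_vec, pattern_groups, group_key):
    """Select the PatternSet group for the scenario, then return (pattern id, score)."""
    pattern_set = pattern_groups[group_key]
    scores = {pid: min(1.0, max(0.0, cosine(feature_vec, vec)))
              for pid, vec in pattern_set.items()}
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]

pattern_groups = {
    "outpatient": {"p1": [1.0, 0.0], "p2": [0.0, 1.0]},
    "car_insurance": {"p3": [1.0, 1.0]},
}
best_id, score = behavior_match([1.0, 0.0], pattern_groups, "outpatient")
```

Returning the pattern identifier along with the score corresponds to the accompanying information kept for auditing.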
[0081] S504: Read the weight parameters from the preset parameter storage area, and perform fusion and summation processing on the extended scoring record and the behavior matching record according to the read weight parameters to obtain a comprehensive risk score.
[0082] The system reads weight parameters from a preset parameter storage area, and then fuses and sums the extended scoring records and behavior matching records based on these weight parameters to obtain a comprehensive risk score. The weight parameters $\alpha$, $\beta$, and $\gamma$ are read from the parameter storage area, and the configuration item identifier of the risk threshold $\theta$ is read for use in subsequent steps, where $\alpha$ characterizes the weight of the spatiotemporal contradiction score in the fusion calculation, $\beta$ characterizes the weight of the semantic difference term, $\gamma$ characterizes the weight of the behavior matching degree, and $\theta$ is the risk threshold for the comprehensive risk score. The comprehensive risk score $S_{Risk}$ can be calculated using the following formula:
$$S_{Risk} = \alpha \cdot S_{Conflict} + \beta \cdot (1 - S_{Semantic}) + \gamma \cdot \mathrm{BehaviorMatch}$$
where $S_{Risk}$ is the comprehensive risk score; $S_{Conflict}$ is the spatiotemporal contradiction score; $S_{Semantic}$ is the semantic matching degree, and $1 - S_{Semantic}$ is the semantic difference term; BehaviorMatch is the behavior matching degree obtained from the historical pattern comparison; $\alpha$, $\beta$, and $\gamma$ are weighting parameters that satisfy a preset numerical range and can be configured according to business scenarios; and $\theta$ is the risk threshold used in the subsequent threshold determination step. Before calculating $S_{Risk}$, $S_{Conflict}$, $1 - S_{Semantic}$, and BehaviorMatch can be scaled to a common range, for example by mapping them to a unified interval according to a preset distribution, to ensure the numerical comparability of the three-term weighted sum. After calculation, $S_{Risk}$ is written to the fusion result field, and the values of $S_{Conflict}$, $S_{Semantic}$, BehaviorMatch, $\alpha$, $\beta$, and $\gamma$, together with the version identifier, are written into the tracking field to obtain a traceable comprehensive risk score record.
[0083] In this implementation, the comprehensive risk score $S_{Risk}$ is used as input for the subsequent threshold determination, and the configured risk threshold $\theta$ can be compared against $S_{Risk}$: when $S_{Risk}$ is below $\theta$, the claim documents to be verified can be marked as trustworthy and enter the corresponding fast-track processing path; when $S_{Risk}$ is not less than $\theta$, the claim documents to be verified enter another processing path and are handled by subsequent procedures according to business strategies.
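The three-term weighted sum and the threshold comparison can be sketched as follows. The sketch assumes all three inputs have already been scaled to [0,1] and clips them defensively; the weight and threshold values are arbitrary illustrative numbers, and the function names are hypothetical.

```python
def clamp01(x):
    return min(1.0, max(0.0, x))

def risk_score(conflict, semantic_match, behavior_match, w_conflict, w_semantic, w_behavior):
    """Weighted sum of conflict score, semantic difference (1 - match), and behavior match."""
    semantic_diff = 1.0 - clamp01(semantic_match)
    return (w_conflict * clamp01(conflict)
            + w_semantic * semantic_diff
            + w_behavior * clamp01(behavior_match))

def is_trusted(score, threshold):
    """Below the threshold: trusted, fast-track path. Otherwise: another processing path."""
    return score < threshold

score = risk_score(conflict=0.5, semantic_match=0.75, behavior_match=0.25,
                   w_conflict=0.5, w_semantic=0.25, w_behavior=0.25)
```

Note that the semantic term enters as a difference, so a document that matches the claim event well lowers the overall score.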
[0084] As shown in Figure 8, step S60 specifically includes the following steps: S601: Perform threshold retrieval processing on the comprehensive risk score to retrieve the risk threshold corresponding to the comprehensive risk score from the preset threshold configuration table.
[0085] A threshold retrieval process is performed on the comprehensive risk score to retrieve the corresponding risk threshold from a preset threshold configuration table. First, the comprehensive risk score field $S_{Risk}$ is read, and the scenario information fields associated with the current claim application are read at the same time to locate threshold configuration entries. These scenario information fields may include business scenario identifiers, product type identifiers, claim method identifiers, application channel identifiers, regional identifiers, and threshold configuration version identifiers. The executing entity filters the preset threshold configuration table according to the scenario information fields to obtain a set of candidate threshold entries, and within this set selects the final threshold entry based on a priority field. The priority field can be ordered from fine-grained to coarse-grained; for example, threshold entries that satisfy both product type and claim method are preferred, then entries that satisfy only product type, and finally entries that satisfy only the business scenario identifier. Subsequently, the executing entity reads the risk threshold $\theta$ from the final threshold entry and writes $\theta$, together with the version identifier and scope identifier of the threshold entry, into the threshold retrieval record as input for subsequent judgment processes. For example, in a healthcare scenario, threshold configurations can be differentiated by product type such as health insurance and medical insurance, and by claim methods such as outpatient reimbursement and inpatient reimbursement; in a fintech scenario, threshold configurations can be differentiated by product type such as auto insurance and accident insurance, and by channels such as online claims and offline claims, thereby ensuring that risk thresholds are consistent with the distribution of business risks.
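The fine-to-coarse priority selection can be sketched as below. The table layout (dictionaries with `None` as a wildcard), the field names, and the threshold values are assumptions for illustration only.

```python
def lookup_threshold(table, scene, product=None, method=None):
    """Return the most specific threshold entry that applies to the scenario fields."""
    candidates = [
        entry for entry in table
        if entry["scene"] == scene
        and entry["product"] in (None, product)   # None acts as a wildcard
        and entry["method"] in (None, method)
    ]
    if not candidates:
        raise LookupError(f"no threshold configured for scene {scene!r}")
    # Product + method (3) beats product only (2), which beats scene only (0).
    def priority(entry):
        return 2 * (entry["product"] is not None) + (entry["method"] is not None)
    return max(candidates, key=priority)

THRESHOLDS = [
    {"scene": "health", "product": None, "method": None, "threshold": 0.4, "version": "v1"},
    {"scene": "health", "product": "medical", "method": None, "threshold": 0.5, "version": "v1"},
    {"scene": "health", "product": "medical", "method": "outpatient", "threshold": 0.6, "version": "v1"},
]
entry = lookup_threshold(THRESHOLDS, "health", "medical", "outpatient")
```

Returning the whole entry rather than only the number makes it straightforward to copy the version identifier into the threshold retrieval record.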
[0086] S602: Calculate the relationship between the comprehensive risk score and the risk threshold to obtain the judgment flag record.
[0087] The relationship between the comprehensive risk score and the risk threshold is calculated to obtain the judgment flag record. The comprehensive risk score and the risk threshold are read, and a comparison operation is performed to output a decision flag (Flag), which can be binary or multi-valued. With a binary flag, Flag indicates pass or fail; with a multi-valued flag, Flag can further distinguish low-risk, medium-risk, and high-risk levels. To improve the stability of the decision, numerical field verification and missing-marker verification can be performed on the inputs before the comparison operation: when an input carries a missing marker, Flag can be set to require manual review and the reason for the missing value can be written into the judgment flag record; when an input value overflows or is an outlier, the comparison can be performed after the value is processed according to the preset clipping range. The input and output elements of the comparison operation are written to the judgment flag record, which must contain at least a score value field, a threshold field, a comparison result field (Flag), and a judgment time identifier field, to facilitate auditing and tracking in subsequent processing stages. For example, in a healthcare scenario, when the comprehensive risk score is significantly lower than the risk threshold, a low-risk flag can be written to the judgment flag record; in a fintech scenario, when the comprehensive risk score is higher than or equal to the risk threshold, a high-risk flag can be written to the judgment flag record to trigger a more stringent claims risk control process.
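A minimal sketch of the comparison step, including the missing-value check and clipping described above. The flag names, the clipping range `(0.0, 1.0)`, and the `margin` used to model "significantly lower" are assumptions made for illustration; the text does not specify them.

```python
import math
from datetime import datetime, timezone


def decide(score, threshold, clip=(0.0, 1.0), margin=0.2):
    """Compare a risk score with a threshold and build a judgment flag record.

    clip and margin are hypothetical parameters, not values from the source.
    """
    record = {
        "score": score,
        "threshold": threshold,
        "flag": None,
        "reason": None,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
    # Missing-marker verification: route to manual review and record why.
    if score is None or threshold is None or math.isnan(score):
        record["flag"] = "MANUAL_REVIEW"
        record["reason"] = "missing score or threshold"
        return record
    # Overflow or outlier handling: clip into the preset range before comparing.
    clipped = min(max(score, clip[0]), clip[1])
    record["score"] = clipped
    if clipped >= threshold:
        record["flag"] = "HIGH"
    elif clipped <= threshold - margin:
        record["flag"] = "LOW"
    else:
        record["flag"] = "MEDIUM"
    return record
```

For instance, `decide(1.7, 0.6)` would clip the score to `1.0` before comparing, and `decide(None, 0.6)` would skip the comparison and request manual review.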
[0088] S603: Map the judgment flag record to a disposal path identifier to obtain a path identifier record.
[0089] The judgment flag record is mapped to a disposal path identifier, resulting in a path identifier record. The Flag in the judgment flag record is read, and the corresponding disposal path entry is retrieved from the preset disposal mapping table, outputting the disposal path identifier PathID. The disposal path identifier can be used to distinguish different business processing branches, such as automatic approval path, supplementary material path, manual review path, and escalation audit path. The PathID is written to the path identifier record, and a correlation identifier field is established between the path identifier record and the judgment flag record to ensure that the disposal path can be traced back to the score and threshold. To adapt to the differentiated risk control strategies of fintech and healthcare, the disposal mapping table can be configured with different mapping results according to the business scenario identifier. In healthcare claims, low-risk flags can be mapped to automatic approval or fast review paths, while medium-to-high-risk flags can be mapped to manual review paths to verify invoices and treatment content. In fintech-related claims, low-risk flags can be mapped to automatic approval or fast claims processing paths, while high-risk flags can be mapped to audit paths to verify the authenticity of the claim, the reasonableness of the loss assessment, and the consistency of the invoices.
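The scenario-dependent disposal mapping can be illustrated with a small lookup table. The table contents, the `decision_id` correlation field, and the default path are hypothetical choices for this sketch.

```python
# Hypothetical disposal mapping table keyed by (business scenario, Flag).
DISPOSAL_MAP = {
    ("healthcare", "LOW"): "AUTO_APPROVE",
    ("healthcare", "MEDIUM"): "MANUAL_REVIEW",
    ("healthcare", "HIGH"): "MANUAL_REVIEW",
    ("fintech", "LOW"): "AUTO_APPROVE",
    ("fintech", "HIGH"): "ESCALATED_AUDIT",
}


def map_to_path(decision_record: dict, scenario: str,
                default: str = "MANUAL_REVIEW") -> dict:
    """Map a judgment flag record to a path identifier record."""
    path_id = DISPOSAL_MAP.get((scenario, decision_record["flag"]), default)
    return {
        "path_id": path_id,
        # Correlation field so the path can be traced back to score and threshold.
        "decision_id": decision_record["decision_id"],
        "scenario": scenario,
    }
```

Falling back to a manual review path for unmapped combinations is a conservative design choice assumed here; the source does not say how unmapped flags are handled.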
[0090] S604: Assign a context verification conclusion identifier to the disposal path identifier according to the preset conclusion code table and write it into the path identifier record to obtain the conclusion record.
[0091] A context verification conclusion identifier is assigned to the disposal path identifier according to the preset conclusion code table and written into the path identifier record to obtain the conclusion record. The PathID is read, and the corresponding conclusion code entry is retrieved from the preset conclusion code table to obtain the context verification conclusion identifier, ConclusionCode. The context verification conclusion identifier can be expressed using enumeration codes or hierarchical codes. Enumeration codes can directly represent categories such as passed, requiring review, requiring supplementation, and risk warning. Hierarchical codes can further include business scenario prefixes, risk level suffixes, and version suffixes to support cross-scenario statistics and configuration management. The ConclusionCode is written into the path identifier record, which is supplemented with the conclusion description field, the conclusion generation time identifier field, and the code table version identifier field, and the conclusion record is output. For example, in a healthcare scenario, an automatic approval path can correspond to a passed conclusion identifier, and a manual review path can correspond to a requiring-review conclusion identifier; in a fintech scenario, an audit path can correspond to a risk warning conclusion identifier, and the risk level label that triggered the conclusion can be recorded in the conclusion description field for collaborative processing by claims and risk control.
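As an illustration of the two coding styles, the sketch below derives an enumeration code from a PathID and optionally wraps it into a hierarchical code. The code table entries and the `PREFIX-CODE-LEVEL-VERSION` layout are assumptions; the text only states that prefixes and suffixes may be included.

```python
# Hypothetical conclusion code table: PathID -> enumeration code.
CONCLUSION_TABLE = {
    "AUTO_APPROVE": "PASSED",
    "MANUAL_REVIEW": "REVIEW_REQUIRED",
    "SUPPLEMENT": "SUPPLEMENT_REQUIRED",
    "ESCALATED_AUDIT": "RISK_WARNING",
}


def conclusion_code(path_id: str) -> str:
    """Enumeration-style code for a disposal path."""
    return CONCLUSION_TABLE[path_id]


def hierarchical_code(scenario_prefix: str, path_id: str,
                      risk_level: str, table_version: str) -> str:
    """Hierarchical code: scenario prefix, enumeration code, risk level, version."""
    return "-".join([scenario_prefix, conclusion_code(path_id),
                     risk_level, table_version])
```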
[0092] S605: The conclusion record and the comprehensive risk score are structurally encapsulated to obtain the context verification result.
[0093] The conclusion record and the comprehensive risk score are structurally encapsulated to obtain the context verification result. A ResultPack object can be constructed, and the ConclusionCode, PathID, Flag, threshold entry version identifier, and other fields from the conclusion record can be written into it. The comprehensive risk score is written to the score field of the ResultPack; the threshold field and the score generation time identifier can also be written, for reuse by the business side during display, tracking, and review. To improve the usability of the results, the ResultPack can also record the identifier of the data packet to be verified, the claim application identifier, and the result verification batch identifier, so that the context verification result can be associated with business modules such as claim acceptance, claim risk control, and auditing. After encapsulation, the context verification result is output, allowing subsequent processes to trigger the corresponding review, verification, or audit processing according to the disposal path identifier, and supporting tiered handling of the contextual consistency of claim materials in healthcare and fintech-related business scenarios.
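One possible shape for the ResultPack is a flat, serializable record. ConclusionCode, PathID, and Flag come from the description above; the remaining field names, the JSON serialization, and the sample values are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResultPack:
    # Fields taken from the conclusion record.
    conclusion_code: str
    path_id: str
    flag: str
    threshold_version: str
    # Score-related fields.
    score: float
    threshold: float
    scored_at: str
    # Association identifiers (hypothetical names).
    packet_id: str
    claim_id: str
    batch_id: str

    def to_json(self) -> str:
        """Serialize for downstream acceptance, risk control, or audit modules."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only.
pack = ResultPack("RISK_WARNING", "ESCALATED_AUDIT", "HIGH", "v3",
                  0.91, 0.60, "2026-01-05T09:00:00+00:00",
                  "PKT-001", "CLM-001", "BATCH-001")
payload = pack.to_json()
```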
[0094] As can be seen, this application encapsulates the data of the claims document to be verified and extracts structured elements. On this basis, it performs spatiotemporal alignment and constructs a spatiotemporal consistency tensor to calculate the spatiotemporal contradiction score. At the same time, it transforms the claims event into a semantic knowledge graph and performs semantic matching against the document content to obtain the semantic matching degree. It then fuses the two types of scores and outputs the context verification result after threshold determination. Technically, this achieves automated verification of the contextual consistency between claims documents and claims events. The solution can therefore identify inconsistencies among the time of visit, location of visit, cost items, and treatment semantics in healthcare claims scenarios, and inconsistencies between the time, location, loss assessment, or invoice content and the semantics of the incident in fintech-related insurance claims scenarios. It can output stable risk assessment results even when documents appear authentic and have complete fields but exhibit spatiotemporal contradictions or semantic discrepancies. Compared with methods that rely solely on OCR, template verification, or image-level tampering detection, this solution improves the accuracy and consistency of contextual verification, reduces the probability of erroneous approvals, and supports more efficient review and triage in low-risk situations, which reduces the burden of manual review and improves claims review efficiency and risk control quality.
[0095] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
[0096] In one embodiment, a context verification device for claims documents is provided, which corresponds one-to-one with the context verification method for claims documents in the above embodiments. As shown in Figure 9, the context verification device for claims documents includes: a data acquisition module 100, an element extraction module 200, a spatiotemporal alignment module 300, a graph construction module 400, a data fusion module 500, and a result output module 600.
[0097] Detailed descriptions of each functional module are as follows: the data acquisition module 100 is used to acquire the claim document to be verified and to perform data encapsulation on it to obtain the data packet to be verified; the element extraction module 200 is used to perform element extraction on the data packet to be verified to obtain a structured element set; the spatiotemporal alignment module 300 is used to perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in combination with the data packet to be verified, and calculate the spatiotemporal contradiction score based on the spatiotemporal consistency tensor; the graph construction module 400 is used to construct a semantic graph using the claims events in the data packet to be verified and to obtain the semantic matching degree; the data fusion module 500 is used to perform fusion calculation on the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; the result output module 600 is used to perform threshold determination on the comprehensive risk score to obtain the context verification result of the claim document to be verified.
[0098] In one embodiment, the data acquisition module 100 is specifically used for: initiating a document collection command for the claim document to be verified to perform multi-source collection processing and obtain an original document set; converting the original document set into uniformly encoded document content data, and configuring page order information on the document content data to obtain standardized document content; performing metadata extraction processing on the standardized document content to extract the document type identifier, source identifier, collection time identifier, and page sequence identifier, and integrating them to obtain document metadata; segmenting the standardized document content based on preset segmentation rules, and performing a hash digest operation on the segmentation results to obtain document fingerprint data; and encapsulating the document metadata and the document fingerprint data to obtain the data packet to be verified.
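The segment-and-hash step can be sketched with the Python standard library. Fixed-size byte segments and SHA-256 are assumptions chosen for illustration; the text specifies only "preset segmentation rules" and "a hash digest operation".

```python
import hashlib
from typing import List


def fingerprint(content: bytes, segment_size: int = 1024) -> List[str]:
    """Split content into fixed-size segments and hash each one.

    segment_size and the SHA-256 algorithm are hypothetical choices.
    """
    return [
        hashlib.sha256(content[i:i + segment_size]).hexdigest()
        for i in range(0, len(content), segment_size)
    ]
```

Hashing per segment rather than per document means that a later comparison can localize which part of a document changed, at the cost of storing one digest per segment.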
[0099] In one embodiment, the element extraction module 200 is specifically used for: loading the data packet to be verified to obtain a document content sequence; dividing the document content sequence into several page blocks, recording the coordinate information of each page block, and aggregating them to obtain a page block set; rearranging the reading order of the page block set, sorting and concatenating the page blocks according to the coordinate information to obtain a continuous text stream; configuring sentence position indexes in the continuous text stream to obtain an indexed sentence set; inputting the indexed sentence set into a preset sequence labeling model to obtain a candidate element fragment set, and configuring element type labels and fragment position indexes for the candidate element fragments to obtain a labeled candidate fragment set; converting the numerical values, units, and formats of the labeled candidate fragment set according to a preset normalization mapping table to obtain an element value pair set; and organizing the element value pair set into a structured element set according to the element type labels and fragment position indexes.
[0100] In one embodiment, the spatiotemporal alignment module 300 is specifically used for: performing spatiotemporal anchoring on the structured element set to extract multiple anchor entries, and configuring an index and the corresponding time, location, and cost for each anchor entry to obtain a spatiotemporal anchor sequence; performing auxiliary spatiotemporal record parsing on the data packet to be verified, based on the time range in the spatiotemporal anchor sequence, to obtain an auxiliary spatiotemporal sequence; performing spatiotemporal alignment on the spatiotemporal anchor sequence and the auxiliary spatiotemporal sequence to obtain an aligned record pair sequence; performing spatial difference calculation on the aligned record pair sequence to obtain a distance sequence; performing time span calculation on the aligned record pair sequence in combination with the distance sequence to obtain a time span sequence; and constructing a spatiotemporal consistency tensor from the distance sequence and the time span sequence, and calculating the spatiotemporal contradiction score based on the spatiotemporal consistency tensor.
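To make the distance and time span sequences concrete, the sketch below computes both for aligned record pairs and derives a simple contradiction score. Everything beyond "distance" and "time span" is an assumption: the haversine formula, the list of `(distance, hours)` rows standing in for the tensor, and the implied-speed rule with its `max_speed_kmh` limit are illustrative and are not taken from this application.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def contradiction_score(pairs, max_speed_kmh=120.0):
    """pairs: list of ((time, lat, lon), (time, lat, lon)) aligned records.

    Returns the share of pairs whose distance cannot be covered within the
    time span at max_speed_kmh (a hypothetical plausibility rule), plus the
    (distance_km, hours) rows used as a stand-in for the consistency tensor.
    """
    rows = []
    for (t1, la1, lo1), (t2, la2, lo2) in pairs:
        distance = haversine_km(la1, lo1, la2, lo2)
        hours = abs((t2 - t1).total_seconds()) / 3600.0
        rows.append((distance, hours))
    if not rows:
        return 0.0, rows
    violations = sum(1 for d, h in rows if d > max_speed_kmh * h)
    return violations / len(rows), rows


# Illustrative data: one plausible pair and one implausible pair.
t0 = datetime(2026, 1, 5, 9, 0)
shenzhen = (22.54, 114.06)
beijing = (39.90, 116.40)
pairs = [
    ((t0, *shenzhen), (t0 + timedelta(minutes=30), *shenzhen)),
    ((t0, *shenzhen), (t0 + timedelta(hours=1), *beijing)),
]
score, rows = contradiction_score(pairs)
```

In this example the second pair places the same claimant in two cities roughly two thousand kilometres apart within one hour, so it counts as a violation while the first pair does not.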
[0101] In one embodiment, the graph construction module 400 is specifically used for: performing event element parsing on the claim event information in the data packet to be verified to extract the attribute values of the name element, claim method element, and product type element, and integrating them to obtain an event semantic element set; performing semantic normalization processing on the event semantic element set to obtain a standardized event semantic set; writing the entity items in the standardized event semantic set into the corresponding node set, and writing the relation items in the standardized event semantic set into the corresponding edge set, to obtain a semantic knowledge graph; extracting candidate phrases corresponding to the node categories in the semantic knowledge graph to obtain a document semantic fragment set; performing semantic matching between the document semantic fragment set and the semantic knowledge graph, and selecting the node with the highest similarity as the matching node to obtain a matching result set; and aggregating the semantic similarities in the matching result set according to a preset aggregation rule to obtain the semantic matching degree.
[0102] In one embodiment, the data fusion module 500 is specifically used for: performing score loading processing on the spatiotemporal contradiction score and the semantic matching degree to obtain a scoring input record; performing a subtraction operation on the semantic matching degree to obtain a semantic difference item, and writing the semantic difference item back to the scoring input record to obtain an extended scoring record; concatenating the extended scoring record into a behavioral feature vector according to a preset field mapping rule, and performing historical pattern comparison processing on the behavioral feature vector to obtain a behavior matching record; and reading weight parameters from the preset parameter storage area, and performing a weighted fusion summation of the extended scoring record and the behavior matching record according to the weight parameters to obtain a comprehensive risk score.
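A minimal sketch of the weighted fusion, under stated assumptions: the semantic difference item is taken here as `1 - semantic matching degree` (the passage does not state the minuend), the behavior matching value is supplied as an already computed number, and the weight keys and values are hypothetical.

```python
def fuse(st_score: float, sem_match: float, behavior_match: float,
         weights: dict):
    """Return (comprehensive risk score, extended scoring record)."""
    # Assumption: semantic difference = 1 - semantic matching degree.
    sem_diff = 1.0 - sem_match
    extended = {"st_score": st_score, "sem_match": sem_match, "sem_diff": sem_diff}
    # Weighted sum of the contradiction score, the semantic difference item,
    # and the behavior matching value.
    risk = (weights["st"] * st_score
            + weights["sem"] * sem_diff
            + weights["beh"] * behavior_match)
    return risk, extended


# Illustrative weights only.
risk, extended = fuse(0.5, 0.5, 1.0, {"st": 0.5, "sem": 0.25, "beh": 0.25})
```

Converting the matching degree into a difference item before summation keeps all three terms pointing in the same direction, so that a larger value always means higher risk.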
[0103] In one embodiment, the result output module 600 is specifically used for: performing threshold retrieval processing on the comprehensive risk score to retrieve the risk threshold corresponding to the comprehensive risk score from a preset threshold configuration table; calculating the relationship between the comprehensive risk score and the risk threshold to obtain a judgment flag record; mapping the judgment flag record to a disposal path identifier to obtain a path identifier record; assigning a context verification conclusion identifier to the disposal path identifier according to the preset conclusion code table and writing it into the path identifier record to obtain a conclusion record; and structurally encapsulating the conclusion record and the comprehensive risk score to obtain the context verification result.
[0104] For specific limitations regarding the context verification device for claims documents, please refer to the limitations of the context verification method for claims documents above, which will not be repeated here. Each module in the aforementioned context verification device for claims documents can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module.
[0105] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 10. The computer device includes a processor, memory, network interface, and database connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface is used to communicate with external clients via a network connection. When executed by the processor, the computer program implements the server-side functions or steps of a claims document context verification method.
[0106] In one embodiment, a computer device is provided, which may be a client, and its internal structure may be as shown in Figure 11. The computer device includes a processor, memory, network interface, display screen, and input devices connected via a system bus. The processor provides computing and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface is used to communicate with an external server via a network connection. When executed by the processor, the computer program implements the client-side functions or steps of a claims document context verification method.
[0107] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed, can perform the steps provided in the above embodiments.
[0108] It should be noted that, for the functions or steps that can be implemented by the computer-readable storage medium or computer device described above, reference can be made to the relevant server-side and client-side descriptions in the foregoing method embodiments. To avoid repetition, they are not described one by one here.
[0109] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0110] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
[0111] It should be noted that any AI models, software tools, or components not belonging to this company appearing in the embodiments of this application are merely illustrative examples and do not represent actual use. All user personal information involved in the embodiments of this application has been authorized (with the knowledge and consent) by the relevant parties or has been fully authorized by all parties, and the executing entity may obtain it through various legal and compliant means. The collection, storage, use, processing, transmission, provision, and disclosure of the information, data, and signals involved all comply with relevant laws and regulations and do not violate public order and good morals.
[0112] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims
1. A context verification method for claims documents, characterized in that it comprises: obtaining the claim document to be verified, and performing data encapsulation on the claim document to be verified to obtain a data packet to be verified; performing element extraction on the data packet to be verified to obtain a structured element set; performing spatiotemporal alignment on the structured element set, constructing a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculating a spatiotemporal contradiction score based on the spatiotemporal consistency tensor; constructing a semantic graph using the claims events in the data packet to be verified to obtain the semantic matching degree; performing fusion calculation on the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and performing threshold determination on the comprehensive risk score to obtain the context verification result of the claim document to be verified.
2. The context verification method for claims documents according to claim 1, characterized in that the step of obtaining the claim document to be verified and performing data encapsulation on the claim document to be verified to obtain a data packet to be verified includes: initiating a document collection command for the claim document to be verified to perform multi-source collection processing and obtain an original document set; converting the original document set into uniformly encoded document content data, and configuring page order information on the document content data to obtain standardized document content; performing metadata extraction processing on the standardized document content to extract the document type identifier, source identifier, collection time identifier, and page sequence identifier, and integrating them to obtain document metadata; segmenting the standardized document content based on preset segmentation rules, and performing a hash digest operation on the segmentation results to obtain document fingerprint data; and encapsulating the document metadata and the document fingerprint data to obtain the data packet to be verified.
3. The context verification method for claims documents according to claim 1, characterized in that the step of performing element extraction on the data packet to be verified to obtain a structured element set includes: loading the data packet to be verified to obtain a document content sequence; dividing the document content sequence into several page blocks, recording the coordinate information of each page block, and aggregating them to obtain a page block set; rearranging the reading order of the page block set, sorting and concatenating the page blocks according to the coordinate information to obtain a continuous text stream; configuring sentence position indexes in the continuous text stream to obtain an indexed sentence set; inputting the indexed sentence set into a preset sequence labeling model to obtain a candidate element fragment set, and configuring element type labels and fragment position indexes for the candidate element fragments to obtain a labeled candidate fragment set; converting the numerical values, units, and formats of the labeled candidate fragment set according to a preset normalization mapping table to obtain an element value pair set; and organizing the element value pair set into a structured element set according to the element type labels and fragment position indexes.
4. The context verification method for claims documents according to claim 1, characterized in that the step of performing spatiotemporal alignment on the structured element set, constructing a spatiotemporal consistency tensor in conjunction with the data packet to be verified, and calculating a spatiotemporal contradiction score based on the spatiotemporal consistency tensor includes: performing spatiotemporal anchoring on the structured element set to extract multiple anchor entries, and configuring an index and the corresponding time, location, and cost for each anchor entry to obtain a spatiotemporal anchor sequence; performing auxiliary spatiotemporal record parsing on the data packet to be verified, based on the time range in the spatiotemporal anchor sequence, to obtain an auxiliary spatiotemporal sequence; performing spatiotemporal alignment on the spatiotemporal anchor sequence and the auxiliary spatiotemporal sequence to obtain an aligned record pair sequence; performing spatial difference calculation on the aligned record pair sequence to obtain a distance sequence; performing time span calculation on the aligned record pair sequence in combination with the distance sequence to obtain a time span sequence; and constructing a spatiotemporal consistency tensor from the distance sequence and the time span sequence, and calculating the spatiotemporal contradiction score based on the spatiotemporal consistency tensor.
5. The context verification method for claims documents according to claim 1, characterized in that the step of constructing a semantic graph using the claims events in the data packet to be verified to obtain the semantic matching degree includes: performing event element parsing on the claim event information in the data packet to be verified to extract the attribute values of the name element, claim method element, and product type element, and integrating them to obtain an event semantic element set; performing semantic normalization processing on the event semantic element set to obtain a standardized event semantic set; writing the entity items in the standardized event semantic set into the corresponding node set, and writing the relation items in the standardized event semantic set into the corresponding edge set, to obtain a semantic knowledge graph; extracting candidate phrases corresponding to the node categories in the semantic knowledge graph to obtain a document semantic fragment set; performing semantic matching between the document semantic fragment set and the semantic knowledge graph, and selecting the node with the highest similarity as the matching node to obtain a matching result set; and aggregating the semantic similarities in the matching result set according to a preset aggregation rule to obtain the semantic matching degree.
6. The context verification method for claims documents according to claim 1, characterized in that the step of performing fusion calculation on the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score includes: performing score loading processing on the spatiotemporal contradiction score and the semantic matching degree to obtain a scoring input record; performing a subtraction operation on the semantic matching degree to obtain a semantic difference item, and writing the semantic difference item back to the scoring input record to obtain an extended scoring record; concatenating the extended scoring record into a behavioral feature vector according to a preset field mapping rule, and performing historical pattern comparison processing on the behavioral feature vector to obtain a behavior matching record; and reading weight parameters from the preset parameter storage area, and performing a weighted fusion summation of the extended scoring record and the behavior matching record according to the weight parameters to obtain a comprehensive risk score.
7. The context verification method for claims documents according to claim 1, characterized in that the step of performing threshold determination on the comprehensive risk score to obtain the context verification result of the claim document to be verified includes: performing threshold retrieval processing on the comprehensive risk score to retrieve the risk threshold corresponding to the comprehensive risk score from a preset threshold configuration table; calculating the relationship between the comprehensive risk score and the risk threshold to obtain a judgment flag record; mapping the judgment flag record to a disposal path identifier to obtain a path identifier record; assigning a context verification conclusion identifier to the disposal path identifier according to the preset conclusion code table and writing it into the path identifier record to obtain a conclusion record; and structurally encapsulating the conclusion record and the comprehensive risk score to obtain the context verification result.
8. A context verification device for claims documents, characterized in that it comprises: a data acquisition module, used to acquire the claim document to be verified and to perform data encapsulation on it to obtain the data packet to be verified; an element extraction module, used to perform element extraction on the data packet to be verified to obtain a structured element set; a spatiotemporal alignment module, used to perform spatiotemporal alignment on the structured element set, construct a spatiotemporal consistency tensor in combination with the data packet to be verified, and calculate the spatiotemporal contradiction score based on the spatiotemporal consistency tensor; a graph construction module, used to construct a semantic graph using the claims events in the data packet to be verified and to obtain the semantic matching degree; a data fusion module, used to perform fusion calculation on the spatiotemporal contradiction score and the semantic matching degree to obtain a comprehensive risk score; and a result output module, used to perform threshold determination on the comprehensive risk score to obtain the context verification result of the claim document to be verified.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the context verification method for claims documents as described in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the context verification method for claims documents as described in any one of claims 1 to 7.