Method, system, and computer program product for augmenting embeddings
Augmenting partition embeddings with contextual information improves data retrieval accuracy by appending contextual embeddings, addressing relevance issues in partitioned datasets.
Patent Information
- Authority / Receiving Office: WO · WO
- Patent Type: Applications
- Current Assignee / Owner: VISA INTERNATIONAL SERVICE ASSOCIATION
- Filing Date: 2024-12-26
- Publication Date: 2026-07-02
AI Technical Summary
Querying data from partitioned datasets often results in relevance issues due to insufficient contextual information, undermining the accuracy of retrieved results.
The method appends a contextual partition embedding to each partition embedding to form augmented partition embeddings, then stores the augmented partition embeddings in a database for retrieval and relevance identification.
Each partition embedding thereby carries additional context, which is intended to make query results more accurate and data retrieval more precise.
Description
Attorney Docket No. 08223-2406684 (9071 WO01)
METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR AUGMENTING EMBEDDINGS
BACKGROUND
1. Field
[0001] This disclosure relates generally to augmenting embeddings and, in some non-limiting embodiments or aspects, to methods, systems, and computer program products for augmenting embeddings.
2. Technical Considerations
[0002] When querying data, identifying and retrieving the portions (e.g., partitions) contextually relevant to the query from a larger data set is a non-trivial task. Breaking the data into smaller partitions and storing those partitions in a database can aid in the efficient retrieval of the data relevant to the query. However, breaking the data into partitions can cause the partitions to not have enough contextual information, making it harder for the system to determine whether the data in the partition is relevant to the query. This can undermine the relevance of the results returned for the query.
SUMMARY
[0003] Accordingly, it is an object of the present disclosure to provide methods, systems, and computer program products for augmenting embeddings that overcome some or all of the deficiencies identified above.
[0004] According to non-limiting embodiments or aspects, provided is a computer-implemented method for augmenting embeddings. The method may include receiving, with at least one processor, a data set; identifying, with at least one processor, a contextual partition of the data set; partitioning, with at least one processor, the data set into a plurality of different partitions, each partition of the plurality of partitions including a subset of the data set; inputting, with at least one processor, the contextual partition and the plurality of partitions into an embedding model; generating, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augmenting, with at least one processor, each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; storing, with at least one processor, the plurality of augmented partition embeddings in a database; receiving, with at least one processor, a query associated with the data set; in response to receiving the query, searching, with at least one processor, the plurality of augmented partition embeddings in the database based on the query; and automatically identifying, with at least one processor, at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
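The steps recited above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: `toy_embed` is a hashed bag-of-words stand-in for an embedding model, the "database" is an in-memory list, partitions are fixed-size word windows, and the query vector is widened by repeating the query embedding so that it matches the width of the augmented partition embeddings.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding width for the toy model


def toy_embed(text, dim=DIM):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def partition_words(data_set, size):
    """Split the data set into consecutive partitions of `size` words."""
    words = data_set.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def augment(partition_embedding, context_embedding):
    """Append the contextual partition embedding to a partition embedding."""
    return partition_embedding + context_embedding  # list concatenation


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(data_set, contextual_partition, size=8):
    """Embed every partition, augment it, and store it in a list 'database'."""
    context_embedding = toy_embed(contextual_partition)
    index = []
    for part in partition_words(data_set, size):
        index.append((part, augment(toy_embed(part), context_embedding)))
    return index


def search(index, query, top_k=1):
    """Rank stored augmented embeddings by dot product with the query."""
    q = toy_embed(query)
    q_aug = q + q  # assumption: repeat the query embedding to match widths
    ranked = sorted(index, key=lambda item: dot(q_aug, item[1]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    doc = "fees apply to cross border payments settlement occurs within two days"
    index = build_index(doc, "summary of payment network rules", size=4)
    for text, _ in search(index, "settlement days", top_k=2):
        print(text)
```

Because the same contextual embedding is appended to every partition of one data set, it does not change the ranking of partitions within that data set in this sketch; it would matter when partitions from several data sets with different contextual partitions share one database.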
[0005] In some non-limiting embodiments or aspects, the computer-implemented method may further include in response to receiving the query, generating, with the embedding model, a query embedding associated with the query, where the searching the plurality of augmented partition embeddings in the database may be based on the query embedding.
[0006] In some non-limiting embodiments or aspects, the computer-implemented method may further include: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, inputting the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generating, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
[0007] In some non-limiting embodiments or aspects, the computer-implemented method may further include: inputting the query and / or a query embedding corresponding to the query to the LLM, where the query response message to the query may be generated based on the query and / or the query embedding corresponding to the query.
[0008] In some non-limiting embodiments or aspects, identifying the contextual partition may include automatically identifying, with a model, the contextual partition.
[0009] In some non-limiting embodiments or aspects, identifying the contextual partition may include: automatically identifying, with a reinforcement learning model, a predicted contextual partition; generating, with the reinforcement learning model, a feedback request message including the predicted contextual partition; transmitting the feedback request message to a client device; and receiving, with the reinforcement learning model and from the client device, a feedback response message confirming the predicted contextual partition as the contextual partition.
[0010] In some non-limiting embodiments or aspects, identifying the contextual partition may include receiving the contextual partition from a client device.
[0011] In some non-limiting embodiments or aspects, identifying the contextual partition may include: inputting the data set to a generative artificial intelligence model; and generating, with the generative artificial intelligence model, the contextual partition based on the data set.
[0012] In some non-limiting embodiments or aspects, the plurality of partitions may include a first partition and a second partition, where the first partition and the second partition may include an overlap of data from the data set included therein.
[0013] In some non-limiting embodiments or aspects, the plurality of partitions may include a first partition and a second partition, where the first partition and the second partition may not include an overlap of data from the data set included therein.
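The overlapping and non-overlapping partitioning options described in the two preceding paragraphs can be sketched with a sliding window. The function name `partition_tokens` and its `size` and `overlap` parameters are hypothetical; the disclosure does not prescribe a partition size or a windowing scheme.

```python
def partition_tokens(tokens, size, overlap=0):
    """Split `tokens` into windows of `size` items.

    overlap=0 gives disjoint partitions; overlap>0 makes each partition
    share its first `overlap` items with the end of the previous one.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    parts = []
    for start in range(0, len(tokens), step):
        parts.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the data
    return parts


tokens = list("abcdefgh")
print(partition_tokens(tokens, size=4))             # 2 disjoint partitions
print(partition_tokens(tokens, size=4, overlap=2))  # 3 overlapping partitions
```

Overlap trades storage for continuity: content that straddles a partition boundary appears whole in at least one partition, at the cost of embedding some data more than once.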
[0014] In some non-limiting embodiments or aspects, the contextual partition may represent a contextual summary of the data set.
[0015] In some non-limiting embodiments or aspects, appending the contextual partition embedding to each of the plurality of partition embeddings may include concatenating the contextual partition embedding to each of the plurality of partition embeddings to form the plurality of augmented partition embeddings.
[0016] In some non-limiting embodiments or aspects, the contextual partition embedding and / or the plurality of partition embeddings may be context-aware embeddings.
[0017] In some non-limiting embodiments or aspects, searching the plurality of augmented partition embeddings in the database may include executing a semantic search of the plurality of augmented partition embeddings based on the query.
[0018] In some non-limiting embodiments or aspects, a first augmented partition embedding of the plurality of augmented partition embeddings may include the contextual partition embedding appended to a first partition embedding of the plurality of partition embeddings, and the query may include a query message. The computer-implemented method may further include: in response to receiving the query, generating, with the embedding model, a query embedding associated with the query message; appending, with at least one processor, the query embedding to a second query embedding associated with the query to form an augmented query embedding, where automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query may include executing a dot product of the augmented query embedding and the augmented partition embedding.
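A small numeric sketch may clarify why a dot product over concatenated vectors is a natural comparison: it equals the sum of the dot products of the corresponding parts. The vector values below are made up, and the layout is an assumption, since the paragraph above does not fix the order of the two query parts. Here the query message embedding is aligned with the partition embedding and the second query embedding is aligned with the contextual partition embedding.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Made-up two-dimensional embeddings, for illustration only.
partition_embedding = [0.5, 0.5]
contextual_embedding = [1.0, 0.0]
augmented_partition = partition_embedding + contextual_embedding

query_embedding = [0.0, 1.0]         # embedding of the query message
second_query_embedding = [1.0, 0.0]  # assumed to describe the query's context
augmented_query = query_embedding + second_query_embedding

score = dot(augmented_query, augmented_partition)
content_score = dot(query_embedding, partition_embedding)
context_score = dot(second_query_embedding, contextual_embedding)

# The augmented score is the content match plus the context match.
assert score == content_score + context_score
print(score)  # 1.5
```

Under this layout, a partition scores highly only when its own content matches the query message or its data set's context matches the second query embedding, or both.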
[0019] In some non-limiting embodiments or aspects, automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching may include: for each augmented partition embedding of the plurality of augmented partition embeddings, generating a relevance score based on a comparison of the augmented partition embedding and the query embedding; and in response to determining that the relevance score satisfies a relevance threshold, identifying the augmented partition embedding as relevant to the query.
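The per-embedding scoring and threshold check described above might look like the following sketch. Cosine similarity, the `>=` comparison, and the default threshold of 0.8 are assumptions chosen for illustration; the disclosure leaves the comparison and the meaning of "satisfies" open. The query embedding is assumed to have the same width as the augmented partition embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    numerator = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return numerator / (norm_a * norm_b)


def relevant_embeddings(augmented_embeddings, query_embedding, threshold=0.8):
    """Return (index, score) pairs whose relevance score meets the threshold."""
    hits = []
    for i, embedding in enumerate(augmented_embeddings):
        score = cosine_similarity(embedding, query_embedding)
        if score >= threshold:  # assumption: "satisfies" means greater than or equal
            hits.append((i, score))
    return hits


stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(relevant_embeddings(stored, [1.0, 0.0], threshold=0.7))
```

With these made-up vectors, the first and third stored embeddings meet the 0.7 threshold and the second does not.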
[0020] According to non-limiting embodiments or aspects, provided is a system for augmenting embeddings. The system may include at least one processor configured to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions including a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0021] In some non-limiting embodiments or aspects, the at least one processor may be further configured to: in response to receiving the query, generate, with the embedding model, a query embedding associated with the query, where the searching the plurality of augmented partition embeddings in the database may be based on the query embedding.
[0022] In some non-limiting embodiments or aspects, the at least one processor may be further configured to: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, input the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generate, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
[0023] According to non-limiting embodiments or aspects, provided is a computer program product for augmenting embeddings. The computer program product may include at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions including a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0024] Further embodiments or aspects are set forth in the following numbered clauses:
[0025] Clause 1: A computer-implemented method, comprising: receiving, with at least one processor, a data set; identifying, with at least one processor, a contextual partition of the data set; partitioning, with at least one processor, the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; inputting, with at least one processor, the contextual partition and the plurality of partitions into an embedding model; generating, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augmenting, with at least one processor, each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; storing, with at least one processor, the plurality of augmented partition embeddings in a database; receiving, with at least one processor, a query associated with the data set; in response to receiving the query, searching, with at least one processor, the plurality of augmented partition embeddings in the database based on the query; and automatically identifying, with at least one processor, at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0026] Clause 2: The computer-implemented method of clause 1, further comprising: in response to receiving the query, generating, with the embedding model, a query embedding associated with the query, wherein the searching the plurality of augmented partition embeddings in the database is based on the query embedding.
[0027] Clause 3: The computer-implemented method of clause 1 or 2, further comprising: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, inputting the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generating, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
[0028] Clause 4: The computer-implemented method of any of clauses 1-3, further comprising: inputting the query and / or a query embedding corresponding to the query to the LLM, wherein the query response message to the query is generated based on the query and / or the query embedding corresponding to the query.
[0029] Clause 5: The computer-implemented method of any of clauses 1-4, wherein identifying the contextual partition comprises automatically identifying, with a model, the contextual partition.
[0030] Clause 6: The computer-implemented method of any of clauses 1-5, wherein identifying the contextual partition comprises: automatically identifying, with a reinforcement learning model, a predicted contextual partition; generating, with the reinforcement learning model, a feedback request message comprising the predicted contextual partition; transmitting the feedback request message to a client device; and receiving, with the reinforcement learning model and from the client device, a feedback response message confirming the predicted contextual partition as the contextual partition.
[0031] Clause 7: The computer-implemented method of any of clauses 1-6, wherein identifying the contextual partition comprises receiving the contextual partition from a client device.
[0032] Clause 8: The computer-implemented method of any of clauses 1-7, wherein identifying the contextual partition comprises: inputting the data set to a generative artificial intelligence model; and generating, with the generative artificial intelligence model, the contextual partition based on the data set.
[0033] Clause 9: The computer-implemented method of any of clauses 1-8, wherein the plurality of partitions comprises a first partition and a second partition, wherein the first partition and the second partition comprise an overlap of data from the data set included therein.
[0034] Clause 10: The computer-implemented method of any of clauses 1-9, wherein the plurality of partitions comprises a first partition and a second partition, wherein the first partition and the second partition do not comprise an overlap of data from the data set included therein.
[0035] Clause 11: The computer-implemented method of any of clauses 1-10, wherein the contextual partition represents a contextual summary of the data set.
[0036] Clause 12: The computer-implemented method of any of clauses 1-11, wherein appending the contextual partition embedding to each of the plurality of partition embeddings comprises concatenating the contextual partition embedding to each of the plurality of partition embeddings to form the plurality of augmented partition embeddings.
[0037] Clause 13: The computer-implemented method of any of clauses 1-12, wherein the contextual partition embedding and / or the plurality of partition embeddings are context-aware embeddings.
[0038] Clause 14: The computer-implemented method of any of clauses 1-13, wherein searching the plurality of augmented partition embeddings in the database comprises executing a semantic search of the plurality of augmented partition embeddings based on the query.
[0039] Clause 15: The computer-implemented method of any of clauses 1-14, wherein a first augmented partition embedding of the plurality of augmented partition embeddings comprises the contextual partition embedding appended to a first partition embedding of the plurality of partition embeddings, and the query comprises a query message, the computer-implemented method further comprising: in response to receiving the query, generating, with the embedding model, a query embedding associated with the query message; appending, with at least one processor, the query embedding to a second query embedding associated with the query to form an augmented query embedding, wherein automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query comprises executing a dot product of the augmented query embedding and the augmented partition embedding.
[0040] Clause 16: The computer-implemented method of any of clauses 1-15, wherein automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching comprises: for each augmented partition embedding of the plurality of augmented partition embeddings, generating a relevance score based on a comparison of the augmented partition embedding and the query embedding; and in response to determining that the relevance score satisfies a relevance threshold, identifying the augmented partition embedding as relevant to the query.
[0041] Clause 17: A system comprising at least one processor configured to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0042] Clause 18: The system of clause 17, the at least one processor further configured to: in response to receiving the query, generate, with the embedding model, a query embedding associated with the query, wherein the searching the plurality of augmented partition embeddings in the database is based on the query embedding.
[0043] Clause 19: The system of clause 17 or 18, the at least one processor further configured to: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, input the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generate, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
[0044] Clause 20: A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0045] These and other features and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structures and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0046] Additional advantages and details are explained in greater detail below with reference to the non-limiting, exemplary embodiments that are illustrated in the accompanying schematic figures, in which:
[0047] FIG. 1 is a schematic diagram of an embedding system, according to some non-limiting embodiments or aspects;
[0048] FIG. 2 is a schematic diagram of a contextual partition system, according to some non-limiting embodiments or aspects;
[0049] FIG. 3 is a schematic diagram of an example data set (e.g., document thereof) that has been partitioned, according to some non-limiting embodiments or aspects;
[0050] FIG. 4 is a schematic diagram of a partitioning system and concatenating system, according to some non-limiting embodiments or aspects;
[0051] FIG. 5 is a schematic diagram of a query system, according to some non-limiting embodiments or aspects;
[0052] FIG. 6 is a flow diagram for an example process for augmenting embeddings, according to some non-limiting embodiments or aspects;
[0053] FIG. 7 is a flow diagram for an example process for augmenting embeddings, according to some non-limiting embodiments or aspects;
[0054] FIG. 8 is a schematic diagram of an example electronic payment processing network, according to some non-limiting embodiments or aspects; and
[0055] FIG. 9 is a schematic diagram of example components of one or more devices of FIGS. 1, 2, 4, 5, and 8, according to some non-limiting embodiments or aspects.
DETAILED DESCRIPTION
[0056] For purposes of the description hereinafter, the terms “end,” “upper,” “lower,” “right,” “left,” “vertical,” “horizontal,” “top,” “bottom,” “lateral,” “longitudinal,” and derivatives thereof shall relate to the embodiments as they are oriented in the drawing figures. However, it is to be understood that the present disclosure may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary and non-limiting embodiments or aspects of the disclosed subject matter. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting.
[0057] Some non-limiting embodiments or aspects are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
[0058] No aspect, component, element, structure, act, step, function, instruction, and / or the like used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more” and “at least one.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and / or the like) and may be used interchangeably with “one or more” or “at least one.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise. In addition, reference to an action being “based on” a condition may refer to the action being “in response to” the condition. For example, the phrases “based on” and “in response to” may, in some non-limiting embodiments or aspects, refer to a condition for automatically triggering an action (e.g., a specific operation of an electronic device, such as a computing device, a processor, and / or the like).
[0059] As used herein, the term “acquirer institution” may refer to an entity licensed and / or approved by a transaction service provider to originate transactions (e.g., payment transactions) using a payment device associated with the transaction service provider. The transactions the acquirer institution may originate may include payment transactions (e.g., purchases, original credit transactions (OCTs), account funding transactions (AFTs), and / or the like). In some non-limiting embodiments or aspects, an acquirer institution may be a financial institution, such as a bank. As used herein, the term “acquirer system” may refer to one or more computing devices operated by or on behalf of an acquirer institution, such as a server computer executing one or more software applications.
[0060] As used herein, the term “account identifier” may include one or more primary account numbers (PANs), tokens, or other identifiers associated with a customer account. The term “token” may refer to an identifier that is used as a substitute or replacement identifier for an original account identifier, such as a PAN. Account identifiers may be alphanumeric or any combination of characters and / or symbols. Tokens may be associated with a PAN or other original account identifier in one or more data structures (e.g., one or more databases, and / or the like) such that they may be used to conduct a transaction without directly using the original account identifier. In some examples, an original account identifier, such as a PAN, may be associated with a plurality of tokens for different individuals or purposes.
[0061] As used herein, the terms “client” and “client device” may refer to one or more client-side devices or systems. As an example, a “client device” may refer to one or more computing devices. In some non-limiting embodiments or aspects, a client device may be an electronic device configured to communicate with one or more networks. For example, a client device may include one or more computers, portable computers, laptop computers, tablet computers, mobile devices, cellular phones, wearable devices (e.g., watches, glasses, lenses, clothing, and / or the like), PDAs, and / or the like.
[0062] As used herein, the term “communication” may refer to the reception, receipt, transmission, transfer, provision, and / or the like of data (e.g., information, signals, messages, instructions, commands, and / or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and / or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and / or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and / or the like) that is wired and / or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and / or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit processes information received from the first unit and communicates the processed information to the second unit. In some non-limiting embodiments or aspects, a message may refer to a network packet (e.g., a data packet and / or the like) that includes data. It will be appreciated that numerous other arrangements are possible.
[0063] As used herein, the term “computing device” may refer to one or more electronic devices configured to process data. A computing device may, in some examples, include the necessary components to receive, process, and output data, such as a processor, a display, a memory, an input device, a network interface, and / or the like. A computing device may be a mobile device. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and / or the like), a personal digital assistant (PDA), and / or other like devices. A computing device may also be a desktop computer or other form of non-mobile computer.
[0064] As used herein, the term “issuer institution” may refer to one or more entities, such as a bank, that provide accounts to customers for conducting transactions (e.g., payment transactions), such as initiating credit and / or debit payments. For example, an issuer institution may provide an account identifier, such as a PAN, to a customer that uniquely identifies one or more accounts associated with that customer. The account identifier may be embodied on a payment device, such as a physical financial instrument, e.g., a payment card, and / or may be electronic and used for electronic payments. The term “issuer system” refers to one or more computer devices operated by or on behalf of an issuer institution, such as a server computer executing one or more software applications. For example, an issuer system may include one or more authorization servers for authorizing a transaction.
[0065] As used herein, the term “merchant” may refer to an individual or entity that provides goods and / or services, or access to goods and / or services, to customers based on a transaction, such as a payment transaction. The term “merchant” or “merchant system” may also refer to one or more computer systems operated by or on behalf of a merchant, such as a server computer executing one or more software applications.
[0066] As used herein, the term “payment device” may refer to a payment card (e.g., a credit or debit card), a gift card, a smartcard, smart media, a payroll card, a healthcare card, a wristband, a machine-readable medium containing account information, a keychain device or fob, an RFID transponder, a retailer discount or loyalty card, a cellular phone, an electronic wallet mobile application, a personal digital assistant (PDA), a pager, a security card, a computing device, an access card, a wireless terminal, a transponder, and / or the like. In some non-limiting embodiments or aspects, the payment device may include volatile or non-volatile memory to store information (e.g., an account identifier, a name of the account holder, and / or the like).
[0067] As used herein, the term “payment gateway” may refer to an entity and / or a payment processing system operated by or on behalf of such an entity (e.g., a merchant service provider, a payment service provider, a payment facilitator, a payment facilitator that contracts with an acquirer, a payment aggregator, and / or the like), which provides payment services (e.g., transaction service provider payment services, payment processing services, and / or the like) to one or more merchants. The payment services may be associated with the use of portable financial devices managed by a transaction service provider. As used herein, the term “payment gateway system” may refer to one or more computer systems, computer devices, servers, groups of servers, and / or the like, operated by or on behalf of a payment gateway.
[0068] As used herein, a “point-of-sale (POS) device” may refer to one or more devices, which may be used by a merchant to conduct a transaction (e.g., a payment transaction) and / or process a transaction. For example, a POS device may include one or more client devices. Additionally or alternatively, a POS device may include peripheral devices, card readers, scanning devices (e.g., code scanners), Bluetooth® communication receivers, near-field communication (NFC) receivers, radio frequency identification (RFID) receivers, and / or other contactless transceivers or receivers, contact-based receivers, payment terminals, and / or the like. As used herein, a “point-of-sale (POS) system” may refer to one or more client devices and / or peripheral devices used by a merchant to conduct a transaction. For example, a POS system may include one or more POS devices and / or other like devices that may be used to conduct a payment transaction. In some non-limiting embodiments or aspects, a POS system (e.g., a merchant POS system) may include one or more server computers configured to process online payment transactions through webpages, mobile applications, and / or the like.
[0069] As used herein, the term “server” may refer to or include one or more computing devices that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computing devices (e.g., servers, point-of-sale (POS) devices, mobile devices, etc.) directly or indirectly communicating in the network environment may constitute a “system.”
[0070] As used herein, the term “system” may refer to one or more computing devices or combinations of computing devices (e.g., processors, servers, client devices, software applications, components of such, and / or the like). Reference to “a device,” “a server,” “a processor,” and / or the like, as used herein, may refer to a previously-recited device, server, or processor that is recited as performing a previous step or function, a different device, server, or processor, and / or a combination of devices, servers, and / or processors. For example, as used in the specification and the claims, a first device, a first server, or a first processor that is recited as performing a first step or a first function may refer to the same or different device, server, or processor recited as performing a second step or a second function.
[0071] As used herein, the term “transaction service provider” may refer to an entity that receives transaction authorization requests from merchants or other entities and provides guarantees of payment, in some cases through an agreement between the transaction service provider and an issuer institution. For example, a transaction service provider may include a payment network such as Visa® or any other entity that processes transactions. The term “transaction processing system” may refer to one or more computer systems operated by or on behalf of a transaction service provider, such as a transaction processing server executing one or more software applications. A transaction processing server may include one or more processors and, in some non-limiting embodiments or aspects, may be operated by or on behalf of a transaction service provider.
[0072] Non-limiting embodiments or aspects of the disclosed subject matter are directed to systems, methods, and computer program products for augmenting embeddings. For example, non-limiting embodiments or aspects identify a contextual partition of a data set and further partition the data set into a plurality of partitions. The contextual partition may represent a contextual summary of the data set (e.g., of a document in the data set). An embedding of the contextual partition may be generated by an embedding model, forming a contextual partition embedding. For each partition, an embedding may be generated by the embedding model, forming a plurality of partition embeddings.
[0073] Non-limiting embodiments or aspects may improve the partition embeddings by augmentation. Augmenting the partition embeddings may comprise appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings. These augmented partition embeddings may be stored in a queryable database.
[0074] The data set may be queried to extract the data therein most relevant to the query. In response to receiving a query, an embedding of the query message may be generated by the embedding model to form a query embedding. Using the query embedding, the embedding database may be searched (e.g., a semantic search) to identify relevant augmented partition embeddings. The relevant augmented partition embeddings and / or the partition corresponding thereto may be input to a large language model (LLM), which LLM may be used to generate a response to the query. The LLM may interact conversationally with a user device to receive queries and respond with user-interpretable responses based on the query results.
[0075] Non-limiting embodiments or aspects may improve querying systems configured to query data sets to return results to queries that are most relevant to the queries. Appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings may improve the original partition embeddings by providing enhanced context to each partition embedding. The enhanced context may enable the query system to efficiently return more accurate results, thus improving the query system itself.
[0076] For the purpose of illustration, in the following description, while the presently disclosed subject matter is described with respect to methods, systems, and computer program products for augmenting embeddings, one skilled in the art will recognize that the disclosed subject matter is not limited to the illustrative embodiments.
[0077] FIG. 1 depicts a non-limiting embodiment or aspect of an embedding system 100 configured to augment embeddings according to some non-limiting embodiments or aspects. Embedding system 100 may include a data set 102 stored in a database, a data processing system 104, and an embedding database 112. Data processing system 104 may comprise a parsing system 106, a partition system 108, and an embedding model 110.
[0078] Data set 102 may be stored in a database. In some non-limiting embodiments or aspects, the database may store data records. In some non-limiting embodiments or aspects, the database storing data set 102 may be in communication with data processing system 104 and / or embedding database 112, e.g., to communicate (e.g., send and / or receive) at least a portion of data set 102 to and / or from data processing system 104 and / or embedding database 112.
[0079] In some non-limiting embodiments or aspects, the database may store data set 102. Data set 102 may comprise one or more documents (e.g., digital documents). Data set 102 may comprise any type of document. Non-limiting examples of documents include, but are not limited to, user guides, design documents, reports, research papers, articles, account statements, and the like. The documents (and / or the embeddings thereof) may be searchable by the user using a query system as described herein.
[0080] Data processing system 104 may include at least one computing device, as described herein. In some non-limiting embodiments or aspects, data processing system 104 may include at least one processor (e.g., a multi-core processor) such as a graphics processing unit (GPU), a central processing unit (CPU), an accelerated processing unit (APU), a microprocessor, and / or the like. Data processing system 104 may include parsing system 106, partition system 108, and / or embedding model 110. Each of parsing system 106, partition system 108, and / or embedding model 110 may include at least one processor (e.g., a multi-core processor) such as a graphics processing unit (GPU), a central processing unit (CPU), an accelerated processing unit (APU), a microprocessor, and / or the like. The input to data processing system 104 may comprise data set 102, and the output of data processing system 104 may comprise embeddings associated with data set 102 as described herein. Data processing system 104 may store the output (e.g., the embeddings) in embedding database 112.
[0081] Embedding database 112 may include at least one computing device, as described herein. In some non-limiting embodiments or aspects, embedding database 112 may store embeddings generated by data processing system 104. In some non-limiting embodiments or aspects, embedding database 112 may be in communication with data processing system 104 and / or data set 102, e.g., to communicate (e.g., send and / or receive) at least a portion of the embeddings to and / or from data processing system 104 and / or data set 102.
[0082] The number and arrangement of systems and devices shown in FIG. 1 are provided as an example. There may be additional systems and / or devices, fewer systems and / or devices, different systems and / or devices, and / or differently arranged systems and / or devices than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of system 100 may perform one or more functions described as being performed by another set of systems or another set of devices of system 100.
[0083] With continued reference to FIG. 1, in some non-limiting embodiments or aspects, parsing system 106 of data processing system 104 may receive data set 102. Parsing system 106 may modify the structure of the raw data set 102 to a data structure compatible for processing by partition system 108, embedding model 110, and / or large language model (LLM) 522 (see FIG. 5). Modifying the raw data structure may comprise processing the data into a more readable and / or manageable format, breaking down the data into smaller components that may be more easily analyzed, reformatting the data into a predetermined format, and / or the like. Parsing system 106 may communicate the parsed data to partition system 108. In some non-limiting embodiments or aspects, data set 102 may be communicated directly to partition system 108 without parsing the data.
[0084] Referring to FIGS. 1 and 2, partition system 108 may receive data set 102 and / or parsed data set 214 (hereinafter referred to as data set 102). A contextual partition model 216 of partition system 108 may receive data set 102. In some non-limiting embodiments or aspects, contextual partition model 216 may comprise a machine learning model. The machine learning model may include at least one neural network, at least one multilayer perceptron (MLP), at least one deep neural network (DNN), at least one attention model, at least one self-attention model, at least one multi-head self-attention model, at least one transformer model, at least one vision transformer (ViT) model, at least one convolutional neural network (CNN), at least one tree model, and / or the like. The machine learning model may include a reinforcement learning model.
[0085] Contextual partition model 216 may identify a contextual partition of the data set 102. Identifying the contextual partition of the data set 102 may comprise identifying a contextual partition of each document of the data set 102 (e.g., the data set may comprise a plurality of documents), as each document may have a contextual partition. A contextual partition may represent a contextual summary of the data set 102 (e.g., the document of the data set 102). The contextual partition may be shorter than the document itself and may summarize the document. For example, the contextual partition may comprise an abstract and / or an introduction section of a research paper or client data in an account statement.
[0086] Contextual partition model 216 may identify a contextual partition for each document of the data set 102. For example, contextual partition model 216 may automatically identify the contextual partition using the machine learning model of contextual partition model 216. Data set 102 may be input to the machine learning model of contextual partition model 216, which may analyze each document and automatically determine a contextual partition thereof.
[0087] Referring to FIG. 2, in some non-limiting embodiments or aspects, a contextual partition system 200 may comprise the partition system 108 comprising a contextual partition model 216, and contextual partition model 216 may comprise a reinforcement learning model. The reinforcement learning model may automatically identify a predicted contextual partition (e.g., for each document). The predicted contextual partition may be the portion of the document that the reinforcement learning model determines (e.g., predicts) suitably represents the document as a contextual partition.
[0088] Contextual partition model 216 may generate a feedback request message comprising the predicted contextual partition. The feedback request message may be transmitted to a client device 218 associated with data set 102. The client of client device 218 may be an owner, creator, and / or representative of the data of data set 102.
[0089] In response to receiving the feedback request message, client device 218 may display the predicted contextual partition. The client may view the predicted contextual partition displayed by client device 218 and input feedback regarding the suitability of the predicted contextual partition as a contextual partition. For example, the client may input feedback to client device 218 agreeing that the predicted contextual partition is a suitable contextual partition. As another example, the client may input feedback to client device 218 disagreeing that the predicted contextual partition is a suitable contextual partition (e.g., specifying that the predicted contextual partition is an unsuitable contextual partition). As a further example, the client may input feedback to client device 218 specifying what should be the contextual partition of the document.
[0090] With continued reference to FIG. 2, contextual partition model 216 may receive from client device 218 a feedback response message. The feedback response message may comprise the client feedback.
[0091] In response to feedback indicating disagreement from the client regarding the predicted contextual partition, the reinforcement learning model may re-analyze the document to identify a different predicted contextual partition. The reinforcement learning model and client device 218 may communicate back-and-forth in the above-described manner until a suitable contextual partition has been identified and confirmed.
[0092] In response to feedback comprising a specific client-suggested contextual partition, contextual partition model 216 may identify the specific client-suggested contextual partition as the contextual partition.
[0093] In response to feedback confirming the predicted contextual partition as the contextual partition, contextual partition model 216 may identify the predicted contextual partition as the contextual partition.
[0094] The feedback from client device 218 may be input to the reinforcement learning model, which may be trained and / or re-trained on the feedback to continue to improve the accuracy of the reinforcement learning model in automatically identifying contextual partitions.
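As a non-limiting illustration only, the three feedback outcomes described above (confirmation, disagreement, and a client-suggested partition) may be sketched as a simple loop. The names `Feedback`, `resolve_contextual_partition`, `predict`, `ask_client`, and the `max_rounds` limit are hypothetical and are not part of this disclosure. The sketch substitutes stub functions for the reinforcement learning model and for client device 218, and it omits the training and / or re-training step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Hypothetical shape of a feedback response message."""
    agrees: bool
    suggested_partition: Optional[str] = None


def resolve_contextual_partition(
    document: str,
    predict: Callable[[str, List[str]], str],
    ask_client: Callable[[str], Feedback],
    max_rounds: int = 5,  # assumed safety limit, not taken from the disclosure
) -> str:
    """Loop until the client confirms or supplies a contextual partition."""
    rejected: List[str] = []
    for _ in range(max_rounds):
        predicted = predict(document, rejected)
        feedback = ask_client(predicted)
        if feedback.suggested_partition is not None:
            # Client specified what the contextual partition should be.
            return feedback.suggested_partition
        if feedback.agrees:
            # Client confirmed the predicted contextual partition.
            return predicted
        # Client disagreed: remember the rejection and predict again.
        rejected.append(predicted)
    raise RuntimeError("no contextual partition confirmed")


# Stub stand-ins for the model and the client device (illustration only).
def predict_first_unrejected(document: str, rejected: List[str]) -> str:
    for paragraph in document.split("\n\n"):
        if paragraph not in rejected:
            return paragraph
    raise ValueError("no candidates left")


def client_wants_abstract(predicted: str) -> Feedback:
    return Feedback(agrees=predicted.startswith("Abstract"))


doc = "Title page\n\nAbstract: augmenting embeddings.\n\nBody text."
chosen = resolve_contextual_partition(
    doc, predict_first_unrejected, client_wants_abstract
)
```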
[0095] With continued reference to FIGS. 1 and 2, in some non-limiting embodiments or aspects, identifying the contextual partition may comprise receiving, by contextual partition model 216, the contextual partition from client device 218. Client may identify the contextual partition by inputting the contextual partition to client device 218, which may transmit the identified contextual partition to contextual partition model 216.
[0096] With continued reference to FIGS. 1 and 2, in some non-limiting embodiments or aspects, identifying the contextual partition may comprise inputting data set 102 (e.g., one or more documents thereof) to a generative artificial intelligence model. Contextual partition model 216 may comprise the generative artificial intelligence model. The generative artificial intelligence model may generate the contextual partition based on the contents of the document. For example, the generative artificial intelligence model may automatically generate a textual summary of a document, such as generating word(s), sentence(s), paragraph(s), and / or some combination thereof summarizing the document.
[0097] In some non-limiting embodiments or aspects, the contextual partition of a document may comprise a subset of the original document. In some non-limiting embodiments or aspects, the contextual partition of a document may comprise a generated summary of the document that is not a subset of the original document.
[0098] With continued reference to FIGS. 1 and 2, in some non-limiting embodiments or aspects, partition system 108 may partition data set 102 (e.g., each document thereof) into a plurality of different partitions. For example, each document of data set 102 may be partitioned (e.g., chunked) into a plurality of different partitions, and each partition may comprise a subset of the document. Partitioning each document may comprise determining a partition size (e.g., a maximum partition size, a minimum partition size, a partition size range, etc.) and partitioning the document according to the partition size. For example, the partition size may include a predetermined number of tokens per partition. The partition size may be determined based on one or more characteristics (e.g., capacity) of a database storing the partition (e.g., embedding database 112), and / or a model processing the data (e.g., embedding model 110, contextual partition model 216, LLM 522 (from FIG. 5), and / or the like).
[0099] FIG. 3 shows a non-limiting example of how data set 102 (e.g., a document) may be partitioned. The data set 102 in FIG. 3 comprises document 300. The document 300 may have an abstract 302 (e.g., a summary section) and a plurality of paragraphs 304. The abstract 302 may be identified as the contextual partition 306 of the document 300 (e.g., by contextual partition model 216). Partition system 108 may partition document 300 into a plurality of partitions 308a-g. In this non-limiting example, partitions 308b-g of the document 300 may comprise at least a portion of a paragraph 304 of document 300. Partition 308a may comprise abstract 302 of document 300.
[0100] In some non-limiting embodiments or aspects, the partitions 308a-g comprise a first partition 308b and a second partition 308c, and the first partition 308b and the second partition 308c may not comprise an overlap of data from the data set 102 included therein. In this non-limiting example, the first paragraph (first partition 308b) may not overlap with the second paragraph (second partition 308c).
[0101] In some non-limiting embodiments or aspects, the partitions 308a-g comprise a first partition 308f and a second partition 308g, and the first partition 308f and the second partition 308g may comprise an overlap of data from the data set 102 included therein. In this non-limiting example, the first partition 308f (fifth paragraph) may overlap with the second partition 308g (sixth paragraph and portions of the fifth paragraph), such that at least a portion of the fifth paragraph may be included in both first partition 308f and second partition 308g. Overlapping data in two adjacent partitions may increase the similarity between those partitions (and / or their embeddings), such that a query system may better understand the relationship between the adjacent paragraphs.
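As a non-limiting illustration only, partitioning by a predetermined number of tokens, with or without overlap between adjacent partitions, may be sketched as follows. The function name `partition_tokens` and its `size` and `overlap` parameters are hypothetical, and whitespace splitting is assumed here as a stand-in for whatever tokenizer a given embodiment may use.

```python
from typing import List


def partition_tokens(tokens: List[str], size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into partitions of at most `size` tokens.

    Adjacent partitions share `overlap` tokens; overlap=0 yields
    non-overlapping partitions.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    partitions: List[List[str]] = []
    for start in range(0, len(tokens), step):
        partitions.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this partition already reaches the end of the document
    return partitions


# Whitespace tokenization is an assumption made for this sketch only.
document = "the quick brown fox jumps over the lazy dog today".split()
disjoint = partition_tokens(document, size=4)
overlapping = partition_tokens(document, size=4, overlap=2)
```

With `overlap=2`, the last two tokens of each partition are repeated as the first two tokens of the next partition, which corresponds to the shared portion of the fifth paragraph in the example of partitions 308f and 308g.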
[0102] While FIG. 3 shows a non-limiting example of a document 300 of data set 102 being partitioned based on the abstract and paragraphs and having overlapping portions and non-overlapping portions, it will be appreciated that any other type of document of a data set may be partitioned in a similar manner and / or the document may be partitioned using different partitioning techniques. The partitions may be subsets of the document, each subset representing a portion of the document as a whole. Partitioning the document may make it easier (e.g., more computationally feasible and / or efficient) to store in a database and / or to query the data.
[0103] Referring again to FIGS. 1 and 2, partition system 108 may input the contextual partition and the plurality of partitions into embedding model 110. In some non-limiting embodiments or aspects, embedding model 110 may comprise a model configured to generate embeddings representing each of the contextual partition and the plurality of partitions. The embedding may comprise a mathematical relationship of data, such as a vector, a tensor, and / or the like. The embedding may comprise a mathematical relationship of data that captures its meaning and relationships (e.g., is context-aware). In some non-limiting embodiments or aspects, embedding model 110 may comprise a machine learning model. The machine learning model may include at least one neural network, at least one multilayer perceptron (MLP), at least one deep neural network (DNN), at least one attention model, at least one self-attention model, at least one multi-head self-attention model, at least one transformer model, at least one vision transformer (ViT) model, at least one convolutional neural network (CNN), at least one tree model, and / or the like. The machine learning model may include a generative artificial intelligence model. It will be appreciated that any model suitable for generating embeddings may be used.
[0104] Embedding model 110 may generate one or more embeddings for the contextual partition to form one or more contextual partition embeddings. For each of the plurality of partitions, embedding model 110 may generate one or more embeddings to form a plurality of partition embeddings. A contextual partition embedding and partition embeddings may be generated for each document of data set 102. The contextual partition embeddings and partition embeddings may be stored in embedding database 112.
[0105] Referring to FIGS. 1, 2, and 4, embedding model 110 may augment each of the plurality of partition embeddings. Augmenting a partition embedding may comprise appending the contextual partition embedding to the partition embedding to form an augmented partition embedding. Each of the partition embeddings may be augmented in this manner to form a plurality of augmented partition embeddings.
[0106] In some non-limiting embodiments or aspects, appending the contextual partition embedding to each of the plurality of partition embeddings comprises concatenating the contextual partition embedding to each of the plurality of partition embeddings to form the plurality of augmented partition embeddings.
[0107] Referring to FIG. 4, partitioning system 400 may identify a contextual partition and embed that contextual partition to form a contextual partition embedding CP. Partitioning system 400 may partition the document into a plurality of partitions and embed each of those partitions to form partition embeddings P1-P4. The contextual partition embedding CP and the partition embeddings P1-P4 may each have x dimensions.
[0108] Concatenating system 450 may augment each partition embedding P1-P4 by concatenating the contextual partition embedding CP to each of the corresponding plurality of partition embeddings P1-P4 to form a plurality of concatenated partition embeddings C1-C4 (e.g., augmented embeddings). The contextual partition embedding CP may be concatenated with itself to form a contextual concatenated partition embedding CC (e.g., also an augmented embedding). Contextual concatenated partition embedding CC and concatenated partition embeddings C1-C4 may have 2x dimensions.
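As a non-limiting illustration only, the concatenation of FIG. 4 may be sketched with plain Python lists. The function name `augment` and the example values are hypothetical. The sketch assumes that the contextual partition embedding CP is placed after the partition embedding, although the disclosure does not fix the order of concatenation.

```python
from typing import List, Sequence

Vector = List[float]


def augment(partition_embedding: Sequence[float],
            contextual_embedding: Sequence[float]) -> Vector:
    """Concatenate the contextual embedding onto a partition embedding.

    Two x-dimensional inputs yield one 2x-dimensional output.
    Assumed order: partition values first, contextual values second.
    """
    return list(partition_embedding) + list(contextual_embedding)


# Toy values with x = 3; real embeddings would come from an embedding model.
cp: Vector = [0.1, 0.2, 0.3]
partition_embeddings: List[Vector] = [
    [1.0, 0.0, 0.0],  # P1
    [0.0, 1.0, 0.0],  # P2
    [0.0, 0.0, 1.0],  # P3
    [0.5, 0.5, 0.0],  # P4
]

# C1-C4: each partition embedding augmented with CP.
concatenated = [augment(p, cp) for p in partition_embeddings]
# CC: the contextual partition embedding concatenated with itself.
cc = augment(cp, cp)
```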
[0109] Referring again to FIG. 1, the augmented partition embeddings may be stored in embedding database 112. Embedding model 110 may transmit the augmented partition embeddings to embedding database 112 to be stored therein. Embedding database 112 may be configured to store embeddings and / or to enable searching of the stored embeddings in response to a query as described herein.
[0110] Referring to FIG. 5, a query system 500 is shown according to some non-limiting embodiments or aspects. Query system 500 may comprise a user device 520 of a user, the user requesting to query the data in embedding database 112. User device 520 may include at least one processor (e.g., a multi-core processor) such as a graphics processing unit (GPU), a central processing unit (CPU), an accelerated processing unit (APU), a microprocessor, and / or the like. User device 520 may be a client device.
[0111] User device 520 may interact with (e.g., communicate with) LLM 522 to query embedding database 112. LLM 522 may include at least one processor (e.g., a multi-core processor) such as a graphics processing unit (GPU), a central processing unit (CPU), an accelerated processing unit (APU), a microprocessor, and / or the like. LLM 522 may comprise a machine learning model. The machine learning model may include at least one neural network, at least one multilayer perceptron (MLP), at least one deep neural network (DNN), at least one attention model, at least one self-attention model, at least one multi-head self-attention model, at least one transformer model, at least one vision transformer (ViT) model, at least one convolutional neural network (CNN), at least one tree model, and / or the like. LLM 522 may be configured to dynamically communicate with user device 520 in a manner that mimics human conversation, such as by generating human-like responses in response to queries from user device 520.
[0112] With continued reference to FIG. 5, user device 520 may transmit a query from a user, the query associated with a data set. The query may comprise a question, a request, a command, and / or the like associated with the data set. Querying the data set may comprise querying the augmented embeddings of the data set stored in embedding database 112. LLM 522 may receive the query from user device 520 during a conversation between LLM 522 and user device 520. While the non-limiting example in FIG. 5 shows the query being sent to LLM 522 during a conversation with user device 520, it will be appreciated that user device 520 may not submit the query through LLM 522 but may submit the query to query system 500 in any suitable way.
[0113] In response to receiving the query, LLM 522 may transmit the query to embedding model 110. In response to receiving the query, embedding model 110 may generate a query embedding associated with the query (e.g., generate a query embedding based on a query message of the query). The query embedding may comprise a mathematical relationship of data of the query, such as a vector, a tensor, and / or the like. The query embedding may comprise a mathematical relationship of data of the query that captures its meaning and relationships (e.g., is context-aware).
[0114] With continued reference to FIG. 5, query system 500 may search the plurality of augmented partition embeddings in embedding database 112 based on the query (and / or the query embedding). Query system 500 may automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings stored in embedding database 112 as relevant to the query based on the searching.
[0115] The searching of augmented partition embeddings in embedding database 112 may comprise executing a semantic search of the plurality of augmented partition embeddings based on the query (and / or the query embedding) to automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings stored in embedding database 112 as relevant to the query.
[0116] The searching may comprise determining a similarity between the query embedding and each augmented partition embedding in embedding database 112 to determine the relevance of each augmented partition embedding to the query. For example, for each augmented partition embedding stored in embedding database 112, a relevance score may be generated based on a comparison of the augmented partition embedding and the query embedding. The relevance score may be based on a distance in a multi-dimensional embedding space between the augmented partition embedding and the query embedding. In response to determining that the relevance score satisfies a relevance threshold, the augmented partition embedding satisfying the relevance threshold may be identified as relevant to the query.
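The threshold comparison described in paragraph [0116] can be sketched as follows. This is only an illustrative sketch under stated assumptions: embeddings are plain Python lists of floats, the query embedding already has the same dimension as the stored augmented partition embeddings, the score is derived from Euclidean distance, and the names `relevance_score`, `find_relevant`, and the dictionary standing in for embedding database 112 are hypothetical and do not appear in the disclosure.

```python
import math


def relevance_score(query_emb, partition_emb):
    """Map Euclidean distance to a score in (0, 1]; closer vectors score higher."""
    if len(query_emb) != len(partition_emb):
        raise ValueError("embeddings must have the same dimension")
    distance = math.dist(query_emb, partition_emb)
    return 1.0 / (1.0 + distance)


def find_relevant(query_emb, stored, threshold):
    """Return (id, score) pairs whose score satisfies the threshold, best first."""
    hits = []
    for emb_id, emb in stored.items():
        score = relevance_score(query_emb, emb)
        if score >= threshold:
            hits.append((emb_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for embedding database 112: id -> augmented partition embedding.
stored = {
    "C1": [1.0, 0.0, 0.5, 0.5],
    "C2": [0.0, 1.0, 0.5, 0.5],
    "C3": [0.9, 0.1, 0.5, 0.5],
}
query = [1.0, 0.0, 0.5, 0.5]
hits = find_relevant(query, stored, threshold=0.8)
```

With these toy values, C1 and C3 satisfy the threshold and C2 does not. The mapping from distance to score and the threshold value are arbitrary choices made for the sketch; any monotone mapping could be substituted.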
[0117] In some non-limiting embodiments or aspects, a first augmented partition embedding of a plurality of augmented partition embeddings stored in embedding database 112 may be analyzed for its relevance to the query. The first augmented partition embedding may comprise the contextual partition embedding (e.g., CP from FIG. 4) appended to a first partition embedding (e.g., P1 from FIG. 4) of the plurality of partition embeddings to form a first concatenated partition embedding (e.g., C1 from FIG. 4). The query for which relevance is being determined may comprise a query message. In response to receiving the query, embedding model 110 may generate a query embedding associated with the query message. The query embedding may be appended to a second query embedding associated with the query to form an augmented query embedding, such as by concatenating the query embedding to the second query embedding. In some non-limiting embodiments or aspects, the second query embedding may be identical to the query embedding, or the second query embedding may be a different embedding from the query embedding. The augmented query embedding may be formed to have a same dimension as the first augmented partition embedding for which relevance is being determined. Automatically identifying that the first augmented partition embedding is relevant to the query may comprise executing a dot product of the augmented query embedding and the first augmented partition embedding. The dimensions of the augmented query embedding and the first augmented partition embedding being the same may enable execution of the dot product function to enable relevance determination of the first augmented partition embedding (e.g., based on a relationship between augmented query embedding and the first augmented partition embedding). 
While this non-limiting example describes executing a dot product to determine a relationship between two embeddings (to determine relevance thereof), it will be appreciated that other methods of comparing two embeddings in multi-dimensional space may additionally or alternatively be used, such as using cosine distance, Euclidean distance, Manhattan distance, and / or the like.
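The dimension matching described in paragraph [0117] can be illustrated with a short sketch. It assumes embeddings are plain Python lists, that the second query embedding is identical to the query embedding (one of the options the paragraph mentions), and that the concatenation order shown is acceptable; the helper names `concat` and `dot` and all numeric values are hypothetical.

```python
def concat(first, second):
    """Form one longer vector by placing `second` after `first`."""
    return list(first) + list(second)


def dot(a, b):
    """Dot product; defined only for vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("dot product requires equal dimensions")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy values; real embeddings would come from embedding model 110.
p1 = [0.2, 0.8]        # first partition embedding (P1)
cp = [0.5, 0.5]        # contextual partition embedding (CP)
c1 = concat(p1, cp)    # first augmented partition embedding (C1)

q = [0.1, 0.9]         # query embedding
q2 = q                 # second query embedding, here identical to q
aq = concat(q, q2)     # augmented query embedding, same dimension as C1

score = dot(aq, c1)
```

Because concatenation keeps the two halves in separate coordinates, the resulting dot product equals `dot(q, p1) + dot(q2, cp)`, so the partition half and the contextual half each contribute an additive term to the score.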
[0118] With continued reference to FIG. 5, in response to identifying at least one augmented partition embedding relevant to the query (based on the searching), the relevant augmented partition embedding(s) and / or the original partition of the document corresponding thereto (e.g., the non-embedded data) may be input to LLM 522. LLM 522 may generate a query response message to the query based on the relevant augmented partition embedding(s) and / or the partition(s) corresponding thereto. The query response message may comprise a human-interpretable answer to the query, which may be communicated from LLM 522 to user device 520 in the course of the conversation therebetween. The human-interpretable answer may be generated based on the relevant partition (and / or embedding and / or augmented embedding thereof).
[0119] In some non-limiting embodiments or aspects, the query and / or a query embedding corresponding to the query may be input to LLM 522 (e.g., in addition to the relevant augmented partition embedding(s) and / or the original partition of the document corresponding thereto). The query response message generated by LLM 522 may be generated based on the query and / or the query embedding corresponding to the query.
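One way the query and the original (non-embedded) partitions might be combined into a single input for LLM 522 is sketched below. This is an assumption-laden illustration: it passes partition text rather than embeddings, the prompt wording is invented for the example, the function name `build_prompt` is hypothetical, and no call to any particular LLM is shown.

```python
def build_prompt(query, partitions):
    """Combine retrieved partition text and the user query into one LLM input."""
    context = "\n\n".join(
        f"[Partition {i}]\n{text}" for i, text in enumerate(partitions, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical partitions identified as relevant by the search.
prompt = build_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
```

The resulting string could then be supplied to whatever interface the LLM exposes; the query response message would be generated from it.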
[0120] Referring now to FIG. 6, shown is a process 600 for augmenting embeddings, according to some non-limiting embodiments or aspects. The steps shown in FIG. 6 are for example purposes only. It will be appreciated that additional, fewer, different, and / or a different order of steps may be used in some non-limiting embodiments or aspects. In some non-limiting embodiments or aspects, a step may be automatically performed in response to performance and / or completion of a prior step.
[0121] As shown in FIG. 6, at step 602, a data set containing data may be received. For example, data set 102 may be received by data processing system 104.
[0122] As shown in FIG. 6, at step 604, the data set may be parsed. For example, parsing system 106 may parse the data set.
[0123] As shown in FIG. 6, at step 606, a contextual partition of the data set (e.g., of each document in the data set) may be identified. For example, contextual partition model 216 of partition system 108 of data processing system 104 may identify the contextual partition(s).
[0124] As shown in FIG. 6, at step 608, the contextual partition may be embedded to form a contextual partition embedding. For example, embedding model 110 may generate the contextual partition embedding based on the contextual partition.
[0125] As shown in FIG. 6, at step 610, the data set may be partitioned into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set. For example, partition system 108 may partition the data set into the plurality of different partitions. For example, partition system 108 may partition each document in the data set into a plurality of different partitions.
[0126] As shown in FIG. 6, at step 612, each of the plurality of partitions may be embedded to form partition embeddings. For example, embedding model 110 may generate the partition embeddings for the plurality of partitions.
[0127] As shown in FIG. 6, at step 614, each of the plurality of partition embeddings may be augmented by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings.
[0128] As shown in FIG. 6, at step 616, the plurality of augmented partition embeddings may be stored in a database. For example, the plurality of augmented partition embeddings may be stored in embedding database 112.
[0129] As shown in FIG. 6, at step 618, the database (e.g., embedding database 112) storing the plurality of augmented partition embeddings may be queried. Querying embedding database 112 may comprise automatically identifying at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on searching embedding database 112.
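The ingestion portion of process 600 (steps 608 through 616) can be sketched as follows. This is a minimal sketch under several assumptions that are not part of the disclosure: `toy_embed` is a hashed bag-of-words stand-in for embedding model 110, partitions are fixed-size word windows with an optional overlap, the contextual partition is supplied by the caller, and a Python list stands in for embedding database 112. All function names and parameter values are hypothetical.

```python
import hashlib

DIM = 8  # toy embedding size (assumption)


def toy_embed(text, dim=DIM):
    """Stand-in for embedding model 110: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def partition(words, size, overlap=0):
    """Split a word list into windows of `size` words sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    parts = []
    for start in range(0, len(words), step):
        parts.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the data
    return parts


def ingest(document, contextual_partition, size=6, overlap=2):
    """Partition, embed, append the contextual embedding, and store."""
    cp_emb = toy_embed(contextual_partition)                  # step 608
    database = []                                             # stand-in for database 112
    for text in partition(document.split(), size, overlap):   # step 610
        p_emb = toy_embed(text)                               # step 612
        augmented = p_emb + cp_emb                            # step 614: append CP
        database.append({"text": text, "embedding": augmented})  # step 616
    return database


doc = ("The card network routes each authorization request from the "
       "acquirer to the issuer for approval")
db = ingest(doc, contextual_partition="Overview of payment authorization routing")
```

Every stored embedding has twice the toy dimension, and the second half of each is the same contextual partition embedding, which is what allows a later search to take the shared context into account for every partition.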
[0130] Referring now to FIG. 7, shown is a process 700 for augmenting embeddings, according to some non-limiting embodiments or aspects. The steps shown in FIG. 7 are for example purposes only. It will be appreciated that additional, fewer, different, and / or a different order of steps may be used in some non-limiting embodiments or aspects. In some non-limiting embodiments or aspects, a step may be automatically performed in response to performance and / or completion of a prior step.
[0131] In some non-limiting embodiments or aspects, one or more of the steps of process 700 may be performed (e.g., completely, partially, and / or the like) by embedding system 100 (e.g., one or more devices of embedding system 100) and / or one or more components of query system 500 (e.g., one or more devices of query system 500). In some non-limiting embodiments or aspects, one or more of the steps of process 700 may be performed (e.g., completely, partially, and / or the like) by another system, another device, another group of systems, or another group of devices, separate from or including embedding system 100 and / or query system 500.
[0132] As shown in FIG. 7, at step 702, process 700 may include receiving, with at least one processor, a data set. For example, data processing system 104 may receive data set 102.
[0133] As shown in FIG. 7, at step 704, process 700 may include identifying, with at least one processor, a contextual partition of the data set (e.g., of each document in the data set). For example, contextual partition model 216 of partition system 108 of data processing system 104 may identify the contextual partition.
[0134] As shown in FIG. 7, at step 706, process 700 may include partitioning, with at least one processor, the data set (e.g., each document in the data set) into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set. For example, partition system 108 of data processing system 104 may partition the data set into the plurality of different partitions.
[0135] As shown in FIG. 7, at step 708, process 700 may include inputting, with at least one processor, the contextual partition and the plurality of partitions into an embedding model. For example, the contextual partition and the plurality of partitions may be input to embedding model 110.
[0136] As shown in FIG. 7, at step 710, process 700 may include generating an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings. For example, embedding model 110 may generate the contextual partition embedding and a plurality of partition embeddings.
[0137] As shown in FIG. 7, at step 712, process 700 may include augmenting, with at least one processor, each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings.
[0138] As shown in FIG. 7, at step 714, process 700 may include storing, with at least one processor, the plurality of augmented partition embeddings in a database. For example, the plurality of augmented partition embeddings may be stored in embedding database 112.
[0139] As shown in FIG. 7, at step 716, process 700 may include receiving, with at least one processor, a query associated with the data set. For example, the query may be received from user device 520. In some non-limiting examples, LLM 522 of query system 500 may receive the query.
[0140] As shown in FIG. 7, at step 718, process 700 may include: in response to receiving the query, searching, with at least one processor, the plurality of augmented partition embeddings in the database based on the query. For example, the plurality of augmented partition embeddings in embedding database 112 may be searched.
[0141] As shown in FIG. 7, at step 720, process 700 may include automatically identifying, with at least one processor, at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
[0142] FIG. 8 shows an electronic payment processing network 800 according to non-limiting embodiments or aspects. The payment processing network may be used in conjunction with the systems and methods described herein. It will be appreciated that the particular arrangement of electronic payment processing network 800 shown is for example purposes only, and that various arrangements are possible. Transaction processing system 801 (e.g., a transaction handler) is shown to be in communication with one or more issuer systems (e.g., such as issuer system 806) and one or more acquirer systems (e.g., such as acquirer system 808). Although only a single issuer system 806 and single acquirer system 808 are shown, it will be appreciated that transaction processing system 801 may be in communication with a plurality of issuer systems and / or acquirer systems. In some embodiments, transaction processing system 801 may also operate as an issuer system such that both transaction processing system 801 and issuer system 806 are a single system and / or controlled by a single entity.
[0143] In some non-limiting embodiments or aspects, transaction processing system 801 may communicate with merchant system 804 directly through a public or private network connection. Additionally or alternatively, transaction processing system 801 may communicate with merchant system 804 through payment gateway 802 and / or acquirer system 808. In some non-limiting embodiments or aspects, an acquirer system 808 associated with merchant system 804 may operate as payment gateway 802 to facilitate the communication of transaction requests from merchant system 804 to transaction processing system 801. Merchant system 804 may communicate with payment gateway 802 through a public or private network connection. For example, a merchant system 804 that includes a physical POS device may communicate with payment gateway 802 through a public or private network to conduct card-present transactions. As another example, a merchant system 804 that includes a server (e.g., a web server) may communicate with payment gateway 802 through a public or private network, such as a public Internet connection, to conduct card-not-present transactions.
[0144] In some non-limiting embodiments or aspects, transaction processing system 801, after receiving a transaction request from merchant system 804 that identifies an account identifier of a payor (e.g., such as an account holder) associated with an issued payment device 810, may generate an authorization request message to be communicated to the issuer system 806 that issued the payment device 810 and / or account identifier. Issuer system 806 may then approve or decline the authorization request and, based on the approval or denial, generate an authorization response message that is communicated to transaction processing system 801. Transaction processing system 801 may communicate an approval or denial to merchant system 804. When issuer system 806 approves the authorization request message, it may then clear and settle the payment transaction between the issuer system 806 and acquirer system 808.
[0145] Referring now to FIG. 9, shown is a diagram of example components of a device 900 according to non-limiting embodiments or aspects. Device 900 may correspond to at least one of data set 102, data processing system 104, parsing system 106, partition system 108, embedding model 110, embedding database 112, parsed data set 214, contextual partition model 216, client device 218, partitioning system 400, concatenating system 450, user device 520, LLM 522, and / or any other computing device shown and described herein. In some non-limiting embodiments or aspects, such systems or devices may include at least one device 900 and / or at least one component of device 900. The number and arrangement of components shown in FIG. 9 are provided as an example. In some non-limiting embodiments or aspects, device 900 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 9. Additionally, or alternatively, a set of components (e.g., one or more components) of device 900 may perform one or more functions described as being performed by another set of components of device 900.
[0146] As shown in FIG. 9, device 900 may include bus 902, processor 904, memory 906, storage component 908, input component 910, output component 912, and communication interface 914. Bus 902 may include a component that permits communication among the components of device 900. In some non-limiting embodiments or aspects, processor 904 may be implemented in hardware, firmware, or a combination of hardware and software. For example, processor 904 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.), a microprocessor, a digital signal processor (DSP), and / or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) that can be programmed to perform a function. Memory 906 may include random access memory (RAM), read only memory (ROM), and / or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and / or instructions for use by processor 904.
[0147] With continued reference to FIG. 9, storage component 908 may store information and / or software related to the operation and use of device 900. For example, storage component 908 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid state disk, etc.) and / or another type of computer-readable medium. Input component 910 may include a component that permits device 900 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, etc.). Additionally, or alternatively, input component 910 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Output component 912 may include a component that provides output information from device 900 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), etc.). Communication interface 914 may include a transceiver-like component (e.g., a transceiver, a separate receiver and transmitter, etc.) that enables device 900 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 914 may permit device 900 to receive information from another device and / or provide information to another device. For example, communication interface 914 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi® interface, a cellular network interface, and / or the like.
[0148] Device 900 may perform one or more processes described herein. Device 900 may perform these processes based on processor 904 executing software instructions stored by a computer-readable medium, such as memory 906 and / or storage component 908. A computer-readable medium may include any non-transitory memory device. A memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices. Software instructions may be read into memory 906 and / or storage component 908 from another computer-readable medium or from another device via communication interface 914. When executed, software instructions stored in memory 906 and / or storage component 908 may cause processor 904 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software. The term “configured to,” as used herein, may refer to an arrangement of software, device(s), and / or hardware for performing and / or enabling one or more functions (e.g., actions, processes, steps of a process, and / or the like). For example, “a processor configured to” may refer to a processor that executes software instructions (e.g., program code) that cause the processor to perform one or more functions.
[0149] In some non-limiting embodiments or aspects, a computer program product for augmenting embeddings includes at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to execute one of the previously-described methods. The at least one processor may include any of the components shown in FIGS. 1, 2, 4, and 5.
[0150] Although embodiments have been described in detail for the purpose of illustration, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed embodiments or aspects, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any embodiment or aspect can be combined with one or more features of any other embodiment or aspect.
Claims
WHAT IS CLAIMED IS:
1. A computer-implemented method, comprising: receiving, with at least one processor, a data set; identifying, with at least one processor, a contextual partition of the data set; partitioning, with at least one processor, the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; inputting, with at least one processor, the contextual partition and the plurality of partitions into an embedding model; generating, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augmenting, with at least one processor, each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; storing, with at least one processor, the plurality of augmented partition embeddings in a database; receiving, with at least one processor, a query associated with the data set; in response to receiving the query, searching, with at least one processor, the plurality of augmented partition embeddings in the database based on the query; and automatically identifying, with at least one processor, at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
2. The computer-implemented method of claim 1, further comprising: in response to receiving the query, generating, with the embedding model, a query embedding associated with the query, wherein the searching the plurality of augmented partition embeddings in the database is based on the query embedding.
3. The computer-implemented method of claim 1, further comprising: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, inputting the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generating, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
4. The computer-implemented method of claim 3, further comprising: inputting the query and / or a query embedding corresponding to the query to the LLM, wherein the query response message to the query is generated based on the query and / or the query embedding corresponding to the query.
5. The computer-implemented method of claim 1, wherein identifying the contextual partition comprises automatically identifying, with a model, the contextual partition.
6. The computer-implemented method of claim 1, wherein identifying the contextual partition comprises: automatically identifying, with a reinforcement learning model, a predicted contextual partition; generating, with the reinforcement learning model, a feedback request message comprising the predicted contextual partition; transmitting the feedback request message to a client device; and receiving, with the reinforcement learning model and from the client device, a feedback response message confirming the predicted contextual partition as the contextual partition.
7. The computer-implemented method of claim 1, wherein identifying the contextual partition comprises receiving the contextual partition from a client device.
8. The computer-implemented method of claim 1, wherein identifying the contextual partition comprises: inputting the data set to a generative artificial intelligence model; and generating, with the generative artificial intelligence model, the contextual partition based on the data set.
9. The computer-implemented method of claim 1, wherein the plurality of partitions comprises a first partition and a second partition, wherein the first partition and the second partition comprise an overlap of data from the data set included therein.
10. The computer-implemented method of claim 1, wherein the plurality of partitions comprises a first partition and a second partition, wherein the first partition and the second partition do not comprise an overlap of data from the data set included therein.
11. The computer-implemented method of claim 1, wherein the contextual partition represents a contextual summary of the data set.
12. The computer-implemented method of claim 1, wherein appending the contextual partition embedding to each of the plurality of partition embeddings comprises concatenating the contextual partition embedding to each of the plurality of partition embeddings to form the plurality of augmented partition embeddings.
13. The computer-implemented method of claim 1, wherein the contextual partition embedding and / or the plurality of partition embeddings are context-aware embeddings.
14. The computer-implemented method of claim 1, wherein searching the plurality of augmented partition embeddings in the database comprises executing a semantic search of the plurality of augmented partition embeddings based on the query.
15. The computer-implemented method of claim 1, wherein a first augmented partition embedding of the plurality of augmented partition embeddings comprises the contextual partition embedding appended to a first partition embedding of the plurality of partition embeddings, and the query comprises a query message, the computer-implemented method further comprising: in response to receiving the query, generating, with the embedding model, a query embedding associated with the query message; appending, with at least one processor, the query embedding to a second query embedding associated with the query to form an augmented query embedding, wherein automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query comprises executing a dot product of the augmented query embedding and the augmented partition embedding.
16. The computer-implemented method of claim 2, wherein automatically identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching comprises: for each augmented partition embedding of the plurality of augmented partition embeddings, generating a relevance score based on a comparison of the augmented partition embedding and the query embedding; and in response to determining that the relevance score satisfies a relevance threshold, identifying the augmented partition embedding as relevant to the query.
17. A system comprising at least one processor configured to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.
18. The system of claim 17, the at least one processor further configured to: in response to receiving the query, generate, with the embedding model, a query embedding associated with the query, wherein the searching the plurality of augmented partition embeddings in the database is based on the query embedding.
19. The system of claim 17, the at least one processor further configured to: in response to identifying the at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query, inputting the at least one augmented partition embedding and / or at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding to a large language model (LLM); and generating, with the LLM, a query response message to the query based on the at least one augmented partition embedding and / or the at least one partition of the plurality of partitions corresponding to the at least one augmented partition embedding.
20. A computer program product comprising at least one non-transitory computer-readable medium including program instructions that, when executed by at least one processor, cause the at least one processor to: receive a data set; identify a contextual partition of the data set; partition the data set into a plurality of different partitions, each partition of the plurality of partitions comprising a subset of the data set; input the contextual partition and the plurality of partitions into an embedding model; generate, with the embedding model, an embedding for each of the contextual partition and the plurality of partitions to form a contextual partition embedding and a plurality of partition embeddings; augment each of the plurality of partition embeddings by appending the contextual partition embedding to each of the plurality of partition embeddings to form a plurality of augmented partition embeddings; store the plurality of augmented partition embeddings in a database; receive a query associated with the data set; in response to receiving the query, search the plurality of augmented partition embeddings in the database based on the query; and automatically identify at least one augmented partition embedding of the plurality of augmented partition embeddings as relevant to the query based on the searching.