Method for determining a product data set for producing a product
A generative data-driven model generates tool sequences to enhance the management of complex product data sets in production environments, addressing inefficiencies and errors by providing structured outputs based on user instructions and historical data.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- BASF SE
- Filing Date
- 2025-12-18
- Publication Date
- 2026-06-25
Figure EP2025088164_25062026_PF_FP_ABST
Abstract
Description
METHOD FOR DETERMINING A PRODUCT DATA SET FOR PRODUCING A PRODUCT
TECHNICAL FIELD
[0001] This disclosure relates to methods, apparatuses, systems, and the like for determining a product data set for producing a product based on the product data set.
TECHNICAL BACKGROUND
[0002] In manufacturing, the management of product data sets such as Bill of Materials (BOM) may present significant technical challenges, particularly when producing complex products in equally complex production environments.
SUMMARY
[0003] According to a first aspect a method for determining at least one product data set is disclosed, the method comprising:
[0004] Obtaining a user instruction related to determining the at least one product data set for producing a product based on the product data set;
[0005] Determining whether a historic user instruction similar to the user instruction has been processed in the past, based on a history data base providing at least one history data set, wherein the at least one history data set is associated with a historic user instruction;
[0006] Upon determining that a historic user instruction similar to the user instruction has been processed, determining the at least one product data set based on the at least one history data set;
[0007] Providing the at least one product data set.
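The steps of the first aspect may be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `HistoryDataSet`, `find_similar` and `determine_product_data_set`, the character-based similarity measure (`difflib.SequenceMatcher`) and the threshold of 0.8 are all hypothetical, since the disclosure does not specify how similarity is determined.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class HistoryDataSet:
    """Hypothetical history record: a past instruction and the data set produced for it."""
    historic_instruction: str
    product_data_set: dict


def find_similar(instruction: str, history: list[HistoryDataSet],
                 threshold: float = 0.8) -> Optional[HistoryDataSet]:
    """Return the most similar historic record at or above the threshold, else None."""
    best, best_score = None, threshold
    for record in history:
        # Assumption: plain string similarity stands in for whatever measure is used.
        score = SequenceMatcher(None, instruction.lower(),
                                record.historic_instruction.lower()).ratio()
        if score >= best_score:
            best, best_score = record, score
    return best


def determine_product_data_set(instruction: str,
                               history: list[HistoryDataSet]) -> Optional[dict]:
    """Reuse the product data set of a similar historic instruction, if one exists."""
    match = find_similar(instruction, history)
    return None if match is None else match.product_data_set


history = [HistoryDataSet("Show the BOM for product A",
                          {"product": "A", "components": ["x", "y"]})]
reused = determine_product_data_set("show the bom for product A", history)
```

In this sketch a return value of `None` signals that no similar historic user instruction was found, in which case another route (for example, determining a fresh tool sequence) would be needed.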
[0008] According to further aspects, respective apparatus, system, and use are disclosed.
EMBODIMENTS
[0009] Any disclosure, embodiments and examples described herein relate to the method, the aspects, e.g. system, apparatus, product, e.g. chemical product, and computer element lined out above and below. Advantageously, the benefits provided by any of the embodiments and examples may equally apply to all other embodiments and examples.
[0010] According to this disclosure, history data sets related to past processing of user instructions may be leveraged to increase reliability and efficiency of data retrieval, which may enhance a production process.
[0011] In some example embodiments, production or manufacturing as well as process data may be represented in a graph structure (i.e. a graph representation), which may allow different entities to be connected and / or the flow of different processes in a production environment to be represented. Such knowledge graphs, e.g. representing bills of materials (BOMs), may provide a flexible and dynamic solution that may be utilized to adapt to changes in the production process, e.g. changes in a product data set such as a BOM. However, navigating the graph representation, e.g. stored in a graph data base, to extract the necessary data or e.g. to perform a root cause analysis may be difficult, as the number of nodes and edges may quickly overwhelm a user (e.g. controller, planner). This may limit the use of such knowledge graphs in actual day-to-day use cases. As proposed herein, a generative data-driven model may be used to interpret user instructions, analyze their intent, and generate a tool sequence (e.g. a sequence of tools or agents to execute), e.g. based on a tool database comprising tool data sets that may e.g. contain information on each tool's purpose, input and output data. Tools, functions or agents may then be executed according to the order given in the tool sequence: e.g. the first tool (e.g. function or agent) to be executed may be executed based on the user instruction (e.g. information given in the user instruction or derivable from the user instruction by the generative data-driven model), and each further tool in the tool sequence may be executed based on the output of the previous tool (and e.g. based on the user instruction, depending on which input the tool needs and whether this is present or derivable by the generative data-driven model from the user instruction). Using the tools' output, the generative data-driven model or another generative data-driven model may determine (e.g. generate) product data sets, e.g. in textual, computer-interpretable and / or visual (e.g. plots, graphs) form, in response to the user instruction. In this way, a user may dispense with navigating the underlying knowledge graph and may concentrate on the quantities or flows of interest. Using the knowledge graphs may help reduce hallucinations and irrelevant responses of the generative data-driven model. Providing different outputs, e.g. data frames, plots, graphs, generated by the different tools (e.g. agents) may further enhance the user's ability to verify the response and / or analyze the retrieved data directly. Hence, according to this disclosure, e.g. the efficiency and accuracy of managing production data (e.g. BOMs) may be enhanced, which may lead to reducing errors and improving overall production efficiency.
[0012] A product data set may comprise processing data, which may be related to producing and / or processing a chemical product. Processing data may be indicative of at least one step associated with one or more processes for producing and / or processing the chemical product. The processing data may comprise instructions associated with producing and / or processing the chemical product. Processing data may comprise instructions to a worker associated with producing and / or processing the chemical product. The processing data may comprise natural language, unstructured data and / or human-interpretable data. The processing data may be provided, in particular displayed, via a user interface. The processing data may be used and / or provided for monitoring and / or controlling one or more processes associated with producing and / or processing the chemical product. In particular, the processing data may comprise structured data and / or the processed production and / or processing data may comprise natural language, unstructured data and / or human-interpretable data. In an example embodiment, the method is a method for monitoring and / or controlling one or more processes associated with producing and / or processing a chemical product, the method further comprising: monitoring and / or controlling one or more processes associated with producing and / or processing a chemical product based on the product data set. In an example, the product data set comprises instructions for monitoring and / or controlling one or more processes associated with producing and / or processing a chemical product.
[0013] In the following, embodiments of the present disclosure will be outlined by way of examples. It is to be understood that the present disclosure is not limited to said embodiments and / or examples. All terms and definitions used herein are understood broadly and have their general meaning if not indicated otherwise.
[0014] Obtaining a user instruction related to determining the at least one product data set may be receiving the user instruction e.g. via a user interface and e.g. from an operator of a production environment. The disclosed method or any step of the method may be computer-implemented and e.g. be carried out on a processor, e.g. by an orchestrator agent.
[0015] Determining may be based on the user instruction and may e.g. be carried out by an orchestrator agent. Production data may comprise at least a part of a product data set. A graph structure may have nodes connected by edges, wherein the nodes may represent a material or product having certain properties (e.g. the properties may be a physical, chemical and / or biological property associated with the product) and the edges may represent relations between these materials or products, such as how they are used in a production process. The at least one tool may be or comprise a function or an agent, e.g. the graph extraction tool. A tool sequence may be an ordered list of tools that should be executed in the given order using certain input data that may be provided by the user instruction or another tool that is carried out earlier according to the tool sequence. A tool sequence may comprise an indication of one or more tool(s). Determining the at least one product data set based on at least a part of the output data may further be based on the user instruction, e.g. by providing at least a part of the user instruction as context to a generative data-driven model. Providing (e.g. to the operator) the at least one product data set may be a providing via a user interface. The graph extraction tool may be configured to retrieve at least a part of or a basis of the at least one product data set from the graph database, e.g. a graph extraction tool may be configured to find a node based on a certain property associated with the node, get all nearest neighbour nodes to a given or found node, and / or retrieve all unique values across the graph structure for a property associated with one or more nodes and / or retrieve data from nodes having a specific relationship (e.g. a particular molecular bond or process step). A basis of the at least one product data set may refer to the product data set being based on the retrieved data from the graph database, e.g. in that the retrieved data is transformed, used in a calculation, aggregated etc. For instance, another tool (e.g. as indicated in the tool sequence) may be configured to determine an output, such as a value, based on the retrieved data and / or a further tool may be configured to aggregate data retrieved from different parts of the graph structured production data and / or a further tool may be configured to average values (e.g. from a time series of production data) and / or transform values in the retrieved data.
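The graph extraction operations named above (finding a node by a property, getting nearest neighbours, retrieving unique property values) may be illustrated with a small in-memory graph. This is only a sketch: the class `ProductionGraph`, its method names, and the example nodes and the `USED_IN` relation are hypothetical stand-ins for a real graph database and its query interface.

```python
from collections import defaultdict


class ProductionGraph:
    """Tiny in-memory stand-in for a graph database (illustrative only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.edges = defaultdict(list)  # node id -> [(relation, neighbour id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, relation, target):
        # Store both directions so neighbour lookups work from either end.
        self.edges[source].append((relation, target))
        self.edges[target].append((relation, source))

    def find_nodes(self, key, value):
        """Find nodes whose property `key` equals `value`."""
        return sorted(n for n, props in self.nodes.items() if props.get(key) == value)

    def neighbours(self, node_id, relation=None):
        """Nearest neighbours of a node, optionally restricted to one relation."""
        return sorted(t for r, t in self.edges[node_id] if relation is None or r == relation)

    def unique_values(self, key):
        """All unique values of a property across the graph."""
        return sorted({props[key] for props in self.nodes.values() if key in props})


graph = ProductionGraph()
graph.add_node("resin", kind="material", unit="kg")
graph.add_node("hardener", kind="material", unit="kg")
graph.add_node("coating", kind="product", unit="l")
graph.add_edge("resin", "USED_IN", "coating")
graph.add_edge("hardener", "USED_IN", "coating")
```

Each method corresponds to one retrieval a graph extraction tool might expose; a further tool could then aggregate or transform the returned values.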
[0016] The method may for instance be performed, carried-out, executed and / or controlled by an / the computing apparatus, for instance a server, a server cloud, a computer-system, or part thereof. The disclosed method or any step of the method may be computer-implemented. For instance, the method or any step of the method may be performed and / or controlled by using at least one processor e.g. of an / the apparatus.
[0017] Alternatively, the method according to any aspect may be performed, carried-out, executed and / or controlled by more than one apparatus, for instance a server cloud comprising at least two servers or a system of apparatus, e.g. a system comprising at least one server providing at least one data base comprising production data represented in a graph structure, at least one server providing a generative data-driven model, and an apparatus comprising means for carrying-out the respective steps of the method according to the first aspect.
[0018] According to a second example aspect, an apparatus is disclosed, the apparatus comprising respective means for carrying out or performing the steps of the method according to the first aspect and / or any embodiment or example and combinations thereof of the method. Additionally or alternatively the apparatus may comprise at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to carry out the steps of the method according to the first aspect and / or any embodiment or example and combinations thereof of the method. Additionally or alternatively the apparatus may comprise circuitry (e.g. hardware-only circuitry, digital circuitry and / or a combination of hardware circuits and software) designed or configured to implement the functions for carrying out the steps of the method according to the first aspect (and / or any embodiment or example and combinations thereof of the method). Circuitry may be implemented in a chipset or a chip or an integrated circuit.
[0019] In particular in accordance with the second example aspect an apparatus is disclosed comprising:
[0020] Means (e.g. an obtainer or receiver) for obtaining a user instruction related to determining the at least one product data set for producing a product based on the product data set;
[0021] Means (e.g. a determiner) for determining whether a historic user instruction similar to the user instruction has been processed in the past, based on a history data base providing at least one history data set, wherein the at least one history data set is associated with a historic user instruction;
[0022] Means (e.g. an executor or performer) for, upon determining that a historic user instruction similar to the user instruction has been processed, determining the at least one product data set based on the at least one history data set;
[0023] Means (e.g. a provider or transmitter) for providing (e.g. via a user interface, e.g. to the operator) the at least one product data set.
[0024] The disclosed apparatus according to any aspect may be a module or a component for a device, for example a chip. Alternatively, the disclosed apparatus according to any aspect may be a device, for instance a server, server cloud, a personal computer or a user device. The disclosed apparatus according to any aspect may comprise only the disclosed components, for instance means, processor, memory, or may further comprise one or more additional components, such as a graphical user interface.
[0025] According to a third example aspect, a system for operating a production environment is disclosed, the system comprising:
[0026] an apparatus according to the second example aspect;
[0027] a graph database providing production data represented in a graph structure, the graph database being communicatively coupled to the apparatus;
[0028] a server providing the at least one generative data-driven model, the server being communicatively coupled to the apparatus,
[0029] together performing or carrying out at least the steps of the method according to the first aspect, in particular the system further comprising a user device configured to receive the user instruction from an operator of the production environment and to display the at least one product data set.
[0030] According to a further example aspect, a use of a product data set determined according to the method according to the first aspect, or by the apparatus according to the second example aspect, is disclosed for displaying the product data set to an operator of the production environment and / or for producing the product.
[0031] According to a further example aspect, a computer element is disclosed, the computer element comprising instructions, which when executed by a processor or a computing apparatus perform or carry out the steps according to the methods or as defined by the apparatuses disclosed herein.
[0032] According to a further example aspect, an apparatus is disclosed, configured to perform and / or control or comprising respective means for performing and / or controlling the method according to any example aspect. According to a further example aspect, an apparatus is disclosed comprising at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform the method according to any example aspect.
[0033] According to a further example aspect, a computer program or computer program product is disclosed, the computer program or computer program product when executed by a processor causing an apparatus, for instance a server, to perform and / or control the actions of the method according to any aspect.
[0034] According to a further example aspect, a (e.g. tangible and / or non-transitory) computer readable storage medium is disclosed, the computer readable storage medium comprising a computer program, the computer program when executed by a processor causing an apparatus, for instance a server, to perform and / or control the actions of the method according to any aspect.
[0035] Any disclosure herein relating to any example aspect is to be understood to be equally disclosed with respect to any subject-matter according to the respective example aspect, e.g. relating to an apparatus, a method, or a computer program. Thus, for instance, the disclosure of a method step shall also be considered as a disclosure of means for performing and / or causing to perform the respective method step. Likewise, the disclosure of means for performing and / or causing to perform a method step shall also be considered as a disclosure of the method step itself. The same holds for any passage describing at least one processor; and at least one memory including instructions; the at least one memory and the instructions configured to, with the at least one processor, cause an apparatus at least to perform a step.
[0036] In the following example features and example embodiments of all aspects will be described in more detail.
[0037] According to an example embodiment of all aspects, determining the at least one product data set based on the at least one history data set comprises: Determining, based on the user instruction and the at least one history data set, a tool sequence comprising an indication of at least one tool, wherein the at least one tool comprises at least one graph extraction tool for retrieving at least a part of the at least one product data set or a basis of the at least one product data set from a graph database (e.g. providing production data represented in a graph structure); Carrying out or causing to carry out the at least one tool comprising the at least one graph extraction tool according to the tool sequence; Receiving output data from the at least one tool; wherein the determining the at least one product data set based on the at least one history data set is further based on at least a part of the output data. Using a system of specialized tools may enhance the accuracy and reliability of the determined product data set.
[0038] According to an example embodiment of all aspects, determining the tool sequence comprises: retrieving a set of tool data sets (for instance, the set of tool data sets may be e.g. a list or table of tools and their associated descriptions, e.g. a JSON file comprising tool IDs and descriptions comprising e.g. input parameters for tools, input or output data), wherein each tool data set of the set of tool data sets is associated with a tool of the at least one tool; wherein the determining the tool sequence is further based on the set of tool data sets.
[0039] According to an example embodiment, determining the tool sequence further comprises at least the following steps: providing a task instruction for generating the tool sequence to a generative data-driven model, wherein the task instruction is generated based on the user instruction and the set of tool data sets, the generative data-driven model being configured to generate (e.g. and provide) a tool output data set related to the tool sequence, in response to receiving the task instruction; providing the tool output data set as the tool sequence or, in case the tool output data set is not in the format of a tool sequence: parsing or causing to parse the tool output data set into the tool sequence and providing the tool sequence. A task instruction for generating the tool sequence may be generated based on the user instruction and the set of tool data sets. Parsing the tool output data set may be based on using a rule-based engine that is configured e.g. to structure the tool output data set in a specific order or format based e.g. on certain key-words. Parsing the tool output data set may be based on instructing a generative data-driven model to generate an output of a certain type or format based on the tool output data set, e.g. by generating a task instruction based on the user instruction, and providing the task instruction to a generative data-driven model that has been trained on general-purpose data comprising natural language, wherein the generative data-driven model generates and provides the tool sequence. The generative data-driven model for generating the tool output data set or tool sequence may e.g. be GPT-4o (e.g. gpt-4o-2024-08-06 or later). Such a generative data-driven model may e.g. be configured to provide structured outputs so that the model adheres to a given output structure. A generative data-driven model may have a response format parameter, which may be set so that the output of the model matches a provided format or schema (e.g. a JSON schema). This may e.g. make further parsing of the output of the generative data-driven model unnecessary.
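How a task instruction might be assembled from the user instruction and a set of tool data sets, and how a structured (JSON) tool output data set might be turned into a tool sequence, may be sketched as follows. The tool identifiers, the field names (`tool_id`, `description`, `input`, `output`, `arguments`) and the prompt wording are hypothetical; no call to an actual model is made, and the model reply is supplied by hand.

```python
import json

# Hypothetical tool data sets: one entry per tool, as might be stored in a JSON file.
TOOL_DATA_SETS = [
    {"tool_id": "find_node", "description": "Find a node by property.",
     "input": ["property", "value"], "output": "node_id"},
    {"tool_id": "get_neighbours", "description": "List neighbours of a node.",
     "input": ["node_id"], "output": "node_ids"},
]


def build_task_instruction(user_instruction, tool_data_sets):
    """Combine the user instruction and the tool descriptions into one prompt."""
    return (
        "Select the tools needed to answer the user instruction and return JSON of the form "
        '{"tool_sequence": [{"tool_id": "...", "arguments": {}}]}.\n'
        f"Available tools:\n{json.dumps(tool_data_sets, indent=2)}\n"
        f"User instruction: {user_instruction}"
    )


def parse_tool_sequence(tool_output, tool_data_sets):
    """Turn a JSON tool output data set into an ordered tool sequence, rejecting unknown tools."""
    known = {t["tool_id"] for t in tool_data_sets}
    steps = json.loads(tool_output)["tool_sequence"]
    for step in steps:
        if step["tool_id"] not in known:
            raise ValueError(f"unknown tool: {step['tool_id']}")
    return steps


prompt = build_task_instruction("Which materials go into the coating?", TOOL_DATA_SETS)
# Hand-written reply standing in for a model's structured output.
reply = ('{"tool_sequence": ['
         '{"tool_id": "find_node", "arguments": {"property": "name", "value": "coating"}}, '
         '{"tool_id": "get_neighbours", "arguments": {}}]}')
sequence = parse_tool_sequence(reply, TOOL_DATA_SETS)
```

Checking the returned tool identifiers against the tool data sets is a design choice of this sketch; it guards against a model naming a tool that does not exist even when the reply is well-formed JSON.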
[0040] According to an example embodiment, parsing the tool output data set comprises: identifying data indicating the at least one tool, and, in case the at least one tool is more than one tool, determining a tool order in which the more than one tool is to be carried out; wherein the tool sequence comprises the data indicating the at least one tool and, in case the at least one tool is more than one tool, the tool order.
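Where the tool output data set is free text rather than structured data, a rule-based parse of the kind mentioned above may look like the following sketch. It assumes, purely for illustration, that tools are referred to by their exact identifiers and that the order of first mention is the intended tool order; the identifiers in `KNOWN_TOOLS` are hypothetical.

```python
import re

KNOWN_TOOLS = ("find_node", "get_neighbours", "aggregate_values")


def parse_free_text(tool_output, known_tools=KNOWN_TOOLS):
    """Return the known tool identifiers in the order in which they are first mentioned."""
    pattern = re.compile("|".join(re.escape(name) for name in known_tools))
    sequence = []
    for match in pattern.finditer(tool_output):
        if match.group() not in sequence:  # keep only the first mention of each tool
            sequence.append(match.group())
    return sequence


text = ("First call find_node to locate the product, then get_neighbours. "
        "Finally use aggregate_values; find_node is not needed again.")
tool_order = parse_free_text(text)
```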
[0041] According to an example embodiment, determining the tool sequence comprises: determining whether input data (e.g. input data having / according to an input data format, input data type and / or input data structure) required for carrying out the at least one tool is provided by the user instruction; upon determining that the input data required for carrying out the at least one tool is provided by the user instruction: including the at least one tool in the tool sequence to be carried out based on the input data. For instance, an input data format, input data type and / or input data structure used by the at least one tool may be identified. The determining or identifying may e.g. be based on a tool data set associated with the at least one tool, the tool data set e.g. comprising an input data format, input data type and / or input data structure used by the at least one tool. For example, a generative data-driven model such as a large language model (LLM) may be utilized to analyze a natural language query provided as a user instruction, mapping the query to required input parameters, including input data format (e.g., JSON, XML), input data type (e.g., integer, string), and input data structure (e.g., a hierarchical array) necessary for executing a specified tool, wherein the LLM may employ tokenization to convert the query into tokens, may generate contextual embeddings to capture semantic meaning, and may apply pattern recognition and semantic analysis to evaluate compliance with the defined criteria, returning e.g. a confirmation of validity or an error message with feedback on the input data's conformity.
[0042] According to an example embodiment, upon determining that the input data required for carrying out the at least one tool is not provided by the user instruction: determining or redetermining the tool sequence, the (re-)determining comprising: retrieving a set of tool data sets, wherein each tool data set of the set of tool data sets is associated with a tool of the at least one tool; identifying, based on the set of tool data sets, a tool providing the input data (e.g. according to the input data format, input data type and / or input data structure) used by the at least one tool; introducing said tool in the tool sequence before the at least one tool. For instance, upon determining that no tool providing the input data format, input data type and / or input data structure used by the at least one tool is available, a tool may be generated, e.g. by providing a suitable task instruction for generating an according tool (e.g. a function) to the generative data-driven model, and the generated tool and an according tool data set may be stored e.g. in the tool database. Then, in accordance with the tool sequence, the tool may be carried out before the at least one tool, so that input data required for the at least one tool is available at the time of execution of the at least one tool. The at least one tool may be carried out based on the input data provided by the tool. The (re-)determining may be performed accordingly by the generative data-driven model, for instance implicitly during determining the tool sequence.
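Introducing a providing tool before a tool whose input is missing may be sketched as a simple dependency resolution over the tool data sets. The function name `complete_tool_sequence` and the fields `inputs` and `output` are hypothetical, and the sketch assumes each tool produces exactly one named output; where the disclosure would generate a new tool or ask the operator, this sketch merely raises an error.

```python
def complete_tool_sequence(sequence, tool_data_sets, provided_inputs):
    """Insert provider tools so that every tool's inputs exist before it runs."""
    by_id = {t["tool_id"]: t for t in tool_data_sets}
    available = set(provided_inputs)  # inputs given by the user instruction
    completed = []

    def schedule(tool_id, visiting=()):
        if tool_id in completed:
            return
        if tool_id in visiting:
            raise ValueError(f"cyclic dependency at {tool_id}")
        for needed in by_id[tool_id]["inputs"]:
            if needed in available:
                continue
            providers = [t["tool_id"] for t in tool_data_sets if t["output"] == needed]
            if not providers:
                # No tool provides this input: a tool could be generated or the
                # operator asked for clarification; here we only report the gap.
                raise LookupError(f"no tool provides {needed!r}")
            schedule(providers[0], visiting + (tool_id,))
        completed.append(tool_id)
        available.add(by_id[tool_id]["output"])

    for tool_id in sequence:
        schedule(tool_id)
    return completed


tools = [
    {"tool_id": "find_node", "inputs": ["product_name"], "output": "node_id"},
    {"tool_id": "get_neighbours", "inputs": ["node_id"], "output": "node_ids"},
]
resolved = complete_tool_sequence(["get_neighbours"], tools, {"product_name"})
```

Here the user instruction supplies only a product name, so `find_node` is placed before `get_neighbours` to supply the node identifier it needs.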
[0043] According to an example embodiment, upon determining that the input data required for carrying out the at least one tool is not provided by the user instruction: providing a data provision request (e.g. to the operator via a user interface) for obtaining clarification data; and based on received (from the operator) clarification data, re-determining the tool sequence.
[0044] Re-determining may be carried out e.g. upon determining (by an agent or a validation performed by the orchestrator agent) that an error has occurred when processing the tools according to the tool sequence. For instance, if wrong parameters were passed on to a tool, the orchestrator agent may re-determine which parameters to use and may adjust the tool sequence, e.g. by including a conversion tool or requesting further data from the operator or calling a different tool. For this, e.g. a task instruction for generating a tool sequence may be provided to the generative data-driven model or another generative data-driven model, wherein the task instruction may comprise an indication of the determined error (e.g. an error message), received additional data from a user, an indication not to call a specific tool but choose another, an indication of a different way to call a tool (e.g. setting a parameter that e.g. configures the tool to accept certain input or provide certain output in case it has different options for these available). For instance, a graph extraction tool may be configurable to retrieve data on production plants, products, processes or the like, and may have been configured to extract (and provide) data on plants instead of products, so that in re-determining the tool will be configured correctly to carry out the tools according to the tool sequence. A graph extraction tool may be based on or comprise a Cypher query or other query languages or APIs suitable for the respective graph database.
[0045] According to an example embodiment, the at least one product data set is determined or generated using a generative data-driven model in particular trained on general-purpose training data sets or based on (e.g. distilled from) a generative data-driven model trained on general-purpose training data sets. Preferably, determining the tool sequence comprises: providing a task instruction for generating the tool sequence to a generative data-driven model, wherein the task instruction is generated based on the user instruction and the set of tool data sets, the generative data-driven model being configured to generate a tool sequence or a tool output data set related to the tool sequence, in response to receiving the task instruction.
[0046] In an embodiment, the method further comprises, receiving output data from the at least one tool; upon receiving output data from the at least one tool, providing a decision task instruction to a / the generative data-driven model. According to an example embodiment, upon receiving output data from the at least one tool, at least a part of the output data is provided to a / the generative data-driven model, e.g. as context in the decision task instruction. In an example embodiment the generative model is configured to determine based on the at least the part of the output data and the user instruction or the decision task instruction, whether the at least a part of the output data is suitable to determine the at least one product data set. In particular upon determining that the at least a part of the output data is suitable to determine the at least one product data set, the generative data-driven model may be configured to determine the at least one product data set, e.g. determining the product data set using the generative data-driven model. The generative data-driven model may be further configured to, upon determining that the at least a part of the output data is not suitable to determine the at least one product data set, re-determine the tool sequence or determining at least one further tool to be carried out. In the latter case the disclosed method may further comprise, carrying out the tools according to the re-determined tool sequence or the at least one further tool, receiving new or additional output data from the at least one tool of the re-determined tool sequence or from the at least one further tool, and determining the at least one product data set based on at least a part of the new and / or additional output data. For this, the new or additional output data may be provided as context to the generative data-driven model for determining the product data set.
[0047] According to an example embodiment, upon receiving output data from the at least one tool, at least a part of the output data is provided to a / the generative data-driven model. Preferably a decision task instruction, based on or comprising the user instruction, at least a part of the tool sequence and at least a part of the output data, is provided to the generative data-driven model trained on general purpose training data sets and being configured to generate a product data set and / or to re-determine the tool sequence or determining at least one further tool, in response to receiving the decision task instruction or the task instruction for generating the product data set. The method may further comprise: Carrying out or causing to carry out the at least one tool comprising the at least one graph extraction tool according to the re-determined tool sequence or carrying out the at least one further tool; and Receiving output data from the at least one tool of the re-determined tool sequence or the at least one further tool; and Determining the at least one product data set based on at least a part of the output data from the at least one tool of the re-determined tool sequence or the at least one further tool.
[0048] According to an example embodiment, determining the at least one product data set comprises providing the at least a part of the output data (and / or additional output data provided by further tools) to a / the generative data-driven model, for instance as context in a task instruction provided to the generative data-driven model or into an ongoing generation process started by providing the user instruction or a task instruction to the generative data-driven model. Preferably, determining the at least one product data set comprises: providing a task instruction (for generating the product data set) to a generative data-driven model, wherein the task instruction is based on the user instruction (e.g. comprises a representation of the user instruction including at least the same informational content as at least a part of the user instruction relevant for generating the product data set, for instance at least a part of the user instruction or a rephrasing of at least a part of the user instruction) and the at least a part of the output data, the generative data-driven model (e.g., having been trained on general purpose training data sets) being configured to generate a product data set, in response to receiving the task instruction for generating the product data set.
[0049] According to an example embodiment, the method further comprises or the step of carrying out or causing to carry out the at least one tool comprises, carrying out or causing to carry out a first tool according to the tool sequence; receiving output data from the first tool indicated in the tool sequence; upon receiving output data from the first tool, determining an operation instruction to a second tool indicated in the tool sequence, e.g. using a / the generative data-driven model, for instance by providing a task instruction for generating the operation instruction to the generative data-driven model, this task instruction comprising at least a part of the output data from the first tool; Carrying out or causing to carry out the second tool according to the tool sequence based on the operation instruction. This process may be repeated for some or all tools according to the tool sequence, taking for instance into account at least part of the output data from previous tools for carrying out subsequent tools according to the tool sequence.
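Carrying out tools in sequence, with each tool using output data from earlier tools, may be sketched as below. As a simplification, the operation instruction for each tool is derived by a fixed name-based mapping instead of by a generative data-driven model; the function `run_tool_sequence`, the step fields, the example tools and the `BOM` data are all hypothetical.

```python
def run_tool_sequence(tool_sequence, tools, user_inputs):
    """Run tools in order; each tool sees the user inputs plus all earlier outputs."""
    context = dict(user_inputs)
    outputs = []
    for step in tool_sequence:
        tool = tools[step["tool_id"]]
        # Simplified operation instruction: pick the named inputs from the context.
        kwargs = {name: context[name] for name in step["inputs"]}
        result = tool(**kwargs)
        context[step["output"]] = result  # make the output available to later tools
        outputs.append((step["tool_id"], result))
    return outputs


# Hypothetical production data: mass fractions of components per product.
BOM = {"coating": {"resin": 0.6, "hardener": 0.4}}
tools = {
    "get_components": lambda product: BOM[product],
    "scale_quantities": lambda components, batch_kg: {
        name: round(fraction * batch_kg, 2) for name, fraction in components.items()
    },
}
sequence = [
    {"tool_id": "get_components", "inputs": ["product"], "output": "components"},
    {"tool_id": "scale_quantities", "inputs": ["components", "batch_kg"], "output": "quantities"},
]
outputs = run_tool_sequence(sequence, tools, {"product": "coating", "batch_kg": 50})
```

The second tool takes one input from the first tool's output (`components`) and one from the user instruction (`batch_kg`), mirroring the case in which a tool depends on both.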
[0050] According to an example embodiment, the generative data-driven model is provided, for processing the user instruction, with a structured task instruction comprising at least: (i) a graph schema representation indicative of node labels, relationship classes and properties of the production data represented in the graph structure; (ii) a guidelines block specifying behavioral rules for processing, wherein the behavioral rules comprise at least one of: stepwise reasoning, prohibition of direct query writing by the model, deterministic usage of tools with avoidance of redundant calls, output formatting requirements, and stopping criteria; and (iii) optionally a system time indicator. The structured prompt may be provided as part of or together with the task instruction.
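A minimal sketch of how such a structured task instruction could be assembled is given below. The section titles, the JSON serialization of the graph schema and the function name are assumptions made for illustration; the disclosure does not prescribe a concrete layout.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_structured_task_instruction(
    graph_schema: dict,
    guidelines: List[str],
    user_instruction: str,
    system_time: Optional[datetime] = None,
) -> str:
    """Combine (i) the graph schema, (ii) the guidelines block and
    (iii) the optional system time indicator with the user instruction."""
    sections = [
        "GRAPH SCHEMA\n" + json.dumps(graph_schema, indent=2, sort_keys=True),
        "GUIDELINES\n" + "\n".join("- " + rule for rule in guidelines),
    ]
    if system_time is not None:
        sections.append("SYSTEM TIME\n" + system_time.isoformat())
    sections.append("USER INSTRUCTION\n" + user_instruction)
    return "\n\n".join(sections)


# Hypothetical schema and rules, for illustration only.
example_instruction = build_structured_task_instruction(
    graph_schema={
        "node_labels": ["Material", "Process"],
        "relationship_classes": ["INPUT_TO", "PRODUCES"],
        "properties": {"Material": ["name", "unit"]},
    },
    guidelines=[
        "Reason step by step.",
        "Do not write graph queries directly; use the provided tools.",
        "Do not repeat a tool call with identical parameters.",
        "Stop once the product data set can be generated.",
    ],
    user_instruction="List all inputs required to produce polyethylene.",
    system_time=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```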
[0051] According to an example embodiment, the generative data-driven model is configured to operate in an interactive loop, wherein after carrying out or causing to carry out a tool according to the tool sequence, output data returned by the at least one tool is provided as context to the generative data-driven model. Based on the user instruction and at least a part of the newly provided output data, the generative data-driven model determines one of: a subsequent or further tool of the tool sequence including parameters therefor, a modification of the tool sequence or a re-determined tool sequence, a request for clarification data, a reasoning step, or a decision to generate the at least one product data set. The providing of the output data as context may comprise appending at least a part of the output data to the task instruction and / or to a model context maintained during processing of the user instruction.
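The interactive loop may, purely as an illustration, be sketched as follows. The function scripted_model is a hypothetical, deterministic stand-in for the generative data-driven model, and the tool names and the string-based context are assumptions; only two of the possible decisions (calling a further tool, deciding to generate the product data set) are modelled.

```python
from typing import Callable, Dict, List


def scripted_model(context: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for the model: chooses the next action from the
    number of tool outputs already appended to the context."""
    tool_outputs = [entry for entry in context if entry.startswith("tool_output:")]
    if not tool_outputs:
        return {"action": "call_tool", "tool": "get_node", "argument": "Product A"}
    if len(tool_outputs) == 1:
        return {"action": "call_tool", "tool": "get_neighbours", "argument": "Product A"}
    return {"action": "generate_product_data_set"}


def interactive_loop(
    user_instruction: str,
    model: Callable[[List[str]], Dict[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[str]:
    """Append each tool output to the context until the model decides to generate."""
    context = ["user_instruction:" + user_instruction]
    for _ in range(max_steps):
        decision = model(context)
        if decision["action"] == "generate_product_data_set":
            return context
        output = tools[decision["tool"]](decision["argument"])
        context.append("tool_output:" + output)
    raise RuntimeError("step limit reached before a decision to generate was made")


example_tools = {
    "get_node": lambda name: name + " found",
    "get_neighbours": lambda name: name + " <- Feedstock X",
}
final_context = interactive_loop("inputs of Product A", scripted_model, example_tools)
```

The step limit plays the role of a stopping criterion, so that a model which never decides to generate the product data set cannot keep the loop running indefinitely.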
[0052] According to an example embodiment, the at least one product data set is determined or generated using a generative data-driven model, in particular a generative data-driven model trained on general-purpose training data sets or based on (e.g. distilled from) a generative data-driven model trained on general-purpose training data sets.
[0053] In an embodiment, determining the at least one product data set based on the at least one history data set comprises: providing a task instruction for determining the product data set to a generative data-driven model, wherein the task instruction is based on the user instruction and the history data set, the generative data-driven model (e.g., having been trained on general purpose training data sets) being configured to determine a product data set, in response to receiving the task instruction for determining the product data set.
[0054] According to an example embodiment, determining the at least one product data set comprises: providing a task instruction to a generative data-driven model, wherein the task instruction is based on the user instruction and the at least a part of the output data, the generative data-driven model (e.g. having been trained on general purpose training data sets) being configured to determine (e.g. and provide) a product data set, in response to receiving the task instruction for generating the product data set. The method may further comprise generating, based on the user instruction and the at least a part of the output data, a task instruction for determining the product data set.
[0055] According to an example embodiment of the method according to the first aspect, the method further comprises: determining whether the user instruction was successfully processed (e.g. by an operator acknowledging success, for instance via a success button displayed on a graphical user interface); and upon determining that the user instruction was successfully processed, storing (e.g. by a memorizer) a history data set associated with the user instruction as a historic user instruction in a history data base (e.g. a history graph database in which the history data sets are provided in a graph structure), wherein the history data set comprises the tool sequence and wherein the history data set is associated with the user instruction (for instance, the association may be given by the user instruction being included in the history data set, or an identifier of the user instruction, for instance allowing identification and retrieval, may be associated with the history data set).
[0056] According to an example embodiment, upon determining that the user instruction was successfully processed, the method further comprises: embedding the user instruction; and storing the embedded user instruction in a vector data base, wherein the embedded user instruction is associated with the respective history data set (e.g. an identifier may be used as described above, or the user instruction being part of the history data set may account for association, or suitable meta-data may be attached to the history data set associating the history data set and corresponding embedding of historic user instruction).
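The storing steps of the two preceding paragraphs may be sketched as follows. The class HistoryStore, the in-memory dictionaries standing in for the history data base and the vector data base, and the function toy_embed are assumptions for illustration; toy_embed is not a trained embedding model and merely produces a deterministic vector.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def toy_embed(text: str, dims: int = 8) -> List[float]:
    """Toy embedding (assumption): counts characters per bucket. A real system
    would use a trained embedding model instead."""
    vector = [0.0] * dims
    for character in text.lower():
        vector[ord(character) % dims] += 1.0
    return vector


@dataclass
class HistoryStore:
    history: Dict[str, dict] = field(default_factory=dict)         # history data base
    vectors: Dict[str, List[float]] = field(default_factory=dict)  # vector data base

    def memorize(
        self, user_instruction: str, tool_sequence: List[str], success: bool
    ) -> Optional[str]:
        """Store a history data set and its embedding only after success."""
        if not success:
            return None
        identifier = str(uuid.uuid4())
        self.history[identifier] = {
            "user_instruction": user_instruction,
            "tool_sequence": list(tool_sequence),
        }
        # The shared identifier associates the embedding with the history data set.
        self.vectors[identifier] = toy_embed(user_instruction)
        return identifier


store = HistoryStore()
skipped = store.memorize("inputs of Product A", ["get_node"], success=False)
stored_id = store.memorize("inputs of Product A", ["get_node", "get_neighbours"], success=True)
```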
[0057] According to an example embodiment, determining the tool sequence comprises: determining whether a historic user instruction similar to the user instruction has been processed in the past, based on a history data base providing at least one history data set, wherein the at least one history data set (e.g. each in the history database) is associated with a historic user instruction; upon determining that a historic user instruction similar to the user instruction has been processed, determining the tool sequence based on the at least one history data set. For instance, the task instruction for generating the tool sequence may be further based on the at least one history data set, for instance by including it as context in the corresponding task instruction. For instance, determining whether a historic user instruction similar to the user instruction has been processed in the past may be based on determining a similarity score between an embedding of the user instruction and the at least one embedding of a historic user instruction, wherein the similarity score may be based on the distance between the at least one embedding of the historic user instruction and the embedded user instruction in the embedding space with respect to a similarity measure (e.g. Cosine similarity or Euclidean distance); and e.g. determining that a historic user instruction similar to the user instruction has been processed in the past, when the similarity score is below or above a threshold similarity. A task instruction for generating the tool sequence may be generated based on the (e.g. identified) at least one history data set and e.g. the set of tool data sets. Such a task instruction may be provided to a generative data-driven model, the generative data-driven model (e.g. having been trained on general purpose training data sets) being configured to generate (e.g. and provide) a tool output data set related to the tool sequence, in response to receiving the task instruction. 
Further, for example, in case the tool output data set is not of the correct type or form of a tool sequence: parsing or causing to parse the tool output data set into the tool sequence.
[0058] According to an example embodiment, determining whether a historic user instruction similar to the user instruction has been processed in the past comprises: Obtaining at least one embedding of a historic user instruction; Embedding the user instruction into an embedding space comprising the at least one embedding of the historic user instruction; Determining a similarity score for the at least one embedding of a historic user instruction, wherein the similarity score is based on the distance between the at least one embedding of the historic user instruction and the embedded user instruction in the embedding space with respect to a similarity measure (e.g. Cosine similarity or Euclidean distance); Determining that a historic user instruction similar to the user instruction has been processed in the past, when the similarity score is below or above a threshold similarity.
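As a non-limiting sketch, the similarity determination may be implemented as follows, here with cosine similarity. The function names and the threshold value of 0.9 are assumptions. With cosine similarity a higher score indicates greater similarity, so the comparison is "at or above" the threshold; with a Euclidean distance the comparison would instead be "at or below" a threshold.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar_historic_instruction(
    query_embedding: List[float],
    historic_embeddings: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed value, to be tuned per application
) -> Optional[Tuple[str, float]]:
    """Return (identifier, score) of the most similar historic user instruction,
    or None if no score reaches the threshold."""
    best: Optional[Tuple[str, float]] = None
    for identifier, embedding in historic_embeddings.items():
        score = cosine_similarity(query_embedding, embedding)
        if best is None or score > best[1]:
            best = (identifier, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


historic = {"h1": [1.0, 0.0], "h2": [0.0, 1.0]}
match = find_similar_historic_instruction([0.9, 0.1], historic)
no_match = find_similar_historic_instruction([1.0, 1.0], historic)
```

The returned identifier could then be used to retrieve the associated history data set, for instance to include it as context in a task instruction.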
[0059] According to an example embodiment of the method according to the first aspect, the method further comprises: operating and / or controlling a production environment based on the at least one product data set, in particular to produce the product. For instance, the at least one product data set may be or comprise a sequence of instructions of how to produce the product. For instance, these may be machine-interpretable instructions which may allow automatic configuration of a production environment (e.g. after an operator has approved using the product data set). For instance, the product data set may be displayed to an operator of a production environment, who configures the production environment accordingly to produce the product.
[0060] A task instruction, in particular for generating the tool sequence, may be generated based on the user instruction and a task instruction template. For instance, generating the task instruction may comprise: obtaining one or more task instruction template(s) related to determining at least one product data set; selecting at least one task instruction template(s) of the one or more task instruction template(s) based on the user instruction; generating the task instruction based on the at least one task instruction template(s) and the user instruction; and generating output data, such as a tool output data set, a tool sequence and / or product data set, by providing the task instruction to a / the generative data-driven model. This may allow generating a task instruction specifically tailored to a request made by an operator and may enhance the output the generative data-driven model provides, such as a tool output data set, a tool sequence and / or product data set. Selecting at least one task instruction template(s) of the one or more task instruction template(s) may be based on a similarity measure (e.g. cosine similarity or Euclidean distance) in a common embedding space of the user instruction and the one or more task instruction template(s). For instance, the user instruction may be embedded. For instance, selection may be based on a distance between embeddings (e.g. wherein an embedding is a numerical representation) of the user instruction and embeddings (e.g. numerical representation(s)) of respective task instruction template(s). For example, a similarity measure may be calculated between the embedded user instruction and embeddings of the one or more task instruction template(s). For instance, the task instruction template relating to the embedded task instruction template closest to the embedded user instruction in relation to the similarity measure may be selected. In an example, to avoid mismatches, the task instruction template relating to the embedded task instruction template closest to the embedded user instruction in relation to the similarity measure may be selected in case the similarity measure is below or above a predefined threshold value.
[0061] A generative data-driven model may be a model, e.g. implemented in a computer system, that, based on historical data it has been trained on, may generate new instances of said data, e.g. by sampling from a probability distribution which may have been learned during training, and generate a corresponding output data set after receiving a task instruction. A generative data-driven model may have been trained on general purpose training data sets (and via that training may be configured) to generate an output data set in response to obtaining (e.g. receiving) the task instruction. A generative data-driven model may be or comprise a transformer-based data-driven model, preferably a decoder-only transformer-based model such as a generative pre-trained transformer, e.g. Large Language Model Meta AI (Llama), Llama 2, Llama 3, Mistral 7B, GPT 3.5, GPT 3.5 turbo, GPT 4, GPT 4o. A generative data-driven model may be or comprise a mixture of experts architecture based model, e.g. Mixtral 8x7B, Mixtral 8x22B. In a mixture of experts model, several decoder blocks may be operated in parallel representing different experts, or a feed-forward layer in a block may be split into separate parallel feed-forward layers, wherein each of the parallel feed-forward layers may be regarded as an expert and may learn to focus on different tasks during training. A gating network may be used to route a particular input to the respective expert, e.g. by training the gating network alongside the experts, for instance using an expectation-maximization algorithm or a gradient descent algorithm. A generative data-driven model may be or comprise a selective state space sequence architecture based model, e.g. Mamba. A generative data-driven model may be or comprise a combined architecture such as Mamba LLM or Mamba Mixture of Experts (which may comprise alternating Mamba and mixture of experts layers). A selective or structured state space sequence architecture (e.g. SSMs, S4, or S6 models) may allow for using more context in generation and allow for generating larger output data sets. A generative data-driven model may be trained on general-purpose training data sets or based on (e.g. distilled from) a generative data-driven model trained on general-purpose training data sets.
[0062] For instance, when receiving a task instruction the generative data-driven model may process the task instruction. For instance, the (received) task instruction may be tokenized, e.g. by dividing the task instruction into tokens (i.e. smaller units), such as words or subwords. The task instruction or tokens of the task instruction may be embedded e.g. converted into a vector in the model's latent space, e.g. by an embedding layer of the generative data-driven model.
[0063] A generative data-driven model may be or comprise an artificial neural network (ANN), which may comprise several layers.
[0064] A generative data-driven model may comprise an embedding layer for embedding a received token into the model's latent space, which may be a numerical representation or vector of the token in the latent space.
[0065] The generative data-driven model may comprise a positional encoding layer, which may encode the position of a received token relative to the task instruction. For instance, using trigonometric functions such as sine and cosine a unique positional encoding vector for each position of a token in the task instruction may be generated. The positional encoding vector may then be added element-wise to the embedding of the token obtained by an embedding layer, so that the position of the token is encoded together with the embedding of the token in the embedded token passed to e.g. an encoder or decoder block.
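The positional encoding described above may be sketched as follows. The base constant 10000 and the alternation of sine for even and cosine for odd dimensions follow the commonly used sinusoidal scheme and are assumptions here, since the text above only states that sine and cosine functions may be used; the function names are illustrative.

```python
import math
from typing import List


def positional_encoding(position: int, d_model: int) -> List[float]:
    """Sinusoidal encoding vector for one token position (assumed scheme)."""
    encoding = []
    for i in range(d_model):
        # Dimensions 2j and 2j+1 share the same frequency.
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_position(token_embedding: List[float], position: int) -> List[float]:
    """Add the positional encoding element-wise to a token embedding."""
    encoding = positional_encoding(position, len(token_embedding))
    return [e + p for e, p in zip(token_embedding, encoding)]


first = positional_encoding(0, 4)
embedded_with_position = add_position([0.5, 0.5, 0.5, 0.5], 0)
```

Because each position yields a different vector, the same token embedding results in different inputs to the encoder or decoder block depending on where the token occurs in the task instruction.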
[0066] A generative data-driven model may comprise one or more encoder layers forming an encoder block and / or one or more decoder layers forming a decoder block.
[0067] A decoder and / or encoder block may comprise a self-attention layer. A self-attention layer may be configured to (re-)encode an embedded token (e.g. a vector) by taking into account the context provided by all other tokens. A self-attention layer may be configured to weigh the relevance of different parts (i.e. tokens) of its input with regard to the (e.g. overall) input. The input of a self-attention layer may have the form of a matrix (e.g. input matrix) or tensor, in which each row or column may correspond to an embedded token, which may be a vector. The output of a self-attention layer may then also be a matrix (e.g. output matrix) or tensor, wherein each row or column may additionally contain information on the relevance of the token relative to the other tokens. For instance, the first self-attention layer may be configured to weigh the relevance of different tokens of a task instruction based on their relevance to the (overall) task instruction, where the overall task instruction may be represented as the matrix of all embedded tokens of the task instruction. In this case the input matrix for the first self-attention layer, e.g. after an input layer, may comprise the embedded tokens of the task instruction, i.e. vector representations of each token of the task instruction. To determine the self-attention, an individual embedded token (e.g. a token vector or row / column vector of the input matrix) may be transformed into a set of vectors (also named a head), namely a query vector, a key vector, and a value vector, by e.g. linear projection, such as multiplying a token vector by a respective weight matrix. Then a scaled dot-product attention may be applied, e.g. by forming the dot product of each query vector with each key vector to calculate an attention score that may represent the relevance of a given token relative to the (e.g. overall) input of the self-attention layer, in particular, for the first self-attention layer, the relevance of the token to the task instruction. The attention scores may then be normalized using a softmax function, which converts each attention score into a respective softmax score; the softmax scores may represent probabilities that sum up to 1, so that e.g. the weight of higher attention scores is increased and the weight of lower attention scores is decreased. Then, each value vector may be multiplied by the respective softmax score to calculate a weighted value vector. The weighted value vectors, corresponding to the tokens of the task instruction, may then be summed up to determine a self-attention vector. The output matrix of the self-attention layer may then comprise the self-attention vectors. The self-attention layer may also utilize several sets of trained vectors (or heads), each comprising a query vector, a key vector, and a value vector. In this case, the output matrices resulting from calculating the self-attention of each head may be concatenated and multiplied by an additional trained weight matrix, which may allow transforming the matrix to the dimensions of the input matrix; the resulting matrix may correspond to the output matrix of the self-attention layer. This multi-head approach to self-attention may allow the self-attention layer to capture different types of information from the input matrix in each of the heads. For example, one head might focus on syntactic information, another on semantic information.
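For a single head, the computation described above may be sketched in plain Python as follows. The scaling of the attention scores by the square root of the key dimension is the usual convention for scaled dot-product attention and is an assumption here, as are the function names and the identity weight matrices used in the example; trained weight matrices would be used in practice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax; the result sums to 1."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head attention over the rows of x (one row per embedded token)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_transposed = [list(column) for column in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in scores]  # softmax scores per token
    return matmul(weights, v), weights          # weighted sum of value vectors


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy embedded tokens
attended, attention_weights = self_attention(tokens, identity, identity, identity)
```

In this toy example each token attends most strongly to itself, because its query vector has the largest dot product with its own key vector.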
[0068] A decoder and / or encoder block may comprise at least one feed-forward layer, which may perform at least one linear transformation, preferably two linear transformations, wherein a Rectified Linear Unit activation function is applied between the two linear transformations.
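A minimal sketch of such a feed-forward layer for a single token vector is shown below. The weight and bias values are arbitrary illustration values rather than trained parameters, and the function names are assumptions.

```python
from typing import List


def linear(v: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Linear transformation; each row of weights produces one output element."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    """Rectified Linear Unit: negative entries are set to zero."""
    return [max(0.0, x) for x in v]


def feed_forward(v, w1, b1, w2, b2) -> List[float]:
    """Two linear transformations with a ReLU activation in between."""
    return linear(relu(linear(v, w1, b1)), w2, b2)


w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # expands 2 -> 3 dimensions
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]    # projects 3 -> 2 dimensions
b2 = [0.5, 0.0]
result = feed_forward([1.0, -2.0], w1, b1, w2, b2)
```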
[0069] A decoder block may comprise a cross-attention layer. For instance, similar to the self-attention layer, an attention score between different data sets may be calculated. For example, a query vector may be determined based on tokens from the task instruction as in self-attention, whereas the key and value vectors may be determined based on context information, e.g. provided separately.
[0070] A generative data-driven model may comprise an output layer, which may e.g. be configured to determine a probability distribution for the next token of an output sequence. For instance, the output matrix of the last decoder block may be linearly transformed to e.g. a vocabulary size (which may be larger than the size of the corresponding dimension of the output matrix) and a softmax function may be applied to create a probability distribution, e.g. over the vocabulary. From this distribution a sampling module may sample the next token of the output sequence, e.g. by selecting the token having the highest probability or by selecting the k tokens with the highest probability and selecting one from the k tokens at random. The sampling module may use top k-sampling or top p-sampling. The output sequence generated may then be attached to the prompt and again processed by the generative data-driven model to determine the next token and so forth until an end token is generated or a maximum length threshold is reached; the end token may be trained, i.e. determined via the training.
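The sampling and the token-by-token generation described above may be sketched as follows. The callable next_logits is a hypothetical stand-in for the model's forward pass up to the output layer, and drawing among the k candidates in proportion to their probabilities is one possible reading of "at random"; a uniform draw would be an equally valid variant.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Convert output-layer logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probabilities: List[float]) -> int:
    """Select the token having the highest probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def top_k_sample(probabilities: List[float], k: int, rng: random.Random) -> int:
    """Keep the k most probable tokens and draw one of them at random."""
    candidates = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)[:k]
    weights = [probabilities[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    end_token: int,
    max_length: int,
    k: int,
    rng: random.Random,
) -> List[int]:
    """Append sampled tokens until the end token or the maximum length is reached."""
    sequence = list(prompt)
    while len(sequence) < max_length:
        token = top_k_sample(softmax(next_logits(sequence)), k, rng)
        sequence.append(token)
        if token == end_token:
            break
    return sequence


def toy_logits(sequence: List[int]) -> List[float]:
    """Hypothetical model stub: favours token 0, then the end token 3."""
    return [0.0, 0.0, 0.0, 10.0] if len(sequence) >= 3 else [5.0, 1.0, 0.0, -50.0]


generated = generate(toy_logits, prompt=[0], end_token=3, max_length=10, k=1, rng=random.Random(0))
```

With k set to 1 the procedure reduces to greedy selection of the most probable token, which makes the example deterministic.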
[0071] The generative data-driven model may be a (pre-)trained or parametrized general purpose model parametrized or trained based on general data sets including input-output-data pairs not specific to input data related to the product data sets or graph database.
[0072] An agent may be a self-operating computational unit, e.g. a software component or system that can independently execute tasks or operations. This execution may involve processing input data, applying predefined algorithms, and making decisions based on the results. It may also include the ability to adjust or modify its operations in response to changes in input data or outcomes of its operations. The self-operating computational unit may carry out these tasks without the need for continuous human supervision or intervention, although human input may be incorporated as part of its decision-making processes or to change its operational parameters. It should be understood that the specific functionalities, operations, and level of independence of the self-operating computational unit can vary based on the design and requirements of the specific system it is implemented in. A tool is in particular an electronic tool or operating engine and may be a computational agent or function, e.g. configured to calculate a weighted average of some quantities or any other form of aggregation. A computational agent or function may e.g. be included in the tool sequence if the user instruction relates to postprocessing. A tool may be a tool creation agent configured to generate a certain tool not yet present in a tool database, e.g. by providing a task instruction to a generative data-driven model, the task instruction relating to a task to generate e.g. executable software code that receives certain input data and provides certain output data as required e.g. to construct a tool sequence for providing the product data set. For instance, a tool creation agent may generate a graph extraction tool (e.g. a graph query) and may generate a corresponding description, so that the tool can be stored for further use and its description provided in the tool database. A tool may be a plotting agent configured to render a specific plot e.g. related to the product data set, e.g. a Mermaid plot. A tool may be an analysis tool configured to analyze e.g. data retrieved by a graph extraction tool according to the user instruction, e.g. extrapolate or interpolate data, or identify possible problems (e.g. by generating a task instruction comprising the retrieved data and an indication that potential problems should be determined and providing it to the generative data-driven model). A tool may be an aggregation agent configured to aggregate different data retrieved from the graph database and e.g. generate a time series from different data points. For instance, an aggregation agent may be configured to aggregate data related to a product's carbon footprint along the production process. A tool may be a selection tool configured to aggregate data on a product e.g. producible in different ways, e.g. with different feedstock, and select one way of producing the product, e.g. with the minimum number of process steps, including certain process steps, comprising a certain input material for producing the product, a certain intermediate product or the like. A tool may be a validation agent configured to e.g. validate whether data retrieved by a graph extraction tool is suitable for determining the at least one product data set, e.g. whether it has the right content, data type (as input for another tool), or format. A validation agent may provide a suitable task instruction to a generative data-driven model, the task instruction e.g. comprising expected content type, data type (e.g. input data type for another tool), or format and a task to validate whether the output of the graph extraction tool adheres to these. A tool may be an output validation agent configured to e.g. validate whether the at least one product data set adheres to the user instruction; for this it may provide a suitable task instruction comprising the at least one product data set and the user instruction together with a task to validate the at least one product data set against the expectations from the user instruction.
[0073] In an embodiment, the at least one tool further comprises a logging tool configured to record, as output data, an intermediate reasoning step or statement generated by a generative data-driven model, wherein the logging tool does not retrieve data from the graph database but provides a record of the reasoning process as part of the tool sequence. At least a part of the record may be provided as part of a task instruction to the generative data-driven model for determining the product data set, re-determining a tool sequence or selecting at least one further or subsequent tool. The logging tool or think tool may not access or retrieve data from the graph database, but instead outputs a textual or structured record of the model’s reasoning at a given point in the tool sequence. The output of the logging tool or think tool may be included in the output data and may form part of a verifiable execution trace of the method.
[0074] In a further embodiment, the at least one tool comprises a minimal orthogonal toolset, which may comprise: a graph extraction tool configured to retrieve at least one node based on a property associated with the node, a graph extraction tool configured to retrieve all nearest neighbour nodes to a given node, and a graph extraction tool configured to retrieve all unique values for a property associated with nodes or relationships of a given type. Preferably, the at least one tool includes only graph extraction tools from the minimal orthogonal toolset. A minimal orthogonal toolset may be a set of graph extraction tools, each tool providing a distinct, non-overlapping function, such that the set enables the retrieval and traversal of production data represented in a graph structure. The tools may be considered orthogonal in that each tool provides a unique primitive operation not subsumed by the others.
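The three primitive operations of such a toolset may be sketched on a small in-memory graph as follows. The class ProductionGraph, the node labels, relationship names and properties are hypothetical; an actual implementation would typically issue queries against a graph database. For brevity, the third tool is shown for node properties only.

```python
from typing import Any, Dict, List, Tuple


class ProductionGraph:
    """Tiny in-memory stand-in (assumption) for a production graph database."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}
        self.edges: List[Tuple[str, str, str]] = []  # (source, relationship, target)

    def add_node(self, node_id: str, label: str, **properties: Any) -> None:
        self.nodes[node_id] = {"label": label, **properties}

    def add_edge(self, source: str, relationship: str, target: str) -> None:
        self.edges.append((source, relationship, target))

    # Tool 1: retrieve nodes based on a property associated with the node.
    def find_nodes_by_property(self, key: str, value: Any) -> List[str]:
        return sorted(n for n, props in self.nodes.items() if props.get(key) == value)

    # Tool 2: retrieve all nearest neighbour nodes of a given node.
    def nearest_neighbours(self, node_id: str) -> List[str]:
        neighbours = set()
        for source, _, target in self.edges:
            if source == node_id:
                neighbours.add(target)
            elif target == node_id:
                neighbours.add(source)
        return sorted(neighbours)

    # Tool 3: retrieve all unique values of a property for nodes of a given type.
    def unique_property_values(self, label: str, key: str) -> List[Any]:
        return sorted(
            {props[key] for props in self.nodes.values() if props["label"] == label and key in props}
        )


graph = ProductionGraph()
graph.add_node("m1", "Material", name="Ethylene", unit="t")
graph.add_node("m2", "Material", name="Polyethylene", unit="t")
graph.add_node("p1", "Process", name="Polymerization")
graph.add_edge("m1", "INPUT_TO", "p1")
graph.add_edge("p1", "PRODUCES", "m2")
```

Combining the first two tools already allows a traversal: a node is located by its name, and its neighbours are then retrieved step by step, which illustrates why a small set of non-overlapping primitives may suffice for retrieval.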
[0075] In an example embodiment, providing input, such as a task instruction and / or context information such as a history data set or tool data set, to the generative data-driven model may comprise mapping the input to a numerical representation of the input. The numerical representation of the input may comprise a tensor associated with the input and / or obtained from the input. In particular, the numerical representation of the input may be indicative of two or more elements of the input and a relation between the two or more elements of the input. Preferably, providing input to the generative data-driven model may comprise at least one of identifying two or more elements of the input, mapping the two or more elements of the input to a numerical representation of the two or more elements, mapping the numerical representation of the two or more elements to a numerical representation of a predefined size related to the numerical representation of the two or more elements, mapping the numerical representation of the predefined size related to the numerical representation of the two or more elements to a numerical representation of the two or more elements and a relation between the two or more elements, or a combination thereof. In an example embodiment, processing the input and / or generating output, such as a product data set or tool output data set, from the input may comprise processing the numerical representation of the input, in particular the numerical representation of the two or more elements and the relation between the two or more elements. Processing the numerical representation of the two or more elements and the relation between the two or more elements may comprise mapping the numerical representation of the two or more elements, and optionally the relation between the two or more elements, to a numerical representation of the output. The numerical representation of the output data may be mapped to the output, in particular based on a relation between the numerical representation of data and the data. In particular, the data may be of a data type according to the input. Hence, the output may be of the data type according to the input, e.g. of the same data type as the input and / or of the data type specified by the input. Processing the numerical representation of the two or more elements and the relation between the two or more elements may comprise at least one of: generating two or more numerical representations of the two or more elements and the relation between the two or more elements from the numerical representation of the two or more elements and the relation between the two or more elements, modifying the two or more numerical representations of the two or more elements and the relation between the two or more elements by applying a filter to the two or more numerical representations of the two or more elements and the relation between the two or more elements, concatenating the two or more numerical representations of the two or more elements and the relation between the two or more elements, mapping the concatenated numerical representation of the two or more elements and the relation between the two or more elements to a numerical representation of the output data, or a combination thereof.
[0076] In an example embodiment, the at least one (e.g. one or more) generative data-driven model may be a pretrained generative data-driven model. The pretrained generative data-driven model(s) may be parametrized and / or trained based on data with a plurality of contexts and / or unstructured data (or natural language data), in particular text data and optionally numerical data such as tabular data or image data. The pretrained generative data-driven model(s) may be configured to perform a plurality of tasks and / or to process data of a plurality of contexts and / or general purpose training data. The pretrained generative data-driven model(s) may be configured to perform the task according to the provided task instruction. Hence, the pretrained data-driven model may be configured to be provided with a plurality of different task instructions and / or provide a plurality of different types of output data upon receiving different task instructions. By using a pretrained model, readily available models can be utilized for generating tool output data sets and / or product data sets, while the data-driven model may be deployed for other applications as well. Thereby, resources for hosting the generative data-driven model can be shared among a plurality of applications. In an example embodiment, the at least one generative data-driven model(s) may be finetuned generative data-driven model(s). The finetuned generative data-driven model(s) may be obtained by training pretrained data-driven model(s) configured to perform a plurality of tasks according to a plurality of task instructions. The finetuned generative data-driven model(s) may be trained additionally on a training data set comprising a plurality of historical contextualized task instructions and corresponding tool output data sets and / or product data sets. 
The finetuned generative data-driven model may be configured to be provided with a plurality of different task instructions and / or provide a plurality of different types of output data upon receiving different types of task instructions. Further, the finetuned generative data-driven model may be configured for providing one type of output upon receiving one type of task instruction with a higher accuracy than providing other types of output data upon receiving other types of task instructions.
[0077] According to an example embodiment of all aspects, the generative data-driven model is or comprises a data-driven reasoning model or multimodal data-driven reasoning model (such as OpenAI's o3 and o4-mini, GPT5 or a large language model having at least similar capabilities). A data-driven reasoning model may be configured to decompose an input task instruction into sub-tasks, to generate potential responses to said sub-tasks and to evaluate and / or validate the potential responses and / or iterate over the potential response including potentially backtracking and generating new potential responses. A data-driven reasoning model may be a generative data-driven model that has been trained to re-iterate and / or validate generated responses prior to providing them. So, the data-driven reasoning model may have been trained to generate tokens in a way resembling iterating over a query or user instruction or other instruction provided to it. For training, reasoning specific training data sets comprising task instructions for solving complex tasks, such as mathematical problems, logical problems, Question-answering tasks (e.g. Stanford Question Answering Dataset), and / or multi-step reasoning data sets that require the model to perform multi-step reasoning (e.g. HotpotQA dataset) may be used, wherein the task instructions are associated with the respective expected result (e.g. labeled). The reasoning data-driven model may have been trained using reinforcement learning which can refine the model's decision-making process. This may involve the model interacting with a simulated environment (e.g. a reward model, for instance a chemical process model configured to simulate or mimic a plant) and receiving feedback (e.g. a reward signal) based on the quality of its responses, e.g. a degree of deviation of the generated to an expected result. 
Therein, positive rewards are given for correct and logically consistent responses, while negative rewards are given for incorrect or illogical responses. For instance, a subset of trainable parameters (i.e. weights) of a pre-trained generative data-driven model may be updated iteratively based on the reward signal (provided for instance by the reward model). A data-driven reasoning model may also be obtained based on another data-driven reasoning model, e.g. by distillation. For instance, a more complex teacher model may be used to generate appropriate training data for a smaller distilled student model, so that the distilled student model may retain much of the performance of the larger model while being more efficient in terms of computational resources, memory usage, and inference speed. The student model may be, for instance, a pre-trained LLaMA model that is fine-tuned using the training data generated by the teacher model. Examples of data-driven reasoning models include DeepSeek R1, OpenAI's GPT o1 and o3 (mini), GPT5 or later, Gemini 3, CriticalThinker-LLaMA-3.1-8B-GGUF, Qwen models or the like; or a large language model having at least similar capabilities.
[0078] In an embodiment, the data-driven reasoning model may be trained based on training task instructions and reward scores or reward signals associated with an accuracy and / or precision of the output data generated by the one or more data-driven reasoning model(s) upon receiving the training task instructions. During the training, the data-driven reasoning model may be adapted according to the reward scores, which are determined based on the output data generated by the data-driven reasoning model in response to receiving the respective training task instructions. The reward scores may be obtained, in particular received via a user interface and / or may be generated by a human. Additionally or alternatively, the reward scores may be generated and / or provided by a reward model. The reward model may be configured to generate and / or provide the reward scores based on receiving the output data generated by the data-driven reasoning model and optionally the training task instructions associated with the output data generated by the data-driven reasoning model. The data-driven reasoning model may be obtained from a pretrained generative data-driven model. By doing so, the data-driven reasoning model can directly obtain its reasoning capabilities from the examples provided during the training. This shapes the reasoning performed by the data-driven reasoning model into a predefined direction allowing for a precise and also accurate reasoning by the data-driven reasoning model. Thereby, the accuracy and / or the precision of generated output data sets may be improved. Ultimately, this may contribute to improving monitoring and / or controlling producing and / or processing a chemical product.
[0079] In an embodiment, the data-driven reasoning model may be trained based on training task instructions or training queries and corresponding target output data to follow task instructions. During the training, the data-driven reasoning model may be provided with the training task instructions and the output data generated by the data-driven reasoning model may be compared with the target output data. During the training, the data-driven reasoning model may be adapted according to a deviation of the output data generated by the data-driven reasoning model upon receiving the training task instructions from the target output data. Training the data-driven reasoning model may comprise determining a deviation of the output data generated by the data-driven reasoning model from the target output data. The target output data may be obtained, in particular received via a user interface and / or a database. The target output data may comprise data expected to be generated by the data-driven reasoning model upon receiving the training task instructions. Hence, the target output data may be associated with, in particular related to, the training task instructions. The target output data may be related to the training task instructions via one or more reasoning step(s). The one or more reasoning step(s) may specify the relation between the target output data and the training task instructions. The one or more reasoning step(s) may represent and / or may be a logical connection between the training task instructions and the target output data. Additionally or alternatively, the target output data may be generated by a teacher model, in particular by providing the task instruction to the teacher model. The teacher model may be configured to follow task instructions and / or generate an indication of a relation between the task instructions provided to the teacher model and output data generated by the teacher model.
In some embodiments, the teacher model may be associated with a higher number of model parameters than the data-driven reasoning model. In other embodiments, the teacher model may be associated with an equal or lower number of model parameters than the data-driven reasoning model. The teacher model may be another data-driven reasoning model.
[0080] In an embodiment, the data-driven reasoning model may comprise one or more shared expert(s) and a plurality of routed experts as well as a router. The router may be configured to select at least one of the routed experts upon receiving the task instruction. The one or more shared expert(s) may be used independently of the task instruction provided to the data-driven reasoning model. The one or more shared expert(s) may be used automatically upon providing data to the data-driven reasoning model.
[0081] In an embodiment, the generative data-driven model may comprise a mixture of experts model, wherein the mixture of experts model comprises at least one mixture of experts block, wherein the at least one mixture of experts block comprises in particular one or more shared expert(s) and a plurality of routed experts as well as a router, wherein the router may be configured to select the appropriate routed experts for a given input token, e.g. based on routing scores computed using either softmax or sigmoid functions. In an embodiment, the at least one mixture of experts block may further comprise at least one Multi-Head Latent Attention (MLA) layer configured with low-rank compression, i.e. configured to compress latent vectors, in particular key and value vectors, into a lower dimensional space, e.g. via a down-projection matrix. The MLA layer may further be configured to determine rotary positional embeddings.
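By way of a non-limiting illustration, the routing within a mixture of experts block may be sketched as follows. The sketch assumes plain Python lists as token vectors, a linear router, top-k selection and renormalized gate weights; the names route_token and moe_block, the parameter top_k and the renormalization step are assumptions made for illustration only and are not prescribed by this disclosure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Element-wise sigmoid over a list of router logits."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


def route_token(token, router_weights, top_k=2, gate="softmax"):
    """Return (expert_index, gate_weight) pairs for one input token.

    router_weights holds one weight row per routed expert (assumed linear router).
    Gate weights of the selected experts are renormalized to sum to one.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    scores = softmax(logits) if gate == "softmax" else sigmoid(logits)
    selected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in selected)
    return [(i, scores[i] / norm) for i in selected]


def moe_block(token, shared_experts, routed_experts, router_weights, top_k=2, gate="softmax"):
    """Combine shared experts (always used) with the routed experts chosen by the router."""
    output = [0.0] * len(token)
    # Shared experts run for every token, independently of the routing decision.
    for expert in shared_experts:
        output = [o + y for o, y in zip(output, expert(token))]
    # Routed experts contribute in proportion to their gate weight.
    for index, weight in route_token(token, router_weights, top_k, gate):
        output = [o + weight * y for o, y in zip(output, routed_experts[index](token))]
    return output
```

In this sketch, only top_k routed experts are evaluated per token, which is why a mixture of experts block may use fewer parameters during inference than it holds in total.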
[0082] In an embodiment, the data-driven reasoning model may be configured to interleave reasoning steps with tool invocation actions during inference, e.g. carrying out certain tools that were part of its training (e.g. a calculator application). For this, the data-driven reasoning model may be trained to emit special action tokens that signal the need to invoke an external tool or agent. Upon generating such an action token, the model may proceed to construct a structured tool call, including the tool identifier and relevant parameters derived from the current reasoning context. The model then halts further token generation while maintaining its internal state, including attention context and memory embeddings. Once the tool response is received, the result is injected into the model’s context window, and the model resumes token generation, continuing the reasoning process with the newly acquired information. This mechanism may allow the model to incorporate real-time computational or retrieval results from different tools or agents into its reasoning chain.
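As a non-limiting sketch, the interleaving of token generation and tool invocation may look as follows. The data-driven reasoning model is replaced here by a scripted stand-in function, and the action token string, the result tag, the JSON layout of the tool call and the toy calculator tool are all hypothetical choices made for illustration.

```python
import json

ACTION_TOKEN = "<tool_call>"    # hypothetical special action token
RESULT_TAG = "<tool_result>"    # hypothetical marker for injected tool output


def calculator(expression):
    """Toy tool: evaluates 'a op b' for the operators + - * /."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    return str({"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op])


TOOLS = {"calculator": calculator}


def scripted_model(context):
    """Stand-in for a reasoning model: returns the next chunk for a given context."""
    results = [c for c in context if c.startswith(RESULT_TAG)]
    if not results:
        # No tool output available yet: emit the action token and a structured tool call.
        call = {"tool": "calculator", "arguments": {"expression": "12 * 3.5"}}
        return ACTION_TOKEN + json.dumps(call)
    # Tool output is in the context window: resume and finish the reasoning.
    return "Total mass required: " + results[-1][len(RESULT_TAG):] + " kg"


def run(instruction, model=scripted_model, max_steps=5):
    """Alternate between generation and tool execution until a final answer is produced."""
    context = [instruction]
    for _ in range(max_steps):
        chunk = model(context)
        context.append(chunk)
        if not chunk.startswith(ACTION_TOKEN):
            return chunk, context
        # Generation pauses here; the context list keeps the state accumulated so far.
        call = json.loads(chunk[len(ACTION_TOKEN):])
        output = TOOLS[call["tool"]](**call["arguments"])
        context.append(RESULT_TAG + output)  # inject the tool response into the context
    raise RuntimeError("no final answer within max_steps")


answer, trace = run("How much monomer is needed for 12 batches of 3.5 kg each?")
```

The list named trace then holds the instruction, the tool call, the injected tool result and the final answer in the order in which they occurred.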
[0083] The data-driven reasoning model may be trained using a combination of supervised fine-tuning and reinforcement learning. During supervised fine-tuning, the model may be exposed to annotated task / solution pairs that include both pure reasoning tasks and tasks requiring tool invocation. These training examples may include explicit demonstrations of when and how to invoke tools, as well as how to incorporate tool outputs into the reasoning trajectory. The model learns to associate specific task patterns with the need for external assistance and to generate the corresponding action tokens and tool call structures. To improve generalization and prevent catastrophic forgetting, the training corpus may be balanced to include both tool-augmented and purely internal reasoning tasks as described above. This may ensure that the model retains its core reasoning capabilities while acquiring the ability to delegate subtasks to external tools when appropriate.
[0084] Reinforcement learning may further refine the model’s decision-making process regarding tool invocation. In this phase, the model is rewarded for correctly identifying when a tool call is beneficial, selecting the appropriate tool, and effectively integrating the tool’s output into the reasoning chain. The reward signal may be derived from task success metrics, such as accuracy, completeness, or user preference scores. The model may be trained using a policy optimization algorithm, such as Group Relative Policy Optimization (GRPO), which evaluates multiple candidate outputs and updates the model based on the relative quality of tool usage strategies. This training may enable the data-driven reasoning model to learn nuanced behaviors, such as deferring tool calls when unnecessary or chaining multiple tool invocations across reasoning steps. As a result, the data-driven reasoning model may achieve a high degree of autonomy and adaptability in complex, multi-step reasoning tasks.
[0085] In an example embodiment, the product may be a chemical product, which may be a product obtained by means of a chemical production process. A chemical production process may refer to a process including one or more chemical reaction(s). The chemical product may be characterized by at least one functional group. The functional group may be at least one of alkyl group, alkenyl group, alkynyl group, phenyl group, carbonyl group, ketone group, aldehyde group, hydroxyl group, haloformyl group, ester group, carboxylate group, halo group, carboxyl group, peroxy group, carboalkoxy group, hydroperoxyl group, ether group, acetal group, hemiacetal group, hemiketal group, ketal group, carboxylic anhydride group, carboxamide group, amidine group, amine group, ketimine group, aldimine group, imide group, cyanate group, azo group, nitrite group, nitrate group, nitro group, nitrile group, sulfide group, thiol group, sulfinyl group, sulfonyl group, sulfo group, thiocyanate group, thionoester group, thiolester group, phosphino group, phosphono group, phosphate group or any combination thereof.
[0086] In an example embodiment, production data may comprise product and / or processing data related to producing and / or processing a chemical product. Production data may comprise instructions associated with producing and / or processing the chemical product. The production data and / or product data sets may comprise natural language, unstructured data and / or human-interpretable data. A product data set may be used and / or provided for controlling one or more processes associated with producing the chemical product. In particular, the production data may comprise structured data and / or the product data set may comprise natural language, unstructured data and / or human-interpretable data.
[0087] In an example embodiment, a user instruction related to determining (e.g. retrieving) the at least one product data set for producing a product may be associated with a property of the product. The property of the product may be related to, in particular comprise, at least a part of the physical, chemical and / or biological properties associated with the product. The user instruction may be obtained via an interface, such as a user interface. The user instruction may be indicative of at least a part of the product. The user instruction may be suitable for identifying the product. The user instruction may comprise an indication of a / the product (e.g. chemical product) producible in a production environment.
[0088] In an example embodiment, any one of the methods may further comprise: providing functional specification data related to one or more functions of two or more tool(s), and / or providing one or more input data structure(s) related to input data suitable for being provided to at least one of the tools; and e.g. providing a selection and / or structuring task instruction including the user instruction, the input data structure(s) and / or the functional specification data to the generative data-driven model(s) for generating a tool output data set and / or product data set, wherein the generative data-driven model(s) are configured to follow task instructions, wherein the selection and / or structuring task instruction triggers the generative data-driven model to select one or more tool(s) (e.g. operating engine(s)) according to the user instruction and / or to generate input data associated with a data structure suitable for being provided to at least one of the tool(s). For instance, the method may comprise obtaining at least the part of the product data set from the production data represented in a graph data structure according to the user instruction, which may include obtaining at least the part of the product data set according to the input data, in particular by providing the operating input data to the at least one tool(s), in particular the one or more selected tool(s). A tool data set may comprise functional specification data related to one or more functions of the at least one tool(s), and / or one or more input data structure(s). Functional specification data related to one or more functions of the at least one tool(s), and / or one or more input data structure(s) may be provided by including a tool data set, or respectively the functional specification data and / or one or more input data structure(s), as context in a task instruction for the generative data-driven model.
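A minimal sketch of how tool data sets may be included as context in a selection and / or structuring task instruction is given below. The tool names, the JSON layout, the field names and the helper functions are hypothetical and serve only to illustrate the principle; the actual call to a generative data-driven model is left out.

```python
import json

# Hypothetical tool data sets: functional specification data plus an input data structure.
TOOL_DATA_SETS = [
    {
        "name": "bom_lookup",
        "function": "Retrieve the bill of materials for a product from the production data.",
        "input_structure": {"product_id": "str", "site": "str"},
    },
    {
        "name": "unit_converter",
        "function": "Convert a quantity from one unit into another unit.",
        "input_structure": {"value": "float", "from_unit": "str", "to_unit": "str"},
    },
]


def build_selection_task_instruction(user_instruction, tool_data_sets):
    """Assemble a task instruction that carries the tool data sets as context."""
    return json.dumps(
        {
            "task": "Select one or more tools and generate input data matching their input structure.",
            "user_instruction": user_instruction,
            "tools": tool_data_sets,
        },
        indent=2,
    )


def validate_tool_input(tool_name, tool_input, tool_data_sets):
    """Check that generated input data matches the input data structure of the selected tool."""
    spec = next(t for t in tool_data_sets if t["name"] == tool_name)["input_structure"]
    return all(
        key in tool_input and type(tool_input[key]).__name__ == type_name
        for key, type_name in spec.items()
    )


prompt = build_selection_task_instruction("List the raw materials for product P-100", TOOL_DATA_SETS)
```

Validating the generated input data against the input data structure before invoking the tool is one possible way of operating implementations that require well-defined inputs from unstructured user instructions.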
[0089] The methods, systems and / or apparatuses disclosed herein may be part of a large infrastructure for serving a plurality of user instructions including user instructions other than the user instruction related to determining the at least one product data set. Processing a plurality of different requests via one interface allows available resources to be used efficiently, as no extra resources for developing, updating and hosting a plurality of interfaces may be required. Further, more complex requests can be served by combining different tools, especially with simplified navigation for the user. For example, analyzing requests for the actions to be taken allows two data sources to be combined in order to provide the product data set. Furthermore, already available implementations requiring well-defined inputs can be operated based on unstructured user instructions.
[0090] History data sets may be obtained, in particular retrieved from a graph data structure (e.g. provided by a graph data base) indicative of how a historic user instruction has been processed to provide a product data set in response to the user instruction, which may for instance be an indication of the tool sequence. The history data set may be part of a thread including at least one historical user instruction. The graph data structure may link historic user instructions, input and output data provided to or obtained from tools, and the time sequence of the executed tools (e.g. tool sequence), in particular via one or more edge(s), which in turn may represent the data flow. The history data set may be associated with an identifier (ID), e.g. via a user ID associated with the user instruction. The historic user instruction similar to the obtained user instruction may provide more context to the obtained user instruction. Hence, taking the historic user instruction into account when processing the obtained user instruction may enhance retrieval of data related to the user instruction (e.g. its preciseness and speed, for instance because faster models such as GPT4o-mini may be used with similar performance). In other words, the obtained user instruction may be specified by taking the historic user instruction into account. Increasing context information may improve data processing and / or generating by generative data-driven models. By taking the associated history data set into account, the generative data-driven model may more precisely answer obtained user instructions.
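A graph data structure of this kind may be sketched, purely by way of example, with nodes for the historic user instruction, the executed tools and the resulting product data set, and with edges representing the data flow. The class HistoryGraph, the helper record_thread, the edge label and all identifiers and tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> attributes
    edges: list = field(default_factory=list)   # (source_id, target_id, label)

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes

    def add_edge(self, source, target, label):
        self.edges.append((source, target, label))

    def tool_sequence(self, instruction_id):
        """Follow the data flow edges from an instruction and list the tools in order."""
        sequence, current, seen = [], instruction_id, {instruction_id}
        while True:
            successors = [
                target for source, target, label in self.edges
                if source == current and label == "data_flow" and target not in seen
            ]
            if not successors:
                return sequence
            current = successors[0]
            seen.add(current)
            if self.nodes[current].get("kind") == "tool":
                sequence.append(self.nodes[current]["name"])


def record_thread(graph, instruction_id, text, user_id, tool_calls, product_data_set):
    """Store one processed user instruction as a chain of nodes linked by data flow edges."""
    graph.add_node(instruction_id, kind="instruction", text=text, user_id=user_id)
    previous = instruction_id
    for step, (name, tool_input, tool_output) in enumerate(tool_calls):
        node_id = f"{instruction_id}/tool-{step}"
        graph.add_node(node_id, kind="tool", name=name, input=tool_input, output=tool_output)
        graph.add_edge(previous, node_id, "data_flow")
        previous = node_id
    result_id = f"{instruction_id}/result"
    graph.add_node(result_id, kind="product_data_set", data=product_data_set)
    graph.add_edge(previous, result_id, "data_flow")
    return result_id


graph = HistoryGraph()
record_thread(
    graph, "instr-1", "Bill of materials for product P-100 in kg", "user-7",
    [("bom_lookup", {"product_id": "P-100"}, {"monomer_g": 4200}),
     ("unit_converter", {"value": 4200, "from_unit": "g", "to_unit": "kg"}, {"value": 4.2})],
    {"monomer_kg": 4.2},
)
```

Calling graph.tool_sequence("instr-1") then recovers the order in which the tools were executed for the historic user instruction.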
[0091] In an example embodiment, an identifier associated with the historic user instruction similar to the obtained user instruction may be obtained, in particular in response to identifying the historical request. Any one of the methods may further comprise obtaining the tool data sets, tool sequence, product data sets corresponding to the historic user instruction similar to the obtained user instruction according to the identifier associated with the historic user instruction similar to the obtained user instruction. The history data sets (e.g. comprising tool data sets, tool sequence, product data sets) corresponding to the historical user instruction similar to the obtained user instruction may be obtained from a database comprising a numerical representation associated with the historic user instruction similar to the obtained user instruction and / or another database comprising the tool data sets, tool sequence, product data sets. Thereby, the history data sets (e.g. comprising tool data sets, tool sequence, product data sets) may be stored at one place, e.g. a structured database or a graph database, whereas embedding search or structured search may be used for obtaining historic user instructions similar to the obtained user instruction. Hence, the history data sets (e.g. comprising tool data sets, tool sequence, product data sets) may be stored in one database using fewer resources while the similar historic user task instructions may be retrieved accurately.
[0092] Determining the tool sequence may be based on the obtained at least one history data set. In an example embodiment, the at least one history data set is a plurality of history data sets, wherein the obtained plurality of history data sets are associated with embedded historic user instructions being similar to the obtained user instruction, e.g. fulfilling a threshold criterion with respect to a similarity measure such as cosine similarity. After successfully providing a product data set, the embedding of the obtained user instruction may be added to a plurality of historical user instructions in a vector data base.
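The threshold criterion on cosine similarity may be illustrated by the following sketch. The vector data base is represented by an in-memory dictionary, the embeddings are two-dimensional toy vectors rather than outputs of an embedding model, and the threshold value of 0.8 as well as all identifiers are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_embedding, vector_store, threshold=0.8):
    """Return (history_id, similarity) pairs fulfilling the threshold, most similar first."""
    hits = [
        (history_id, cosine_similarity(query_embedding, embedding))
        for history_id, embedding in vector_store.items()
    ]
    return sorted((hit for hit in hits if hit[1] >= threshold), key=lambda hit: hit[1], reverse=True)


# Toy embeddings of historic user instructions, keyed by history data set identifier.
vector_store = {"hist-1": [1.0, 0.0], "hist-2": [0.0, 1.0], "hist-3": [0.9, 0.1]}
query = [1.0, 0.05]
similar = find_similar(query, vector_store)

# After a product data set has been provided successfully, the new embedding is stored.
vector_store["hist-4"] = query
```

The identifiers returned by find_similar may then be used to obtain the associated history data sets, e.g. from a separate structured database or graph database.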
[0093] In an example embodiment, any one of the methods may further comprise obtaining an indication of a location associated with the user instruction, in particular a location associated with the production environment, a location of a user associated with the user instruction, a location of a chemical production network, and / or processing facility associated with the user instruction. At least the part of the at least one product data set may be obtained or retrieved according to the obtained user instruction and the obtained indication of the location. Production data may be tailored to location-specific requirements. Some chemical reactions may be sensitive to humidity. Therefore, production may be constructed differently at locations with high humidity than at locations with lower humidity. By obtaining the location, the product data set can be tailored to the respective local embodiments of the one or more processes.
[0094] In an embodiment, the obtained user instruction may characterize an arrangement of data points, in particular a data format and / or a data structure, associated with the product data set to be obtained. At least the part of the product data set obtained from the graph data structure may be associated with an arrangement of the data points other than the arrangement of data points characterized by the obtained user instruction. The arrangement of the data points characterized by the obtained user instruction may be user-specific.
[0095] In an example embodiment, the generated product data set may comprise instructions for producing the product. In an example embodiment, at least the obtained part of the product data set may be related to two or more versions of at least a part of the product. The two or more versions may be associated with two or more locations, two or more production and / or processing facilities. Determining the tool sequence may further be based on the version, e.g. by providing the version as part of the task instruction to the generative data-driven model.
[0096] In an example embodiment, producing a chemical product may comprise changing a physical property associated with the chemical product and / or applying a physical change to the chemical product and / or processing the chemical product by utilizing a physical force. Processing the chemical product may change a physical appearance of the chemical product while a chemical identity may be unchanged by the processing.
[0097] In an embodiment, any one of the methods may further comprise providing a verifying task instruction for triggering the generative data-driven model to verify the product data set. The verifying task instruction may be associated with and / or may be related to the product data set, in particular as generated by the generative data-driven model.
[0098] Further possible implementations or alternative solutions of the invention also encompass combinations - that are not explicitly mentioned herein - of features described above or below in regard to the embodiments. The person skilled in the art may also add individual or isolated aspects and features to the most basic form of this disclosure.
[0099] Other features will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits, for which reference should be made to the appended claims. It should be further understood that the drawings are not drawn to scale and that they are merely intended to conceptually illustrate the structures and procedures described herein.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0100] In the following, the present disclosure is further described with reference to the enclosed figures. The same reference numbers in the drawings and this disclosure are intended to refer to the same or like elements, components, and / or parts.
[0101] FIG. 1 shows a schematic diagram of controlling and / or operating a chemical production environment.
[0102] FIG. 2 illustrates an example of a graph structure of production data.
[0103] FIG. 3 illustrates an example of a text-based process description.
[0104] FIG. 4 illustrates an example of a graph structure of a process description for producing polymer-coated plates.
[0105] FIG. 5 shows an example of the method according to the first aspect.
[0106] FIG. 6 shows an example of fine tuning a generative model.
[0107] FIG. 7 illustrates an embodiment of a transformer encoder architecture.
[0108] FIG. 8 illustrates an embodiment of a transformer decoder architecture.
[0109] FIG. 9 illustrates an embodiment of a transformer encoder-decoder architecture.
[0110] FIG. 10 illustrates an embodiment of training and / or deploying the transformer encoder.
[0111] FIG. 11 illustrates an embodiment of a Mamba architecture.
[0112] FIG. 12 illustrates an embodiment of a data-driven reasoning model.
[0113] FIG. 13 illustrates an embodiment of a data-driven reasoning model.
[0114] FIG. 14 shows a schematic block diagram of an example apparatus e.g. according to the second example aspect.
DETAILED DESCRIPTION
[0115] The following embodiments are mere examples for implementing the method, the system or application device disclosed herein and shall not be considered limiting. The following description serves to deepen the understanding and shall be understood to complement and be read together with the description as provided in the above summary and embodiment sections of this specification. Some aspects may have a different terminology than e.g. provided in the description above. The skilled person will nevertheless understand that those terms refer to the same subject-matter, e.g. by being more specific.
[0116] A data-driven reasoning model may be configured to decompose an input task instruction into smaller sub-tasks, to generate potential responses to said sub-tasks and to evaluate and / or validate the potential responses and / or iterate over the potential responses, including potentially backtracking and generating new potential responses. A data-driven reasoning model may be a generative data-driven model that has been trained to re-iterate and / or validate generated responses prior to providing them. So, the data-driven reasoning model may have been trained to generate tokens in a way resembling iterating over a query or user instruction or other instruction provided to it. For instance, reasoning-specific training data sets comprising task instructions for solving complex tasks, such as mathematical problems, logical problems, question-answering tasks (e.g. Stanford Question Answering Dataset), and / or multi-step reasoning data sets that require the model to perform multi-step reasoning (e.g. HotpotQA dataset) may be used, wherein the task instructions are associated with the respective expected result (e.g. labeled). The reasoning data-driven model may have been trained using reinforcement learning which can refine the model's decision-making process. This may involve the model interacting with a simulated environment and receiving feedback based on the quality of its responses, e.g. against the expected results. Therein, positive rewards are given for correct and logically consistent responses, while negative rewards are given for incorrect or illogical responses. Examples of data-driven reasoning models include DeepSeek R1, OpenAI's GPT o1 and o3 (mini), GPT5 or later, CriticalThinker-LLaMA-3.1-8B-GGUF, Qwen models or the like.
[0117] In an embodiment, the data-driven reasoning model may be trained based on training task instructions and reward scores associated with an accuracy and / or precision of the output data generated by the one or more data-driven reasoning model(s) upon receiving the training task instructions. During the training, the data-driven reasoning model may be adapted according to the reward scores determined based on the output data generated by the data-driven reasoning model from the training task instructions provided. The reward scores may be obtained, in particular received via a user interface and / or may be generated by a human. Additionally or alternatively, the reward scores may be generated and / or provided by a reward model. The reward model may be configured to generate and / or provide the reward scores based on receiving the output data generated by the data-driven reasoning model and optionally the training task instructions associated with the output data generated by the data-driven reasoning model. The data-driven model may be obtained from a pretrained data-driven model. By doing so, the data-driven reasoning model can directly obtain its reasoning capabilities from the examples provided during the training. This shapes the reasoning performed by the data-driven reasoning model into a predefined direction allowing for a precise and also accurate reasoning by the data-driven reasoning model. Thereby, the accuracy and / or the precision of selecting the best suited tool can be improved resulting in more accurate and / or precise, i.e. meaningful, generation of chemical product data while hallucination can be reduced. Ultimately, this contributes to improving monitoring and / or controlling producing and / or processing a chemical product.
[0118] In an embodiment, the data-driven reasoning model may be trained based on training task instructions or training queries and corresponding target output data to follow task instructions. During the training, the data-driven reasoning model may be provided with the training task instructions and the output data generated by the data-driven reasoning model may be compared with the target output data. During the training, the data-driven reasoning model may be adapted according to a deviation of the output data generated by the data-driven reasoning model upon receiving the training task instructions from the target output data. Training the data-driven reasoning model may comprise determining a deviation of the output data generated by the data-driven reasoning model from the target output data. The target output data may be obtained, in particular received via a user interface and / or a database. The target output data may comprise target indications of selected tools, optionally further target indications of a relation between the training task instructions and the target indications of the selected tool. The target output data may comprise data expected to be generated by the data-driven reasoning model upon receiving the training task instructions. Hence, the target output data may be associated with, in particular related to, the training task instructions. The target output data may be related to the training task instructions via one or more reasoning step(s). The one or more reasoning step(s) may specify the relation between the target output data and the training task instructions. The one or more reasoning step(s) may represent and / or may be a logical connection between the training task instructions and the target output data.
[0119] Additionally or alternatively, the target output data may be generated by a data-driven teacher model, in particular by providing the task instruction to the data-driven teacher model. The data-driven teacher model may be configured to follow task instructions and / or generate an indication of a relation between the task instructions provided to the data-driven teacher model and output data generated by the data-driven teacher model. In some embodiments, the data-driven teacher model may be associated with a higher number of model parameters than the data-driven reasoning model. In other embodiments, the data-driven teacher model may be associated with an equal or lower number of model parameters than the data-driven reasoning model. The data-driven teacher model may be another data-driven reasoning model. Preferably, the data-driven reasoning model may comprise a mixture of experts model, wherein the mixture of experts model comprises at least one mixture of experts block, wherein the at least one mixture of experts block comprises in particular one or more shared expert(s) and a plurality of routed experts as well as a router, wherein the router may be configured to select the appropriate routed experts for a given input token, e.g. based on routing scores computed using either softmax or sigmoid functions.
[0120] By using a teacher model, knowledge from the teacher model can be distilled to the data-driven reasoning model. Thereby, the data-driven reasoning model is not trained separately, resulting in a significant saving of computation resources. In particular, using fewer parameters in the data-driven reasoning model than in the teacher model during inference, either via the architecture or via mixture of experts, reduces the computational resources for using the data-driven reasoning model. Ultimately, this contributes to improving monitoring and / or controlling producing and / or processing a chemical product.
[0121] In an embodiment, different training schemas as described above may be applied together and / or may be combined to obtain, in particular train, the data-driven reasoning model. Thereby, the advantages of the different training schemas can be combined.
[0122] In an embodiment, the generative data-driven model comprises a mixture of experts model, wherein the mixture of experts model comprises at least one mixture of experts block, wherein the at least one mixture of experts block comprises in particular one or more shared expert(s) and a plurality of routed experts as well as a router, wherein the router may be configured to select the appropriate routed experts for a given input token, e.g. based on routing scores computed using either softmax or sigmoid functions. In an embodiment, the at least one mixture of experts block may further comprise at least one Multi-Head Latent Attention (MLA) layer configured with low-rank compression, i.e. configured to compress latent vectors, in particular key and value vectors, into a lower dimensional space, e.g. via a down-projection matrix. The MLA layer may further be configured to determine rotary positional embeddings.
[0123] In an embodiment, the data-driven reasoning model may comprise one or more shared expert(s) and a plurality of routed experts as well as a router. The router may be configured to select at least one of the routed experts upon receiving the task instruction. The one or more shared expert(s) may be used independently of the task instruction provided to the data-driven reasoning model. The one or more shared expert(s) may be used automatically upon providing data to the data-driven reasoning model.
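As a non-limiting illustration, the interplay of shared experts, routed experts and the router may be sketched as follows. The function names, the scalar "experts" and the directly supplied routing scores are assumptions made for this sketch; in a trained model the routing scores would be computed from the input token by learned router weights.

```python
import math

def softmax(scores):
    # Numerically stable softmax over the routing scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def route(routing_scores, top_k=2, gate="softmax"):
    """Select the top_k routed experts and return (index, normalized weight) pairs."""
    gates = softmax(routing_scores) if gate == "softmax" else sigmoid(routing_scores)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)
    return [(i, gates[i] / norm) for i in top]

def moe_block(x, shared_experts, routed_experts, routing_scores, top_k=2, gate="softmax"):
    """Combine always-active shared experts with the routed experts chosen by the router."""
    # Shared experts are applied to every input, independently of the routing decision.
    out = sum(expert(x) for expert in shared_experts)
    # Routed experts contribute only if selected, weighted by their gate value.
    for index, weight in route(routing_scores, top_k, gate):
        out += weight * routed_experts[index](x)
    return out

# Hypothetical toy experts operating on a scalar input.
shared = [lambda x: x]
routed = [lambda x: 2 * x, lambda x: 3 * x, lambda x: -x]
print(route([0.1, 2.0, 1.0], top_k=2))
print(moe_block(1.0, shared, routed, [0.1, 2.0, 1.0], top_k=1))
```

Only the selected routed experts are evaluated for a given input, which is how such a block may keep the number of active parameters per token below the total parameter count.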
[0124] In an embodiment, providing the task instruction to the data-driven reasoning model may include and / or result in generating two or more indications of a selected tool (or operating engine) and selecting one of the two or more indications of the selected tools. Selecting one indication of the selected tool may include ranking the two or more indications according to a quality score associated with the two or more indications. The quality score may be indicative of an accuracy and / or precision of the indication of the selected tool provided by the one or more data-driven reasoning model(s). The quality score may be obtained, in particular received, preferably via a user interface. The quality score may be provided and / or generated by a reward model. The reward model may be configured to provide quality scores associated with two or more indications in response to receiving the two or more indications, preferably further receiving at least a part of the task instruction, in particular at least a part of the functional specification data and / or the request.
[0125] By selecting one of the two or more indications of the selected tool, more computational resources can be allocated to the task of selecting at least one tool. Thereby, the accuracy and / or the precision of selecting the best suited tool can be improved. As a result, the generated chemical product data is more accurate and / or precise, i.e. meaningful, while hallucination can be reduced. Ultimately, this contributes to improving monitoring and / or controlling producing and / or processing a chemical product.
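The ranking of two or more indications by a quality score may, purely as an illustration, be sketched as below. The reward model here is a hypothetical word-overlap heuristic standing in for a trained reward model; the function names and candidate tool names are assumptions for this sketch.

```python
from typing import Callable, List, Tuple

def toy_reward_model(task_instruction: str, candidate: str) -> float:
    """Hypothetical stand-in for a reward model: word overlap with the task instruction."""
    task_words = set(task_instruction.lower().split())
    candidate_words = set(candidate.lower().replace("_", " ").split())
    return len(task_words & candidate_words) / max(len(candidate_words), 1)

def select_best(
    candidates: List[str],
    reward_model: Callable[[str, str], float],
    task_instruction: str,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Score each candidate indication, rank by quality score, return the best one."""
    scored = [(c, reward_model(task_instruction, c)) for c in candidates]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked

candidates = ["plotting_tool", "graph_extraction_tool", "unit_conversion_tool"]
best, ranking = select_best(
    candidates, toy_reward_model, "extract the graph data for product X"
)
print(best)
```

Generating several candidates and keeping only the top-ranked one spends additional computation at inference time in exchange for a more reliable tool selection.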
[0126] In an embodiment, the data-driven reasoning model may be configured to interleave reasoning steps with tool invocation actions during inference, e.g. carrying out certain tools that were part of its training (e.g. a calculator application). For this, the data-driven reasoning model may be trained to emit special action tokens that signal the need to invoke an external tool or agent. Upon generating such an action token, the model may proceed to construct a structured tool call, including the tool identifier and relevant parameters derived from the current reasoning context. The model then halts further token generation while maintaining its internal state, including attention context and memory embeddings. Once the tool response is received, the result is injected into the model’s context window, and the model resumes token generation, continuing the reasoning process with the newly acquired information. This mechanism may allow the model to incorporate real-time computational or retrieval results from different tools or agents into its reasoning chain.
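A minimal sketch of such an interleaved loop is given below. The action token string, the result marker, the scripted stand-in for the model and the calculator tool are all hypothetical; the context list plays the role of the maintained model state into which the tool response is injected.

```python
import json

ACTION_TOKEN = "<tool_call>"      # hypothetical special action token
RESULT_MARKER = "<tool_result>"   # hypothetical marker for injected tool output

# Hypothetical tool registry; the calculator simply adds two numbers.
TOOLS = {"calculator": lambda a, b: a + b}

def scripted_model(context):
    """Stand-in for the reasoning model: emits a tool call first, then a final answer."""
    if not any(item.startswith(RESULT_MARKER) for item in context):
        call = {"tool": "calculator", "args": {"a": 3, "b": 2}}
        return ACTION_TOKEN + json.dumps(call)
    result = context[-1][len(RESULT_MARKER):]
    return f"The total quantity is {result} kg."

def run(task_instruction, model=scripted_model, max_steps=5):
    context = [task_instruction]
    for _ in range(max_steps):
        chunk = model(context)
        if chunk.startswith(ACTION_TOKEN):
            # Generation halts here; the structured tool call is parsed and executed.
            call = json.loads(chunk[len(ACTION_TOKEN):])
            result = TOOLS[call["tool"]](**call["args"])
            # The tool response is injected into the context before generation resumes.
            context.append(chunk)
            context.append(f"{RESULT_MARKER}{result}")
            continue
        context.append(chunk)
        return chunk, context
    raise RuntimeError("no final answer within max_steps")

answer, trace = run("Add 3 kg of RM001 and 2 kg of RM002.")
print(answer)
```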
[0127] The data-driven reasoning model may be trained using a combination of supervised fine-tuning and reinforcement learning. During supervised fine-tuning, the model may be exposed to annotated task / solution pairs that include both pure reasoning tasks and tasks requiring tool invocation. These training examples may include explicit demonstrations of when and how to invoke tools, as well as how to incorporate tool outputs into the reasoning trajectory. The model learns to associate specific task patterns with the need for external assistance and to generate the corresponding action tokens and tool call structures. To improve generalization and prevent catastrophic forgetting, the training corpus may be balanced to include both tool-augmented and purely internal reasoning tasks as described above. This may ensure that the model retains its core reasoning capabilities while acquiring the ability to delegate subtasks to external tools when appropriate.
[0128] Reinforcement learning may further refine the model’s decision-making process regarding tool invocation. In this phase, the model is rewarded for correctly identifying when a tool call is beneficial, selecting the appropriate tool, and effectively integrating the tool’s output into the reasoning chain. The reward signal may be derived from task success metrics, such as accuracy, completeness, or user preference scores. The model may be trained using a policy optimization algorithm, such as Group Relative Policy Optimization (GRPO), which evaluates multiple candidate outputs and updates the model based on the relative quality of tool usage strategies. This training may enable the data-driven reasoning model to learn nuanced behaviors, such as deferring tool calls when unnecessary or chaining multiple tool invocations across reasoning steps. As a result, the data-driven reasoning model may achieve a high degree of autonomy and adaptability in complex, multi-step reasoning tasks.
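The notion of evaluating candidates relative to their group can be illustrated with the group-relative advantage commonly described for GRPO, in which each candidate's reward is normalized by the mean and standard deviation of its group. The sketch below covers only this normalization step, not the policy update; the use of the population standard deviation and the small epsilon are assumptions of this sketch.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation (assumption)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four candidate outputs to the same task instruction,
# e.g. 1.0 if the tool usage led to a correct result and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Candidates that outperform their group receive a positive advantage and are reinforced; if all candidates in a group score equally, every advantage is zero and the group provides no learning signal.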
[0129] FIG. 1 shows an operator 102 controlling and / or operating (118) a chemical production environment in the form of a chemical plant. Operator 102 provides a user instruction related to determining a product data set for producing a product (e.g. comprising data related to where data relevant for the product data set may be retrieved) to an agent system. An orchestrator agent receives the user instruction and determines a sequence of tools that should be executed to generate the requested product data set, wherein at least one tool is a graph extraction tool configured for retrieving product data from a graph database providing production data in a graph structure. For determining the tool sequence, the agent may access a tool database providing e.g. a list of tools and an associated description including input data and output data for the tools. Such tool data sets, e.g. in the form of a list of tools, may be included as context in a task instruction for a generative data-driven model. Further, the agent may access a history graph database providing history data sets of past successfully processed historic user instructions. Based on similarity between the user instruction and the historic user instruction, the history data set associated with the most similar historic user instruction can be retrieved and also included as context in a task instruction for a generative data-driven model. The task instruction may then be provided to the generative data-driven model, which may have been trained on general purpose data sets in an unsupervised manner (e.g. GPT-4o). The generative data-driven model may, in response to receiving the task instruction, generate the tool sequence. According to the tool sequence, the respective tools are carried out in the specified order and with the specified input used for the respective tools, e.g. a first tool may be a graph extraction tool to extract certain product properties from the production data in graph structure. A second tool may then be an averaging tool that e.g. provides a time average of the extracted property. A third tool may be a plotting tool generating a plot of the retrieved property data, e.g. a time-dependent plot. The output of the tools may be included in a further task instruction for the generative data-driven model or another generative data-driven model as context together with the original user instruction to generate a product data set corresponding to the user instruction, e.g. in a structure and / or type specified in the user instruction. The product data set may then be provided to the operator 102, who may use it to operate and / or control the plant to produce the product accordingly.
[0130] FIG. 2 shows an example of production data represented in a graph structure stored in a graph database. Example production data for a chemical product X is shown. Note, however, that this graph also includes production data for other products, e.g. for intermediate products. The production data may represent a bill of materials (BOM). A BOM may be a comprehensive list of raw materials, components, and assemblies required to construct, manufacture, or repair a product. It may include the quantity of each material needed and may provide additional details such as part numbers, descriptions, and specifications. BOMs may be used in manufacturing and engineering as they ensure that all necessary materials are available and accounted for during production. For example, in a chemical manufacturing process, a BOM may list all the chemicals, solvents, and equipment needed to produce a specific compound. The BOM represented in graph structure in the example of FIG. 2 may be:
Level 1: Final Product
• Product: Chemical Compound X
  o Part Number: P001
  o Description: Final chemical product
Level 2: Intermediate Components
[0131] Intermediate Component A
• Part Number: IC001
• Description: Intermediate chemical A
• Quantity: 5 kg
• Sub-components:
  o Raw Material 1
    ■ Part Number: RM001
    ■ Description: Raw chemical 1
    ■ Quantity: 3 kg
  o Raw Material 2
    ■ Part Number: RM002
    ■ Description: Raw chemical 2
    ■ Quantity: 2 kg
  o Raw Material 3
    ■ Part Number: RM003
    ■ Description: Raw chemical 3
    ■ Quantity: 1 kg
[0132] Intermediate Component B
• Part Number: IC002
• Description: Intermediate chemical B
• Quantity: 3 kg
• Sub-components:
  o Raw Material 3
    ■ Part Number: RM003
    ■ Description: Raw chemical 3
    ■ Quantity: 1 kg
  o Raw Material 4
    ■ Part Number: RM004
    ■ Description: Raw chemical 4
    ■ Quantity: 2 kg
Level 3: Additional Components
• Solvent
  o Part Number: S001
  o Description: Solvent used in the process
  o Quantity: 2 L
[0133] Such a BOM may be structured in a graph structure (e.g. adding relationships between the different materials), e.g. by representing the materials or the BOM's components as nodes, and the relationships between them as edges. For instance, as in the following example for a graph database like Neo4j using Cypher queries:
CREATE (bom:BOM {id: 'BOM12345', product: 'Chemical Compound X'})
CREATE (intermediateA:Component {partNumber: 'IC001', description: 'Intermediate chemical A', quantity: 5, unit: 'kg'})
CREATE (intermediateB:Component {partNumber: 'IC002', description: 'Intermediate chemical B', quantity: 3, unit: 'kg'})
CREATE (rawMaterial1:Component {partNumber: 'RM001', description: 'Raw chemical 1', quantity: 3, unit: 'kg'})
CREATE (rawMaterial2:Component {partNumber: 'RM002', description: 'Raw chemical 2', quantity: 2, unit: 'kg'})
CREATE (rawMaterial3:Component {partNumber: 'RM003', description: 'Raw chemical 3', quantity: 1, unit: 'kg'})
CREATE (rawMaterial4:Component {partNumber: 'RM004', description: 'Raw chemical 4', quantity: 2, unit: 'kg'})
CREATE (solvent:Component {partNumber: 'S001', description: 'Solvent', quantity: 2, unit: 'L'})
CREATE (bom)-[:CONTAINS {condition: 'Catalytic reaction with Solvent at 70°C'}]->(intermediateA)
CREATE (bom)-[:CONTAINS {condition: 'Heating at 80°C'}]->(intermediateB)
CREATE (bom)-[:CONTAINS {condition: 'Dissolution'}]->(solvent)
CREATE (intermediateA)-[:CONTAINS {condition: 'Reaction with RM002 and RM003 at 60°C'}]->(rawMaterial1)
CREATE (intermediateA)-[:CONTAINS {condition: 'Reaction with RM001 and RM003 at 60°C'}]->(rawMaterial2)
CREATE (intermediateA)-[:CONTAINS {condition: 'Reaction with RM001 and RM002 at 60°C'}]->(rawMaterial3)
CREATE (intermediateB)-[:CONTAINS {condition: 'Cooling with RM003 to 20°C'}]->(rawMaterial3)
CREATE (intermediateB)-[:CONTAINS {condition: 'Stirring with RM004'}]->(rawMaterial4)
[0134] This structure includes the BOM ID, the product name, and a list of components with their part numbers, descriptions, quantities, and units.
[0135] Production data having a graph structure may allow complex relationships and dependencies between components to be stored. This may be particularly useful in chemical production, where multiple raw materials, intermediates, and final products may be involved, often with intricate interdependencies. As the complexity of the production process increases, a graph may scale to accommodate new components and relationships without becoming unwieldy, which may provide scalability. Further, a graph database may be optimized for querying relationships and may allow for easier retrieval and analysis of product data for production. To retrieve certain data from the graph structure, a graph extraction tool may be used, which may e.g. retrieve certain nodes and / or edges based on a graph database query, such as a Cypher query. For instance, all nodes connected upstream of a certain node (for instance representing e.g. a product or intermediate product) may be retrieved.
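Independently of any particular graph database, retrieving all nodes connected upstream of a given node amounts to a graph traversal. The following sketch mirrors the CONTAINS relationships of the example BOM of FIG. 2 in a plain adjacency mapping keyed by part number; the mapping and the function name are assumptions made for illustration.

```python
from collections import deque

# CONTAINS edges of the example BOM, keyed by part number (illustrative mapping).
CONTAINS = {
    "P001": ["IC001", "IC002", "S001"],
    "IC001": ["RM001", "RM002", "RM003"],
    "IC002": ["RM003", "RM004"],
}

def upstream(node, edges=CONTAINS):
    """Breadth-first traversal returning every component the given node depends on."""
    seen, order = set(), []
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # RM003 is shared by both intermediates; visit it once
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order

print(upstream("P001"))
print(upstream("IC002"))
```

Shared components such as RM003 appear only once in the result, which is one practical benefit of the graph representation over a nested list.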
[0136] For example, in a graph database like Neo4j, the Cypher query language may be used to retrieve information. To retrieve all components of a specific BOM, a graph extraction tool may use the following Cypher query:
MATCH (bom:BOM {id: 'BOM12345'})-[:CONTAINS]->(component:Component)
RETURN component
[0137] This cypher query matches the BOM node with the ID 'BOM12345' and retrieves all components related to it through the CONTAINS relationship.
[0138] To retrieve a specific component's details, an example Cypher query may be:
MATCH (bom:BOM {id: 'BOM12345'})-[:CONTAINS]->(component:Component {partNumber: 'C001'})
RETURN component
[0139] This Cypher query retrieves the details of the component with part number 'C001' within the specified BOM.
[0140] Updating or changing the BOM in a graph database may involve modifying the nodes and relationships (edges). For example, to add a new component to an existing BOM, the following Cypher query may be used:
MATCH (bom:BOM {id: 'BOM12345'})
CREATE (newComponent:Component {partNumber: 'C004', description: 'New Chemical', quantity: 3, unit: 'kg'})
CREATE (bom)-[:CONTAINS]->(newComponent)
[0141] This cypher query matches the BOM node and creates a new component node, then establishes the CONTAINS relationship between them.
[0142] For example, to update the details of an existing component, the following Cypher query may be used:
MATCH (bom:BOM {id: 'BOM12345'})-[:CONTAINS]->(component:Component {partNumber: 'C001'})
SET component.description = 'Updated Chemical A', component.quantity = 12
[0143] This query matches the specific component and updates its description and quantity.
[0144] To remove a component from the BOM, the following Cypher query may be used:
MATCH (bom:BOM {id: 'BOM12345'})-[r:CONTAINS]->(component:Component {partNumber: 'C001'})
DELETE r, component
[0145] This cypher query deletes the relationship and the component node.
[0146] The generated product data set may be a visualization of a particular part of the graph structure of the production data or may be of human readable form such as a BOM.
[0147] FIG. 3 illustrates an example of a text-based process description for producing polymer-coated plates and / or supplying new plates.
[0148] The text-based process description may be indicative of a sequence of a plurality of steps related to one or more processes associated with producing and / or processing a chemical product. The text-based process description may indicate the sequence of steps by letters and a closing bracket. Further, the text-based process description may comprise many repetitive words such as “the” or “of”. Hence, the text-based process description may be inefficient for processing by data-driven models. Further, the text-based representation may require more effort to be kept up to date, e.g. by human users, and may neither show a relation to another process nor indicate executors of the steps related to the one or more processes. Hence, an improved process description is required that enables reliable processing by data-driven models while operating on up-to-date data and that can represent relations between two or more processes.
[0149] FIG. 4 illustrates an example of a graph structure of a process description (as an example of production data represented in a graph structure) for producing polymer-coated plates and / or supplying new plates corresponding to the text-based process description depicted in FIG. 3.
[0150] The graph data structure may comprise a plurality of nodes and a plurality of edges. The plurality of nodes may be indicative of executors of a plurality of steps included in the process and / or steps associated with the process. The edges may be indicative of relations between the nodes. Hence, the edges may characterize the step to be initiated by one of the executors and / or the step following another step in the process description and / or a link to another process. The graph data structure may be beneficial for checking whether a process is complete or whether any step and / or executor is missing. Further, the graph data structure may be updated easily and serve as a central point for retrieving data related to processes. Moreover, representing how several processes are linked together allows updates relating to a part of the graph data structure to be passed through to linked processes. Especially in widely linked production such as chemical production, interrelated processes and updates, e.g. regarding production quantities, may be essential to adjust the upcoming one or more step(s) and / or processes. Ultimately, retrieving data from the graph data structure allows processes associated with producing and / or processing a chemical product to be monitored and / or controlled efficiently.
[0151] FIG. 5 shows an example of the method according to the first aspect. In this example, an operator of a production environment transmits a user instruction to a computer system. The user instruction is received via a user interface.
[0152] The user instruction is embedded into an embedding space comprising embeddings of historic user instructions, which may be retrieved from a corresponding vector data base. This may be carried out by a memorizer or memorizer unit or another suitably configured function or computer unit. In the embedding space, a similarity score is determined for each embedded historic user instruction, wherein the similarity score is based on the distance between the respective embedding of the historic user instruction and the embedded user instruction with respect to a similarity measure (e.g. cosine similarity or Euclidean distance). It may be determined whether the similarity score fulfills a threshold criterion, i.e. whether it is above or below a threshold similarity depending on the similarity measure used (e.g. above the threshold for cosine similarity, below the threshold for Euclidean distance). If the threshold criterion is fulfilled, it is determined that one or more historic user instruction(s) similar to the user instruction has / have been processed in the past (e.g. only the closest historic user instruction or a set of all historic user instructions fulfilling said threshold criterion). From a history data base, per historic user instruction of the one or more historic user instructions, a history data set is retrieved (e.g. by identifying the history data set associated with the historic user instruction, for instance by means of an identifier associated with both the embedded historic user instruction and the corresponding history data set). A history data set may comprise at least part of the tool sequence determined in relation to the respective historic user instruction. The retrieved history data set(s) are passed on to the orchestrator agent.
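The similarity lookup described above may be sketched as follows, here using cosine similarity with an "at or above threshold" criterion. The two-dimensional toy embeddings, the identifiers, the threshold value of 0.8 and the dictionary-based stand-ins for the vector data base and the history data base are assumptions of this sketch; a real embedding model is outside its scope.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_similar(query_vec, vector_db, threshold=0.8):
    """Return (identifier, score) pairs meeting the threshold, most similar first."""
    scored = [(hid, cosine_similarity(query_vec, vec)) for hid, vec in vector_db.items()]
    hits = [(hid, score) for hid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def retrieve_history(query_vec, vector_db, history_db, threshold=0.8):
    """Look up the history data set for each sufficiently similar historic instruction."""
    return [history_db[hid] for hid, _ in find_similar(query_vec, vector_db, threshold)]

# Hypothetical embedded historic user instructions, keyed by a shared identifier.
vector_db = {"h1": [1.0, 0.0], "h2": [0.0, 1.0], "h3": [0.9, 0.1]}
history_db = {
    "h1": {"tool_sequence": ["graph_extraction_tool"]},
    "h2": {"tool_sequence": ["plotting_tool"]},
    "h3": {"tool_sequence": ["unit_conversion_tool", "graph_extraction_tool"]},
}

print(retrieve_history([1.0, 0.0], vector_db, history_db))
```

The shared identifier is what links an embedded historic user instruction to its history data set; with a distance measure such as Euclidean distance the comparison direction would be reversed.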
[0153] The user instruction in this example may be passed to an orchestrator agent for determining a tool sequence comprising at least one graph extraction tool for retrieving a product data set from a graph database providing product data sets as described in context of FIG. 2. The orchestrator agent retrieves / receives a set of tool data sets from a tool database, wherein the tool data sets may comprise descriptions of respective tools (e.g. functions or agents for specific tasks) including any input and output data required to execute the tools, e.g. in json format. A set of tool data sets may e.g. be a list of tools and their associated descriptions.
[0154] Based on the set of tool data sets, the user instruction, and - in case history data set(s) are retrieved / received according to the description above - on the history data set(s), a tool sequence is determined or caused to be determined (e.g. by the orchestrator agent) comprising at least one graph extraction tool for retrieving at least a part of the at least one product data set or a basis of the at least one product data set from a graph database providing production data represented in a graph structure (e.g. comprising the at least one product data set or at least comprising all information included in the at least one product data set). A tool sequence may comprise an ordered list of tools specifying which tools to execute in what order for processing the user instruction. For instance, a tool could be a function or an agent requiring specific input and providing specific output. If, for instance, a user instruction contains a temperature value in °F and a graph extraction tool needs a temperature value in °C as input, the tool sequence may comprise first executing a transformation tool configured to transform a °F value into a °C value and second executing the graph extraction tool based on the output of the transformation tool. The tool data sets of the tools may comprise a description of these tools, e.g. “transformer tool 1: Able to transform °F into °C; input: value in °F; output: value in °C” and “graph extraction tool 1: configured to slice the graph data based on a temperature value; input: value in °C”. The tool data sets may be merged into a list or JSON file. For instance, determining the tool sequence may comprise providing a task instruction for generating the tool sequence to a generative data-driven model, the generative data-driven model (e.g. having been trained on general purpose training data sets) being configured to generate (e.g. and provide) a tool output data set related to the tool sequence, in response to receiving the task instruction (such a generative data-driven model may e.g. be GPT-4 or GPT-4o). The task instruction may be based on the set of tool data sets, the user instruction, and - in case history data set(s) are retrieved / received according to the description above - on the history data set(s). For instance, the task instruction may comprise at least a part of the set of tool data sets, the user instruction, and - in case history data set(s) are retrieved / received according to the description above - the history data set(s), e.g. as context. The tool output data set provided by the generative data-driven model may then be transformed or parsed into the tool sequence, e.g. the tool names and sequence from the tool output data set may be identified (e.g. by providing a suitable task instruction to the generative data-driven model) and stored in a list or array. Preferably, the tool output data set is already in the format of a tool sequence, so that parsing may be dispensed with. In case no tool sequence can be determined based on the user instruction alone, e.g. a clarification agent may be carried out, which may transmit a clarification request to the operator via the user interface. In that case, clarification information received from the operator may be used as the user instruction for carrying out the described method, or the clarification information may be used in addition to the original user instruction as the user instruction for carrying out the described method.
[0155] Subsequently, the tools are carried out, caused to be carried out or executed according to the tool sequence, e.g. by a routing module or component of the orchestrator agent. The output data sets provided by the tools may then be received, e.g. by the orchestrator agent.
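To make the °F / °C example concrete, executing a tool sequence in order, with each tool's output feeding the next tool, may be sketched as below. The registry, the tool function names and the small edge list (whose temperatures loosely follow the reaction conditions of the example BOM) are hypothetical.

```python
def fahrenheit_to_celsius(value_f):
    """Transformation tool: input value in °F, output value in °C."""
    return (value_f - 32.0) * 5.0 / 9.0

# Hypothetical graph data: (source, target, process temperature in °C).
GRAPH_EDGES = [
    ("P001", "IC001", 70.0),
    ("P001", "IC002", 80.0),
    ("IC001", "RM001", 60.0),
]

def slice_graph_by_temperature(temp_c, tolerance=0.5):
    """Graph extraction tool: return edges whose temperature matches the input in °C."""
    return [(s, t) for s, t, c in GRAPH_EDGES if abs(c - temp_c) <= tolerance]

# Registry mapping tool names, as they might appear in a tool sequence, to callables.
TOOL_REGISTRY = {
    "transformer_tool_1": fahrenheit_to_celsius,
    "graph_extraction_tool_1": slice_graph_by_temperature,
}

def run_tool_sequence(tool_sequence, initial_input):
    """Execute the tools in order, passing each output on as the next input."""
    data = initial_input
    outputs = []
    for name in tool_sequence:
        data = TOOL_REGISTRY[name](data)
        outputs.append((name, data))
    return outputs

results = run_tool_sequence(["transformer_tool_1", "graph_extraction_tool_1"], 158.0)
print(results)
```

Keeping the per-tool outputs, rather than only the final value, leaves the intermediate results available for generating the product data set afterwards.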
[0156] The product data set may then be generated (e.g. by a response module or component) based on at least a part of the output data (e.g. and the user instruction). For instance, a suitable task instruction may be provided to a / the generative data-driven model, the generative data-driven model (e.g. having been trained on general purpose training data sets) being configured to generate (e.g. and provide) a product data set, in response to receiving the task instruction for generating the product data set. This task instruction may e.g. comprise at least a part of the output data and optionally the user instruction, e.g. as context.
[0157] The generated product data set may then be provided e.g. via a user interface to the operator. Further, it may be determined whether the user instruction was successfully processed, based for instance on the tools being successfully executed according to the tool sequence or for example on the operator indicating (e.g. via the user interface) that the product data set was or can be successfully used to produce the product. Upon determining that the user instruction was successfully processed a history data set associated with the user instruction may be stored in the history data base (e.g. in a graph structure, wherein the history data base may be a graph data base). The history data set may comprise the tool sequence, for instance the history data set may comprise at least one of the following: the user instruction, clarification requests, received clarification information, task instructions and output generated by the generative data-driven model in response thereto, output data set, tool output data sets or other communication e.g. with the operator (for instance if further information is requested to process the user instruction) or between different functions / agents. Further upon determining that the user instruction was successfully processed, the embedded user instruction may be stored e.g. in the vector data base, for use as a historic user instruction when the method is carried out afterwards. The (now historic) user instruction may be associated with the corresponding history data set, e.g. by an identifier or within metadata associated or stored together with the embedded (historic) user instruction.
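One possible way of storing a history data set only after successful processing, linked to the embedded user instruction by a common identifier, is sketched below. The class name, the identifier scheme and the in-memory dictionaries are assumptions for illustration and stand in for the history data base and vector data base.

```python
class HistoryStore:
    """In-memory stand-in for the history data base and the vector data base."""

    def __init__(self):
        self.vector_db = {}   # identifier -> embedded (historic) user instruction
        self.history_db = {}  # identifier -> history data set

    def record(self, user_instruction, embedding, tool_sequence, success):
        """Store the run only if the user instruction was successfully processed."""
        if not success:
            return None
        identifier = f"hist-{len(self.history_db) + 1}"  # hypothetical id scheme
        self.history_db[identifier] = {
            "user_instruction": user_instruction,
            "tool_sequence": list(tool_sequence),
        }
        # The same identifier associates the embedding with its history data set.
        self.vector_db[identifier] = list(embedding)
        return identifier

store = HistoryStore()
store.record("Plot the BOM", [0.0, 1.0], ["plotting_tool"], success=False)
stored_id = store.record(
    "Retrieve the BOM for product X", [1.0, 0.0], ["graph_extraction_tool"], success=True
)
print(stored_id, store.history_db)
```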
[0158] In the following another example is described.
[0159] An orchestrator agent that utilizes a tool database for data retrieval from a graph database may operate by following a structured decision-making process. For instance: When a user submits a query (as a user instruction), the orchestrator agent first analyzes the query to understand the specific data requirements. This may involve parsing the query to identify key elements such as the type of data needed, the relationships between data points, and any constraints or filters specified by the user. The orchestrator agent may then consult the tool database, which may contain descriptions of all available tools, e.g. for data retrieval. Each tool in the database may be described in terms of its capabilities, such as the types of data it can retrieve, the methods it uses, and any specific conditions or limitations. Based on the analysis of the user query, the orchestrator agent applies a set of criteria to select the most appropriate tools. These criteria (as e.g. indicated in the tool data set associated with the tool) may include: the tool's ability to retrieve the specific type of data requested; the tool's performance in terms of speed and resource usage; the tool's reliability in providing accurate and complete data; the tool's compatibility with the graph database and any other tools that may be used in conjunction. If a single tool is not sufficient to fulfill the query, the orchestrator agent may need to combine multiple tools to generate the desired product data set. This involves determining the sequence in which the tools should be used and how the outputs of one tool can serve as inputs to another (as an example of a tool sequence). Once the appropriate tools have been selected and combined, the orchestrator agent executes the data retrieval process. It triggers execution of the selected tools to query the graph database, retrieve the necessary data, and apply any specified filters or transformations. Finally, the orchestrator agent may compile the retrieved data into a format that meets the user's requirements and present it to the user. This may involve aggregating data from multiple sources, resolving any conflicts or inconsistencies (e.g. via a validation task instruction provided to the generative data-driven model), and making the final output clear and usable. This may further allow an efficient and accurate product data set to be provided, giving the user the information they need in a timely manner.
[0160] As a further example of the internal workflow of the orchestrator agent, the orchestrator agent may receive the user query. The agent parses and analyzes the query to understand the specific requirements. This may involve identifying key terms and the context of the request. The agent may consult the tool database to identify tools that can retrieve the required data. The database contains descriptions of each tool's capabilities. The agent prepares a task instruction to an LLM (e.g. GPT-4o) that includes the user query and relevant information from the tool database. This task instruction may be structured to provide the LLM with all necessary context to make an informed decision. For instance, a task instruction for generating the tool sequence may look like:
{
"prompt": "You are a routing agent with access to a tool database. The tool database contains descriptions of available tools for data retrieval from a graph database. The user query is: 'Retrieve a bill of material for producing a product.' Based on the tool descriptions provided, decide which tools to use and how to combine them to fulfill the user query. Here are the tool descriptions: {Tool Descriptions}. Based on the user query and the tool descriptions, decide which tools to use and how to combine them to retrieve a bill of material for producing a product.",
"max_tokens": 500,
"temperature": 0.1
}
Tools may be e.g. a unit conversion tool (configured to convert a unit (e.g. lb) given in a user instruction into a different unit (e.g. kg) represented in the graph structure), a graph extraction tool, a plotting tool, an output conversion tool (configured to convert extracted data into a desired output format, e.g. a machine-readable user instruction), etc.
[0161] The example task instruction provides the LLM with the context to make a decision on the tool sequence. It may include a description of the routing agent's role, the user query, and detailed descriptions of the available tools. Further, for instance, the temperature may be set to a lower value (e.g. 0.1 as in the example above) so as to allow for less variability of the generated answers, or to a higher value, e.g. 0.7, to allow for more variability.
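Assembling such a task instruction from the user query and the tool data sets may be sketched as follows. The field names "prompt", "max_tokens" and "temperature" follow the example above; the function name, the structure of the tool data sets and the optional history section are assumptions, and no particular model API is called.

```python
import json

def build_task_instruction(user_query, tool_data_sets, history_data_sets=(),
                           temperature=0.1, max_tokens=500):
    """Assemble a task instruction payload for generating a tool sequence."""
    tool_descriptions = "\n".join(
        f"- {tool['name']}: {tool['description']}" for tool in tool_data_sets
    )
    prompt = (
        "You are a routing agent with access to a tool database. "
        f"The user query is: '{user_query}' "
        f"Here are the tool descriptions:\n{tool_descriptions}\n"
        "Decide which tools to use and how to combine them to fulfill the user query."
    )
    if history_data_sets:
        # Tool sequences of similar historic user instructions are added as context.
        prompt += "\nTool sequences used for similar past queries:\n" + json.dumps(
            list(history_data_sets)
        )
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}

tools = [
    {"name": "unit conversion tool", "description": "converts lb into kg"},
    {"name": "graph extraction tool", "description": "retrieves nodes and edges"},
]
payload = build_task_instruction(
    "Retrieve a bill of material for producing a product.", tools
)
print(json.dumps(payload, indent=2))
```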
[0162] The LLM would analyze the prompt and provide a response indicating which tools to use and how to combine them in a tool sequence. For example:
{
"response": "To retrieve a bill of material for producing a product, use the following tools in sequence:\n\n1. Unit Conversion tool with the input 1 lb, Graph extraction tool with the output from the unit conversion tool\n\nCombine the outputs of these tools to compile a comprehensive bill of material for producing the product."
}
[0163] This may allow for efficiently and accurately processing the user instruction.
[0164] In an example, the LLM may perform query analysis as part of a single call that includes both the user query and the tool database. This approach simplifies the workflow by consolidating tasks into one step. However, it may limit the granularity of control over each step. Alternatively, the query analysis can be a separate LLM call. This may allow for more focused and detailed analysis of the query before deciding on the tools to use, which may improve the accuracy of tool selection. As an alternative to using an LLM for query analysis, a rule-based system may be used that applies predefined rules and patterns to analyze the query. As a further alternative, simpler Natural Language Processing (NLP) techniques such as tokenization, part-of-speech tagging, named entity recognition, and dependency parsing may be used to analyze the query. Further, LLMs may be combined with different techniques, for example by using an LLM for initial query understanding and then applying rule-based or simpler machine learning models for detailed analysis.
[0165] For instance, a rule-based system may use predefined rules and patterns to match user queries with the appropriate tools in the database. For example, the system may use a set of predefined rules to parse and understand the user query. For example, it might look for specific keywords or phrases that indicate the type of data needed. The system may compare the parsed query against a set of rules that describe the capabilities of each tool. It selects the tools that match the criteria specified in the query. The system then may rank the matched tools based on predefined criteria such as relevance, efficiency, and accuracy. The system may determine the sequence in which they should be used based on the rules.
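The rule-based matching described above may be sketched as follows. The rule table, the tool names, the keyword patterns and the ordering values are hypothetical assumptions chosen for illustration; they are not taken from the disclosure.

```python
import re

# Hypothetical rules: a keyword pattern per tool and a fixed position in the sequence
RULES = [
    {"tool": "unit_conversion", "pattern": r"\b(lb|kg|convert)\b", "order": 1},
    {"tool": "graph_extraction", "pattern": r"\bbill of materials?\b|\bbom\b", "order": 2},
    {"tool": "plotting", "pattern": r"\b(plot|chart)\b", "order": 3},
]


def select_tools(query):
    """Return the names of all tools whose rule matches the query, in rule order."""
    text = query.lower()
    matched = [rule for rule in RULES if re.search(rule["pattern"], text)]
    return [rule["tool"] for rule in sorted(matched, key=lambda rule: rule["order"])]


sequence = select_tools("Retrieve a bill of material for 10 lb of product and plot it")
```

In this sketch the ranking is reduced to a static order per rule; a ranking by relevance, efficiency or accuracy would replace the `order` field with a computed score.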
[0166] To call different tools based on the LLM response, the orchestrator agent may follow a structured process to ensure that each tool is invoked in the correct sequence (according to the tool sequence) and that their outputs are combined effectively. As an example, the orchestrator agent first parses the LLM response to identify the tools to be used and the order in which they should be called. For example, if the LLM response indicates that the unit conversion tool should be used first, followed by the graph extraction tool, the agent will include this sequence in the tool sequence. The agent then may prepare the necessary input for each tool based on the user query and any intermediate results from previous tools. This may involve formatting the data in a way that each tool can process. Then, the agent may invoke the first tool in the sequence. For example, if the unit conversion tool is the first tool, the agent sends the relevant data, e.g. from the user instruction, to this tool to convert the units into units suitable for the graph extraction tool. Once the first tool completes its task, the agent processes the output and prepares it as input for the next tool, e.g. the graph extraction tool. This may involve transforming the data into a format that the next tool can understand. The agent then may call the next tool in the sequence with the prepared input. For example, if the graph extraction tool is the next tool, the agent sends the processed data from the unit conversion tool to this tool to retrieve detailed information about the materials required. After all tools have been called and their outputs have been processed, the orchestrator agent combines the results into a coherent product data set that meets the user instruction.
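The sequential invocation described above may be sketched as a loop that feeds the output of each tool into the next one. The two tool functions, the registry, the payload layout and the recipe data are hypothetical stand-ins; a real graph extraction tool would query the graph database instead of a hard-coded dictionary.

```python
LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def unit_conversion_tool(payload):
    """Hypothetical tool: convert a quantity given in lb into kg."""
    if payload["unit"] == "lb":
        return {"quantity": payload["quantity"] * LB_TO_KG, "unit": "kg"}
    return payload


def graph_extraction_tool(payload):
    """Hypothetical tool: scale a made-up per-kg recipe to the requested quantity."""
    recipe_per_kg = {"material_a": 0.6, "material_b": 0.4}  # illustrative data only
    return {name: share * payload["quantity"] for name, share in recipe_per_kg.items()}


TOOL_REGISTRY = {
    "unit_conversion": unit_conversion_tool,
    "graph_extraction": graph_extraction_tool,
}


def run_tool_sequence(tool_sequence, initial_input):
    """Call the tools in order; each tool receives the previous tool's output."""
    data = initial_input
    trace = []
    for name in tool_sequence:
        data = TOOL_REGISTRY[name](data)
        trace.append((name, data))
    return data, trace


result, trace = run_tool_sequence(
    ["unit_conversion", "graph_extraction"],
    {"quantity": 10.0, "unit": "lb"},
)
```

Keeping a trace of intermediate outputs is a design choice of this sketch; it would allow the agent to inspect or retry individual steps.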
[0167] The following example may demonstrate how to parse the generative data-driven model's response (e.g. a tool output data set) to extract the tools and their sequence into a tool sequence:
import re
import json

# Example LLM response
llm_response = {
    "response": (
        "To retrieve a bill of material for producing a product, "
        "use the following tools in sequence:\n\n"
        "1. Unit Conversion tool with the input lb, "
        "Graph extraction tool with the output from the unit conversion tool\n\n"
        "Combine the outputs of these tools to compile a comprehensive "
        "bill of material for producing the product."
    )
}

# Function to parse the LLM response
def parse_llm_response(response):
    # Extract each "<Name> tool with <input>" step in order of appearance.
    # The pattern assumes the phrasing of the example response above.
    tool_pattern = re.compile(
        r"([A-Z][A-Za-z ]*? tool) with (.+?)(?=, [A-Z]|\s*\n|$)"
    )
    tools = tool_pattern.findall(response)
    # Create a list to store the parsed tools
    parsed_tools = []
    for tool_name, tool_description in tools:
        parsed_tools.append({
            "tool_name": tool_name.strip(),
            "tool_description": tool_description.strip(),
        })
    return parsed_tools

# Parse the LLM response
parsed_tools = parse_llm_response(llm_response["response"])

# Print the parsed tools
print(json.dumps(parsed_tools, indent=2))
[0168] As a further example, an agentic system is disclosed which, given a user query (Q), performs the following main steps before generating the answer (A) and answer artifacts (ARTs):
1. Analyze the user query.
2. Expand the user query to provide each of the tools / agents with the necessary input information to retrieve the required information.
3. Generate a sequence of required steps (e.g. tool / agent calls) necessary to retrieve the context required to answer the query.
4. Inspect the retrieved context and add additional tools / agents whenever necessary. Furthermore, adapt previous calls that failed or ask the user for additional information as required.
5. In case none of the available tools is capable of retrieving the required information, the system will use a tool creation agent which can generate a graph query not covered by already existing tools.
6. If the user’s query requested some post-processing of the context, e.g. calculating a weighted average of some quantities or any other sort of aggregation, the system will use a computational agent which is able to execute such instructions in a robust manner.
7. Generate corresponding artifacts from the aggregated context. This includes the retrieved information as a dataframe, time-series data as line plots, nodes and edges as a subgraph of the underlying knowledge graph, and more.
8. Additional plots as requested by the user can be generated using a plotting agent which can render the desired plot using, for example, Mermaid.
9. The user can further interact with the content of previous system responses and thus continue the deep dive into the processes of interest.
[0169] The disclosed method may further personalize the responses based on the specific user’s previous interactions with the system by retrieving examples from the user’s past user instructions and provided feedback to the system (e.g. from a history data base as described above). This may allow the system to learn the types of contexts the user is interested in, for example a plant controller would be interested in specific bills of materials.
[0170] FIG. 6 shows an example of a training and fine-tuning process to obtain a fine-tuned generative data-driven model 606. A general-purpose generative data-driven model 604 may have been (pre-)trained using a large number of training data sets, which may be unlabeled data sets, in an unsupervised manner. Training may involve tokenizing input texts and masking a number of tokens of the input text. The weights of the generative data-driven model may then be adjusted based on the accuracy of generating the masked tokens, wherein the accuracy may be measured by a metric such as cross-entropy loss or maximum likelihood estimation. The pre-trained model 604 may then be fine-tuned 620 using, for example, a number of labeled specific training data sets 422, comprising queries and respective (correct) output data sets. Preferably, low-rank adaptation or another parameter-efficient fine-tuning (PEFT) technique is used for fine-tuning 620, which may allow for efficient fine-tuning and may reduce the risk of the pre-trained general-purpose generative data-driven model 604 losing the knowledge encoded in the pre-trained weights (i.e. catastrophic forgetting). The fine-tuning 620 may involve creating a number of training queries, e.g. by consulting operators on likely queries or by using a generative data-driven model to generate queries, wherein an operator is included as a production persona in the prompt for generating the training queries, which may enhance the quality of the output. Product data sets may then be generated based on the training queries using the pre-trained general-purpose generative data-driven model 604. The product data sets may then be corrected, e.g. by consulting respective experts for producing the product, and the corrected product data sets may be used as labels for the respective training queries. The labeled training queries, i.e. pairs of a training query and the corresponding corrected product data set, may then be used for fine-tuning 620 as specific training data sets 422.
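The idea behind low-rank adaptation may be illustrated with a minimal sketch: the pre-trained weight matrix stays frozen and only two small matrices are trained, whose product is added to it. The toy dimensions, the matrix values and the helper names are assumptions for illustration, and the scaling factor commonly applied to the low-rank product is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Add two matrices of equal shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


d, k, r = 4, 4, 1  # toy sizes: weight matrix is d x k, adapter rank is r

# Frozen pre-trained weights (toy identity matrix, illustrative only)
W = [[float(i == j) for j in range(k)] for i in range(d)]
B = [[0.0] * r for _ in range(d)]  # d x r, initialised to zero
A = [[0.1] * k for _ in range(r)]  # r x k


def adapted_weight(W, B, A):
    """Effective weight used in the forward pass: W + B @ A."""
    return add(W, matmul(B, A))


full_params = d * k        # parameters updated by full fine-tuning
lora_params = r * (d + k)  # parameters updated by the low-rank adapter
```

Because `B` starts at zero, the adapted model initially behaves exactly like the pre-trained model, and `W` itself is never modified during fine-tuning.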
[0171] For instance, at least one specific output layer may be added to the pre-trained model 604 which may be trained using the specific training data sets 422, wherein the weights of the generative data-driven model may be kept static during fine-tuning.
[0172] The fine-tuned model 606 may be used as the generative data-driven model for generating a tool output data set and / or product data set 616 for a query 614 of an operator 610, which may enhance the quality of the generated data sets.
[0173] FIG. 7 illustrates an embodiment of a transformer encoder architecture e.g. of an encoder-only transformer model, which may be utilized for obtaining an embedding of a user instruction from the user instruction.
[0174] The transformer encoder comprises an encoder input 724, one or more encoder blocks 720, 714 and an encoder output 722. In particular, the transformer encoder may be referred to as X-former. The transformer encoder architecture may correspond to the encoder architecture associated with the transformer encoder-decoder architecture, with an additional encoder output instead of connecting the encoder block directly to the decoder of the transformer encoder-decoder architecture. An example of a transformer encoder architecture is the bi-directional encoder representations from transformers (BERT).
[0175] The input data may be received at the encoder input 724. The input data may comprise at least one of text data, numerical data, tabular data, image data or the like. Where the input data may comprise one of text data, numerical data, tabular data, image data or the like, input embedding of a type corresponding to the type of input data may be applied. The type of the input data may be text data, numerical data, tabular data, image data or the like. In an embodiment, the input data may be associated with two or more types of input data. The input embedding may be associated with the two or more types of input embedding, in particular according to the input data. Hence, the input embedding may be configured to map text data, numerical data, tabular data, image data or the like to a numerical representation of the input data. In particular, at least one first type of input embedding may be applied to at least a part of the input data associated with one first type of input data. Further, at least one second type of input embedding may be applied to at least a part of the input data associated with one second type of input data. The model associated with the input embedding comprising the at least one first and at least one second type of input data may be referred to as multimodal model. The type of the input data may correspond to a modality.
[0176] Receiving and / or providing the input data may comprise identifying two or more elements of the input data. This may be referred to as tokenization. For this purpose, a vocabulary may be available. The vocabulary may specify a plurality of elements, in particular elements typically repeating in data of the type of the input data. For example, where the input data may be text data, the vocabulary may comprise several endings and / or word stems. In an embodiment, the elements of the input data may be specified by a selection indicative of the plurality of elements provided.
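Tokenization against a vocabulary of word stems and endings may be sketched as follows. The greedy longest-match strategy, the vocabulary entries and the unknown-token marker are assumptions for illustration; the disclosure does not prescribe a particular tokenization algorithm.

```python
# Hypothetical vocabulary of stems and endings mapped to element identifiers
VOCAB = {"produc": 0, "convert": 1, "ing": 2, "ed": 3, "t": 4, "s": 5, "[UNK]": 6}


def tokenize(word, vocab):
    """Split a word into vocabulary elements by greedy longest-prefix matching."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is found in the vocabulary
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:  # no vocabulary element matches at this position
            return ["[UNK]"]
        tokens.append(word[start:end])
        start = end
    return tokens


token_ids = [VOCAB[token] for token in tokenize("producing", VOCAB)]
```

Here `tokenize("producing", VOCAB)` yields the stem `produc` followed by the ending `ing`, and `token_ids` holds the corresponding vocabulary indices that an input embedding could map to vectors.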
[0177] The encoder input 724 may apply an input embedding 702, in particular to the two or more elements of the input data. Applying the input embedding 702 may refer to passing the input data, in particular the two or more elements of the input data, preferably separately, through one or more embedding layers. Applying the input embedding may comprise mapping the input data, in particular the two or more elements of the input data, to a numerical representation of the input data. The numerical representation may be indicative of and / or may be related to the input data. Mapping the input data to the numerical representation of the input data may comprise identifying two or more elements of the input data. For example, where the input data may be text data, the text may be divided into one or more token(s). The one or more element(s) may be mapped to a numerical representation of the one or more element(s). In particular, the number of element(s) may be equal to the number of numerical representations of the element(s). The numerical representation may be a tensor, in particular a vector and / or a matrix.
[0178] Further, the numerical representation of the two or more elements may be mapped to a numerical representation of a predefined size related to the numerical representation of the two or more elements. This may be referred to as padding. Data-driven model(s) may require data input of a predefined size. Hence, padding may allow for processing of input data of irregular size by the generative data-driven model. Padding may include concatenating a numerical representation independent of the input data with the numerical representation of the two or more elements to generate the numerical representation of predefined size related to the numerical representation of the two or more elements. The numerical representation independent of the input data may be indicative of a zero.
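Padding to a predefined size may be sketched as follows. The function name and the accompanying mask, which marks the positions holding real elements, are assumptions added for illustration; the zero pad value follows the description above.

```python
def pad_sequences(sequences, target_length, pad_value=0):
    """Append pad values so that every sequence has exactly target_length entries."""
    padded, masks = [], []
    for seq in sequences:
        if len(seq) > target_length:
            raise ValueError("sequence is longer than the predefined size")
        n_pad = target_length - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        # 1 marks a real element, 0 marks padding (illustrative addition)
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


padded, masks = pad_sequences([[5, 7, 9], [4]], target_length=4)
```

The mask is a common companion to padding because it lets later layers ignore the positions that carry no information about the input data.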
[0179] Further, the encoder input 724 may apply positional encoding 704. Applying positional encoding 704 may refer to adding a positional factor to the embedded input obtained via input embedding. Applying positional encoding 704 may comprise mapping the numerical representation of the predefined size related to the numerical representation of the two or more elements to a numerical representation of the two or more elements and a relation between the two or more elements. Preferably, the input data may specify a sequence of elements. The positional factor $p_{pos}$ may be indicative of the position of the elements within the sequence. For example, the positional factor $p_{pos}$ may be obtained based on the following equations:
[0180] $p_{pos}(2i) = \sin\left(\frac{pos}{10000^{2i/d}}\right)$
[0181] $p_{pos}(2i + 1) = \cos\left(\frac{pos}{10000^{2i/d}}\right)$
[0182] where pos may refer to the position of the element within the sequence, i may refer to the dimension associated with the input embedding and d may refer to the dimension of the model, e.g. transformer decoder, transformer encoder or transformer encoder-decoder. This may be referred to as absolute positional embeddings. Alternatively, the positional encoding may be based on rotary positional embeddings (RoPE). Positional encoding is beneficial since it enables the processing of sequential data without requiring further dimensions indicating the position of each element. Consequently, the positional encoding 704 reduces the computational resources needed for embedding the input data. By passing the input data through the encoder input, the input data may be transformed into a second-rank tensor representing the sequence of elements. This second-rank tensor may be referred to as embedded input data. The embedded input data may be processed by the encoder block. The embedded input data may be provided to the layer normalization 708 by a residual connection. Multi-head self-attention 706 may be applied to the embedded input data. Multi-head self-attention 706 may comprise the two components multi-head and self-attention. Self-attention may be understood as being a filter applied to the embedded input data. By applying the filter to the embedded input data, the elements associated with the embedded input data that contribute to the output data to be generated may be identified for generating the output data. Hence, the filter may represent the degree to which the elements associated with the embedded input data contribute to the output data to be generated. Applying the filter may be referred to as weighting the elements associated with the embedded input data. This is advantageous specifically regarding long sequences of elements. 
The filter may be learned and improved during the training by learning to identify the contribution of elements associated with the embedded input data. For example, in the partial sentence “I went to the bakery to buy a” the last word may be generated by the generative data-driven model such as the transformer encoder. The self-attention may focus the transformer encoder to attend to the word “bakery” and “buy” mostly to generate the word “bread”. Self-attention may refer to attention generated based on the input data. Hence, the filter may be determined based on the input data, preferably the embedded input data. The embedded input data may serve as query Q, key K and value V with respect to the self-attention operation. The self-attention may refer to attention based on the received input data. Hence, the filter may be calculated based on the following formula by inserting the respective tensors based on the embedded input data:
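[0183] $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$
[0184] where $d_k$ corresponds to the dimension of the key.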
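The self-attention filter may be sketched in plain Python as scaled dot-product attention, the standard formulation in which the scores are divided by the square root of the key dimension before the softmax. The toy input, in which the same embedded data serves as query, key and value without learned projections, is an assumption for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for matrices given as lists of rows."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)  # the "filter": one weight per element of the sequence
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return outputs, weights


# Toy embedded input of three elements (illustrative values)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention(X, X, X)
```

Each row of `weights` sums to one and expresses how strongly every element of the sequence contributes to the corresponding row of the context tensor.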
[0185] For improving the efficiency of the transformer encoder further, multiple heads are used to apply the filter, resulting in the multi-head self-attention 706. Multi-head self-attention 706 may comprise applying the filter to two or more elements of the embedded input data. Hence, the tensor may be split into two or more elements and the filter may be applied to the two or more elements separately by two or more heads according to the following equation: $\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})$
[0186] with parameter matrices $W_i^{Q} \in \mathbb{R}^{d \times d_q}$, $W_i^{K} \in \mathbb{R}^{d \times d_k}$, $W_i^{V} \in \mathbb{R}^{d \times d_v}$, where $i$ may refer to the index of the head, and $d_v$, $d_k$ and $d_q$ may refer to the dimensions of the value, key and query.
[0187] The results of the two or more heads may be concatenated according to the following equation: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^{O}$
[0188] where $W^{O}$ may refer to the output parameter matrix and $h$ may refer to the number of heads.
[0189] The embedded input data may be transformed via the multi-head self-attention 706 into a context tensor. The context tensor may represent the sequence of elements and the relation between two or more elements of the input data. The context tensor may be a second-rank tensor and / or may comprise one or more first-rank tensor(s). After the multi-head self-attention 706, layer normalization 708 may be applied based on the context tensor and / or the embedded input data from the residual connection. Applying layer normalization 708 may refer to normalizing the context tensor. Normalizing the context tensor may lower the values of the entries of the context tensor. This may reduce the computational cost associated with processing the context tensor. Further, it may improve the training by helping the loss to converge and by preventing instabilities.
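Layer normalization combined with the residual connection may be sketched as follows. The sketch normalizes a single vector to zero mean and approximately unit variance and omits the learned scale and shift parameters that are commonly used; the function names and input values are hypothetical.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + eps) for x in vector]


def add_and_norm(residual, sublayer_output):
    """Add the residual connection to the sublayer output, then normalize."""
    return layer_norm([r + s for r, s in zip(residual, sublayer_output)])


# Illustrative values: embedded input (residual) and attention output
normed = add_and_norm([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

Although the summed entries range up to 44, the normalized entries stay within a small range around zero, which illustrates how normalization lowers the values of the entries.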
[0190] Layer normalization 708 may be followed by passing the context tensor to a feed-forward layer 710, again followed by layer normalization 712 based on the residual connection to the context tensor and / or the output of the feed-forward layer 710. The feed-forward layer 710 may be a feed-forward neural network. The feed-forward neural network may comprise a plurality of fully connected neurons. Passing the context tensor through the feed-forward neural network may result in transforming the context tensor linearly. Additionally or alternatively, the neural network may comprise one or more activation functions such as a rectified linear unit (ReLU). Hence, the neural network may be configured for performing one or more non-linear operations on the context tensor and / or transforming the context tensor non-linearly. After the context tensor has been transformed and / or normalized by the feed-forward layer 710 and the layer normalization 712, the context tensor may be provided to one or more further encoder blocks 714. Having passed the context tensor through the feed-forward layer 710 may adapt the context tensor for the processing by a further attention layer of the one or more further encoder blocks 714 for applying a self-attention filter, preferably multi-head self-attention 706. The context vector after being transformed by the layer normalization 712 and the feed-forward layer 710 may be referred to as hidden state.
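A feed-forward layer with a ReLU activation may be sketched as two linear transformations with a non-linearity in between. The two-layer structure, the weight values and the function names are assumptions for illustration only.

```python
def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return max(0.0, x)


def linear(x, W, b):
    """Fully connected layer: W has one row per input and one column per output."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def feed_forward(x, W1, b1, W2, b2):
    """Linear transformation, ReLU non-linearity, second linear transformation."""
    hidden = [relu(h) for h in linear(x, W1, b1)]
    return linear(hidden, W2, b2)


# Toy parameters (illustrative values): 2 inputs -> 3 hidden units -> 2 outputs
W1 = [[1.0, -1.0, 0.5], [0.0, 1.0, -0.5]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b2 = [0.0, 0.0]
y = feed_forward([1.0, 2.0], W1, b1, W2, b2)
```

In this example the third hidden unit receives a negative pre-activation and is set to zero by the ReLU, which is what makes the overall transformation non-linear.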
[0191] The encoder output 722 comprises a linear layer 716 and a softmax layer 718. The linear layer 716 may transform the context vector into a logits vector. The linear layer may be fully connected. The logits vector obtained by passing the context tensor through the linear layer 716 may be passed through the softmax layer 718. Passing the logits vector through the softmax layer 718 may refer to applying the softmax function to the logits vector. Applying the softmax function to the logits vector may result in a probability distribution of one or more elements corresponding to the sequence of elements in the input data. From the probability distribution, one or more elements may be chosen based on predefined selection criteria. The one or more chosen elements may be referred to as the one or more elements generated by the transformer encoder. The one or more generated elements may be provided to the encoder input for generating further one or more elements corresponding to the sequence of the input data and the one or more elements generated by the transformer encoder as described within the context of FIG. 9.
[0192] Hence, processing the numerical representation of the two or more elements and the relation between the two or more elements by the generative data-driven model may comprise at least one of
• generating two or more numerical representations of the two or more elements and the relation between the two or more elements from the numerical representation of the two or more elements and the relation between the two or more elements,
• modifying the two or more numerical representations of the two or more elements and the relation between the two or more elements by applying a filter to the two or more numerical representations of the two or more elements and the relation between the two or more elements, wherein the filter may be configured to modify the contribution of the two or more elements to the numerical representations of the two or more elements and the relation between the two or more elements,
• concatenating the two or more numerical representations of the two or more elements and the relation between the two or more elements,
• mapping the concatenated numerical representation of the two or more elements and the relation between the two or more elements to a numerical representation of the output data
[0193] or a combination thereof.
[0194] In particular, the encoder block may be configured to
• split the numerical representation of the two or more elements and the relation between the two or more elements into two or more numerical representations of the two or more elements and the relation between the two or more elements,
• modify the two or more numerical representations of the two or more elements and the relation between the two or more elements by applying a filter to the two or more numerical representations of the two or more elements and the relation between the two or more elements, wherein the filter may be configured to modify the contribution of the two or more elements to the numerical representations of the two or more elements and the relation between the two or more elements,
• concatenate the two or more numerical representations of the two or more elements and the relation between the two or more elements
[0195] or a combination thereof. Applying self-attention may comprise modifying the two or more numerical representations of the two or more elements and the relation between the two or more elements by applying a filter to the two or more numerical representations of the two or more elements and the relation between the two or more elements, wherein the filter may be configured to modify the contribution of the two or more elements to the numerical representations of the two or more elements and the relation between the two or more elements. The filter may be obtained during training of the generative data-driven model. The filter may be obtained based on, in particular in relation to, the input data. Multi-head self-attention may comprise generating two or more numerical representations of the two or more elements and the relation between the two or more elements from the numerical representation of the two or more elements and the relation between the two or more elements, modifying the two or more numerical representations of the two or more elements and the relation between the two or more elements by applying a filter to the two or more numerical representations of the two or more elements and the relation between the two or more elements, wherein the filter may be configured to modify the contribution of the two or more elements to the numerical representations of the two or more elements and the relation between the two or more elements, and / or concatenating the two or more numerical representations of the two or more elements and the relation between the two or more elements.
[0196] The encoder output may be configured to map the concatenated numerical representation of the two or more elements and the relation between the two or more elements to a numerical representation of the output data. The numerical representation of the output data may be mapped to output data, e.g. by providing a vocabulary indicative of a relation between numerical representations and data of a type according to the input data. Additionally or alternatively, a decoding model may be used to map the concatenated numerical representation of the two or more elements and the relation between the two or more elements to a numerical representation of the output data. The decoding model may be trained to relate a numerical representation to data of a type according to the input data.
[0197] FIG. 8 illustrates an embodiment of a transformer decoder architecture e.g. of a decoder-only transformer or transformer-based model.
[0198] The transformer decoder comprises a decoder input 824, one or more decoder blocks 820, 814 and a decoder output 822. The transformer decoder may be referred to as X-former. The transformer decoder architecture may correspond to the decoder architecture associated with the transformer encoder-decoder architecture, independent of receiving one or more hidden states from the encoder of the transformer encoder-decoder. An example of a transformer decoder architecture is the generative pretrained transformer (GPT).
[0199] The decoder input 824 may apply input embedding 802 and positional encoding 804 analogous to the input embedding 702 and the positional encoding 704 as described within the context of FIG. 7.
[0200] The decoder block 820 may comprise the layer normalization 808, the masked multi-head self-attention 806, the feed-forward layer 810 and / or the layer normalization 812. The embedded input data resulting from passing the input data through the decoder input 824 may be provided to the layer normalization 808 via a residual connection. Further, masked multi-head self-attention 806 may be applied to the embedded input data. Masked multi-head self-attention 806 corresponds to the multi-head self-attention 706 as described within the context of FIG. 7, with additionally masking a part of the embedded input data associated with elements later in the sequence than the element to be generated. Additionally or alternatively, the part of the input data associated with elements later in the sequence than the element to be generated may not be received and / or transformed into the embedded input data. Thus, the transformer decoder may be suitable for generating a subsequent element to a sequence, whereas the transformer encoder may be suitable for generating a missing element within one sequence and / or between two or more sequences. Therefore, the transformer encoder may be configured for classification tasks. The transformer decoder may be configured for text generation. Masked multi-head self-attention may comprise applying a filter obtained based on elements of the sequence of the input data appearing before the part of the sequence to be generated. Similar to the transformer encoder as described within the context of FIG. 7, a context tensor may be generated by applying the masked multi-head self-attention 806 and the layer normalization 808. The context tensor may be provided to the layer normalization 812 via a residual connection. Further, the feed-forward layer 810 and the layer normalization 812 may be analogous to the feed-forward layer 710 and the layer normalization 712 as described within the context of FIG. 7. The context tensor may be provided to one or more further decoder blocks 814.
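The masking of later elements may be sketched as a causal mask applied to the attention scores before the softmax. Setting masked scores to negative infinity is a common implementation choice and an assumption here; the function name and the uniform toy scores are illustrative.

```python
import math


def causal_attention_weights(scores):
    """Softmax over each row of scores, ignoring positions later in the sequence."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only attend to positions 0..i; later ones are masked out
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        peak = max(masked)
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Uniform toy scores for a sequence of three elements
weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
```

The resulting weight matrix is lower triangular: the first element attends only to itself, while the last element attends to the whole sequence seen so far.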
[0201] The decoder output 822 may comprise a linear layer 816 and a softmax layer 818. The linear layer 816 and the softmax layer 818 may be analogous to the linear layer 716 and the softmax layer 718 as described within the context of FIG. 7.
[0202] FIG. 9 illustrates an embodiment of a transformer encoder-decoder architecture e.g. of an encoder-decoder transformer(-based) model. The transformer encoder-decoder may comprise the encoder input 940, the one or more encoder blocks 938, 928, the decoder input 946, the decoder block 942 and the decoder output 944. The encoder input 940 may correspond to the encoder input 724 of FIG. 7. The one or more encoder blocks 938, 928 may correspond to the one or more encoder blocks 720, 714 of FIG. 7. The decoder input 946 may correspond to the decoder input 824 of FIG. 8.
[0203] The architecture described with respect to FIG. 9 may allow the transformer encoder-decoder to receive and process input data at the encoder input 940, the one or more encoder blocks 938, 928, the decoder block 942 and the decoder output 944. Based on the input data, the transformer encoder-decoder may generate output data part by part or sequentially. The sequentially generated output data may be provided to and / or may be processed by the decoder input 946, the one or more decoder blocks 942, 906 and the decoder output 944. Preferably, a sequence may be provided to the encoder input 940 and, after at least a part of the output data has been generated, the decoder input 946 may be provided with at least the part of the elements of the output data already generated. By doing so, the next elements of the output data may be generated with a higher accuracy by taking the input data and the generated output data into account, since more data is received by the transformer encoder-decoder over time.
[0204] Because of the transformer encoder-decoder architecture, the transformer encoder-decoder may be configured for transforming a sequence into another representation of the sequence. An example of transforming one sequence into another representation may be the translation of a sentence into another language. A plurality of transformer encoder-decoders may be used, such as BART, T5 or the like.
[0205] In an embodiment, the layer normalization 936, 912 may be applied prior to the masked multi-head self-attention 934, multi-head self-attention 914 and / or the feed-forward layer 902 in the transformer decoder, the transformer encoder and / or the transformer encoder-decoder. By doing so, the computational resources for applying the multi-head self-attention 914 and / or the feed-forward layer 902 to the embedded input data and / or the context tensor may be decreased, as the entries of the respective tensors may be lower after normalization.
[0206] In an embodiment, the decoder output 944 may comprise a classification neural network, further feedforward layers, convolutional layers, fully connected layers or the like. For example, the transformer encoder-decoder may be configured for choosing between a plurality of options. For this purpose, the transformer encoder-decoder may be provided with three different input data sets and may classify the context vectors obtained from the one or more decoder blocks 942 via one or more linear layers. Accordingly, the architecture may be extended depending on the use case to be solved.
[0207] FIG. 10 illustrates an embodiment of training and / or deploying the transformer encoder, the transformer decoder and / or the transformer encoder-decoder.
[0208] The encoder / decoder / encoder-decoder architecture 1002 may correspond to the transformer decoder, the transformer encoder and / or the transformer encoder-decoder as described within the context of FIG. 7 - FIG. 9.
[0209] The output data generated by the encoder / decoder / encoder-decoder architecture 1002 may comprise one or more elements, in particular a sequence of elements. The previously generated elements of the output data may be provided as input for generating the next element in the sequence of the output data.
[0210] The input data may comprise N elements, in particular input tokens. An input token may be a token dedicated to be inputted into a data-driven model such as the transformer decoder, the transformer encoder or the transformer encoder-decoder. The output data to be generated may comprise M elements. The encoder / decoder / encoder-decoder architecture 1002 may generate one element of the output data based on receiving the input data and optionally previously generated elements of the output data at a time step. Hence, for generating M elements, M time steps are required. A time step comprises providing input 1010, 1012, 1014 to the encoder / decoder / encoder-decoder architecture 1002 and receiving output data 1004, 1008, 1006 from the encoder / decoder / encoder-decoder architecture 1002. In a first time step, the input 1010 may comprise N input tokens. The N input tokens may be associated e.g. with N words, stems or endings. Preferably, the N input tokens may specify a question. One or more input tokens may specify the beginning of the sequence of tokens and / or the end of the sequence of tokens. The input 1010 may be processed by the encoder / decoder / encoder-decoder architecture 1002. Based on the input 1010, at least a part of the output data 1004 may be generated. The at least a part of the output data may comprise a first output token. In the next time step, the generated first output token may be provided together with the input 1012. Specifically, where the input 1012 may be received by a transformer encoder-decoder, the input tokens may be received at the encoder input 940 and the first output token may be received at the decoder input 946. Where the input 1012 may be received by the transformer encoder, the input 1012 may be received by the encoder input 940, and analogously regarding the transformer decoder and the decoder input 946.
Based on the input 1012, the output data 1008 comprising the first output token and a second output token may be generated. Generating the output data 1008 based on the input 1012 may refer to generating the second token based on the first token and the N input tokens, wherein the first token may have been generated based on the N input tokens. This process may be repeated until the last token in the sequence of the output data 1006 may be generated. Preferably, the last token may be an end token. The end token may terminate the generation of a further output token.
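The step-by-step generation described above can be sketched as a simple loop. This is a minimal illustration only: `toy_next_token` is a hypothetical stand-in for the encoder / decoder / encoder-decoder architecture 1002, and the names `generate`, `END` and `max_steps` are assumptions introduced for this example.

```python
END = "<end>"  # assumed end token

def toy_next_token(input_tokens, generated):
    """Hypothetical stand-in for the model: echoes the input tokens, then emits END."""
    position = len(generated)
    return input_tokens[position] if position < len(input_tokens) else END

def generate(input_tokens, next_token_fn, max_steps=50):
    """Generate one output token per time step until the end token appears."""
    generated = []
    for _ in range(max_steps):
        # Each step sees the N input tokens plus all previously generated tokens.
        token = next_token_fn(input_tokens, generated)
        generated.append(token)
        if token == END:
            break  # the end token terminates the generation
    return generated

output = generate(["what", "is", "a", "BOM"], toy_next_token)
```

With this toy model, five output elements are produced in five time steps, the last one being the end token.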
[0211] Similarly to the data processing during deployment of the encoder / decoder / encoder-decoder architecture 1002, the encoder / decoder / encoder-decoder architecture 1002 may be trained. The training data set may comprise a plurality of sequences comprising a plurality of elements. The sequences may be associated with the input data and / or the output data. Additionally or alternatively, the sequences may be independent of the input data and / or the output data. For example, where the input data and the output data may refer to chemical compositions represented via text, the training data set may comprise sequential text data independent of chemical compositions. In this example, the training data set may comprise sequences of words originating from a conversation. In an embodiment, the training data set may comprise at least partially input data sets and / or output data sets.
[0212] The training may be initialized by initializing the encoder / decoder / encoder-decoder architecture 1002. In an embodiment, the parameters associated with the encoder / decoder / encoder-decoder architecture 1002 may be initialized randomly. Additionally or alternatively, the input embedding of the encoder / decoder / encoder-decoder architecture 1002 may be obtained by training a CBOW model or a skip-gram model as described within the context of FIG. 5. The trained embedding layer may be used during training. The parameters associated with the embedding layer may be kept constant and / or may be updated after a predefined number of training epochs. By doing so, the number of parameters to be updated is lower, enabling faster training that consumes fewer computational resources. Further, the accuracy associated with the embedding layer may be constant and / or may be increased by avoiding error compensation in relation to the just initialized encoder / decoder / encoder-decoder architecture 1002.
[0213] During the training of the encoder / decoder / encoder-decoder architecture 1002, at least a part of the sequences of the training data set may be provided to the encoder / decoder / encoder-decoder architecture 1002 one after another and one or more elements may be generated based on the sequences of the training data set one after another. The elements generated based on the sequences may follow the elements of the parts of sequences the encoder / decoder / encoder-decoder architecture 1002 may have been provided with. The generated one or more elements may be compared to the one or more elements following the at least a part of the sequences provided to the encoder / decoder / encoder-decoder architecture 1002 as specified by the training data set. Hence, during the training the encoder / decoder / encoder-decoder architecture 1002 may generate a guess on the next element and the guess on the next element in a sequence may be compared to the ground truth specifying the actual next element according to the training data set. Based on the guess on the next element and the ground truth a loss may be determined. The loss may define the similarity between the guess on the next element and the ground truth. The loss may be determined by forming a vector dot product between the token associated with the one or more elements and the token associated with the ground truth. A loss unequal to zero may result in updating the parameters associated with the encoder / decoder / encoder-decoder architecture 1002. Preferably the parameters associated with the encoder / decoder / encoder-decoder architecture 1002 may be independent of the embedding layer. For example, the parameters associated with the encoder / decoder / encoder-decoder architecture 1002 may be weights of the neurons of the encoder / decoder / encoder-decoder architecture 1002.
[0214] Based on the determined loss, backpropagation may be applied to determine the gradients associated with the parameters of the encoder / decoder / encoder-decoder architecture 1002 to lower the loss. According to the determined gradients, the parameters associated with the encoder / decoder / encoder-decoder architecture 1002, preferably the weights of the neurons associated with the encoder / decoder / encoder-decoder architecture 1002, may be updated by using a gradient descent algorithm.
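The guess, loss, gradient and update cycle can be sketched with a deliberately tiny model. The following is a hedged illustration, not the training procedure of the disclosure: it assumes a bigram table of logits instead of a transformer, and it uses a cross-entropy loss as a commonly used substitute rather than the dot-product formulation mentioned above. All names (`train_bigram`, `softmax`, `lr`, `epochs`) are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(sequence, vocab_size, epochs=200, lr=0.5):
    """Train logits[prev][next] by gradient descent on a next-element cross-entropy loss."""
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]
    # The ground truth for each element is simply the element that follows it.
    pairs = list(zip(sequence[:-1], sequence[1:]))
    losses = []
    for _ in range(epochs):
        total = 0.0
        for prev, target in pairs:
            probs = softmax(logits[prev])          # the model's guess on the next element
            total += -math.log(probs[target])      # loss of the guess against the ground truth
            for j in range(vocab_size):
                grad = probs[j] - (1.0 if j == target else 0.0)  # d(loss) / d(logit_j)
                logits[prev][j] -= lr * grad       # gradient descent update
        losses.append(total / len(pairs))
    return logits, losses

tokens = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # toy unlabeled sequence
logits, losses = train_bigram(tokens, vocab_size=3)
```

For this single-layer toy model the gradient can be written down directly; in a multi-layer architecture the same quantity would be obtained by backpropagation.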
[0215] The training data set may be unlabeled. The sequences of elements within the training data set may inherently comprise the ground truth for determining the loss with respect to the one or more elements generated during the training of the encoder / decoder / encoder-decoder architecture 1002. Hence, the encoder / decoder / encoder-decoder architecture 1002 may be trained self-supervised. This is advantageous since time and resources for creating a labeled training data set may be saved. Furthermore, this enables the usage of large training data sets associated with a size of several terabytes. Consequently, the data-driven model may be accurate in generating elements of a sequence. In addition, the large training data set enables few shot predictions or even zero shot predictions. Hence, the generative data-driven model(s) trained as described above are versatile, contributing to saving resources needed for training and / or hosting a plurality of purpose-driven models such as convolutional neural networks. The training described above may be referred to as pretraining. Pretraining may refer to training a generative data-driven model based on data with a plurality of contexts.
[0216] The generative data-driven model may be configured for performing few shot or even zero shot predictions with respect to a plurality of use cases after pretraining. The performance of the data-driven model may be increased further by additional training referred to as finetuning. Finetuning may refer to training a pretrained data-driven model for a concrete task, e.g. by providing task instructions to the pretrained data-driven model and adapting the parameters of the pretrained data-driven model to decrease the distance of the generated output data by the pretrained data-driven model in response to receiving the task instructions from predefined output data corresponding to the provided task instructions.
[0217] Models based on the architecture according to FIG. 7 to FIG. 9 and / or pretrained generative data-driven model(s) and / or finetuned data-driven model(s) may be referred to as large language models. Examples of large language models include Llama models, Mistral models, GPT models, BERT models or the like. Such models have been tested. Testing data-driven model(s), in particular pretrained and / or finetuned data-driven model(s), may include comparing output data generated by the one or more data-driven model(s) in response to receiving the input data with target data, e.g. obtained from domain experts. These domain experts may be a current bar for performing tasks the data-driven model(s) may be parametrized and / or trained for. The target data may specify output data desired to be generated in response to receiving the input data.
[0218] FIG. 11 illustrates an embodiment of a selective state space sequence model, e.g. utilizing a Mamba architecture, that may be used as a generative data-driven model. A selective state space model architecture may enhance inference speed in relation to a transformer-based model.
[0219] The selective state space architecture with its layered structure may be similar to the transformer decoder architecture discussed in relation to FIG. 8. However, instead of decoder blocks, selective state space blocks 1132, 1104 are stacked. Selective state space block 1132 may be based on a selective state space sequence model (S6).
[0220] An input token may be linearly projected via linear layer 1112, 1120 into an expanded latent space (which may allow capturing more information during processing in the selective state space layer 1110), followed by a convolution via a convolutional layer 1114 and a nonlinear function (e.g. a sigmoid linear unit (SiLU) or swish activation function). The convolution before the selective state space layer 1110 may prevent independent token calculations. The selective state space layer 1110 performs a selective state space operation. Further, a learnable skip connection may be provided via linear layer 1120. This may use a linear transformation to map the input to the output; similar to a residual connection in a transformer model, this may help to mitigate vanishing gradient effects.
[0221] A selective state space layer 1110 may be a linear recurrent network that selectively processes data based on the input token, which may allow focusing on relevant data and discarding irrelevant data. For instance, in each step a separate weight vector may be determined based on the respective input token. The determined weight vector may then be used in a selective scan.
[0222] A selective state space layer 1110 may be used in a convolutional mode e.g. for parallelizable training and a recurrent mode for near-constant time generation of output data. A state space operation may be based on solving the state and output equations, wherein a state equation may describe how a state changes based on how the input influences the state and an output equation may describe how the state is translated to the output. Further how the input influences the output may be represented by a learnable linear transformation, e.g. a matrix D, used in a learnable skip connection.
[0223] The state equation for a hidden state may be (in discretized form):
[0224] $h_k = A h_{k-1} + B x_k$
[0225] The output may be expressed by (in discretized form):
[0226] $y_k = C h_k$
[0227] This discretized state space model may be unfolded into a recurrent form similar to a recurrent network, exemplifying that a selective state space model may be or comprise a linear recurrent model. However, here matrices A, B, and C may also be used as a kernel of a convolution of the state space model. Kernel K for this may e.g. be:
[0228] $K = (C A^2 B,\ C A B,\ C B)$
[0229] which may allow to determine an output:
[0230] $y_{k+1} = C A^2 B x_{k-1} + C A B x_k + C B x_{k+1}$
[0231] So, in this representation of the state space model training may be performed in a parallel manner like in convolutional neural networks.
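The equivalence between the recurrent form and the convolutional form can be checked numerically. The sketch below is an illustration under stated assumptions: the matrix A is taken to be diagonal (stored as a list of its diagonal entries), B and C are vectors, the initial hidden state is zero, and the kernel is stored with the most recent input first, i.e. in the reverse order of the kernel K written above. The function names are hypothetical.

```python
def ssm_recurrent(A, B, C, xs):
    """Recurrent mode: h_k = A h_{k-1} + B x_k and y_k = C h_k, with diagonal A and h_0 = 0."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        h = [a * hn + b * x for a, hn, b in zip(A, h, B)]
        ys.append(sum(c * hn for c, hn in zip(C, h)))
    return ys

def ssm_kernel(A, B, C, length):
    """Kernel entries C A^j B for j = 0 .. length-1 (entry j weights the input j steps back)."""
    return [sum(c * (a ** j) * b for a, b, c in zip(A, B, C)) for j in range(length)]

def ssm_convolutional(A, B, C, xs):
    """Convolutional mode: y_k = sum_j (C A^j B) x_{k-j}; every y_k can be computed independently."""
    K = ssm_kernel(A, B, C, len(xs))
    return [sum(K[j] * xs[k - j] for j in range(k + 1)) for k in range(len(xs))]

A, B, C = [0.9, 0.5], [1.0, 2.0], [0.3, -0.4]   # toy parameters
xs = [1.0, 0.0, 2.0, -1.0]
y_rec = ssm_recurrent(A, B, C, xs)
y_conv = ssm_convolutional(A, B, C, xs)
```

Both modes return the same outputs up to floating-point rounding; the recurrent mode needs only the previous state per step, while the convolutional mode has no dependency between output positions.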
[0232] Matrix A may be a matrix that represents recent tokens well and decays older tokens and may be initialized using HiPPO:
[0234] where every entry below the diagonal is set to 0. This may allow to create a long-term memory for the selective state space model.
[0235] For a selective state space block 1132, the matrices B and C as well as the step size Δ used for discretization of the matrices may be dependent on the input token and may be trained during training, so that for each input token different matrices B and C are determined, which may enhance the content-awareness and may act similar to a multi-head self-attention in a transformer model. However, unlike in state space models with fixed matrices A, B, and C, here the convolutional representation may not be easily determined. Hence, to operate the selective state space layer 1110 in convolutional mode a selective scan may be applied utilizing associative properties of the hidden states calculation, allowing parallel determination of the sequence in parts and iteratively combining them, so that parallel training may be used. Further, reading and writing operations may be decreased by using kernel fusion of the described step size, the selective scan, and the multiplication with C.
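The associative property that makes a scan parallelizable can be illustrated for a scalar recurrence h_k = a_k h_{k-1} + b_k with input-dependent coefficients. This is a sketch under assumptions, not the fused kernel of the cited architecture: the coefficients `a_k = exp(-delta_k)` and `b_k = delta_k * x_k` are hypothetical toy choices, the rounds of the scan are simulated sequentially, and all names are introduced for illustration.

```python
import math

def combine(left, right):
    """Compose two steps of the form h -> a*h + b, applying `left` first and `right` second."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_scan(steps):
    """Reference recurrence: h_k = a_k * h_{k-1} + b_k with h_0 = 0."""
    h, out = 0.0, []
    for a, b in steps:
        h = a * h + b
        out.append(h)
    return out

def parallel_style_scan(steps):
    """Inclusive scan in about log2(n) rounds; updates within one round do not depend on each other."""
    acc = list(steps)
    offset = 1
    while offset < len(acc):
        acc = [combine(acc[i - offset], acc[i]) if i >= offset else acc[i]
               for i in range(len(acc))]
        offset *= 2
    # With h_0 = 0, the hidden state equals the accumulated offset term b.
    return [b for _, b in acc]

xs = [1.0, -2.0, 0.5, 3.0, 1.5]
deltas = [0.1, 0.5, 0.2, 0.9, 0.3]   # hypothetical input-dependent step sizes
steps = [(math.exp(-d), d * x) for d, x in zip(deltas, xs)]
h_seq = sequential_scan(steps)
h_par = parallel_style_scan(steps)
```

Because `combine` is associative, partial results over parts of the sequence can be computed independently and merged afterwards, which is what allows the parts to be processed in parallel.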
[0236] Linear layer 1102 may project the generated output back into the same dimension as the input.
[0237] Selective state space blocks may be used together with transformer decoder blocks or mixture of expert blocks (e.g. decoder blocks wherein the feed-forward layer is exchanged for a gating network and a number of parallel feed-forward layers, wherein the gating network switches between the feed-forward layers depending on the input), which may allow leveraging advantages of the different architectures.
[0238] An example of the architecture of a selective state space block may be found in “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” by Albert Gu and Tri Dao, arXiv:2312.00752v2 [cs.LG] 31 May 2024, which is incorporated herein by reference.
[0239] FIG. 12 illustrates an example of a generative data-driven model comprising a mixture of experts model, wherein the mixture of experts model comprises at least one mixture of experts block, wherein the at least one mixture of experts block comprises in particular one or more shared expert(s) and a plurality of routed experts as well as a router, wherein the router may be configured to select the appropriate routed experts for a given input token, e.g. based on routing scores computed using either softmax or sigmoid functions. The at least one mixture of experts block may further comprise at least one Multi-Head Latent Attention (MLA) layer configured with low-rank compression, i.e. configured to compress latent vectors, in particular key and value vectors, into a lower dimensional space, e.g. via a down-projection matrix. The MLA layer may further be configured to determine rotary positional embeddings. For instance, the mixture of experts model may comprise a number of initial decoder blocks 1216 (e.g. three), wherein the decoder blocks may be dense layers or decoder blocks as e.g. described in context of a transformer model in FIG. 8, FIG. 9. After the initial decoder blocks 1216, a mixture of experts block 1232 may follow, followed by further decoder and mixture of experts blocks 1202, e.g. alternating between decoder and mixture of experts blocks. The model may comprise 61 decoder and mixture of experts blocks having three initial decoder blocks 1216 and 29 mixture of experts blocks.
[0240] The model may comprise one or more (e.g. 29) mixture of experts blocks 1232 comprising at least one multi-head latent attention layer 1212, and layer normalization 1210, 1214 as well as feed-forward mixture of experts layer 1208. The mixture of experts model may further comprise an input embedding layer 1218, positional encoding 1220, a transformer block, a linear layer 1204 and / or softmax layer 1206. The initial decoder blocks 1216 process the input data, which is then passed through a series of further decoder and mixture of experts blocks 1202, wherein the mixture of experts blocks and transformer blocks may alternate. The mixture of experts block 1232 may be configured with multiple experts which may be activated based on the input data, and their outputs may be combined to produce the final decoder output 1234. A feed-forward mixture of experts layer 1208 may comprise one or more shared experts (e.g. two), which may be configured to be activated per input token, and a plurality of routed experts (e.g. 64 to 256, in particular 256), of which a pre-determined number (e.g. 6 to 8, in particular 8) are activated per input token individually by the router, e.g. based on a routing score learned during training. So, the router within the mixture of experts block may determine which routed experts to activate for each input token, which may enhance efficiency and effective processing, e.g. in that not all parameters (e.g. 671B) of the mixture of experts model need to be utilized per input token but only a subset (e.g. 37B), which may reduce the number of calculations by more than an order of magnitude. So, in particular the mixture of experts model's combination of shared and routed experts may enhance computational efficiency and model performance.
[0241] An expert (routed or shared) within the mixture of experts block 1232 may be a specialized feed-forward network that processes the input data independently. The router may dynamically select a subset of experts for each input token, based on the token's characteristics and the experts' capabilities. This selective activation allows the model to focus computational resources on the most relevant experts, improving both inference speed and accuracy. The experts' outputs may then be aggregated and passed through the decoder output layer 1234, which generates the final output.
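The interplay of router, routed experts and shared experts can be sketched for a single token as follows. This is a minimal illustration under assumptions: the router is a plain linear scoring followed by a softmax, the gates of the selected experts are renormalized to sum to one, and the experts are toy scaling functions instead of feed-forward networks. The names `route`, `moe_layer`, `make_scaler` and `router_w` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token, router_weights, top_k):
    """Return (expert_index, gate) pairs for the top_k routed experts of one token."""
    scores = [sum(w * t for w, t in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)          # renormalize over the selected experts (assumed)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(token, shared_experts, routed_experts, router_weights, top_k):
    """Sum the outputs of all shared experts and the gated outputs of the selected routed experts."""
    out = [0.0] * len(token)
    for expert in shared_experts:                 # shared experts process every token
        out = [o + e for o, e in zip(out, expert(token))]
    for idx, gate in route(token, router_weights, top_k):
        out = [o + gate * e for o, e in zip(out, routed_experts[idx](token))]
    return out

def make_scaler(factor):
    """Toy expert that scales its input; stands in for a feed-forward network."""
    return lambda v: [factor * e for e in v]

shared = [make_scaler(1.0)]
routed = [make_scaler(f) for f in (10.0, 20.0, 30.0, 40.0)]
router_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
selection = route([2.0, 1.0], router_w, top_k=2)
output = moe_layer([2.0, 1.0], shared, routed, router_w, top_k=2)
```

Only two of the four routed experts are evaluated for this token, which is the source of the computational saving described above.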
[0242] The multi-head latent attention mechanism 1212 may allow the model to capture complex dependencies between input tokens, while the layer normalization 1210, 1214 may stabilize in particular the training process. The positional encoding 1220 may provide the model with information about the relative positions of tokens within the input sequence, enhancing its ability to understand context. The mixture of experts block 1232 may comprise a router employing a routing algorithm to dynamically select the most appropriate experts for each input token, based on their learned capabilities.
[0243] The input tokens may be converted into dense vectors using a ParallelEmbedding layer, which may support parallel embedding of input tokens, which may facilitate efficient handling of large-scale data inputs. Positional encoding may be added to the input embeddings to retain the order of the sequence, which may help the model understand the relative positions of tokens within the input sequence. The first three transformer-decoder blocks may use Multi-Layer Perceptron (MLP) layers to process the input embeddings and positional encodings, providing initial transformations to the input data. The multi-head latent-attention 1212 layer may utilize query, key, and value vectors, with support for low-rank projections and rotary positional embeddings, which may capture dependencies between tokens and allow the model to focus on relevant parts of the input sequence. RMSNorm may be used for normalization before the attention and feed-forward layers, which may improve the model's performance.
[0244] The feed-forward network (FFN) within the transformer blocks in this example is implemented as either an MLP or an MoE layer, depending on the layer index. The example MoE layer features 256 routed experts, with 8 experts activated for each token. The gating mechanism dynamically selects the experts based on the input and routing scores, with each MoE layer containing one or more shared experts, providing additional flexibility and specialization. After the initial three MLP layers, the remaining transformer-decoder blocks alternate between MLP and MoE layers, ensuring a balance between computational efficiency and model capacity, leveraging the strengths of both types of layers. The final output projection layer maps the transformed features to the vocabulary size, generating the next token in the sequence using a softmax function to produce probability distributions over the vocabulary.
[0245] The example mixture of experts block comprises multiple experts, including routed experts and one or more shared experts. The router may select the appropriate experts for each input, based on routing scores computed using softmax or sigmoid functions. The model configuration includes parameters such as vocabulary size, model dimension, intermediate dimension, number of layers, number of attention heads, and data type (FP8). The router may be a component within the MoE layer that selects and activates a subset of routed experts based on input data (e.g. input token) and a routing score. The experts may be specialized subnetworks within the MoE layer that process a portion of the input data, designed to handle specific types of inputs or tasks, providing specialized knowledge and capabilities. Shared experts may be always active and process every token, providing common functionalities or knowledge that can be utilized across different inputs, enhancing model efficiency and consistency. Routed experts may be dynamically selected and activated by the router based on input data and routing scores, allowing the model to leverage diverse expertise and improve performance.
[0246] A multi-head latent attention (MLA) layer may comprise multiple attention heads (e.g. 1228) and may support low-rank projections and rotary positional embeddings.
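The disclosure does not spell out how the rotary positional embeddings are computed. The following sketch assumes the commonly used formulation in which consecutive pairs of vector entries are rotated by an angle proportional to the token position; the function names `rope` and `dot`, the `base` value and the even vector length are assumptions for illustration.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (vec must have even length)."""
    out = []
    dim = len(vec)
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)   # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]    # toy query vector
k = [0.5, -1.0, 2.0, 0.1]   # toy key vector
score_near = dot(rope(q, 3), rope(k, 1))
score_shifted = dot(rope(q, 10), rope(k, 8))
```

Both scores agree because the query and key positions differ by two in each case, which shows that the attention score under this formulation depends only on the relative position of the tokens.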
[0247] The training process of the mixture of experts model may be configured to enhance efficiency and performance. The model may undergo a multi-stage training regimen, beginning with pre-training on an extensive set of publicly available general data, in particular comprising at least 10 trillion tokens (Byte-level Byte-Pair Encoding (BBPE) algorithm with a vocabulary size of 128K may be used as the tokenizer for the data). During pre-training, the model is exposed to different tasks and domains, ensuring a comprehensive understanding of various contexts and scenarios. The pre-training phase is characterized by the use of optimization techniques, including the AdamW optimizer with a warmup-and-step-decay learning rate schedule. The AdamW optimizer is an extension of the Adam optimizer that incorporates weight decay regularization to improve generalization. The Adam optimizer itself is a stochastic gradient descent method that computes adaptive learning rates for each parameter. It combines the advantages of two other extensions of stochastic gradient descent: AdaGrad and RMSProp. Specifically, Adam maintains an exponentially decaying average of past gradients (momentum) and squared gradients (RMSProp), which helps in stabilizing the learning process. The learning rate schedule used in pre-training involves an initial warmup phase where the learning rate is increased linearly to a maximum value, followed by a step-decay phase where the learning rate is reduced in a controlled manner. This approach helps in achieving stable convergence and prevents the model from overshooting the optimal parameters. Gradient clipping is employed to prevent exploding gradients, which can destabilize the training process. This technique involves scaling down gradients that exceed a certain threshold, ensuring that the updates to the model parameters remain within a reasonable range.
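The three optimization ingredients named above (warmup-and-step-decay schedule, gradient clipping and the AdamW update) can be sketched for scalar parameters as follows. All hyper-parameter values and function names are hypothetical choices for illustration; the disclosure does not specify them. Clipping by the global L2 norm and the decoupled form of weight decay are assumed variants.

```python
import math

def learning_rate(step, max_lr, warmup_steps, decay_every, decay_factor):
    """Linear warmup to max_lr, then multiply by decay_factor every decay_every steps."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    completed_decays = (step - warmup_steps) // decay_every
    return max_lr * decay_factor ** completed_decays

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down when their joint L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def adamw_step(param, grad, m, v, t, lr,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad           # decaying average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad    # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```

A training loop would typically call `clip_by_global_norm` on the raw gradients, look up `learning_rate(step, ...)`, and then apply `adamw_step` to each parameter.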
[0248] The model is trained on a large-scale distributed infrastructure, utilizing pipeline parallelism, expert parallelism, and data parallelism to efficiently manage computational resources and reduce training time. Pipeline parallelism involves splitting the model into different stages and distributing these stages across multiple processors (e.g. GPUs), allowing for concurrent processing of different parts of the model. Expert parallelism leverages the mixture of experts architecture by distributing different experts across multiple processors, enabling efficient utilization of computational resources. Data parallelism involves splitting the training data into smaller batches and distributing these batches across multiple processors, allowing for simultaneous processing and faster training.
[0249] The training process may be configured with multi-token prediction (MTP) objectives, which extend the prediction scope to multiple future tokens at each position. This may densify the training signals and improve data efficiency, enabling the model to pre-plan its representations for better prediction of future tokens. The MTP modules are sequentially integrated into the training pipeline, maintaining the complete causal chain at each prediction depth. This may enhance the model's performance and facilitate speculative decoding during inference, which may accelerate the generation process.
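How a multi-token prediction objective densifies the training signal can be seen from the targets alone. The sketch below only builds the (prefix, future tokens) pairs and does not model the MTP modules themselves; the function name `mtp_targets` and the parameter `depth` are hypothetical.

```python
def mtp_targets(tokens, depth):
    """For each position, pair the prefix seen so far with the next `depth` tokens as targets."""
    targets = []
    for i in range(len(tokens) - depth):
        prefix = tokens[:i + 1]
        future = tokens[i + 1:i + 1 + depth]   # depth = 1 is ordinary next-token prediction
        targets.append((prefix, future))
    return targets

pairs = mtp_targets(["a", "b", "c", "d"], depth=2)
```

With `depth=2`, each position contributes two prediction targets instead of one, for example the prefix `["a"]` is paired with `["b", "c"]`.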
[0250] To further enhance the efficiency of the training process, the model training may comprise FP8 mixed precision training. Mixed precision training involves using both 16-bit and 32-bit floating-point numbers to represent model parameters and perform computations. FP8, or 8-bit floating-point, is an even more compact representation that allows for faster computations and reduced memory usage. By using FP8 for certain parts of the model, such as activations and gradients, the training process can be accelerated without compromising the model's performance. This approach may leverage hardware accelerators that support mixed precision operations, such as NVIDIA's Tensor Cores.
[0251] After pre-training, a generative data-driven model such as deepseek-v3 is obtained. Based on such a base generative data-driven model, a data-driven reasoning model may be obtained using reinforcement learning. For reinforcement learning the base generative data-driven model may be used as an initial policy model. The policy model may be trained using a Group Relative Policy Optimization (GRPO) algorithm, which optimizes the model by maximizing the expected reward. The training process involves sampling a group of outputs from the current policy model, evaluating them using reward models, computing the advantage for each output, and updating the policy model based on these advantages. GRPO estimates the baseline from group scores, simplifying the training process and reducing computational overhead. The reward models used may include both rule-based and model-based reward models. These reward models provide feedback on the model's outputs, guiding the optimization process. For each input question or task, GRPO samples a group of outputs from the current policy model. These outputs are generated based on the current policy and represent different possible responses to the given input. The sampled outputs are evaluated using the reward models. The reward models provide quality scores based on the quality and correctness of the outputs. For rule-based reward models, this involves deterministic validation (e.g., checking the correctness of a mathematical solution). For model-based reward models, this involves comparing the outputs to human-annotated preference data. GRPO computes the advantage for each output within the group. The advantage is calculated as the difference between the reward of the output and the mean reward of the group, normalized by the standard deviation of the group rewards. This normalization helps stabilize the training process and ensures that the policy model focuses on improving outputs that are significantly better than the average.
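The policy model is optimized by maximizing the objective function, which is based on the computed advantages. GRPO uses a clipping mechanism to ensure that the updates to the policy model are within a reasonable range, preventing large, destabilizing updates. The objective function for GRPO may be (or similar to) $J_{GRPO}(\pi) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\min\left(\frac{\pi(o_i \mid q)}{\pi_{old}(o_i \mid q)} A_i,\ \operatorname{clip}\left(\frac{\pi(o_i \mid q)}{\pi_{old}(o_i \mid q)},\ 1-\varepsilon,\ 1+\varepsilon\right) A_i\right)\right]$, where π and π_old denote the current and the previous policy model, o_i is the i-th of G outputs sampled for the input q, A_i is the advantage for the i-th output, and ε is a hyper-parameter for clipping. The advantage A_i may be calculated as:
[0252] $A_i = \frac{R_i - \mathrm{mean}(\{R_1, \dots, R_G\})}{\mathrm{std}(\{R_1, \dots, R_G\})}$, where R_i is the reward for the i-th output, and “mean” and “std” are the mean and standard deviation of the rewards within the group.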
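The group-relative advantage and the clipping mechanism described above can be sketched numerically. This is a minimal illustration under assumptions: the probability ratios between the current and the previous policy are passed in as plain numbers, the population standard deviation is used, and a small constant `eps` guards against division by zero. The names `group_advantages`, `clipped_objective` and `clip_eps` are hypothetical.

```python
import math

def group_advantages(rewards, eps=1e-8):
    """A_i = (R_i - mean(R)) / std(R), computed within one sampled group of outputs."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_objective(ratios, advantages, clip_eps=0.2):
    """Average of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) over the group."""
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)

rewards = [1.0, 0.0, 0.0, 1.0]           # e.g. rule-based rewards for four sampled outputs
advantages = group_advantages(rewards)   # positive for above-average outputs
objective = clipped_objective([1.0, 1.0, 1.0, 1.0], advantages)
```

A ratio of 1.5 with a positive advantage is capped at 1.2 by the clipping, so a single update cannot move the policy arbitrarily far towards one output.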
[0253] During reinforcement learning, the policy model is further trained using reward models to obtain the data-driven reasoning model (e.g. deepseek-r1-Zero or deepseek-r1). These reward models provide feedback on the policy model's outputs, guiding it towards generating more accurate and contextually appropriate responses. The reward models are configured to evaluate the quality of the model's outputs and provide feedback for optimization. Rule-Based Reward Models are configured to evaluate the correctness of outputs based on predefined rules. For example, in mathematical tasks, the policy model's output is compared against the correct answer using specific rules to determine if it is correct. In coding tasks, the policy model's output is compiled and tested against a set of test cases to determine its correctness. Model-Based Reward Models are trained on human preference data, which may comprise human-annotated examples of desired and undesired outputs. The reward models use this data to provide feedback on the policy model's outputs, guiding it towards generating more accurate and contextually appropriate responses. The reward models may be trained using supervised learning techniques, with the objective of minimizing the difference between the predicted quality scores and the actual scores provided by human annotators. The training data for a model-based reward model may be curated to cover a wide range of tasks and domains, ensuring that the reward models can provide accurate and reliable feedback. A reward model may comprise a transformer-based encoder such as BERT that processes the input and generates a representation of the output. This representation is then compared to the ground truth or desired output, and a quality score is generated based on the similarity between the two.
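A rule-based reward of the kind described above might look like the following minimal sketch. The function names, the rule of comparing the last number in the output with the reference answer, and the choice to return the fraction of passed test cases are assumptions for illustration. Executing generated code with `exec` is only acceptable here because the inputs are trusted toy strings; a real system would need sandboxing.

```python
import re


def math_reward(model_output: str, correct_answer: str) -> float:
    """Return 1.0 if the last number in the output equals the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(correct_answer) else 0.0


def code_reward(source: str, func_name: str, test_cases) -> float:
    """Return the fraction of test cases passed by the function defined in source.

    Assumption: test_cases is a list of (args_tuple, expected_result) pairs.
    """
    namespace = {}
    try:
        exec(source, namespace)  # trusted toy input only; sandbox in practice
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that does not compile or define the function gets no reward
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as failed
    return passed / len(test_cases)


tests = [((1, 2), 3), ((0, 0), 0)]
good = code_reward("def add(a, b):\n    return a + b", "add", tests)
bad = code_reward("def add(a, b):\n    return a - b", "add", tests)
```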
[0254] The training of the data-driven reasoning model may further comprise supervised fine-tuning for obtaining a fine-tuned data-driven reasoning model such as deepseek-r1. The optional supervised fine-tuning (SFT) may comprise fine-tuning the model on a curated dataset of about 1.5 million instances, encompassing domains such as math, code, writing, reasoning, and safety.
[0255] The data-driven reasoning model may have been trained using reinforcement learning and optionally supervised fine-tuning.
[0256] FIG. 13 illustrates an embodiment of a data-driven reasoning model.
[0257] The data-driven reasoning model may be suitable for selecting at least one tool from a plurality of tools according to a task instruction by generating an indication of a relation between the provided task instruction and the selected tool. The relation between the provided task instruction and the selected tool may show the reasoning by the data-driven reasoning model. For example, the relation between the provided task instruction and the selected operating engine may comprise one or more intermediate reasoning step(s) before providing the model output data. The reasoning step(s) may explain why the data-driven reasoning model concluded to select a specific operating engine. Thereby, human oversight and control are enabled. Based on the reasoning step(s) provided by the data-driven reasoning model, the model can be validated and / or corrected. Furthermore, the accuracy and the precision of the data generated by the data-driven reasoning model are improved. Thereby, the accuracy and / or the precision of selecting the best suited operating engine can be improved, resulting in more accurate and / or precise, i.e. meaningful, generation of chemical product data while hallucination can be reduced. Ultimately, this contributes to improving monitoring and / or controlling producing and / or processing a chemical product.
[0258] Data-driven reasoning models may have the same architecture as the data-driven models as described in FIGs. 8-9. The data-driven reasoning model may be obtained from a pretrained data-driven model such as a pretrained transformer, in particular by further training the pretrained data-driven model. The data-driven reasoning model may be triggered to reason about a task by instructing chain of thought prompting via the task instruction, i.e. a prompt, provided to the data-driven reasoning model. In an embodiment, the one or more data-driven reasoning models may be triggered to provide two or more indications of the selected tool. Said two or more indications of the selected tool may be ranked, i.e. the best indication of the selected tool may be selected. This ranking can be performed by a human user providing the ranking via a user interface or by providing the two or more indications to a ranking model or a quality model. The ranking model or the quality model may be suitable for providing a quality score or a ranking associated with the two or more indications. Based on said quality score or ranking, one of the two or more indications of the selected tool may be provided. In some embodiments, a reward model may be configured to provide said quality score or ranking. This reward model may be trained based on output data provided by the data-driven reasoning model, i.e. the model output data and target output data.
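The ranking of two or more indications by a quality score may be pictured with the following minimal sketch. The names `rank_indications`, `select_best_indication` and `toy_quality_model` are hypothetical, and the toy scorer, which simply prefers indications that state a reason, merely stands in for a trained ranking, quality or reward model.

```python
from typing import Callable, List, Sequence, Tuple


def rank_indications(
    indications: Sequence[str],
    quality_model: Callable[[str], float],
) -> List[Tuple[float, str]]:
    """Return (score, indication) pairs sorted from best to worst score."""
    scored = [(quality_model(text), text) for text in indications]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def select_best_indication(
    indications: Sequence[str],
    quality_model: Callable[[str], float],
) -> str:
    """Return the indication with the highest quality score."""
    return rank_indications(indications, quality_model)[0][1]


def toy_quality_model(text: str) -> float:
    """Hypothetical scorer: reward indications that explain the tool choice."""
    return 1.0 if "because" in text else 0.0


candidates = [
    "Tool: graph_extraction",
    "Tool: graph_extraction, because the instruction asks for data held in the graph database",
]
best = select_best_indication(candidates, toy_quality_model)
```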
[0259] Additionally or alternatively, the data-driven reasoning model may be trained to reason about the task associated with the task instruction to be completed via Reinforcement Learning, in particular Reinforcement Learning from Human Feedback, or supervised finetuning. In an embodiment, the data-driven reasoning model may be trained via a combination of Reinforcement Learning and supervised finetuning. Reinforcement Learning may comprise training the data-driven reasoning model based on training task instructions to generate target output data. During the reinforcement learning, the data-driven reasoning model may be adapted according to a reward score as provided by a reward model. The reward model may be trained based on target model output and training task instructions to rate output data generated by the data-driven reasoning model. Said reward model may be trained based on human feedback. Thereby, the reward model learns to reproduce the human feedback, which allows human-like feedback to be provided without requiring a user to read through thousands and thousands of conversations. Consequently, the quality of the data generated by the data-driven reasoning model is improved without relying on human feedback at every step in the training process. This comes with the benefit of allowing a highly scalable, yet accurate training.
[0260] Supervised finetuning may comprise training the data-driven reasoning model to reason over a task associated with the task instruction based on training task instructions and corresponding target output data. In an example, said target output data may be provided by a teacher model configured to follow task instructions. The teacher model may be a larger model, i.e. associated with a higher number of model parameters, than the data-driven reasoning model. Examples of data-driven reasoning models obtained via supervised finetuning based on target output data generated by a teacher model may include Llama models (8B) or Qwen models (0.5B to 32B). Thereby, the knowledge from the teacher model may be distilled to the data-driven reasoning model while allowing the data-driven reasoning model to be operated with fewer computational resources because of the lower number of parameters. This allows a large-scale application of the data-driven reasoning model in on-premise systems, which is important for data sovereignty.
[0261] FIG. 14 shows a schematic block diagram of an example apparatus 1414 or computation apparatus according to an example aspect, which may for instance represent the apparatus according to the second example aspect. Apparatus 1414 may for instance be configured to perform and / or control or comprise respective means (e.g. at least one of memory 1404, processor 1402, communication interface 1406, user interface 1408) for performing and / or controlling the method according to the third and / or fourth example aspect. Apparatus 1414 may as well constitute an apparatus comprising at least one processor (1402) and at least one memory (1404) storing instructions that, when executed by the at least one processor, cause an apparatus, e.g. apparatus 1414, at least to perform and / or control the method according to all example aspects. Processor 1402 may for instance execute program code stored in memory 1404, which may for instance represent a readable storage medium comprising program code that, when executed by processor 1402, causes the processor 1402 to perform the method according to an example aspect. Processor 1402 may for instance further control memory 1404 and / or further memories, the communication interface(s) 1406, and the optional user interface 1408, e.g. a graphical user interface 1408. Processor 1402 (and also any other processor mentioned in this specification) may be a processor of any suitable type. Memory 1404 may be included in processor 1402 or memory 1404 may be fixedly connected to processor 1402, or be at least partially removable from processor 1402, for instance in the form of a memory card or stick. Memory 1404 may for instance be non-volatile memory. It may for instance be a FLASH memory (or a part thereof), any of a ROM, PROM, EPROM and EEPROM memory (or a part thereof) or a hard disc (or a part thereof). Memory 1404 may also comprise an operating system for processor 1402. Memory 1404 may also comprise a firmware for apparatus 1414.
Memory 1404 may also for instance be a Random Access Memory (RAM) or Dynamic RAM (DRAM). It may for instance be used by processor 1402 when executing an operating system and / or computer program. Communication interface(s) 1406 may enable apparatus 1414 to communicate with other entities, e.g. another apparatus, such as a server providing historic user instruction embeddings or a server providing a generative data-driven model or a user device or computer e.g. used by an operator and providing a user interface. The communication interface(s) 1406 may for instance comprise a wireless interface and / or wire-bound interface, for instance to communicate with entities via an Intranet. User interface 1408 is optional and may comprise a display for displaying information to a user and / or an input device (e.g. a keyboard, keypad, touchpad, mouse, etc.) for receiving e.g. a query from an operator. Some or all of the components of the apparatus 1414 may for instance be connected via a bus. Some or all of the components of the apparatus 1414 may for instance be combined into one or more modules.
[0262] Example aspects, embodiments and examples of this disclosure provided above may allow an operator easy, time-efficient, and reliable access to relevant product data for producing a product.
[0263] The publication Prior Art Disclosure; Issue 684; paragraphs [1000] to [8005]; ISSN: 2198-4786; published: February 12, 2024 will be regarded as Reference RF1, which is incorporated herein by reference in its entirety. Preferably, the product is a product as described in Reference RF1; paragraphs [1000] to [8005]. Preferably, the method / process described herein is further a method / process for the production of a product.
[0264] The converting step to obtain the product preferably comprises one or more step(s) as described below and can be performed by conventional methods well known to a person skilled in the art. The converting step preferably comprises one or more step(s) selected from:
• recycling, preferably depolymerizing, gasifying, pyrolyzing, and / or steam cracking; and / or
• purifying, preferably crystallizing, (e.g. solvent) extracting, distilling, evaporating, hydrotreating, absorbing, adsorbing and / or subjecting to an ion exchanger; and / or
• assembling, preferably foaming, synthesizing, chemical conversion, chemically transforming, polymerizing and / or compounding; and / or
• forming, preferably foaming, extruding and / or molding; and / or
• finishing, preferably coating and / or smoothing.
[0265] In addition, the one or more step(s) are described in detail in Reference RF1; paragraphs [1000] to [8005].
[0266] The present disclosure has been described in conjunction with preferred embodiments and examples as well. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed subject-matter, from a study of the drawings, this disclosure and the claims. In particular, any steps presented can be performed in any order, i.e. the present disclosure is not limited to a specific order of these steps. Moreover, it is also not required that the different steps are performed at a certain place or at one node of a distributed system, i.e. each of the steps may be performed at different nodes using different equipment / data processing.
[0267] The sequence of all method steps presented above is not mandatory, also alternative sequences may be possible. Nevertheless, the specific sequence of method steps shown as examples in the figures shall be considered as one possible sequence of method steps, e.g. for the respective embodiment described by the respective figure or an embodiment comprising at least some of the steps described by the respective figure.
[0268] In the present specification, any presented connection in the described embodiments is to be understood in a way that the involved components are operationally coupled. Thus, the connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
[0269] As used herein, “determining” may also include “initiating or causing to determine”, “generating” may also include “initiating and / or causing to generate”, “providing” may also include “initiating or causing to determine, generate, select, send and / or transmit”, and “obtaining” may also include “initiating or causing to determine, generate, select, retrieve and / or receive”. “Initiating or causing to perform an action” may include any processing signal that triggers a computing node or device to perform the respective action.
[0270] The term “comprising” or “including” is to be understood in an open sense, i.e. in a way that an object that “comprises an element A” may also comprise further elements in addition to element A. Further, the term “comprising” or “including” may be limited to “consisting of”, i.e. consisting of only the specified elements.
[0271] The indefinite article “a” or “an” is not to be understood as “one”, i.e. use of the expression “an element” does not preclude that also further elements are present. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation or that further elements cannot be included.
[0272] The expressions “A and / or B” and “at least one of: A or B” are considered interchangeable and meant to comprise any one of the following three scenarios: (i) A, (ii) B, (iii) A and B. More generally, the expressions “at least one of the following: <a list of two or more elements>” and “at least one of <a list of two or more elements>” and similar wording, where the list of two or more elements is joined by “and” or “or”, mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements, so the wording includes any combinations of the elements.
[0273] Providing in the scope of this disclosure may include any interface configured to provide data. This may include an application programming interface, a human-machine interface such as a display and / or a software module interface. Providing may include communication of data or submission of data to the interface, in particular display to a user or use of the data by the receiving entity.
[0274] Obtaining in the scope of this disclosure may include any interface configured to obtain or receive data. This may include an application programming interface, a human-machine interface such as a display and / or a software module interface. Obtaining may include communication of data or submission of data from the interface, in particular use of the data by the receiving entity. Any obtaining of data, data structures, data sets, or the like may comprise receiving the data, data structures, data sets, or the like from a server providing (e.g. hosting) a data base comprising the data, data structures, data sets, or the like.
[0275] Various units, circuits, entities, nodes or other computing components may be described as “configured to” perform a task or tasks. Configured to shall recite structure meaning “having circuitry that” performs the task or tasks on operation. The units, circuits, entities, nodes or other computing components can be configured to perform the task even when the unit / circuit / component is not operating. The units, circuits, entities, nodes or other computing components that form the structure corresponding to “configured to” may include hardware circuits and / or memory storing program instructions executable to implement the operation. The units, circuits, entities, nodes or other computing components may be described as performing a task or tasks, for convenience in the description. Such descriptions shall be interpreted as including the phrase “configured to.” Any recitation of “configured to” is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation.
[0276] In general, the methods, apparatuses, systems, computer elements, nodes or other computing components described herein may include memory, software components and hardware components. The memory can include volatile memory such as static or dynamic random-access memory and / or nonvolatile memory such as optical or magnetic disk storage, flash memory, programmable read-only memories, etc. The hardware components may include any combination of combinatorial logic circuitry, clocked storage devices such as flops, registers, latches, etc., finite state machines, memory such as static random-access memory or embedded dynamic random-access memory, custom designed circuitry, programmable logic arrays, etc.
[0277] In the present specification, any presented connection in the described embodiments is to be understood in a way that the involved components are operationally coupled. Thus, the connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
[0278] Moreover, any of the methods, method steps, processes and actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor. References to a ‘computer-readable storage medium’ should be understood to encompass specialized circuits such as signal processing devices, and other devices.
[0279] A processor may be a processor of any suitable type, and is preferably a processor configured for parallel processing of at least a hundred or at least a thousand threads in parallel, e.g. a graphical processing unit (GPU). For instance, the processor comprises at least a hundred or at least a thousand parallel processing cores. In particular, the processor may comprise at least one (preferably at least a thousand) compute unified device architecture (CUDA) core(s), which may allow for using a graphical processing unit as the processor, which may increase computational efficiency. For instance, the processor may comprise at least one (e.g. at least a hundred) streaming multiprocessor cores, which may allow for increasing the data throughput. As a further example, the processor may comprise one or more (e.g. at least a hundred) tensor core(s) and / or (e.g. at least a hundred) tensor processing units (TPUs). A tensor core may be specifically adapted to perform matrix operations and may accelerate large matrix operations. A tensor core may be configured to perform mixed-precision matrix multiply and accumulate calculations in a single operation. For instance, a tensor core may perform mixed-precision floating-point matrix arithmetic, specifically utilizing FP16 (half-precision) inputs to produce either full-precision (FP32) or half-precision (FP16) outputs. In the case of FP16 output, a tensor core may provide a performance boost by storing the intermediate accumulation results in FP32 format, thereby maintaining the precision necessary for accurate results. A tensor processing unit may be an application-specific integrated circuit (ASIC). It may comprise a matrix multiplication unit (MXU), which may be specifically adapted or configured for dense linear algebra operations. TPUs may be configured to handle large-scale matrix operations efficiently, which may provide high computational throughput for AI tasks.
A TPU may be equipped with on-chip high-bandwidth memory (HBM), which may enhance the capability for the use of larger models and batch sizes. TPUs may be connected in groups called Pods, which may scale up workloads with minimal code changes. An MXU may be specifically configured for performing matrix multiplications. A TPU may comprise a tensor core.
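Why it matters that intermediate accumulation results are kept in FP32 can be illustrated with a small emulation in pure Python. This is not tensor core code: the `struct` format characters `"e"` and `"f"` are merely used to round values to half and single precision, and the helper names, vector length and input value 0.1 are assumptions chosen for the sketch.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_accumulator):
    """Dot product of FP16 inputs; the running sum is rounded after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_accumulator(acc + to_fp16(x) * to_fp16(y))
    return acc


n = 4096
a = [0.1] * n
b = [0.1] * n

# Reference: same FP16 inputs, accumulated in double precision.
reference = sum(to_fp16(x) * to_fp16(y) for x, y in zip(a, b))

err_fp16_acc = abs(dot(a, b, to_fp16) - reference)  # FP16 accumulator
err_fp32_acc = abs(dot(a, b, to_fp32) - reference)  # FP32 accumulator
```

With an FP16 accumulator, the running sum stops growing once each product is smaller than half the spacing between neighbouring representable values, so the error becomes large. The FP32 accumulator stays close to the reference for the same FP16 inputs.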
[0280] For example, a processor may comprise several thousand tensor cores, each capable of performing 64 floating point FMA (Fused Multiply-Add) operations per clock cycle, or (e.g. at least several hundred) tensor processing units (TPUs) being specifically configured for accelerating machine learning (ML) workloads, particularly for cloud-based applications. Additionally, Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) may provide flexibility and performance benefits for specific AI tasks. With these capabilities, such a GPU may allow for hundreds of TFLOPs (Tera Floating-Point Operations per Second) of performance in mixed-precision computations. Furthermore, a tensor core may support a variety of numerical formats, including IEEE standard half-precision, single-precision, and double-precision floating-point formats, as well as a range of integer formats.
[0281] A processor may be a central processing unit (CPU) configured with an advanced architecture, such as Intel’s Xeon Scalable processors or AMD’s EPYC series. A CPU may be configured for sequential processing and general-purpose computing. These CPUs may incorporate vector instruction sets, such as AVX-512, to accelerate mathematical computations that may e.g. enhance AI model training and inference. Furthermore, CPUs may integrate AI accelerators, i.e. a CPU may be specifically configured for deep learning workloads.
[0282] The processor may be coupled to memory having a memory bandwidth of at least a hundred gigabytes per second, which may allow efficient handling of extensive data sets and may allow faster reading, processing, and writing compared to a general-purpose processor such as a computational processing unit.
[0283] The memory may be a high-capacity memory configured to manage the data-intensive nature of AI applications, providing necessary bandwidth and storage capacity for complex datasets. The memory may for instance be DDR4, DDR5, High Bandwidth Memory (HBM) and / or GDDR6X memory, which may improve data transfer rates and reduce latency. Such memory may enhance e.g. modeling and real-time sensor data for monitoring and control. Further, the memory may be operated with memory optimization techniques, such as caching and prefetching, which may enhance the execution speed of AI algorithms. Non-volatile Memory (NVM) technologies, including NAND Flash and 3D XPoint, may provide persistent storage solutions with high-speed access, which may enhance rapid data storage and retrieval for AI applications.
[0284] Any disclosure and embodiments described herein relate to the methods, the systems, devices, the computer program element lined out above and vice versa. Advantageously, the benefits provided by any of the embodiments and examples equally apply to all other embodiments and examples and vice versa.
[0285] All terms and definitions used herein are understood broadly and have their general meaning if not indicated otherwise.
[0286] It will be understood that all presented embodiments are only examples, and that any feature presented for a particular example embodiment may be used with any aspect on its own or in combination with any feature presented for the same or another particular example embodiment and / or in combination with any other feature not mentioned. In particular, the example embodiments presented in this specification shall also be understood to be disclosed in all possible combinations with each other, as far as it is technically reasonable and the example embodiments are not alternatives with respect to each other. It will further be understood that any feature presented for an example embodiment in a particular category (method / apparatus / computer program / system) may also be used in a corresponding manner in an example embodiment of any other category. It should also be understood that presence of a feature in the presented example embodiments shall not necessarily mean that this feature forms an essential feature and cannot be omitted or substituted.
Claims
1. A method for determining at least one product data set, the method comprising: Obtaining a user instruction related to determining the at least one product data set for producing a product based on the product data set; Determining whether a historic user instruction similar to the user instruction has been processed in the past, based on a history data base providing at least one history data set, wherein the at least one history data set is associated with a historic user instruction; upon determining that the user instruction similar to the user instruction has been processed, determining the at least one product data set based on the at least one history data set, wherein determining the at least one product data set comprises: providing a task instruction for determining the product data set to a generative data-driven model, wherein the task instruction is based on the user instruction and the history data set, the generative data-driven model being configured to determine at least a part of the at least one product data set, in response to receiving the task instruction for determining the product data set; Providing the at least one product data set.
2. The method of claim 1, the method further comprising: determining whether the user instruction was successfully processed; and upon determining that the user instruction was successfully processed, storing a history data set associated with the user instruction as a historic user instruction in a history data base, wherein the history data set comprises the tool sequence and wherein the history data set is associated with the user instruction.
3. The method of claim 2, wherein upon determining that the user instruction was successfully processed, the method further comprises: embedding the user instruction; and storing the embedded user instruction in a vector data base, wherein the embedded user instruction is associated with the respective history data set.
4. The method of any one of claims 1 to 3, wherein determining the at least one product data set based on the at least one history data set comprises: Determining, based on the user instruction and the at least one history data set, a tool sequence comprising an indication of at least one tool, wherein the at least one tool comprises at least one graph extraction tool for retrieving at least a part of a basis of the at least one product data set from a graph database providing production data represented in a graph structure; Carrying out or causing to carry out the at least one tool comprising the at least one graph extraction tool according to the tool sequence; Receiving output data from the at least one tool; wherein the determining the at least one product data set based on the at least one history data set is further based on at least a part of the output data.
5. The method of any one of claims 1 to 4, wherein determining whether a historic user instruction similar to the user instruction has been processed in the past comprises: Obtaining at least one embedding of a historic user instruction;Embedding the user instruction into an embedding space comprising the at least one embedding of the historic user instruction;Determining a similarity score for the at least one embedding of a historic user instruction, wherein the similarity score is based on the distance between the at least one embedding of the historic user instruction and the embedded user instruction in the embedding space with respect to a similarity measure; determining that a historic user instruction similar to the user instruction has been processed in the past, when the similarity score is below or above a threshold similarity.
6. The method of claim 4, wherein determining the tool sequence comprises: retrieving a set of tool data sets, wherein each tool data set of the set of tool data sets is associated with a tool of the at least one tool; wherein the determining the tool sequence is further based on the set of tool data sets.
7. The method of claim 6, wherein determining the tool sequence further comprises at least the following steps: providing a task instruction for generating the tool sequence to a generative data-driven model, wherein the task instruction is generated based on the user instruction and the set of tool data sets, the generative data-driven model being configured to generate a tool output data set related to the tool sequence, in response to receiving the task instruction; providing the tool output data set as the tool sequence or in case the tool output data set is not of the same format of a tool sequence: parsing or causing to parse the tool output data set into the tool sequence and providing the tool sequence.
8. The method of claim 7, wherein parsing the tool output data set comprises: identifying data indicating the at least one tool, and, in case the at least one tool is more than one tool, determining a tool order in which the more than one tool is to be carried out; wherein the tool sequence comprises the data indicating the at least one tool and, in case the at least one tool is more than one tool, the tool order.
9. The method of claim 4 or any one of claims 5 to 8 as long as they are dependent on claim 4, wherein determining the tool sequence comprises: determining whether input data required for carrying out the at least one tool is provided by the user instruction; upon determining that the input data required for carrying out the at least one tool is provided by the user instruction: including the at least one tool in the tool sequence to be carried out based on the input data.
10. The method of claim 9, upon determining that the input data required for carrying out the at least one tool is not provided by the user instruction: determining the tool sequence, the determining comprising: retrieving a set of tool data sets, wherein each tool data set of the set of tool data sets is associated with a tool of the at least one tool; identifying, based on the set of tool data sets, a tool providing the input data used by the at least one tool; introducing said tool in the tool sequence before the at least one tool.
11. The method of any one of claims 1 to 10, wherein determining the at least one product data set comprises: providing a task instruction to a generative data-driven model, wherein the task instruction is based on the user instruction, the history data set and the at least a part of the output data, the generative data-driven model being configured to determine a product data set, in response to receiving the task instruction for determining the product data set.
12. The method of any one of claims 1 to 11, the method further comprising: operating and / or controlling a production environment based on the at least one product data set, in particular to produce the product.
13. An apparatus comprising respective means for carrying out or performing the steps of any one of claims 1 to 12 or comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to carry out the steps of the method according to any one of claims 1 to 12.
14. A system for operating a production environment comprising: an apparatus according to claim 13, a graph database providing production data represented in a graph structure, the graph database being communicatively coupled to the apparatus; a server providing the at least one generative data-driven model, the server being communicatively coupled to the apparatus; together performing or carrying out at least the steps of the method according to any one of claims 1 to 12, in particular the system further comprising a user device configured to receive the user instruction from an operator of the production environment and to display the at least one product data set.
15. Use of a product data set determined according to the methods of any one of claims 1 to 12, or by the apparatus of claim 13 for displaying the product data set to an operator of the production environment and / or for producing the product.
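For illustration only, the similarity check of claim 5 and the prerequisite-tool step of claims 9 and 10 may be pictured with the following minimal Python sketch. It does not limit the claims. Cosine similarity, the threshold of 0.9, the two-dimensional toy embeddings, the dictionary layout of the tool data sets and all tool names are assumptions. With cosine similarity a score above the threshold indicates similarity, whereas a distance measure would use a score below it. The prerequisite search looks only one level deep.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def find_similar_history(query_embedding, history, threshold=0.9):
    """Return the most similar history data set, or None if below the threshold."""
    best_score, best_entry = -1.0, None
    for entry in history:
        score = cosine_similarity(query_embedding, entry["embedding"])
        if score > best_score:
            best_score, best_entry = score, entry
    return best_entry if best_score >= threshold else None


def complete_tool_sequence(requested_tool, provided_inputs, tool_data_sets):
    """Place tools that provide missing input data before the requested tool."""
    sequence = []
    for needed in tool_data_sets[requested_tool]["inputs"]:
        if needed in provided_inputs:
            continue  # the user instruction already provides this input
        for name, spec in tool_data_sets.items():
            if needed in spec["outputs"] and name not in sequence:
                sequence.append(name)
                break
    sequence.append(requested_tool)
    return sequence


# Hypothetical history data base and tool data sets.
history = [
    {"embedding": [1.0, 0.0], "tool_sequence": ["graph_extract_bom"]},
    {"embedding": [0.0, 1.0], "tool_sequence": ["lookup_material_id"]},
]
tools = {
    "lookup_material_id": {"inputs": ["product_name"], "outputs": ["material_id"]},
    "graph_extract_bom": {"inputs": ["material_id"], "outputs": ["bom"]},
}

match = find_similar_history([0.9, 0.1], history)
no_match = find_similar_history([0.7, 0.7], history)
sequence = complete_tool_sequence("graph_extract_bom", {"product_name"}, tools)
```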