Method, apparatus, and electronic device for constructing user profiles
By identifying and standardizing multi-source data in process industries, and using a profile support library and vector library to build user profiles, the problem of the inability to dynamically integrate multi-source input data in existing technologies has been solved, thus achieving accurate user profile construction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SUPCON TECH CO LTD
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for building user profiles in process industries rely on manual methods or prompts, and cannot dynamically integrate multi-source input data, resulting in inaccurate user profiles.
By acquiring raw data from the process industry, identifying initial profile elements, extracting and standardizing elements that meet specific rules, and using a profile support library and vector library for data fusion, a target user profile is constructed.
It enables the accurate construction of unified user profiles in process industries, dynamically integrates multi-source input data, reduces reliance on manual intervention and prompts, and improves the accuracy and consistency of user profiles.
[Figure: abstract drawing, CN122241183A_ABST]
Description
Technical Field
[0001] This application relates to the process industry sector, and more specifically, to a method, apparatus, and electronic device for constructing user profiles.
Background Technology
[0002] In the process industry, enterprises typically deploy multiple heterogeneous industrial control systems. The configuration data generated by these systems differs significantly in structure, semantics, and representation. The products involved in the process industry are complex, potentially requiring different product models to be configured for the same physical device to meet varying logical execution and application requirements. The inherent characteristics of these products dictate the diverse input content for the user profile, resulting in varied formats. To synthesize this complex data into a unified user profile, current technologies often employ manually defined standardized data structures, pre-setting all entity types, entity attribute fields, and relationship types for the user profile. This method not only requires significant manpower for rule writing and validation but also necessitates redesigning the data model and modifying the mapping logic upon introducing new business or equipment types, leading to poor system scalability and high maintenance costs.
[0003] Related technologies propose improved methods based on the above approaches, employing natural language processing techniques to extract triples from unstructured documents (such as operating procedures and fault reports) using prompt words to construct knowledge graphs. However, these methods are limited by the insufficient generalization ability of a single model to domain terminology. They rely on prompt words for profile construction, which require frequent changes and cannot be reused across models. This makes it difficult to achieve semantically consistent, continuously evolving, and traceable dynamic fusion of configuration data from heterogeneous systems, multi-format documents, and multi-round operation inputs. Consequently, the generated user profiles suffer from entity omissions, relation conflicts, semantic drift, and evolutionary stagnation. Both methods fail to dynamically fuse multi-source inputs, resulting in inaccurate user profiles.
[0004] There is currently no effective solution to the above problems.
Summary of the Invention
[0005] This application provides a method, apparatus, and electronic device for constructing user profiles, which at least solves the technical problem that when constructing unified user profiles in the industrial field, the profile construction relies on manual or prompt words, and cannot dynamically integrate multi-source input data, resulting in inaccurate output user profiles.
[0006] According to one aspect of the embodiments of this application, a method for constructing a user profile is provided, comprising: acquiring raw data generated in a process industry and performing data identification on the raw data to obtain initial profile elements, wherein the initial profile elements include: entities, entity relationships, and entity attributes; extracting a first type of profile elements from the initial profile elements, wherein the first type of profile elements are profile elements in the initial profile elements that satisfy a first type of rule, the first type of rule being used at least to indicate the type range corresponding to each type of profile element; standardizing the first type of profile elements according to a second type of rule to obtain target profile elements, wherein the second type of rule is used at least to provide semantic specification templates corresponding to each type of profile element; and determining a target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
[0007] Optionally, data identification is performed on the raw data to obtain initial profile elements, including: classifying the raw data according to data type to obtain data of multiple data types, wherein the multiple data types include at least: text and tables, wherein the raw data includes user data files, user dialogue data and operation data generated in the process industry; performing data identification on the data of multiple data types to obtain profile elements corresponding to the data of each data type, and determining the set of profile elements corresponding to the data of each data type as the initial profile elements.
[0008] Optionally, the method further includes: when the data type is text, performing data recognition on the text data to obtain the profile elements corresponding to the text data in the following way: obtaining a profile support library, wherein the profile support library is used to store entities, entity relationships, and entity attributes contained in existing user profiles in industrial domain knowledge bases or process industries, wherein the industrial domain knowledge base is a database containing multiple preset entity types, preset relationship types, and preset attribute types; under the constraints of the profile support library, performing data recognition on the text data through a text semantic recognition model to obtain the profile elements corresponding to the text data.
[0009] Optionally, when the data type is a table, the table-type data is identified to obtain the corresponding profile elements by the following method: the table header content in the table-type data is identified as the first type of table data, and the table body content in the table-type data is identified as the second type of table data; under the constraints of the profile support library, the first type of table data is identified by a table semantic recognition model to obtain the first table profile element, wherein the first table profile element includes the table entity type and the table entity relationship type; the second type of table data is identified by a table semantic recognition model to obtain the second table profile element, wherein the second table profile element includes the entities corresponding to each table entity type in the first table profile element and the entity attributes of the entities; based on the table entity relationship type in the first table profile element, the entity relationship of each entity in the second table profile element is determined, and the second table profile element after supplementing the entity relationships of each entity is determined as the profile element corresponding to the table-type data.
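The header/body split described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the lookup tables `HEADER_TO_ENTITY_TYPE` and `HEADER_RELATIONS` are hypothetical stand-ins for the output of the table semantic recognition model, and attaching the remaining columns to the leftmost entity column as attributes is an assumption made only for this example.

```python
# Hypothetical stand-in for header recognition: header cell -> table entity type.
HEADER_TO_ENTITY_TYPE = {"Equipment": "equipment", "Tag No.": "tag number"}
# Hypothetical table entity relationship types between header-level entity types.
HEADER_RELATIONS = {("equipment", "tag number"): "contain"}


def parse_table(rows):
    """Split a table into header and body, then derive entities and relations."""
    header, body = rows[0], rows[1:]
    # First type of table data: header cells give the entity type of each column.
    col_types = {i: HEADER_TO_ENTITY_TYPE[h]
                 for i, h in enumerate(header) if h in HEADER_TO_ENTITY_TYPE}
    attr_cols = [i for i in range(len(header)) if i not in col_types]
    entities, relations = [], []
    for row in body:
        # Second type of table data: each body row instantiates one entity per typed column.
        by_type = {}
        for i in sorted(col_types):
            by_type[col_types[i]] = {"type": col_types[i], "name": row[i], "attributes": {}}
        if not by_type:
            continue
        # Assumption: untyped columns are attributes of the leftmost entity column.
        primary = by_type[col_types[min(col_types)]]
        for i in attr_cols:
            primary["attributes"][header[i]] = row[i]
        entities.extend(by_type.values())
        # Supplement entity relationships using the header-level relationship types.
        for (src_type, dst_type), rel in HEADER_RELATIONS.items():
            if src_type in by_type and dst_type in by_type:
                relations.append((by_type[src_type]["name"], rel, by_type[dst_type]["name"]))
    return entities, relations


table = [["Equipment", "Tag No.", "Location"],
         ["Pump P-101", "FT-1001", "Unit A"]]
entities, relations = parse_table(table)
```

In this sketch the relationship type is decided once from the header and then applied to the entities of every body row, which mirrors the two-stage identification in the paragraph above.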
[0010] Optionally, extracting a first type of profile element from the initial profile elements includes: obtaining a first type of rule stored in a preset rule base; determining a target entity type based on the entity type range limitation rule in the first type of rule, wherein the target entity type is the type corresponding to the entity used to construct the user profile as defined in the entity type range limitation rule; determining a target entity relationship type based on the entity relationship type range limitation rule in the first type of rule, wherein the target entity relationship type is the type corresponding to the entity relationship used to construct the user profile as defined in the entity relationship type range limitation rule; extracting entities of type target entity type from the initial profile elements to obtain target entities, and extracting entity attributes corresponding to the target entities; extracting entity relationships of type target entity relationship type from the initial profile elements to obtain target entity relationships; and determining the set of all target entities, target entity relationships corresponding to target entities, and entity attributes as the first type of profile element.
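The type-range filtering above can be sketched as follows. This is a minimal illustration under stated assumptions: the rule contents in `FIRST_TYPE_RULES`, the dictionary layout of the profile elements, and the requirement that both endpoints of a kept relationship are target entities are all hypothetical choices for this example, not details taken from the application.

```python
# Hypothetical first type of rule: type ranges allowed in the user profile.
FIRST_TYPE_RULES = {
    "entity_types": {"equipment", "tag number", "loop"},
    "relationship_types": {"contain", "maintain"},
}


def extract_first_type(initial, rules):
    """Keep only entities and relationships whose types fall inside the rule ranges."""
    # Target entities carry their entity attributes with them.
    entities = [e for e in initial["entities"] if e["type"] in rules["entity_types"]]
    names = {e["name"] for e in entities}
    # Assumption: a relationship is kept only if both endpoints are target entities.
    relations = [r for r in initial["relations"]
                 if r["type"] in rules["relationship_types"]
                 and r["source"] in names and r["target"] in names]
    return {"entities": entities, "relations": relations}


initial = {
    "entities": [
        {"name": "P-101", "type": "equipment", "attributes": {"location": "Unit A"}},
        {"name": "FT-1001", "type": "tag number", "attributes": {}},
        {"name": "ACME", "type": "manufacturer", "attributes": {}},
    ],
    "relations": [
        {"source": "P-101", "type": "contain", "target": "FT-1001"},
        {"source": "ACME", "type": "manufacturing", "target": "P-101"},
        {"source": "ACME", "type": "maintain", "target": "P-101"},
    ],
}
first_type = extract_first_type(initial, FIRST_TYPE_RULES)
```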
[0011] Optionally, the first type of profile elements are standardized according to the second type of rules to obtain the target profile element, including: obtaining the second type of rules stored in the preset rule base; standardizing the first type of profile elements through semantic specification and attribute annotation to obtain the target profile element; performing semantic specification on the entities and entity relationships in the first type of profile elements according to the semantic specification rules in the second type of rules to obtain the third type of profile element, wherein the semantic specification rules include semantic specification templates corresponding to each type of profile element, and the semantic specification templates include at least synonym specification templates; performing attribute annotation on the entity attributes in the third type of profile elements according to the attribute annotation rules in the second type of rules to obtain the target profile element, wherein the attribute annotation rules are used to limit the entity attribute names that need to be annotated as key attributes.
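The two standardization stages above (semantic specification, then attribute annotation) might look like the following sketch. The rule contents in `SECOND_TYPE_RULES`, the synonym pairs, and the `{"value", "key"}` annotation format are hypothetical and chosen only to illustrate the flow.

```python
# Hypothetical second type of rule: a synonym specification template plus
# the attribute names that must be annotated as key attributes.
SECOND_TYPE_RULES = {
    "synonyms": {"device": "equipment", "contains": "contain"},
    "key_attributes": {"location", "set value"},
}


def standardize(elements, rules):
    """Apply synonym normalization, then mark key attributes."""
    synonyms = rules["synonyms"]
    entities = []
    for entity in elements["entities"]:
        # Attribute annotation: flag attributes whose names the rule lists as key.
        attributes = {name: {"value": value, "key": name in rules["key_attributes"]}
                      for name, value in entity["attributes"].items()}
        # Semantic specification: map synonymous type names to one canonical name.
        entities.append({**entity,
                         "type": synonyms.get(entity["type"], entity["type"]),
                         "attributes": attributes})
    relations = [{**r, "type": synonyms.get(r["type"], r["type"])}
                 for r in elements["relations"]]
    return {"entities": entities, "relations": relations}


first_type = {
    "entities": [{"name": "P-101", "type": "device",
                  "attributes": {"location": "Unit A", "description": "feed pump"}}],
    "relations": [{"source": "P-101", "type": "contains", "target": "FT-1001"}],
}
target = standardize(first_type, SECOND_TYPE_RULES)
```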
[0012] Optionally, the target user profile is determined based on the entities, entity relationships, and entity attributes in the target profile elements, including: obtaining a vector library, wherein the vector library is used to store the vector values corresponding to the entities, entity relationships, and entity attributes contained in the existing user profiles in the process industry; determining each entity in the target profile elements as a new entity; for each new entity, if there are overlapping entities, merging the entity attributes and entity relationships corresponding to the new entity with the entity attributes and entity relationships of the overlapping entities in the existing user profiles to obtain the merged existing user profiles; updating each new entity without overlapping entities as a new entity in the merged existing user profiles, and updating the entity relationships and entity attributes of the new entities according to the target profile elements to obtain the target user profile.
[0013] Optionally, the method further includes determining whether any newly added entity has an overlapping entity by: vectorizing a first entity to obtain the target vector value corresponding to the first entity, wherein the first entity is any newly added entity in the target profile elements; calculating the similarity between the target vector value and the vector value corresponding to each entity in the vector library to obtain multiple similarities; determining the largest of the multiple similarities as the target similarity, and determining the entity whose vector value in the vector library was used to calculate the target similarity as the similar entity corresponding to the first entity; if the first entity and the similar entity satisfy a third type of rule, determining the similar entity as the overlapping entity of the first entity, wherein the third type of rule contains multiple sub-rules, each corresponding to a criterion for judging, in one dimension, whether the first entity and the similar entity overlap; if the first entity and the similar entity do not satisfy the third type of rule, sending a merge request message to a target user, wherein the merge request message carries a request to decide whether to merge the first entity and the similar entity; receiving the merge confirmation message corresponding to the merge request message; if the merge confirmation message indicates that the first entity and the similar entity should be merged, determining the similar entity as the overlapping entity of the first entity; and if the merge confirmation message indicates that the first entity and the similar entity should not be merged, determining that the first entity has no overlapping entity.
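The overlap check above can be sketched as follows. Several points are assumptions made for illustration: the application does not name the similarity measure, so cosine similarity is used here; entity vectors are taken as precomputed; the two sub-rules (a 0.9 similarity threshold and matching entity types) and the `ask_user` callback that stands in for the merge request and confirmation messages are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_overlap(first_entity, vector_library, sub_rules, ask_user):
    """Return the overlapping entity for first_entity, or None if there is none."""
    if not vector_library:
        return None
    scored = [(cosine(first_entity["vector"], e["vector"]), e) for e in vector_library]
    # The entity with the largest similarity is the similar entity.
    target_similarity, similar = max(scored, key=lambda pair: pair[0])
    # Third type of rule: every sub-rule (one per dimension) must hold.
    if all(rule(first_entity, similar, target_similarity) for rule in sub_rules):
        return similar
    # Otherwise defer to the target user (merge request / merge confirmation).
    return similar if ask_user(first_entity, similar) else None


# Hypothetical sub-rules, one per judging dimension.
SUB_RULES = [
    lambda new, old, sim: sim >= 0.9,
    lambda new, old, sim: new["type"] == old["type"],
]

library = [
    {"name": "P-101", "type": "equipment", "vector": [1.0, 0.0]},
    {"name": "FT-1001", "type": "tag number", "vector": [0.0, 1.0]},
]
pump = {"name": "Pump P-101", "type": "equipment", "vector": [0.9, 0.1]}
loop = {"name": "Loop 101", "type": "loop", "vector": [0.9, 0.1]}
```

Calling `find_overlap(pump, library, SUB_RULES, ask_user)` resolves automatically because both sub-rules hold, while `loop` fails the type sub-rule and is decided by whatever `ask_user` returns.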
[0014] Optionally, the method further includes: updating the profile support library and the vector library based on the target profile elements to obtain the updated profile support library and vector library, wherein the profile support library is used to store the entities, entity relationships and entity attributes contained in the existing user profiles in the process industry, and the vector library is used to store the vector values corresponding to the entities, entity relationships and entity attributes contained in the existing user profiles in the process industry.
[0015] According to another aspect of the embodiments of this application, a user profile construction apparatus is also provided, comprising: an acquisition module, configured to acquire raw data generated in a process industry and perform data recognition on the raw data to obtain initial profile elements, wherein the initial profile elements include: entities, entity relationships, and entity attributes; an extraction module, configured to extract a first type of profile elements from the initial profile elements, wherein the first type of profile elements are profile elements in the initial profile elements that satisfy a first type of rule, and the first type of rule is at least used to indicate the type range corresponding to each type of profile element; a processing module, configured to perform standardization processing on the first type of profile elements according to a second type of rule to obtain target profile elements, wherein the second type of rule is at least used to provide semantic specification templates corresponding to each type of profile element; and a determination module, configured to determine a target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
[0016] According to another aspect of the embodiments of this application, a non-volatile storage medium is also provided, wherein a program is stored in the non-volatile storage medium, and the program controls the device where the non-volatile storage medium is located to execute the above-mentioned user profile construction method when it runs.
[0017] According to another aspect of the embodiments of this application, an electronic device is also provided, including: a memory and a processor, wherein the processor is used to run a program stored in the memory, wherein the program executes the above-described method for constructing a user profile when it runs.
[0018] According to another aspect of the embodiments of this application, a computer program product is also provided, including computer instructions, which, when executed by a processor, implement the above-described method for constructing a user profile.
[0019] In this embodiment, raw data generated in the process industry is acquired and data identification is performed on the raw data to obtain initial profile elements, which include entities, entity relationships, and entity attributes. A first type of profile elements is extracted from the initial profile elements; these are the profile elements that satisfy a first type of rule, which at least indicates the type range corresponding to each type of profile element. The first type of profile elements are then standardized according to a second type of rule, which at least provides semantic specification templates corresponding to each type of profile element, to obtain target profile elements. Finally, the target user profile is determined based on the entities, entity relationships, and entity attributes in the target profile elements. Because the initial profile elements are filtered and standardized using the first and second types of rules, multi-source input data is dynamically integrated into the target profile elements without relying on manual work or prompts, and the method is generally applicable. This achieves the goal of accurately constructing user profiles and solves the technical problem that, when unified user profiles are constructed in the industrial field, profile construction relies on manual work or prompts and cannot dynamically integrate multi-source input data, resulting in inaccurate user profiles.
Attached Figure Description
[0020] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0021] Figure 1 is a hardware structure block diagram of a computer terminal for implementing a user profile construction method according to an embodiment of this application;
[0022] Figure 2 is a flowchart of a user profile construction method provided according to an embodiment of this application;
[0023] Figure 3 is an example diagram of entities and entity relationships in a user profile provided according to an embodiment of this application;
[0024] Figure 4 is a general block diagram of logic and data provided according to an embodiment of this application;
[0025] Figure 5 is a schematic diagram of information extraction according to an embodiment of this application;
[0026] Figure 6 is a diagram illustrating the logical steps of table parsing according to an embodiment of this application;
[0027] Figure 7 is a flowchart illustrating the steps of updating a user profile according to an embodiment of this application;
[0028] Figure 8 is a schematic diagram of a user profile construction device provided according to an embodiment of this application.
Detailed Implementation
[0029] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.
[0030] The information collected in this application embodiment is information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of the relevant data all comply with the relevant laws, regulations and standards of the relevant regions, and necessary confidentiality measures have been taken. It does not violate public order and good morals, and provides corresponding operation entry points for users to choose to authorize or reject the automated decision results. If the user chooses to reject, the process will proceed to the expert decision-making process.
[0031] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0032] To better understand the embodiments of this application, the technical terms involved in the embodiments of this application are explained below:
[0033] Industrial time series large model: A large-scale artificial intelligence (AI) model designed specifically for massive time series data in the industrial field. Its core value lies in overcoming the "one model per task" limitation of traditional models, enabling knowledge transfer and few-shot learning, reducing dependence on scarce labeled data, improving generalization ability and development efficiency, and serving as core infrastructure for the intelligent transformation of industry.
[0034] Knowledge graph: A structured semantic knowledge base used to describe concepts and their relationships in the physical world.
[0035] Engineering configuration: Information data corresponding to the actual device generated after users perform simple UI operations and configure relevant settings. Different products have different information data and data structures.
[0036] In related technologies, natural language processing techniques are used to extract triples from unstructured documents (such as operating procedures and fault reports) using prompt words to construct knowledge graphs. However, this method is limited by the insufficient generalization ability of a single model to domain terminology. It relies on prompt words for user profile construction, but these prompt words need frequent changes and cannot be reused across models. This makes it difficult to achieve semantically consistent, continuously evolving, and traceable dynamic fusion of configuration data from heterogeneous systems, multi-format documents, and multi-round operation inputs, resulting in user profiles with entity omissions, relationship conflicts, semantic drift, and evolutionary stagnation. Therefore, when constructing unified user profiles in the industrial field, there is a technical problem: relying on manual or prompt word construction, and failing to dynamically fuse multi-source input data, leads to inaccurate output user profiles. To solve this problem, this application provides a related solution, which is described in detail below.
[0037] According to an embodiment of this application, a method embodiment for constructing a user profile is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one shown here.
[0038] The method embodiments provided in this application can be executed on a computer terminal or similar computing device. Figure 1 shows a hardware block diagram of a computer terminal for implementing the user profile construction method. As shown in Figure 1, the computer terminal 10 may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure; a processor 102 may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, it may also include a display, an input/output (I/O) interface, a universal serial bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power supply, and/or a camera. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, the computer terminal 10 may include more or fewer components than shown in Figure 1, or have a configuration different from that shown in Figure 1.
[0039] It should be noted that the aforementioned one or more processors 102 and / or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be integrated, in whole or in part, into any other element within the computer terminal 10. As involved in the embodiments of this application, the data processing circuits serve as processor control (e.g., selection of a variable resistor termination path connected to an interface).
[0040] The memory 104 can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the user profile construction method in this embodiment. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, thereby realizing the aforementioned user profile construction method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0041] The transmission device 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module, used for wireless communication with the Internet.
[0042] The display can be, for example, a touchscreen liquid crystal display (LCD) that allows the user to interact with the user interface of the computer terminal 10.
[0043] Under the above operating environment, embodiments of this application provide a method for constructing a user profile. As shown in Figure 2, which is a flowchart of a user profile construction method according to an embodiment of this application, the method includes:
[0044] Step S202: Obtain raw data generated in the process industry and perform data recognition on the raw data to obtain initial profile elements.
[0045] In the technical solution provided in step S202, the initial profile elements include entities, entity relationships, and entity attributes. Data identification can be performed on the raw data in several ways to obtain the initial profile elements. For example, the raw data, which includes user data files, user dialogue data, and operation data generated in the process industry, can be classified according to data type to obtain data of multiple data types, including at least text and tables. Data identification is then performed on the data of each data type to obtain the profile elements corresponding to that data type, and the set of profile elements corresponding to all of the data types is determined as the initial profile elements.
[0046] In some embodiments of this application, the initial profile element consists only of entities, with attributes (i.e., entity attributes) and relationships (entity relationships) within each entity. In the process industry, the type (or category) of an entity can include, but is not limited to: manufacturer, producer, document, media, operator, tag number, equipment, and loop. Relationships between entities include: common, peer, contain, maintain, source, and manufacturing. Each entity has its own attributes, including location, description, unit, set value, etc. Multiple relationships are allowed between two entities without limitation, and entities can have multiple types and multiple attributes.
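One possible in-memory shape for such an entity is sketched below. The class name `Entity`, its field names, and the `relate` helper are hypothetical; the sketch only illustrates that attributes and relationships live inside the entity, that an entity may have several types, and that two entities may be linked by more than one relationship.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    types: set = field(default_factory=set)          # an entity may have several types
    attributes: dict = field(default_factory=dict)   # e.g. location, unit, set value
    relations: list = field(default_factory=list)    # (relationship type, target name)

    def relate(self, relation_type, target):
        """Record a relationship from this entity to another entity."""
        self.relations.append((relation_type, target.name))


pump = Entity("P-101", {"equipment"}, {"location": "Unit A", "description": "feed pump"})
tag = Entity("FT-1001", {"tag number"}, {"unit": "m3/h"})
# Several relationships between the same pair of entities are allowed.
pump.relate("contain", tag)
pump.relate("peer", tag)
```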
[0047] In process industry environments, raw data originates from various heterogeneous sources, including but not limited to: user data files (e.g., engineering documents uploaded by users), user dialogue data (e.g., records of operators' natural-language dialogue, such as inspection reports and fault descriptions), and operation data (e.g., operation logs generated by control systems, such as alarm records and parameter change events). The raw data contains various data types, such as text, tables, images, and other types (e.g., audio, video).
[0048] After the raw data generated in the process industry is acquired, its various inputs (user data files in various formats, user dialogue data, and operation data) are first analyzed. The raw data is then classified by data type into data of multiple data types: text data (unstructured text expressed in natural language, such as user-uploaded equipment maintenance reports, fault description documents, and process specification manuals), tabular data (semi-structured data organized in rows and columns, such as equipment parameter configuration tables, loop point allocation tables, and instrument calibration record tables), image data (non-text information stored in image format, such as scanned flowcharts, equipment nameplate photos, infrared thermal images, and on-site inspection photos), and other data (such as audio data). The data of each data type is then identified using the data recognition method corresponding to that type to obtain the corresponding profile elements, as detailed below.
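The classification step can be sketched as follows. The application does not state how the data type is decided, so this example assumes a simple heuristic: dialogue records are treated as text and files are classified by extension. The mapping `EXTENSION_TO_TYPE` and the item fields `source` and `name` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to data type.
EXTENSION_TO_TYPE = {
    ".txt": "text", ".md": "text",
    ".csv": "table", ".xlsx": "table",
    ".png": "image", ".jpg": "image",
}


def classify(raw_items):
    """Group raw data items by data type (text, table, image, other)."""
    groups = defaultdict(list)
    for item in raw_items:
        if item["source"] == "dialogue":
            # Assumption: dialogue records are natural-language text.
            groups["text"].append(item)
        else:
            extension = PurePath(item["name"]).suffix.lower()
            groups[EXTENSION_TO_TYPE.get(extension, "other")].append(item)
    return dict(groups)


raw = [
    {"source": "file", "name": "maintenance_report.txt"},
    {"source": "file", "name": "loop_points.XLSX"},
    {"source": "dialogue", "name": "shift-handover"},
    {"source": "operation", "name": "alarms.csv"},
    {"source": "file", "name": "inspection.wav"},
]
by_type = classify(raw)
```

Each group can then be passed to the recognition method for its data type.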
[0049] When the data type is text, the text data is identified to obtain the corresponding profile elements: A profile support library is obtained. This library stores entities, entity relationships, and entity attributes contained in existing user profiles within an industrial domain knowledge base or process industry. The industrial domain knowledge base is a database containing multiple preset entity types, relationship types, and attribute types. Its content originates from publicly available process industry standards and specifications, including process industry-related entity types, relationship types, and entity attribute types. This knowledge base is pre-built and continuously maintained by industry experts and standard document specialists based on process industry standards and specifications. For example, the profile support library stores existing entity type sets (e.g., equipment, tag number, loop, medium, operator, document, etc.); existing entity attribute sets (e.g., location, unit, setpoint, description, source, etc.); and existing entity relationship type sets (e.g., point, containment, maintenance, source, manufacturing, etc.). When initially acquiring raw data, i.e., before any user profiles have been built, the profile support library only contains the industrial domain knowledge base. Under the constraints of the industrial domain knowledge base, a text semantic recognition model is used to identify the text data to obtain the corresponding profile elements.
[0050] The next step, under the constraints of the profile support library, is to use a text semantic recognition model to identify text-type data and obtain the profile elements corresponding to the text-type data:
[0051] Before text-type data enters the recognition process, the current version of the profile support library (containing entities, entity relationships, and entity attributes already included in user profiles within the process industry) is first loaded from local storage or a distributed cache. This support library is a structured semantic knowledge base. The constraints of the profile support library ensure that the text-type data recognition by the text semantic recognition model is bounded (entity category is defined, attribute is limited, and relationship category is defined), rather than arbitrary. Under the constraints of the profile support library (i.e., within the semantic scope defined by the profile support library), the text semantic recognition model can only match and extract profile elements from the predefined entity and relationship databases of the industry domain. This effectively filters out redundant or conflicting information that deviates from the semantic system of the process industry, ensuring that the generated initial profile elements strictly conform to industrial reality semantically. This improves the accuracy and consistency of text data parsing, thereby providing a reliable and standardized input foundation for subsequent standardized processing and target user profile construction, and avoiding the identification of unnecessary and useless entities in the process industry domain.
[0052] When the text semantic recognition model performs data recognition on text-type data, the text is first segmented into sentences. Long texts are divided into independent semantic units (each unit ≤ 50 words) at punctuation marks (periods, semicolons, line breaks) to avoid semantic drift caused by long-distance dependencies. The data is then normalized by removing filler words and modal particles that carry no meaning (e.g., "probably", "ne") and by unifying unit formats. The normalized text is input into the text semantic recognition model for data recognition (identifying entities, entity relationships, and entity attributes). The text semantic recognition model is a Transformer-architecture language model trained on a large amount of process industry data, for example a joint sequence labeling and relation extraction model fine-tuned from BERT-base. Those skilled in the art will appreciate that other pre-trained language models with sequence modeling capabilities can be used as equivalent replacements, as long as they can complete the above joint extraction task under the constraints of the profile support library and achieve end-to-end recognition of entities, attributes, and relationships in the process industry. It is important to note that the text semantic recognition model identifies each word in the text data only within the entity types, relationship types, and entity attribute types allowed by the profile support library. Each word is mapped to the entity types and relationship types allowed by the profile support library to obtain multiple instantiated entities. Finally, the profile elements corresponding to the text data are output (i.e., the entities, entity relationships, and entity attributes obtained by processing the text data).
For example, the entity of the equipment type is pump1, whose entity attributes are location: location 1, manufacturer: manufacturer A, and the entity relationship is that entity pump1 and manufacturer A are in a source relationship, that is, entity pump1 is produced by entity manufacturer A.
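The segmentation and constraint steps described above can be illustrated with a minimal sketch. It is not the recognition model itself: the function names `split_into_units` and `constrain` are hypothetical, the allowed type sets are copied from the examples in the text, and the constraint is approximated here by filtering candidate elements after the fact, whereas the description limits the model's output space directly.

```python
import re

# Hypothetical excerpt of a profile support library; type names come from the examples in the text.
ALLOWED_ENTITY_TYPES = {"equipment", "tag number", "loop", "medium", "operator", "document"}
ALLOWED_RELATION_TYPES = {"point", "containment", "maintenance", "source", "manufacturing"}


def split_into_units(text, max_words=50):
    """Split at periods, semicolons and line breaks, then cap each unit at max_words words."""
    units = []
    # Naive splitting: a period inside a number such as "3.5" would also split the text.
    for sentence in re.split(r"[.;\n]+", text):
        words = sentence.split()
        for start in range(0, len(words), max_words):
            units.append(" ".join(words[start:start + max_words]))
    return units


def constrain(candidates):
    """Drop candidate elements whose type lies outside the support library (assumed post-filter)."""
    return {
        "entities": [e for e in candidates["entities"] if e["type"] in ALLOWED_ENTITY_TYPES],
        "relations": [r for r in candidates["relations"] if r["type"] in ALLOWED_RELATION_TYPES],
    }
```

In this sketch an entity of an unlisted type is simply discarded, which mirrors the stated goal of not identifying entities that fall outside the semantic scope of the process industry.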
[0053] When the data type is table, the table data is identified to obtain the corresponding profile elements: the table header content is identified as the first type of table data, and the table body content is identified as the second type of table data; under the constraints of the profile support library, the first type of table data is identified using a table semantic recognition model to obtain the first table profile element, which includes the table entity type and the table entity relationship type; the second type of table data is identified using the table semantic recognition model to obtain the second table profile element, which includes the entities corresponding to each table entity type in the first table profile element and the entity attributes of the entities; based on the table entity relationship type in the first table profile element, the entity relationship of each entity in the second table profile element is determined, and the second table profile element after supplementing the entity relationships of each entity is determined as the profile element corresponding to the table data.
[0054] When performing data recognition on tabular data using the above method, the table header content is first identified as the first type of table data (containing the column headings of each column), and the table body content is identified as the second type of table data (containing rows of numerical, text, or identifier data corresponding to each row and column). Then, under the constraints of the profile support library, the first stage is executed: the first type of table data (header) is input into the table semantic recognition model. This model is a joint sequence classification and label prediction model fine-tuned based on a pre-trained language model (such as BERT-base), and its output space is strictly limited to the predefined set of entity types and relationship types in the profile support library. Under this constraint, the table semantic recognition model performs semantic analysis on each column heading in the first type of table data, identifying its possible corresponding table entity type and table entity relationship type, and outputting the first table profile elements, including: table entity type: for example, "location", "device", etc.; table entity relationship type: for example, "contains", "source", etc. The table entity relationship type also specifies which types of entities will have which type of entity relationship. This stage does not generate specific entity instances, but only obtains the table entity type and table entity relationship type of the table data, that is, what kind of entity each column of the table represents and what kind of relationship may exist between the columns.
[0055] The second stage involves using the table semantic recognition model to identify the second type of table data, resulting in the second table profile elements. The second type of table data (table body) is input into the same table semantic recognition model. Based on the first table profile elements output from the previous stage (i.e., the determined entity types and relationship types), the model extracts entity instances and binds attribute values for each row of table body data. Specifically, based on the mapping between columns and entity types, the value of each column in the table body is mapped to the corresponding entity or entity attribute, resulting in second table profile elements that include the entities corresponding to each table entity type in the first table profile elements and the entity attributes of those entities. For example, if the "Tag Number" column is identified as the entity type "tag number", then each row value in that column (e.g., "Tag Number 1") is identified as an entity instance (i.e., a concrete entity instantiated from the entity type "tag number"), and the function value (e.g., "flow control") and the associated device corresponding to Tag Number 1 are identified as entity attributes of that entity.
[0056] The third stage involves determining the entity relationships of each entity in the second table profile element based on the table entity relationship types in the first table profile element. After obtaining the first table profile element (entity type and relationship type) and the second table profile element (entity instance and attribute value), the entity relationships of each entity in the second table profile element are supplemented based on the identified table entity relationship types in the first table profile element. The identified table entity relationship types in the first table profile element not only include the type of entity relationship but also which entities will have which type of entity relationship. Therefore, based on the table entity relationship types of the first table profile element, it is possible to determine what kind of entity relationship exists between any two types of entities, thereby determining the entity relationship between each entity in the second table profile element and any other entity. Finally, the second table profile element, after supplementing the entity relationships of each entity, is determined as the profile element corresponding to the table type data.
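The three stages above can be sketched as follows. This is only an illustration under stated assumptions: the lookup tables `HEADER_TO_ENTITY_TYPE` and `TYPE_PAIR_TO_RELATION` stand in for the table semantic recognition model, the function names are hypothetical, and attaching the non-entity columns to the first entity column of a row is a simplification chosen for the example.

```python
# Stand-ins for the model's header predictions (hypothetical values).
HEADER_TO_ENTITY_TYPE = {"Tag number": "tag number", "Device": "device"}
TYPE_PAIR_TO_RELATION = {("device", "tag number"): "contains"}


def stage1(headers):
    """Header -> table entity types and the relation types that hold between them."""
    entity_columns = {h: HEADER_TO_ENTITY_TYPE[h] for h in headers if h in HEADER_TO_ENTITY_TYPE}
    types = set(entity_columns.values())
    relation_types = {pair: rel for pair, rel in TYPE_PAIR_TO_RELATION.items()
                      if pair[0] in types and pair[1] in types}
    return entity_columns, relation_types


def stage2(headers, rows, entity_columns):
    """Body -> one entity per entity column; remaining columns become attributes."""
    result = []
    for row in rows:
        cells = dict(zip(headers, row))
        entities = [{"type": entity_columns[h], "name": cells[h], "attributes": {}}
                    for h in headers if h in entity_columns]
        if entities:
            # Assumption: non-entity columns describe the first entity column of the row.
            entities[0]["attributes"] = {h: v for h, v in cells.items() if h not in entity_columns}
        result.append(entities)
    return result


def stage3(row_entities, relation_types):
    """Supplement relations between entities of the same row using the stage-1 relation types."""
    relations = []
    for entities in row_entities:
        for head in entities:
            for tail in entities:
                rel = relation_types.get((head["type"], tail["type"]))
                if rel and head is not tail:
                    relations.append((head["name"], rel, tail["name"]))
    return relations
```

Stage 1 produces no entity instances, which matches the description: it only fixes what each column means, so that stages 2 and 3 can instantiate entities and relations row by row.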
[0057] In some embodiments of this application, the data types may also include image data and other types. For image data, data recognition is performed under the constraints of the profile support library to obtain the profile elements corresponding to the image data. The input image is first standardized (e.g., size normalization, image denoising, and contrast enhancement; for multi-page drawings, page segmentation and region division into, e.g., a title area and an annotation area). Visible text content is extracted from the image, the graphic objects in the image (e.g., valves, pumps, sensors, pipelines, terminal blocks, etc.) are identified through a visual feature extraction network, and text descriptions of their spatial locations and morphological features are generated. The extracted text and the visually recognized graphic objects are jointly semantically aligned to obtain comprehensive text data containing both the visible text content and the graphic objects in the image. Then, the above-mentioned text semantic recognition model identifies profile elements in the comprehensive text data using the predefined entity types and relationship types in the profile support library, and the profile elements (entities, entity relationships, entity attributes) corresponding to the image data are output.
[0058] Similarly, for other types of data (e.g., audio data), corresponding data recognition methods are used to identify the data under the constraints of the profile support library, obtaining the profile elements corresponding to the audio data: if audio data exists (e.g., operator voice commands, inspection recordings, voice alarm records, etc.), when performing data recognition under the constraints of the profile support library, audio preprocessing is first performed (e.g., audio noise reduction and segmentation, voice endpoint detection, separation of valid voice segments; if the identified voice is that of unauthorized personnel, it is marked as "abnormal input" and recorded). The next step is to use an industrial scenario-optimized Automatic Speech Recognition (ASR) model to convert the audio data into text while preserving the original semantic expression. Then, the text is input into the text semantic recognition model, and the profile elements are identified under the constraints of the profile support library to obtain the profile elements corresponding to the audio data.
[0059] Finally, the sets of profile elements corresponding to various data types are determined as the initial profile elements. Addressing the issues of incomplete identification and poor consistency caused by the diverse sources and heterogeneous structures of user data in the process industry, this approach categorizes raw data into different types such as text and tables. Adaptive identification is then performed on heterogeneous data sources such as user profile files, user dialogue data, and operational data. This enables accurate extraction of entities, entity relationships, and entity attributes from different data types. Integrating the identification results of various data types into unified initial profile elements overcomes the deficiency in identification capabilities under mixed structured and unstructured data scenarios, ensuring the completeness of profile element extraction and laying a reliable data foundation for the subsequent construction of high-precision target user profiles.
[0060] Step S204: Extract the first type of profile elements from the initial profile elements.
[0061] In the technical solution provided in step S204, the first type of profile elements are the profile elements in the initial profile elements that satisfy the first type of rule. The first type of rule indicates at least the type range corresponding to each type of profile element (that is, the type range of entities and of entity relationships). There are multiple ways to extract the first type of profile elements from the initial profile elements, such as: obtaining the first type of rule stored in the preset rule base; determining the target entity type according to the entity type range limitation rule in the first type of rule, wherein the target entity type is the type of the entities used to construct the user profile, as defined in the entity type range limitation rule; determining the target entity relationship type according to the entity relationship type range limitation rule in the first type of rule, wherein the target entity relationship type is the type of the entity relationships used to construct the user profile, as defined in the entity relationship type range limitation rule; extracting entities whose type is the target entity type from the initial profile elements to obtain the target entities, and extracting the entity attributes corresponding to the target entities; extracting entity relationships whose type is the target entity relationship type from the initial profile elements to obtain the target entity relationships; and determining the set of all target entities, the target entity relationships corresponding to the target entities, and the entity attributes as the first type of profile elements.
This process eliminates redundant information such as non-target types and non-target relationships from the initial profile elements obtained from the multi-source heterogeneous raw data of the process industry, so that the first type of profile elements focuses on entities and relationship structures that have real business value. This effectively solves the problems of profile redundancy, information dilution and low extraction efficiency caused by the ambiguity of type range, and realizes the accurate construction and updating of user profiles in complex industrial data environments.
[0062] In some embodiments of this application, the preset rule base is a predefined database for storing various rules. During the generation of profile elements and the updating of profile configuration (i.e., the construction of the user profile), multiple types of rules are defined to ensure processing according to the intent of the user and the system executing the embodiments of this application. These rule entries are generalized rules and can be continuously improved as the user profile is updated. To further achieve the filtering of profile elements, a first type of profile element is extracted from the initial profile elements: The first type of rules stored in the preset rule base are obtained. The first type of rules includes at least entity type range limiting rules and entity relationship type range limiting rules. The entity type range limiting rules are used to limit the types corresponding to the entities used to construct the user profile. An example of an entity type range limiting rule is as follows: the entity types of the profile elements are limited to the following types: tag number, loop, device, personnel. The entity relationship type range limiting rules are used to limit the types corresponding to the entity relationships used to construct the user profile. An example of an entity relationship type range limiting rule is as follows: the entity relationships of the profile elements are limited to the following types: contained, source, maintenance, included, commonly used.
[0063] Based on the entity type range limitation rule in the first type of rule, the target entity type is determined (that is, the entity types included in the entity type range limitation rule are determined as the target entity types). Likewise, based on the entity relationship type range limitation rule in the first type of rule, the target entity relationship type is determined (that is, the entity relationship types included in the entity relationship type range limitation rule are determined as the target entity relationship types). The next step is to extract entities of the target entity type from the initial profile elements and extract the corresponding entity attributes: each instantiated entity in the initial profile elements is filtered according to its entity type, and all entities of the target entity type are determined as target entities. Simultaneously, the entity attributes of these target entities are extracted from the initial profile elements. Entities other than the target entities in the initial profile elements are not processed further. The next step is to extract entity relationships of the target entity relationship type from the initial profile elements to obtain the target entity relationships. The initial profile elements record all entity relationships of the target entities, which may be of multiple types, but not all of them need to be recorded in the user profile. Therefore, to accurately select the entity relationships needed for the user profile, only the relationships of the target entity relationship type are retained among all entity relationships of the target entities, and the remaining entity relationships are removed. Finally, the set of all target entities, the target entity relationships corresponding to the target entities, and the entity attributes is determined as the first type of profile elements. The first type of profile elements contains multiple data items.
Each data item corresponds to a target entity, the target entity relationships of that target entity, and its entity attributes. A target entity relationship is a triple consisting of the target entity, the entity relationship itself, and the other entity with which the target entity has that relationship. The following is an example of a data item: the entity of type personnel is Operator A; its entity attributes are location: central control room and personnel name: A; its target entity relationship is a "commonly used" relationship between the entity Operator A and the entity equipment 1 of type equipment (meaning that Operator A commonly uses equipment 1); and the entity attribute of equipment 1 is purpose: used for return flow control.
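The filtering by the first type of rule can be sketched as below. The rule values are taken from the examples in paragraph [0062]; the data layout (a dictionary of entities plus a list of relationship triples) and the name `extract_first_type` are assumptions made for illustration only.

```python
# Example first-type rules, using the type names listed in the description.
FIRST_TYPE_RULES = {
    "entity_types": {"tag number", "loop", "device", "personnel"},
    "relation_types": {"contained", "source", "maintenance", "included", "commonly used"},
}


def extract_first_type(initial_elements, rules=FIRST_TYPE_RULES):
    """Keep target entities, their attributes, and only their target-type relationships."""
    target_entities = {name: ent for name, ent in initial_elements["entities"].items()
                       if ent["type"] in rules["entity_types"]}
    items = []
    for name, ent in target_entities.items():
        # A relationship is a (head, relation type, tail) triple.
        relations = [(h, r, t) for (h, r, t) in initial_elements["relations"]
                     if h == name and r in rules["relation_types"]]
        items.append({"entity": name, "type": ent["type"],
                      "attributes": ent["attributes"], "relations": relations})
    return items
```

Each returned dictionary corresponds to one data item: a target entity, its attributes, and the triples that survived the relationship type filter.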
[0064] Step S206: Standardize the first type of profile elements according to the second type of rules to obtain the target profile elements.
[0065] In the technical solution provided in step S206, the second type of rules provides semantic specification templates corresponding to the various types of profile elements. Standardizing the first type of profile elements according to the second type of rules to obtain the target profile elements can be achieved in several ways, such as: obtaining the second type of rules stored in a preset rule base; standardizing the first type of profile elements through semantic specification and attribute annotation to obtain the target profile elements; performing semantic specification processing on the entities and entity relationships in the first type of profile elements according to the semantic specification rules in the second type of rules to obtain the third type of profile elements, wherein the semantic specification rules include semantic specification templates corresponding to the various types of profile elements, and the semantic specification templates include at least synonym specification templates; and performing attribute annotation processing on the entity attributes in the third type of profile elements according to the attribute annotation rules in the second type of rules to obtain the target profile elements, wherein the attribute annotation rules limit the entity attribute names that need to be annotated as key attributes.
[0066] The preset rule base also stores a second type of rule, which includes at least two sub-rules: semantic specification rules and attribute annotation rules. Semantic specification rules contain multiple semantic specification templates, each defining a set of semantic equivalence relations used to map non-standard expressions to unified standard expressions. Semantic specification templates include, but are not limited to: synonym specification templates: at the entity type level, "manufacturer" and "factory" are unified as "manufacturer"; at the relationship type level, "commonly used" and "frequently operated" are unified as "commonly used". The process of semantically specifying entities and entity relationships in the first type of profile elements according to the semantic specification rules in the second type of rule is as follows:
[0067] The process iterates through each entity in the first type of profile elements, matching its entity type against each set of synonyms in the semantic specification template. If a match is found, the entity type is replaced with the standard expression specified in the template. Next, the process iterates through each entity relationship in the first type of profile elements, extracting its relationship type field. This relationship type is matched against the sets of relationship synonyms in the semantic specification template. If a match is found, the relationship type is replaced with the standard relationship type specified in the template. After this process, all entity types and entity relationship types are unified into a preset standard form, forming the third type of profile elements. This step does not add or delete entities or relationships; it only standardizes naming and does not change their structure or semantics. On this basis, the attribute annotation rules label the entity attributes in the third type of profile elements according to a preset list of key attribute names. Finally, based on the standardized entities, relationships, and key attributes, a precise, scalable, and efficient target user profile is constructed, which addresses the inaccurate profile construction, information sparsity, and low search efficiency caused by semantic confusion and attribute redundancy in multi-source heterogeneous configuration data.
[0068] The process of annotating entity attributes in the third type of profile elements according to the attribute annotation rules in the second type of rules is as follows: based on the attribute annotation rules, all entity attributes of each entity in the third type of profile elements are examined, and the key attributes that are critical to subsequent business functions (such as trend prediction, anomaly diagnosis, and control optimization) are annotated. The attribute annotation rules limit the entity attribute names that need to be annotated as key attributes; they are a set of attribute name filtering and priority definition rules. For example, for the entity type "tag": {"function", "purpose", "use"} are the key attributes to be annotated; for the entity type "equipment": {"location", "purpose", "rated flow", "control loop"} are the key attributes to be annotated. In addition, the second type of rules can also include attribute name selection rules for entity categories. These rules limit which attributes are merged or retained for different types of entities. For example, for the type "tag", one of the three attribute names "function", "purpose", and "use" can be selected (e.g., "function"); alternatively, all three attributes can be retained, resulting in three distinct attributes.
[0069] After the semantic standardization and attribute annotation processes described above, the first type of profile elements are transformed into target profile elements. The target profile elements are structured datasets, organized as follows: each entity corresponds to a standardized name; each entity relationship corresponds to a standardized type; and key attributes of each entity's attributes are annotated. The original first type of profile elements are not overwritten; the original data, the original first type of profile elements, and the corresponding target profile elements are stored as a single layer.
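A minimal sketch of the two standardization steps follows, using the synonym and key attribute examples from the text. The dictionary layout and the name `standardize` are assumptions; the function returns new data items instead of modifying its input, in line with the statement that the original first type of profile elements are not overwritten.

```python
# Synonym specification templates: non-standard expression -> standard expression.
ENTITY_TYPE_SYNONYMS = {"factory": "manufacturer"}
RELATION_TYPE_SYNONYMS = {"frequently operated": "commonly used"}

# Attribute annotation rules: entity type -> attribute names to mark as key attributes.
KEY_ATTRIBUTES = {
    "tag": {"function", "purpose", "use"},
    "equipment": {"location", "purpose", "rated flow", "control loop"},
}


def standardize(items):
    """Normalize type names, then annotate key attributes; the input list is left untouched."""
    out = []
    for item in items:
        etype = ENTITY_TYPE_SYNONYMS.get(item["type"], item["type"])
        relations = [(h, RELATION_TYPE_SYNONYMS.get(r, r), t) for h, r, t in item["relations"]]
        key = sorted(a for a in item["attributes"] if a in KEY_ATTRIBUTES.get(etype, set()))
        out.append({**item, "type": etype, "relations": relations, "key_attributes": key})
    return out
```

Only names change in the first step; no entity or relationship is added or removed, so the number of data items and triples is the same before and after.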
[0070] Step S208: Determine the target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
[0071] In the technical solution provided in step S208, there are multiple ways to determine the target user profile based on the entities, entity relationships, and entity attributes in the target profile elements. For example: obtaining a vector library, where the vector library is used to store the vector values corresponding to the entities, entity relationships, and entity attributes contained in the existing user profiles in the process industry; determining each entity in the target profile elements as a new entity; for each new entity, if there are overlapping entities, merging the entity attributes and entity relationships corresponding to the new entity with the entity attributes and entity relationships of the overlapping entities in the existing user profiles to obtain the merged existing user profiles; updating each new entity without overlapping entities as a new entity in the merged existing user profiles, and updating the entity relationships and entity attributes of the new entities based on the target profile elements to obtain the target user profile.
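The merge-or-add logic of step S208 can be sketched as follows. The overlap test is passed in as a callable, `find_overlap`, which is a hypothetical placeholder for the vector-based determination. Letting new attribute values win on conflict is an assumption, since the text does not state a conflict policy.

```python
def build_target_profile(existing_profile, target_elements, find_overlap):
    """Merge overlapping new entities into the existing profile and add the rest as new entities."""
    # Work on a copy so the existing profile is left unchanged.
    profile = {name: {"type": e["type"], "attributes": dict(e["attributes"]),
                      "relations": list(e["relations"])}
               for name, e in existing_profile.items()}
    for new in target_elements:
        match = find_overlap(new, profile)  # name of the overlapping entity, or None
        if match is not None:
            entry = profile[match]
            # Assumption: on conflicting attribute names the newer value replaces the older one.
            entry["attributes"].update(new["attributes"])
            entry["relations"] += [r for r in new["relations"] if r not in entry["relations"]]
        else:
            profile[new["name"]] = {"type": new["type"], "attributes": dict(new["attributes"]),
                                    "relations": list(new["relations"])}
    return profile
```

With an empty `existing_profile` every entity takes the "add" branch, which corresponds to building the initial user profile directly from the target profile elements.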
[0072] The following method is used to determine whether any newly added entity has overlapping entities: vectorize the first entity to obtain its corresponding target vector value, where the first entity is any newly added entity in the target profile elements; calculate the similarity between the target vector value and the vector value corresponding to each entity in the vector library, obtaining multiple similarities; determine the largest of these similarities as the target similarity, and determine the entity to which the vector value in the vector library used to calculate the target similarity belongs as the similar entity of the first entity; if the first entity and the similar entity satisfy the third type of rule, the similar entity is determined to be an overlapping entity of the first entity, where the third type of rule contains multiple sub-rules, each corresponding to a criterion for judging the overlap between the first entity and the similar entity in one dimension. If the first entity and the similar entity do not satisfy the third type of rule, a merge request message is sent to the target user. The merge request message carries a request to decide whether to merge the first entity and the similar entity. The merge confirmation message corresponding to the merge request message is then received. If the merge confirmation message indicates that the first entity and the similar entity should be merged, the similar entity is determined to be an overlapping entity of the first entity. If the merge confirmation message indicates that they should not be merged, it is determined that the first entity has no overlapping entities.
[0073] It should be noted that if the raw data is obtained for the first time, that is, before any user profiles are built, it is not necessary to obtain the vector library. The user profile is directly mapped through the entities, entity relationships and entity attributes in the target profile elements to obtain the initial user profile. This profile serves as the existing user profile in the process industry. Every preset update cycle, the above steps S202-S208 are repeated based on the new raw data to obtain the updated user profile. Each updated user profile serves as the existing user profile for the next update.
[0074] In some embodiments of this application, when an existing user profile is already available and the target user profile is determined based on the entities, entity relationships, and entity attributes in the target profile elements, a vector library is obtained that stores the vector values corresponding to the entities, entity relationships, and entity attributes of the existing user profiles in the process industry. Each entity in the target profile elements is identified as a new entity, and whether it overlaps with an existing entity in the vector library is determined by vector similarity comparison. For new entities that overlap, their entity attributes and entity relationships are merged with the corresponding information of the overlapping entities to achieve semantic alignment and content fusion of profile elements, avoiding data redundancy and semantic conflicts. New entities that do not overlap are added to the existing profile as new entities, and their entity relationships and entity attributes are updated based on the target profile elements. Thus, in scenarios where multi-source heterogeneous configuration data is continuously received, the incremental expansion and consistency maintenance of the user profile are completed automatically. The implementation process is as follows:
[0075] First, obtain the vector library. The user profile vector library records the vector representation of each entity, entity relationship, and entity attribute contained in existing user profiles within the process industry. The vector values in this library are generated from historical profile elements after standardization. The generation method involves lexical encoding of the entity name, entity type, relationship type, and attribute value, followed by generating a fixed-dimensional vector using a preset word vector model. This vector library is only used to assist in determining entity overlap and can be empty when initially constructing a user profile. Each entity in the target profile element is identified as a new entity; similarly, each new entity is vectorized to obtain its corresponding vector value. The following example, using any new entity (the first entity being any new entity in the target profile element), illustrates how to determine if a new entity has overlapping entities in the existing user profile:
[0076] The vector value of the first entity is determined as the target vector value. The cosine similarity between this target vector value and the vector value of each existing entity in the vector library is calculated sequentially, resulting in a set of similarity values (multiple similarities). The maximum value in this set of similarities is selected and recorded as the target similarity. The entity to which the vector value in the vector library used to calculate the target similarity belongs is determined as the similar entity to the first entity (i.e., another entity whose target similarity is calculated with the first entity). If the vector library is empty, the process of adding a new entity proceeds directly. If the first entity and the similar entity satisfy the third type of rule (and simultaneously satisfy multiple sub-rules in the third type of rule), the similar entity is determined as the overlapping entity of the first entity. Each sub-rule corresponds to a criterion for judging the overlap between the first entity and the similar entity in one dimension, including but not limited to:
[0077] Sub-rule 1 (entity type consistency): the type of the newly added entity must be identical to the type of the similar entity;
[0078] Sub-rule 2 (entity name similarity): the similarity between the name, description, and purpose of the newly added entity and those of the similar entity is greater than the first preset percentage (80%);
[0079] Sub-rule 3 (key attribute overlap): among the key attributes, the overlap rate of the attributes shared by the newly added entity and the similar entity is greater than the second preset percentage (50%), and the semantic similarity of the corresponding attribute values (based on vector cosine similarity) is greater than the preset value (0.8).
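The similarity search and the three sub-rules described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the application: the function names, the dictionary layout of an entity, the use of `difflib.SequenceMatcher` as a stand-in for the unspecified semantic similarity model, and the definition of the overlap rate as shared key attributes divided by all key attributes are all assumptions. Sub-rule 2 is reduced here to the entity name only.

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar_entity(target_vector, vector_library):
    """Return (entity_id, target_similarity), or None if the library is empty."""
    if not vector_library:
        return None  # empty library: the entity is simply added as new
    scored = ((eid, cosine(target_vector, vec)) for eid, vec in vector_library.items())
    return max(scored, key=lambda pair: pair[1])


def text_similarity(a, b):
    """Stand-in (assumption) for the semantic similarity model, which is not specified."""
    return SequenceMatcher(None, a, b).ratio()


def sub_rule_results(new, similar, name_min=0.8, overlap_min=0.5, value_min=0.8):
    """Evaluate the three sub-rules and report the status of each one."""
    shared = set(new["key_attrs"]) & set(similar["key_attrs"])
    union = set(new["key_attrs"]) | set(similar["key_attrs"])
    overlap = len(shared) / len(union) if union else 0.0  # assumed definition
    values_ok = all(
        text_similarity(new["key_attrs"][k], similar["key_attrs"][k]) > value_min
        for k in shared
    )
    return {
        "type_consistency": new["type"] == similar["type"],
        "name_similarity": text_similarity(new["name"], similar["name"]) > name_min,
        "key_attribute_overlap": overlap > overlap_min and values_ok,
    }


def is_overlapping(new, similar):
    """The third type of rule holds only when every sub-rule holds."""
    return all(sub_rule_results(new, similar).values())
```

Returning the per-rule status as a dictionary, rather than a single boolean, keeps the details available for the manual confirmation step.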
[0080] If the first entity and the similar entity do not satisfy the third type of rule (that is, at least one sub-rule in the third type of rule is not satisfied), the newly added entity is provisionally determined to have no overlapping entity. To avoid misjudgment, however, the system triggers a manual confirmation process: a merge request message is sent to the target user (such as an engineering configuration engineer or system administrator). This message includes: the standardized name, entity type, and key attribute list of the new entity; the standardized name, entity type, and key attribute list of the similar entity; the target similarity value between the two; the matching status of each sub-rule of the third type of rule; and explicit operation options: "Merge" or "Keep as Independent Entity". After the merge confirmation message returned by the user is received, if it indicates "Merge", the similar entity is marked as the overlapping entity of the newly added entity; otherwise, it is determined that the first entity has no overlapping entity. A merge rule is then generated from the merge request message and the merge confirmation message to indicate whether the two entities named in the merge request message should be merged, and this rule is added to the rule base. In subsequent profile update cycles, when a new entity and a similar entity that match the merge rule are encountered again, no merge request needs to be sent; the stored decision is applied automatically.
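The manual confirmation step and the reuse of stored merge rules might look like the sketch below. The function name `decide_overlap`, the `ask_user` callback, the plain dictionary used as the rule base, and the choice of key for matching a stored merge rule (types and standardized names of both entities) are hypothetical; the application does not state how a merge rule is matched.

```python
def decide_overlap(new, similar, target_similarity, sub_rules, rule_base, ask_user):
    """Return True if `similar` should be treated as the overlapping entity of `new`.

    sub_rules: dict mapping sub-rule name -> bool (status of each sub-rule).
    rule_base: dict acting as a store of previously confirmed merge rules.
    ask_user:  callable that receives the merge request and returns the chosen option.
    """
    if all(sub_rules.values()):
        return True  # third type of rule satisfied, no confirmation needed

    # Assumed matching key for a stored merge rule.
    key = (new["type"], new["name"], similar["type"], similar["name"])
    if key in rule_base:
        return rule_base[key]  # reuse the earlier decision, no new request

    request = {
        "new_entity": {k: new[k] for k in ("name", "type", "key_attrs")},
        "similar_entity": {k: similar[k] for k in ("name", "type", "key_attrs")},
        "target_similarity": target_similarity,
        "sub_rule_status": dict(sub_rules),
        "options": ["Merge", "Keep as Independent Entity"],
    }
    merged = ask_user(request) == "Merge"
    rule_base[key] = merged  # record the decision as a merge rule
    return merged
```

Because the decision is stored whether the answer is "Merge" or not, a repeated pair does not prompt the user a second time in either case.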
[0081] After the above method has determined, for each new entity in the target profile element, whether an overlapping entity exists, the new entities are processed as follows. If a new entity has an overlapping entity, the entity attributes and entity relationships of the new entity are merged with those of the overlapping entity in the existing user profile to obtain the merged existing user profile. During merging, all entity attributes of the new entity are added to the entity attribute set of the overlapping entity, and the entity relationships of the new entity are added as entity relationships of the overlapping entity. When entity relationships are merged, a relationship of the new entity that overlaps with a relationship of the overlapping entity is not added to the existing user profile. Two relationships are considered overlapping when their relationship names are synonyms, that is, when the semantic similarity of the two relationship names is greater than the preset threshold (0.8). Finally, each newly added entity without an overlapping entity is added to the merged existing user profile as a new entity, and its entity attributes and entity relationships from the target profile element are added as well, resulting in the target user profile. Additionally, rules in the preset rule base can be added, modified, and deleted in response to administrator commands. The resulting target user profile is also added to the corresponding pyramid facet.
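The merging behavior described in this paragraph can be illustrated with the following sketch. The data layout (a profile as a dictionary of entities, each with `attrs` and `relations`), the function names, and the string-ratio stand-in for semantic similarity are assumptions. The sketch also assumes that two relationships overlap only if they point to the same target entity, and that a new attribute value overwrites an existing one with the same name; the text specifies neither point.

```python
from difflib import SequenceMatcher


def names_are_synonyms(a, b, threshold=0.8):
    """Stand-in (assumption) for the semantic similarity test on relationship names."""
    return SequenceMatcher(None, a, b).ratio() > threshold


def merge_into_profile(profile, overlap_id, new_entity):
    """Merge attributes and relationships of new_entity into an existing entity."""
    existing = profile[overlap_id]
    existing["attrs"].update(new_entity["attrs"])  # assumed: new values win on conflict
    for name, target in new_entity["relations"]:
        duplicate = any(
            target == old_target and names_are_synonyms(name, old_name)
            for old_name, old_target in existing["relations"]
        )
        if not duplicate:  # overlapping relationships are not added again
            existing["relations"].append((name, target))
    return profile


def add_new_entity(profile, entity_id, new_entity):
    """Add an entity that has no overlapping entity, with its attributes and relationships."""
    profile[entity_id] = {
        "attrs": dict(new_entity["attrs"]),
        "relations": list(new_entity["relations"]),
    }
    return profile
```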
Each updated pyramid facet ultimately includes the original data, the first profile element, the target profile element, and the target user profile. In every preset update cycle, after new original data is acquired and the target user profile is updated, a new pyramid facet is generated to store the original data, the first profile element, the target profile element, and the updated target user profile from that update, so that the facets together form a pyramid-like model in which each input corresponds to one layer. Each new layer is generated from the existing user profile and the new input data. Processing a layer may trigger updates to the rule base (such as adding merge rules); such updates affect entity recognition and merging decisions in subsequent layers but do not affect the data already stored in historical layers. Archiving by layer preserves the original input, allows the changes made to the profile to be traced and removed layer by layer, and ensures that no information is lost at any layer (i.e., each updated user profile is saved in its own layer, and the next update of that profile is saved in a new layer). After each update, the target user profile in the most recently generated layer serves as the existing user profile of the current version and is used for entity similarity comparison and merging judgment in the next update cycle.
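One way to picture the layer-by-layer archive is an append-only list of immutable records, as in the sketch below. The class and field names are hypothetical, and deep copies are used here only to illustrate that later updates do not alter historical layers.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class PyramidLayer:
    """One archived update: the input and everything derived from it."""
    raw_data: object
    first_profile_elements: object
    target_profile_elements: object
    target_user_profile: dict


class Pyramid:
    """Append-only archive in which each input corresponds to one layer."""

    def __init__(self):
        self.layers = []

    def existing_profile(self):
        """Profile of the latest layer, used as the baseline for the next update."""
        if not self.layers:
            return {}
        return copy.deepcopy(self.layers[-1].target_user_profile)

    def archive(self, raw_data, first_elements, target_elements, profile):
        """Store one update as a new layer without touching earlier layers."""
        layer = PyramidLayer(raw_data, first_elements, target_elements,
                             copy.deepcopy(profile))
        self.layers.append(layer)
        return layer
```

A typical cycle would call `existing_profile()`, apply the merge logic to the copy, and then call `archive(...)` with the result.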
[0082] In some embodiments of this application, the profile support library and vector library can be updated based on the target profile elements to obtain updated profile support libraries and vector libraries. The profile support library stores entities, entity relationships, and entity attributes contained in existing user profiles within the process industry, while the vector library stores vector values corresponding to the entities, entity relationships, and entity attributes contained in existing user profiles within the process industry. The updated profile support library and vector library are used for the next update of the target user profile. Simultaneously, existing entities in the target user profile can be deleted in response to an administrator's deletion command.
[0083] It is important to note that the method in this application embodiment is applicable not only to building user profiles but also to building enterprise profiles, thereby improving the targeting and adaptability of interactions. The resulting user profiles can be used for trend prediction, performance evaluation, anomaly diagnosis, control optimization, and similar tasks in the process industry. When the input content (raw data) takes varying forms, the method applies universally applicable steps to handle the extraction, merging, and removal of information. A flexible and replaceable rule base is incorporated to meet the integration and implementation needs of extended businesses. Each input is treated as a layer of a pyramid and is processed and archived layer by layer to ensure information completeness. A graph support library is added during the user profile construction process to balance efficiency and breadth. The method can adapt to different business applications, addresses the problem of establishing user profiles during the operation of large-scale industrial time-series models, and ensures the completeness, universality, and continuous growth of user profiles.
[0084] Figure 3 is an example diagram of entities and entity relationships in a user profile provided according to an embodiment of this application, illustrating a reference example of a user profile. As shown in Figure 3, the user profile contains four entities: Device 1, File 1, User 1, and Manufacturer 1. Connections between these entities represent their relationships. The user profile stores information about each entity, its attributes, and their relationships. Manufacturer 1 is a manufacturer located at address B. File 1 is a file located at path C. Device 1 is a device, specifically a flow control valve. User 1 is an operator located in the control room. The relationships between these entities are: Manufacturer 1 produces Device 1, File 1 originates from Device 1, and User 1 maintains Device 1.
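The example of Figure 3 could be held in memory roughly as follows. The attribute keys (`type`, `address`, `path`, `category`, `location`) and the helper `relations_of` are illustrative assumptions; the entity names, attribute values, and relationships are those given in the paragraph above.

```python
# Entities of the Figure 3 example, each with its attributes.
entities = {
    "Manufacturer 1": {"type": "manufacturer", "address": "B"},
    "File 1": {"type": "file", "path": "C"},
    "Device 1": {"type": "device", "category": "flow control valve"},
    "User 1": {"type": "operator", "location": "control room"},
}

# Relationships as (source entity, relationship name, target entity) triples.
relations = [
    ("Manufacturer 1", "produces", "Device 1"),
    ("File 1", "originates from", "Device 1"),
    ("User 1", "maintains", "Device 1"),
]


def relations_of(entity_name):
    """Return every relationship in which the given entity takes part."""
    return [r for r in relations if entity_name in (r[0], r[2])]
```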
[0085] In some embodiments of this application, a system for performing steps S202-S208 of this application is provided. The system includes a main logic module and an AI logic module, wherein the AI logic module contains the various models used in steps S202-S208. Figure 4 is a general block diagram of logic and data provided according to an embodiment of this application, illustrating the modules included in the main logic module of the above system and the actions performed by the AI logic module. The main logic module includes an information extraction module, an information update and association module, and a configuration maintenance module. The information extraction module is the entry point of the entire system; it processes user data files, dialogues, operations, and other inputs from which entities and relationships can be extracted, in order to form profile elements. The information update and association module updates the profile elements extracted by the information extraction module into the profile configuration (the profile configuration is the user profile). It also updates the auxiliary libraries (support library, preset rule library, vector library). The functions of the configuration maintenance module include editing rules, removing the profile updates made in a specific pyramid facet, updating profiles again according to the rules, and generating the profile vector library. The AI logic module performs the following operations: file type recognition, vector library search, text classification and summarization, and recognition of profile elements in text based on the rules and the support library (corresponding to the process of recognizing profile elements in text-type data).
Figure 4 also shows the configuration data involved in the system used to perform steps S202-S208 of this application, including a rule base (i.e., the preset rule base), a profile support library, a profile configuration (i.e., the user profile), and a profile vector library (i.e., the vector library). Table 1 below describes each item of the configuration data.
[0086] Table 1
[0087]
[0088] Figure 5 is a schematic diagram of information extraction according to an embodiment of this application, illustrating the information extraction process. The inputs are user profile files, dialogues, and operations (i.e., the aforementioned raw data). User profile files in doc, docx, and pdf formats are identified as text, files in csv, xls, and xlsx formats are identified as tables, and files in other formats are identified as other data formats (e.g., the audio data described above). Dialogues and operations are identified as text. For text data, AI logic is used to recognize the text and output profile elements based on the profile support library (i.e., the process of determining the profile elements corresponding to the aforementioned text-type data). For table data, profile elements are identified through table parsing logic. For images, image recognition is performed; other data formats are identified using corresponding recognition methods. After the profile elements are identified, they are reorganized according to the rules (i.e., the process of determining the aforementioned target profile elements). Finally, the standard interface for user profile elements is called to proceed to the next stage (the user profile construction stage). During information recognition, the rule base and the profile support library need to be called. The rule base includes system rules (e.g., the first type of rule) and custom rules (e.g., a merge rule generated from a merge request message and a merge confirmation message to indicate whether to merge the two entities named in the merge request message). The profile support library can be used to determine entity categories, relationship categories, common relationships, and so on.
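The routing of raw data to a recognition path, as described for Figure 5, can be sketched as a lookup on the input source and file suffix. The function name, the `source` labels, and the return values are hypothetical; only the listed file formats come from the description. Image formats are not enumerated in the text, so this sketch leaves them in the "other" category.

```python
from pathlib import PurePath

# File formats named in the description.
TEXT_SUFFIXES = {".doc", ".docx", ".pdf"}
TABLE_SUFFIXES = {".csv", ".xls", ".xlsx"}


def classify_raw_data(source, filename=None):
    """Return 'text', 'table', or 'other' for one item of raw data.

    source: 'file', 'dialogue', or 'operation' (hypothetical labels).
    """
    if source in ("dialogue", "operation"):
        return "text"  # dialogues and operations are handled as text
    suffix = PurePath(filename or "").suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return "text"
    if suffix in TABLE_SUFFIXES:
        return "table"
    return "other"  # e.g. audio or image data, handled by dedicated recognizers
```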
[0089] Figure 6 is a diagram of the logical steps for parsing a table according to an embodiment of this application, showing a process for identifying profile elements from table-type data. First, a table file (i.e., table-type data) is input, and irregular table headers are identified (including merged cells and multi-row headers, which further expands the range of profile elements that can be covered). Then the element category range is defined: a model trained on file category templates (the table semantic recognition model) identifies the categories of entities and entity relationships within the file (i.e., under the constraints of the profile support library, the table semantic recognition model performs data recognition on the first type of table data to obtain the first table profile element). Although this step defines the element category range, the range is not fixed, because it is derived from the profile support library, which is updatable. The next step is to identify entities in the table body. This step distinguishes invalid rows from entity rows in the table body, identifies entities of the various types, instantiates them to obtain specific entities and entity attributes, and adds relationships for the identified entities according to the entity relationship categories obtained in the first step, forming complete entities (i.e., the second table profile element obtained by performing data recognition on the second type of table data through the table semantic recognition model). A complete entity includes the entity, its entity attributes, and its entity relationships. At this point, the profile elements corresponding to the table-type data have been obtained. After being organized according to the rules (corresponding to standardizing the first type of profile element according to the second type of rule to obtain the target profile element), they are written to the corresponding layer.
At this time, the layer contains the original file (original data) plus the organized entity group and relationship group (including the first profile element and the target profile element corresponding to the profile elements of the table-type data). The result is then passed to the next step through the profile standard interface (i.e., determining the target user profile based on the entities, entity relationships, and entity attributes in the target profile element).
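The two-stage table parsing (header first, then body) can be illustrated with a deliberately simplified sketch. A dictionary lookup stands in for the table semantic recognition model, and the support library contents, the function name, and the convention that the first entity column is the primary entity of each row are all assumptions. Merged cells and multi-row headers are not handled here.

```python
import csv
import io

# Hypothetical excerpt of a profile support library.
SUPPORT_LIBRARY = {
    "entity_columns": {"Device": "device", "Manufacturer": "manufacturer"},
    "relations": {("manufacturer", "device"): "produces"},
}


def parse_table(text, library):
    """Return (entities, relations) recognized from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}, []
    header, body = rows[0], rows[1:]

    # Stage 1: the header determines entity types; other columns are attributes.
    entity_cols = [(i, library["entity_columns"][h])
                   for i, h in enumerate(header) if h in library["entity_columns"]]
    attr_cols = [(i, h) for i, h in enumerate(header)
                 if h not in library["entity_columns"]]
    if not entity_cols:
        return {}, []

    # Stage 2: the body yields entities, attributes, and relationships.
    entities, relations = {}, []
    (p_idx, p_type), others = entity_cols[0], entity_cols[1:]
    for row in body:
        # Skip invalid rows: wrong width or a missing entity name.
        if len(row) != len(header) or not all(row[i].strip() for i, _ in entity_cols):
            continue
        primary = row[p_idx].strip()
        attrs = {h: row[i].strip() for i, h in attr_cols if row[i].strip()}
        entities[primary] = {"type": p_type, "attrs": attrs}
        for i, etype in others:
            name = row[i].strip()
            entities.setdefault(name, {"type": etype, "attrs": {}})
            rel = library["relations"].get((etype, p_type))
            if rel:
                relations.append((name, rel, primary))
    return entities, relations
```

For instance, a table with the columns `Device`, `Model`, and `Manufacturer` would yield one device entity per valid row with `Model` as an attribute, plus a `produces` relationship from the manufacturer to each device.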
[0090] Figure 7 is a diagram of the user profile update steps provided according to an embodiment of this application, showing a process for updating a user profile. First, the standard profile input is received (that is, the target profile element is obtained). Then, it is determined whether each entity and relationship should be added to the profile. Next, the profile configuration is updated (that is, the target user profile is determined based on the entities, entity relationships, and entity attributes in the target profile element). This step deletes the entities and relationships that need to be deleted, adds the entities and relationships that need to be added, and merges the attributes of identical entities or relationships. Finally, the profile support library and the preset rule library are updated.
[0091] Figure 8 is a schematic diagram of a user profile building apparatus according to an embodiment of this application, comprising:
[0092] The acquisition module 802 is used to acquire raw data generated in the process industry and perform data recognition on the raw data to obtain initial profile elements, wherein the initial profile elements include: entities, entity relationships and entity attributes.
[0093] The extraction module 804 is used to extract a first type of profile element from the initial profile elements. The first type of profile element is a profile element in the initial profile elements that satisfies the first type of rule. The first type of rule is used to indicate at least the type range corresponding to each type of profile element.
[0094] The processing module 806 is used to standardize the first type of profile element according to the second type of rule to obtain the target profile elements. The second type of rule is used to provide semantic specification templates corresponding to the various types of profile elements.
[0095] The determination module 808 is used to determine the target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
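The work of the extraction module and the processing module can be sketched as two small functions: one filters the initial profile elements by the allowed type ranges (first type of rule), the other applies a synonym template and marks key attributes (second type of rule). All names, the data layout, and the rule contents are hypothetical. The sketch also assumes that a relationship is kept only if both of its endpoints are kept, which the text does not state.

```python
def extract_first_type(initial, allowed_entity_types, allowed_relation_types):
    """Keep only entities and relationships whose types fall in the allowed ranges."""
    entities = {name: e for name, e in initial["entities"].items()
                if e["type"] in allowed_entity_types}
    relations = [(s, r, t) for s, r, t in initial["relations"]
                 if r in allowed_relation_types
                 and s in entities and t in entities]  # assumed endpoint check
    return {"entities": entities, "relations": relations}


def standardize(first, synonyms, key_attr_names):
    """Apply a synonym template and annotate key attributes."""
    entities = {}
    for name, e in first["entities"].items():
        entities[name] = {
            "type": synonyms.get(e["type"], e["type"]),
            "attrs": dict(e["attrs"]),
            "key_attrs": sorted(a for a in e["attrs"] if a in key_attr_names),
        }
    relations = [(s, synonyms.get(r, r), t) for s, r, t in first["relations"]]
    return {"entities": entities, "relations": relations}
```

The output of `standardize` corresponds to the target profile elements that the determination module then merges into the existing user profile.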
[0096] It should be noted that the user profile building apparatus shown in Figure 8 is used to perform the user profile construction method shown in Figure 2. Therefore, the explanations and descriptions of the user profile construction method shown in Figure 2 also apply to the user profile building apparatus and are not repeated here.
[0097] It should be noted that the modules in the above-mentioned user profile construction device can be program modules (e.g., a set of program instructions to implement a certain function) or hardware modules. For the latter, they can be represented in the following forms, but are not limited to these: each of the above modules is represented by a processor, or the functions of each of the above modules are implemented by a processor.
[0098] This application also provides a non-volatile storage medium, which includes a stored program, wherein, during program execution, the device where the non-volatile storage medium is located executes the user profile construction method of any of the above embodiments.
[0099] This application also provides an electronic device, which includes a processor for running a program, wherein the user profile construction method of any of the above embodiments is executed when the program is running.
[0100] According to another aspect of the embodiments of this application, a computer program product is also provided, including a computer program that, when executed by a processor, implements the user profile construction method of any of the above embodiments.
[0101] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0102] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection between units or modules may be electrical or other forms.
[0103] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0104] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0105] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to related technologies, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.
[0106] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A method for constructing a user profile, characterized in that it comprises: acquiring raw data generated in the process industry, and performing data recognition on the raw data to obtain initial profile elements, wherein the initial profile elements include entities, entity relationships, and entity attributes; extracting a first type of profile element from the initial profile elements, wherein the first type of profile element is a profile element in the initial profile elements that satisfies a first type of rule, and the first type of rule is used to indicate at least the type range corresponding to each type of profile element; standardizing the first type of profile element according to a second type of rule to obtain target profile elements, wherein the second type of rule is used to provide semantic specification templates corresponding to each type of profile element; and determining a target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
2. The method according to claim 1, characterized in that performing data recognition on the raw data to obtain initial profile elements comprises: classifying the raw data according to data type to obtain data of multiple data types, wherein the multiple data types include at least text and tables, and the raw data includes user information files, user dialogue data, and operation data generated in the process industry; performing data recognition on the data of each data type to obtain the profile elements corresponding to each data type; and determining the set of profile elements corresponding to the data types as the initial profile elements.
3. The method according to claim 2, characterized in that the method further comprises: when the data type is text, performing data recognition on the text-type data in the following manner to obtain the profile elements corresponding to the text-type data: obtaining a profile support library, wherein the profile support library is used to store entities, entity relationships, and entity attributes contained in an industrial domain knowledge base or in existing user profiles in the process industry, and the industrial domain knowledge base is a database containing multiple preset entity types, preset relationship types, and preset attribute types; and, under the constraints of the profile support library, performing data recognition on the text-type data by means of a text semantic recognition model to obtain the profile elements corresponding to the text-type data.
4. The method according to claim 3, characterized in that the method further comprises: when the data type is a table, performing data recognition on the table-type data in the following manner to obtain the profile elements corresponding to the table-type data: determining the header content of the table-type data as first-type table data, and determining the body content of the table-type data as second-type table data; under the constraints of the profile support library, performing data recognition on the first-type table data by means of a table semantic recognition model to obtain a first table profile element, wherein the first table profile element includes table entity types and table entity relationship types; performing data recognition on the second-type table data by means of the table semantic recognition model to obtain a second table profile element, wherein the second table profile element includes the entities corresponding to each table entity type in the first table profile element and the entity attributes of those entities; and determining, based on the table entity relationship types in the first table profile element, the entity relationships of each entity in the second table profile element, and determining the second table profile element supplemented with the entity relationships of each entity as the profile elements corresponding to the table-type data.
5. The method according to claim 1, characterized in that extracting the first type of profile element from the initial profile elements comprises: retrieving the first type of rule stored in a preset rule base; determining a target entity type according to the entity type range limitation rule in the first type of rule, wherein the target entity type is the type, as defined in the entity type range limitation rule, of the entities used to construct the user profile; determining a target entity relationship type according to the entity relationship type range limitation rule in the first type of rule, wherein the target entity relationship type is the type, as defined in the entity relationship type range limitation rule, of the entity relationships used to construct the user profile; extracting entities of the target entity type from the initial profile elements to obtain target entities, and extracting the entity attributes corresponding to the target entities; extracting entity relationships of the target entity relationship type from the initial profile elements to obtain target entity relationships; and determining the set of all target entities, their corresponding target entity relationships, and their entity attributes as the first type of profile element.
6. The method according to claim 1, characterized in that standardizing the first type of profile element according to the second type of rule to obtain the target profile elements comprises: retrieving the second type of rule stored in a preset rule base; and standardizing the first type of profile element by means of semantic standardization and attribute annotation to obtain the target profile elements, namely: performing semantic standardization on the entities and entity relationships in the first type of profile element based on the semantic standardization rules in the second type of rule to obtain a third type of profile element, wherein the semantic standardization rules include semantic standardization templates corresponding to each type of profile element, and the semantic standardization templates include at least synonym standardization templates; and performing attribute annotation on the entity attributes in the third type of profile element according to the attribute annotation rules in the second type of rule to obtain the target profile elements, wherein the attribute annotation rules are used to define the entity attribute names that need to be annotated as key attributes.
7. The method according to claim 1, characterized in that determining the target user profile based on the entities, entity relationships, and entity attributes in the target profile elements comprises: obtaining a vector library, wherein the vector library is used to store vector values corresponding to the entities, entity relationships, and entity attributes contained in existing user profiles in the process industry; identifying each entity in the target profile elements as a newly added entity; for each newly added entity, if an overlapping entity exists, merging the entity attributes and entity relationships corresponding to the newly added entity with the entity attributes and entity relationships of the overlapping entity in the existing user profile to obtain a merged existing user profile; and adding each newly added entity that does not have an overlapping entity to the merged existing user profile as a new entity, and updating the entity relationships and entity attributes of the new entity according to the target profile elements to obtain the target user profile.
8. The method according to claim 7, characterized in that the method further comprises determining whether any newly added entity has an overlapping entity in the following manner: vectorizing a first entity to obtain a target vector value corresponding to the first entity, wherein the first entity is any newly added entity in the target profile elements; calculating the similarity between the target vector value and the vector value corresponding to each entity in the vector library to obtain multiple similarities; determining the largest of the multiple similarities as the target similarity, and determining the entity to which the vector value in the vector library used to calculate the target similarity belongs as the similar entity corresponding to the first entity; if the first entity and the similar entity satisfy a third type of rule, determining the similar entity as the overlapping entity of the first entity, wherein the third type of rule includes multiple sub-rules, and each sub-rule corresponds to a criterion for judging the degree of overlap between the first entity and the similar entity in one dimension; if the first entity and the similar entity do not satisfy the third type of rule, sending a merge request message to a target user, wherein the merge request message is used to ask whether to merge the first entity and the similar entity; if a merge confirmation message corresponding to the merge request message is received and the merge confirmation message indicates that the first entity and the similar entity are to be merged, determining the similar entity as the overlapping entity of the first entity; and if the merge confirmation message indicates that the first entity and the similar entity are not to be merged, determining that the first entity does not have an overlapping entity.
9. The method according to claim 1, characterized in that the method further comprises: updating a profile support library and a vector library based on the target profile elements to obtain an updated profile support library and an updated vector library, wherein the profile support library is used to store the entities, entity relationships, and entity attributes contained in existing user profiles in the process industry, and the vector library is used to store the vector values corresponding to the entities, entity relationships, and entity attributes contained in existing user profiles in the process industry.
10. A user profile construction apparatus, characterized in that it comprises: an acquisition module, used to acquire raw data generated in the process industry and perform data recognition on the raw data to obtain initial profile elements, wherein the initial profile elements include entities, entity relationships, and entity attributes; an extraction module, used to extract a first type of profile element from the initial profile elements, wherein the first type of profile element is a profile element in the initial profile elements that satisfies a first type of rule, and the first type of rule is used to indicate at least the type range corresponding to each type of profile element; a processing module, used to standardize the first type of profile element according to a second type of rule to obtain target profile elements, wherein the second type of rule is used to provide semantic specification templates corresponding to the various types of profile elements; and a determination module, used to determine a target user profile based on the entities, entity relationships, and entity attributes in the target profile elements.
11. A non-volatile storage medium, characterized in that, The non-volatile storage medium stores a program, wherein when the program is executed, it controls the device where the non-volatile storage medium is located to execute the user profile construction method according to any one of claims 1 to 9.
12. An electronic device, characterized in that it comprises: a memory and a processor, the processor being configured to run a program stored in the memory, wherein the program, when running, executes the method for constructing a user profile according to any one of claims 1 to 9.
13. A computer program product comprising computer instructions, characterized in that, when the computer instructions are executed by a processor, they implement the user profile construction method according to any one of claims 1 to 9.