[0057] Example 1
[0058] This embodiment provides a multimodal knowledge graph construction and retrieval system. As shown in Figure 1, the system includes a knowledge data acquisition and processing unit, a knowledge graph construction management unit, and a knowledge graph application service unit, connected in cascade;
[0059] The knowledge data acquisition and processing unit is used to collect and transmit data and includes a multimodal data collection unit;
[0060] The knowledge graph construction management unit is used for constructing the knowledge graph and managing its updates. Constructing the knowledge graph includes building the ontology according to business needs, completing knowledge fusion according to the data content and the ontology structure, associating the labeled data with the ontology, and completing the construction of the knowledge graph model;
[0061] The knowledge graph application service unit includes a knowledge retrieval unit, a knowledge association and recommendation unit, and a knowledge question-answering unit;
[0062] As shown in Figure 2, knowledge fusion is performed through the following steps:
[0063] Step S1: calculate the real-time change rate of each type of multimodal data and, according to a predefined change-rate threshold, divide the data into high-speed update data $P_n$ and slow-update data $P_r$. For the high-speed update data $P_n$, call the fast-varying data estimation fusion program to estimate and fuse the data; for the slow-update data $P_r$, directly call the slow-varying data processing fusion program for calculation and fusion, and directly calculate the change value of the slow-update data $P_r$;
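Step S1 can be sketched as follows. This is a minimal illustration under stated assumptions: each multimodal source is reduced to a hypothetical `Stream` of numeric samples with timestamps, and the change rate is assumed to be the mean absolute change per second, since the text does not specify how the rate is measured. The names `Stream`, `change_rate`, and `split_by_rate` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Hypothetical container for one multimodal data source."""
    name: str
    values: list = field(default_factory=list)      # time-ordered numeric samples
    timestamps: list = field(default_factory=list)  # sample times in seconds


def change_rate(s: Stream) -> float:
    """Assumed change rate: mean absolute change per second over the window."""
    if len(s.values) < 2:
        return 0.0
    span = s.timestamps[-1] - s.timestamps[0]
    if span <= 0:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(s.values, s.values[1:]))
    return total / span


def split_by_rate(streams, threshold):
    """Divide sources into high-speed update data P_n and slow-update data P_r."""
    p_n, p_r = [], []
    for s in streams:
        (p_n if change_rate(s) > threshold else p_r).append(s.name)
    return p_n, p_r


streams = [
    Stream("sensor_video", [0.0, 5.0, 10.0], [0.0, 1.0, 2.0]),  # rate 5.0
    Stream("manual_text", [1.0, 1.0, 2.0], [0.0, 10.0, 20.0]),  # rate 0.05
]
print(split_by_rate(streams, threshold=1.0))  # (['sensor_video'], ['manual_text'])
```

In this reading, the first list would be handed to the fast-varying estimation fusion program and the second to the slow-varying processing fusion program.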
[0064] Step S2: if the change value estimated by the fast-varying data estimation fusion program exceeds a predefined threshold, or the change value of the slow-update data $P_r$ exceeds a predefined threshold, call at least two knowledge graph construction models to complete the construction of the knowledge graph model;
[0065] Step S3: combine the knowledge graph models constructed in step S2 into a final knowledge graph model according to a predefined voting strategy.
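A minimal sketch of steps S2 and S3, assuming each construction model outputs a set of (subject, relation, object) triples and that the "predefined voting strategy" is a strict majority vote; the text does not name the strategy, and `needs_rebuild` and `majority_vote` are hypothetical names.

```python
from collections import Counter


def needs_rebuild(fast_change, slow_change, fast_threshold, slow_threshold):
    """Step S2 trigger: rebuild if either estimated change exceeds its threshold."""
    return fast_change > fast_threshold or slow_change > slow_threshold


def majority_vote(model_outputs):
    """Step S3 under an assumed strategy: keep triples proposed by a strict majority of models."""
    if len(model_outputs) < 2:
        raise ValueError("step S2 requires at least two construction models")
    needed = len(model_outputs) // 2 + 1
    # set() so that a model repeating a triple still casts only one vote
    counts = Counter(t for triples in model_outputs for t in set(triples))
    return {t for t, c in counts.items() if c >= needed}


m1 = [("aspirin", "treats", "headache"), ("aspirin", "is_a", "drug")]
m2 = [("aspirin", "treats", "headache")]
m3 = [("aspirin", "treats", "headache"), ("aspirin", "is_a", "food")]
if needs_rebuild(fast_change=0.2, slow_change=0.9, fast_threshold=0.5, slow_threshold=0.5):
    print(majority_vote([m1, m2, m3]))  # {('aspirin', 'treats', 'headache')}
```

A weighted vote (for example by each model's historical accuracy) would fit the same interface; the strict majority is only the simplest choice.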
[0066] In this embodiment, when multimodal data serve as the raw data for constructing the knowledge graph, the change rate is used to decide how each piece of data is processed, which improves real-time performance and efficiency. Specifically, the real-time change rate of each type of multimodal data is calculated and, according to the predefined change-rate threshold, the data are divided into high-speed update data $P_n$ and slow-update data $P_r$. The high-speed update data $P_n$ are estimated and fused by the fast-varying data estimation fusion program, while the slow-update data $P_r$ are calculated and fused directly by the slow-varying data processing fusion program and their change value is calculated directly. In this way, the knowledge graph is constructed efficiently.
[0067] Specifically, the multimodal data collection unit includes a text data collection unit, an image data collection unit, an audio data collection unit, and a video data collection unit.
[0068] Preferably, to fit, estimate, and fuse fast-varying data efficiently, this embodiment performs data estimation and fusion with a dedicated fast-varying data estimation fusion program. Existing data fusion methods can of course also be used. The method of this embodiment includes:
[0069] Step R1: define …, where $\{x_1, x_2, \ldots, x_k, \ldots, x_K\}$ are the observed values of $K$ independent data samples drawn from the historical high-speed update data, $k = 1, 2, 3, \ldots, K$, $j$ and $w$ are predefined parameters, and $w_1, w_2, \ldots, w_k$ is a set of real numbers;
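The defining formula of step R1 did not survive in this text. Steps R2 to R4 closely resemble Koutrouvelis' regression-type estimator for stable distributions, in which the observations are summarized by their empirical characteristic function, so the sketch below assumes that R1 defines $\hat{\varphi}(w_k) = \frac{1}{K}\sum_{i=1}^{K} e^{j w_k x_i}$ with $j$ taken as the imaginary unit. Both the formula and the function name `empirical_cf` are assumptions, and the sample values are made up.

```python
import cmath


def empirical_cf(samples, w_grid):
    """Assumed R1: phi_hat(w_k) = (1/K) * sum_i exp(j * w_k * x_i)."""
    K = len(samples)
    return [sum(cmath.exp(1j * w * x) for x in samples) / K for w in w_grid]


history = [0.3, -1.2, 0.8, 2.5, -0.4]   # x_1 .. x_K from the P_n history (illustrative)
w_grid = [0.2, 0.5, 1.0, 1.5]           # w_1 .. w_k, nonzero real numbers
phi_hat = empirical_cf(history, w_grid)
print([round(abs(p), 3) for p in phi_hat])  # moduli lie in [0, 1]
```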
[0070] Step R2: calculate the characteristic index $\alpha$ and the dispersion coefficient $\gamma$ from $y_k = \mu + \alpha t_k + \varepsilon_k$ with $\mu = \log(2\gamma)$, where the $\varepsilon_k$ are predefined, independent, identically distributed error terms with mean 0, $t_k = \log|w_k|$, and $K$ is the number of historical samples;
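Step R2 is an ordinary least-squares fit of a straight line in $t_k$. The sketch assumes, as in Koutrouvelis' regression estimator for stable distributions, that the response is $y_k = \log(-\log|\hat{\varphi}(w_k)|^2)$, where $\hat{\varphi}$ is the empirical characteristic function; the text does not give $y_k$ explicitly. Under that assumption the slope is $\alpha$ and the intercept is $\mu = \log(2\gamma)$, so $\gamma = e^{\mu}/2$. The function name `fit_alpha_gamma` is hypothetical.

```python
import math


def fit_alpha_gamma(phi_hat, w_grid):
    """Least-squares fit of y_k = mu + alpha * t_k; returns (alpha, gamma)."""
    t = [math.log(abs(w)) for w in w_grid]                    # t_k = log|w_k|
    y = [math.log(-math.log(abs(p) ** 2)) for p in phi_hat]   # assumed y_k
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    alpha = sxy / sxx                 # slope
    mu = y_bar - alpha * t_bar        # intercept, mu = log(2 * gamma)
    return alpha, math.exp(mu) / 2.0


# Self-check on an exact symmetric stable characteristic function |phi| = exp(-gamma*|w|**alpha)
w_grid = [0.2, 0.5, 1.0, 1.5]
phi = [math.exp(-0.8 * abs(w) ** 1.5) for w in w_grid]
alpha, gamma = fit_alpha_gamma(phi, w_grid)
print(round(alpha, 6), round(gamma, 6))  # 1.5 0.8
```

The fit requires every $w_k \neq 0$ and $0 < |\hat{\varphi}(w_k)| < 1$; otherwise a logarithm is undefined.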
[0071] Step R3: calculate the location parameter $\delta$ from $z_k = \delta w_k + \varepsilon_k$, where $z_k = \arctan\left(\operatorname{Im}(w_k)/\operatorname{Re}(w_k)\right)$ and the $\varepsilon_k$ are predefined, independent, identically distributed error terms with mean 0;
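For step R3, the text writes $z_k = \arctan(\operatorname{Im}(w_k)/\operatorname{Re}(w_k))$; because $w_k$ is real, the sketch assumes the arctangent is meant to be applied to the empirical characteristic function $\hat{\varphi}(w_k)$, so that $z_k$ is its phase, as in the standard regression estimator. It uses `math.atan2` rather than a plain arctangent of the ratio so the correct quadrant is kept, and fits $z_k = \delta w_k$ by least squares through the origin. The function name `fit_delta` is hypothetical.

```python
import cmath
import math


def fit_delta(phi_hat, w_grid):
    """Least-squares fit of z_k = delta * w_k (no intercept); returns delta."""
    z = [math.atan2(p.imag, p.real) for p in phi_hat]   # assumed z_k = arg(phi_hat(w_k))
    return sum(w * zk for w, zk in zip(w_grid, z)) / sum(w * w for w in w_grid)


# Self-check on phi(w) = exp(j*delta*w - gamma*|w|**alpha) with delta = 0.7
w_grid = [0.2, 0.5, 1.0, 1.5]
phi = [cmath.exp(1j * 0.7 * w - 0.8 * abs(w) ** 1.5) for w in w_grid]
print(round(fit_delta(phi, w_grid), 6))  # 0.7
```

The grid must keep $|\delta w_k| < \pi$; beyond that the phase wraps around and the linear model no longer holds.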
[0072] Step R4: substitute the characteristic index $\alpha$, dispersion coefficient $\gamma$, and location parameter $\delta$ obtained in steps R2 and R3 into $\varphi(w) = \exp\{j\delta w - \gamma |w|^{\alpha}\}$ and apply the inverse Fourier transform to obtain the probability density function $f(x)$, completing the fitted estimation and fusion of the high-speed update data $P_n$.
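Step R4 can be illustrated numerically. Because $e^{-\gamma|w|^{\alpha}}$ is even in $w$, the inversion $f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi(w)\, e^{-jwx}\, dw$ reduces to $f(x) = \frac{1}{\pi}\int_{0}^{\infty} e^{-\gamma w^{\alpha}} \cos\big(w(x-\delta)\big)\, dw$. The sketch evaluates that integral with the trapezoidal rule on a truncated range; the truncation point `w_max` and step count `n` are arbitrary choices, not values from the text, and must be large enough that $e^{-\gamma w_{\max}^{\alpha}}$ is negligible.

```python
import math


def stable_pdf(x, alpha, gamma, delta, w_max=40.0, n=4000):
    """f(x) = (1/pi) * integral_0^w_max exp(-gamma*w**alpha) * cos(w*(x - delta)) dw, trapezoidal rule."""
    h = w_max / n
    total = 0.0
    for i in range(n + 1):
        w = i * h
        val = math.exp(-gamma * w ** alpha) * math.cos(w * (x - delta))
        total += val / 2.0 if i in (0, n) else val   # half weight at both end points
    return total * h / math.pi


# alpha = 2, gamma = 0.5, delta = 0 is the standard normal distribution
print(round(stable_pdf(0.0, 2.0, 0.5, 0.0), 5))  # 0.39894
```

For small $\alpha$ the integrand decays slowly, so `w_max` (and `n`) would need to grow accordingly.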
[0073] Specifically, invoking the fast-varying data estimation fusion program to perform data estimation and fusion further includes:
[0074] Step R5: determine … as the index of whether the change value estimated by the fast-varying data estimation fusion program exceeds the predefined threshold $T_{\max}$, where $A$ is the data value to be estimated and fused in real time, … is the parameter estimated from the historical high-speed update data samples, and $T_{\max}$ is the detection threshold corresponding to the predefined fusion rate.
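The index used in step R5 is missing from this text. Purely as an illustration, the sketch assumes it is the absolute deviation between the real-time value $A$ and a parameter estimated from the historical samples (for example the location parameter $\delta$ from step R3); `exceeds_threshold` and `theta_hat` are hypothetical names.

```python
def exceeds_threshold(a, theta_hat, t_max):
    """Assumed R5 index |A - theta_hat|, compared with the detection threshold T_max."""
    return abs(a - theta_hat) > t_max


readings = [0.72, 0.69, 1.95]   # real-time values A (illustrative)
theta_hat = 0.7                 # e.g. location estimate from the historical P_n samples
print([exceeds_threshold(a, theta_hat, t_max=0.5) for a in readings])  # [False, False, True]
```

In this reading, a `True` result is what would trigger the multi-model construction of step S2.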
[0075] Preferably, to prevent degradation of the real-time system and inefficiency caused by failures, the system includes multiple knowledge data acquisition and processing units, multiple knowledge graph construction management units, and multiple knowledge graph application service units, and the following steps are performed:
[0076] Step A1: select several knowledge data acquisition and processing units, knowledge graph construction management units, and knowledge graph application service units to form a real-time system;
[0077] Step A2: for any two adjacent levels, define the units of the preceding level as primary units and the units of the following level as secondary units;
[0078] Step A3: define the real-time system performance model as $H = H_1 \cdot H_2 \cdot H_3 \cdot H_4 \cdot H_5$, where $H_1$ is the effectiveness, $H_2$ the processing efficiency, $H_3$ the system load rate, $H_4$ the data processing accuracy, and $H_5$ the system failure rate;
[0079] Step A4: $H_4$ is predefined, $H_5$ is the real-time system failure rate calculated from historical records, and $H_2$ and $H_3$ are calculated with the following formulas: $H_2 = P H_{21} + (1-P) H_{21} H_{22}$, $H_3 = (N H_{31} + M H_{32})/(N+M)$;
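The formulas of steps A3 and A4 can be evaluated directly. The sketch computes $H_2$, computes the load rate $H_3$ as a unit-count-weighted average $(N H_{31} + M H_{32})/(N+M)$ (weighting the secondary load factor by the number of secondary units $M$), and multiplies the five factors into $H$. The function names are hypothetical and all numeric inputs are made up.

```python
import math


def processing_efficiency(p, h21, h22):
    """H2 = P*H21 + (1 - P)*H21*H22."""
    return p * h21 + (1 - p) * h21 * h22


def load_rate(n_primary, m_secondary, h31, h32):
    """H3 = (N*H31 + M*H32) / (N + M), a count-weighted average of load factors."""
    return (n_primary * h31 + m_secondary * h32) / (n_primary + m_secondary)


def performance(h1, h2, h3, h4, h5):
    """H = H1 * H2 * H3 * H4 * H5."""
    return math.prod([h1, h2, h3, h4, h5])


h2 = processing_efficiency(p=0.6, h21=0.9, h22=0.8)   # approximately 0.828
h3 = load_rate(2, 3, h31=0.5, h32=0.7)                 # approximately 0.62
print(round(performance(1.0, h2, h3, h4=0.95, h5=0.02), 6))
```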
[0080] Here, $W = P W_1 + (1-P)(W_1 + W_2)$ and $T = P T_1 + (1-P)(T_1 + T_2)$; $t$ is the total time the data spend in the primary unit and the secondary unit; $P$ is the predefined probability that data enter the secondary unit from the primary unit; $H_{21}$ is the processing efficiency of the primary unit, $H_{22}$ the processing efficiency of the secondary unit, $H_{31}$ the load factor of the primary unit, and $H_{32}$ the load factor of the secondary unit; $N$ is the number of primary units and $M$ the number of secondary units; $R$ is an integer; $P_R$ is obtained from the predefined average data volume of the primary units and $Q_R$ from the predefined average data volume of the secondary units; $W_1 = L_1/\lambda$ is the average response time of primary-unit data and $W_2 = L_2/(\lambda H_{21} P)$ is the response time of the secondary unit; $T_1 = 1/\mu_1$ is the average service time of primary-unit data and $T_2 = 1/\mu_2$ that of secondary-unit data; $\mu_1$ and $\mu_2$ are the parameters of the exponential distributions, and $\lambda$ is the predefined Poisson parameter;
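The mean response and service times of step A4 combine Little's law ($W_1 = L_1/\lambda$) with exponential service times ($T_i = 1/\mu_i$). In this sketch $W_2$ is passed in rather than computed, since its formula is hard to read unambiguously in this text, and $L_1$ is taken to be the average number of data items at the primary unit, which the text does not define explicitly. In both $W$ and $T$ the branch weighted by $1-P$ is the one that includes the secondary unit. The function name `mean_times` is hypothetical.

```python
def mean_times(p, lam, l1, w2, mu1, mu2):
    """Return (W, T) for the primary/secondary two-level chain."""
    w1 = l1 / lam               # Little's law: mean response time at the primary unit
    t1 = 1.0 / mu1              # mean of an exponential service time with rate mu1
    t2 = 1.0 / mu2              # mean of an exponential service time with rate mu2
    w = p * w1 + (1 - p) * (w1 + w2)
    t = p * t1 + (1 - p) * (t1 + t2)
    return w, t


W, T = mean_times(p=0.7, lam=2.0, l1=4.0, w2=1.5, mu1=4.0, mu2=2.0)
print(round(W, 4), round(T, 4))  # 2.45 0.4
```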
[0081] Step A5: calculate the overall performance value of the real-time system and compare it with a predefined threshold; if it is greater than the threshold, return to step A1 and reselect units to form a new real-time system.
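Steps A1 to A5 form a select, evaluate, reselect loop. The sketch assumes a caller-supplied `evaluate` function that returns the performance value $H$ for a candidate system, picks `k` units per cascaded level at random, and follows the text literally in reselecting while $H$ exceeds the threshold. The `max_rounds` cap is an addition so the loop always terminates; `form_system` and its parameters are hypothetical.

```python
import random


def form_system(levels, evaluate, threshold, k=2, max_rounds=100, seed=0):
    """levels: list of unit-ID lists, one per cascaded level (acquisition, construction, service)."""
    rng = random.Random(seed)
    for _ in range(max_rounds):
        system = [rng.sample(units, k) for units in levels]   # A1: select units per level
        h = evaluate(system)                                  # A3/A4: performance value H
        if h <= threshold:                                    # A5: accept unless H exceeds threshold
            return system, h
    raise RuntimeError("no acceptable system found within max_rounds")


levels = [["acq1", "acq2", "acq3"], ["kg1", "kg2"], ["svc1", "svc2", "svc3"]]
system, h = form_system(levels, evaluate=lambda s: 0.4, threshold=0.5)
print(system, h)
```

Within `evaluate`, each pair of adjacent levels would play the primary and secondary roles of step A2.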
[0082] This embodiment also provides a method for constructing and retrieving a multimodal knowledge graph, the method comprising:
[0083] Step 1: the multimodal data collection unit collects knowledge data and preprocesses it: it classifies the data by category, establishes data identifiers, and generates standard data records; it then checks whether each record already exists in the knowledge graph database and, if so, obtains the index of its identifier, or otherwise stores it;
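Step 1 can be sketched as follows, assuming identifiers are derived from a content hash and the knowledge graph database is stood in for by an in-memory dictionary plus list. `StandardRecord`, `to_record`, and `lookup_or_store` are hypothetical names; a real deployment would use its own storage layer.

```python
import hashlib
from dataclasses import dataclass

CATEGORIES = {"text", "image", "audio", "video"}


@dataclass(frozen=True)
class StandardRecord:
    identifier: str
    category: str
    payload: bytes


def to_record(category: str, payload: bytes) -> StandardRecord:
    """Classify and wrap raw data into a standard record with a content-hash identifier."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown data category: {category}")
    digest = hashlib.sha256(category.encode() + b":" + payload).hexdigest()
    return StandardRecord(digest, category, payload)


def lookup_or_store(index: dict, store: list, record: StandardRecord):
    """Return (position, created): the existing index if known, else store the record."""
    if record.identifier in index:
        return index[record.identifier], False
    index[record.identifier] = len(store)
    store.append(record)
    return index[record.identifier], True


index, store = {}, []
r = to_record("text", b"aspirin treats headache")
print(lookup_or_store(index, store, r))  # (0, True)
print(lookup_or_store(index, store, r))  # (0, False)
```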
[0084] Step 2: build an ontology according to business needs and establish the mapping between the standard data records and the ontology, completing the preliminary construction of the knowledge graph model;
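The mapping of step 2 can be pictured as filtering standard records through a business ontology. In this sketch the ontology is a hypothetical dictionary from class names to permitted properties, records are plain dictionaries, and only properties declared for a record's class become triples; this is one simple way to realize the mapping, not the method prescribed by the text.

```python
# Hypothetical business ontology: class -> {property: range class}
ONTOLOGY = {
    "Drug": {"treats": "Disease", "manufacturedBy": "Company"},
    "Disease": {"hasSymptom": "Symptom"},
}


def map_records(records):
    """Map standard records onto the ontology, producing (subject, predicate, object) triples."""
    triples = []
    for rec in records:
        cls = rec.get("class")
        if cls not in ONTOLOGY:
            continue                                   # record does not fit the ontology
        triples.append((rec["id"], "rdf:type", cls))
        for prop, value in rec.get("properties", {}).items():
            if prop in ONTOLOGY[cls]:                  # keep only declared properties
                triples.append((rec["id"], prop, value))
    return triples


records = [
    {"id": "aspirin", "class": "Drug", "properties": {"treats": "headache", "color": "white"}},
    {"id": "x42", "class": "Unknown"},
]
print(map_records(records))
# [('aspirin', 'rdf:type', 'Drug'), ('aspirin', 'treats', 'headache')]
```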
[0085] Step 3: complete the knowledge fusion process according to the data content and the ontology structure, and update the knowledge graph, including:
[0086] Step S1: calculate the real-time change rate of each type of multimodal data and, according to a predefined change-rate threshold, divide the data into high-speed update data $P_n$ and slow-update data $P_r$. For the high-speed update data $P_n$, call the fast-varying data estimation fusion program to estimate and fuse the data; for the slow-update data $P_r$, directly call the slow-varying data processing fusion program for calculation and fusion, and directly calculate the change value of the slow-update data $P_r$;
[0087] Step S2: if the change value estimated by the fast-varying data estimation fusion program exceeds a predefined threshold, or the change value of the slow-update data $P_r$ exceeds a predefined threshold, call at least two knowledge graph construction models to complete the construction of the knowledge graph model;
[0088] Step S3: combine the knowledge graph models constructed in step S2 into the final knowledge graph model according to the predefined voting strategy;
[0089] Step 4: the knowledge graph application service unit invokes the knowledge graph according to business requirements to help complete the business tasks.
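For step 4, a knowledge retrieval request can be modeled as a triple pattern in which unspecified positions act as wildcards. The sketch is an in-memory illustration with a hypothetical `retrieve` function; a production system would typically issue an equivalent query against its graph store.

```python
def retrieve(triples, s=None, p=None, o=None):
    """Return triples matching the pattern (s, p, o); None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


kg = [
    ("aspirin", "rdf:type", "Drug"),
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
]
print(retrieve(kg, p="treats", o="headache"))
# [('aspirin', 'treats', 'headache'), ('ibuprofen', 'treats', 'headache')]
```

With only `s` fixed, the same call lists everything known about one entity, which is a natural starting point for the knowledge association and recommendation unit.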
[0090] Preferably, building on conventional data fusion methods, the fast-varying data estimation fusion program of this embodiment includes:
[0091] Step R1: define …, where $\{x_1, x_2, \ldots, x_k, \ldots, x_K\}$ are the observed values of $K$ independent data samples drawn from the historical high-speed update data, $k = 1, 2, 3, \ldots, K$, $j$ and $w$ are predefined parameters, and $w_1, w_2, \ldots, w_k$ is a set of real numbers;
[0092] Step R2: calculate the characteristic index $\alpha$ and the dispersion coefficient $\gamma$ from $y_k = \mu + \alpha t_k + \varepsilon_k$ with $\mu = \log(2\gamma)$, where the $\varepsilon_k$ are predefined, independent, identically distributed error terms with mean 0, $t_k = \log|w_k|$, and $K$ is the number of historical samples;
[0093] Step R3: calculate the location parameter $\delta$ from $z_k = \delta w_k + \varepsilon_k$, where $z_k = \arctan\left(\operatorname{Im}(w_k)/\operatorname{Re}(w_k)\right)$ and the $\varepsilon_k$ are predefined, independent, identically distributed error terms with mean 0;
[0094] Step R4: substitute the characteristic index $\alpha$, dispersion coefficient $\gamma$, and location parameter $\delta$ obtained in steps R2 and R3 into $\varphi(w) = \exp\{j\delta w - \gamma |w|^{\alpha}\}$ and apply the inverse Fourier transform to obtain the probability density function $f(x)$, completing the fitted estimation and fusion of the high-speed update data $P_n$.
[0095] Preferably, invoking the fast-varying data estimation fusion program to perform data estimation and fusion further includes:
[0096] Step R5: determine … as the index of whether the change value estimated by the fast-varying data estimation fusion program exceeds the predefined threshold $T_{\max}$, where $A$ is the data value to be estimated and fused in real time, … is the parameter estimated from the historical high-speed update data samples, and $T_{\max}$ is the detection threshold corresponding to the predefined fusion rate.
[0097] Preferably, to improve the real-time efficiency of the system and prevent failures or degradation of system functions, the multimodal knowledge graph construction and retrieval method further includes:
[0098] Step A1: select several knowledge data acquisition and processing units, knowledge graph construction management units, and knowledge graph application service units to form a real-time system;
[0099] Step A2: for any two adjacent levels, define the units of the preceding level as primary units and the units of the following level as secondary units;
[0100] Step A3: define the real-time system performance model as $H = H_1 \cdot H_2 \cdot H_3 \cdot H_4 \cdot H_5$, where $H_1$ is the effectiveness, $H_2$ the processing efficiency, $H_3$ the system load rate, $H_4$ the data processing accuracy, and $H_5$ the system failure rate;
[0101] Step A4: $H_4$ is predefined, $H_5$ is the real-time system failure rate calculated from historical records, and $H_2$ and $H_3$ are calculated with the following formulas: $H_2 = P H_{21} + (1-P) H_{21} H_{22}$, $H_3 = (N H_{31} + M H_{32})/(N+M)$;
[0102] Here, $W = P W_1 + (1-P)(W_1 + W_2)$ and $T = P T_1 + (1-P)(T_1 + T_2)$; $t$ is the total time the data spend in the primary unit and the secondary unit; $P$ is the predefined probability that data enter the secondary unit from the primary unit; $H_{21}$ is the processing efficiency of the primary unit, $H_{22}$ the processing efficiency of the secondary unit, $H_{31}$ the load factor of the primary unit, and $H_{32}$ the load factor of the secondary unit; $N$ is the number of primary units and $M$ the number of secondary units; $R$ is an integer; $P_R$ is obtained from the predefined average data volume of the primary units and $Q_R$ from the predefined average data volume of the secondary units; $W_1 = L_1/\lambda$ is the average response time of primary-unit data and $W_2 = L_2/(\lambda H_{21} P)$ is the response time of the secondary unit; $T_1 = 1/\mu_1$ is the average service time of primary-unit data and $T_2 = 1/\mu_2$ that of secondary-unit data; $\mu_1$ and $\mu_2$ are the parameters of the exponential distributions, and $\lambda$ is the predefined Poisson parameter;
[0103] Step A5: calculate the overall performance value of the real-time system and compare it with a predefined threshold; if it is greater than the threshold, return to step A1 and reselect units to form a new real-time system.
[0104] In this embodiment, when the knowledge graph is constructed from raw multimodal data, the change rate is used to decide how each piece of data is processed, which improves real-time performance and efficiency. The real-time change rate of each type of multimodal data is calculated and, according to the predefined change-rate threshold, the data are divided into high-speed update data $P_n$ and slow-update data $P_r$. The high-speed update data $P_n$ are estimated and fused by the fast-varying data estimation fusion program, while the slow-update data $P_r$ are calculated and fused directly by the slow-varying data processing fusion program and their change value is calculated directly, so that the knowledge graph is constructed efficiently. For fast-varying data, the estimation fusion algorithm of the present invention achieves fast, high-precision fitting. In addition, the overall efficiency of the system is evaluated in real time and the composition of the system is adjusted accordingly, keeping it effective and efficient.