An artificial intelligence-based flower disease and pest knowledge graph construction system and method
By constructing an AI-based knowledge graph system for flower diseases and pests, the problem of insufficient processing of multi-dimensional dynamic information has been solved, improving the accuracy and response efficiency of disease and pest diagnosis, and making it suitable for risk early warning in greenhouse flower cultivation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YUNNAN HUAWU TECHNOLOGY CO LTD
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-26
AI Technical Summary
Existing knowledge graphs of flower pests and diseases cannot effectively process multidimensional dynamic information during construction, which limits the accuracy and response efficiency of intelligent early warning and diagnosis. In greenhouse flower cultivation in particular, the complexity of photoperiod regulation and microclimate change leads to temporal semantic distortion, which degrades the accuracy of diagnostic models.
An AI-based knowledge graph construction system for flower diseases and pests is adopted. The system uses a data acquisition module to sample multi-source text by domain, a parsing module to extract four-channel time-series elements, an entity recognition module to calculate channel gating values, a relation extraction module to generate periodic vectors, a time-series transformation module to reduce the dimensionality to disease window coordinates, and a knowledge fusion module to establish a graph network to generate a knowledge graph for flower diseases and pests.
By deeply integrating multidimensional evolution information, the accuracy of locating the disease and pest induction window has been improved, the response time of agricultural diagnosis has been shortened, and more reliable risk warning support has been provided.
[Figure: abstract drawing, CN121882218B]
Abstract
Description
Technical Field
[0001] This invention relates to the field of smart agricultural information processing, and more specifically to a system and method for constructing a knowledge graph of flower diseases and pests based on artificial intelligence.
Background Technology
[0002] With the widespread application of artificial intelligence technology in smart agriculture, knowledge graphs have become a crucial underlying support for agricultural pest and disease question answering and assisted diagnosis. Conventional crop pest and disease knowledge graphs typically use static triples of the form disease, symptom, pesticide. When dealing with temporal sequences and triggering factors, time is generally treated as a single-dimensional calendar axis, or environmental conditions are recorded only as supplementary text annotation fields. However, in flower cultivation (especially greenhouse flower cultivation), the occurrence patterns of pests and diseases are highly complex and evolve dynamically.
[0003] Outbreaks of flower diseases and pests are often driven independently along two dimensions. On the one hand, artificial photoperiod regulation directly accelerates the progression of plant phenological stages. On the other hand, routine greenhouse operations (such as concentrated irrigation, micro-spraying, harvesting, and cold-chain transportation) can abruptly alter microclimate conditions (for example, causing a sudden rise in relative humidity and persistent free water on leaf surfaces), thereby directly opening a risk window for pathogen infection or pest reproduction. Consequently, in multi-source corpora on flower plant protection, descriptions of disease and pest occurrence cycles mix calendar time, phenological stages, operational events, and environmental thresholds. Existing knowledge graph construction schemes often compress this multi-dimensional dynamic information into one dimension during extraction, forcibly flattening different cyclical elements onto a single time series. This approach eliminates the independent dimensions that should separate microclimate changes induced by operational events from phenological changes induced by light, and easily leads to misaligned and mismatched cyclical relationships across multi-source texts. When the underlying graph data carries such temporal semantic distortion, the upper-level agricultural diagnostic model cannot accurately locate, during retrieval and reasoning, the actual pre-induction window that leads to a disease outbreak, which greatly limits the accuracy and response efficiency of intelligent early warning and diagnosis of flower diseases and pests.
Summary of the Invention
[0004] This invention provides an artificial intelligence-based knowledge graph construction system and method for flower diseases and pests, which solves the technical problems mentioned in the background art.
[0005] In a first aspect, an artificial intelligence-based knowledge graph construction system for flower diseases and pests comprises:
[0006] The acquisition module is used to capture text from multiple sources and perform domain-specific sampling based on the photoperiod domain and humidity domain to obtain sampled text.
[0007] The parsing module is used to vectorize the sampled text and extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds.
[0008] The entity recognition module is used to calculate channel gating values based on the four-channel time-series elements, and to use the channel gating values to define boundaries and extract graph entities.
[0009] The relation extraction module is used to extract associated data based on the graph entities and generate periodic vectors based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors.
[0010] The time-series transformation module is used to fit the mapping relationship between the operational event and the periodic vector, and to extract the biaxial time-series operator to reduce the dimensionality of the periodic vector to the onset window coordinates.
[0011] The knowledge fusion module is used to align and fuse the onset window coordinates into the representation vector of the graph entity.
[0012] The graph application module is used to establish a graph network based on the onset window coordinates and generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through that knowledge graph.
[0013] In a second aspect, an artificial intelligence-based method for constructing a knowledge graph of flower diseases and pests, applied to any of the aforementioned artificial intelligence-based knowledge graph construction systems for flower diseases and pests, comprises:
[0014] Multi-source text is captured, and sampled text is obtained by performing domain-specific sampling based on the photoperiod domain and humidity domain.
[0015] The sampled text is vectorized to extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds.
[0016] Based on the four-channel time-series elements, channel gating values are calculated, and boundaries are defined using the channel gating values to extract graph entities.
[0017] Based on the graph entities, associated data are extracted, and periodic vectors are generated based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors.
[0018] The mapping relationship between the operational event and the periodic vector is fitted, and the biaxial time-series operator is extracted to reduce the dimensionality of the periodic vector to the onset window coordinates.
[0019] The onset window coordinates are concatenated to the representation vector of the graph entity for alignment and fusion.
[0020] Based on the onset window coordinates, a graph network is established to generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through that knowledge graph.
[0021] The beneficial effects of this invention are as follows. By deeply integrating multidimensional evolutionary information, it addresses the distorted association of multi-source corpora in flower production scenarios and improves the accuracy of locating the disease and pest induction window. By applying dimensionality reduction to photoperiod evolution and agricultural intervention elements, it transforms complex and intertwined temporal information into a highly aligned distribution pattern and enhances the expressive efficiency of the knowledge network in response to environmental fluctuations, thereby shortening the response time of agricultural diagnosis and providing more reliable risk early warning support for greenhouse cultivation.
Attached Figure Description
[0022] Figure 1 is a flowchart of an artificial intelligence-based knowledge graph construction system for flower diseases and pests according to the present invention;
[0023] Figure 2 is a schematic diagram illustrating a specific implementation of the present invention.
Detailed Implementation
[0024] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, features described in some examples may be combined in other examples.
[0025] Example 1: As shown in Figure 1, an artificial intelligence-based knowledge graph construction system for flower diseases and pests comprises:
[0026] The acquisition module is used to capture text from multiple sources and perform domain-specific sampling based on the photoperiod domain and humidity domain to obtain sampled text.
[0027] The parsing module is used to vectorize the sampled text and extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds.
[0028] The entity recognition module is used to calculate channel gating values based on the four-channel time-series elements, and to use the channel gating values to define boundaries and extract graph entities.
[0029] The relation extraction module is used to extract associated data based on the graph entities and generate periodic vectors based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors.
[0030] The time-series transformation module is used to fit the mapping relationship between the operational event and the periodic vector, and to extract the biaxial time-series operator to reduce the dimensionality of the periodic vector to the onset window coordinates.
[0031] The knowledge fusion module is used to align and fuse the onset window coordinates into the representation vector of the graph entity.
[0032] The graph application module is used to establish a graph network based on the onset window coordinates and generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through that knowledge graph.
[0033] Preferably, multi-source text is crawled, and sampled text is obtained by segmenting the text according to the photoperiod domain and the humidity domain, including:
[0034] The photoperiod domain score and humidity domain score of the multi-source text are calculated using the following formulas:
[0035] $$S_P(d) = \sum_{w \in d} m(w, V_P)\,\omega(w)$$
[0036] $$S_H(d) = \sum_{w \in d} m(w, V_H)\,\omega(w)$$
[0037] where $S_P(d)$ denotes the photoperiod domain score; $S_H(d)$ denotes the humidity domain score; $d$ denotes the multi-source text; $w$ denotes a text lexical unit in the multi-source text; $m(w, V)$ denotes the matching status value (1 if $w$ belongs to vocabulary $V$, otherwise 0); $V_P$ denotes the photoperiod vocabulary; $V_H$ denotes the humidity vocabulary; and $\omega(w)$ denotes the word weight;
[0038] Based on the difference between the photoperiod domain score and the humidity domain score, scene domain labels are assigned to the multi-source text.
[0039] Obtain the source confidence coefficient, time decay factor, and scene domain balance coefficient corresponding to the scene domain label of the multi-source text;
[0040] The sampling weight is obtained by multiplying the source confidence coefficient, the time decay factor, and the scene domain balance coefficient. The calculation formula is as follows:
[0041] $$W_i = C_i \cdot e^{-\lambda_i \Delta t_i} \cdot B_i$$
[0042] where, for the multi-source text with index $i$, $W_i$ denotes the sampling weight; $C_i$ denotes the source confidence coefficient; $\Delta t_i$ denotes the time difference; $\lambda_i$ denotes the decay rate; $e^{-\lambda_i \Delta t_i}$ denotes the time decay factor; and $B_i$ denotes the scene domain balance coefficient;
[0043] The normalized sampling probability is calculated based on the sampling weights, using the following formula:
[0044] $$p_i = \frac{W_i}{\sum_{j} W_j}$$
[0045] where $p_i$ denotes the normalized sampling probability of the multi-source text with index $i$, $W_i$ denotes its sampling weight, and $\sum_{j} W_j$ denotes the sum of the sampling weights of all the multi-source texts.
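The weight and normalization steps described above can be illustrated with a minimal Python sketch. The function names and the three candidate texts are hypothetical and chosen only for illustration; the coefficients are taken from the preferred values given in this embodiment, and a scene domain balance coefficient of 1.0 is assumed throughout.

```python
import math


def sampling_weight(confidence, decay_rate, age_days, balance):
    """Source confidence x exp(-decay_rate * age_days) x balance coefficient."""
    return confidence * math.exp(-decay_rate * age_days) * balance


def normalized_probabilities(weights):
    """Divide each sampling weight by the sum of all sampling weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one sampling weight must be positive")
    return [w / total for w in weights]


# Hypothetical candidates:
# (source confidence, decay rate per day, age in days, balance coefficient)
candidates = [
    (1.00, 0.000382, 365, 1.0),   # extension manual, one year old
    (0.90, 0.000191, 3650, 1.0),  # peer-reviewed paper, ten years old
    (0.25, 0.00385, 180, 1.0),    # forum post, about half a year old
]
weights = [sampling_weight(*c) for c in candidates]
probs = normalized_probabilities(weights)
```

In this illustration the recent, high-confidence manual receives the largest probability and the forum post the smallest, which is the ordering the three factors are intended to produce.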
[0046] The photoperiod domain score is a quantitative indicator that represents the information related to photoperiod regulation and phenological stages of flowers contained in multi-source texts. It is calculated by combining the matching of text words with the photoperiod vocabulary and word weights.
[0047] Humidity domain score is a quantitative indicator that represents the amount of information related to greenhouse operation and humidity environment contained in multi-source text. It is calculated by combining the matching of text words with the humidity vocabulary with word weights.
[0048] Text lexical units are the smallest semantic units obtained after multi-source text has been segmented. They can be obtained by segmenting multi-source text, removing stop words, and filtering by part of speech using Chinese word segmentation tools.
[0049] Multi-source text refers to text data related to flower diseases and pests that are captured from different information sources. It can be obtained by pulling data through the agricultural knowledge base interface.
[0050] The matching status value is a binary quantization value used to determine whether a text word belongs to the corresponding vocabulary. It is 1 when the text word belongs to the corresponding vocabulary and 0 when it does not.
[0051] The photoperiod vocabulary is a collection of all words related to photoperiod regulation and phenological stages of flowers. The preferred vocabulary includes words such as shading, supplemental lighting, nighttime light cutoff, short day, long day, flower bud differentiation, budding, initial flowering, full bloom, and post-harvest. This type of vocabulary is the core indicative vocabulary for the photoperiod regulation and phenological stage division of flowers, and can accurately represent the photoperiod domain attributes of the text.
[0052] The humidity vocabulary is a collection of all words related to greenhouse farming operations and humidity environment. It is preferably a collection of words including irrigation, spraying, harvesting, transportation, refrigeration, relative humidity, leaf surface moisture, free water, condensation, and ventilation. This type of vocabulary is the core indicative vocabulary of humidity changes and related operations in flower cultivation, and can accurately represent the humidity domain attribute of the text.
[0053] Word weight is a quantitative weight value assigned to different text words to distinguish the importance of words in domain attribute determination. Preferably, ordinary indicative words are assigned a weight of 1 and highly indicative words are assigned a weight of 2 to 3. Highly indicative words point more strongly to a text's domain attribute, and increasing their weight can improve the accuracy of domain score calculation.
[0054] The source confidence coefficient is a quantitative coefficient assigned based on the professionalism of the information source of multi-source texts. It is used to distinguish the credibility of texts from different sources. The preferred values are: 1.00 for professional knowledge bases, government plant protection departments, and university extension materials; 0.90 for peer-reviewed papers and book chapters; 0.70 for industry associations and formal horticulture media; 0.50 for corporate white papers and product pages; and 0.25 for forums and self-media. The professionalism and authority of the information source are positively correlated with the credibility of the text content: the higher the professionalism, the higher the confidence coefficient.
[0055] The decay rate is a quantitative coefficient used to calculate the time decay factor of texts, and is used to distinguish the reference value of texts published at different times. The preferred values are 0.000191 for papers and books, 0.000382 for extension manuals and plant protection station materials, 0.00191 for industry media and product pages, and 0.00385 for forums and self-media. The decay rate is derived from the half-life formula. Different types of texts have different knowledge half-lives: 10 years for papers and books, 5 years for extension manuals, 1 year for industry media, and half a year for forums and self-media. The corresponding decay rate is obtained by converting the half-life with this formula.
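The half-life conversion can be sketched as follows. This is an illustrative calculation only: it assumes the decay rate is expressed per day and that one year is counted as 365 days, neither of which is stated explicitly above. Under these assumptions the computed rates come out close to, but not exactly equal to, the preferred values listed (within about 2%).

```python
import math


def decay_rate_from_half_life(half_life_days):
    """Decay rate = ln(2) / half-life, here expressed per day."""
    return math.log(2) / half_life_days


def decay_factor(decay_rate, age_days):
    """Exponential time decay factor for a text that is age_days old."""
    return math.exp(-decay_rate * age_days)


# Knowledge half-lives in days (assumption: 1 year = 365 days).
HALF_LIFE_DAYS = {
    "papers_books": 10 * 365,
    "extension_manuals": 5 * 365,
    "industry_media": 365,
    "forums_self_media": 365 / 2,
}
rates = {name: decay_rate_from_half_life(days) for name, days in HALF_LIFE_DAYS.items()}

# Preferred values quoted in the text, for comparison.
STATED = {
    "papers_books": 0.000191,
    "extension_manuals": 0.000382,
    "industry_media": 0.00191,
    "forums_self_media": 0.00385,
}
```

By construction, a text whose age equals its half-life has a decay factor of 0.5.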
[0056] The time difference is the number of days between the publication time of a multi-source text and the current data collection time. It can be obtained by parsing the text's metadata information and the webpage's publication time tag. If there is no publication time, it can be calculated by substituting the first crawl time.
[0057] The time decay factor is a quantitative indicator that characterizes the timeliness of multi-source text. It is obtained as an exponential function of the decay rate and the time difference, and its value decreases as the time difference increases.
[0058] The scene domain balance coefficient is a quantization coefficient assigned to ensure a balance in the number of text samples collected in the photoperiod domain and the humidity domain. It is preferably between 0.5 and 2.0. This value range can avoid excessive amplification or suppression of text in a single domain, effectively balance the number of samples in the two domains, and prevent excessive collection of data in the noise domain.
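One possible way to compute such a coefficient is sketched below. The formula used here (target share divided by the observed share of each domain, then clipped to the range 0.5 to 2.0) is an assumption made for illustration; the text specifies only the value range, the sliding-window counting described later in this embodiment, and a target ratio of 0.5. The function name and the handling of empty windows are likewise hypothetical.

```python
def balance_coefficients(photoperiod_count, humidity_count,
                         target_share=0.5, lo=0.5, hi=2.0):
    """Return a clipped balance coefficient per domain from recent sample counts.

    Assumption: an under-represented domain is boosted (coefficient > 1) and an
    over-represented domain is damped (coefficient < 1).
    """
    total = photoperiod_count + humidity_count
    if total == 0:
        # No samples in the window yet: leave both domains unweighted.
        return {"photoperiod": 1.0, "humidity": 1.0}
    coefficients = {}
    for name, count in (("photoperiod", photoperiod_count),
                        ("humidity", humidity_count)):
        share = count / total
        raw = hi if share == 0 else target_share / share
        coefficients[name] = min(hi, max(lo, raw))
    return coefficients


# Example: 300 photoperiod samples and 100 humidity samples in the window.
example = balance_coefficients(300, 100)
```

With these counts the photoperiod domain is damped to about 0.67 and the humidity domain is boosted to the upper limit of 2.0.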
[0059] Sampling weight is a quantitative weight obtained by combining the source confidence coefficient, time decay factor and scene domain balance coefficient of multi-source texts, and is used to determine the priority of text in sampling.
[0060] Normalized sampling probability is the probability value obtained by normalizing the sampling weights. It represents the probability of a single multi-source text being selected. The sum of the normalized sampling probabilities of all texts is 1.
[0061] The index item of a multi-source text is a unique identifier assigned to all multi-source texts to be sampled, used to distinguish different text data and facilitate calculation and filtering during the sampling process.
[0062] In detail, the multi-source text is first processed with a word segmentation tool to obtain text lexical units. Each unit is then matched against the photoperiod vocabulary and the humidity vocabulary; for every successful match, the corresponding word weight is accumulated, yielding the photoperiod domain score and the humidity domain score. The difference between the two domain scores is then calculated. If the difference is greater than or equal to 2, the text is assigned to the domain with the higher score. If the difference is less than 2, the text is copied to both domains, and each copy is assigned a weight of 0.5 in subsequent sampling. Next, the source confidence coefficient is determined from the text's information source. It is combined with the time decay factor, calculated from the time difference between publication and collection, and with the scene domain balance coefficient, calculated from the number of samples in both domains over a sliding window. These three factors are multiplied together to obtain the sampling weight of a single text. Subsequently, the sampling weights of all texts to be sampled are summed, and the sampling weight of a single text is divided by the sum to obtain the normalized sampling probability. Finally, a weighted reservoir sampling algorithm is used to draw texts from the candidate pool according to the normalized sampling probability, so as to balance the number of text samples in the photoperiod domain and the humidity domain. For example, for the text "After shading treatment, gray mold is prone to occur during the budding stage", the photoperiod domain score is 4 and the humidity domain score is 0; the difference is 4, which is not less than 2, so the text is assigned to the photoperiod domain. For the text "After irrigation, increased humidity induces gray mold", the humidity domain score is 5 and the photoperiod domain score is 0; the difference is 5, which is not less than 2, so the text is assigned to the humidity domain. For the text "High incidence of gray mold in greenhouses in spring, pay attention to ventilation and humidity control", the photoperiod domain score is 2 and the humidity domain score is 2; the difference is 0, which is less than 2, so the text is copied to both domains.
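The scoring and labelling rule in this paragraph can be sketched in Python as follows. The vocabularies and their weights below are hypothetical excerpts chosen for illustration (the text gives the word lists and a weight range of 1 to 3, but not a weight per word), and the input is assumed to be already segmented into lexical units.

```python
# Hypothetical vocabulary excerpts with illustrative weights (1 = ordinary, 2-3 = highly indicative).
PHOTOPERIOD_VOCAB = {
    "shading": 2, "supplemental lighting": 2, "short day": 1,
    "long day": 1, "budding": 2, "full bloom": 1,
}
HUMIDITY_VOCAB = {
    "irrigation": 2, "spraying": 1, "relative humidity": 3,
    "free water": 2, "ventilation": 1, "condensation": 1,
}


def domain_score(tokens, vocab):
    """Sum the word weights of all tokens that match the vocabulary."""
    # vocab.get(tok, 0) plays the role of matching status value x word weight.
    return sum(vocab.get(tok, 0) for tok in tokens)


def assign_domains(tokens, threshold=2):
    """Return a list of (domain label, sampling factor) pairs for one text."""
    s_photo = domain_score(tokens, PHOTOPERIOD_VOCAB)
    s_humid = domain_score(tokens, HUMIDITY_VOCAB)
    if abs(s_photo - s_humid) >= threshold:
        label = "photoperiod" if s_photo > s_humid else "humidity"
        return [(label, 1.0)]
    # Scores too close: copy the text to both domains at half weight each.
    return [("photoperiod", 0.5), ("humidity", 0.5)]


labels_a = assign_domains(["shading", "gray mold", "budding"])
labels_b = assign_domains(["irrigation", "relative humidity", "gray mold"])
labels_c = assign_domains(["short day", "ventilation"])
```

Under these illustrative weights the first token list scores 4 against 0 and goes to the photoperiod domain, the second scores 0 against 5 and goes to the humidity domain, and the third scores 1 against 1 and is copied to both domains.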
[0063] In detail, the entries in the photoperiod and humidity vocabularies are selected from core indicative terms in the fields of flower diseases and pests and greenhouse cultivation. Specific terms can be supplemented according to the flower varieties involved in the actual application. Word weights are assigned according to how strongly a term indicates a domain attribute: ordinary indicative terms are assigned a value of 1, while highly indicative terms that carry specific numerical values are assigned values of 2 to 3. The domain score difference threshold is fixed at 2, a threshold that has been tested extensively on flower plant protection corpora and accurately distinguishes texts dominated by a single domain. The source confidence coefficient is fixed in five categories based on the professionalism of the information source, and can be fine-tuned according to project needs in practical applications, with an adjustment range not exceeding 0.1. The decay rate is calculated using the half-life formula, i.e., the decay rate equals the natural logarithm of 2 divided by the knowledge half-life of the text. The knowledge half-lives of the different text types are determined based on industry consensus in the field of agricultural plant protection. The scene domain balance coefficient is calculated by using a sliding window to count the photoperiod domain and humidity domain samples collected in the last 7 days and relating each domain's sample count to the target ratio of 0.5; the result must be limited to the range of 0.5 to 2.0. Weighted reservoir sampling is executed by first capturing all multi-source texts and deduplicating them using the SimHash algorithm: texts with a similarity greater than or equal to 0.9 are considered duplicates, only the text with the highest sampling weight is retained, and the deduplicated texts are then placed in the candidate pool. The sample size is determined according to the requirements of subsequent model training, generally between 100,000 and 500,000 texts, and the final sampled text set is formed once sampling according to the normalized sampling probability is complete.
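A minimal sketch of the sampling step is given below. The text names weighted reservoir sampling but not a specific variant; the sketch assumes the Efraimidis-Spirakis scheme (often called A-Res), in which each item receives the key u^(1/w) for a uniform random u and the k items with the largest keys are kept. The function name and the example items are hypothetical, and the SimHash deduplication step is assumed to have been done beforehand.

```python
import heapq
import random


def weighted_reservoir_sample(items, weights, k, rng=None):
    """Draw up to k items without replacement, favouring larger weights (A-Res)."""
    rng = rng or random.Random()
    heap = []  # min-heap of (key, index, item); the smallest key is evicted first
    for idx, (item, weight) in enumerate(zip(items, weights)):
        if weight <= 0:
            continue  # texts with zero weight are never selected
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, idx, item))
    return [item for _, _, item in heap]


# Hypothetical deduplicated candidate pool with sampling weights.
pool = ["text_a", "text_b", "text_c", "text_d", "text_e"]
pool_weights = [0.87, 0.45, 0.12, 0.0, 0.60]
sample = weighted_reservoir_sample(pool, pool_weights, k=2, rng=random.Random(0))
```

One property of this variant is that it makes a single pass over the candidate pool and keeps only k entries in memory, which suits pools of the size mentioned above. For a single draw (k = 1), each text is selected with probability proportional to its weight; for larger k the inclusion probabilities increase with weight but are not exactly proportional. Multiplying all weights by the same constant leaves the ordering of the keys unchanged, so raw sampling weights and normalized sampling probabilities yield the same selection behaviour.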
[0064] Preferably, the sampled text is vectorized to extract four-channel time-series elements including calendar elements, phenological stage elements, operational events, and environmental thresholds, including:
[0065] Construct the text lexical embedding matrix of the sampled text, and map it using the channel projection matrices to obtain the channel representations. The calculation formula is as follows:
[0066] $$H_{cal} = E\,W_{cal}, \quad H_{phe} = E\,W_{phe}, \quad H_{op} = E\,W_{op}, \quad H_{env} = E\,W_{env}$$
[0067] where $E$ denotes the text lexical embedding matrix; $W_{cal}$, $W_{phe}$, $W_{op}$, and $W_{env}$ denote the channel projection matrices corresponding to calendar, phenological stage, operational event, and environmental threshold, respectively; and $H_{cal}$, $H_{phe}$, $H_{op}$, and $H_{env}$ denote the corresponding calendar channel, phenological stage channel, operational event channel, and environmental threshold channel representations, respectively.
[0068] The channel representations are aggregated using attention weights to obtain the feature vector of each channel. The calculation formulas are as follows:
[0069] $$\alpha = \mathrm{softmax}(E\,a)$$
[0070] $$f_{k} = \sum_{t=1}^{T} \alpha_t\, H_{k}[t], \quad k \in \{cal,\ phe,\ op,\ env\}$$
[0071] where $\alpha$ denotes the attention weights, obtained by normalizing the product $E\,a$ over the lexical units; $a$ denotes a trainable weight vector; $T$ denotes the number of lexical units; $\alpha_t$ denotes the attention weight of the $t$-th lexical unit; $H_{k}[t]$ denotes the component at position $t$ of each channel representation; and $f_{cal}$, $f_{phe}$, $f_{op}$, and $f_{env}$ denote the generated calendar feature vector, phenological stage feature vector, operational event feature vector, and environmental threshold feature vector, respectively.
[0072] The four-channel time-series elements are obtained by concatenating the above feature vectors: $x = [\,f_{cal};\ f_{phe};\ f_{op};\ f_{env}\,]$.
[0073] The text lexical embedding matrix is a two-dimensional matrix formed by converting the lexical units of the sampled text into numerical vectors and arranging them in order. It is a numerical representation of the semantics of the lexical units in the sampled text.
[0074] The calendar channel projection matrix is a weight matrix used to map the text word embedding matrix to the calendar feature space. It is preferably a real number matrix with 768 rows and 64 columns. This dimension can achieve dimensionality reduction while retaining the core information of calendar features, which can meet the efficiency requirements of subsequent feature calculations and match the 768-dimensional vector of pre-trained word embeddings in the mainstream agricultural field.
[0075] The phenological stage channel projection matrix is a weight matrix used to map the text word embedding matrix to the phenological stage feature space. It is preferably a real number matrix with 768 rows and 64 columns. This dimension can achieve dimensionality reduction while retaining the core information of the phenological stage features, and it is consistent with the dimension of the calendar channel projection matrix, which facilitates the subsequent splicing and calculation of feature vectors.
[0076] The operation event channel projection matrix is a weight matrix used to map the text lexical embedding matrix to the operation event feature space. It is preferably a real number matrix with 768 rows and 64 columns. This dimension can achieve dimensionality reduction while retaining the core information of the operation event features, and is consistent with the dimensions of other channel projection matrices to ensure the consistency of feature calculation.
[0077] The environment threshold channel projection matrix is a weight matrix used to map the text word embedding matrix to the environment threshold feature space. It is preferably a real number matrix with 768 rows and 64 columns. This dimension can achieve dimensionality reduction while retaining the core information of the environment threshold features and match the dimensions of the other channel projection matrices.
[0078] The calendar channel representation is a matrix obtained by mapping the text lexical embedding matrix through the calendar channel projection matrix, and it is a feature representation of calendar-related semantics in the sampled text.
[0079] The phenological stage channel representation is a matrix obtained by mapping the text word embedding matrix through the phenological stage channel projection matrix. It is a feature representation of the semantics related to phenological stages in the sampled text.
[0080] The operation event channel representation is a matrix obtained by mapping the text lexical embedding matrix through the operation event channel projection matrix. It is a feature representation of the operation event-related semantics in the sampled text.
[0081] The environment threshold channel representation is a matrix obtained by mapping the text lexical embedding matrix through the environment threshold channel projection matrix. It is a feature representation of the environment threshold-related semantics in the sampled text.
[0082] Attention weights are quantified weight values assigned to each word based on its semantic importance, used to highlight the feature contributions of key words when aggregating channel representations.
[0083] The trainable weight vector is a numerical vector used to calculate attention weights. It is preferably a 768-dimensional real number vector, which matches the column dimension of the text word embedding matrix and can accurately capture the semantic information of words to calculate attention weights.
[0084] The number of lexical units is the total number of valid text lexical units obtained after the sampled text has been segmented. It is a quantitative indicator representing the length of the sampled text.
[0085] The attention weight of the t-th word is the attention weight value assigned to the t-th valid word in the sampled text, which is the weight quantification value of that word in the channel representation aggregation.
[0086] The component at position t in the calendar channel representation is the row vector of the t-th word in the sampled text, which is the calendar-related feature representation of the t-th word.
[0087] The component at position t in the phenological stage channel representation is the row vector of the t-th word in the sampled text, and is the phenological stage-related feature representation of the t-th word.
[0088] The component at position t in the operation event channel representation is the row vector of the t-th word in the sampled text, which is the operation event-related feature representation of the t-th word.
[0089] The component at position t in the environment threshold channel representation is the row vector of the t-th word in the sampled text, which is the environment threshold-related feature representation of the t-th word.
[0090] The calendar feature vector is a one-dimensional vector obtained by weighting and aggregating the calendar channel representations with attention weights. It is a condensed representation of calendar-related features in the sampled text.
[0091] The phenological stage feature vector is a one-dimensional vector obtained by weighting and aggregating the phenological stage channel representations with attention weights. It is a condensed representation of the phenological stage-related features in the sampled text.
[0092] The operation event feature vector is a one-dimensional vector obtained by weighting and aggregating the channel representations of operation events with attention weights. It is a condensed representation of the operation event-related features in the sampled text.
[0093] The environmental threshold feature vector is a one-dimensional vector obtained by weighting and aggregating the environmental threshold channel representations with attention weights. It is a condensed representation of environmental threshold-related features in the sampled text.
[0094] The four-channel temporal element is a one-dimensional vector obtained by sequentially concatenating calendar feature vectors, phenological stage feature vectors, operational event feature vectors, and environmental threshold feature vectors. It is a unified structured representation of four types of temporal-related features in the sampled text.
[0095] In detail, first, a pre-trained word embedding model for the agricultural domain converts the lexical units of the sampled text into numerical vectors, which are arranged in word order to form the text word embedding matrix. This matrix is then multiplied by the projection matrices of the calendar, phenological stage, operation event, and environmental threshold channels to obtain the four channel representations. Next, the text word embedding matrix is multiplied by the trainable weight vectors and the result is normalized to obtain the attention weights. These attention weights are used to compute a weighted sum over the positional components of each of the four channel representations, yielding four one-dimensional feature vectors. Finally, the four feature vectors are concatenated in the order calendar, phenological stage, operation event, environmental threshold to form a unified four-channel temporal element. For example, the sampled text "humidity greater than 85% after irrigation easily induces gray mold" is segmented into words, and a word embedding matrix with 768 columns is constructed. Four 64-column channel representations are obtained by mapping through four projection matrices of 768 rows and 64 columns. After the attention weights are calculated, weighted aggregation yields four 64-dimensional feature vectors, and concatenation yields a 256-dimensional four-channel temporal element.
[0096] In detail, the text word embedding matrix is constructed using a pre-trained word embedding model for the agricultural domain, with a fixed word vector dimension of 768. If no pre-trained model is available, a bag-of-words model combined with Word2Vec can be trained on a corpus of texts related to flower diseases and pests. All four channel projection matrices are initialized from a normal distribution with a mean of 0 and a variance of 0.01. A multi-label classification loss is constructed using pseudo-labels from the vocabulary annotations for training, and the Adam optimizer is used with a learning rate of 0.001. The trainable weight vectors are also initialized from a normal distribution with a mean of 0 and a variance of 0.01 and are updated synchronously with the channel projection matrices during training. Attention weights are calculated without masking, directly by normalizing the product of the text word embedding matrix and the trainable weight vector; during aggregation, the weighted components are summed position by position in word order. When the sampled text contains multiple word segments, the channel representations are vertically concatenated in word order before the attention weights are calculated and aggregated. Each feature vector has a dimension of 64, and the four-channel temporal element is obtained by concatenating the four 64-dimensional feature vectors, for a total dimension of 256. If the number of words in the sampled text is 0, the four-channel temporal element is set directly to the all-zero vector; if the number of words is 1, each channel component of that word is used directly as the corresponding feature vector.
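The extraction of the four-channel temporal element described above can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the use of a single shared attention weight vector, and the choice of softmax as the normalization are assumptions made for illustration, and random arrays stand in for trained parameters and real word embeddings.

```python
import numpy as np

EMBED_DIM, CHANNEL_DIM = 768, 64
CHANNELS = ("calendar", "phenological_stage", "operation_event", "environment_threshold")

def init_params(rng):
    """Draw projection matrices and the attention vector from N(0, variance 0.01)."""
    std = np.sqrt(0.01)  # variance 0.01 corresponds to a standard deviation of 0.1
    proj = {c: rng.normal(0.0, std, (EMBED_DIM, CHANNEL_DIM)) for c in CHANNELS}
    w_att = rng.normal(0.0, std, EMBED_DIM)
    return proj, w_att

def four_channel_element(X, proj, w_att):
    """Map an (n_words, 768) embedding matrix to a 256-dimensional temporal element."""
    n_words = X.shape[0]
    if n_words == 0:
        # Empty text: the element is the all-zero vector.
        return np.zeros(CHANNEL_DIM * len(CHANNELS))
    logits = X @ w_att
    logits = logits - logits.max()            # numerical stability only
    attn = np.exp(logits) / np.exp(logits).sum()  # softmax (assumed normalization)
    # Weighted sum over word positions for each channel, then concatenation.
    feats = [attn @ (X @ proj[c]) for c in CHANNELS]
    return np.concatenate(feats)
```

With a single word the softmax weight is exactly 1, so each feature vector reduces to that word's channel component, which matches the single-word rule stated above.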
[0097] Preferably, calculating channel gating values based on the four-channel time-series elements, and using the channel gating values to define boundaries for extracting map entities, includes:
[0098] Calculate the channel gating value: $G = \sigma(X W_g)$, where $G$ represents the channel gating value, $\sigma(\cdot)$ represents the gating activation function, $X$ represents the text word embedding matrix, and $W_g$ represents the channel gating weight matrix;
[0099] Calculate the comprehensive emission score: $E = H W_e + G W_a W_c$, where $E$ represents the comprehensive emission score, $H$ represents the entity encoding feature matrix, $W_e$ represents the basic emission weight matrix, $W_a$ represents the channel auxiliary mapping matrix, and $W_c$ represents the gated emission weight matrix;
[0100] Solve for the graph entity label sequence using the restricted sequence decoding algorithm: $y^{*} = \arg\max_{y} \sum_{t} \left( E_{t,\,y_t} + T_{y_{t-1},\,y_t} \right)$, where $y^{*}$ represents the graph entity label sequence and $T$ represents the entity label transition matrix.
[0101] The channel gating value is a numerical matrix obtained by operating the text word embedding matrix through the channel gating weight matrix and then processing it through the activation function. It is a quantitative indicator of the regulation of the entity recognition boundary by the four-channel temporal elements.
[0102] The gating activation function is a function used to perform a nonlinear transformation on the product of the text word embedding matrix and the channel gating weight matrix. It is preferably a sigmoid function. This function maps values to between 0 and 1, characterizes the contribution of channel features to entity boundary determination, and suits the control characteristics of the gating value.
[0103] The channel gating weight matrix is a weight matrix used to map the text word embedding matrix to the gating feature space. It is preferably a 768-row, 4-column real number matrix. The column dimension of this matrix matches the number of channels of the four-channel time series elements, and the row dimension is consistent with the column dimension of the text word embedding matrix. It can accurately capture the gating control information of the four-channel features.
[0104] The entity encoding feature matrix is a numerical matrix obtained by performing deep semantic encoding on the text word embedding matrix. It is a deep representation of the semantic features of text words and provides basic semantic features for entity recognition.
[0105] The basic emission weight matrix is a weight matrix used to map the entity encoding feature matrix to the entity label score space. It is preferably a real matrix with 768 rows and 28 columns: its row dimension matches the column dimension of the entity encoding feature matrix, and its column dimension equals the total number of entity labels in the field of flower diseases and pests. It transforms deep semantic features into the basic score of each entity label.
[0106] The channel auxiliary mapping matrix is a weight matrix used to map the channel gating value to dimensions matching the gated emission weight matrix. It is preferably a real matrix with 4 rows and 768 columns: its row dimension matches the column dimension of the channel gating value, and its column dimension equals the row dimension of the gated emission weight matrix, thereby achieving dimensional adaptation between the gating value and the emission score.
[0107] The gated emission weight matrix is a weight matrix used to map the gating values processed by the channel auxiliary mapping matrix to the entity label score space. It is preferably a real matrix with 768 rows and 28 columns: its row dimension matches the column dimension of the channel auxiliary mapping matrix, and its column dimension equals the total number of entity labels. It transforms the channel gating features into the gated score of each entity label.
[0108] The comprehensive emission score is a numerical matrix obtained by adding the basic emission score and the gated emission score. It is a quantitative score of entity labels that integrates deep semantic features of text and four-channel gated features.
[0109] The graph entity label sequence is a label sequence obtained by decoding the comprehensive emission score and entity label transition matrix using a restricted sequence decoding algorithm. It is an ordered sequence that represents the entity label type corresponding to each word in the text.
[0110] The comprehensive emission score of label y_t at position t is the entry for entity label y_t in the t-th row of the comprehensive emission score matrix. It is the quantitative score for the t-th word in the text being identified as that entity label.
[0111] The entity label transition matrix is a numerical matrix that represents the probability of adjacent transitions between entity labels. It is preferably a real matrix with 28 rows and 28 columns, which satisfies that the row and column dimensions of the matrix are consistent with the total number of entity labels, and can accurately represent the transition probability between any two entity labels.
[0112] The transition score from label y_{t-1} to label y_t is the entry of the entity label transition matrix in the row for y_{t-1} and the column for y_t. It is the quantitative score for the entity label of the (t-1)-th word in the text transitioning to the entity label of the t-th word.
[0113] In detail, the text word embedding matrix of the sampled text is first multiplied by the channel gating weight matrix, and the result is passed through the gating activation function for nonlinear transformation to obtain the channel gating value. Next, deep semantic encoding is performed on the text word embedding matrix to obtain the entity encoding feature matrix, which is multiplied by the basic emission weight matrix to obtain the basic score. In parallel, the channel gating value is multiplied by the channel auxiliary mapping matrix and then by the gated emission weight matrix to obtain the gated score. The basic score and the gated score are added position by position to obtain the comprehensive emission score. Finally, the restricted sequence decoding algorithm combines the comprehensive emission score and the entity label transition matrix to decode the word sequence, producing a graph entity label sequence composed of the entity label for each word. Based on this sequence, the words corresponding to consecutive labels are extracted as graph entities. For example, the text "Gray mold in the budding stage is susceptible to infecting petals" is segmented into words and decoded through the above steps to obtain a label sequence covering the phenological stage, the disease, and the affected part. Based on this, graph entities such as "Gray mold in the budding stage" and "petals" are extracted.
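The score computation in this paragraph can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function and variable names are hypothetical, the entity encoding feature matrix `H` is passed in as a placeholder rather than produced by a trained encoder, and the sigmoid is written in the algebraically equivalent form 1 / (1 + exp(-x)).

```python
import numpy as np

def sigmoid(x):
    """Sigmoid gate; equal to exp(x) / (exp(x) + 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def comprehensive_emission(X, H, W_g, W_e, W_a, W_c):
    """Return (channel gating value, comprehensive emission score).

    X: (n, 768) text word embedding matrix
    H: (n, 768) entity encoding feature matrix (placeholder for the encoder output)
    W_g: (768, 4), W_e: (768, 28), W_a: (4, 768), W_c: (768, 28)
    """
    G = sigmoid(X @ W_g)        # (n, 4) one gate per channel and word
    base = H @ W_e              # (n, 28) basic score
    gated = G @ W_a @ W_c       # (n, 28) gated score
    return G, base + gated      # position-wise sum
```

The shapes follow the preferred dimensions given above, so the gated branch passes through 4 and then 768 columns before reaching the 28 label scores.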
[0114] In detail, the gating activation function adopts a sigmoid function, calculated by dividing the exponential of the input value by the sum of that exponential and 1. The channel gating weight matrix, basic emission weight matrix, channel auxiliary mapping matrix, and gated emission weight matrix are all initialized from a normal distribution with a mean of 0 and a variance of 0.01. A conditional random field loss function is constructed using the labeled entity label sequences for joint training, with the Adam optimizer and a learning rate of 0.001. The entity encoding feature matrix is obtained by deep encoding of the text word embedding matrix with a bidirectional long short-term memory network whose hidden layer dimension is set to 768 and whose number of layers is set to 2. The entity labeling system adopts a start/internal/external labeling rule covering nine core entity types, including flowers, diseases and pests, symptoms, pesticides, operation events, environment, phenological stage, affected parts, and time. Each entity type has a start label and an internal label, while non-entity words take the external label, for a total of 28 labels. The restricted sequence decoding algorithm uses a restricted Viterbi algorithm with the following constraints: labels such as event, environment, and phenological stage cannot overlap with labels such as disease and pest; the start label of an entity can only be followed by the corresponding internal label, while any label can be followed by the external label. When graph entities are extracted from the graph entity label sequence, the words corresponding to consecutive start and internal labels are concatenated in sequence, and the words corresponding to external labels are discarded. If a start label has no following internal label, the word corresponding to that label is used directly as a graph entity. The concatenated or extracted word combination is the final graph entity.
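The restricted decoding and entity extraction can be sketched as follows. The five-label set, the label names, and the specific constraint (an internal label may only follow the start or internal label of the same type) are simplifying assumptions chosen for illustration; they do not reproduce the 28-label scheme or the full constraint set described above.

```python
import numpy as np

# Hypothetical miniature label set: O = external, B-* = start, I-* = internal.
LABELS = ["O", "B-DIS", "I-DIS", "B-PART", "I-PART"]

def allowed(prev, cur):
    """Assumed constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def constrained_viterbi(E, T):
    """Decode the best label path from emission scores E (n, k) and transitions T (k, k)."""
    n, k = E.shape
    mask = np.array([[0.0 if allowed(p, c) else -np.inf for c in LABELS] for p in LABELS])
    start = np.array([-np.inf if l.startswith("I-") else 0.0 for l in LABELS])
    score = E[0] + start                      # a sequence cannot open with an internal label
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + T + mask + E[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [LABELS[i] for i in reversed(path)]

def extract_entities(words, labels):
    """Join words under consecutive start/internal labels; drop external words."""
    entities, current = [], []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [word]
        elif label.startswith("I-") and current:
            current.append(word)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities
```

A start label followed directly by an external label yields a single-word entity, which corresponds to the rule for start labels without internal labels.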
[0115] Preferably, the process involves extracting associated data based on the graph entities and generating periodic vectors based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors, including:
[0116] Calculate the association confidence score: $s = e_h^{\top} W_r\, e_t$, where $s$ represents the association confidence score, $e_h$ and $e_t$ represent the head entity vector and the tail entity vector, respectively, and $W_r$ represents the association weight matrix;
[0117] Calculate the periodic vector: $p = W_p^{\top} z$, where $p$ represents the periodic vector, $W_p$ represents the periodic feature projection matrix, and $z$ represents the four-channel time-series element.
[0118] The head entity vector is a one-dimensional numerical vector that represents the semantic and structural features of the head entity in the graph. It is a quantitative representation of the head entity in the feature space.
[0119] Tail entity vectors are one-dimensional numerical vectors that characterize the semantic and structural features of tail entities in a graph. They are quantitative representations of tail entities in the feature space.
[0120] The association weight matrix is a weight matrix used to calculate the inner product of the head entity vector and the tail entity vector and to quantify the association between entities. It is preferably a real number matrix with 256 rows and 256 columns. This dimension is completely matched with the dimension of the graph entity vector, which can accurately capture the association features between entities, adapt to the scoring calculation logic of the bilinear inner product, and form a feature adaptation with the dimension of the four-channel time series elements.
[0121] The association confidence score is a numerical value obtained by operating the head entity vector and the tail entity vector through the association weight matrix. It is a quantitative indicator that represents the degree of authenticity of the association between entities.
[0122] The head entity is the initiating entity of the association in the graph and is one of the core entities that constitute the graph triples.
[0123] Related data is information that represents the semantic association between head entities and tail entities; it is the relational feature that connects different entities in a graph.
[0124] Tail entities are the receiving entities of associations in the graph and are one of the core entities that constitute the graph triples.
[0125] The periodic feature projection matrix is a weight matrix used to linearly transform four-channel time series elements to the periodic feature space. It is preferably a real number matrix with 256 rows and 128 columns. This dimension can reasonably reduce the dimensionality of the 256-dimensional four-channel time series elements, while retaining the core periodic features and reducing the complexity of subsequent calculations. The 128-dimensional output dimension is suitable for the feature requirements of disease window coordinate extraction.
[0126] The periodic vector is a one-dimensional numerical vector obtained by linear transformation of the four-channel time series elements through the periodic feature projection matrix. It is a condensed and quantitative representation of the periodic characteristics of pest and disease occurrence in the sampled text.
[0127] A graph quadruple is a four-dimensional data structure formed by combining a head entity, associated data, a tail entity, and a periodic vector. It is a basic unit of the graph that incorporates temporal and periodic features.
[0128] In detail, all extracted map entities are first paired. The entity that initiates the association in each pair is defined as the head entity, and the entity that receives the association is defined as the tail entity. The corresponding head entity vector and tail entity vector are extracted. The head entity vector is mapped by the association weight matrix and then the inner product operation is performed with the tail entity vector to obtain the association confidence score. A score threshold is set to filter out valid association data. Then, the four-channel time series elements are input into the periodic feature projection matrix for linear transformation to obtain the periodic vector representing the occurrence cycle of pests and diseases. Finally, the filtered head entities, association data, tail entities and corresponding periodic vectors are combined in a fixed order to construct a map quadruple containing periodic time series features. For example, if the head entity is gray mold and the tail entity is petals, and the confidence score of the association data is higher than the threshold, the corresponding periodic vector is generated by the four-channel time series elements. Finally, a map quadruple containing gray mold infected petals and the periodic vector is formed.
[0129] In detail, the head and tail entity vectors are obtained by concatenating the word embedding features and structural features of the graph entities and then normalizing the result; the vector dimension is fixed at 256. The word embedding features are extracted with a pre-trained model for the agricultural domain, and the structural features are obtained from the contextual association information of the entities. The association weight matrix is initialized from a normal distribution with a mean of 0 and a variance of 0.01. A loss function is constructed using distantly supervised samples for training, with the Adam optimizer and a learning rate of 0.001. Different association data correspond to different association weight matrices. The threshold for the association confidence score is fixed at 0.7: association data with a score greater than or equal to 0.7 are considered valid, and association data with a score below 0.7 are discarded. The periodic feature projection matrix is also initialized from a normal distribution with a mean of 0 and a variance of 0.01. It is trained using the fitting error of the periodic features as the loss function and is updated synchronously with the association weight matrix. The periodic vector is obtained by linear transformation of the 256-dimensional four-channel time-series element and has a fixed dimension of 128. The vector values are standardized to between -1 and 1 using z-scores. When there are multiple valid association data for the same head and tail entities, a corresponding graph quadruple is constructed for each. When there are multiple periodic vectors for the same association, the periodic vector with the smallest variance is retained as the final value. If there is no valid association data for an entity pair, no graph quadruple is constructed.
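The scoring and quadruple construction can be illustrated with a short sketch. The function names, the tuple layout of the quadruple, and the relation names used in the usage example are hypothetical; the raw bilinear score is compared with the 0.7 threshold directly, and the z-score standardization of the periodic vector is omitted for brevity.

```python
import numpy as np

def association_score(e_h, e_t, W_r):
    """Bilinear confidence score between head and tail entity vectors."""
    return float(e_h @ W_r @ e_t)

def build_quadruples(head, tail, e_h, e_t, relation_weights, z, W_p, threshold=0.7):
    """Return (head, relation, tail, periodic_vector) for every valid relation.

    relation_weights maps a relation name to its own (256, 256) weight matrix,
    reflecting that different association data use different matrices.
    """
    p = W_p.T @ z  # (128,) periodic vector from the 256-dim four-channel element
    quads = []
    for relation, W_r in relation_weights.items():
        if association_score(e_h, e_t, W_r) >= threshold:
            quads.append((head, relation, tail, p))
    return quads  # empty when no relation clears the threshold
```

For instance, calling `build_quadruples("gray mold", "petals", ...)` with one relation scoring above 0.7 and one below returns a single quadruple for the valid relation.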
[0130] Preferably, fitting the mapping relationship between the operation event and the periodic vector, and extracting a biaxial time series operator to reduce the dimensionality of the periodic vector to the coordinates of the onset window, includes:
[0131] Calculate the linear mapping matrix using the ridge regression algorithm: $M = P\,O^{\top}\left(O\,O^{\top} + \lambda I\right)^{-1}$, where $M$ represents the linear mapping matrix, $P$ represents the periodic sample matrix, $O$ represents the operation event sample matrix, $\lambda$ represents the regularization coefficient, and $I$ represents the identity matrix;
[0132] Perform singular value decomposition on $M$ and extract the biaxial temporal operator: $M = U \Sigma V^{\top}$, $B = U_{[:,1:2]}^{\top}$, where $B$ represents the biaxial temporal operator and $U_{[:,1:2]}$ represents the first two columns of the left singular matrix $U$;
[0133] Calculate the disease window coordinates: $(c_1, c_2)^{\top} = B\,p$, where $c_1$ and $c_2$ are the first coordinate component and the second coordinate component, respectively, and $p$ is the periodic vector.
[0134] The operation event sample matrix is a two-dimensional numerical matrix formed by arranging multiple operation event feature vectors in columns. It is a collective representation of operation event features in multi-source text.
[0135] The periodic sample matrix is a two-dimensional numerical matrix formed by arranging multiple periodic vectors in columns. It is a collective representation of the periodic features of pests and diseases in multi-source texts.
[0136] The regularization coefficient is a quantization coefficient used in the ridge regression algorithm to prevent matrix singularity and improve model robustness. The preferred value is 0.001. This value can effectively solve the singularity problem of the product of the transpose of the sample matrix of the operation event and itself, while avoiding excessive regularization that would lead to the loss of feature information. It is suitable for the matrix calculation characteristics of the flower disease and pest corpus.
[0137] The linear mapping matrix is the transformation matrix from the operational event sample matrix to the periodic sample matrix obtained by the ridge regression algorithm. It is a quantification matrix that represents the linear mapping relationship between operational events and the pest and disease cycle.
[0138] The left singular matrix is an orthogonal matrix obtained by performing singular value decomposition on a linear mapping matrix. It is a matrix that characterizes the principal characteristic directions of the linear mapping matrix.
[0139] A singular value diagonal matrix is a diagonal matrix obtained by performing singular value decomposition on a linear mapping matrix. The values on the diagonal are singular values, which represent the contribution of each principal characteristic direction.
[0140] The right singular matrix is an orthogonal matrix obtained by performing singular value decomposition on a linear mapping matrix. It is a matrix that helps characterize the features of a linear mapping matrix.
[0141] The dual-axis temporal operator is a matrix obtained by extracting the first two principal singular vectors of the left singular matrix and transposing them. It is the core operator for reducing the dimensionality of periodic vectors to two-dimensional disease window coordinates.
[0142] The first two principal singular vectors of the left singular matrix are the two vectors with the highest contribution in the left singular matrix, representing two independent temporal shearing principal axes: photoperiod regulation and humidity event triggering, respectively.
[0143] The disease window coordinates are two-dimensional numerical coordinates obtained by dimensionality reduction projection of a periodic vector using a biaxial temporal operator. They are a two-dimensional structured representation of the occurrence cycle of flower diseases and pests.
[0144] The first coordinate component is the first dimension of the disease outbreak window coordinates, representing the degree of influence of photoperiod regulation on the occurrence cycle of pests and diseases.
[0145] The second coordinate component is the second dimension of the disease outbreak window coordinates, representing the degree of influence of humidity events on the disease and pest occurrence cycle.
[0146] In detail, the process first extracts the operation event feature vectors and periodic vectors without missing values from the processing results of all multi-source texts. All operation event feature vectors are arranged column-wise to construct the operation event sample matrix, and all periodic vectors are arranged column-wise to construct the periodic sample matrix. Next, following the ridge regression algorithm, the periodic sample matrix is multiplied by the transpose of the operation event sample matrix, and the result is multiplied by the inverse of the sum of two terms: the product of the operation event sample matrix and its transpose, and the product of the regularization coefficient and the identity matrix. This yields the linear mapping matrix representing the mapping relationship between operation events and periodic vectors. Singular value decomposition is then performed on the linear mapping matrix to obtain the left singular matrix, the singular value diagonal matrix, and the right singular matrix. The two principal singular vectors with the highest contribution are extracted from the left singular matrix and transposed to construct the biaxial temporal operator. Finally, each periodic vector is projected through the biaxial temporal operator for dimensionality reduction to obtain the disease window coordinates, consisting of the first and second coordinate components. For example, the operation event feature vectors and periodic vectors of 100,000 flower disease and pest texts are extracted to construct the sample matrices. After ridge regression fitting and singular value decomposition, the biaxial temporal operator is obtained. The periodic vector corresponding to gray mold is projected through this operator to obtain two-dimensional disease window coordinates of 0.7 and 0.9, which correspond to the influence of photoperiod regulation and humidity event triggering, respectively.
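The fitting and projection steps can be sketched with NumPy as below. The function names are hypothetical, the linear system is solved with `np.linalg.solve` rather than an explicit inverse, and `np.linalg.svd` (a LAPACK routine) stands in for the Jacobi algorithm named in this description. Samples are stored as columns, as stated above.

```python
import numpy as np

def biaxial_operator(P, O, lam=0.001):
    """Fit M = P O^T (O O^T + lam I)^{-1} and return (B, M).

    P: (128, n_samples) periodic sample matrix
    O: (64, n_samples) operation event sample matrix
    B: (2, 128) transpose of the first two left singular vectors of M
    """
    d = O.shape[0]
    A = O @ O.T + lam * np.eye(d)           # symmetric, so A^{-T} equals A^{-1}
    M = np.linalg.solve(A, O @ P.T).T       # equals P O^T A^{-1}
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :2].T, M

def window_coordinates(B, p):
    """Project a 128-dim periodic vector to 2-D disease window coordinates."""
    return B @ p
```

Because singular values are returned in descending order, the first two columns of `U` are the directions with the highest contribution.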
[0147] In detail, when constructing the operation event sample matrix and the periodic sample matrix, only feature vectors and periodic vectors without missing values are selected, with a sample size of no less than 10,000. The row dimension of each matrix equals the dimension of the corresponding vector, and the column dimension equals the number of valid samples. The ridge regression algorithm is implemented using direct matrix operations, and the dimension of the identity matrix matches that of the product of the operation event sample matrix and its transpose. Singular value decomposition is implemented using the classic Jacobi algorithm, retaining all singular values and singular vectors of the linear mapping matrix during decomposition without premature truncation. The semantic attribution of the two principal axes of the biaxial temporal operator is implemented through pre-defined anchor word sets. The photoperiod anchor word set includes shading, supplemental lighting, short daylight, and similar terms. The humidity event anchor word set includes terms such as budding, spraying, leaf wetting, and relative humidity. The average projection amplitude of each principal axis on the anchor word samples is calculated to determine that the first coordinate component corresponds to photoperiod regulation and the second coordinate component corresponds to humidity event triggering. The first and second coordinate components of the disease window coordinates are both standardized to between -1 and 1 using z-scores to eliminate the influence of differing scales. When a periodic vector has missing values, they are filled with the mean of the periodic vectors for the same type of pest or disease. Outliers are removed using the quartile method, where an outlier is a value lying more than 1.5 times the interquartile range above the upper quartile or below the lower quartile. The preprocessing rules for the operation event feature vectors are the same as those for the periodic vectors.
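The missing-value and outlier rules can be sketched as follows. The function names are hypothetical, missing values are assumed to be encoded as NaN, and quartiles are computed with NumPy's default linear interpolation, which this description does not specify.

```python
import numpy as np

def fill_missing_with_class_mean(vectors):
    """Fill NaN entries with the per-dimension mean over one pest or disease type.

    vectors: (n_samples, dim) array holding the vectors of a single type.
    """
    column_mean = np.nanmean(vectors, axis=0)
    return np.where(np.isnan(vectors), column_mean, vectors)

def iqr_inlier_mask(values, k=1.5):
    """True for values within [Q1 - k*IQR, Q3 + k*IQR]; False marks outliers."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
```

For the values 1, 2, 3, 4, 100 the quartiles are 2 and 4, so the accepted range is -1 to 7 and only 100 is flagged as an outlier.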
[0148] Preferably, the process of concatenating the coordinates of the disease outbreak window to the representation vector of the atlas entity for alignment and fusion includes:
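[0149] Construct the window-enhanced entity representation vector: $\tilde{e} = [\,e\,;\,\alpha c\,]$, where $\tilde{e}$ represents the window-enhanced entity representation vector, $e$ represents the representation vector of a graph entity, $c$ represents the disease window coordinates, and $\alpha$ represents the window weight coefficient;
[0150] Solve for the optimal orthogonal alignment matrix: $Q^{*} = \arg\min_{Q^{\top}Q = I} \lVert Q X_1 - X_2 \rVert_F$, where $X_1$ and $X_2$ represent the enhanced representation matrices of the first and second data sources, respectively;
[0151] Through the singular value decomposition $X_2 X_1^{\top} = U \Sigma V^{\top}$, obtain $Q^{*} = U V^{\top}$.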
[0152] The window weight coefficient is a quantification coefficient used to adjust the contribution of the disease window coordinates in feature fusion. It is preferably 0.5. This value can balance the feature weights of the original representation vector of the graph entity and the disease window coordinates, so as not to weaken the semantic features of the entity itself, and to fully integrate the structural features of the temporal window, thus adapting to the feature fusion requirements of the knowledge graph of flower diseases and pests.
[0153] The window-enhanced entity representation vector is a one-dimensional numerical vector obtained by concatenating the weighted coordinates of the disease window with the atlas entity representation vector. It is an entity quantification representation that incorporates temporal window features.
[0154] The first data source augmentation representation matrix is a two-dimensional numerical matrix formed by arranging the window augmentation entity representation vectors of all graph entities in the first data source in columns. It is a set-based representation of the entity features of the data source.
[0155] The second data source augmentation representation matrix is a two-dimensional numerical matrix formed by arranging the windowed augmentation entity representation vectors of all graph entities in the second data source in columns. It is a set-based representation of the entity features of the data source.
[0156] An orthogonal alignment mapping matrix is an orthogonal matrix used to map enhanced representation matrices from different data sources to the same feature space. It is a quantization matrix that achieves spatial alignment of multi-source data.
[0157] The optimal orthogonal alignment matrix is the optimal orthogonal alignment mapping matrix obtained by minimizing the Frobenius norm. It is a matrix that can achieve accurate alignment of feature spaces of multi-source data.
[0158] The Frobenius norm is a matrix norm used to quantify the differences between elements of two matrices; it is a quantitative indicator that characterizes the similarity between two matrices.
[0159] The left singular vector matrix in the alignment process is an orthogonal matrix obtained by performing singular value decomposition on the product matrix of the augmented representation matrices of the two data sources. It is a matrix that characterizes the main eigendirection of the product matrix.
[0160] The singular value diagonal matrix in the alignment process is a diagonal matrix obtained by performing singular value decomposition on the product matrix of the augmented representation matrices of the two data sources. The values on the diagonal are singular values, which represent the contribution of each principal feature direction.
[0161] The right singular vector matrix in the alignment process is an orthogonal matrix obtained by performing singular value decomposition on the product matrix of the augmented representation matrices of the two data sources. It is a matrix that helps characterize the features of the product matrix.
[0162] In detail, the disease window coordinates are first multiplied by the window weight coefficient, and the result is concatenated to the end of the representation vector of each graph entity, yielding the window-enhanced entity representation vector of each entity. Next, the window-enhanced entity representation vectors from the first and second data sources are arranged column-wise to construct the first and second data source enhanced representation matrices. The optimization objective is to minimize the Frobenius norm of the difference between the first data source enhanced representation matrix after orthogonal alignment mapping and the second data source enhanced representation matrix, and the optimal orthogonal alignment matrix is solved accordingly. Specifically, the product of the second data source enhanced representation matrix and the transpose of the first data source enhanced representation matrix is calculated and subjected to singular value decomposition to obtain a left singular vector matrix, a singular value diagonal matrix, and a right singular vector matrix. The left singular vector matrix is multiplied by the transpose of the right singular vector matrix to obtain the closed-form solution of the optimal orthogonal alignment matrix. Finally, the first data source enhanced representation matrix is multiplied by the optimal orthogonal alignment matrix to achieve feature space alignment and fusion between the first and second data sources. For example, the entity representation vector of gray mold is concatenated with the corresponding weighted two-dimensional disease window coordinates to obtain the enhanced vector. After the enhanced representation matrices of the two data sources are constructed, the optimal orthogonal alignment matrix is obtained through the above steps, completing the feature alignment of gray mold-related entities across the two data sources.
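The alignment step is an instance of the orthogonal Procrustes problem and can be sketched as below. The function names are hypothetical, and the sketch assumes that entities are stored as columns and that column i of both matrices refers to the same paired entity, which this description does not state explicitly.

```python
import numpy as np

def window_enhance(e, c, alpha=0.5):
    """Append the weighted disease window coordinates to an entity vector."""
    return np.concatenate([np.asarray(e, dtype=float), alpha * np.asarray(c, dtype=float)])

def orthogonal_alignment(X1, X2):
    """Return the orthogonal Q minimizing ||Q X1 - X2||_F.

    X1, X2: (dim, n_entities) enhanced representation matrices with paired columns.
    The closed form is Q = U V^T, where X2 X1^T = U S V^T.
    """
    U, _, Vt = np.linalg.svd(X2 @ X1.T)
    return U @ Vt
```

With 256-dimensional entity vectors and two window coordinates, `window_enhance` returns 258-dimensional vectors, and `orthogonal_alignment` returns a 258 by 258 matrix that is applied as `Q @ X1`.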
[0163] In detail, the window weight coefficient can be fine-tuned between 0.3 and 0.7 according to actual fusion needs; the larger the value, the greater the feature contribution of the disease window coordinates. The dimension of the window-enhanced entity representation vector is the sum of the dimension of the original entity representation vector and the dimension of the disease window coordinates. The original entity representation vector has a dimension of 256 and the disease window coordinates have a dimension of 2, so the dimension of the enhanced vector is fixed at 258. The first data source is preferably structured data from a professional agricultural knowledge base, and the second data source is semi-structured text data obtained by crawling. When there are more than two data sources, the professional knowledge base serves as the central data source, and the other data sources are selected in turn and fused pairwise with the central data source. The orthogonal alignment mapping matrix is a square matrix whose dimension equals the dimension of the enhanced vector, namely 258 rows and 258 columns. Singular value decomposition is implemented using the Jacobi algorithm, and the decomposition retains all singular values and singular vectors. After alignment and fusion, cosine similarity is used to determine whether two entities are the same entity, with the similarity threshold set to 0.92. If the similarity is greater than or equal to the threshold, the entities are merged; if it is less than the threshold, they are kept as different entities and an alias relationship is established. After fusion, redundant associations are deduplicated, and the association data with the highest confidence score is retained. Missing feature values are filled with the average feature value of entities of the same type.
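As a minimal sketch of the concatenation and merge decision described above, the following Python fragment builds a window-enhanced vector and applies the 0.92 cosine similarity threshold. The function names `enhance`, `cosine_similarity`, and `should_merge`, and the default weight of 0.5 (the midpoint of the stated 0.3 to 0.7 range), are illustrative assumptions and not part of the original text. The orthogonal alignment step itself is omitted.

```python
import math
from typing import List, Sequence

SIMILARITY_THRESHOLD = 0.92  # merge threshold stated in the text


def enhance(entity_vec: Sequence[float],
            window_coords: Sequence[float],
            window_weight: float = 0.5) -> List[float]:
    """Append the weighted disease window coordinates to the entity vector.

    window_weight=0.5 is an assumed default inside the 0.3-0.7 range.
    """
    return list(entity_vec) + [window_weight * c for c in window_coords]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def should_merge(a: Sequence[float], b: Sequence[float],
                 threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when two aligned entity vectors count as the same entity."""
    return cosine_similarity(a, b) >= threshold


# A 256-dimensional entity vector plus 2 window coordinates gives 258 values.
vec = enhance([1.0] * 256, (0.7, 0.9))
print(len(vec))  # 258
```

In this sketch a pair that fails the threshold is only reported as not merged; establishing the alias relationship is left to the caller.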
[0164] Preferably, establishing a graph network based on the disease window coordinates to generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through the knowledge graph, includes:
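[0165] Calculate the grid index: $g_1 = \lfloor c_1 / \delta \rfloor$, $g_2 = \lfloor c_2 / \delta \rfloor$, where $\delta$ indicates the spatial grid resolution, $c_1$ and $c_2$ denote the first and second coordinate components of the disease window coordinates, and $g_1$ and $g_2$ denote the first and second grid indices;
[0166] Generate the temporal window node identifier: $ID = \operatorname{Hash}(h \oplus r \oplus t \oplus g_1 \oplus g_2)$, where $\operatorname{Hash}(\cdot)$ represents the hash mapping operation, $h$, $r$, and $t$ represent the head entity, the associated data, and the tail entity, respectively, and $\oplus$ indicates the character concatenation operation.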
[0167] Spatial grid resolution is a quantitative scale that transforms continuous disease window coordinates into discrete grid indexes. It is the grid unit length that divides the two-dimensional disease window coordinate space and provides a criterion for calculating the grid index.
[0168] The median spatial distance of the disease window coordinates is the median of the set of values obtained by calculating the spatial distance between every pair of disease window coordinates in the graph network. It is the core reference for determining the spatial grid resolution.
[0169] The scaling factor is a quantization factor by which the median spatial distance is multiplied to obtain the spatial grid resolution. It is preferably 0.5. This value adapts to the distribution density of the disease window coordinates, avoiding both an excessively fine grid, which produces redundant temporal window nodes, and an excessively coarse grid, which loses the cycle characteristics of diseases and pests, and thus matches the actual distribution of the disease window coordinates of flower diseases and pests.
[0170] The first grid index is a discrete integer obtained by dividing the first coordinate component of the disease window coordinate by the spatial grid resolution and then rounding it down. It is a unique identifier representing the position of the first coordinate component in the grid space.
[0171] The second grid index is a discrete integer obtained by dividing the second coordinate component of the disease window coordinate by the spatial grid resolution and then rounding it down. It is a unique identifier representing the position of the second coordinate component in the grid space.
[0172] The character concatenation operation is a fundamental operation that connects the head entity, associated data, tail entity, and the first grid index and the second grid index in a fixed order to form a continuous character sequence. It is a prerequisite step for generating a unique temporal window node identifier.
[0173] Hash mapping is an operation that transforms the concatenated character sequence into a fixed-length character string or numerical value. The preferred method is Secure Hash Algorithm 256 (SHA-256), which is one-way and collision-resistant, so that conflicts between temporal window node identifiers are extremely unlikely and the uniqueness requirement of graph network node identifiers is met in practice.
[0174] The temporal window node identifier is the result obtained by applying the hash mapping operation to the concatenated character sequence. It is the unique identifier of the temporal window node in the graph network and is used to distinguish different disease and pest occurrence cycle windows.
[0175] Semantic interpretation labels are natural language labels obtained by mapping the two-dimensional values of the disease window coordinates to intervals. They are human-readable representations of the disease window coordinates and are used to intuitively display the cycle characteristics of pests and diseases at the graph delivery layer.
[0176] In detail, the spatial distance between every pair of disease window coordinates in the graph network is first calculated, the distance values are sorted, and their median is taken. This median is multiplied by the scaling factor to obtain the spatial grid resolution. Next, the first and second coordinate components of each disease window coordinate are divided by the spatial grid resolution and rounded down to obtain the corresponding first and second grid indices. The character forms of the head entity, associated data, and tail entity are then concatenated with the numeric forms of the first and second grid indices in a fixed order, and the concatenated character sequence is input into the hash mapping operation to generate a unique temporal window node identifier. Based on this identifier, an independent temporal window node is established in the graph network. The disease window coordinates are then mapped to the corresponding semantic interpretation label according to the numerical intervals in which they fall. Finally, a network association is established between each temporal window node and its corresponding head and tail entities, and all entity nodes, temporal window nodes, and association relationships are integrated to generate the knowledge graph of flower diseases and pests. For example, suppose the disease window coordinates corresponding to "gray mold infects petals" are 0.7 and 0.9, the median spatial distance is 0.4, and the scaling factor is 0.5, so that the spatial grid resolution is 0.4 multiplied by 0.5, that is, 0.2. The first grid index is 0.7 divided by 0.2 and rounded down, which gives 3, and the second grid index is 0.9 divided by 0.2 and rounded down, which gives 4. After the head entity gray mold, the associated data infects, and the tail entity petals are concatenated with the first grid index 3 and the second grid index 4, a unique node identifier is obtained through hash mapping. The semantic interpretation label mapped from these coordinates is "high-humidity operation triggers the bud-stage infection window".
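The grid computation above can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the patented implementation: the function names `grid_resolution` and `grid_indices` are hypothetical, and Euclidean distance is assumed for the spatial distance.

```python
import math
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def grid_resolution(coords: Sequence[Coord], scale: float = 0.5) -> float:
    """Median pairwise Euclidean distance multiplied by the scaling factor."""
    if len(coords) < 2:
        raise ValueError("at least two disease window coordinates are required")
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    return scale * median(distances)


def grid_indices(coord: Coord, resolution: float) -> Tuple[int, int]:
    """Divide each coordinate component by the resolution and round down."""
    if resolution <= 0:
        raise ValueError("grid resolution must be positive")
    return (math.floor(coord[0] / resolution),
            math.floor(coord[1] / resolution))


# Worked example from the text: median distance 0.4, scaling factor 0.5.
resolution = 0.4 * 0.5
print(grid_indices((0.7, 0.9), resolution))  # (3, 4)
```

Because the quotient is computed in floating point, a coordinate lying exactly on a cell boundary (for example 0.6 with resolution 0.2) may be floored into the lower cell. An implementation that is sensitive to boundary cases may prefer decimal arithmetic or a small tolerance.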
[0177] In detail, the spatial distance between disease window coordinates is the Euclidean distance, that is, the square root of the sum of the squared differences of the first and second components of the two coordinates. The scaling factor can be fine-tuned between 0.3 and 0.7 according to the distribution density of the disease window coordinates: a smaller value is used for high distribution density, and a larger value for low distribution density. The character concatenation operation follows the fixed order of head entity, associated data, tail entity, first grid index, and second grid index. The parts are joined with underscores, and any missing part is filled with an empty string. When the hash mapping operation uses Secure Hash Algorithm 256, the output is a 64-character hexadecimal string. If an identifier conflict occurs, an auto-incrementing numeric suffix is appended to the end of the concatenated sequence and the hash is recalculated until a unique identifier is generated. The mapping rule for the semantic interpretation label is to divide the first coordinate component and the second coordinate component into five intervals according to the quartiles: extremely low, low, medium, high, and extremely high. The 25th, 50th, and 75th percentiles are used as interval dividing nodes. Each interval corresponds to a fixed natural language description, and the interval descriptions of the two components are combined to obtain the final label. The relationship between a temporal window node and a graph entity is a windowed relationship, directed from the entity node to the temporal window node. The knowledge graph of flower diseases and pests is stored in a graph database and contains two types of core nodes: entity nodes and temporal window nodes. The original relationships between entities are retained, and new relationships between entities and temporal window nodes are added. Agricultural diagnostic command response is realized through graph path retrieval. During retrieval, the semantic interpretation labels and associated entities of the temporal window nodes are matched first to quickly locate the occurrence cycle and inducing conditions of diseases and pests, thereby improving the efficiency of diagnostic response.
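The identifier rules above can be sketched in Python with the standard `hashlib` module. The function name `window_node_id`, the `registry` dictionary, and the injectable `hash_fn` parameter are hypothetical and introduced only for illustration. The sketch also assumes that a conflict means two different concatenated sequences producing the same identifier, and that the numeric suffix is joined with the same underscore separator.

```python
import hashlib
from typing import Callable, Dict, Optional


def sha256_hex(text: str) -> str:
    """SHA-256 digest of the UTF-8 text as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def window_node_id(head: Optional[str], relation: Optional[str],
                   tail: Optional[str], g1: int, g2: int,
                   registry: Dict[str, str],
                   hash_fn: Callable[[str], str] = sha256_hex) -> str:
    """Return a temporal window node identifier and record it in `registry`.

    `registry` maps each issued identifier to the sequence it was built from.
    """
    # Fixed order: head, associated data, tail, first index, second index.
    # Missing parts are replaced by an empty string.
    parts = [head or "", relation or "", tail or "", str(g1), str(g2)]
    sequence = "_".join(parts)

    candidate = hash_fn(sequence)
    suffix = 0
    # Assumed conflict rule: the identifier already belongs to a different
    # sequence. Append an incrementing suffix and hash again until it is free.
    while registry.get(candidate, sequence) != sequence:
        suffix += 1
        candidate = hash_fn(f"{sequence}_{suffix}")

    registry[candidate] = sequence
    return candidate


registry: Dict[str, str] = {}
node_id = window_node_id("gray mold", "infects", "petals", 3, 4, registry)
print(len(node_id))  # 64
```

Passing the hash function as a parameter is a design choice of this sketch: it lets the conflict branch be exercised with a deliberately weak hash, since a genuine SHA-256 collision cannot be produced on demand.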
[0178] Figure 2 illustrates a specific scenario of constructing and applying the AI-based knowledge graph for flower diseases and pests. The left side depicts a greenhouse flower cultivation scene, where growers collect on-site information or initiate agricultural diagnostic commands using handheld devices. The right side shows the knowledge graph construction system acquiring multi-source data from online text and agricultural knowledge base interfaces. After domain-specific sampling, the system generates a structured knowledge graph of flower diseases and pests and stores it in a graph database. When an agricultural diagnostic command is input, the system responds quickly based on the knowledge graph, providing growers with accurate disease and pest diagnosis and control suggestions.
[0179] Example 2: A method for constructing a knowledge graph of flower diseases and pests based on artificial intelligence, applied to any of the aforementioned systems for constructing a knowledge graph of flower diseases and pests based on artificial intelligence:
[0180] Multi-source text is captured, and sampled text is obtained by performing domain-specific sampling based on the photoperiod domain and the humidity domain;
[0181] The sampled text is vectorized to extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds;
[0182] Channel gating values are calculated based on the four-channel time-series elements, and boundaries are defined using the channel gating values to extract graph entities;
[0183] Associated data is extracted based on the graph entities, and periodic vectors are generated based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors;
[0184] The mapping relationship between the operational events and the periodic vector is fitted, and a biaxial time-series operator is extracted to reduce the dimensionality of the periodic vector to the disease window coordinates;
[0185] The disease window coordinates are concatenated to the representation vector of the graph entity for alignment and fusion;
[0186] A graph network is established based on the disease window coordinates to generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through the knowledge graph.
[0187] It should be noted that the interval and threshold sizes are set for ease of comparison. The size of a threshold depends on the amount of sample data and on the base number set by those skilled in the art for each set of sample data, as long as the proportional relationship between the parameter and the quantized value is not affected. Furthermore, the above formulas are all calculated on dimensionless values, and they were obtained through software simulation on a large amount of collected data so as to approximate the real situation as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.
[0188] The embodiments of this example have been described above. However, this example is not limited to the specific implementation methods described, which are merely illustrative and not restrictive. Those skilled in the art can derive many other forms under the guidance of this example, all of which fall within the protection scope of this example.
Claims
1. A knowledge graph construction system for flower diseases and pests based on artificial intelligence, characterized in that: The acquisition module is used to capture multi-source text and perform domain-specific sampling based on the photoperiod domain and the humidity domain to obtain sampled text; The parsing module is used to vectorize the sampled text, extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds, and construct the text word embedding matrix of the sampled text; The entity recognition module is used to calculate channel gating values based on the four-channel time-series elements and to define boundaries using the channel gating values to extract graph entities, including: The text word embedding matrix is multiplied by the channel gating weight matrix and then processed by a gating activation function to obtain the channel gating value; The entity encoding feature matrix is multiplied by the basic emission weight matrix to obtain the basic score, and the channel gating value is processed by the channel auxiliary mapping matrix and then multiplied by the gate emission weight matrix to obtain the gated score, wherein the entity encoding feature matrix is a numerical matrix obtained by performing deep semantic encoding on the text word embedding matrix, and the basic emission weight matrix is a weight matrix used to map the entity encoding feature matrix to the entity label score space;
The basic score is added to the gated score to obtain the integrated emission score; The integrated emission score and the entity label transition matrix are decoded using a constrained sequence decoding algorithm to obtain a graph entity label sequence, and the graph entities are extracted based on the graph entity label sequence; The relation extraction module is used to extract associated data based on the graph entities and to generate periodic vectors based on the four-channel time-series elements, so as to construct a graph quadruple containing the periodic vectors; The time-series transformation module is used to fit the mapping relationship between the operational events and the periodic vector and to extract a biaxial time-series operator to reduce the dimensionality of the periodic vector to the disease window coordinates, including: Multiple operational event feature vectors and multiple periodic vectors extracted from each of the multi-source texts are obtained, and an operational event sample matrix and a periodic sample matrix are constructed from them, respectively; The transformation relationship between the periodic sample matrix and the operational event sample matrix is calculated using the ridge regression algorithm to obtain a linear mapping matrix; The linear mapping matrix is subjected to singular value decomposition to obtain the left singular matrix, and the first two principal singular vectors of the left singular matrix are extracted and transposed to construct the biaxial time-series operator; The periodic vector is projected using the biaxial time-series operator to reduce its dimensionality and obtain the disease window coordinates, which include the first coordinate component and the second coordinate component; The knowledge fusion module is used to concatenate the disease window coordinates to the representation vector of the graph entity for alignment and fusion;
The graph application module is used to establish a graph network based on the disease window coordinates and generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through the knowledge graph of flower diseases and pests.
2. The artificial intelligence-based knowledge graph construction system for flower diseases and pests according to claim 1, characterized in that, capturing the multi-source text and performing domain-specific sampling based on the photoperiod domain and the humidity domain to obtain the sampled text includes: The photoperiod domain score and the humidity domain score of the multi-source text are calculated, wherein the photoperiod domain score is calculated based on the word weights and the matching state values between the text words and the photoperiod vocabulary, and the humidity domain score is calculated based on the word weights and the matching state values between the text words and the humidity vocabulary; Based on the difference between the photoperiod domain score and the humidity domain score, a scene domain label is assigned to the multi-source text, the scene domain label being either the photoperiod domain or the humidity domain; The source confidence coefficient, the time decay factor, and the scene domain balance coefficient corresponding to the scene domain label are obtained for the multi-source text; The sampling weight is obtained by multiplying the source confidence coefficient, the time decay factor, and the scene domain balance coefficient; The normalized sampling probability is calculated based on the sampling weight, and weighted reservoir sampling is performed on the multi-source text according to the normalized sampling probability to obtain the sampled text.
3. The artificial intelligence-based knowledge graph construction system for flower diseases and pests according to claim 2, characterized in that, vectorizing the sampled text to extract the four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds, includes: The text word embedding matrix is mapped using a channel projection matrix to obtain the calendar channel representation, the phenological stage channel representation, the operational event channel representation, and the environmental threshold channel representation, respectively; The calendar channel representation, the phenological stage channel representation, the operational event channel representation, and the environmental threshold channel representation are aggregated by attention weights to obtain the calendar feature vector, the phenological stage feature vector, the operational event feature vector, and the environmental threshold feature vector, respectively; The calendar feature vector, the phenological stage feature vector, the operational event feature vector, and the environmental threshold feature vector are concatenated to obtain the four-channel time-series elements.
4. The artificial intelligence-based knowledge graph construction system for flower diseases and pests according to claim 3, characterized in that, extracting associated data based on the graph entities and generating periodic vectors based on the four-channel time-series elements to construct the graph quadruple containing the periodic vectors includes: The graph entities are divided into head entities and tail entities, and the corresponding head entity vectors and tail entity vectors are obtained; The inner product of the head entity vector and the tail entity vector is calculated using the association relationship weight matrix to obtain the association relationship confidence score, and the associated data is extracted based on the association relationship confidence score; The periodic vector is obtained by linearly transforming the four-channel time-series elements using the periodic feature projection matrix; The head entity, the associated data, the tail entity, and the periodic vector are combined to construct the graph quadruple containing the periodic vector.
5. The artificial intelligence-based knowledge graph construction system for flower diseases and pests according to claim 4, characterized in that, concatenating the disease window coordinates to the representation vector of the graph entity for alignment and fusion includes: The disease window coordinates are multiplied by the window weight coefficient and then concatenated to the representation vector of the graph entity to obtain the window-enhanced entity representation vector; A first data source enhanced representation matrix and a second data source enhanced representation matrix are constructed based on the window-enhanced entity representation vectors; The optimal orthogonal alignment matrix is solved with the objective of minimizing the Frobenius norm of the difference between the first data source enhanced representation matrix, after mapping by the orthogonal alignment mapping matrix, and the second data source enhanced representation matrix; Specifically, the product matrix of the transpose of the first data source enhanced representation matrix and the second data source enhanced representation matrix is calculated, and the product matrix is subjected to singular value decomposition to obtain the left singular vector matrix, the singular value diagonal matrix, and the right singular vector transpose matrix; The left singular vector matrix is multiplied by the transpose of the right singular vector matrix to obtain a closed-form solution, which is used as the optimal orthogonal alignment matrix; The first data source enhanced representation matrix is mapped using the optimal orthogonal alignment matrix to align and fuse it with the second data source enhanced representation matrix.
6. The artificial intelligence-based knowledge graph construction system for flower diseases and pests according to claim 5, characterized in that, establishing a graph network based on the disease window coordinates to generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through the knowledge graph, includes: The median spatial distance of the multiple disease window coordinates contained in the graph network is obtained, and the spatial grid resolution is calculated by multiplying the median spatial distance by a scaling factor; The first coordinate component and the second coordinate component are divided by the spatial grid resolution and rounded down to obtain the first grid index and the second grid index, respectively; The head entity, the associated data, and the tail entity are concatenated with the first grid index and the second grid index, and a hash mapping operation is performed to generate a temporal window node identifier; A temporal window node is established based on the temporal window node identifier, and the disease window coordinates are mapped to semantic interpretation labels; Network associations are established between the temporal window nodes and the graph entities to generate the knowledge graph of flower diseases and pests, and the knowledge graph containing the semantic interpretation labels is output to accelerate the response to agricultural diagnostic instructions.
7. A method for constructing a knowledge graph of flower diseases and pests based on artificial intelligence, applied to the knowledge graph construction system of flower diseases and pests based on artificial intelligence as described in any one of claims 1-6, characterized in that: Multi-source text is captured, and sampled text is obtained by performing domain-specific sampling based on the photoperiod domain and the humidity domain; The sampled text is vectorized to extract four-channel time-series elements, which include calendar elements, phenological stage elements, operational events, and environmental thresholds; Channel gating values are calculated based on the four-channel time-series elements, and boundaries are defined using the channel gating values to extract graph entities; Associated data is extracted based on the graph entities, and periodic vectors are generated based on the four-channel time-series elements to construct a graph quadruple containing the periodic vectors; The mapping relationship between the operational events and the periodic vector is fitted, and a biaxial time-series operator is extracted to reduce the dimensionality of the periodic vector to the disease window coordinates; The disease window coordinates are concatenated to the representation vector of the graph entity for alignment and fusion; A graph network is established based on the disease window coordinates to generate a knowledge graph of flower diseases and pests, so as to accelerate the response to agricultural diagnostic instructions through the knowledge graph of flower diseases and pests.