An image recognition-based Salvia miltiorrhiza insect pest detection method and system

By collecting multi-temporal, multispectral image sequences of Salvia miltiorrhiza leaves, constructing a microdroplet spatial density field, and tracing the focal points of pest activity, the method addresses the difficulty that existing technologies have in identifying Salvia miltiorrhiza pests, enabling accurate detection and early warning of tiny pests.

CN122244532A · Pending · Publication Date: 2026-06-19 · ZHUANGLANG COUNTY AGRICULTURE & RURAL AFFAIRS BUREAU

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
ZHUANGLANG COUNTY AGRICULTURE & RURAL AFFAIRS BUREAU
Filing Date
2026-03-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for identifying pests in Salvia miltiorrhiza rely on visible light or broadband multispectral imaging, which makes it difficult to identify pests that are small, elusive, or in the nymph stage. Furthermore, leaf damage characteristics are easily confused with mechanical damage, natural aging, and other disease symptoms, resulting in poor identification specificity and a high misjudgment rate.

Method used

Multi-temporal, multispectral image sequences of Salvia miltiorrhiza leaf surfaces are collected. Microdroplet regions are identified, a microdroplet spatial density field is constructed, the focal points of potential pest activity are traced, and the positions of each focal point are connected into a movement path. Geometric shape and movement speed features are then extracted from the paths and classified using a pre-trained pest behavior decoding network.

Benefits of technology

The method detects pests that are small, move inconspicuously, or are in the nymph stage. It improves the specificity of pest identification, avoids confusion with factors such as mechanical damage and natural aging, and provides accurate monitoring and early warning.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method and system for detecting pests in *Salvia miltiorrhiza* based on image recognition, belonging to the field of image recognition technology. Specifically, it includes: acquiring a multi-temporal, multispectral image sequence of the *Salvia miltiorrhiza* leaf surface from the absence of droplets to the stable formation of droplets; processing each frame of the image to identify and locate all microdroplet regions, recording their appearance time and centroid coordinates; using the leaf surface as a continuous coordinate system, calculating the spatial density field of microdroplets at that moment based on the centroid coordinates of all microdroplets in each frame of the image; analyzing the change of the density field over time, identifying local areas where the density continuously increases, and tracing and marking their geometric centers as potential pest activity focal points; connecting the positions of the same focal point at different times to form a movement path, and extracting its geometric shape and movement speed features; inputting the features into a pre-trained pest behavior decoding network, outputting pest type codes, and generating a detection report. This invention achieves contactless, automated, and accurate tracing and identification of pest activities.

Description

Technical Field

[0001] This invention relates to the field of image recognition technology, and specifically to a method and system for detecting pests in Salvia miltiorrhiza based on image recognition.

Background Technology

[0002] As an important medicinal plant, pest and disease monitoring during the large-scale cultivation of Salvia miltiorrhiza is a crucial link in ensuring the yield and quality of the medicinal material. Traditional pest monitoring mainly relies on manual field inspections and experience-based judgment, which suffers from low efficiency, limited coverage, and difficulty in achieving early warning. With the development of agricultural informatization and intelligent sensing technologies, automatic identification methods for crop diseases and pests based on machine vision and image analysis have become a research hotspot in this field, aiming to improve the timeliness and accuracy of monitoring through non-contact, high-throughput technologies.

[0003] In existing technologies, image recognition-based methods for detecting pests in *Salvia miltiorrhiza* can be mainly divided into two categories. One category relies on visible light images, analyzing changes in color, texture, or morphological defects on the leaf surface to identify pests. The other category utilizes multispectral or hyperspectral imaging techniques, capturing differences in reflectance of plants at specific wavelengths to identify physiological stress characteristics related to pests and diseases. These methods primarily depend on capturing the morphology of the pests themselves, leaf defects caused by direct feeding, or specific lesions for identification.

[0004] Existing methods for identifying pests affecting Salvia miltiorrhiza primarily rely on visible light or broadband multispectral imaging techniques. Identification is achieved by capturing the pest's morphology, leaf damage caused by direct feeding, or specific lesions. However, these methods face significant limitations in practical applications: First, when pests are small, elusive, or in their nymphal stage, their morphological features are atypical or difficult to clearly visualize in the image, leading to decreased accuracy. Second, methods relying on leaf damage or discolored areas are susceptible to interference from complex field conditions, such as mechanical damage, natural aging marks, or other non-pest-related disease symptoms, which can cause feature confusion and affect the specificity and reliability of the identification.

Summary of the Invention

[0005] The purpose of this invention is to provide a method and system for detecting pests in Salvia miltiorrhiza based on image recognition, and to solve the following technical problems:

[0006] Existing methods for identifying pests in Salvia miltiorrhiza mainly rely on visible light or broadband multispectral imaging, judging pests by capturing their morphology or direct leaf damage characteristics. However, it is difficult to clearly image and accurately classify pests that are small, elusive, or in the nymph stage; furthermore, leaf damage characteristics are easily confused with mechanical damage, natural aging, and other disease symptoms, resulting in poor identification specificity and a high misjudgment rate.

[0007] The objective of this invention can be achieved through the following technical solutions:

[0008] A method for detecting pests in *Salvia miltiorrhiza* based on image recognition includes the following steps:

[0009] S1. Collect a multi-temporal, multi-spectral image sequence of the surface of Salvia miltiorrhiza leaves, the image sequence covering the complete time process from the state without microdroplets to the state of stable microdroplet formation;

[0010] S2. Process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame, and record the appearance time and centroid coordinates of each microdroplet region.

[0011] S3. Define the leaf surface space as a continuous coordinate system. Based on the set of centroid coordinates of all microdroplet regions in each frame of the image, calculate the microdroplet spatial density field on the leaf surface at that moment.

[0012] S4. Analyze the changes of the spatial density field of the microdroplets with time series, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential focus of pest action.

[0013] S5. Connect the spatial positions of the same potential pest focus at different times to form a focus movement path; extract the geometric shape features and movement speed features of the focus movement path;

[0014] S6. Input the geometric shape features and movement speed features of the focal movement path into the pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focal movement path.

[0015] As a further aspect of the present invention: in step S2, the process of identifying and locating all microdroplet regions present in each frame of image is as follows:

[0016] A frame of an image in a droplet-free state from a multi-temporal multispectral image sequence is selected as a background reference frame. The current image frame to be processed and the background reference frame are subjected to pixel-by-pixel gray value subtraction operation in the corresponding multiple narrow bands to obtain the difference image of each band. For each difference image, a difference judgment threshold is calculated based on its global gray distribution statistical characteristics. Pixels with an absolute gray value difference greater than the threshold are marked as candidate change pixels.

[0017] For each pixel location in the image, the number of bands in which it is marked as a candidate change pixel is counted. When this number exceeds the preset cross-band consistency threshold, the pixel location is determined to belong to the microdroplet spectral response region. Spatial connectivity analysis is performed on the pixels belonging to this region: each set of mutually adjacent pixels is marked as a connected region and defined as a microdroplet region. The arithmetic mean of the coordinates of all pixels in the connected region is calculated as its centroid coordinates. The acquisition timestamp of the current image frame is recorded as the appearance time of the microdroplet region.

[0018] As a further aspect of the present invention: in step S3, the process of calculating the spatial density field of microdroplets on the leaf surface is as follows:

[0019] A two-dimensional Cartesian coordinate system is established with the effective area of the leaf image as its range. The coordinate system is uniformly divided in the horizontal and vertical directions to form a grid array covering the leaf area. Each grid cell is called a density calculation cell.

[0020] For a single-frame image, the centroid coordinates of all microdroplet regions identified in the single-frame image are traversed. Based on the value of the centroid coordinates, the density calculation unit where the centroid coordinates are located is determined, and the initial statistical value of the density calculation unit is incremented by one. After traversal and statistics are completed, an initial density distribution matrix corresponding to the grid array is obtained. A two-dimensional Gaussian convolution kernel with a preset size and standard deviation parameter is used to perform spatial convolution smoothing operation on the initial density distribution matrix. The value of each density calculation unit position in the matrix obtained after smoothing is defined as the microdroplet spatial density field value of the center point coordinate of the density calculation unit at the current time.

[0021] As a further aspect of the present invention: In step S4, the process of analyzing the change of the spatial density field of microdroplets over time and identifying local spatial regions where the density value continuously increases is as follows:

[0022] Acquire spatial density field data of microdroplets at multiple consecutive moments arranged in chronological order. For each density calculation unit in the density field grid system, extract the density field value of the density calculation unit at each moment to form a time density value sequence of the density calculation unit.

[0023] For each density calculation unit, a first-order forward difference is computed over the time density value sequence to obtain the sequence of density change values over consecutive time intervals. A positive change threshold and a minimum duration threshold are set. The density change value sequence of each unit is then traversed to find any subsequence in which the density change value exceeds the positive change threshold over multiple consecutive time intervals. If such a subsequence lasts at least the minimum duration threshold, and the density value at the first moment after the subsequence ends is not lower than the density value at its start, the density calculation unit is marked as a continuously active unit.
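The test described in [0023] can be sketched for a single density calculation unit as follows. This is a minimal illustration, not the patented implementation. The function name and parameter names are hypothetical. The sketch assumes that duration is counted in time intervals, and that when a rising run reaches the end of the series the last available sample stands in for "the first moment after the end".

```python
def is_continuously_active(density, pos_threshold, min_intervals):
    """Return True if one cell's density series contains a qualifying rising run.

    density       -- density field values of the cell at consecutive moments
    pos_threshold -- positive change threshold applied to each forward difference
    min_intervals -- minimum run length, counted in time intervals (assumption)
    """
    # First-order forward difference: diffs[k] = density[k + 1] - density[k]
    diffs = [b - a for a, b in zip(density, density[1:])]
    start = None
    # A -inf sentinel closes a run that is still open at the end of the series.
    for k, d in enumerate(diffs + [float("-inf")]):
        if d > pos_threshold:
            if start is None:
                start = k          # run begins at sample index `start`
        elif start is not None:
            length = k - start     # number of consecutive rising intervals
            # Sample right after the run; clamp to the last sample (assumption).
            after = min(k + 1, len(density) - 1)
            if length >= min_intervals and density[after] >= density[start]:
                return True
            start = None
    return False


# Toy series, invented for illustration only.
rising = [0.0, 0.1, 0.5, 1.0, 1.6, 1.5]
print(is_continuously_active(rising, pos_threshold=0.3, min_intervals=3))
```

Applying the function to every cell of the grid yields the set of continuously active units used in the next step.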

[0024] As a further aspect of the present invention: in step S4, the process of tracing and marking the geometric center of each local spatial region as a potential focus of pest activity is as follows:

[0025] Spatial clustering analysis is performed on all marked continuously active units so that spatially adjacent units are grouped into a set. Each set is defined as a local active region. The geometric center coordinates of a local active region are calculated as the weighted average of the coordinates of all continuously active units in the region. The weight of each unit is positively correlated with the average rate of change of the continuously increasing subsequence identified in that unit's density change value sequence.

[0026] The calculated geometric center coordinates are marked as the spatial coordinates of a potential pest activity focus. The earliest time at which any unit in the local active region was marked as a continuously active unit is recorded as the initial tracing time of that focus.
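The grouping and weighted-center calculation of [0025] and [0026] might look like the sketch below. The text only says that adjacent units are grouped and that weights are "positively correlated" with the average rise rate. The eight-neighborhood adjacency rule, the use of the rise rate itself as the weight, and all names here are assumptions made for illustration.

```python
from collections import deque


def local_active_regions(active):
    """Group adjacent active cells; `active` maps (row, col) -> mean rise rate."""
    seen, regions = set(), []
    for cell in active:
        if cell in seen:
            continue
        seen.add(cell)
        queue, members = deque([cell]), []
        while queue:
            r, c = queue.popleft()
            members.append((r, c))
            # Eight-neighborhood adjacency (assumed; the text says "adjacent").
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in active and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        regions.append(members)
    return regions


def weighted_center(members, active):
    """Weighted mean of cell coordinates, weight = mean rise rate (assumed)."""
    total = sum(active[m] for m in members)
    row = sum(m[0] * active[m] for m in members) / total
    col = sum(m[1] * active[m] for m in members) / total
    return row, col


# Toy input, invented for illustration: two adjacent cells and one isolated cell.
active = {(0, 0): 1.0, (0, 1): 3.0, (5, 5): 2.0}
foci = [weighted_center(region, active) for region in local_active_regions(active)]
print(foci)
```

The center of the first region is pulled toward the cell with the larger rise rate, which is the intended effect of the weighting.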

[0027] As a further aspect of the present invention: in step S5, the process of extracting the geometric shape features and movement speed features of the focus movement path is as follows:

[0028] For a potential pest activity focus, the sequence of its spatial coordinates at multiple time points is obtained in the order in which the focus was marked. The points in this sequence are connected by straight line segments in time order to form a focus movement path polyline. The ratio of the total length of the polyline to the straight-line distance between its start and end points is calculated and defined as the curvature feature of the focus movement path.

[0029] The Euclidean distance between each pair of adjacent coordinate points on the focus movement path polyline is calculated to obtain the displacement between adjacent time points. Each displacement is divided by the corresponding time interval to obtain the instantaneous movement speed of the focus on that path segment. The arithmetic mean of all instantaneous movement speeds is taken as the average speed feature of the focus movement path, and their standard deviation is taken as the speed fluctuation feature.
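The three path features defined in [0028] and [0029] can be computed as in the following sketch. The function name is hypothetical. The text does not say whether the standard deviation is the population or the sample form, so the population form is assumed here. A path whose start and end points coincide is given an infinite curvature feature, which is also an assumption.

```python
import math
import statistics


def path_features(points, times):
    """Return (curvature, mean_speed, speed_std) for a focus movement path.

    points -- list of (x, y) focus positions in time order
    times  -- list of matching timestamps, strictly increasing
    """
    # Segment lengths between adjacent points (Euclidean distance).
    segments = [math.dist(p, q) for p, q in zip(points, points[1:])]
    intervals = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    # Instantaneous speed on each segment = displacement / time interval.
    speeds = [s / dt for s, dt in zip(segments, intervals)]
    chord = math.dist(points[0], points[-1])
    # Curvature feature: total polyline length over start-to-end distance.
    curvature = sum(segments) / chord if chord > 0 else math.inf
    mean_speed = statistics.fmean(speeds)
    speed_std = statistics.pstdev(speeds)  # population form (assumption)
    return curvature, mean_speed, speed_std


# Toy path, invented for illustration: an L-shaped move of 3 then 4 units.
print(path_features([(0, 0), (3, 0), (3, 4)], [0.0, 1.0, 3.0]))
```

A perfectly straight path gives a curvature feature of 1, and the value grows as the path meanders.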

[0030] As a further aspect of the present invention: in step S6, the pre-training process of the pest behavior decoding network is as follows:

[0031] A training dataset is constructed. Each sample consists of a historical focus movement path of a known pest category and its category label. For each path, the ratio of the total path length to the straight-line distance between the start and end points gives the curvature feature; the mean of the displacement-to-time-interval ratios over all pairs of adjacent points gives the average speed feature; and the standard deviation of the resulting speed sequence gives the speed fluctuation feature.

[0032] The curvature, average speed, and speed fluctuation features are combined into a feature vector and used as the network input. The class labels are converted into one-hot encodings as training targets. The network parameters are initialized, and the feature vectors are fed into the network in batches for forward propagation. The predicted probability distribution is obtained at the output layer through the normalized exponential (softmax) function. The cross-entropy loss between the predicted distribution and the true label is calculated, its gradient with respect to the network parameters is computed by backpropagation, and the parameters are updated by gradient descent. This process is repeated until the classification accuracy of the network on the validation set stabilizes.
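The training loop of [0032] is illustrated below in its simplest possible form. The patent does not specify the architecture of the pest behavior decoding network, so this sketch assumes a single linear layer followed by softmax. It also replaces the "until validation accuracy stabilizes" stopping rule with a fixed number of epochs. The feature values, labels, function names, and hyperparameters are invented for illustration.

```python
import math
import random


def softmax(z):
    """Normalized exponential, shifted by max(z) for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(W, b, x):
    """Linear layer followed by softmax; returns class probabilities."""
    return softmax([sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)])


def train(samples, labels, n_classes, lr=0.2, epochs=1000, seed=0):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    rng = random.Random(seed)
    n_feat, n = len(samples[0]), len(samples)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_feat)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(samples, labels):
            p = forward(W, b, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. the logit: p_k - onehot_k
                err = p[k] - (1.0 if k == y else 0.0)
                gb[k] += err
                for j in range(n_feat):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(n_feat):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict(W, b, x):
    """Return (index of the most probable class, probability vector)."""
    p = forward(W, b, x)
    return max(range(len(p)), key=p.__getitem__), p


# Invented toy data: [curvature, mean speed, speed std] and integer class labels.
samples = [[1.0, 0.2, 0.05], [1.1, 0.25, 0.04], [2.5, 1.0, 0.5], [2.4, 1.1, 0.6]]
labels = [0, 0, 1, 1]
W, b = train(samples, labels, n_classes=2)
print(predict(W, b, [1.05, 0.22, 0.05]))
```

The `predict` helper also shows the inference rule described for step S6: the class index with the largest output probability is selected.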

[0033] As a further aspect of the present invention: In step S6, the process of outputting the corresponding pest type code and generating a detection report containing the pest type code and the focus movement path is as follows:

[0034] The curvature characteristic is obtained by calculating the ratio of the total path length to the straight-line distance between the start and end points of the path to be inspected; the average speed characteristic is obtained by calculating the arithmetic mean of the instantaneous speeds on the path; and the speed fluctuation characteristic is obtained by calculating the standard deviation of the instantaneous speed sequence.

[0035] The above features are combined into a feature vector and input into the pre-trained pest behavior decoding network. After forward propagation, the network produces a probability distribution vector at the output layer, and the pest type code corresponding to the dimension with the largest value is selected as the output. An electronic document is then created: the header contains the leaf identifier and the collection time range, the main body lists the pest type codes and records all potential pest activity foci in a table, and the appendix stores the complete coordinate sequence, timestamps, and feature values of each path. The electronic document is then output.

[0036] This invention also includes an image recognition-based pest detection system for *Salvia miltiorrhiza*, used to implement the above-described image recognition-based pest detection method for *Salvia miltiorrhiza*, comprising:

[0037] The time-series image acquisition module is used to acquire multi-temporal multispectral image sequences of the surface of Salvia miltiorrhiza leaves, the image sequences covering the complete time process from the state without microdroplets to the state of stable microdroplet formation;

[0038] The droplet feature extraction module is used to process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame of the image, and record the occurrence time and centroid coordinates of each microdroplet region.

[0039] The spatial density field construction module is used to define the space of the leaf surface as a continuous coordinate system and calculate the spatial density field of microdroplets on the leaf surface at that moment based on the set of centroid coordinates of all microdroplet regions in each frame of the image.

[0040] The dynamic focus tracing module is used to analyze the changes in the spatial density field of the microdroplets over time, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential pest focus.

[0041] The trajectory feature generation module is used to connect the spatial positions of the same potential pest focus at different times to form a focus movement path; and to extract the geometric shape features and movement speed features of the focus movement path.

[0042] The intelligent classification report module is used to input the geometric shape features and movement speed features of the focal movement path into a pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focal movement path.

[0043] The beneficial effects of this invention are:

[0044] This invention overcomes the limitations of existing technologies that rely on the pest itself or direct damage morphology for identification by collecting and analyzing multi-temporal, multispectral image sequences of microdroplets on the surface of *Salvia miltiorrhiza* leaves. Specifically, by constructing a microdroplet spatial density field and identifying continuously increasing local areas, potential pest activity focal points are traced and marked, enabling the capture and source localization of indirect traces of pest activity. This allows for the detection of small, concealed pests or those in the nymph stage. Furthermore, by connecting the focal points at different times to form movement paths and extracting their geometric shape and speed features, this invention establishes the ability to analyze dynamic patterns of pest behavior. These features are classified by a pre-trained pest behavior decoding network, associating subtle spatiotemporal changes with specific pest types, significantly improving the specificity of the judgment and effectively avoiding confusion with mechanical damage, natural aging, or other non-target disease symptoms. The resulting structured detection report integrates pest type, focal point location, and evolutionary path, providing a reliable technical means for accurate monitoring and early warning.

Attached Figure Description

[0045] The invention will now be further described with reference to the accompanying drawings.

[0046] Figure 1 is a flowchart illustrating a method for detecting pests in Salvia miltiorrhiza based on image recognition according to the present invention.

[0047] Figure 2 is a schematic diagram of the modules of a Salvia miltiorrhiza pest detection system based on image recognition according to the present invention.

Detailed Implementation

[0048] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0049] As shown in Figure 1, this invention is a method for detecting pests in *Salvia miltiorrhiza* based on image recognition, comprising the following steps:

[0050] S1. Collect a multi-temporal, multi-spectral image sequence of the surface of Salvia miltiorrhiza leaves, the image sequence covering the complete time process from the state without microdroplets to the state of stable microdroplet formation;

[0051] S2. Process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame, and record the appearance time and centroid coordinates of each microdroplet region.

[0052] S3. Define the leaf surface space as a continuous coordinate system. Based on the set of centroid coordinates of all microdroplet regions in each frame of the image, calculate the microdroplet spatial density field on the leaf surface at that moment.

[0053] S4. Analyze the changes of the spatial density field of the microdroplets with time series, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential focus of pest action.

[0054] S5. Connect the spatial positions of the same potential pest focus at different times to form a focus movement path; extract the geometric shape features and movement speed features of the focus movement path;

[0055] S6. Input the geometric shape features and movement speed features of the focal movement path into the pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focal movement path.

[0056] In a preferred embodiment of the present invention, the process of identifying and locating all microdroplet regions present in each frame of image in step S2 is as follows:

[0057] This process relies on the joint analysis of multi-temporal, multispectral image sequences, which record the full dynamic evolution from a clean, droplet-free leaf surface, through the initial condensation of microdroplets, to a relatively stable droplet distribution. Each frame is not only a time slice but also a data cube containing multiple independent narrow-band spectral channels, such as the visible red band, the near-infrared band, and specific water absorption bands.

[0058] The implementation process first requires establishing a reliable reference baseline. From the image sequence, the earliest image in time, which has been manually or algorithmically confirmed to be completely free of microdroplets on the leaf surface, is selected as the background reference frame. This image represents the background spectral reflectance characteristics of the tested leaf at the initial moment of the experiment.

[0059] For any subsequent frame to be analyzed in the sequence, such as the image acquired 300 seconds after the start of the experiment, the core operation is spectral difference analysis. Specifically, the grayscale values of the current frame and the background reference frame are subtracted pixel by pixel in each corresponding spectral band. That is, for any pixel in the image, the difference between its grayscale value in the red band of the current frame and the grayscale value at the same position in the red band of the reference frame gives the red difference value; the difference values in the near-infrared band and all other bands are calculated in the same way. This produces as many difference images as there are spectral bands. Each difference image shows the difference between the current moment and the initial clean state in one spectral dimension.

[0060] However, because of slight fluctuations in illumination, physiological activity of the leaf itself, or sensor noise, not all grayscale changes in the difference images are caused by microdroplets. Each band's difference image is therefore thresholded independently to isolate significant changes that may be caused by microdroplets. The threshold is computed from the global grayscale distribution statistics of the difference image. A common approach is to take the absolute values of the grayscale differences of all pixels in the difference image, compute their mean and standard deviation, and set the threshold to the mean plus two or three times the standard deviation. For example, if the mean absolute difference of a near-infrared difference image is 5 and the standard deviation is 8, the threshold can be set to 5 + 2 × 8 = 21. Pixels whose absolute grayscale difference exceeds 21 are initially marked as candidate change pixels in that band. The threshold is adaptive: it changes with the data distribution of each image and each band.
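The per-band differencing and adaptive thresholding described above can be sketched as follows. The function name and the nested-list image representation are assumptions chosen to keep the example dependency-free. A practical pipeline would more likely use an array library. The population standard deviation is assumed, since the text does not specify which form is used.

```python
import statistics


def candidate_mask(current, reference, k=2.0):
    """Mark candidate change pixels for ONE spectral band.

    current, reference -- 2D lists of gray values with identical shape
    k                  -- multiplier on the standard deviation (2 or 3 in the text)
    Returns (mask, threshold), where mask[r][c] is True for candidate pixels.
    """
    # Absolute pixel-by-pixel difference against the droplet-free reference.
    diff = [[abs(c - r) for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current, reference)]
    flat = [v for row in diff for v in row]
    # Adaptive threshold: mean + k * std of the absolute differences.
    threshold = statistics.fmean(flat) + k * statistics.pstdev(flat)
    mask = [[v > threshold for v in row] for row in diff]
    return mask, threshold


# Toy 3 x 3 band, invented for illustration: one pixel brightens from 100 to 160.
reference = [[100] * 3 for _ in range(3)]
current = [[100, 100, 100], [100, 160, 100], [100, 100, 100]]
mask, threshold = candidate_mask(current, reference)
print(threshold, mask[1][1])
```

Because the threshold is recomputed from each difference image, the same call can be applied to every band and every frame without manual tuning.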

[0061] Candidate pixels in a single band may include noise. To make the decision more reliable, cross-band consistency verification is introduced. For each pixel location in the image, such as the pixel at coordinates (x=100, y=200), the number of band difference images that mark it as a candidate change pixel is counted. A cross-band consistency threshold is set, for example requiring at least three different bands (such as red, near-infrared, and a short-wave infrared band) to indicate a significant change at that pixel. Only when the count meets this preset threshold, for example when the pixel is marked in 4 of 5 bands, is the pixel location finally assigned to the "microdroplet spectral response region". This process exploits the fact that microdroplets affect reflectance in several spectral bands at once, and it effectively suppresses false alarms caused by random noise and single-band interference.
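The cross-band vote can be expressed in a few lines. This sketch assumes one boolean mask per band, all of the same shape, and treats the consistency threshold as a minimum count ("at least `min_bands` bands"), which is one reading of the text. The function name is hypothetical.

```python
def cross_band_response(band_masks, min_bands):
    """Combine per-band candidate masks into the microdroplet response mask.

    band_masks -- list of 2D boolean masks, one per spectral band
    min_bands  -- a pixel is kept if at least this many bands marked it (assumed)
    """
    rows, cols = len(band_masks[0]), len(band_masks[0][0])
    return [[sum(bool(m[r][c]) for m in band_masks) >= min_bands
             for c in range(cols)]
            for r in range(rows)]


# Toy 1 x 2 example with five bands, invented for illustration:
# the left pixel is marked in 4 bands, the right pixel in 2.
masks = [[[True, True]], [[True, False]], [[True, True]], [[True, False]], [[False, False]]]
print(cross_band_response(masks, min_bands=3))  # [[True, False]]
```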

[0062] After identifying all pixels belonging to the response region, spatial aggregation is needed to define individual microdroplets. This is achieved through a spatial connected component analysis algorithm. This algorithm scans the binarized response region image (pixels belonging to the region are 1, otherwise 0), grouping all spatially adjacent pixels (typically defined using an eight-neighborhood, i.e., the top, bottom, left, right, and four diagonal directions of a pixel) with a value of 1 into a single set. Each such set of pixels is defined as an independent connected component, corresponding to a physically contiguous microdroplet region. For example, the algorithm might identify hundreds of connected components of varying sizes and shapes.

[0063] For each identified connected component, its geometric centroid coordinates are calculated. Specifically, the row coordinates (i) and column coordinates (j) of all N pixels within the connected component are listed. The row coordinates and the column coordinates are summed separately, and each sum is divided by the total number of pixels N. The two quotients (i_avg, j_avg) are the centroid coordinates of the connected component. These coordinates serve as the position identifier of the microdroplet on the two-dimensional image plane. At the same time, the acquisition timestamp of the current image frame, for example 300.5 seconds from the start of the experiment, is recorded as the time at which this microdroplet region was first observed. In this way, the number, location, and appearance time of all microdroplet regions in a single frame are quantified and extracted.
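The connected component labeling and centroid calculation of [0062] and [0063] can be sketched with a breadth-first flood fill over the binarized response mask. The function name and the dictionary layout of the result are hypothetical. Eight-neighborhood adjacency follows the text.

```python
from collections import deque


def droplet_regions(mask, timestamp):
    """Find 8-connected regions in a binary mask and return their centroids.

    mask      -- 2D list, truthy where the pixel belongs to the response region
    timestamp -- acquisition time of the frame, stored as the appearance time
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                # Visit the eight neighbors (up, down, left, right, diagonals).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            n = len(pixels)
            # Centroid = arithmetic mean of row and column coordinates.
            centroid = (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)
            regions.append({"centroid": centroid, "size": n, "time": timestamp})
    return regions


# Toy mask, invented for illustration: a horizontal pair and a diagonal pair.
mask = [[1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]]
for region in droplet_regions(mask, timestamp=300.5):
    print(region)
```

The diagonal pair counts as a single region only because eight-neighborhood adjacency is used; with four-neighborhood adjacency it would split into two.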

[0064] In another preferred embodiment of the present invention, in step S3, the process of calculating the spatial density field of microdroplets on the leaf surface is as follows:

[0065] First, a spatial reference frame is established. A two-dimensional Cartesian coordinate system is set up, bounded by the effective area of the image containing the complete leaf. Typically, the top-left corner of the image is the origin (0,0), the positive X-axis points horizontally to the right, and the positive Y-axis points vertically downwards. The unit corresponds to the physical size of one pixel, for example 0.1 millimeters per pixel.

[0066] Subsequently, the leaf area covered by this coordinate system is divided into a regular grid. The X-axis is uniformly divided into M parts from its minimum to its maximum value, and the Y-axis into N parts, forming M × N rectangular grid cells. Each grid cell is called a "density calculation cell". The cell size (i.e., the spatial resolution) is set according to the research requirements; for example, each cell can correspond to a 0.5 mm × 0.5 mm area on the actual leaf. If the leaf image covers an area of 30 mm × 20 mm, it is divided into 60 columns × 40 rows, for a total of 2400 density calculation cells. These cells form a regular grid array covering the leaf surface.

[0067] For a single frame at a specific moment, the positions of all microdroplets have already been provided by step S2 as a series of centroid coordinates. The first step in calculating the density field is to generate an "initial density distribution matrix". Its dimensions match the grid array: N rows (along the Y-axis) and M columns (along the X-axis). Every element of the matrix is initialized to 0.

[0068] Next, the coordinates of all microdroplet centroids identified in the image frame are iterated. For each centroid coordinate, its X and Y values determine which density calculation cell it falls within. For example, at a 0.5 mm grid resolution, a centroid with coordinates (x=5.3 mm, y=12.7 mm) gives 5.3 / 0.5 = 10.6 and 12.7 / 0.5 = 25.4; rounding down yields the zero-based indices 10 and 25, i.e., the 11th column and the 26th row when counting from 1. Therefore, the element value in row 26 and column 11 of the initial density distribution matrix is incremented by 1. After all microdroplets have completed this "voting" count, the value of each cell in the initial density distribution matrix represents the number of microdroplets falling within that cell's spatial extent at the corresponding moment in the image frame. This value is a discrete count.
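The gridding and "voting" count can be sketched as below. The function name `count_matrix` and its parameters are assumptions made for illustration, and the code uses zero-based indices, so the worked example above lands in `counts[25][10]`. Centroids are assumed to lie inside the leaf area; points exactly on the far edge are clamped into the last cell.

```python
import math


def count_matrix(centroids_mm, width_mm, height_mm, cell_mm):
    """Count centroids per grid cell; rows follow Y, columns follow X."""
    n_cols = math.ceil(width_mm / cell_mm)
    n_rows = math.ceil(height_mm / cell_mm)
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in centroids_mm:
        col = min(int(x // cell_mm), n_cols - 1)  # round down, clamp far edge
        row = min(int(y // cell_mm), n_rows - 1)
        counts[row][col] += 1
    return counts


# Hypothetical frame: two droplets in the same cell, one near the far corner.
counts = count_matrix([(5.3, 12.7), (5.4, 12.9), (29.9, 19.9)], 30, 20, 0.5)
```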

[0069] However, directly using this discrete counting matrix as the density field is not ideal because the random distribution of microdroplets and grid boundary effects make the density map appear unsmooth and discontinuous. Therefore, spatial smoothing is required. This is achieved through two-dimensional convolution operations. First, a "two-dimensional Gaussian convolution kernel" is defined. This is a small numerical matrix, for example, with a size of 5 rows and 5 columns. The value of the central element of the matrix is the largest, and it decays outwards according to a two-dimensional Gaussian function distribution, with the sum of all element values being 1. The standard deviation parameter controls the rate of weight decay; the larger the standard deviation, the stronger the smoothing effect.

[0070] The predefined Gaussian convolution kernel is applied to the initial density distribution matrix to perform spatial convolution. The operation can be understood as follows: the center of the Gaussian kernel slides sequentially across each element of the density matrix. At each position, each density count value within the local area covered by the Gaussian kernel is multiplied by the weight coefficient at the corresponding kernel position. These weighted values are then summed, and the result is used as the new, smoothed density value, assigned to the density calculation unit corresponding to the center position. This operation attenuates count fluctuations within a single grid cell and diffuses the influence of a microdroplet into its neighboring grid cells, reflecting the spatial correlation of the microdroplet distribution.
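A minimal pure-Python sketch of the kernel construction and the sliding weighted sum follows. The function names, the default standard deviation of 1.0, and the zero padding at the grid border are assumptions; the text does not state how border cells are handled.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    raw = [[math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2 * sigma ** 2))
            for c in range(size)] for r in range(size)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def smooth(counts, kernel):
    """Slide the kernel over the count matrix (zero padding outside the grid)."""
    rows, cols = len(counts), len(counts[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for kr, kernel_row in enumerate(kernel):
                for kc, weight in enumerate(kernel_row):
                    rr, cc = r + kr - half, c + kc - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += weight * counts[rr][cc]
            out[r][c] = acc
    return out


# Hypothetical 9 x 9 grid with a single droplet in the middle cell.
grid = [[0] * 9 for _ in range(9)]
grid[4][4] = 1
field = smooth(grid, gaussian_kernel())
```

Because the kernel is symmetric, the weighted sum written here gives the same result as a formal convolution with a flipped kernel.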

[0071] In the new matrix obtained after convolution smoothing, each element is no longer a simple integer but a continuous value with decimal places. This value is formally defined as the "microdroplet spatial density field value" at the center point coordinates of the corresponding density calculation unit at the time of the current image frame. The density field is a continuous scalar field that depicts the spatial distribution pattern of the probability or abundance of microdroplets on each unit area of the leaf surface at a specific moment. By arranging the density fields calculated from different time frames in chronological order, the evolution of the spatial distribution density of microdroplets over time can be dynamically displayed.

[0072] In another preferred embodiment of the present invention, in step S4, the process of analyzing the change of the spatial density field of the microdroplets over time, identifying local spatial regions where the density value continuously increases, and tracing and marking the geometric center of each local spatial region as a potential focus of pest activity is as follows:

[0073] First, it is necessary to acquire microdroplet spatial density field data at consecutive moments arranged in chronological order. Assuming that from the 10th to the 60th minute after the experiment begins, a density field frame is collected and calculated every 2 minutes, then 26 density fields arranged chronologically will be obtained. Each density field is a two-dimensional matrix, where the row and column indices correspond to the regular grid covering the leaf surface, and the value of each element in the matrix represents the spatial density field value of its corresponding grid cell at that moment.

[0074] The first step in the analysis is to perform a time-series analysis on each individual density calculation unit in the grid system, such as a specific grid unit located in the middle of the leaf. Density values for this unit at 26 different times are extracted along the time axis. Connecting these values in chronological order creates a "time-density value sequence" specific to this particular spatial location. This sequence records the fluctuation trajectory of microdroplet distribution density over time in this tiny region.

[0075] Next, the trend analysis of the temporal density value sequence for each unit is performed. The specific method is to conduct first-order forward differencing. First-order forward differencing involves subtracting the density value of the previous time step from the density value of the next time step. For a sequence consisting of 26 time points, the differencing calculation will produce 25 differences. These differences, arranged in chronological order, form the "density change value sequence" for that unit. For example, this sequence might be [+0.1, -0.05, +0.3, +0.25, +0.15, -0.1, ...], where positive numbers indicate an increase in density and negative numbers indicate a decrease in density. This change sequence more intuitively reflects the dynamic rate of local aggregation or dissipation than the original density sequence.

[0076] The core identification logic lies in filtering out "continuously rising" events that conform to a specific pattern from the change sequence. This requires defining two key thresholds. One is the "positive change threshold," used to determine whether a single increase is significant enough, rather than a minor fluctuation; for example, it can be set to 0.2. The other is the "minimum duration threshold," used to determine whether the upward trend has lasted for a sufficiently long time; for example, it requires that the condition be met for three consecutive time intervals (corresponding to 6 minutes).

[0077] The identification process iterates through the density change value sequence of each unit. The algorithm slides across the sequence starting from the beginning, searching for a subsequence in which the density change value in each of several consecutive time intervals is greater than the positive change threshold of 0.2. For example, with the 2-minute sampling described above, it might be found in the change sequence of a certain unit that the change values over the three consecutive time intervals from minute 16 to minute 22 are +0.25, +0.30, and +0.22, all greater than 0.2.

[0078] However, simply satisfying the condition of continuous rise is not enough. To avoid identifying false signals that are short-lived rises followed by immediate drops, a stability condition needs to be added: at the next moment after the end of the aforementioned continuous rising subsequence (i.e., at the 24th minute), the density value of this unit cannot be lower than the density value at the beginning of the subsequence (the 16th minute). This condition ensures that the upward trend has a certain "accumulation" effect, and the density level is maintained after the rise ends, rather than falling rapidly.

[0079] When a density calculation unit's temporal density value sequence is identified as having at least one subsequence that satisfies all of the above conditions, the unit is formally designated as a "persistently active unit." This designation means that, within a specific time period, the local spatial location exhibits a sustained and significant trend of increasing microdroplet density, suggesting that it may be an active condensation or aggregation point.
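The difference, threshold, duration, and stability checks for one unit can be sketched as follows. The function name and return convention are assumptions; the defaults 0.2 and 3 are the example thresholds from the text. A rise that runs to the very last sample is treated here as unconfirmed because no following sample exists for the stability check, which is an interpretation rather than something the text states.

```python
def find_sustained_rise(density, rise_threshold=0.2, min_intervals=3):
    """Return (start, end) sample indices of the first qualifying rise, or None."""
    # First-order forward difference: diffs[k] = density[k + 1] - density[k].
    diffs = [b - a for a, b in zip(density, density[1:])]
    run_start = None
    # The -inf sentinel closes a run that is still open at the end of the data.
    for k, d in enumerate(diffs + [float("-inf")]):
        if d > rise_threshold:
            if run_start is None:
                run_start = k
            continue
        if run_start is not None:
            end = k  # density[end] is the last sample of the rising run
            long_enough = (end - run_start) >= min_intervals
            has_next = end + 1 < len(density)
            if long_enough and has_next and density[end + 1] >= density[run_start]:
                return run_start, end
            run_start = None
    return None


# Hypothetical sequences: one held rise, one rise that collapses afterwards.
held = find_sustained_rise([1.0, 1.05, 1.3, 1.6, 1.82, 1.8, 1.7])
collapsed = find_sustained_rise([1.0, 1.3, 1.6, 1.9, 0.5])
```

A unit would be marked as persistently active when this function returns a pair of indices for its time-density value sequence.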

[0080] After labeling all grid cells, S4 proceeds to the second stage: spatial clustering and focus generation. At this point, many labeled, continuously active cells are distributed across the image grid; they may be spatially dispersed or adjacent to each other.

[0081] To identify spatially clustered active regions, spatial clustering analysis needs to be performed on these discrete, persistently active units. A common approach is density-based clustering algorithms, such as the DBSCAN algorithm. This algorithm determines proximity based on the Euclidean distance between units (typically measured in grid intervals in a grid coordinate system). If the distance between two persistently active units is less than a preset neighborhood radius (e.g., two grid units), they are considered adjacent. The algorithm then groups all persistently active units that can be connected by proximity into the same set. Each such set of spatially closely adjacent persistently active units is defined as a "locally active region." This represents a coherent spatial extent that exhibits a persistent clustering trend both in space and time.
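The grouping by proximity can be sketched without a clustering library. The code below is a simplified stand-in for DBSCAN, not DBSCAN itself: it assumes every persistently active unit counts as a core point, so regions are simply chains of units whose pairwise distance is below the radius. The function name and the strict less-than comparison (taken from the wording above) are assumptions.

```python
import math


def cluster_active_cells(cells, radius=2.0):
    """Group (col, row) cells chained together by distances below radius."""
    unvisited = set(cells)
    regions = []
    while unvisited:
        seed = unvisited.pop()
        region, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            near = [p for p in unvisited if math.dist(current, p) < radius]
            for p in near:
                unvisited.remove(p)
            region.extend(near)
            frontier.extend(near)
        regions.append(sorted(region))
    return sorted(regions)


# Hypothetical active cells in grid units: two well-separated groups.
regions = cluster_active_cells([(10, 20), (11, 20), (10, 21), (30, 5), (31, 5)])
```

A full DBSCAN implementation would additionally discard sparse points as noise through its minimum-samples parameter; whether that filtering is wanted here is not specified in the text.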

[0082] For each identified locally active region, its representative spatial location, i.e., the coordinates of its geometric center, needs to be calculated. This geometric center is not a simple arithmetic mean center, but a weighted center. The weights are related to the "activity intensity" of each constituent unit. Specifically, for each continuously active unit within the region, detailed information about its "continuously rising subsequence" is obtained from the previous identification process. The average of all density changes in this subsequence is calculated; this average represents the average rate of density increase of the unit during the active period, i.e., the "average rate of change." The higher this rate of change, the faster and more intensely the unit clusters during the trend period, and the greater its contribution to defining the region center.

[0083] When calculating the coordinates of the geometric center, the weighted average of the X-coordinates and the weighted average of the Y-coordinates of all cells within the region are calculated separately. The weight of each cell is proportional to its own average rate of change. Assume a locally active region contains three cells with coordinates and average rates of change as follows: Cell A (x=10, y=20, rate of change = 0.25), Cell B (x=11, y=20, rate of change = 0.40), and Cell C (x=10, y=21, rate of change = 0.15). Then, the X-coordinate of the center = (10*0.25 + 11*0.40 + 10*0.15) / (0.25 + 0.40 + 0.15) = 8.4 / 0.8 = 10.50; the Y-coordinate = (20*0.25 + 20*0.40 + 21*0.15) / (0.25 + 0.40 + 0.15) = 16.15 / 0.8 ≈ 20.19. This weighted center is biased towards the location of the unit with the most dramatic changes within the region.
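The weighted center for the three example cells can be checked with a short sketch. The function name and the tuple layout (x, y, average rate of change) are assumptions made for illustration.

```python
def weighted_center(cells):
    """Return the weighted mean (x, y) of (x, y, weight) tuples."""
    total = sum(w for _, _, w in cells)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cx = sum(x * w for x, _, w in cells) / total
    cy = sum(y * w for _, y, w in cells) / total
    return cx, cy


# Cells A, B, and C from the example in the text.
region = [(10, 20, 0.25), (11, 20, 0.40), (10, 21, 0.15)]
cx, cy = weighted_center(region)  # about (10.5, 20.1875)
```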

[0084] Finally, the calculated weighted geometric center coordinates are formally marked as the spatial coordinates of a "potential pest focus." The logic behind this naming is that the exudation of plant sap or accumulation of exudates caused by insects' (such as aphids and leafhoppers) feeding or oviposition is often the root cause of the continuous, localized generation of microdroplets. Therefore, the center of a local area exhibiting a continuous increase in microdroplet density in space and time has a high probability of corresponding to a real initial pest impact point. Simultaneously, the "earliest marked time point" among all continuously active units within the locally active region that generated this focus is recorded. For example, if a unit within this region is first identified as a continuously active unit 18 minutes after the start of the experiment, then this 18th minute is recorded as the "initial tracing time" of the potential pest focus. This time point provides crucial information for tracing the starting moment of pest activity.

[0085] Through step S4, this invention condenses massive spatiotemporal density field data into several potential pest activity focal points with clear spatial coordinates and initial times, realizing the transformation from continuous field observation to discrete event point detection, and providing direct target objects for subsequent source tracing and early warning.

[0086] In another preferred embodiment of the present invention, the process of extracting the geometric shape features and movement speed features of the focus movement path in step S5 is as follows:

[0087] For a specific potential pest focus, such as focus F001 initially marked at the base of a leaf, it is necessary to acquire its complete trajectory over a continuous observation period. This trajectory is represented as a sequence of spatial coordinates arranged in chronological order. For example, starting from the 18th minute when the focus was first traced, and continuing until the 60th minute, recording its weighted geometric center coordinates every 2 minutes will yield 22 ordered coordinate points. These coordinate points constitute the discrete spatiotemporal sampling data of focus F001 on a two-dimensional image plane.

[0088] First, geometric features are extracted. The coordinate points are connected sequentially by straight line segments, forming a polyline. This polyline is the "focal movement path polyline" for the focal point during the observation period. To quantify how winding this path is, a feature index called "curvature" is introduced. It is calculated as follows: first, the total length of the polyline is calculated, which is the sum of the lengths of the straight line segments between all adjacent coordinate points. Next, the straight-line (Euclidean) distance between the starting and ending coordinates of the path is calculated. Finally, the total length of the polyline is divided by this straight-line distance; the resulting ratio is the curvature feature. For example, a polyline with a total length of 15 mm whose starting and ending points are only 5 mm apart has a curvature of 3.0. The larger this value, the more tortuous the actual movement path of the focal point is relative to a straight path; a value close to 1.0 indicates that the path is nearly straight. This feature can reflect whether the insect is moving in a directional manner or wandering randomly.
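The path-length-to-chord ratio can be sketched as below. The function name `path_curvature` is an assumption, and so is the handling of a path that returns exactly to its starting point: the text does not cover a zero straight-line distance, so this sketch returns infinity in that case.

```python
import math


def path_curvature(path):
    """Ratio of polyline length to start-to-end distance for (x, y) points."""
    length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    chord = math.dist(path[0], path[-1])
    if chord == 0:
        return math.inf  # assumption: closed or stationary paths have no finite ratio
    return length / chord


# Hypothetical paths in millimeters: one bent, one straight.
bent = path_curvature([(0, 0), (3, 4), (6, 0)])      # 10 mm travelled over a 6 mm chord
straight = path_curvature([(0, 0), (1, 0), (2, 0)])
```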

[0089] Next, the movement speed features are extracted. This requires deriving the speed information from the path. First, the Euclidean distance between each pair of adjacent coordinate points on the path polyline is calculated. This distance represents the displacement of the focus within the interval between those adjacent time points. For example, if the focus is located at coordinates (10.0, 20.0) at time point T1 (20 minutes) and at coordinates (10.3, 20.1) at time point T2 (22 minutes), then its displacement is the straight-line distance between the two points, approximately 0.32 millimeters.

[0090] Then, each displacement is divided by the time interval between its two adjacent time points to obtain the "instantaneous velocity" of the focus on that specific path segment. Continuing with the previous example, if the time interval is 2 minutes, then the instantaneous velocity is approximately 0.16 millimeters per minute. By performing the same calculation on all adjacent time point pairs, a sequence of multiple instantaneous velocity values can be obtained.

[0091] Based on this instantaneous velocity sequence, two key statistical characteristics can be calculated. The first is the "average velocity characteristic," which is calculated from the arithmetic mean of all instantaneous velocity values in the sequence. This value represents the overall speed at which the focus moves throughout the observation period. The second is the "velocity fluctuation characteristic," which is calculated from the standard deviation of all instantaneous velocity values in the sequence. The standard deviation measures the dispersion of each instantaneous velocity value relative to the mean. For example, a focus with an average velocity of 0.2 mm / min and a standard deviation of 0.05 mm / min indicates relatively stable movement; while another focus with the same average velocity but a standard deviation of 0.15 mm / min indicates that its movement is sometimes fast and sometimes slow, with significant velocity fluctuations. The velocity fluctuation characteristic helps distinguish different motion patterns such as uniform movement, accelerated movement, or intermittent stopping.
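The per-segment speeds and their two summary statistics can be sketched as follows. The function name is an assumption, and so is the use of the population standard deviation (`statistics.pstdev`); the text does not say whether the population or sample form is intended.

```python
import math
import statistics


def speed_features(points, times_min):
    """Return (mean speed, speed standard deviation, per-segment speeds)."""
    speeds = [
        math.dist(p, q) / (t1 - t0)
        for p, q, t0, t1 in zip(points, points[1:], times_min, times_min[1:])
    ]
    return statistics.fmean(speeds), statistics.pstdev(speeds), speeds


# The T1/T2 example from the text, plus a hypothetical third sample with no movement.
mean_speed, speed_std, speeds = speed_features(
    [(10.0, 20.0), (10.3, 20.1), (10.3, 20.1)], [20, 22, 24]
)
```

With these inputs the first segment speed is about 0.16 mm per minute, matching the worked example, and the stationary second segment pulls the mean down and the fluctuation up.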

[0092] Through the above calculations, the dynamic behavior of a potential pest impact focus can be quantified into three specific numerical characteristics: curvature characteristics describing path tortuosity, average speed characteristics describing movement speed, and speed fluctuation characteristics describing the smoothness of movement. These characteristics provide an objective and quantifiable data basis for subsequent judgment of insect species, activity intensity, or environmental stress response.

[0093] In another preferred embodiment of the present invention, in step S6, the process of inputting the geometric shape features and movement speed features of the focal movement path into a pre-trained pest behavior decoding network, outputting the corresponding pest type code, and generating a detection report containing the pest type code and the focal movement path is as follows:

[0094] The first stage is the pre-training of the insect behavior decoding network. The goal of this stage is to establish a mathematical model capable of learning and distinguishing different insect behavior patterns from path geometry and kinematic features. The primary task of pre-training is to construct a high-quality, labeled training dataset. Each sample in the dataset originates from a known, controlled insect pest experiment observation history. In a typical sample construction, the operator introduces a specific insect species, such as aphids, onto the experimental plant and uses steps S1 to S5 of this method to automatically track and record the complete movement path of one or more potential pest focus areas generated by the insect's activity. This path is represented as a series of spatial coordinate points with timestamps. Simultaneously, the path is explicitly labeled with its corresponding insect category label, such as "aphid." By accumulating a large number of experiments on different insect species (such as spider mites, whiteflies, thrips, etc.), a dataset containing hundreds or thousands of path samples and their corresponding real pest category labels can be constructed.

[0095] For each historical focal movement path in the dataset, three standardized numerical features need to be calculated according to the method defined in step S5. First is the curvature feature, calculated by dividing the total length of the broken line formed by connecting all coordinate points of the path sequentially by the straight-line distance between the start and end points of the path. For example, a path with a total length of 12.5 mm and a straight-line distance of 8.0 mm between the start and end points has a curvature feature value of 1.5625. Second is the average velocity feature, calculated by dividing the displacement between all adjacent time points on the path by the corresponding time interval to obtain a series of instantaneous velocities, and then calculating the arithmetic mean of these instantaneous velocities. A path with varying speeds might have an average velocity of 0.85 mm per minute. Finally, there is the velocity fluctuation feature, calculated by determining the standard deviation of the above instantaneous velocity sequence to quantify the severity of velocity changes; for example, a standard deviation of 0.12 mm per minute.

[0096] Next, the calculated values of curvature, average speed, and speed fluctuation are combined into a three-dimensional feature vector, such as an array of the form [1.5625, 0.85, 0.12]. This feature vector will serve as the input to the neural network model. The corresponding pest category labels, such as "aphid" and "spider mite," need to be converted to the one-hot encoding format commonly used in machine learning. If there are a total of 5 pest categories, then "aphid" might be encoded as [1, 0, 0, 0, 0], "spider mite" as [0, 1, 0, 0, 0], and so on. This one-hot encoded vector will serve as the target for model training.

[0097] The pest behavior decoding network itself is typically a multilayer perceptron structure. A simple design may include an input layer that receives the three-dimensional feature vector; one or more hidden layers with non-linear activation functions, such as 64 neurons per hidden layer using the ReLU activation function; and an output layer with the number of neurons equal to the total number of pest categories. The output layer is then followed by a normalized exponential (softmax) function, which transforms the raw values of the output layer into a probability distribution vector, where each element has a value between 0 and 1, and the sum of all elements is 1, representing the probability that the model considers the input path to belong to each category.
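A forward pass through such a network can be sketched in plain Python. Everything below is illustrative: the helper names, the single hidden layer, and the small random initial weights are assumptions, and the weights are untrained, so the output probabilities carry no meaning beyond showing the shapes involved.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Normalized exponential, shifted by the maximum for numerical stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """One fully connected layer: weights holds one row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    *hidden, last = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    return softmax(dense(x, *last))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layers = [init_layer(3, 64, rng), init_layer(64, 5, rng)]  # 3 -> 64 -> 5
probs = forward([1.5625, 0.85, 0.12], layers)
```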

[0098] At the start of model training, all network parameters, namely connection weights and biases, are randomly initialized. Training is performed in batches, iteratively. In each iteration, a small batch of samples is randomly selected from the dataset, for example, 32 feature vectors and their corresponding one-hot encoded labels. These feature vectors are input into the network for forward propagation to obtain the network's predicted probability distribution for these samples. Then, the cross-entropy loss between the predicted probability distribution and the true one-hot encoded labels is calculated. Cross-entropy loss is a commonly used metric to measure the difference between the predicted probability distribution and the true distribution; the smaller the value, the more accurate the prediction.
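The cross-entropy between a predicted distribution and a one-hot label can be sketched as below. The function names and the small clamp `eps`, which guards against taking the logarithm of zero, are assumptions added for illustration.

```python
import math


def cross_entropy(probs, one_hot, eps=1e-12):
    """Cross-entropy of one prediction against a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


def batch_loss(batch_probs, batch_labels):
    """Mean cross-entropy over a mini-batch."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)


# Hypothetical predictions for a sample whose true class is the third one.
target = [0, 0, 1, 0, 0]
confident = cross_entropy([0.02, 0.08, 0.85, 0.03, 0.02], target)
unsure = cross_entropy([0.20, 0.20, 0.20, 0.20, 0.20], target)
```

With a one-hot target only the probability assigned to the true class contributes, so the loss reduces to the negative logarithm of that probability.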

[0099] After obtaining the loss value, the gradient of the loss function with respect to each parameter in the network is automatically calculated using backpropagation. The gradient indicates the direction and magnitude by which each parameter should be adjusted to reduce the loss. Subsequently, a variant of the gradient descent algorithm, such as the Adam optimizer, is used to update all parameters of the network based on the calculated gradients. For example, under a plain gradient descent step, if the original value of a weight parameter is 0.15, its gradient is -0.02, and the learning rate is set to 0.001, then the updated weight is 0.15 - 0.001 * (-0.02) = 0.15002; Adam additionally rescales each step using running averages of the gradient and of its square, so its update generally differs in size.
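The difference between a plain gradient descent step and the first Adam step can be sketched with the numbers from the example. The function names are assumptions, and the Adam hyperparameters shown (0.9, 0.999, 1e-8) are the commonly used defaults rather than values given in the text.

```python
import math


def sgd_step(weight, grad, lr):
    """Plain gradient descent: move against the gradient, scaled by lr."""
    return weight - lr * grad


def adam_first_step(weight, grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update (t = 1), starting from zero moment estimates."""
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1)  # bias correction for step 1
    v_hat = v / (1 - beta2)
    return weight - lr * m_hat / (math.sqrt(v_hat) + eps)


plain = sgd_step(0.15, -0.02, 0.001)        # about 0.15002
adam = adam_first_step(0.15, -0.02, 0.001)  # about 0.151
```

On its first step Adam moves by roughly the learning rate itself, regardless of gradient magnitude, which is why the two results differ.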

[0100] The above process (forward propagation, loss calculation, backpropagation, and parameter update) constitutes one complete training iteration on a batch. The entire dataset is traversed multiple times, each full pass being one training epoch. Simultaneously, a portion of labeled data is reserved as a validation set, which does not participate in parameter updates but is only used to monitor training effectiveness. When the model's classification accuracy on the validation set no longer significantly improves over, for example, 10 consecutive training epochs and stabilizes at around 95%, pre-training is considered complete, and the network parameters at this point are saved. The network then possesses the ability to infer the most likely pest type based on new path features.

[0101] The second stage is online inference and report generation. When a new, previously unobserved leaf needs to be analyzed, the operator first uses steps S1 to S5 to identify all potential pest activity focal points on the leaf and calculates the movement path of each focal point.

[0102] For each path to be detected, its features are calculated using the same standard procedure as in the training phase: the ratio of the total path length to the straight-line distance between the start and end points is used to obtain the curvature feature; the arithmetic mean of all instantaneous velocities on the path is used to obtain the average velocity feature; and the standard deviation of these instantaneous velocities is used to obtain the velocity fluctuation feature. These three values are then combined into a feature vector.

[0103] Then, this feature vector is input into the pre-trained pest behavior decoding network. The network performs forward propagation calculations, applies nonlinear transformations in the hidden layers, and finally generates a probability distribution vector at the output layer through the normalized exponential (softmax) function. Assuming the network's output probability vector for the five pest types is [0.02, 0.08, 0.85, 0.03, 0.02], where the third element has the largest value of 0.85, the network determines that the pest type corresponding to the path is the category represented by the third element. The system internally pre-defines a mapping table between dimension indices and specific pest type codes; for example, index 2 (the third element, counting from zero) corresponds to the code "APHID01". This code is the output of this determination.
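Selecting the most probable class and looking up its code can be sketched as follows. Only the pairing of index 2 with "APHID01" comes from the text; the other table entries and the function name `decode` are hypothetical placeholders.

```python
# Hypothetical mapping table; only index 2 -> "APHID01" appears in the text.
CODE_TABLE = {0: "MITE01", 1: "WFLY01", 2: "APHID01", 3: "THRIPS01", 4: "OTHER01"}


def decode(probs, code_table):
    """Return (pest type code, probability) for the most probable class."""
    index = max(range(len(probs)), key=probs.__getitem__)
    return code_table[index], probs[index]


code, confidence = decode([0.02, 0.08, 0.85, 0.03, 0.02], CODE_TABLE)
```

Returning the winning probability alongside the code lets a later report flag low-confidence determinations, although the text does not require this.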

[0104] Finally, a structured electronic report is automatically generated. The report begins with a clear header, including the unique identifier of the inspected leaf and the time range of image acquisition, such as "Leaf ID: Leaf20231027001, Observation period: 10:00 to 11:30". The main body of the report first provides the comprehensive pest type code, such as "Main identified pest type: APHID01". Subsequently, a table details all identified potential pest activity focal points, with columns including focal point ID, spatial coordinates, initial tracing time, and the pest subtype code determined by the focal point path.

[0105] The appendix to the report stores more detailed raw and intermediate data for verification, including the complete spatial coordinate sequence of each focal movement path, the timestamp corresponding to each coordinate point, and the specific values of the calculated curvature, average speed, and speed fluctuation. The entire electronic document uses an easily parsed structured format, such as JSON or XML, ensuring clear information hierarchy. This allows for quick manual review of conclusions and facilitates automatic reading and processing of detailed data by other programs. Through step S6, this invention achieves fully automated processing and output from raw image sequences to final structured pest diagnosis information.
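One possible JSON layout for such a report is sketched below. All field names are hypothetical, and the rule used for the main pest type (the most frequent code among the foci) is an assumption, since the text does not say how the comprehensive code is derived.

```python
import json
from collections import Counter


def build_report(leaf_id, period, foci):
    """Assemble a report dictionary with header, summary table, and appendix."""
    codes = Counter(f["pest_code"] for f in foci)
    main_code = codes.most_common(1)[0][0] if codes else None  # assumed: majority vote
    return {
        "leaf_id": leaf_id,
        "observation_period": period,
        "main_pest_code": main_code,
        "foci": [
            {"focus_id": f["focus_id"], "coordinates": f["path"][0],
             "initial_tracing_time_min": f["times_min"][0],
             "pest_code": f["pest_code"]}
            for f in foci
        ],
        "appendix": [
            {"focus_id": f["focus_id"], "path": f["path"],
             "times_min": f["times_min"], "features": f["features"]}
            for f in foci
        ],
    }


# Hypothetical single-focus input reusing example values from the text.
focus = {
    "focus_id": "F001", "pest_code": "APHID01",
    "path": [[10.0, 20.0], [10.3, 20.1]], "times_min": [20, 22],
    "features": {"curvature": 1.0, "mean_speed": 0.158, "speed_std": 0.0},
}
report = build_report("Leaf20231027001", "10:00 to 11:30", [focus])
text = json.dumps(report, indent=2)
```

Keeping the summary table and the appendix as separate lists mirrors the split between quick manual review and machine processing described above.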

[0106] As shown in Figure 2, the present invention also provides an image recognition-based pest detection system for *Salvia miltiorrhiza*, used to implement the above-described image recognition-based pest detection method for *Salvia miltiorrhiza*, comprising:

[0107] The time-series image acquisition module is used to acquire multi-temporal multispectral image sequences of the surface of Salvia miltiorrhiza leaves, the image sequences covering the complete time process from the state without microdroplets to the state of stable microdroplet formation;

[0108] The droplet feature extraction module is used to process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame of the image, and record the occurrence time and centroid coordinates of each microdroplet region.

[0109] The spatial density field construction module is used to define the space of the leaf surface as a continuous coordinate system and calculate the spatial density field of microdroplets on the leaf surface at that moment based on the set of centroid coordinates of all microdroplet regions in each frame of the image.

[0110] The dynamic focus tracing module is used to analyze the changes in the spatial density field of the microdroplets over time, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential pest focus.

[0111] The trajectory feature generation module is used to connect the spatial positions of the same potential pest focus at different times to form a focus movement path; and to extract the geometric shape features and movement speed features of the focus movement path.

[0112] The intelligent classification report module is used to input the geometric shape features and movement speed features of the focal movement path into a pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focal movement path.

[0113] The foregoing has provided a detailed description of one embodiment of the present invention, but this description is merely a preferred embodiment and should not be construed as limiting the scope of the invention. All equivalent variations and modifications made within the scope of the claims of this invention should still fall within the patent coverage of this invention.

Claims

1. A method for detecting pests in *Salvia miltiorrhiza* based on image recognition, characterized in that it includes the following steps: S1. Collect a multi-temporal, multispectral image sequence of the surface of Salvia miltiorrhiza leaves, the image sequence covering the complete time process from the state without microdroplets to the state of stable microdroplet formation; S2. Process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame, and record the appearance time and centroid coordinates of each microdroplet region; S3. Define the leaf surface space as a continuous coordinate system and, based on the set of centroid coordinates of all microdroplet regions in each frame of the image, calculate the microdroplet spatial density field on the leaf surface at that moment; S4. Analyze the changes of the spatial density field of the microdroplets over time, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential focus of pest action; S5. Connect the spatial positions of the same potential pest focus at different times to form a focus movement path, and extract the geometric shape features and movement speed features of the focus movement path; S6. Input the geometric shape features and movement speed features of the focus movement path into the pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focus movement path.

2. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S2, the process of identifying and locating all microdroplet regions present in each frame of image is as follows: A frame in a droplet-free state is selected from the multi-temporal multispectral image sequence as a background reference frame. The current image frame to be processed and the background reference frame undergo pixel-by-pixel gray value subtraction in each of the corresponding narrow bands to obtain the difference image of each band. For each difference image, a difference judgment threshold is calculated based on its global gray distribution statistical characteristics. Pixels with an absolute gray value difference greater than the threshold are marked as candidate change pixels. For each pixel location in the image, the number of bands in which it is marked as a candidate change pixel is counted. When the number exceeds the preset cross-band consistency threshold, the pixel location is determined to belong to the microdroplet spectral response region. Spatial connectivity analysis is performed on the pixels belonging to this region. Each set of mutually adjacent pixels is marked as a connected region and defined as a microdroplet region. The arithmetic mean of the coordinates of all pixels in the connected region is calculated as its centroid coordinates. The acquisition timestamp of the current image frame is recorded as the appearance time of the microdroplet region.

3. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S3, the process of calculating the microdroplet spatial density field on the leaf surface is as follows: A two-dimensional Cartesian coordinate system is established over the effective area of the leaf image. The coordinate system is divided uniformly in the horizontal and vertical directions to form a grid array covering the leaf area, each grid cell being called a density calculation unit. For a single frame, the centroid coordinates of all microdroplet regions identified in that frame are traversed; for each centroid, the density calculation unit containing it is determined from the coordinate values, and the initial statistical value of that unit is incremented by one. After the traversal is complete, an initial density distribution matrix corresponding to the grid array is obtained. A two-dimensional Gaussian convolution kernel with a preset size and standard deviation is used to spatially smooth the initial density distribution matrix by convolution. The value at each density calculation unit position in the smoothed matrix is defined as the microdroplet spatial density field value at the center point coordinates of that unit at the current time.
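
The gridding and smoothing of claim 3 can be sketched as follows. The function names, the zero padding at the grid border, and the normalization of the kernel to unit sum are assumptions made for illustration; the claim fixes only the per-cell counting and a Gaussian kernel with preset size and standard deviation.

```python
import math


def gaussian_kernel(size, sigma):
    """Square 2D Gaussian kernel, normalized to sum to 1 (normalization assumed)."""
    half = size // 2
    kernel = [[math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2 * sigma ** 2))
               for j in range(size)] for i in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def density_field(centroids, width, height, nx, ny, size=3, sigma=1.0):
    """Count centroids per density calculation unit, then smooth the counts.

    centroids: iterable of (x, y) in the leaf coordinate system [0, width] x [0, height].
    Returns (counts, field), both indexed as [row][col] with ny rows and nx columns.
    """
    counts = [[0] * nx for _ in range(ny)]
    for x, y in centroids:
        col = min(int(x / width * nx), nx - 1)   # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row][col] += 1

    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    field = [[0.0] * nx for _ in range(ny)]
    for r in range(ny):
        for c in range(nx):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < ny and 0 <= cc < nx:  # zero padding outside the grid
                        acc += kernel[i][j] * counts[rr][cc]
            field[r][c] = acc
    return counts, field
```

Because the kernel is symmetric, the correlation computed here equals the convolution named in the claim. With zero padding, cells near the leaf boundary lose part of their mass, which may or may not be acceptable for a given leaf mask.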

4. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S4, the process of analyzing the change of the microdroplet spatial density field over time and identifying local spatial regions where the density value continuously increases is as follows: Acquire the microdroplet spatial density field data at multiple consecutive moments arranged in chronological order. For each density calculation unit in the density field grid, extract the density field value of that unit at each moment to form the time density value sequence of the unit. For each density calculation unit, perform a first-order forward difference on the time density value sequence to obtain the density change value sequence of the unit over consecutive time intervals. Set a positive change threshold and a minimum duration threshold. Traverse the density change value sequence of each density calculation unit to identify whether it contains a subsequence in which the density change values of multiple consecutive time intervals are all greater than the positive change threshold. If there is such a subsequence whose duration is not less than the minimum duration threshold, and the density value at the first moment after the end of the subsequence is not lower than the density value at the beginning of the subsequence, the density calculation unit is marked as a continuously active unit.
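
The test applied to each density calculation unit in claim 4 can be sketched as below. Two points are interpretations rather than statements of the claim: the minimum duration is counted in time intervals, and "the first moment after the end of the subsequence" is taken to be the sample following the last rising interval, falling back to the final sample when the series ends there. The function name and return format are hypothetical.

```python
def find_sustained_increase(values, positive_threshold, min_intervals):
    """Return the first qualifying rising run of a density time series, or None.

    values: density field values of one density calculation unit, in time order.
    A run qualifies if every forward difference in it exceeds positive_threshold,
    it spans at least min_intervals intervals, and the density just after the run
    is not lower than the density at the start of the run.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]  # first-order forward difference
    start = None
    # The -inf sentinel closes a run that is still open at the end of the series.
    for i, d in enumerate(diffs + [float("-inf")]):
        if d > positive_threshold:
            if start is None:
                start = i
            continue
        if start is not None:
            end = i - 1                                # last rising interval
            after = min(end + 2, len(values) - 1)      # assumed "first moment after"
            if end - start + 1 >= min_intervals and values[after] >= values[start]:
                run = diffs[start:end + 1]
                return {"start": start, "end": end, "mean_rate": sum(run) / len(run)}
            start = None
    return None
```

A unit is treated as continuously active when the function returns a run. The returned `mean_rate` is kept because a later step weights each unit by the average rate of change of its rising subsequence.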

5. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 4, characterized in that, in step S4, the process of tracing and marking the geometric center of each local spatial region as a potential focus of pest action is as follows: Spatial clustering analysis is performed on all marked continuously active units to group spatially adjacent continuously active units into sets, each set being defined as a local active region. The geometric center coordinates of the local active region are calculated as the weighted average of the coordinates of all continuously active units in the region, the weight of each unit being positively correlated with the average rate of change of the continuously increasing subsequence identified in that unit's density change value sequence. The calculated geometric center coordinates are marked as the spatial coordinates of a potential focus of pest action. The earliest time point at which any unit in the local active region was marked as a continuously active unit is recorded as the initial tracing time of the potential focus of pest action.
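
A minimal sketch of the clustering and weighted-center step of claim 5 follows. The claim does not name a clustering algorithm, so grouping by 8-neighbour adjacency is an assumption, as are the function name `trace_foci`, the use of grid indices as coordinates, and the choice of the mean rate itself as the weight (the claim only requires a positive correlation).

```python
def trace_foci(active_units):
    """Group adjacent continuously active units and locate each region's center.

    active_units: {(row, col): (mean_rate, first_active_time)}, where mean_rate is
    the average rate of the unit's rising subsequence and is assumed positive.
    """
    foci, seen = [], set()
    for seed in active_units:
        if seed in seen:
            continue
        # Flood fill over the 8-neighbourhood (assumed notion of "spatially adjacent").
        stack, members = [seed], []
        seen.add(seed)
        while stack:
            r, c = stack.pop()
            members.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in active_units and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        # Weighted average of unit coordinates, weight = mean rate (assumed linear).
        total = sum(active_units[m][0] for m in members)
        center = (sum(m[0] * active_units[m][0] for m in members) / total,
                  sum(m[1] * active_units[m][0] for m in members) / total)
        initial_time = min(active_units[m][1] for m in members)
        foci.append({"center": center, "initial_time": initial_time,
                     "units": sorted(members)})
    return foci
```

Weighting by rate pulls the center toward the cells where droplet density rises fastest, rather than toward the plain geometric middle of the region.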

6. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S5, the process of extracting the geometric shape features and movement speed features of the focus movement path is as follows: For a potential focus of pest action, obtain its spatial position coordinate sequence at multiple time points, ordered by the times at which the focus was marked. Connect the points of the coordinate sequence with straight line segments in time order to form the focus movement path polyline. Calculate the ratio of the total length of the polyline to the straight-line distance between its start and end points, and define it as the curvature feature of the focus movement path. Calculate the Euclidean distance between each pair of adjacent coordinate points on the polyline to obtain the displacement between adjacent time points. Divide each displacement by the corresponding time interval to obtain the instantaneous movement speed of the focus on that path segment. Calculate the arithmetic mean of all instantaneous movement speeds as the average speed feature of the focus movement path. Calculate the standard deviation of all instantaneous movement speeds as the speed fluctuation feature of the focus movement path.
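
The three path features of claim 6 reduce to a few lines of arithmetic. The sketch below assumes the population standard deviation (the claim does not say whether population or sample deviation is meant) and returns infinity for the curvature feature when the path ends where it started; the function name and dictionary keys are illustrative only.

```python
import math
from statistics import mean, pstdev


def path_features(points, times):
    """Compute curvature, average speed and speed fluctuation of a focus path.

    points: [(x, y), ...] positions of one focus in time order.
    times:  matching timestamps, strictly increasing.
    """
    # Euclidean length of each polyline segment (displacement between samples).
    segments = [math.dist(a, b) for a, b in zip(points, points[1:])]
    chord = math.dist(points[0], points[-1])  # straight-line start-to-end distance
    # Ratio of path length to chord; undefined for a closed path, so use infinity.
    curvature = sum(segments) / chord if chord > 0 else math.inf
    # Instantaneous speed on each segment = displacement / time interval.
    speeds = [d / (t1 - t0) for d, t0, t1 in zip(segments, times, times[1:])]
    return {"curvature": curvature,
            "mean_speed": mean(speeds),
            "speed_std": pstdev(speeds)}  # population standard deviation (assumed)
```

For example, the path (0, 0) → (3, 4) → (3, 0) sampled at times 0, 1 and 3 has total length 9 and chord 3, giving a curvature feature of 3; the segment speeds are 5 and 2, giving an average speed of 3.5 and a fluctuation of 1.5.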

7. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S6, the pre-training process of the pest behavior decoding network is as follows: Construct a training dataset in which each sample consists of a historical focus movement path of a known pest category and its category label. For each path, calculate the ratio of the total path length to the straight-line distance between the start and end points to obtain the curvature feature; calculate the mean of the ratios of displacement to time interval over all pairs of adjacent points on the path to obtain the average speed feature; and calculate the standard deviation of the speed sequence to obtain the speed fluctuation feature. The curvature, average speed, and speed fluctuation features are combined into a feature vector and used as the network input. The category labels are converted into one-hot encodings as training targets. The network parameters are initialized, and the feature vectors are input into the network in batches for forward propagation. The predicted probability distribution is obtained by applying the normalized exponential (softmax) function at the output layer. The cross-entropy loss between the predicted distribution and the true label is calculated. The gradient of the loss with respect to the network parameters is calculated using the backpropagation algorithm, and the network parameters are updated using the gradient descent algorithm. This process is repeated until the classification accuracy of the network on the validation set stabilizes.
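
The training loop of claim 7 can be sketched with the smallest network that fits the description: a single linear layer followed by softmax. The claim does not specify an architecture, so this choice is an assumption, as are the feature standardization, zero initialization, the default learning rate, and the concrete stopping rule (validation accuracy unchanged for `patience` epochs). All function names are hypothetical.

```python
import math


def softmax(logits):
    """Normalized exponential function, shifted by the maximum for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def standardize(model, x):
    return [(v - mu) / sd for v, mu, sd in zip(x, model["mean"], model["std"])]


def predict_proba(model, x):
    z = standardize(model, x)
    logits = [sum(w * v for w, v in zip(row, z)) + b
              for row, b in zip(model["W"], model["b"])]
    return softmax(logits)


def predict(model, x):
    p = predict_proba(model, x)
    return p.index(max(p))


def train_decoder(train_set, val_set, n_classes, lr=0.2, batch_size=None,
                  patience=5, tol=1e-9, max_epochs=500):
    """train_set / val_set: lists of (feature_vector, class_index) pairs."""
    n_feat = len(train_set[0][0])
    cols = list(zip(*(x for x, _ in train_set)))
    mean = [sum(c) / len(c) for c in cols]
    std = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, mean)]
    model = {"W": [[0.0] * n_feat for _ in range(n_classes)],
             "b": [0.0] * n_classes, "mean": mean, "std": std}
    batch_size = batch_size or len(train_set)
    history = []
    for _ in range(max_epochs):
        for i in range(0, len(train_set), batch_size):
            batch = train_set[i:i + batch_size]
            grad_w = [[0.0] * n_feat for _ in range(n_classes)]
            grad_b = [0.0] * n_classes
            for x, y in batch:
                z, p = standardize(model, x), predict_proba(model, x)
                for k in range(n_classes):
                    # Gradient of cross-entropy w.r.t. logit k with a one-hot target.
                    err = p[k] - (1.0 if k == y else 0.0)
                    grad_b[k] += err
                    for j in range(n_feat):
                        grad_w[k][j] += err * z[j]
            for k in range(n_classes):  # gradient descent update
                model["b"][k] -= lr * grad_b[k] / len(batch)
                for j in range(n_feat):
                    model["W"][k][j] -= lr * grad_w[k][j] / len(batch)
        acc = sum(predict(model, x) == y for x, y in val_set) / len(val_set)
        history.append(acc)
        recent = history[-(patience + 1):]
        if len(history) > patience and max(recent) - min(recent) <= tol:
            break  # validation accuracy has stabilized
    return model, history
```

For a one-layer model the backpropagated gradient collapses to `p - one_hot(y)` at the logits, which is why no explicit loss value appears in the loop. A deeper network would need a full backward pass, and a stricter stopping rule may be preferable when accuracy plateaus early at a poor value.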

8. The method for detecting pests in *Salvia miltiorrhiza* based on image recognition according to claim 1, characterized in that, in step S6, the process of outputting the corresponding pest type code and generating a detection report containing the pest type code and the focus movement path is as follows: For the path to be inspected, the curvature feature is obtained by calculating the ratio of the total path length to the straight-line distance between the start and end points; the average speed feature is obtained by calculating the arithmetic mean of the instantaneous speeds on the path; and the speed fluctuation feature is obtained by calculating the standard deviation of the instantaneous speed sequence. These features are combined into a feature vector and input into the pre-trained pest behavior decoding network. After the network performs forward propagation, a probability distribution vector is generated at the output layer, and the pest type code corresponding to the dimension with the largest value is selected as the output. An electronic document is created in which the header contains the leaf identifier and the collection time range, the main body lists the pest type codes and records all potential foci of pest action in a table, and the appendix stores the complete coordinate sequence, timestamps, and feature values of each path. The electronic document is then output.
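
The decoding and reporting step of claim 8 can be sketched as an argmax over the output probabilities followed by assembly of a three-part document. The claim does not fix a file format, so the nested dictionary serialized as JSON, the field names, and the example pest type codes are all hypothetical.

```python
import json


def decode_pest_code(probabilities, codes):
    """Pick the pest type code of the output dimension with the largest value."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return codes[best]


def build_report(leaf_id, time_range, paths, codes):
    """Assemble header, main body and appendix of the detection report.

    paths: list of dicts with keys "points", "timestamps", "features" and
    "probabilities" (the network output for that path).
    """
    table, appendix = [], []
    for n, path in enumerate(paths, start=1):
        code = decode_pest_code(path["probabilities"], codes)
        # Main-body table: one row per potential focus of pest action.
        table.append({"focus": n, "pest_type_code": code,
                      "first_position": path["points"][0],
                      "last_position": path["points"][-1]})
        # Appendix: full coordinate sequence, timestamps and feature values.
        appendix.append({"focus": n, "points": path["points"],
                         "timestamps": path["timestamps"],
                         "features": path["features"]})
    return {
        "header": {"leaf_id": leaf_id, "collection_time_range": time_range},
        "body": {"pest_type_codes": sorted({row["pest_type_code"] for row in table}),
                 "foci": table},
        "appendix": appendix,
    }


def export_report(report):
    """Serialize the report; JSON is an assumed choice of electronic format."""
    return json.dumps(report, indent=2)
```

Keeping the raw coordinate sequences in the appendix means a reviewer can recompute the features later without access to the original images.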

9. A *Salvia miltiorrhiza* pest detection system based on image recognition, used to implement the *Salvia miltiorrhiza* pest detection method based on image recognition as described in any one of claims 1-8, characterized in that it comprises: a time-series image acquisition module, used to acquire a multi-temporal multispectral image sequence of the surface of *Salvia miltiorrhiza* leaves, the image sequence covering the complete time course from a state without microdroplets to a state of stable microdroplet formation; a droplet feature extraction module, used to process each frame of the multi-temporal multispectral image sequence, identify and locate all microdroplet regions in each frame, and record the appearance time and centroid coordinates of each microdroplet region; a spatial density field construction module, used to define the leaf surface space as a continuous coordinate system and, based on the set of centroid coordinates of all microdroplet regions in each frame, calculate the microdroplet spatial density field on the leaf surface at the moment of that frame; a dynamic focus tracing module, used to analyze the change of the microdroplet spatial density field over time, identify local spatial regions where the density value continuously increases, and trace and mark the geometric center of each local spatial region as a potential focus of pest action; a trajectory feature generation module, used to connect the spatial positions of the same potential focus of pest action at different times to form a focus movement path, and to extract the geometric shape features and movement speed features of the focus movement path; and an intelligent classification report module, used to input the geometric shape features and movement speed features of the focus movement path into a pre-trained pest behavior decoding network, output the corresponding pest type code, and generate a detection report containing the pest type code and the focus movement path.