Method and system for judging the remaining degree of industrial heritage based on historical image contrast recognition

By using a historical image comparison and recognition method, extracting feature vectors with CNN and self-attention mechanism, and combining dual attention module and adversarial optimization technology, the problem of relying on subjective experience in existing technologies is solved, and the objective assessment and efficient management of the degree of industrial heritage preservation is realized.

CN121579902B · Active · Publication Date: 2026-06-26 · CHINA ACAD OF URBAN PLANNING & DESIGN

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA ACAD OF URBAN PLANNING & DESIGN
Filing Date
2025-11-24
Publication Date
2026-06-26

Abstract

The application provides an industrial heritage retention degree evaluation method and system based on historical image contrast recognition, and the method comprises the following steps: collecting multi-source heterogeneous data of a target area, extracting a space-time and semantic feature vector, and strengthening feature representation to obtain a comprehensive feature vector; the comprehensive feature vector is subjected to adversarial optimization of a label predictor and a domain discriminator to align the feature distribution of an industrial quality inspection source domain and an industrial heritage target domain, so that a feature matching degree is obtained; and the functional coefficient is predicted by using LSTM time series analysis based on the multi-source heterogeneous data, and a retention degree evaluation report is output. According to the application, the feature vectors are extracted in parallel, the feature fusion is strengthened in combination with a double attention module, and the change of image sequences across years is captured; the difference between the feature distributions of two domains is minimized through adversarial training, and the accuracy of the feature matching degree is improved; finally, the integrity coefficient is calculated by weightedly fusing the feature matching degree and the expert score, the functional coefficient is predicted by using an LSTM time sequence model, and the scientific nature and efficiency of the protection of industrial heritage are improved.

Description

Technical Field

[0001] This invention relates to the field of heritage preservation technology, and in particular to a method and system for assessing the preservation level of industrial heritage based on historical image comparison.

Background Technology

[0002] Current assessments of the preservation status of industrial heritage primarily rely on manual on-site surveys and experience-based judgment. Traditional methods typically involve expert teams comparing historical images from different periods and combining them with on-site measurement data to identify component damage, such as structural cracks in factory buildings and the degree of equipment aging, supplemented by simple image processing techniques (such as edge detection and grayscale contrast) for static feature extraction. Some studies have attempted to introduce single-scale image analysis or basic time-series models, but these are mostly limited to static images or data from a single year and fail to adequately mine the spatiotemporal evolution information within consecutive-year image sequences. The fusion and application of multi-source heterogeneous data (such as images, equipment logs, and environmental monitoring data) is still in its early stages, resulting in low data utilization and assessment results that depend heavily on the subjective experience of experts, making it difficult to balance efficiency and objectivity.

Summary of the Invention

[0003] This invention aims to at least address the technical problem in existing technologies where the assessment results heavily rely on the subjective experience of experts, making it difficult to balance efficiency and objectivity. In particular, it innovatively proposes a method and system for assessing the preservation level of industrial heritage based on historical image comparison and recognition.

[0004] To achieve the above-mentioned objectives of this invention, this invention provides a method for assessing the preservation level of industrial heritage based on historical image comparison and identification, the method comprising:

[0005] S1. Collect and preprocess multi-source heterogeneous data of the target area; the multi-source heterogeneous data includes continuous year image sequences.

[0006] S2. Use a CNN network to extract the spatiotemporal feature vectors from the consecutive years of image sequence;

[0007] S3. Extract semantic feature vectors from the consecutive year image sequence using a self-attention mechanism;

[0008] S4. Based on the spatiotemporal feature vector and semantic feature vector, the feature representation is enhanced using the spatial-channel dual attention (SA) module to obtain a comprehensive feature vector;

[0009] S5. By using the adversarial optimization of the label predictor and the domain discriminator, the feature distribution of the comprehensive feature vector is aligned between the industrial quality inspection source domain and the industrial heritage target domain to obtain the feature matching degree.

[0010] S6. Calculate the integrity coefficient based on the feature matching degree and expert score; predict the functionality coefficient based on the multi-source heterogeneous data through LSTM time series analysis; and output a retention assessment report based on the integrity coefficient and the predicted functionality coefficient.

[0011] On the other hand, the present invention also provides a system for assessing the preservation level of industrial heritage based on historical image comparison and recognition, the system comprising:

[0012] a processor;

[0013] a memory for storing processor-executable instructions;

[0014] wherein the processor is configured to implement the method for assessing the preservation level of industrial heritage based on historical image comparison and recognition when executing the executable instructions.

[0015] The beneficial effects of this invention are as follows: This invention effectively solves the pain points of traditional manual assessment, which relies on subjective experience, is inefficient, and has insufficient utilization of multi-source data, by using multi-source heterogeneous data fusion, spatiotemporal-semantic feature dual-channel extraction, and domain discriminator adversarial optimization techniques. Specifically, it employs CNN and self-attention mechanisms to extract spatiotemporal and semantic feature vectors in parallel, and combines spatial-channel dual attention modules to enhance feature fusion, thereby capturing subtle changes in cross-year image sequences. Through adversarial training of the label predictor and the domain discriminator, it minimizes the feature distribution differences between the industrial quality inspection source domain and the heritage target domain, improving the accuracy of feature matching. Finally, it calculates the integrity coefficient based on a weighted fusion of feature matching degree and expert scores, and uses an LSTM time series model to predict the functionality coefficient, forming a dual quantitative system of "data-driven + experience-calibrated". This makes the retention assessment results both objective and interpretable, supporting dynamic restoration decisions and life cycle management, and significantly improving the scientific nature and efficiency of industrial heritage protection.

[0016] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Attached Figure Description

[0017] The above and / or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:

[0018] Figure 1 is a flowchart of the method for assessing the preservation level of industrial heritage based on historical image comparison and recognition according to the present invention.

Detailed Implementation

[0019] Embodiments of the present invention are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0020] Example 1

[0021] As shown in Figure 1, the method for assessing the preservation level of industrial heritage based on historical image comparison and identification includes the following steps:

[0022] S1. Collect and preprocess multi-source heterogeneous data of the target area; the multi-source heterogeneous data includes continuous year image sequences.

[0023] In step S1, high-resolution satellite remote sensing imagery, UAV oblique photography data, and continuous year image sequences collected by ground monitoring cameras are first used as core data sources. These images need to cover the entire life cycle of the industrial heritage. At the same time, structured data such as engineering drawings and equipment operation logs from historical archives are integrated, as well as unstructured text data such as on-site survey records and expert evaluation reports from cultural relic protection units, forming a multi-dimensional dataset containing spatiotemporal information, functional attributes, and maintenance records. In the preprocessing stage, histogram equalization is used to enhance contrast for image data, SIFT algorithm is used for image registration to eliminate geometric distortion, and median filtering is used to remove noise interference. For text data, OCR technology is used to achieve digital conversion, and NLP model is used to extract key entities and semantic relationships. Finally, all preprocessed data are uniformly stored in the spatiotemporal database.
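As a non-limiting illustration, two of the image preprocessing operations named in step S1 (histogram equalization and median filtering) may be sketched as follows. This is a minimal sketch on small grayscale grids using only the Python standard library; the function names `equalize_histogram` and `median_filter_3x3`, the 3×3 window size, and the choice to leave border pixels unchanged are assumptions made for illustration and are not specified by this embodiment. SIFT registration, OCR, and NLP extraction are not shown.

```python
from statistics import median

def equalize_histogram(pixels, levels=256):
    """Spread 8-bit gray levels over the full range using the cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # All pixels share one gray level; nothing to equalize.
        return list(pixels)
    return [round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min)) for p in pixels]

def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are left unchanged (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

low_contrast = [10, 20, 30, 40]
stretched = equalize_histogram(low_contrast)

noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
denoised = median_filter_3x3(noisy)
```

In this toy input, the four closely spaced gray levels are stretched across the 0 to 255 range, and the isolated bright pixel in `noisy` is replaced by the value of its neighbours.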

[0024] S2. Use a CNN network to extract spatiotemporal feature vectors from consecutive years of image sequences;

[0025] S3. Extract semantic feature vectors from consecutive year image sequences using a self-attention mechanism;

[0026] S4. Based on spatiotemporal feature vectors and semantic feature vectors, the spatial-channel dual attention (SA) module is used to enhance feature representation and obtain a comprehensive feature vector.

[0027] S5. By using the adversarial optimization of the label predictor and the domain discriminator, the feature distribution of the comprehensive feature vector is aligned between the source domain of industrial quality inspection and the target domain of industrial heritage, and the feature matching degree is obtained.

[0028] S6. Calculate the integrity coefficient based on feature matching degree and expert score; predict the functionality coefficient based on multi-source heterogeneous data through LSTM time series analysis; and output a retention assessment report based on the integrity coefficient and the predicted functionality coefficient.

[0029] The principle of the industrial heritage preservation assessment method based on historical image comparison in this embodiment is as follows: By constructing a multi-source heterogeneous data fusion framework, a CNN network is first used to perform convolution operations on consecutive years of image sequences, extracting low-level to high-level spatiotemporal features such as geometric shape and spatial location of industrial heritage components layer by layer to form spatiotemporal feature vectors. At the same time, a self-attention mechanism is used to capture the semantic associations at different time steps in the image sequence. By calculating the similarity weights of the Query vector and the Key vector, the Value vector is weighted and aggregated to generate a semantic feature vector reflecting the functional attributes and structural state of the industrial heritage. Subsequently, the two feature vectors are input into a spatial-channel dual-attention SA module. In the spatial dimension, a spatial weight map is generated through global average pooling, and in the channel dimension, a channel weight vector is generated through global max pooling. The two are fused to form a dual-attention weight matrix, which dynamically weights and enhances the original features, ultimately obtaining a comprehensive feature vector containing spatiotemporal evolution information and semantic association information. Based on this, the label predictor and the domain discriminator are used for adversarial optimization training to make the distribution of the comprehensive feature vector between the industrial quality inspection source domain and the heritage target domain more consistent. The feature matching degree is calculated by the binary classification probability of the domain discriminator. 
The integrity coefficient is calculated by combining the expert scoring matrix, and the future functionality coefficient is predicted by using a two-layer LSTM network combined with the time attention mechanism. Finally, the model outputs a quantitative retention level based on the weighted sum, forming a comprehensive evaluation result covering morphological integrity and functional usability.

[0030] As an optional embodiment of the present invention, extracting the spatiotemporal feature vector from the consecutive-year image sequence using a CNN network in step S2 includes:

[0031] S201. Standardize the image sequence for consecutive years;

[0032] In step S201, the standardization of the consecutive-year image sequence comprises two operations: size normalization and pixel value standardization. First, images from different years and at different resolutions are uniformly scaled to a preset size (e.g., 512×512 pixels), with bilinear interpolation used to keep image edges smooth and avoid geometric distortion. Second, the RGB three-channel pixel values of the resized images are linearly mapped from the range [0, 255] to the interval [0, 1]; the global mean and standard deviation are then calculated, and Z-score standardization is applied to each channel so that the image data has zero mean and unit variance. This standardization eliminates the feature distribution deviation caused by differences in shooting equipment and lighting conditions across years, improving the stability and robustness of spatiotemporal feature extraction.
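By way of example only, the pixel value standardization of step S201 may be sketched as below for a single color channel. The function name `standardize_channel` is hypothetical, the mean and standard deviation are computed over the flattened channel passed in, and the handling of a constant channel (returning zeros) is an assumption of this sketch; the resizing by bilinear interpolation is not shown.

```python
from statistics import fmean, pstdev

def standardize_channel(pixels):
    """Map 8-bit values to [0, 1], then apply Z-score standardization."""
    scaled = [p / 255.0 for p in pixels]
    mean = fmean(scaled)
    std = pstdev(scaled)
    if std == 0.0:
        # Constant channel: every value equals the mean (assumed handling).
        return [0.0 for _ in scaled]
    return [(s - mean) / std for s in scaled]

red_channel = [0, 64, 128, 255]
z_scores = standardize_channel(red_channel)
```

After this step the channel has approximately zero mean and unit variance, which is the property the paragraph above relies on to reduce the effect of differing cameras and lighting.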

[0033] S202. Construct a CNN network and extract spatiotemporal information at different scales based on the standardized consecutive year image sequence;

[0034] In step S202, it is necessary to explain in detail that the CNN network in this embodiment adopts a hierarchical architecture, including an input layer, multiple convolutional layers, pooling layers, and fully connected layers. The input layer receives a normalized sequence of consecutive years' images, with each image as an independent input channel; the convolutional layers use sliding window operations with convolutional kernels of different sizes (such as 3×3 and 5×5), and introduce nonlinearity through the ReLU activation function to extract multi-scale spatiotemporal features from edges, textures, components, and scenes layer by layer; the pooling layers use a max pooling strategy to downsample each feature map, reducing the number of parameters while retaining significant features; the fully connected layers concatenate and fuse the multi-scale features, outputting a spatiotemporal feature vector containing spatiotemporal evolution information. Through this hierarchical extraction and fusion mechanism, the CNN network can effectively capture the morphological changes and spatial displacement features of industrial heritage in different years.
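The convolution, ReLU activation, and max pooling operations described for step S202 may be illustrated, in a simplified and non-limiting way, as follows. This sketch processes one small single-channel grid with one hand-written 3×3 kernel; the function names, the vertical-edge kernel, and the 2×2 pooling window are assumptions chosen for illustration, whereas an actual network would learn its kernels and stack many such layers.

```python
def conv2d_valid(image, kernel):
    """Sliding-window cross-correlation without padding (the usual CNN 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def relu_map(feature_map):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Downsample by keeping the maximum of each non-overlapping 2x2 window."""
    return [[max(feature_map[y][x], feature_map[y][x + 1],
                 feature_map[y + 1][x], feature_map[y + 1][x + 1])
             for x in range(0, len(feature_map[0]) - 1, 2)]
            for y in range(0, len(feature_map) - 1, 2)]

# A 4x4 patch with a dark-to-bright vertical edge (hypothetical data).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edge_kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]
flipped_kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

pooled = max_pool_2x2(relu_map(conv2d_valid(image, edge_kernel)))
suppressed = max_pool_2x2(relu_map(conv2d_valid(image, flipped_kernel)))
```

The kernel that matches the edge direction produces a strong pooled response, while the flipped kernel produces a negative response that ReLU suppresses to zero.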

[0035] S203. Fuse spatiotemporal information at different scales to obtain spatiotemporal feature vectors.

[0036] In step S203, it is necessary to explain in detail that the fusion of spatiotemporal information adopts a combined strategy of multi-scale feature concatenation and 1×1 convolutional dimensionality reduction. First, the feature maps of different scales output by each convolutional layer are concatenated by channel to form a multi-dimensional feature tensor containing shallow detailed features and deep semantic features. Then, channel compression is performed through 1×1 convolutional kernels to reduce parameter redundancy while maintaining feature expressiveness, ultimately generating a spatiotemporal feature vector with uniform dimensionality. This vector further eliminates spatial redundancy through global average pooling to ensure the translation invariance of the feature representation.
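The fusion strategy of step S203 (channel-wise concatenation, 1×1 convolutional compression, and global average pooling) may be sketched as follows. Feature maps are represented as lists of channels, each channel being an H×W grid; the function names and the fixed weight values are hypothetical, and in practice the 1×1 weights would be learned. The sketch also assumes that the feature maps have already been brought to a common spatial size.

```python
def conv1x1(feature_maps, weights):
    """Mix channels at each pixel. feature_maps: C_in channels; weights: C_out x C_in."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[[sum(wc * fm[y][x] for wc, fm in zip(row_w, feature_maps))
              for x in range(w)] for y in range(h)]
            for row_w in weights]

def global_average_pool(feature_maps):
    """Collapse each channel to its spatial mean, giving one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_maps]

# Hypothetical shallow (1 channel) and deep (2 channel) feature maps of equal size.
shallow = [[[1.0, 2.0], [3.0, 4.0]]]
deep = [[[0.0, 1.0], [1.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]

stacked = shallow + deep                       # concatenation along the channel axis
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]   # compress 3 channels to 2
compressed = conv1x1(stacked, weights)
feature_vector = global_average_pool(compressed)
```

Because the pooled vector depends only on per-channel averages, shifting content within a channel does not change it, which is the translation invariance referred to above.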

[0037] As an optional embodiment of the present invention, the extraction of semantic feature vectors from consecutive year image sequences using a self-attention mechanism in step S3 may include:

[0038] S301. Segment the preprocessed continuous year image sequence, and then map the segmented continuous year image sequence to the initial token using a learnable linear projection layer.

[0039] In step S301, the preprocessed consecutive-year image sequence is first divided by time step into fixed-length image segments (e.g., one segment every 5 years), and each segment is processed as an independent sample. Then a learnable linear projection layer (parameter matrix W∈R(C×D), where C is the number of image channels and D is the token dimension) maps the pixel values of each image segment to a D-dimensional initial token, forming a token sequence {t1, t2, ..., tT} (T is the total number of time steps). The parameters of this projection layer are optimized through backpropagation, so that the initial tokens retain the local features of the original image while having a fixed, computable dimension.

[0040] S302. Based on the initial token, calculate the similarity weights between the segmented consecutive year image sequences using a self-attention mechanism.

[0041] The expression for calculating the similarity weights between the segmented consecutive year image sequences is as follows:

[0042] $$\alpha_{t}^{(i,j)} = \mathrm{softmax}\!\left(\frac{Q_{t}^{(i)} \cdot K_{t}^{(j)}}{\sqrt{d_k}}\right)$$

where $\alpha_{t}^{(i,j)}$ denotes the similarity weight between the $i$-th image sequence block and the $j$-th image sequence block after segmentation in the $t$-th frame image; $\mathrm{softmax}$ denotes the normalized exponential function; $Q_{t}^{(i)}$ denotes the query vector corresponding to the $i$-th image block of the $t$-th frame image, which reflects the spatiotemporal semantic query requirements for specific industrial heritage components (such as chimneys) within that frame; $K_{t}^{(j)}$ denotes the key vector corresponding to the $j$-th image block of the $t$-th frame image, which reflects the semantic feature supply capability of the intra-frame component and forms a cross-temporal and spatiotemporal semantic association with the query vector; and $d_k$ denotes the dimension of the key vector (a positive integer, usually 64, 128, or 256), which is used to scale the dot product result and prevent gradient vanishing.

[0043] In step S302, it is necessary to explain in detail that the original similarity score is obtained by calculating the dot product of the Query vector and the Key vector, then scaled by dividing by the square root of the Key vector dimension, and finally converted into a probabilistic form of similarity weights using a normalized exponential function. This weight matrix (W∈R(T×T)) reflects the semantic association strength between image blocks at different time steps. For example, image blocks corresponding to equipment upgrades in a certain year will form high-weighted connections with image blocks related to functions in subsequent years.
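The computation described in step S302 (dot product of Query and Key, scaling by the square root of the Key dimension, then the normalized exponential function) may be sketched as follows. The function names are hypothetical, and for brevity the sketch reuses the same toy vectors as both queries and keys; in a trained model these would be produced from the initial tokens by separate learned projections, which are not shown.

```python
import math

def softmax(scores):
    """Normalized exponential function, shifted by the maximum for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Return the T x T matrix of similarity weights; each row sums to 1."""
    scale = math.sqrt(len(keys[0]))  # square root of the key dimension
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights

# Three toy 2-dimensional vectors standing in for three time steps.
vectors = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
W = attention_weights(vectors, vectors)
```

Each row of `W` is a probability distribution over time steps; in this toy input the first step attends most strongly to itself and least to the orthogonal second step.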

[0044] S303. The initial tokens are weighted and aggregated based on similarity weights to generate semantic feature vectors.

[0045] In step S303, it is necessary to explain in detail that when performing weighted aggregation on the initial token sequence {t1, t2, ..., tT} based on the similarity weight matrix (W∈R(T×T)), the initial token at each time step is first multiplied by the weight vector of the corresponding row to generate a weighted token sequence. Then, all weighted tokens are summed to obtain the aggregated semantic feature vector. This vector is fused with the mean of the initial token sequence through residual connections, preserving both the local features of the original image and strengthening the semantic correlation across time steps. Finally, layer normalization is used to stabilize the training process, ensuring the generalization ability of the semantic feature vector across image sequences from different years. For example, when processing image sequences containing equipment update cycles, the semantic feature vector can automatically focus on image blocks related to functional changes, ignoring interference from changes in the background environment.
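One possible reading of the aggregation in step S303 is sketched below: each row of the similarity weight matrix mixes the initial tokens into a weighted token, the weighted tokens are summed, the mean of the initial tokens is added as a residual, and layer normalization is applied. The function name, the epsilon value, and the omission of learnable scale and shift parameters in the layer normalization are assumptions of this sketch rather than details given in this embodiment.

```python
import math

def aggregate_semantic_vector(weights, tokens, eps=1e-5):
    """weights: T x T similarity weights; tokens: T initial tokens of dimension D."""
    t_steps, dim = len(tokens), len(tokens[0])
    # Row i of `weights` mixes all tokens into one weighted token for step i.
    weighted = [[sum(weights[i][j] * tokens[j][d] for j in range(t_steps))
                 for d in range(dim)] for i in range(t_steps)]
    # Sum the weighted tokens over all time steps.
    aggregated = [sum(tok[d] for tok in weighted) for d in range(dim)]
    # Residual connection with the mean of the initial token sequence.
    token_mean = [sum(tok[d] for tok in tokens) / t_steps for d in range(dim)]
    fused = [a + m for a, m in zip(aggregated, token_mean)]
    # Layer normalization without learnable parameters (assumed simplification).
    mu = sum(fused) / dim
    var = sum((v - mu) ** 2 for v in fused) / dim
    return [(v - mu) / math.sqrt(var + eps) for v in fused]

tokens = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
uniform_weights = [[0.5, 0.5], [0.5, 0.5]]
semantic_vector = aggregate_semantic_vector(uniform_weights, tokens)
```

The output has the same dimension as a single token and is centered around zero, which is what allows it to be compared across image sequences from different years.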

[0046] As an optional embodiment of the present invention, in step S4, enhancing the feature representation using a spatial-channel dual attention (SA) module based on the spatiotemporal feature vector and the semantic feature vector to obtain a comprehensive feature vector includes:

[0047] S401. Concatenate the spatiotemporal feature vector and the semantic feature vector along the channel dimension to obtain the fused feature tensor;

[0048] In step S401, it is important to explain in detail that when concatenating the spatiotemporal feature vector and the semantic feature vector along the channel dimension, it is essential to ensure that the spatial resolution of the two vectors is consistent. The semantic feature vector is upsampled to the same spatial size as the spatiotemporal feature vector (e.g., 512×512 pixels) using bilinear interpolation. Then, a concat operation is performed along the channel dimension to generate a fused feature tensor F∈R(H×W×(C1+C2)), where H and W are the spatial height and width, and C1 and C2 are the number of channels for the spatiotemporal and semantic features, respectively. This tensor simultaneously preserves the geometric morphological information (e.g., building outlines) and functional semantic information (e.g., equipment status) of the industrial heritage.

[0049] S402. Compress the spatial dimension of the fused feature tensor to generate a single-channel spatial weight map;

[0050] In step S402, it is necessary to explain in detail that when compressing the spatial dimension of the fused feature tensor, a global average pooling operation is used to calculate the mean of each channel along the spatial dimension, generating a single-channel spatial weight map. This weight map, by aggregating global spatial information, can reflect the importance of different channel features in the overall spatial distribution. For example, when there are locally damaged areas in industrial heritage images, the spatial weight value of the corresponding channel will be significantly reduced, thereby suppressing interference from irrelevant areas in subsequent feature enhancement. At the same time, to preserve local spatial details, a global max pooling can be used in parallel to generate an auxiliary spatial weight map. The two are fused through learnable weight parameters to form a more robust spatial weight representation.

[0051] S403. Perform global average pooling on the fused feature tensor to compress the spatial dimension and generate a channel weight vector.

[0052] In step S403, it is necessary to explain in detail that when performing global average pooling on the fused feature tensor, the mean value of each channel is calculated along the spatial dimension to generate a channel weight vector. This vector, by aggregating global spatial information, can reflect the importance of different channel features in the overall semantic expression. For example, when a certain type of feature (such as rust texture) appears in multiple spatial locations in an industrial heritage image, the weight value of the corresponding channel will increase significantly, thus highlighting the contribution of this type of feature in subsequent feature enhancement. To further improve the discriminativeness of the channel weight vector, global max pooling can be used in parallel to generate auxiliary channel weight vectors. The two are fused through learnable weight parameters to form a more robust channel weight representation.

[0053] S404. Broadcast and multiply the single-channel spatial weight map and the channel weight vector to generate a dual-attention weight matrix; use the dual-attention weight matrix to perform element-wise multiplication with the fused feature tensor to perform feature enhancement, and superimpose the spatiotemporal feature vector and semantic feature vector through residual connection to form a style enhancement feature vector.

[0054] In step S404, when broadcasting and multiplying the single-channel spatial weight map and the channel weight vector, the spatial weight map and the channel weight vector are first expanded to the same dimension through a broadcasting mechanism, and element-wise multiplication is then performed to generate a dual-attention weight matrix. The value of each element in this matrix reflects the joint importance of the corresponding spatial location and channel feature. When the dual-attention weight matrix is multiplied element-wise with the fused feature tensor, the original feature value at each spatial location is dynamically weighted according to its spatial importance and channel importance. For example, well-preserved areas in industrial heritage images will receive higher weight values, while features of damaged or irrelevant areas will be suppressed. The feature-enhanced tensor is added to the original spatiotemporal feature vector and semantic feature vector through residual connections, which both preserves the integrity of the original features and enhances the semantic expression of key areas, ultimately forming a style-enhanced feature vector. This vector is stabilized through layer normalization to ensure generalization ability across image sequences from different years.
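The broadcast multiplication and residual enhancement of step S404 may be sketched as follows. Several choices here are assumptions made only for illustration: the single-channel spatial weight map is obtained by averaging across channels at each position, the channel weight vector by global average pooling over each channel, and both are passed through a sigmoid so that the weights lie between 0 and 1. The function names are hypothetical, the residual adds back the fused tensor (the concatenation of the two original feature vectors), and the final layer normalization is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dual_attention(fused):
    """fused: list of C channels, each an H x W grid of floats."""
    c, h, w = len(fused), len(fused[0]), len(fused[0][0])
    # Single-channel spatial weight map: average across channels at each position.
    spatial = [[sigmoid(sum(fused[k][y][x] for k in range(c)) / c)
                for x in range(w)] for y in range(h)]
    # Channel weight vector: global average pooling over each channel.
    channel = [sigmoid(sum(sum(row) for row in fused[k]) / (h * w)) for k in range(c)]
    # Broadcast both weights to C x H x W, apply them, then add the residual.
    return [[[fused[k][y][x] * spatial[y][x] * channel[k] + fused[k][y][x]
              for x in range(w)] for y in range(h)] for k in range(c)]

# Two hypothetical 2x2 channels: one inactive, one uniformly active.
fused_tensor = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
enhanced = dual_attention(fused_tensor)
```

The enhanced tensor keeps the shape of the input; active features are amplified in proportion to their joint spatial and channel weight, while the residual term ensures that no original feature is lost.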

[0055] S405. Perform global average pooling on the style enhancement feature vector to obtain a comprehensive feature vector, which includes spatiotemporal evolution information and semantic association information.

[0056] In step S405, it is necessary to explain in detail that when performing global average pooling on the style enhancement feature vector, the mean is calculated simultaneously along the spatial and channel dimensions, compressing the multidimensional feature tensor into a fixed-length comprehensive feature vector. This operation eliminates spatial redundancy and strengthens the semantic association between channels by aggregating global information, ultimately generating a comprehensive feature representation that includes spatiotemporal evolution information (such as the trajectory of building structure changes over time) and semantic association information (such as the cross-year correspondence of functional components). For example, when processing image sequences of industrial plants that have undergone multiple renovations, the comprehensive feature vector can simultaneously encode the gradual changes in roof morphology (spatiotemporal information) and the type association of equipment updates (semantic information). Furthermore, to improve the interpretability of the features, gradient-weighted class activation mapping (Grad-CAM) can be introduced after pooling to visualize the contribution of key regions in images from different years, helping to verify the effectiveness of feature extraction.

[0057] As an optional embodiment of the present invention, in step S5, the adversarial optimization of the label predictor and the domain discriminator is used to align the feature distribution of the comprehensive feature vector between the industrial quality inspection source domain and the industrial heritage target domain, and the feature matching degree is obtained by:

[0058] S501. A label predictor is constructed using a multilayer perceptron structure. Its input is a comprehensive feature vector, and its output is the level of industrial heritage preservation.

[0059] In step S501, the multilayer perceptron structure includes an input layer, a hidden layer, and an output layer. The input layer receives the comprehensive feature vector, and its dimension is determined by the length of the comprehensive feature vector. The hidden layer is fully connected and introduces nonlinear transformation capability through a nonlinear activation function (such as ReLU), so that it can learn the complex spatiotemporal-semantic association patterns in the comprehensive feature vector. The number of hidden layer nodes can be adjusted according to actual needs to balance model complexity and generalization ability. The output layer uses the Softmax activation function to map the output of the hidden layer to a probability distribution over the levels of industrial heritage preservation, such as "well-preserved," "partially damaged," and "severely damaged." Each level corresponds to a probability value, and the probability values sum to 1, thereby realizing a quantitative assessment of the degree of industrial heritage preservation. During training, the label predictor optimizes the network parameters by minimizing the cross-entropy loss function between the predicted result and the true label, thereby improving the prediction accuracy of the degree of industrial heritage preservation.
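The forward pass of the label predictor described in step S501 may be sketched as follows, with one fully connected hidden layer using ReLU and a Softmax output over three preservation levels. All weight and bias values, the two-dimensional input, and the function names are hypothetical placeholders; in practice the parameters are learned and the input is the comprehensive feature vector.

```python
import math

def dense(vector, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def relu(vector):
    return [max(0.0, x) for x in vector]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_levels(feature, w1, b1, w2, b2):
    """Return a probability for each preservation level."""
    hidden = relu(dense(feature, w1, b1))
    return softmax(dense(hidden, w2, b2))

LEVELS = ["well-preserved", "partially damaged", "severely damaged"]
feature = [1.0, -1.0]                                   # stand-in comprehensive feature vector
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]           # hypothetical hidden layer
w2, b2 = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]], [0.0, 0.0, 0.0]  # hypothetical output layer

probabilities = predict_levels(feature, w1, b1, w2, b2)
predicted_level = LEVELS[probabilities.index(max(probabilities))]
```

The returned probabilities sum to 1, and the level with the highest probability is taken as the predicted preservation level.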

[0060] S502. A domain discriminator is constructed using a convolutional neural network structure. Its input is a comprehensive feature vector, and its output is a binary classification probability that the sample belongs to the industrial quality inspection source domain or the industrial heritage target domain.

[0061] In step S502, it needs to be explained in detail that the domain discriminator of the convolutional neural network structure consists of multiple convolutional layers, pooling layers, and fully connected layers. After receiving the comprehensive feature vector, the input layer first extracts local features through convolutional layers. The kernel size and stride can be adjusted according to the feature dimension to capture domain-related features at different scales. The pooling layer uses max pooling or average pooling strategies to reduce the dimensionality of the feature map output by the convolution, enhancing the model's translation invariance. Subsequently, the fully connected layer maps the multidimensional features into binary classification probabilities, outputting the probability value (range 0~1) that the sample belongs to the industrial quality inspection source domain or the industrial heritage target domain. During training, the domain discriminator optimizes the network parameters to improve domain classification accuracy by minimizing the binary cross-entropy loss function between the predicted domain label and the true domain label. At the same time, to avoid gradient vanishing, batch normalization can be introduced after the convolutional layers to accelerate training convergence and improve model stability. For example, when the input is a comprehensive feature vector containing differences in device texture, the domain discriminator can capture subtle differences in material reflection properties between the source and target domains through convolutional kernels, thereby accurately distinguishing the source of the sample.

[0062] S503. Through adversarial optimization training, the cross-entropy loss function of the label predictor is minimized, while the confusion loss function of the domain discriminator is maximized. The confusion loss function is backpropagated through GRL, which promotes the alignment of the feature distribution of the comprehensive feature vector between the industrial quality inspection source domain and the industrial heritage target domain.

[0063] The expression for the cross-entropy loss function is: $L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log \hat{y}_{i,c}$, where $L_{ce}$ represents the cross-entropy loss function value of the label predictor, used to measure the difference between the predicted retention level and the actual level (the smaller the value, the more accurate the prediction); $N$ represents the total number of samples, i.e., the number of image sequences used in training within the industrial heritage target domain; $C$ indicates the number of retention level categories (e.g., Excellent / Good / Average / Poor, corresponding to $C=4$); $y_{i,c}$ indicates the true label of the $i$-th sample on the $c$-th class (one-hot encoded: 1 if the sample belongs to class $c$, otherwise 0); and $\hat{y}_{i,c}$ indicates the probability predicted by the label predictor for the $i$-th sample on the $c$-th class;

[0064] The expression for the confusion loss function is: $L_{d} = -\frac{1}{N_d}\sum_{j=1}^{N_d}\left[d_j\log \hat{d}_j + (1-d_j)\log\left(1-\hat{d}_j\right)\right]$, where $L_{d}$ is the confusion loss function value of the domain discriminator, which is maximized during backpropagation through GRL, making it difficult for the domain discriminator to distinguish between the source and target domains; $N_d$ is the total number of input samples for the domain discriminator (including mixed samples from the industrial quality inspection source domain and the industrial heritage target domain); $d_j$ indicates the domain label of the $j$-th sample ($d_j=0$ indicates a source domain sample, $d_j=1$ indicates a target domain sample); and $\hat{d}_j$ represents the probability, predicted by the domain discriminator, that the $j$-th sample belongs to the target domain (output by the Sigmoid activation function, with a value range of [0, 1]).
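
As a hedged illustration, the sketch below assumes the confusion loss takes the standard binary cross-entropy form averaged over the mixed batch; the function name `confusion_loss` and the sample probabilities are hypothetical. It shows that the loss is small when the discriminator separates the domains confidently and rises to about ln 2 when every prediction is 0.5.

```python
import math

def confusion_loss(domain_labels, target_probs, eps=1e-12):
    """Mean binary cross-entropy between domain labels (0 = source, 1 = target)
    and the predicted probability of belonging to the target domain."""
    total = 0.0
    for d, p in zip(domain_labels, target_probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return -total / len(domain_labels)

labels = [0, 0, 1, 1]  # two source samples, two target samples
confident = confusion_loss(labels, [0.05, 0.10, 0.90, 0.95])
confused = confusion_loss(labels, [0.5, 0.5, 0.5, 0.5])
```

Maximizing this quantity with respect to the features therefore pushes the discriminator output toward 0.5 for both domains.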

[0065] In step S503, it is necessary to explain in detail that during the adversarial optimization training process, the label predictor and the domain discriminator form a dynamic game relationship. Specifically, the label predictor continuously optimizes its own parameters by minimizing the cross-entropy loss function to improve the prediction accuracy of the industrial heritage retention level; the domain discriminator optimizes its domain classification ability by minimizing the binary cross-entropy loss function, and simultaneously maximizes the confusion loss function by inverting the sign of the gradient during backpropagation through a gradient reversal layer (GRL). This adversarial mechanism prompts the comprehensive feature vector to gradually eliminate the domain offset between the source and target domains while retaining task-related features (such as retention level semantics). For example, when the source domain samples contain standardized quality inspection equipment and the target domain samples contain old industrial heritage equipment, adversarial training will make the comprehensive feature vector focus more on the functional status of the equipment rather than the material aging differences, thereby improving the cross-domain generalization ability. An alternating optimization strategy is adopted during training: first, the domain discriminator parameters are fixed, and the label predictor parameters are updated to reduce the cross-entropy loss function value; then, the label predictor parameters are fixed, and the domain discriminator parameters are updated through GRL backpropagation to increase the confusion loss function value. Through multiple iterations, the comprehensive feature vector is finally aligned in feature distribution between the industrial quality inspection source domain and the industrial heritage target domain. 
At this point, the probability predicted by the domain discriminator for the sample source is close to 0.5, indicating that the model can no longer effectively distinguish the sample domain, while the label predictor's accuracy in predicting the retention level of target domain samples is significantly improved.
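
The gradient reversal idea can be sketched on a toy scalar model with hand-written gradients. Everything here is an illustrative assumption rather than the patented network: the feature is `f = w_f * x`, the domain probability is `sigmoid(w_d * f)`, and `lam` stands for the GRL reversal strength. In the usual gradient-reversal formulation the layer is the identity in the forward pass and multiplies the gradient by `-lam` in the backward pass, so the feature parameter moves to increase the domain loss while the discriminator parameter moves to decrease it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss(x, d, w_f, w_d):
    """Binary cross-entropy of the toy discriminator for one sample."""
    p = sigmoid(w_d * w_f * x)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))

def grl_step(x, d, w_f, w_d, lam=1.0, lr=0.1):
    """One gradient step with a gradient reversal layer between f and the discriminator."""
    f = w_f * x                 # forward pass: GRL is the identity
    p = sigmoid(w_d * f)
    grad_w_d = (p - d) * f      # ordinary gradient for the discriminator weight
    grad_f = (p - d) * w_d      # gradient arriving at the feature
    grad_w_f = -lam * grad_f * x  # reversed before reaching the feature weight
    return w_f - lr * grad_w_f, w_d - lr * grad_w_d

x, d, w_f, w_d = 1.0, 1, 1.0, 1.0
new_w_f, new_w_d = grl_step(x, d, w_f, w_d)
```

With these values the discriminator update alone lowers the domain loss, while the feature update alone raises it, which is the adversarial relationship described in step S503.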

[0066] S504. Calculate the feature matching degree based on the binary classification probability output by the domain discriminator. The feature matching degree reflects the alignment between the features of the target domain of industrial heritage and the features of the source domain of industrial quality inspection.

[0067] The expression for feature matching degree is: ,

[0068] where the terms of the expression denote, in order: the feature matching degree; the probability predicted by the domain discriminator for a target domain sample; the Gaussian kernel bandwidth parameter; the Softmax activation function; the weight matrix of the domain discriminator; the comprehensive feature vector; and the bias vector of the domain discriminator.

[0069] In step S504, it is necessary to explain in detail that the calculation of feature matching degree is achieved by quantifying the similarity between the predicted probability distribution of the target domain sample in the domain discriminator and the distribution of the source domain. Specifically, firstly, the output probability of the target domain sample (i.e., the probability that the sample is identified as belonging to the target domain) of the domain discriminator is used, and the probability distribution is smoothed by combining it with a Gaussian kernel function, where the Gaussian kernel bandwidth parameter controls the degree of smoothing. Subsequently, the linear combination of the weight matrix of the domain discriminator and the comprehensive feature vector is mapped to a probability distribution through the Softmax activation function, and finally the feature matching degree is generated. The value range of this index is [0, 1]. The closer the value is to 1, the more similar the target domain feature distribution is to the source domain feature distribution, that is, the higher the alignment degree; conversely, it indicates that the domain offset still exists. For example, when processing images of the same industrial heritage taken in different years, if the feature matching degree reaches 0.85 or higher, it means that the model has effectively eliminated the domain differences caused by shooting equipment, lighting conditions or building aging, and the comprehensive feature vector can stably represent the preservation status of the heritage across years. 
Furthermore, to dynamically monitor the training process, a matching degree change curve can be introduced: if it continuously rises and tends to stabilize during iteration, it indicates that adversarial optimization is successful; if fluctuations or stagnation occur, the GRL gradient reversal strength or the domain discriminator structure needs to be adjusted (e.g., increasing the convolutional layer depth). Ultimately, feature matching degree not only serves as a quantitative indicator of cross-domain alignment but can also be directly used to screen high-quality training samples—only samples with matching degrees exceeding a threshold (e.g., 0.7) are retained to participate in label predictor updates, thereby further improving the model's generalization performance in the target domain.
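
The monitoring and screening uses of the feature matching degree can be sketched as follows. The dictionary field `matching_degree`, the helper names, and the `window` and `tol` parameters are hypothetical; only the 0.7 threshold comes from the text.

```python
def screen_samples(samples, threshold=0.7):
    """Keep only samples whose feature matching degree exceeds the threshold."""
    return [s for s in samples if s["matching_degree"] > threshold]

def curve_has_stabilized(curve, window=3, tol=0.02):
    """Crude plateau check: the last `window` values differ by at most `tol`."""
    if len(curve) < window:
        return False
    recent = curve[-window:]
    return max(recent) - min(recent) <= tol

samples = [
    {"id": "a", "matching_degree": 0.91},
    {"id": "b", "matching_degree": 0.64},
    {"id": "c", "matching_degree": 0.78},
]
kept = [s["id"] for s in screen_samples(samples)]
stable = curve_has_stabilized([0.55, 0.68, 0.79, 0.84, 0.85, 0.85])
```

A curve that keeps fluctuating would fail the plateau check, which in the text is the signal to adjust the GRL strength or the discriminator structure.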

[0070] As an optional embodiment of the present invention, in step S6, the integrity coefficient is calculated based on the feature matching degree and the expert score; the functionality coefficient is predicted from the multi-source heterogeneous data through LSTM time series analysis; and a retention assessment report is output based on the integrity coefficient and the predicted functionality coefficient, including:

[0071] S601. Based on the feature matching degree and expert scoring matrix, calculate the integrity coefficient through a weighted fusion model;

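[0072] The expression for calculating the integrity coefficient is: $I = w_1 M + w_2 E$, where $I$ indicates the integrity coefficient; $w_1$ and $w_2$ indicate the weighting coefficients; $M$ indicates the feature matching degree; and $E$ represents the average expert rating (divided by the full score when the scoring range is not [0, 1]).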

[0073] In step S601, the integrity coefficient is calculated by dynamically weighting the objective feature matching degree and the subjective expert score. Specifically, the weighting coefficients $w_1$ (for the feature matching degree) and $w_2$ (for the expert score) should be allocated according to data reliability: when historical image annotation quality is high and domain alignment is significant, the weight of the feature matching degree can be increased (e.g., $w_1 = 0.7$, $w_2 = 0.3$), making feature matching degree the dominant factor in the assessment; if the experts' judgment on the historical value of the heritage is more authoritative, the weights are adjusted to $w_1 = 0.4$, $w_2 = 0.6$. The expert score mean E is obtained by taking the arithmetic mean of the independent scores from multiple experts on dimensions such as the structural integrity and material deterioration of the industrial heritage. The score range is usually set to [0, 10]. For example, for a 19th-century foundry, if the feature matching degree is calculated to be 0.82 (indicating high cross-domain feature alignment), and three experts give scores of 8.5, 8.0, and 9.0 respectively (mean E = 8.5, divided by 10 to bring it onto the [0, 1] scale), then with $w_1 = 0.6$ and $w_2 = 0.4$ the integrity coefficient is I = 0.6 × 0.82 + 0.4 × 8.5 / 10 = 0.492 + 0.34 = 0.832. This coefficient not only quantifies the degree of technical feature alignment, but also incorporates the experience-based judgment of industry experts, making it particularly suitable for heritage cases with non-standard modifications or missing historical data. To improve the model's adaptability, a dynamic weight adjustment mechanism can be introduced: when the correlation coefficient between the feature matching degree and the expert score is lower than a threshold (e.g., r < 0.6), the weight reassignment process is automatically triggered, and the optimal values of $w_1$ and $w_2$ are searched through Bayesian optimization. This approach ensures a balance between technical feasibility and historical authenticity in the assessment results. 
The resulting integrity coefficient will serve as one of the core indicators in the preservation assessment report, directly reflecting the overall preservation status of the heritage's physical condition and historical value.
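
The foundry example can be reproduced with a short sketch. The function name, the parameter names, and the check that the two weights sum to 1 are assumptions made for illustration (every weight pair quoted in the text sums to 1, but the text does not state this as a requirement).

```python
from statistics import mean

def integrity_coefficient(matching_degree, expert_scores, w_match, w_expert,
                          full_score=10.0):
    """Weighted fusion of feature matching degree and mean expert score."""
    if abs(w_match + w_expert - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    expert_mean = mean(expert_scores) / full_score  # rescale [0, 10] to [0, 1]
    return w_match * matching_degree + w_expert * expert_mean

# Values from the foundry example in the text.
foundry = integrity_coefficient(0.82, [8.5, 8.0, 9.0], w_match=0.6, w_expert=0.4)
```

Changing the weight pair to 0.7 / 0.3 or 0.4 / 0.6 shows how strongly the result leans on the image-derived evidence or on expert judgment.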

[0074] S602. Based on the spatiotemporal feature vectors in multi-source heterogeneous data, a two-layer LSTM network is used, combined with a time attention mechanism to weight key time steps, and output the predicted functional coefficients for the next n years.

[0075] In step S602, it is necessary to explain in detail that the dual-layer LSTM network, by stacking two long short-term memory units, can effectively capture complex spatiotemporal dependencies in multi-source heterogeneous data. Specifically, the bottom-layer LSTM first performs preliminary encoding on the input spatiotemporal feature vector, extracting dynamic patterns within a local time window (such as equipment aging rate and structural deformation trend); the upper-layer LSTM further integrates the bottom-layer output to construct a global time series representation, capturing long-term evolution patterns across time periods. The time attention mechanism strengthens the contribution of key historical moments (such as major maintenance events and periods affected by natural disasters) to the prediction results by calculating the attention weight of each time step. For example, when analyzing the functional changes of an industrial heritage site over the past 50 years, the attention mechanism may assign a weight of 0.3 to the equipment renovation period in 1980, a weight of 0.25 to the earthquake restoration period in 2010, and allocate the remaining weights proportionally to the other time steps. Finally, the predicted functionality coefficient maps the weighted time series representation to the interval [0, 1] through a fully connected layer. The closer the value is to 1, the higher the probability that the heritage will maintain its original function within the next n years. For example, if a textile factory's functionality coefficient is predicted to be 0.78 over the next 10 years, it indicates a high probability that its core functional elements, such as production equipment and spatial layout, will remain usable. To improve prediction robustness, a Dropout layer (with a dropout rate of 0.2) is introduced during network training to prevent overfitting, and the Huber loss function is used to balance outlier sensitivity. 
In practical deployment, the prediction period n can be dynamically adjusted based on the heritage type: for rapidly iterating light industrial heritage (such as food processing plants), n is 3-5 years; for structurally stable heavy industrial heritage (such as steel mills), n can be extended to 10-15 years. The final generated functionality coefficient, together with the integrity coefficient, will form the quantitative basis of the retention assessment report.
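
The Huber loss mentioned above behaves quadratically for small errors and linearly for large ones, which is why it is less sensitive to outliers than squared error. The sketch below uses the standard definition; the threshold `delta=0.1` is a hypothetical choice suited to coefficients in [0, 1], not a value given in the text.

```python
def huber_loss(y_true, y_pred, delta=0.1):
    """Standard Huber loss for a single prediction."""
    residual = abs(y_true - y_pred)
    if residual <= delta:
        return 0.5 * residual * residual          # quadratic region
    return delta * (residual - 0.5 * delta)       # linear region

small_error = huber_loss(0.78, 0.75)  # within delta
large_error = huber_loss(0.90, 0.40)  # outlier-sized residual
```

For the large residual the loss grows only linearly, so a single anomalous year in the training series pulls the model less than it would under squared error.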

[0076] S603. Calculate the weighted sum based on the integrity coefficient and the predicted functionality coefficient to obtain the retention level;

[0077] The formula for calculating the retention level is: $R = \alpha I + \beta F$, where $R$ indicates the retention level, a weighted quantification of the overall preservation status of the industrial heritage, with the grading criteria: Excellent ($R \ge 0.85$), Good ($0.70 \le R < 0.85$), Average ($0.50 \le R < 0.70$), and Poor ($R < 0.50$); $\alpha$ and $\beta$ represent the weighting coefficients, reflecting the relative importance of the integrity coefficient and the functionality coefficient; $I$ is the integrity coefficient, representing the degree of morphological integrity of the industrial heritage, with a value range of [0, 1]; and $F$ is the predicted functionality coefficient, characterizing the future functional availability of the industrial heritage, with a value range of [0, 1];

[0078] In step S603, the retention level is calculated by jointly considering the current integrity status and the future functional availability of the industrial heritage. Specifically, the weighting coefficients α (for the integrity coefficient) and β (for the functionality coefficient) should be configured according to the type of heritage and the conservation objectives: when the assessment focuses on the authenticity of historical buildings (such as cultural heritage sites), α = 0.6 and β = 0.4 can be set, making the integrity coefficient the dominant factor in the classification; if the focus is on the sustainable use value of the heritage (such as industrial tourism sites), the weights are adjusted to α = 0.4 and β = 0.6, emphasizing the influence of the functionality coefficient. For example, for a machine manufacturing plant from the early 20th century, if the integrity coefficient I = 0.82 (structure well preserved) and the functionality coefficient F = 0.75 (still capable of production after equipment upgrades), then with α = 0.5 and β = 0.5 the retention level is R = 0.5 × 0.82 + 0.5 × 0.75 = 0.785, which is classified as "good" according to the standard. This quantitative model not only avoids the one-sidedness of a single indicator, but also adapts to the needs of different protection scenarios through weight adjustment. To improve the flexibility of the assessment, a dynamic weight calibration mechanism can be introduced: when the functional requirements of the heritage undergo significant changes (such as from production to exhibition), the weight optimization process is automatically triggered, and the optimal combination of α and β is searched through a genetic algorithm. This ensures consistency between the grading system and the protection objectives. 
The final preservation level will serve as the core basis for heritage protection decisions; for example, priority will be given to applying for cultural relic protection funds for "excellent" heritage sites, while rescue restoration plans will be initiated for "poor" heritage sites.
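
The weighted sum and the grading thresholds can be checked with a minimal sketch. The function names are hypothetical; the thresholds and the machine manufacturing plant values are taken from the text.

```python
def retention_level(integrity, functionality, w_integrity=0.5, w_functionality=0.5):
    """Weighted sum of the integrity and predicted functionality coefficients."""
    return w_integrity * integrity + w_functionality * functionality

def grade(r):
    """Map a retention level R to the four grades given in the text."""
    if r >= 0.85:
        return "Excellent"
    if r >= 0.70:
        return "Good"
    if r >= 0.50:
        return "Average"
    return "Poor"

# Machine manufacturing plant example: I = 0.82, F = 0.75, equal weights.
r_plant = retention_level(0.82, 0.75)
plant_grade = grade(r_plant)
```

Re-running the same inputs with the 0.6 / 0.4 or 0.4 / 0.6 weight pairs shows how the conservation objective shifts R without changing the underlying coefficients.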

[0079] S604. Generate a retention assessment report based on the integrity coefficient, predicted functionality coefficient, and retention level.

[0080] In step S604, the generation of the retention assessment report relies on the quantitative results of the integrity coefficient, predicted functionality coefficient, and retention level, and achieves information integration and visualization through a structured template. The report content covers four core modules. First, the heritage overview includes basic information such as the name, geographical location, construction date, and historical function of the industrial heritage, combined with historical images and current photos to form a spatiotemporal comparison that intuitively shows the evolution of the physical form of the heritage. Second, the analysis of assessment indicators presents the specific values of the integrity coefficient, predicted functionality coefficient, and retention level in tabular form, indicates the weighting configuration scheme (e.g., α=0.6, β=0.4), and displays intermediate calculation results such as the feature matching degree and expert score mean, ensuring the traceability of the assessment process. Third, the basis for level determination gives a clear grading conclusion based on the R value range (Excellent ≥0.85, Good 0.70-0.85, Average 0.50-0.70, Poor <0.50) and explains the logic behind the weight selection in conjunction with the heritage type and protection objectives, for example the rationale for treating the integrity coefficient as the primary factor for cultural heritage sites. Fourth, the protection recommendations propose differentiated measures based on the grading, for example applying for cultural heritage protection funds and conducting regular monitoring for "Excellent" heritage sites, and developing rescue restoration plans and restricting functional changes for "Poor" heritage sites. The report also includes a note on the probability of maintaining functionality over the next n years corresponding to the functionality coefficient (e.g., F=0.78 corresponds to a 78% probability of usable production functions within 10 years). The report ends with a dynamic update prompt, clarifying that when the correlation coefficient between the feature matching degree and the expert scores falls below a threshold (r<0.6), or when there are significant changes in heritage functional needs, the weights need to be recalibrated and a revised report generated. The final report is output in PDF format, supporting mixed text and graphics layouts and interactive data queries.

[0081] As an optional embodiment of the present invention, the expression for outputting the predicted functional coefficient for the next n years is optionally: $F_{n} = W_o \sum_{t=1}^{T} a_t h_t^{(2)} + b_o$, where $F_{n}$ indicates the predicted functional coefficient for the $n$-th future year; $W_o$ represents the output layer weight matrix; $t$ indicates a historical time step; $T$ indicates the length of the input time window; $a_t$ represents the time attention weight, calculated using a time attention mechanism, with higher weights assigned to key time steps (such as the year of a major equipment overhaul) to enhance predictive sensitivity; $h_t^{(2)}$ indicates the hidden state of the second LSTM layer at time step $t$; and $b_o$ represents the output layer bias vector.
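
The attention-weighted output step can be sketched without an LSTM by treating the second-layer hidden states as given. All names and numbers are hypothetical. Two assumptions go beyond the text: the attention weights are obtained by a softmax over per-step scores, and a sigmoid is applied after the output layer so that the result stays in [0, 1].

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_functionality(hidden_states, scores, w_out, b_out):
    """Attention-pool the hidden states, then apply a linear output layer."""
    weights = softmax(scores)  # one attention weight per time step
    dim = len(hidden_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states))
               for k in range(dim)]
    z = sum(wk * ck for wk, ck in zip(w_out, context)) + b_out
    return 1.0 / (1.0 + math.exp(-z))  # assumed squashing to [0, 1]

# Three time steps with 2-dimensional hidden states; the middle step
# (e.g. a major overhaul year) receives the highest attention score.
states = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.2]]
attn = softmax([0.1, 2.0, 0.3])
f_pred = predict_functionality(states, [0.1, 2.0, 0.3], w_out=[1.5, -0.5], b_out=0.2)
```

Raising the score of a time step increases its share of the pooled representation, which is how key years gain influence over the prediction.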

[0082] As an optional embodiment of the present invention, the method may further include:

[0083] S7. Visualize and render the preservation assessment report to generate dynamic heat maps and 3D reconstruction models, intuitively showcasing the spatial distribution and temporal evolution characteristics of industrial heritage.

[0084] In step S7, it is necessary to explain in detail that the generation of the dynamic heatmap is based on the spatial distribution data of preservation level, and uses color gradients (such as red-yellow-green) to intuitively reflect the differences in preservation status of different areas. Specifically, industrial heritage is divided into regular grid units, and the preservation level R value of each unit is mapped to a preset color scale: R≥0.85 shows dark green (excellent), 0.70≤R<0.85 shows light green (good), 0.50≤R<0.70 shows yellow (average), and R<0.50 shows red (poor). For example, a heatmap of a steel plant may show a layered effect with the core production area in green (frequent equipment updates), auxiliary facilities in yellow (local aging), and abandoned warehouses in red. To enhance information density, historical image slices are overlaid on the heatmap, and users can observe the changes in preservation status from 1950 to the present through a time slider. The process of the red area shrinking over time can intuitively verify the effectiveness of protection measures.
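
The color scale for the grid units can be sketched as a simple lookup. The grid values and helper names are hypothetical; the thresholds and colors are those listed above.

```python
# (lower bound, color) pairs in descending order, as listed in the text.
COLOR_SCALE = [(0.85, "dark green"), (0.70, "light green"), (0.50, "yellow")]

def cell_color(r):
    """Return the heatmap color for one grid unit's retention level R."""
    for lower_bound, color in COLOR_SCALE:
        if r >= lower_bound:
            return color
    return "red"

def render_grid(grid):
    """Map a 2-D grid of R values to a 2-D grid of color names."""
    return [[cell_color(r) for r in row] for row in grid]

# Hypothetical 2 x 3 grid: production core, auxiliary facilities, warehouse.
colors = render_grid([[0.90, 0.72, 0.55],
                      [0.86, 0.60, 0.30]])
```

One such grid per year, stepped by the time slider, gives the sequence of frames the text describes.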

[0085] The construction of 3D reconstruction models relies on geometric feature vectors from multi-source heterogeneous data, employing point cloud registration and texture mapping techniques to reconstruct the three-dimensional form of heritage sites. First, historical images from different eras are processed using the SfM (Structure from Motion) algorithm to generate high-precision point cloud models. Second, occluded areas are filled in using laser scanning data, constructing a complete geometric framework. Finally, texture information from historical images is mapped onto the model surface, forming a 3D scene that combines realism with a temporal dimension. For example, a reconstruction model of a textile factory can simultaneously display the brick exterior walls from 1960 (extracted from black-and-white images) and the steel roof added in 2020 (obtained from laser scanning). Users can observe the spatiotemporal trajectory of structural changes by rotating the viewpoint. To support interactive analysis, the model embeds voxel-based encoding of preservation levels: the R-value of each voxel (3D pixel) is displayed through transparency adjustment; high R-value areas (such as well-preserved factory buildings) are presented as semi-transparent green, while low R-value areas (such as collapsed chimneys) are presented as semi-transparent red, assisting decision-makers in quickly identifying key areas for protection.

[0086] The dynamic rendering process incorporates a timeline control module, allowing users to specify the starting year and step size (e.g., one frame every 5 years). The system automatically generates an animated sequence of heritage evolution. In the animation, color changes in the heatmap are synchronized with morphological changes in the 3D model. For example, the 1980 heatmap shows an expansion of the red area (corresponding to a collapsed roof in the 3D model), while the 2010 heatmap shows an increase in the green area (corresponding to a restored factory building in the 3D model). To enhance data reliability, the rendering results include metadata tags. Clicking on any area allows users to view the raw data for that location's integrity coefficient, functionality coefficient, and average expert score, as well as the shooting time and equipment parameters of the corresponding historical imagery. The final generated visualization report supports multi-terminal access, achieving real-time rendering on the browser side using WebGL technology. Decision-makers can perform spatial analysis and temporal retrospection without installing specialized software.

[0087] In this embodiment, the timeline control module adopts a layered architecture. The bottom layer integrates a time step parser that converts the user-defined start year (e.g., 1950) and time interval (e.g., one frame every 5 or 10 years) into discrete time series nodes in a system-recognizable format, for example parsing "starting in 1950, one frame every 5 years" into the time array [1950, 1955, 1960, ..., 2025]. The middle layer deploys time event triggers: when the user drags the time slider to a specific node (e.g., the equipment renovation period in 1980), the heat map update and 3D model morphology change for the corresponding year are triggered automatically. The upper layer builds a multimodal synchronization engine to ensure the spatiotemporal alignment of the color gradient (heat map) and structural deformation (3D model); for example, when the timeline points to the earthquake repair period in 2010, the red area in the heat map shrinks, and the collapsed factory structure in the 3D model is synchronously repaired to the reinforced state. To improve operational smoothness, the module introduces a buffer preloading mechanism that pre-renders the two time steps before and after the current time point, reducing the waiting time when users drag the slider. The timeline control module also supports time jump operations: users can directly input a target year (e.g., 2020) or click on markers on the timeline (e.g., the year corresponding to a major historical event), and the system quickly locates the specified time and updates the visualization content. For mobile access scenarios, the module is optimized for touch interaction, allowing users to adjust the timeline range by pinching to zoom and to long-press a time point to bring up a summary of the evaluation indicators for that year (e.g., integrity coefficient = 0.78, functionality coefficient = 0.65). Ultimately, the interactive timeline generated by the timeline control module serves as the core navigation tool for the visualization report, enabling decision-makers to intuitively trace the changes in the preservation status of industrial heritage from its construction to the present.
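
The time step parser and the buffer preloading window can be sketched as below. The function names and the explicit `end_year` argument are assumptions for illustration; the 1950 start, the 5-year step, and the two-step preload radius come from the text.

```python
def parse_time_steps(start_year, step_years, end_year):
    """Expand a start year and step size into discrete timeline nodes."""
    return list(range(start_year, end_year + 1, step_years))

def preload_window(nodes, current_year, radius=2):
    """Nodes to pre-render: `radius` steps before and after the current node."""
    idx = nodes.index(current_year)
    return nodes[max(0, idx - radius): idx + radius + 1]

nodes = parse_time_steps(1950, 5, 2025)
around_1980 = preload_window(nodes, 1980)
at_start = preload_window(nodes, 1950)
```

At the ends of the timeline the window is simply truncated, so the first node preloads only the steps that follow it.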

[0088] S8. Link the retention assessment report with historical maintenance records, use knowledge graph technology to construct an industrial heritage life cycle map, and output optimization maintenance suggestions and risk warning indicators.

[0089] In step S8, static attributes (such as name, geographical location, construction date, and heritage type) and dynamic indicators (integrity coefficient, functionality coefficient, and preservation level) of industrial heritage are first extracted from the preservation assessment report. Simultaneously, time-sequenced maintenance events (such as repair time, repair location, repair method, and repair cost) are extracted from historical maintenance records to form a structured data layer. Secondly, an entity relationship model is defined: industrial heritage is taken as the core entity, associated with its "contained" sub-components (such as factory buildings, equipment, and chimneys), "experienced" maintenance events, "compliant" protection standards (such as the Cultural Relics Protection Law), and "belonging" heritage categories (such as important modern historical sites). The maintenance event entity is further associated with sub-entities such as the "used" maintenance materials, the "executing" maintenance teams, and the technical specifications it is "based on". For example, in the life cycle diagram of a steel plant, the "blast furnace" is a sub-component entity, connected to the 1985 "furnace lining repair" event through an "experienced" relationship. This event is further associated with the "refractory brick" material entity through a "used" relationship and with the "third maintenance team" team entity through an "executed" relationship.
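
The entity relationship model can be illustrated with plain subject-relation-object triples. This is a minimal in-memory sketch, not a graph database or RDF store; the relation names and the `objects_of` helper are hypothetical, while the blast furnace example follows the text.

```python
# Each triple is (subject, relation, object).
TRIPLES = [
    ("steel plant", "contains", "blast furnace"),
    ("blast furnace", "experienced", "1985 furnace lining repair"),
    ("1985 furnace lining repair", "used", "refractory brick"),
    ("1985 furnace lining repair", "executed_by", "third maintenance team"),
]

def objects_of(subject, relation, triples=TRIPLES):
    """All objects linked to `subject` through `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

def materials_used_on(component, triples=TRIPLES):
    """Follow component -> maintenance event -> material."""
    materials = []
    for event in objects_of(component, "experienced", triples):
        materials.extend(objects_of(event, "used", triples))
    return materials

furnace_materials = materials_used_on("blast furnace")
```

Chaining two relation lookups in this way is the simplest form of the multi-hop traversal a life cycle graph supports.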

[0090] To enhance the semantic reasoning capabilities of the graph, an ontology library is introduced to define logical rules between entities, such as "triggering a 'high-risk' warning when the functionality coefficient is <0.5 and no maintenance events have been recorded in the past 3 years"; or "if the heritage type is 'cultural relic protection unit' and the integrity coefficient is <0.7, then 'applying for special repair funds' is recommended." For example, for a textile factory, the graph, through rule reasoning, discovers that the integrity coefficient of its "weaving workshop" sub-component decreased from 0.82 in 2018 to 0.65 in 2023, and there are no maintenance records after 2020, automatically generating an optimization suggestion of "recommending structural reinforcement in 2024, with a budget of approximately 500,000 yuan." Simultaneously, the graph incorporates spatiotemporal dimension analysis: arranging maintenance events along a timeline and combining the changing trends of retention levels, key intervention nodes are identified (such as a significant increase in the functionality coefficient after a major overhaul).
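
The two example rules can be written as ordinary conditionals. The record fields and function name are hypothetical, and "the past 3 years" is interpreted here as the current year and the three preceding calendar years, which is an assumption.

```python
def evaluate_rules(record, current_year):
    """Apply the two example rules from the text to one heritage record."""
    findings = []
    recent_events = [y for y in record["maintenance_years"]
                     if current_year - 3 <= y <= current_year]
    if record["functionality"] < 0.5 and not recent_events:
        findings.append("high-risk warning")
    if (record["heritage_type"] == "cultural relic protection unit"
            and record["integrity"] < 0.7):
        findings.append("apply for special repair funds")
    return findings

workshop = {
    "heritage_type": "cultural relic protection unit",
    "integrity": 0.65,
    "functionality": 0.45,
    "maintenance_years": [2015, 2019],
}
workshop_findings = evaluate_rules(workshop, current_year=2024)
```

In an ontology-backed graph the same conditions would be expressed as declarative rules; the plain functions here only show the logic being evaluated.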

[0091] The generation of risk warning indicators relies on association rule mining and anomaly detection algorithms in the graph. On the one hand, frequent itemset mining discovers high-frequency co-occurrence patterns, such as the association rule "equipment aging (annual decrease in integrity coefficient > 0.1) → annual decrease in functionality coefficient > 0.15 → failure within 3 years." When new data matches this pattern, an alert is triggered. On the other hand, the isolation forest algorithm is used to detect abnormal maintenance behavior: if the cost of a certain repair is significantly higher than that of similar events (e.g., more than twice the median) and there is no accompanying improvement in retention level, it is marked as a "potential resource waste" risk. For example, in 2022, the cost of a "major machine tool overhaul" event in a machinery factory reached 800,000 yuan, but the functionality coefficient only increased from 0.68 to 0.71. The graph automatically generated an alert suggesting "reviewing the rationality of the repair plan."
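
The "potential resource waste" check can be approximated with a simple median rule. This sketch deliberately swaps the isolation forest for the twice-the-median comparison quoted in the text; the peer costs and the `min_gain` threshold that defines "no accompanying improvement" are hypothetical.

```python
from statistics import median

def flag_resource_waste(cost, peer_costs, level_before, level_after,
                        ratio=2.0, min_gain=0.05):
    """Flag a repair that is far costlier than its peers yet brings little gain."""
    too_expensive = cost > ratio * median(peer_costs)
    little_gain = (level_after - level_before) < min_gain
    return too_expensive and little_gain

# Machine tool overhaul from the text: 800,000 yuan, coefficient 0.68 -> 0.71.
peer_costs = [250_000, 300_000, 350_000]  # made-up costs of similar events
overhaul_flagged = flag_resource_waste(800_000, peer_costs, 0.68, 0.71)
```

An isolation forest would replace the fixed ratio with a learned anomaly score over several features, but the flagged outcome for this example is the same kind of alert.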

[0092] The optimized maintenance recommendations employ a hierarchical recommendation mechanism: the basic layer generates general recommendations based on retention levels (e.g., "excellent" heritage sites recommend "comprehensive inspection every 5 years," while "poor" heritage sites recommend "immediate restriction on use"); the advanced layer customizes solutions based on heritage type and sub-component characteristics (e.g., "modular update" is recommended for "light industrial heritage," while "structural reinforcement priority" is recommended for "heavy industrial heritage"); the intelligent layer dynamically adjusts recommendations using reinforcement learning models: with maintenance cost, functional recovery rate, and time efficiency as optimization objectives, it outputs the optimal combination by simulating the long-term effects of different maintenance strategies. For example, for a power plant's boiler equipment, the map recommends a phased solution: "replacing the refractory layer in 2024 (cost 300,000 yuan, functional coefficient improved to 0.85) + upgrading the control system in 2026 (cost 200,000 yuan, functional coefficient improved to 0.92)," with a total cost lower than the 550,000 yuan of a one-time comprehensive overhaul, and a higher functional recovery rate.

[0093] The final generated industrial heritage lifecycle map supports interactive queries: users can ask questions in natural language (e.g., "Which components saw a functionality coefficient decrease by more than 0.2 after 2018?"), and the system returns structured answers based on the map's semantic parsing capabilities, highlighting related entities and chains of evidence. The map data is stored in RDF format, supporting federated queries with external databases (such as directories of cultural relics protection units and repair material price databases) to ensure the timeliness and accuracy of recommendations. For example, when a user queries "the qualifications of a repair team for a certain heritage site," the map can link to the National Enterprise Credit Information Publicity System to verify whether the team possesses "cultural relics protection engineering construction qualifications," avoiding invalid recommendations.

[0094] Example 2

[0095] A system for assessing the preservation level of industrial heritage based on historical image comparison and recognition includes:

[0096] a processor; and

[0097] a memory for storing processor-executable instructions;

[0098] wherein the processor is configured to implement the method for assessing the preservation level of industrial heritage based on historical image comparison and recognition when executing the executable instructions.

[0099] It should be noted that the computer device includes a processor and a memory, and may also include one or more of a multimedia component, an input/output (I/O) interface, and a communication component.

[0100] The processor controls the overall operation of the computer device to complete all or part of the steps in the above-mentioned method for assessing the preservation level of industrial heritage based on historical image comparison and recognition.

[0101] Memory is used to store various types of data to support the operation of the computer device. This data may include, for example, instructions for any application or method used to operate on the computer device, as well as application-related data. Memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.

[0102] The multimedia component may include a screen and an audio component. The screen may be, for example, a touch screen. The audio component is used to output and/or input audio signals: for example, it may include a microphone for receiving external audio signals, which may then be stored in the memory or transmitted via the communication component, and it may also include at least one speaker for outputting audio signals.

[0103] The I/O interface provides an interface between the processor and other interface modules, such as keyboards, mice, and buttons; the buttons can be virtual or physical.

[0104] The communication component is used for wired or wireless communication between the computer device and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, or 5G, or a combination of one or more of these; accordingly, the communication component may include a Wi-Fi module, a Bluetooth module, an NFC module, and a mobile communication module.

[0105] As a preferred embodiment, the computer device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method for assessing the preservation level of industrial heritage based on historical image comparison and recognition.

[0106] Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method for assessing the preservation level of industrial heritage based on historical image comparison and recognition, characterized in that the method includes: S1. Collect and preprocess multi-source heterogeneous data of the target area, the multi-source heterogeneous data including a consecutive-year image sequence; S2. Use a CNN to extract spatiotemporal feature vectors from the consecutive-year image sequence; S3. Extract semantic feature vectors from the consecutive-year image sequence using a self-attention mechanism; S4. Based on the spatiotemporal feature vector and the semantic feature vector, enhance the feature representation using a spatial-channel dual attention (SA) module to obtain a comprehensive feature vector; S5. Through adversarial optimization of a label predictor and a domain discriminator, align the feature distribution of the comprehensive feature vector between the industrial quality inspection source domain and the industrial heritage target domain to obtain the feature matching degree; S6. Calculate the integrity coefficient based on the feature matching degree and the expert score; predict the functional coefficient from the multi-source heterogeneous data through LSTM time series analysis; and output a retention assessment report based on the integrity coefficient and the predicted functional coefficient; wherein outputting the retention assessment report in step S6 includes: S601. Based on the feature matching degree and the expert scoring matrix, calculate the integrity coefficient using a weighted fusion model; S602. Based on the spatiotemporal feature vectors in the multi-source heterogeneous data, use a two-layer LSTM network combined with a temporal attention mechanism to weight key time steps and output the predicted functional coefficients for the next n years; S603. Calculate a weighted sum of the integrity coefficient and the predicted functional coefficient to obtain the retention level; S604. Based on the integrity coefficient, the predicted functional coefficient, and the retention level, generate the retention assessment report; wherein the expression for outputting the predicted functional coefficient for the next n years is:

$$\hat{F}_{k} = W_o \sum_{t=1}^{T} \alpha_t h_t^{(2)} + b_o$$

where $\hat{F}_{k}$ denotes the predicted functional coefficient for the $k$-th future year; $W_o$ denotes the output layer weight matrix; $t$ denotes a historical time step; $T$ denotes the length of the input time window; $\alpha_t$ denotes the temporal attention weight, calculated using the temporal attention mechanism; $h_t^{(2)}$ denotes the hidden state of the second LSTM layer; and $b_o$ denotes the output layer bias vector.
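
The output step of S602 and the weighted sums of S601 and S603 can be sketched in plain Python. This is a minimal illustration under assumptions, not the claimed implementation: the LSTM itself is omitted and its second-layer hidden states are taken as given, the unnormalized attention scores are supplied directly because the claim does not say how they are computed, the expert score is reduced to a single number rather than a matrix, and all weights and sample values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(hidden_states, scores, w_out, b_out):
    """Attention-weighted sum of hidden states, then a linear output layer.

    hidden_states: T vectors (second-layer LSTM hidden states, assumed given)
    scores:        T unnormalized attention scores (assumed given)
    w_out, b_out:  output layer with one row per predicted future year
    """
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return [sum(w * c for w, c in zip(row, context)) + b
            for row, b in zip(w_out, b_out)]


def weighted_sum(first, second, weight_first):
    """Convex combination, used here for both S601 and S603 (weights assumed)."""
    return weight_first * first + (1.0 - weight_first) * second


hidden = [[0.2, 0.4], [0.6, 0.8], [1.0, 0.0]]  # T = 3 steps, 2 hidden units
predicted = attention_readout(hidden, [0.0, 0.0, 0.0],
                              w_out=[[1.0, 0.0], [0.5, 0.5]],
                              b_out=[0.0, 0.1])                 # n = 2 years
integrity = weighted_sum(0.8, 0.9, weight_first=0.6)            # S601
retention = weighted_sum(integrity, predicted[0], weight_first=0.5)  # S603
print(predicted, integrity, retention)
```

Equal attention scores give uniform weights, so the context vector here is simply the mean of the three hidden states.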

2. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1, characterized in that extracting the spatiotemporal feature vector from the consecutive-year image sequence in step S2 includes: S201. Standardize the consecutive-year image sequence; S202. Construct a CNN and extract spatiotemporal information at different scales from the standardized consecutive-year image sequence; S203. Fuse the spatiotemporal information at different scales to obtain the spatiotemporal feature vector.

3. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1, characterized in that extracting the semantic feature vector from the consecutive-year image sequence in step S3 includes: S301. Segment the preprocessed consecutive-year image sequence, and map the segments to initial tokens using a learnable linear projection layer; S302. Based on the initial tokens, calculate the similarity weights between the segments of the consecutive-year image sequence using a self-attention mechanism; S303. Weight and aggregate the initial tokens according to the similarity weights to generate the semantic feature vector.
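
Steps S301 to S303 can be sketched with standard-library Python. This is a simplified illustration under assumptions: a single image stands in for the sequence, the projection matrix `PROJECTION` is a fixed hypothetical matrix rather than a learned layer, the attention uses the tokens themselves as queries, keys, and values, and the final mean over tokens is one possible aggregation that the claim does not specify.

```python
import math


def split_patches(image, size):
    """S301 (part 1): cut a 2D image into flattened size x size patches.
    Assumes both image dimensions are divisible by size."""
    patches = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            patches.append([image[r + i][c + j]
                            for i in range(size) for j in range(size)])
    return patches


def project(patch, weight):
    """S301 (part 2): linear projection of a flattened patch to a token."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weight]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(tokens):
    """S302 and S303: scaled dot-product similarity weights between tokens,
    weighted aggregation, then a mean over the attended tokens."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                           for key in tokens])
        attended.append([sum(w * value[d] for w, value in zip(weights, tokens))
                         for d in range(dim)])
    return [sum(row[d] for row in attended) / len(attended) for d in range(dim)]


IMAGE = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
PROJECTION = [[0.1, 0.0, 0.0, 0.1], [0.0, 0.1, 0.1, 0.0]]  # hypothetical weights

tokens = [project(p, PROJECTION) for p in split_patches(IMAGE, 2)]
semantic_vector = self_attention_pool(tokens)
print(semantic_vector)
```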

4. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1, characterized in that obtaining the comprehensive feature vector in step S4 includes: S401. Concatenate the spatiotemporal feature vector and the semantic feature vector along the channel dimension to obtain a fused feature tensor; S402. Compress the spatial dimension of the fused feature tensor to generate a single-channel spatial weight map; S403. Perform global average pooling on the fused feature tensor to compress the spatial dimension and generate a channel weight vector; S404. Broadcast and multiply the single-channel spatial weight map and the channel weight vector to generate a dual-attention weight matrix; multiply the dual-attention weight matrix element-wise with the fused feature tensor to enhance the features; and superimpose the spatiotemporal feature vector and the semantic feature vector through a residual connection to form a style enhancement feature vector; S405. Perform global average pooling on the style enhancement feature vector to obtain the comprehensive feature vector, which includes spatiotemporal evolution information and semantic association information.
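
Steps S401 to S405 can be sketched on small nested lists shaped channels x height x width. Several details are assumptions that the claim does not fix: the single-channel spatial map is taken as the sigmoid of the per-pixel mean across channels, the channel weights are the sigmoid of the global average pool, and the residual connection adds the fused tensor back to the enhanced one. The function name `dual_attention` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dual_attention(spatiotemporal, semantic):
    """Return one pooled value per channel of the fused, enhanced tensor."""
    # S401: concatenate along the channel dimension.
    fused = spatiotemporal + semantic
    channels, height, width = len(fused), len(fused[0]), len(fused[0][0])
    area = height * width

    # S402: single-channel spatial weight map (assumed: sigmoid of channel mean).
    spatial = [[sigmoid(sum(fused[c][i][j] for c in range(channels)) / channels)
                for j in range(width)] for i in range(height)]

    # S403: channel weight vector (assumed: sigmoid of global average pooling).
    channel = [sigmoid(sum(map(sum, fused[c])) / area) for c in range(channels)]

    # S404: broadcast-multiply the two weights, apply them element-wise,
    # and add the fused input back as the residual connection.
    enhanced = [[[fused[c][i][j] * channel[c] * spatial[i][j] + fused[c][i][j]
                  for j in range(width)] for i in range(height)]
                for c in range(channels)]

    # S405: global average pooling to a comprehensive feature vector.
    return [sum(map(sum, enhanced[c])) / area for c in range(channels)]


ones = [[[1.0, 1.0], [1.0, 1.0]]]   # one channel, 2 x 2
vector = dual_attention(ones, ones)
print(vector)
```

For an all-ones input each output entry equals 1 + sigmoid(1)², the input value plus its attention-scaled copy.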

5. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1, characterized in that obtaining the feature matching degree in step S5 includes: S501. Construct a label predictor with a multilayer perceptron structure, whose input is the comprehensive feature vector and whose output is the preservation level of the industrial heritage; S502. Construct a domain discriminator with a convolutional neural network structure, whose input is the comprehensive feature vector and whose output is the binary classification probability that a sample belongs to the industrial quality inspection source domain or the industrial heritage target domain; S503. Through adversarial optimization training, minimize the cross-entropy loss function of the label predictor while maximizing the confusion loss function of the domain discriminator, the confusion loss function being backpropagated through a gradient reversal layer (GRL) to align the feature distribution of the comprehensive feature vector between the industrial quality inspection source domain and the industrial heritage target domain; S504. Calculate the feature matching degree based on the binary classification probability output by the domain discriminator, the feature matching degree reflecting the degree of alignment between the industrial heritage target domain features and the industrial quality inspection source domain features.

6. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1 or 5, characterized in that the feature matching degree $M$ is calculated from the domain discriminator output $p_t$ using a Gaussian kernel with bandwidth parameter $\sigma$, with $p_t = \mathrm{Softmax}(W_d f + b_d)$, where $M$ denotes the feature matching degree; $p_t$ denotes the probability, predicted by the domain discriminator, that a sample belongs to the target domain; $\sigma$ denotes the Gaussian kernel bandwidth parameter; $\mathrm{Softmax}$ denotes the Softmax activation function; $W_d$ denotes the weight matrix of the domain discriminator; $f$ denotes the comprehensive feature vector; and $b_d$ denotes the bias vector of the domain discriminator.
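
One way to turn the domain discriminator output into a matching degree is sketched below. The exact expression is not legible in this text, so the Gaussian form used here is an assumption: it scores how close the predicted target-domain probability is to 0.5, the value at which the discriminator can no longer tell the two domains apart. The two-class ordering of the logits, the function names, and the sample values are likewise hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_probability(weights, bias, feature):
    """Linear domain discriminator head followed by Softmax.
    Index 1 is taken to be the target domain (assumed ordering)."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)[1]


def matching_degree(p_target, sigma):
    """Assumed Gaussian kernel on the distance of p_target from 0.5.
    Returns 1.0 when the domains are indistinguishable."""
    return math.exp(-((p_target - 0.5) ** 2) / (2.0 * sigma ** 2))


feature = [0.3, -0.2, 0.5]                       # comprehensive feature vector
weights = [[0.1, 0.2, -0.1], [0.0, -0.1, 0.2]]   # hypothetical discriminator weights
bias = [0.0, 0.0]

p = target_probability(weights, bias, feature)
print(p, matching_degree(p, sigma=0.5))
```

Under this assumption a smaller bandwidth penalizes residual domain separability more sharply.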

7. The method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in claim 1, characterized in that the method further includes: S7. Visualize and render the retention assessment report to generate a dynamic heat map and a three-dimensional reconstruction model, which intuitively display the spatial distribution and temporal evolution characteristics of the industrial heritage; S8. Link the retention assessment report with historical maintenance records, use knowledge graph technology to construct an industrial heritage life cycle map, and output optimized maintenance recommendations and risk warning indicators.

8. A system for assessing the preservation level of industrial heritage based on historical image comparison and recognition, characterized in that the system includes: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement, when executing the executable instructions, the method for assessing the preservation level of industrial heritage based on historical image comparison and recognition as described in any one of claims 1 to 7.