Identity recognition early warning system based on AI large model

An identity recognition and early warning system based on a large AI model generates probability distribution maps through graph computing and a self-attention mechanism. This addresses the indiscriminate alarms of existing systems, so that alarms for unregistered personnel are triggered only when their behavior warrants it.

CN122245048A · Pending · Publication Date: 2026-06-19 · FUJIAN RONGJI SOFTWARE ENG CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: FUJIAN RONGJI SOFTWARE ENG CO LTD
Filing Date: 2026-05-19
Publication Date: 2026-06-19


IPC: G08B29/18; G08B13/196; G08B31/00; G06V40/16; G06N3/04; G06N3/045; G06N3/0464; G06N5/04; G06V10/44; G06V10/62; G06V10/80; G06V10/82; G06V40/20

AI Tagging

Application Domain

Character and pattern recognition; Neural architectures


AI Technical Summary

Technical Problem

Existing access control and security early warning systems cannot effectively distinguish between legitimate and abnormal behavior by unregistered individuals during personnel identity verification, resulting in frequent triggering of indiscriminate high-level alarms and generating a large number of invalid alarms.

Method used

An identity recognition and early warning system based on a large AI model is adopted. It combines a conventional identity recognition calculation module, a large model behavior graph inference calculation engine based on graph computing and a self-attention mechanism, and a hierarchical early warning control calculation unit. The system generates a probability distribution map containing location transfer probabilities and abnormal behavior probabilities, and triggers local alarms only for behaviors that exceed the preset safety boundary.

Benefits of technology

It enables graph reasoning of the behavior of unregistered personnel in the absence of identity verification, filters out unregistered personnel with normal access logic, and bases alarm triggering on the determination of abnormal physical behavior, thereby reducing the triggering of invalid alarms.



Abstract

This invention relates to the field of computer system technology based on specific computational models, specifically to an identity recognition and early warning system based on a large AI model. The system includes a conventional identity recognition computation module, a large model behavior graph inference computation engine based on graph computation and self-attention mechanisms, and a hierarchical early warning and control computation unit. When the conventional identity recognition computation module matches biometric features and outputs an unregistered identity tag, it activates the inference engine. The inference engine loads a physical space behavior constraint graph, inputs continuous video frames of unregistered personnel into the large model, and extracts appearance attributes and motion trajectory features based on the self-attention mechanism to perform path optimization probability inference, generating a probability distribution map containing location transfer probabilities and abnormal behavior probabilities. This invention shifts the alarm triggering basis from identity comparison failure to physical space behavior link anomaly assessment, filtering out unregistered personnel with normal passage logic and reducing invalid alarms.


Description

Technical Field

[0001] This invention relates to the field of computer system technology based on specific computational models, specifically to an identity recognition and early warning system based on a large AI model.

Background Technology

[0002] Existing access control and security early warning systems typically deploy facial feature comparison or gait feature comparison modules when verifying personnel identity. After the system collects biometric data of on-site personnel, it matches it against a pre-stored database of registered personnel features. During the matching process, if the highest classification confidence score obtained by the system is lower than a preset judgment threshold, or if the corresponding baseline feature does not exist in the feature database, conventional technical solutions will directly determine that the current person is unregistered. For such unregistered identity identification results, existing early warning triggering logic generally adopts a non-discriminatory processing mechanism; that is, once an unregistered identity label is output, the system immediately bypasses any intermediate judgment steps and directly issues the highest-level trigger command to all alarm terminals in that physical area, driving the global audible and visual alarm devices into working mode.

[0003] In open physical environments with high personnel mobility and complex backgrounds, the aforementioned indiscriminate processing mechanism frequently triggers the highest-level alarms for ordinary visitors or construction workers without valid identification credentials. Because the existing system completely lacks a secondary verification step to check the actual behavior of unregistered individuals in the physical space when identification fails, the alarm signal is triggered only based on a single failed identity comparison. This results in a loss of correlation between the alarms output by the early warning system and actual security threats, generating a large number of invalid alarms unrelated to actual abnormal behavior.

Summary of the Invention

[0004] The purpose of this invention is to provide an identity recognition and early warning system based on a large AI model, which can effectively solve the problems mentioned in the background art.

[0005] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0006] The AI-based large-scale model identity recognition and early warning computing system includes a conventional identity recognition computing module, a large-scale model behavior graph inference computing engine based on graph computing and self-attention mechanism, and a hierarchical early warning and control computing unit.

[0007] The conventional identity recognition calculation module performs feature matching calculations on the collected biometric features and activates the large model behavior graph inference calculation engine when an unregistered identity label is output.

[0008] The large model behavior graph inference calculation engine loads the pre-built physical space behavior constraint graph, inputs continuous video frames of unregistered personnel into the AI large model, and the AI large model performs extraction calculation of personnel appearance attribute features and motion trajectory features based on the self-attention mechanism. It performs path optimization calculation and probability inference calculation in the physical space behavior constraint graph to generate a probability distribution map containing the probability of personnel location transfer and the probability of abnormal behavior.

[0009] The hierarchical early warning control calculation unit extracts probability nodes that exceed the preset safety boundary in the probability distribution map, and sends control commands to the front-end alarm terminals in the corresponding abnormal behavior areas to activate the local alarm calculation system according to the preset alarm partition topology map.

[0010] Preferably, the conventional identity recognition calculation module includes a face feature extraction calculation submodule based on a convolutional neural network and a gait feature extraction calculation submodule based on skeletal key points. The face feature extraction calculation submodule and the gait feature extraction calculation submodule respectively perform feature extraction calculation on video frames within the same time window to obtain face feature vectors and gait feature vectors.

[0011] The conventional identity recognition calculation module performs cascade concatenation calculation on the face feature vector and the gait feature vector, and inputs the concatenated joint feature vector into a pre-built neural network-based identity classifier. When the highest classification confidence output by the identity classifier is lower than a preset threshold and the number of consecutive frames reaches a preset frame threshold, the unregistered identity label is output and an activation signal is generated and transmitted to the large model behavior graph inference calculation engine.
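
The consecutive-frame rule in [0011] can be sketched as a small counter. This is a minimal illustration, not the patented implementation: the class name `UnregisteredDetector` and the values 0.6 and 3 are hypothetical, since the source gives no concrete thresholds.

```python
from dataclasses import dataclass


@dataclass
class UnregisteredDetector:
    """Signals activation once confidence stays low for enough consecutive frames."""

    confidence_threshold: float  # hypothetical value, not given in the source
    frame_threshold: int         # hypothetical value, not given in the source
    low_count: int = 0

    def update(self, top_confidence: float) -> bool:
        """Feed one frame's highest classification confidence.

        Returns True when the unregistered label should be output and the
        inference engine activated.
        """
        if top_confidence < self.confidence_threshold:
            self.low_count += 1
        else:
            self.low_count = 0  # a confident frame ends the low-confidence run
        return self.low_count >= self.frame_threshold


detector = UnregisteredDetector(confidence_threshold=0.6, frame_threshold=3)
signals = [detector.update(c) for c in [0.4, 0.5, 0.9, 0.3, 0.2, 0.1]]
```

Requiring a run of low-confidence frames, rather than a single one, is what keeps a momentarily occluded face from being labeled unregistered.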

[0012] Preferably, the physical space behavior constraint map is constructed by graph convolutional neural network, and the construction method includes: obtaining the coordinates of the monitoring equipment in the target area and the physical channel connectivity relationship, and constructing an initial spatial topology map containing multiple nodes and edges;

[0013] Extract the regional functional attribute labels and historical personnel passage direction weights of the corresponding regions of each node in the initial spatial topology map. Perform embedding mapping calculation on the regional functional attribute labels to calculate the node feature matrix, and perform normalization calculation on the historical personnel passage direction weights to map them to the edge feature matrix.

[0014] The node feature matrix and the edge feature matrix are input into a pre-trained graph convolutional neural network to perform multi-layer feature aggregation calculation, and the output is the physical space behavior constraint map containing spatial location topological constraints and regional functional restriction constraints.
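
The construction steps in [0012] to [0014] can be sketched with NumPy. The source does not give the layer equations of its graph convolutional network, so this sketch swaps in the widely used symmetric-normalized graph convolution as an assumption. The passage counts, the one-hot node features, and the all-ones weight matrix are hypothetical placeholders for trained or measured values.

```python
import numpy as np


def normalize_edge_weights(weights: np.ndarray) -> np.ndarray:
    """Row-normalize passage-direction weights so each node's outgoing weights sum to 1."""
    row_sums = weights.sum(axis=1, keepdims=True)
    return np.divide(weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0)


def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One feature-aggregation step: normalized neighborhood averaging, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight, 0.0)


# Hypothetical historical passage counts between three nodes (row = from, column = to).
counts = np.array([[0, 30, 10], [20, 0, 0], [0, 0, 0]], dtype=float)
edge_features = normalize_edge_weights(counts)

# Symmetrize so the aggregation treats each physical channel as bidirectional.
adj = edge_features + edge_features.T
node_features = np.eye(3)          # stand-in for embedded functional attribute labels
weight = np.ones((3, 2))           # stand-in for pre-trained layer weights
hidden = gcn_layer(adj, node_features, weight)
```

Stacking several such layers would give the multi-layer aggregation the paragraph describes.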

[0015] Preferably, when the AI big model performs the extraction and calculation of personnel appearance attribute features and motion trajectory features, it performs segmentation calculation on the continuous video frames according to a preset time step to obtain multiple video segments, performs cropping and pixel alignment calculation on the target personnel image in each video segment, extracts the image block sequence containing clothing color and body shape contour as the personnel appearance attribute features, and simultaneously extracts the temporal change sequence of the target personnel's foot coordinates in the continuous video frames as the motion trajectory features;

[0016] The AI big model performs cross-modal feature fusion calculation on the image patch sequence and the temporal change sequence through a multi-head cross-attention neural network layer, and maps the fused feature vector to the node space of the physical space behavior constraint map to perform path optimization calculation.
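
The segmentation and trajectory extraction in [0015] can be illustrated as follows. The sketch assumes each frame has already been reduced to a person bounding box `(x_min, y_min, x_max, y_max)` and approximates the foot coordinate by the bottom-center of that box; both are assumptions, since the source does not say how foot coordinates are located.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical format


def split_segments(frames: List[Box], step: int) -> List[List[Box]]:
    """Cut a frame sequence into consecutive segments of at most `step` frames."""
    return [frames[i:i + step] for i in range(0, len(frames), step)]


def foot_point(box: Box) -> Tuple[float, float]:
    """Approximate the foot position as the bottom-center of the bounding box."""
    x_min, _, x_max, y_max = box
    return ((x_min + x_max) / 2, y_max)


boxes = [(10, 20, 30, 80), (12, 20, 32, 82), (14, 21, 34, 84), (16, 22, 36, 86), (18, 22, 38, 88)]
segments = split_segments(boxes, step=2)
trajectory = [foot_point(b) for b in boxes]  # temporal sequence of foot coordinates
```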

[0017] Preferably, the calculation process for generating a probability distribution map containing the probability of personnel location transfer and the probability of abnormal behavior includes: in the physical space behavior constraint map, performing a deviation calculation between the actual path represented by the motion trajectory feature and the legal path in the physical space behavior constraint map, and converting the deviation into the probability of personnel location transfer through a nonlinear mapping calculation;

[0018] At the functional constraint nodes of the physical space behavior constraint map, the number of backtracking events and the dwell time of the actual path within a preset time range are counted. These two values are input into a preset piecewise nonlinear behavior judgment function to obtain the abnormal behavior probability.

[0019] The probability of personnel location transfer and the probability of abnormal behavior are assigned to the corresponding graph nodes to generate the probability distribution map.
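
One way to read the deviation-to-probability step in [0017] is sketched below. The source names neither the deviation measure nor the nonlinear mapping, so both are assumptions here: the deviation is taken as the mean distance from each observed point to the nearest legal-path point, and the mapping is a saturating exponential with a hypothetical `scale` of 20 coordinate units. The sketch also assumes that a larger deviation should yield a larger probability.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_deviation(actual: List[Point], legal: List[Point]) -> float:
    """Mean distance from each observed point to its nearest legal-path point."""
    return sum(min(math.dist(p, q) for q in legal) for p in actual) / len(actual)


def deviation_to_probability(deviation: float, scale: float = 20.0) -> float:
    """Saturating map to [0, 1): 0 for no deviation, approaching 1 for large deviation."""
    return 1.0 - math.exp(-deviation / scale)


legal_path = [(120, 85), (120, 150), (120, 220)]
on_path = [(120, 85), (120, 150)]    # stays on the legal path
off_path = [(120, 85), (50, 150)]    # second point is 70 units away from the path

p_on = deviation_to_probability(path_deviation(on_path, legal_path))
p_off = deviation_to_probability(path_deviation(off_path, legal_path))
```

The resulting values would then be written onto the corresponding graph nodes, as [0019] describes.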

[0020] Preferably, the preset security boundary is a multi-dimensional probability threshold vector configured for each node in the probability distribution map, and the multi-dimensional probability threshold vector includes a location transfer probability threshold component and an abnormal behavior probability threshold component.

[0021] The alarm partition topology map divides the target area into multiple independent physical warning grids, and each physical warning grid is configured with at least one front-end alarm terminal.

[0022] The hierarchical early warning control calculation unit sequentially performs a comparison calculation between the probability value of each node in the probability distribution map and the multidimensional probability threshold vector of the corresponding node. When any component exceeds the threshold, the corresponding node is marked as the probability node that exceeds the preset safety boundary, and the query calculation of the alarm partition topology map is performed to determine the physical early warning grid to which the probability node belongs.
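
The comparison in [0020] to [0022] amounts to a per-node, per-component threshold check followed by a grid lookup. A minimal sketch, assuming dictionary-based maps; the node probabilities, threshold values, and grid IDs `G1` and `G2` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class NodeProb(NamedTuple):
    transfer: float   # location transfer probability (or its threshold)
    abnormal: float   # abnormal behavior probability (or its threshold)


def flag_nodes(
    prob_map: Dict[str, NodeProb],
    thresholds: Dict[str, NodeProb],
    grid_of: Dict[str, str],
) -> Dict[str, List[str]]:
    """Group nodes whose probabilities exceed any threshold component by warning grid."""
    flagged: Dict[str, List[str]] = {}
    for node, prob in prob_map.items():
        limit = thresholds[node]
        if prob.transfer > limit.transfer or prob.abnormal > limit.abnormal:
            flagged.setdefault(grid_of[node], []).append(node)
    return flagged


prob_map = {"N002": NodeProb(0.30, 0.10), "N005": NodeProb(0.35, 0.55), "N008": NodeProb(0.10, 0.05)}
thresholds = {"N002": NodeProb(0.8, 0.7), "N005": NodeProb(0.5, 0.4), "N008": NodeProb(0.3, 0.2)}
grid_of = {"N002": "G1", "N005": "G2", "N008": "G2"}
flagged = flag_nodes(prob_map, thresholds, grid_of)
```

Only the grids that appear as keys in `flagged` would receive a control command, which is what keeps the alarm local.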

[0023] Preferably, the identity classifier is a metric learning-based Siamese neural network computing architecture, which includes a weight-shared feature encoding neural network branch and a Mahalanobis distance calculation layer.

[0024] The feature encoding neural network branch performs dimensionality reduction mapping calculation on the input joint feature vector and outputs a target feature embedding vector of fixed dimension.

[0025] The Mahalanobis distance calculation layer performs Mahalanobis distance calculation between the target feature embedding vector and each benchmark feature embedding vector in the pre-stored registered personnel feature database. The calculated Mahalanobis distances are arranged in ascending order, and the smallest Mahalanobis distance at the top is selected as the highest classification confidence. The identity category is output according to the corresponding position of the smallest Mahalanobis distance in the registered personnel feature database.
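
The nearest-baseline lookup in [0025] can be sketched with NumPy. The inverse covariance matrix and the embeddings below are hypothetical, and the sketch returns the raw smallest distance; how the source converts that distance into a confidence score is not stated in this section.

```python
import numpy as np


def nearest_registered(target: np.ndarray, baselines: np.ndarray, inv_cov: np.ndarray):
    """Return (index, distance) of the baseline with the smallest Mahalanobis distance."""
    diffs = baselines - target                                  # (N, D)
    squared = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)   # d^T S^-1 d per row
    dists = np.sqrt(squared)
    order = np.argsort(dists)                                   # ascending order
    best = int(order[0])
    return best, float(dists[best])


baselines = np.array([[0.0, 0.0], [3.0, 4.0]])   # hypothetical registered embeddings
inv_cov = np.diag([1.0, 0.25])                   # hypothetical inverse covariance
target = np.array([3.0, 3.0])
index, distance = nearest_registered(target, baselines, inv_cov)
```

The returned index plays the role of the "corresponding position" in the registered personnel feature database from which the identity category is read.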

[0026] Preferably, the multi-head cross-attention neural network layer performs cross-modal feature fusion calculation on the image patch sequence and the temporal variation sequence in the following way: the temporal variation sequence is converted into a temporal position encoding matrix by performing encoding calculation, and the image patch sequence is converted into an image spatial position encoding matrix by performing encoding calculation.

[0027] The temporal location encoding matrix is used as the query matrix, and the image spatial location encoding matrix is used as the key matrix and value matrix. The dot product attention score calculation of the query matrix and the key matrix is performed in parallel in multiple attention heads of the multi-head cross-attention neural network layer.

[0028] A preset mask matrix is superimposed after the dot product attention score, and feature filtering calculation is performed to filter out the time step features corresponding to the target person being in an invisible state in the time series change sequence. The filtered features are then input into a fully connected neural network layer to perform dimension mapping calculation and output the fused feature vector.
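
The fusion in [0026] to [0028] can be sketched as scaled dot-product cross-attention with a visibility mask. Several simplifications are assumptions of this sketch: learned query, key, and value projections and the final fully connected layer are omitted, the encodings are random stand-ins, and the mask is applied by zeroing the output rows of invisible time steps, which is one possible reading of "feature filtering".

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def masked_cross_attention(query, key_value, visible, num_heads):
    """Temporal queries attend over image patches; invisible time steps are zeroed.

    query: (T, D) temporal position encodings
    key_value: (P, D) image spatial position encodings, used as keys and values
    visible: (T,) array of 1.0 (target visible) or 0.0 (target not visible)
    """
    t, d = query.shape
    head_dim = d // num_heads
    q = query.reshape(t, num_heads, head_dim).transpose(1, 0, 2)        # (H, T, hd)
    kv = key_value.reshape(-1, num_heads, head_dim).transpose(1, 0, 2)  # (H, P, hd)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(head_dim)              # (H, T, P)
    attended = softmax(scores) @ kv                                     # (H, T, hd)
    merged = attended.transpose(1, 0, 2).reshape(t, d)                  # concatenate heads
    return merged * visible[:, None]                                    # drop invisible steps


rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))       # 4 time steps
patches = rng.normal(size=(5, 8))     # 5 image patches
visible = np.array([1.0, 1.0, 0.0, 1.0])
fused = masked_cross_attention(query, patches, visible, num_heads=2)
```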

[0029] Preferably, the behavior determination function is a piecewise nonlinear mapping calculation function, which defines three different numerical calculation intervals;

[0030] When the number of backtracking trips is zero and the dwell time is less than a first preset time threshold, the piecewise nonlinear mapping calculation function performs mapping calculation on the input parameters to a first calculation interval and outputs the probability of the abnormal behavior within a first numerical range.

[0031] When the number of backtracking trips exceeds a preset threshold and the dwell time is between the first preset time threshold and the second preset time threshold, the piecewise nonlinear mapping calculation function performs mapping calculation on the input parameters to a second calculation interval and outputs the probability of the abnormal behavior within a second numerical range, wherein the minimum value of the second numerical range is greater than the maximum value of the first numerical range.
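
The two intervals described in [0030] and [0031] can be sketched as below. All numeric values (`T1`, `T2`, `BACKTRACK_LIMIT`, and the two output ranges) are hypothetical, as are the particular curves used inside each interval. The third interval mentioned in [0029] is not described in this section, so the sketch returns `None` for inputs outside the two described cases rather than guessing.

```python
import math
from typing import Optional

T1, T2 = 30.0, 120.0        # hypothetical first and second dwell-time thresholds (seconds)
BACKTRACK_LIMIT = 2         # hypothetical backtracking-count threshold
RANGE_1 = (0.0, 0.2)        # hypothetical first numerical range
RANGE_2 = (0.4, 0.7)        # hypothetical second range; its minimum exceeds RANGE_1's maximum


def abnormal_probability(backtracks: int, dwell: float) -> Optional[float]:
    """Map backtracking count and dwell time to an abnormal behavior probability."""
    if backtracks == 0 and dwell < T1:
        low, high = RANGE_1
        return low + (high - low) * (dwell / T1) ** 2          # slow quadratic rise
    if backtracks > BACKTRACK_LIMIT and T1 <= dwell <= T2:
        low, high = RANGE_2
        fraction = (dwell - T1) / (T2 - T1)
        return low + (high - low) * math.sqrt(fraction)        # fast early rise
    return None  # interval not described in this section


p_first = abnormal_probability(backtracks=0, dwell=15.0)
p_second = abnormal_probability(backtracks=3, dwell=75.0)
p_other = abnormal_probability(backtracks=1, dwell=200.0)
```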

[0032] Preferably, the multidimensional probability threshold vector is calculated based on the regional functional attribute label of the physical early warning grid. For a physical early warning grid with the functional attribute label being a restricted area, a first location transfer probability threshold component and a first abnormal behavior probability threshold component are configured. For a physical early warning grid with the functional attribute label being a public area, a second location transfer probability threshold component and a second abnormal behavior probability threshold component are configured. The first location transfer probability threshold component and the first abnormal behavior probability threshold component are respectively less than the second location transfer probability threshold component and the second abnormal behavior probability threshold component.

[0033] The control command sent from the hierarchical early warning control calculation unit to the front-end alarm terminal includes the specific value of the component that exceeds the threshold for the corresponding probability node and the alarm level code. The alarm level code is generated by performing a mapping calculation based on the degree to which the probability value exceeds the threshold. The front-end alarm terminal drives alarm hardware circuits of different power according to the alarm level code.
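
The label-dependent thresholds of [0032] and the alarm level code of [0033] can be sketched together. The threshold values, the three-level code, and the dictionary layout of the command are hypothetical; the source only requires that restricted-area thresholds be lower than public-area ones and that the level reflect how far the probability exceeds the threshold.

```python
from typing import Dict, Optional, Tuple

# (transfer threshold, abnormal threshold); values are hypothetical and must be below 1.0.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "restricted": (0.5, 0.4),
    "public": (0.8, 0.7),
}


def alarm_level(value: float, threshold: float) -> int:
    """Return 0 for no alarm, else 1 to 3 by how much of the remaining headroom is used."""
    if value <= threshold:
        return 0
    excess = (value - threshold) / (1.0 - threshold)
    if excess < 1 / 3:
        return 1
    if excess < 2 / 3:
        return 2
    return 3


def build_command(label: str, transfer: float, abnormal: float) -> Optional[dict]:
    """Build a control command carrying the exceeded values and the alarm level code."""
    t_limit, a_limit = THRESHOLDS[label]
    exceeded = {}
    if transfer > t_limit:
        exceeded["transfer"] = transfer
    if abnormal > a_limit:
        exceeded["abnormal"] = abnormal
    if not exceeded:
        return None
    level = max(alarm_level(transfer, t_limit), alarm_level(abnormal, a_limit))
    return {"exceeded": exceeded, "level": level}


cmd_restricted = build_command("restricted", 0.35, 0.55)
cmd_public = build_command("public", 0.35, 0.55)
cmd_severe = build_command("restricted", 0.95, 0.10)
```

The same behavior produces a command in a restricted grid and none in a public grid, which is the differentiated configuration the paragraph describes.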

[0034] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0035] 1. This invention activates a large-scale behavioral graph inference engine based on graph computation and self-attention mechanism when the conventional identity recognition calculation module outputs an unregistered identity tag, transforming the alarm triggering basis from a single failed identity comparison to a physical space behavioral link evaluation. The large-scale behavioral graph inference engine, based on graph computation and self-attention mechanism, loads a physical space behavioral constraint graph containing spatial location topological constraints and regional functional limitations. It performs feature extraction and path optimization probability inference on continuous video frames of unregistered personnel using self-attention mechanism, generating a probability distribution map containing location transfer probabilities and abnormal behavior probabilities. The hierarchical early warning control calculation unit only extracts probability nodes exceeding preset safety boundaries from this probability distribution map and sends control commands to the front-end alarm terminals within the corresponding physical early warning grid according to the alarm partition topology map. This mechanism introduces graph inference calculation based on the matching degree of personnel movement trajectory and regional functions under the condition of missing identity verification, filtering out unregistered personnel with normal passage logic, and ensuring that the generation of local alarm commands is based on the determination of abnormal physical behavior.

[0036] 2. When determining unregistered identity tags, facial feature vectors and gait feature vectors are concatenated and input into a Siamese neural network architecture for metric learning, which calculates the Mahalanobis distance. Combined with a frame threshold limit within the duration window, false triggers caused by missing instantaneous features are reduced. When constructing a physical space behavior constraint map, a graph convolutional neural network is used to perform feature aggregation calculations on regional functional attributes and historical travel direction weights, so that map nodes have actual constraint representations of physical regions. In the feature fusion stage, a multi-head cross-attention neural network layer combined with a mask matrix is used to filter out the time step features of the target personnel in an invisible state, reducing the interference of the occlusion environment on trajectory extraction. When calculating the probability of abnormal behavior, a piecewise nonlinear mapping function is used to perform interval mapping based on the number of backtracking events and the dwell time. Combined with a multi-dimensional probability threshold vector, differentiated configuration is performed according to the functional attribute tags of the physical warning grid, so that the alarm triggering conditions of different security level areas match the actual security needs. The level code carried by the control command drives alarm hardware circuits of different power, realizing the differentiated physical allocation of warning resources.

Attached Figure Description

[0037] Figure 1 is a flowchart illustrating the overall system workflow of the present invention.

[0038] Figure 2 is a flowchart illustrating the workflow of the conventional identity recognition calculation module of the present invention.

[0039] Figure 3 is a flowchart illustrating the construction process of the physical space behavior constraint map of the present invention.

[0040] Figure 4 is a flowchart of the large model feature extraction and fusion process of the present invention.

[0041] Figure 5 is a flowchart for generating the probability distribution map of the present invention.

[0042] Figure 6 is a flowchart of the hierarchical early warning control process of the present invention.

Detailed Implementation

[0043] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0044] Referring to Figure 1, this embodiment provides an identity recognition and early warning system based on a large AI model, deployed in a security management platform within a target monitoring area. The system includes a conventional identity recognition calculation module, a large-scale model behavior graph inference engine based on graph computing and self-attention mechanisms, and a hierarchical early warning and control calculation unit. The conventional identity recognition calculation module receives real-time video stream data collected by monitoring devices within the target area, extracts continuous video frames containing human biometric features from the video stream data, performs matching calculations on the extracted biometric features, and generates an identity recognition result. When the identity recognition result indicates a registered identity, the system terminates the current early warning judgment process and maintains the normal monitoring state. When the identity recognition result indicates an unregistered identity tag, the conventional identity recognition calculation module generates an activation signal and transmits it, together with the corresponding continuous video frames of the unregistered person, to the large-scale model behavior graph inference engine, thereby activating the engine.

[0045] Upon receiving the activation signal, the large-scale model behavior graph inference engine, based on graph computation and self-attention mechanisms, initializes its runtime environment and loads a pre-built physical space behavior constraint graph. This graph includes spatial topological constraints and regional functional limitations of the target monitoring area. The engine then inputs continuous video frames of unregistered personnel into the AI model. The AI model performs feature extraction on these frames using self-attention, obtaining the appearance attributes and motion trajectory features of the unregistered personnel. The model then fuses these features, mapping the fused feature vectors to the node space of the pre-loaded physical space behavior constraint graph. Under the topological constraints of the graph, it performs path optimization and probabilistic inference. During path optimization, the AI model matches the corresponding nodes and edges in the physical space behavior constraint graph with the fused feature vectors to reconstruct the actual movement path of the unregistered personnel in the physical space, while simultaneously calculating the degree of matching between this actual movement path and the legal paths in the graph. During the probabilistic inference process, the AI large-scale model, based on the results of path optimization calculations and combined with the regional functional constraints of each node in the physical space behavior constraint graph, calculates the probability of unregistered personnel moving to their corresponding nodes and the probability of abnormal behavior. These calculated probabilities are then assigned to the corresponding nodes in the physical space behavior constraint graph, generating a probability distribution map containing the probabilities of personnel moving to their corresponding locations and exhibiting abnormal behavior. 
The large-scale model's behavior graph inference engine, based on graph computation and self-attention mechanisms, transmits the generated probability distribution map to the hierarchical early warning and control calculation unit.

[0046] After receiving the probability distribution map, the hierarchical early warning control calculation unit loads a preset safety boundary and alarm partition topology map. The preset safety boundary represents the probability threshold constraints for each node in the probability distribution map. The alarm partition topology map divides the target monitoring area into multiple independent physical early warning grids, each corresponding to at least one front-end alarm terminal. The hierarchical early warning control calculation unit traverses all nodes in the probability distribution map, sequentially extracting the location transfer probability and abnormal behavior probability values for each node. It compares the extracted probability values with the preset safety boundary of the corresponding node, filtering out probability nodes whose probability values exceed the preset safety boundary. For these filtered probability nodes, the hierarchical early warning control calculation unit queries the alarm partition topology map to determine the physical early warning grid to which the probability node belongs, and obtains the communication address and control interface information of the front-end alarm terminal corresponding to that physical early warning grid. The hierarchical early warning control calculation unit sends control commands to the front-end alarm terminals in the corresponding abnormal behavior area according to the obtained communication addresses. Upon receiving the control commands, the front-end alarm terminals initiate local alarm operations, completing this identification and early warning process.

[0047] In this embodiment, the basic attributes of the spatial topology nodes of the target monitoring area are shown in Table 1.

[0048] Table 1 Basic Attributes of Spatial Topology Nodes in the Target Monitoring Area

[0049]
Node ID | Corresponding physical region | Regional functional attribute label | Planar coordinates of monitoring equipment | Connected node IDs
N001 | Main entrance of the park | Public area | (120,85) | N002, N003
N002 | Main road of the park | Public area | (120,150) | N001, N004, N005
N003 | Visitor reception area | Public area | (200,85) | N001, N006
N004 | Office area entrance | Restricted area | (120,220) | N002, N007
N005 | Warehouse area entrance | Restricted area | (50,150) | N002, N008
N006 | Visitor parking | Public area | (200,150) | N003
N007 | Office area | Restricted area | (120,290) | N004
N008 | Inside the warehouse area | Highly restricted area | (50,220) | N005

[0050] Table 1 presents the initial spatial topology map of the target monitoring area, providing basic data support for the construction of the physical space behavior constraint map. Node IDs serve as unique identifiers for each physical region in the topology map; region functional attribute labels characterize the security control level of the corresponding physical region; monitoring device planar coordinates determine the location reference of nodes in physical space; and connected node IDs define the physical channel connectivity between nodes, providing a topological constraint basis for subsequent path optimization and probabilistic reasoning.
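
To make the role of Table 1 concrete, the sketch below transcribes its rows into an adjacency structure, checks that every listed connection is bidirectional, and finds a shortest traversable route with breadth-first search. The use of breadth-first search and the helper names are assumptions for illustration; the source does not state which path algorithm the system uses.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# node ID -> (functional attribute label, device coordinates, connected node IDs), from Table 1
NODES: Dict[str, Tuple[str, Tuple[int, int], List[str]]] = {
    "N001": ("public", (120, 85), ["N002", "N003"]),
    "N002": ("public", (120, 150), ["N001", "N004", "N005"]),
    "N003": ("public", (200, 85), ["N001", "N006"]),
    "N004": ("restricted", (120, 220), ["N002", "N007"]),
    "N005": ("restricted", (50, 150), ["N002", "N008"]),
    "N006": ("public", (200, 150), ["N003"]),
    "N007": ("restricted", (120, 290), ["N004"]),
    "N008": ("highly restricted", (50, 220), ["N005"]),
}


def is_symmetric() -> bool:
    """Check that every connection in the table is listed in both directions."""
    return all(a in NODES[b][2] for a, (_, _, nbrs) in NODES.items() for b in nbrs)


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over physical channel connectivity."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in NODES[node][2]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


route = shortest_path("N001", "N008")
```

Any observed movement between two nodes that are not connected in this structure would be a candidate for the path deviation the later probability steps measure.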

[0051] In this embodiment, when the conventional identity recognition calculation module outputs an unregistered identity tag, it activates the large model behavior graph inference calculation engine based on graph computing and self-attention mechanism. The alarm triggering basis is changed from a single identity comparison failure to physical space behavior link evaluation. Through the topological constraints and functional constraints of the physical space behavior constraint graph, probabilistic inference is performed on the movement trajectory and behavior pattern of unregistered personnel. Local alarms are triggered only for abnormal behaviors that exceed the preset security boundary, filtering out unregistered personnel with normal passage logic. This ensures that the generation of local alarm instructions is based on the judgment of abnormal physical behavior.

[0052] In a preferred embodiment, referring to Figure 2, the conventional identity recognition calculation module includes a face feature extraction submodule based on convolutional neural networks and a gait feature extraction submodule based on skeletal keypoints. Both submodules receive real-time video stream data from the same monitoring device and process the data synchronously within the same preset time window, extracting corresponding biometric features from consecutive video frames within that window. The length of the preset time window can be adjusted according to the passage speed of people in the target monitoring area, ensuring that the video frames within the time window contain complete sequences of face images and gait movements.

[0053] The face feature extraction and calculation submodule based on convolutional neural networks performs face detection processing on each frame of video images within a time window, locating the coordinates of the face region in the video image. Based on the located coordinates, the face region is cropped and normalized to obtain a fixed-size face image block. This submodule then inputs the processed face image block into a pre-trained convolutional neural network to extract the face feature vector corresponding to each frame of video images. The face feature vectors corresponding to all video frames within the time window are then subjected to mean pooling, outputting a fixed-dimensional face feature vector. The gait feature extraction and calculation submodule based on skeletal keypoints performs human body detection and keypoint localization processing on consecutive video frames within the same time window, extracting the temporal change sequence of human skeletal keypoints. Gait motion features are extracted based on the temporal change sequence of skeletal keypoints, and static body features are extracted based on the contour information of the human detection box. The gait motion features and static body features are concatenated to output a fixed-dimensional gait feature vector.

[0054] The conventional identity recognition calculation module performs cascaded concatenation on the facial feature vector and the gait feature vector generated within the same time window to generate a joint feature vector. During cascaded concatenation, the facial feature vector and the gait feature vector are concatenated in a preset order so that the dimension of the resulting joint feature vector is fixed. The module inputs the joint feature vector into a pre-built identity classifier. The identity classifier encodes the input joint feature vector and performs distance calculations, outputting the identity classification result and the corresponding highest classification confidence score. The module monitors the highest classification confidence score output by the identity classifier in real time. When the highest classification confidence score is lower than a preset threshold, a consecutive frame counting operation is initiated to count the number of consecutive video frames whose highest classification confidence score is lower than the preset threshold. When the consecutive frame count reaches the preset frame count threshold, the module outputs an unregistered identity label and simultaneously generates an activation signal that is transmitted to the large-scale model behavior graph inference calculation engine based on graph computation and self-attention mechanisms. When the highest classification confidence score is higher than or equal to the preset threshold, the module terminates the consecutive frame counting operation, outputs the registered identity label with the corresponding confidence score, and the system remains in the conventional monitoring state.
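The consecutive-frame rule in [0054] can be sketched as a small state machine. This is only an illustrative sketch, not the implementation of the embodiment: the class name `UnregisteredDetector`, the intermediate `pending` state, and the example values 0.6 and 3 for the two preset thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class UnregisteredDetector:
    """Counts consecutive low-confidence frames (hypothetical helper)."""
    confidence_threshold: float  # preset threshold (value assumed in the demo)
    frame_threshold: int         # preset frame count threshold (assumed)
    low_count: int = 0

    def update(self, confidence: float) -> str:
        """Return 'registered', 'pending', or 'unregistered' for one frame."""
        if confidence >= self.confidence_threshold:
            # Confidence recovered: terminate the counting operation.
            self.low_count = 0
            return "registered"
        self.low_count += 1
        if self.low_count >= self.frame_threshold:
            # Here the module would also emit the activation signal.
            return "unregistered"
        return "pending"


detector = UnregisteredDetector(confidence_threshold=0.6, frame_threshold=3)
labels = [detector.update(c) for c in [0.9, 0.4, 0.7, 0.5, 0.3, 0.2]]
print(labels)
```

A single low-confidence frame (the 0.4 above) does not produce the unregistered label, because the following frame resets the counter.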

[0055] In this embodiment, the identity classifier is a Siamese neural network architecture based on metric learning. This architecture includes two weight-shared feature encoding neural network branches and a Mahalanobis distance calculation layer. The two feature encoding neural network branches have identical network structures and share weight parameters during training, ensuring that input feature vectors undergo the same encoding mapping. Each feature encoding neural network branch contains multiple sequentially connected fully connected neural network layers and non-linear activation layers. It performs dimensionality reduction on the input joint feature vector, mapping the high-dimensional joint feature vector to a low-dimensional feature embedding space and outputting a fixed-dimensional target feature embedding vector. The Mahalanobis distance calculation layer receives the target feature embedding vector output by the feature encoding neural network branch and simultaneously reads a pre-stored registered personnel feature database. This database stores the baseline feature embedding vectors of all registered personnel, as well as the personnel identity category corresponding to each baseline feature embedding vector. The Mahalanobis distance calculation layer calculates the Mahalanobis distance between the target feature embedding vector and each baseline feature embedding vector in the registered personnel feature database. The calculated Mahalanobis distances are arranged in ascending order, and the smallest Mahalanobis distance, at the top of the list, is selected as the highest classification confidence. Simultaneously, the identity category corresponding to the position of the smallest Mahalanobis distance in the registered personnel feature database is output. The Mahalanobis distance calculation is implemented using the following formula:

[0056] $D_M(\mathbf{x}, \mathbf{y}_i) = \sqrt{(\mathbf{x} - \mathbf{y}_i)^{T}\,\Sigma^{-1}\,(\mathbf{x} - \mathbf{y}_i)}$;

[0057] where $D_M$ is the Mahalanobis distance, $\mathbf{x}$ is the target feature embedding vector, $\mathbf{y}_i$ is a baseline feature embedding vector in the registered personnel feature database, $\Sigma$ is the covariance matrix of all baseline feature embedding vectors in the registered personnel feature database, the superscript $T$ is the matrix transpose operator, and the superscript $-1$ is the matrix inversion operator.
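The nearest-baseline lookup described in [0055] can be sketched in plain Python as follows. The toy two-dimensional embeddings, the identity names, the helper names, and the small `ridge` term added to the covariance diagonal (to keep it invertible) are all assumptions made for illustration.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = cov^{-1} diff
    return math.sqrt(sum(a * b for a, b in zip(diff, z)))


def nearest_identity(target, database, ridge=1e-6):
    """database maps identity -> baseline embedding (toy stand-in)."""
    cov = covariance(list(database.values()))
    for i in range(len(cov)):
        cov[i][i] += ridge  # assumed regularisation, not from the source
    distances = {name: mahalanobis(target, emb, cov)
                 for name, emb in database.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


database = {"alice": [0.0, 0.0], "bob": [2.0, 1.0], "carol": [1.0, 3.0]}
identity, distance = nearest_identity([1.9, 1.1], database)
print(identity, round(distance, 3))
```

With an identity covariance matrix the function reduces to the Euclidean distance, which is a convenient sanity check.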

[0058] The concatenation process of joint feature vectors is achieved through the following formula:

[0059] $F_{joint} = F_{face} \oplus F_{gait}$;

[0060] where $F_{joint}$ is the joint feature vector after cascaded concatenation, $F_{face}$ is the facial feature vector, $F_{gait}$ is the gait feature vector, and $\oplus$ is the vector concatenation operator.

[0061] In this embodiment, the composition and dimension definition of the joint feature vector are shown in Table 2.

[0062] Table 2. Composition and Dimension Definition of Joint Feature Vectors

[0063]

| Feature category | Feature source submodule | Vector dimension | Physical meaning |
|---|---|---|---|
| Global facial features | Face feature extraction and calculation submodule based on convolutional neural network | 256 | Global feature vector containing the relative positions of facial features and contour shapes, extracted using a convolutional neural network. |
| Local facial features | Face feature extraction and calculation submodule based on convolutional neural network | 128 | Local detail feature vector of the eye, nose, and mouth regions, with feature enhancement for occluded scenes. |
| Gait temporal features | Gait feature extraction and calculation submodule based on skeletal keypoints | 256 | Temporal variation of human walking posture across consecutive video frames, representing the person's gait movement pattern. |
| Body contour features | Gait feature extraction and calculation submodule based on skeletal keypoints | 128 | Static body characteristics, including height, shoulder width, and torso proportions, complementing the gait temporal features. |
| Joint feature vector | Feature cascade module | 768 | Fixed-dimensional feature vector obtained by concatenating the above four feature types in sequence; used as the input to the identity classifier. |

[0064] Table 2 defines the composition, dimensions, and physical meaning of facial and gait features in a conventional identity recognition calculation module, and clarifies the rules for generating the joint feature vector. Facial and gait features are extracted from video frames within the same time window and concatenated to form a fixed-dimensional joint feature vector. This ensures that the input data for the identity classifier simultaneously includes both facial and gait biometrics, improving the robustness of identity classification.
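The fixed-order concatenation in Table 2 can be illustrated with a short sketch. The dimensions come from Table 2; the dictionary keys and the function name `build_joint_vector` are hypothetical, and zero vectors stand in for real features.

```python
# Preset concatenation order and dimensions taken from Table 2.
FEATURE_DIMS = {
    "global_face": 256,
    "local_face": 128,
    "gait_temporal": 256,
    "body_contour": 128,
}


def build_joint_vector(features):
    """Concatenate the four feature groups in the preset order."""
    joint = []
    for name, dim in FEATURE_DIMS.items():
        part = features[name]
        if len(part) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(part)}")
        joint.extend(part)
    return joint


features = {name: [0.0] * dim for name, dim in FEATURE_DIMS.items()}
joint = build_joint_vector(features)
print(len(joint))  # 256 + 128 + 256 + 128
```

Checking each group's length before concatenation is one simple way to guarantee that the classifier input always has the same dimension.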

[0065] In this embodiment, identity classification is performed by concatenating facial feature vectors and gait feature vectors and combining them with a metric learning-based Siamese neural network computing architecture. At the same time, a continuous frame number threshold is introduced to reduce identity misjudgment caused by instantaneous feature loss or single-frame image occlusion, reduce the probability of false triggering of unregistered identity tags, and ensure that subsequent behavior graph inference processes are activated only for continuous identity recognition failure scenarios.

[0066] In a preferred embodiment, referring to Figure 3 and Figure 4, the physical space behavior constraint graph loaded by the large-scale model behavior graph inference engine based on graph computation and self-attention mechanisms is generated through a pre-executed graph construction process. In this process, the planar coordinates of all monitoring devices in the target area and the connectivity information of all physical channels within the target area are first obtained. Based on the coverage of the monitoring devices and the connectivity of the physical channels, the target area is divided into multiple independent spatial regions. Each spatial region corresponds to a topological node, and each physical channel connection corresponds to an edge between topological nodes, thus constructing an initial spatial topology graph containing multiple nodes and edges. In the initial spatial topology graph, an edge between two nodes is generated only when there is a direct physical connection between the two corresponding spatial regions, ensuring that the initial spatial topology graph matches the actual physical spatial structure of the target area.

[0067] After the initial spatial topology graph is constructed, the regional functional attribute label corresponding to each node in the initial spatial topology graph is extracted. These labels are defined based on the actual use and security control level of the corresponding spatial region. Simultaneously, the historical personnel passage direction weight of each edge in the initial spatial topology graph is extracted. These weights are obtained from historical monitoring data of the target area and represent the direction and frequency of personnel passage through the corresponding physical channel. The extracted regional functional attribute labels of all nodes are transformed into a fixed-dimensional node feature matrix through an embedding mapping function. Similarly, the extracted historical personnel passage direction weights of all edges are transformed into a fixed-dimensional edge feature matrix through normalization. The mapping to the node feature matrix and the edge feature matrix is implemented using the following formula:

[0068] $X = f_{emb}(L), \quad E = f_{norm}(W_{dir})$;

[0069] where $X$ is the node feature matrix, $f_{emb}$ is the one-hot encoding embedding mapping function, $L$ is the set of regional functional attribute labels, $E$ is the edge feature matrix, $f_{norm}$ is the min-max normalization function, and $W_{dir}$ is the set of historical personnel passage direction weights.
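The two mappings in [0067] can be sketched as follows. The label vocabulary, the toy embedding table, and the example passage counts are assumptions; in the embodiment the embedding would be learned rather than written by hand.

```python
def one_hot(label, vocabulary):
    """One-hot encode a regional functional attribute label."""
    return [1.0 if label == v else 0.0 for v in vocabulary]


def embed(one_hot_vec, embedding_table):
    """Multiplying a one-hot vector by the table selects one row."""
    dim = len(embedding_table[0])
    return [sum(o * row[j] for o, row in zip(one_hot_vec, embedding_table))
            for j in range(dim)]


def min_max(values):
    """Min-max normalisation to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical vocabulary and 2-dimensional embedding table.
vocabulary = ["public area", "restricted area", "highly restricted area"]
embedding_table = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]

node_labels = ["public area", "restricted area", "public area"]
node_features = [embed(one_hot(lbl, vocabulary), embedding_table)
                 for lbl in node_labels]

# Hypothetical historical passage counts for three edges.
edge_features = min_max([12, 40, 5])
print(node_features, edge_features)
```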

[0070] The generated node feature matrix and edge feature matrix are input into a pre-trained graph convolutional neural network. The graph convolutional neural network performs feature aggregation on the node and edge features through multiple graph convolutional layers. Each graph convolutional layer aggregates the features of the current node and its neighboring nodes, while also incorporating the weight constraints of the edge feature matrix, to update the node's feature representation. After feature aggregation through multiple graph convolutional layers, the graph convolutional neural network outputs updated node and edge feature matrices. Based on these updated matrix values, a physical space behavior constraint graph containing spatial location topological constraints and regional functional restriction constraints is generated. The feature aggregation of the graph convolutional neural network is implemented using the following formula:

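[0071] $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$;

[0072] where $H^{(l+1)}$ is the node feature matrix output by the $l$-th graph convolutional layer, $\sigma$ is a non-linear activation function, $\tilde{A}$ is the adjacency matrix with self-loops added, $\tilde{D}$ is the degree matrix corresponding to $\tilde{A}$, $H^{(l)}$ is the node feature matrix input to the $l$-th layer, and $W^{(l)}$ is the trainable weight matrix of the $l$-th layer.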
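A single graph convolutional layer of the kind described in [0070] can be sketched with the widely used symmetric-normalisation rule. The choice of ReLU as the non-linear activation, the three-node chain graph, and the scalar features are assumptions; the adjacency entries could equally hold normalised edge weights instead of 0/1 values.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adjacency, features, weights):
    """relu(D^-1/2 (A + I) D^-1/2 H W) for one layer."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalisation by the node degrees.
    a_hat = [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j])
              for j in range(n)] for i in range(n)]
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU (assumed)


# Chain of three regions: 0 - 1 - 2.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0], [2.0], [3.0]]
weights = [[1.0]]
updated = gcn_layer(adjacency, features, weights)
print(updated)
```

Each output row mixes a node's own feature with those of its directly connected regions, so unconnected regions do not influence each other within one layer.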

[0073] In this embodiment, when extracting personnel appearance attribute features and motion trajectory features, the AI large model first divides the received continuous video frames of unregistered personnel into multiple video segments according to a preset time step. Each video segment contains a fixed number of continuous video frames, ensuring that the time length of each video segment is consistent. For each frame of video image in each video segment, based on the pre-executed personnel target detection results, the image region of the target personnel is cropped to obtain a cropped image containing only the target personnel. The cropped image is then pixel-aligned and size-normalized to obtain a standardized image of a fixed size. The AI large model inputs the standardized image into a pre-trained visual Transformer network to extract local features in the image, including clothing color distribution and body contour shape. The local features corresponding to the continuous video frames are arranged in chronological order to generate an image patch sequence, which serves as the personnel appearance attribute features.

[0074] Simultaneously, the AI model extracts the coordinates of key foot points of the target person in consecutive video frames, obtaining the planar coordinates of the key foot points in each video image. The foot coordinates corresponding to consecutive video frames are arranged in chronological order to generate a temporal variation sequence of foot coordinates, which serves as the motion trajectory feature. The AI model then uses a multi-head cross-attention neural network layer to perform feature fusion processing on the generated image patch sequence and the temporal variation sequence. The fused feature vector is mapped to the node space of the physical space behavior constraint graph for subsequent path optimization calculations.

[0075] When the multi-head cross-attention neural network layer fuses features from image patch sequences and temporal variation sequences, it first transforms the temporal variation sequence into a temporal position encoding matrix and the image patch sequence into an image spatial position encoding matrix. The temporal position encoding matrix represents the temporal variation features of the motion trajectory, while the image spatial position encoding matrix represents the spatial distribution features of appearance attributes. The generation process of the temporal position encoding matrix is achieved through the following formula:

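[0076] $P_{time}(t, 2i) = \sin\left(\dfrac{t}{10000^{2i/d}}\right), \quad P_{time}(t, 2i+1) = \cos\left(\dfrac{t}{10000^{2i/d}}\right)$;

[0077] where $P_{time}$ is the temporal position encoding matrix, $t$ is the time step index, $i$ is the feature dimension index, and $d$ is the fixed dimension of the feature encoding.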
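One common way to build a position encoding indexed by time step and feature dimension is the sinusoidal scheme used in Transformer models. The sketch below assumes that scheme; the base constant 10000 and the function name are assumptions rather than values stated in the text.

```python
import math


def temporal_position_encoding(num_steps, dim, base=10000.0):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    pe = [[0.0] * dim for _ in range(num_steps)]
    for t in range(num_steps):
        for even in range(0, dim, 2):
            # 'even' plays the role of 2i, so the exponent is 2i / d.
            angle = t / (base ** (even / dim))
            pe[t][even] = math.sin(angle)
            if even + 1 < dim:
                pe[t][even + 1] = math.cos(angle)
    return pe


encoding = temporal_position_encoding(num_steps=3, dim=4)
for row in encoding:
    print([round(v, 4) for v in row])
```

Each time step receives a distinct pattern, with low dimensions varying quickly and high dimensions varying slowly across steps.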

[0078] The image spatial position encoding matrix is generated through the following formula:

[0079] ;

[0080] where the symbols of the formula denote, respectively, the image spatial position encoding matrix, the pixel coordinate indices of the image patch within the cropped image, and the width and height of the cropped image; the remaining parameters are defined as in the generation formula of the temporal position encoding matrix.

[0081] After the temporal position encoding matrix and the image spatial position encoding matrix are generated, the temporal position encoding matrix is used as the query matrix, and the image spatial position encoding matrix is used as the key and value matrices. These matrices are input into the multiple attention heads of the multi-head cross-attention neural network layer, where the dot-product attention scores are calculated in parallel, one attention head at a time. After the dot-product attention scores are calculated, a preset mask matrix is added to the attention score matrix. In the mask matrix, the positions corresponding to time steps at which the target person is invisible are set to negative infinity, and the remaining positions are set to 0. The mask matrix thereby filters out the features of time steps in the temporal variation sequence at which the target person is invisible. The filtered attention features are then input into a fully connected neural network layer for dimension mapping, outputting a fused feature vector. The dot-product attention calculation for a single attention head is implemented using the following formula:

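[0082] $\mathrm{head} = \mathrm{softmax}\left(\dfrac{QK^{T}}{\sqrt{d_k}} + M\right)V$;

[0083] where $\mathrm{head}$ is the attention feature output by a single attention head, $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, $M$ is the mask matrix, $d_k$ is the feature dimension of the key matrix, and $\mathrm{softmax}$ is the normalized exponential function.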
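The masked dot-product attention of a single head, as described in [0081], can be sketched in plain Python. The toy matrices are assumptions, and the sketch applies the additive mask per key position and assumes every row keeps at least one unmasked position (a fully masked row would make the softmax undefined).

```python
import math


def softmax(row):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(q, k, v, mask):
    """softmax(Q K^T / sqrt(d_k) + M) V for one attention head."""
    d_k = len(k[0])
    out = []
    for q_row, mask_row in zip(q, mask):
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) + m
                  for k_row, m in zip(k, mask_row)]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


NEG_INF = float("-inf")
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [[0.0, 0.0, NEG_INF]]  # third position: target person invisible
fused = masked_attention(q, k, v, mask)
print(fused)
```

Because the third position is masked with negative infinity, its value row `[5.0, 5.0]` contributes nothing to the output.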

[0084] In this embodiment, the parameter configuration and masking rules of each attention head in the multi-head cross-attention neural network layer are shown in Table 3.

[0085] Table 3. Parameter Configuration Table for Attention Heads in Multi-Head Cross-Attention Neural Network Layers

[0086]

| Attention head number | Query matrix dimension | Key matrix dimension | Value matrix dimension | Masking rule |
|---|---|---|---|---|
| 1 | 128 | 128 | 128 | Filter out time step features with missing foot coordinates of the target person in the temporal sequence and retain complete trajectory features. |
| 2 | 128 | 128 | 128 | Filter out time step features in the image patch sequence where the target person is more than 50% occluded, while retaining clear appearance features. |
| 3 | 64 | 64 | 64 | Preserve time step features where the target person is in the center area of the monitoring screen, and reduce the feature weight of the edge area. |
| 4 | 64 | 64 | 64 | Retain stable features that persist for three or more consecutive time steps, and filter out noise features that change abruptly in a single frame. |

[0087] Table 3 shows the parameter configurations and masking rules for each attention head in the multi-head cross-attention neural network layer, used to effectively fuse personnel appearance attribute features and motion trajectory features. Each attention head performs attention calculations in parallel, and invalid features and noisy data are filtered out through differentiated masking rules to ensure that the fused feature vector simultaneously contains stable appearance attribute information and continuous motion trajectory information, thereby improving the accuracy of feature representation.

[0088] In this embodiment, a graph convolutional neural network is used to perform feature aggregation calculations on the regional functional attributes and historical travel direction weights. This ensures that the nodes of the physical space behavior constraint map possess the actual constraint representation of the physical region, guaranteeing that subsequent path optimization and probabilistic inference conform to the actual physical space rules of the target region. A multi-head cross-attention neural network layer, combined with a mask matrix, is used for feature fusion to filter out time step features where the target person is in an invisible state. This reduces the interference of occlusion and the temporary disappearance of the target person on trajectory extraction and feature fusion, improving the stability and accuracy of feature representation.

[0089] In a preferred embodiment, referring to Figure 5 and Figure 6, the process by which the large-scale model behavior graph inference calculation engine based on graph computation and self-attention mechanisms generates the probability distribution map containing the personnel location transfer probability and the abnormal behavior probability includes four processing stages: path deviation calculation, location transfer probability mapping, abnormal behavior probability calculation, and probability distribution map generation.

[0090] In the path deviation calculation stage, the fused feature vector is matched against the preloaded physical space behavior constraint graph to obtain the actual movement path of the unregistered person. All legal paths in the physical space behavior constraint graph whose start and end nodes match those of the actual movement path are extracted to form a set of legal paths. The deviation between the actual movement path and each legal path in the set is then calculated. The deviation calculation is implemented through the following formula:

[0091] ;

[0092] where the symbols of the formula denote, respectively, the deviation between the actual movement path and a legal path; the total number of time steps in the trajectory sequence; the actual coordinates of the target person at a given time step; the set of legal paths in the physical space behavior constraint graph; the coordinates of a legal path from that set at the same time step; and the L2 norm operator.
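A path deviation built from the quantities listed in [0092] might look like the sketch below. The exact form of the formula is not recoverable from the text, so this is one plausible reading only: the mean point-wise L2 distance to each legal path, with the smallest value taken over the set. It also assumes that all paths are sampled at the same time steps.

```python
import math


def path_deviation(actual, legal_paths):
    """Smallest mean point-wise L2 distance to any legal path (assumed form)."""
    def mean_l2(path):
        return sum(math.dist(p, r) for p, r in zip(actual, path)) / len(actual)
    return min(mean_l2(path) for path in legal_paths)


actual_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
legal_paths = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # straight corridor
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],  # side corridor
]
deviation = path_deviation(actual_path, legal_paths)
print(round(deviation, 4))
```

Here the actual path leaves the straight corridor only at the last step, so the deviation is one unit of distance averaged over three time steps.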

[0093] In the location transfer probability mapping stage, the calculated path deviation is transformed into personnel location transfer probabilities through a preset nonlinear mapping function. The numerical range of the location transfer probability is 0 to 1; a higher value indicates a greater deviation between the actual path and the legal path, and a higher degree of anomaly in personnel location transfer. The location transfer probability mapping process is implemented through the following formula:

[0094] ;

[0095] where the symbols of the formula denote, respectively, the personnel location transfer probability, the preset proportional coefficient, the path deviation, and the natural exponential function.
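A mapping with the properties stated in [0093] (values between 0 and 1, increasing with the path deviation, built from a proportional coefficient and the natural exponential function) can be sketched as below. The specific form `1 - exp(-k * d)` and the coefficient value 0.5 are assumptions; the text does not preserve the original formula.

```python
import math


def location_transfer_probability(deviation, k=0.5):
    """Map a non-negative path deviation to [0, 1) (assumed form)."""
    if deviation < 0:
        raise ValueError("path deviation must be non-negative")
    return 1.0 - math.exp(-k * deviation)


probabilities = [location_transfer_probability(d) for d in (0.0, 1.0, 4.0, 10.0)]
print([round(p, 4) for p in probabilities])
```

A larger coefficient `k` makes the probability saturate toward 1 for smaller deviations, which is one way such a system could be tuned per site.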

[0096] In the abnormal behavior probability calculation phase, a preset time statistics window is set at the functional constraint nodes of the physical space behavior constraint map. Within the time statistics window, the number of times unregistered personnel turn back and the duration of their stay within the corresponding node area are counted. The rule for counting the number of turns is the number of times the target person's movement direction reverses by 180 degrees within the corresponding node area, and the rule for counting the duration of stay is the length of time the target person stays continuously within the corresponding node area. The statistically obtained number of turns and duration of stay are input into a preset behavior judgment function to calculate the abnormal behavior probability of the corresponding node.

[0097] The behavior determination function is a piecewise nonlinear mapping function defined with three distinct calculation intervals. Each interval corresponds to different input parameter conditions and a different output value range. When the input number of turn-backs is zero and the dwell time is less than a first preset time threshold, the behavior determination function maps the input parameters to the first calculation interval and outputs an abnormal behavior probability within the first value range. When the input number of turn-backs is greater than a preset count threshold and the dwell time is between the first and second preset time thresholds, the behavior determination function maps the input parameters to the second calculation interval and outputs an abnormal behavior probability within the second value range, where the minimum value of the second value range is greater than the maximum value of the first value range. Under all other input parameter conditions, the behavior determination function maps the input parameters to the third calculation interval and outputs an abnormal behavior probability within the third value range, where the minimum value of the third value range is greater than the maximum value of the second value range. The piecewise nonlinear mapping function is calculated through the following formula:

[0098] ;

[0099] where the symbols of the formula denote, respectively, the abnormal behavior probability; the number of turn-backs within the preset time statistics window; the dwell time within the preset time statistics window; the first preset time threshold; the second preset time threshold; the preset count threshold; and the preset linear coefficients. The maximum value of the first value range is less than the minimum value of the second value range.
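The three-interval structure of the behavior determination function in [0097] can be sketched as follows. Only the interval conditions and the ordering of the value ranges come from the text; the thresholds (30 s, 120 s, 2 turn-backs), the value ranges, and the linear interpolation inside each range are assumptions.

```python
def abnormal_behavior_probability(turn_backs, dwell_s,
                                  t1=30.0, t2=120.0, n_th=2):
    """Piecewise mapping with three ordered value ranges (assumed values)."""
    if turn_backs == 0 and dwell_s < t1:
        # First interval: output in [0.0, 0.2).
        return 0.2 * dwell_s / t1
    if turn_backs > n_th and t1 <= dwell_s <= t2:
        # Second interval: output in [0.3, 0.6].
        return 0.3 + 0.3 * (dwell_s - t1) / (t2 - t1)
    # Third interval (all other inputs): output in [0.7, 1.0].
    return 0.7 + 0.3 * min(1.0, dwell_s / (2.0 * t2))


low = abnormal_behavior_probability(turn_backs=0, dwell_s=15.0)
mid = abnormal_behavior_probability(turn_backs=3, dwell_s=75.0)
high = abnormal_behavior_probability(turn_backs=1, dwell_s=10.0)
print(low, mid, high)
```

The gaps between the assumed ranges guarantee the ordering required by the text: every third-interval output exceeds every second-interval output, which in turn exceeds every first-interval output.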

[0100] In the probability distribution map generation stage, the calculated personnel location transfer probability and abnormal behavior probability corresponding to each node are assigned to the corresponding nodes in the physical space behavior constraint map. Based on the topology of the physical space behavior constraint map, a two-dimensional probability distribution map containing personnel location transfer probability and abnormal behavior probability is generated. The position of each node in the probability distribution map corresponds one-to-one with the actual position in the physical space, and the value of each node contains two dimensions: location transfer probability and abnormal behavior probability.

[0101] In this embodiment, the preset safety boundary loaded by the hierarchical early warning control calculation unit is a multi-dimensional probability threshold vector configured for each node in the probability distribution map. This multi-dimensional probability threshold vector includes two dimensions: a location transfer probability threshold component and an abnormal behavior probability threshold component, corresponding to the maximum allowable values of the node's location transfer probability and abnormal behavior probability, respectively. The alarm partition topology map loaded by the hierarchical early warning control calculation unit divides the target monitoring area into multiple independent physical early warning grids. Each physical early warning grid corresponds to at least one node in the physical space behavior constraint map, and each physical early warning grid corresponds to at least one front-end alarm terminal. The correspondence between the front-end alarm terminals and the physical early warning grids is pre-stored in the alarm partition topology map.

[0102] The hierarchical early warning control calculation unit sequentially compares the probability value of each node in the probability distribution map with the corresponding node's multidimensional probability threshold vector. When the node's position transition probability exceeds the corresponding position transition probability threshold component, or the abnormal behavior probability exceeds the corresponding abnormal behavior probability threshold component, the corresponding node is marked as a probability node exceeding the preset safety boundary. The node alarm identification determination process is implemented through the following formula:

[0103] $F_i = \begin{cases} 1, & P_i^{trans} > \theta_i^{trans} \ \text{or} \ P_i^{abn} > \theta_i^{abn} \\ 0, & \text{otherwise} \end{cases}$;

[0104] where $F_i$ is the alarm flag of the $i$-th node, $P_i^{trans}$ is the position transition probability of the $i$-th node, $\theta_i^{trans}$ is the position transition probability threshold component corresponding to the $i$-th node, $P_i^{abn}$ is the abnormal behavior probability of the $i$-th node, and $\theta_i^{abn}$ is the abnormal behavior probability threshold component corresponding to the $i$-th node.

[0105] For probability nodes that exceed the preset safety boundary and have been marked, the hierarchical early warning control calculation unit queries the alarm partition topology map to determine the physical early warning grid to which the probability node belongs and obtains the communication address of the front-end alarm terminal corresponding to that physical early warning grid. The hierarchical early warning control calculation unit generates control instructions for the corresponding front-end alarm terminal. The control instructions include the specific value of the probability node exceeding the threshold and the alarm level code. The alarm level code is generated by mapping according to the degree to which the probability value exceeds the threshold. The mapping process of the alarm level code is implemented through the following formula:

[0106] ;

[0107] where the symbols of the formula denote, respectively, the alarm level code; the rounding function; the amount by which the position transition probability exceeds its corresponding threshold; and the amount by which the abnormal behavior probability exceeds its corresponding threshold. The remaining parameters are defined as in the node alarm flag determination formula.

[0108] The hierarchical early warning control calculation unit sends the generated control commands to the front-end alarm terminal of the corresponding physical early warning grid. After receiving the control commands, the front-end alarm terminal parses the alarm level code in the control commands and drives the alarm hardware circuit of the corresponding power according to the alarm level code to perform local alarm operation.

[0109] In this embodiment, the multidimensional probability threshold vector is configured differently based on the regional functional attribute labels of the physical early warning grid. For physical early warning grids with the functional attribute label of "restricted area," a first location transfer probability threshold component and a first abnormal behavior probability threshold component are configured. For physical early warning grids with the functional attribute label of "public area," a second location transfer probability threshold component and a second abnormal behavior probability threshold component are configured. The first location transfer probability threshold component and the first abnormal behavior probability threshold component are respectively smaller than the second location transfer probability threshold component and the second abnormal behavior probability threshold component. The multidimensional probability threshold configurations for physical early warning grids with different functional attributes are shown in Table 4.

[0110] Table 4. Multidimensional probability threshold configuration table for physical early warning grids with different functional attributes.

[0111]

| Regional functional attribute label | Position transition probability threshold component | Abnormal behavior probability threshold component | Corresponding alarm level code range | Front-end alarm terminal power level |
|---|---|---|---|---|
| Highly restricted area | 0.3 | 0.2 | 3-5 | High power setting |
| Restricted area | 0.5 | 0.4 | 2-4 | Medium power setting |
| Public area | 0.8 | 0.7 | 1-3 | Low power setting |

[0112] Table 4 defines the multidimensional probability threshold vector configuration rules for physical early warning grids with different functional attributes, as well as the corresponding alarm level codes and drive levels of the front-end alarm terminals. The threshold components for highly restricted areas, restricted areas, and public areas increase sequentially, achieving differentiated alarm triggering conditions for areas with different security levels. The alarm level codes correspond to the power levels of the front-end alarm terminals, enabling differentiated physical allocation of early warning resources.
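The per-area threshold check can be sketched with the values of Table 4 and the either-threshold-exceeded rule of [0102]. The dictionary layout and the function name `alarm_flag` are hypothetical.

```python
# (position transition threshold, abnormal behavior threshold) from Table 4.
THRESHOLDS = {
    "highly restricted area": (0.3, 0.2),
    "restricted area": (0.5, 0.4),
    "public area": (0.8, 0.7),
}


def alarm_flag(label, p_transfer, p_abnormal):
    """Return 1 if either probability exceeds its threshold, else 0."""
    t_transfer, t_abnormal = THRESHOLDS[label]
    return int(p_transfer > t_transfer or p_abnormal > t_abnormal)


# The same behavior is tolerated in a public area but flagged elsewhere.
flags = {label: alarm_flag(label, p_transfer=0.6, p_abnormal=0.5)
         for label in THRESHOLDS}
print(flags)
```

This illustrates the differentiated triggering described above: identical probabilities produce no alarm in a public area but mark the node in restricted and highly restricted areas.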

[0113] In this embodiment, by calculating path deviation and using a piecewise nonlinear mapping function, the probability of location transfer and the probability of abnormal behavior are generated respectively, thereby achieving a quantitative assessment of the behavior patterns of unregistered personnel. Through the differentiated configuration of multidimensional probability threshold vectors, the alarm triggering conditions of different security level areas are matched with actual security needs. By driving alarm hardware circuits with different power levels through control commands carrying alarm level codes, differentiated physical allocation of early warning resources is achieved, avoiding indiscriminate global alarm operations and reducing the probability of generating invalid alarms.

Claims

1. An identity recognition and early warning calculation system based on an AI large-scale model, characterized in that, It includes a conventional identity recognition computing module, a large-scale model behavior graph inference computing engine based on graph computing and self-attention mechanism, and a hierarchical early warning and control computing unit; The conventional identity recognition calculation module performs feature matching calculations on the collected biometric features and activates the large model behavior graph inference calculation engine when an unregistered identity label is output. The large model behavior graph inference calculation engine loads the pre-built physical space behavior constraint graph, inputs continuous video frames of unregistered personnel into the AI large model, and the AI large model performs extraction calculation of personnel appearance attribute features and motion trajectory features based on the self-attention mechanism. It performs path optimization calculation and probability inference calculation in the physical space behavior constraint graph to generate a probability distribution map containing the probability of personnel location transfer and the probability of abnormal behavior. The hierarchical early warning control calculation unit extracts probability nodes that exceed the preset safety boundary in the probability distribution map, and sends control commands to the front-end alarm terminals in the corresponding abnormal behavior areas to activate the local alarm calculation system according to the preset alarm partition topology map.

2. The identity recognition and early warning calculation system based on an AI large model according to claim 1, characterized in that the conventional identity recognition calculation module includes a face feature extraction calculation submodule based on a convolutional neural network and a gait feature extraction calculation submodule based on skeletal key points. The face feature extraction calculation submodule and the gait feature extraction calculation submodule each perform feature extraction calculation on video frames within the same time window to obtain a face feature vector and a gait feature vector. The conventional identity recognition calculation module concatenates the face feature vector and the gait feature vector and inputs the resulting joint feature vector into a pre-built neural network-based identity classifier. When the highest classification confidence output by the identity classifier remains lower than a preset threshold and the number of consecutive frames in which this occurs reaches a preset frame threshold, the unregistered identity label is output and an activation signal is generated and transmitted to the large model behavior graph inference calculation engine.
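As a non-limiting illustration, the trigger condition of claim 2 (confidence below a threshold for a preset number of consecutive frames) could be sketched as follows. The function name `unregistered_trigger` and its parameter names are hypothetical and do not appear in the claims; the sketch assumes that any frame at or above the threshold resets the consecutive-frame count.

```python
from typing import Iterable


def unregistered_trigger(
    confidences: Iterable[float],
    conf_threshold: float,
    frame_threshold: int,
) -> bool:
    """Return True once the highest classification confidence has stayed
    below conf_threshold for frame_threshold consecutive frames.

    Assumption: a frame at or above the threshold resets the run length.
    """
    run = 0
    for confidence in confidences:
        run = run + 1 if confidence < conf_threshold else 0
        if run >= frame_threshold:
            return True
    return False
```

Counting consecutive frames rather than single frames is one way to avoid labeling a registered person as unregistered because of a single blurred or occluded frame.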

3. The identity recognition and early warning calculation system based on an AI large model according to claim 1, characterized in that the physical space behavior constraint graph is constructed by a graph convolutional neural network. The construction method includes: obtaining the coordinates of the monitoring equipment in the target area and the physical channel connectivity relationship, and constructing an initial spatial topology graph containing multiple nodes and edges; extracting the regional functional attribute label and the historical personnel passage direction weights of the region corresponding to each node in the initial spatial topology graph; performing embedding mapping calculation on the regional functional attribute labels to obtain the node feature matrix, and performing normalization calculation on the historical personnel passage direction weights to map them to the edge feature matrix; and inputting the node feature matrix and the edge feature matrix into a pre-trained graph convolutional neural network to perform multi-layer feature aggregation calculation, the output being the physical space behavior constraint graph containing spatial location topological constraints and regional functional restriction constraints.
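As a non-limiting illustration, the normalization of historical personnel passage direction weights in claim 3 could be sketched as follows. The claim does not state which normalization is used; this sketch assumes that the outgoing weights of each node are scaled to sum to 1. The function name `normalize_edge_weights` and the edge dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source node, destination node)


def normalize_edge_weights(edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """Scale each node's outgoing passage weights so they sum to 1.

    Assumption: per-source normalization; edges leaving a node whose
    total weight is zero are dropped.
    """
    totals: Dict[str, float] = {}
    for (src, _dst), weight in edges.items():
        totals[src] = totals.get(src, 0.0) + weight
    return {
        (src, dst): weight / totals[src]
        for (src, dst), weight in edges.items()
        if totals[src] > 0
    }
```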

4. The identity recognition and early warning calculation system based on an AI large model according to claim 1, characterized in that, when the AI large model extracts the appearance attribute features and motion trajectory features of the person, it divides the continuous video frames into multiple video segments according to a preset time step. It performs cropping and pixel alignment calculations on the target person image in each video segment and extracts the image patch sequence containing clothing color and body contour as the appearance attribute features of the person. At the same time, it extracts the temporal change sequence of the target person's foot coordinates in the continuous video frames as the motion trajectory features. The AI large model performs cross-modal feature fusion calculation on the image patch sequence and the temporal change sequence through a multi-head cross-attention neural network layer, and maps the fused feature vector to the node space of the physical space behavior constraint graph to perform path optimization calculation.

5. The identity recognition and early warning calculation system based on an AI large model according to claim 1, characterized in that the calculation process for generating a probability distribution map containing the probability of personnel location transfer and the probability of abnormal behavior includes: in the physical space behavior constraint graph, performing a deviation calculation between the actual path represented by the motion trajectory features and the legal path in the physical space behavior constraint graph, and converting the deviation into the probability of personnel location transfer through a nonlinear mapping calculation; at the functional constraint nodes of the physical space behavior constraint graph, performing a statistical calculation of the number of backtracks and the dwell time of the actual path within a preset time range, and inputting the number of backtracks and the dwell time into a preset piecewise nonlinear behavior determination function to calculate the probability of abnormal behavior; and assigning the probability of personnel location transfer and the probability of abnormal behavior to the corresponding graph nodes to generate the probability distribution map.
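As a non-limiting illustration, the deviation calculation and nonlinear mapping of claim 5 could be sketched as follows. The claim does not fix the deviation metric or the mapping function; this sketch assumes the deviation is the mean distance from each observed point to the nearest legal-path point, and assumes a saturating exponential mapping in which a larger deviation yields a larger value. The names `path_deviation`, `transfer_probability`, and `scale` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_deviation(actual: Sequence[Point], legal: Sequence[Point]) -> float:
    """Mean distance from each actual-path point to its nearest legal-path point.

    Assumption: both paths are non-empty lists of 2-D coordinates.
    """
    return sum(min(math.dist(p, q) for q in legal) for p in actual) / len(actual)


def transfer_probability(deviation: float, scale: float = 1.0) -> float:
    """Map a non-negative deviation into [0, 1) with 1 - exp(-d / scale).

    Assumption: this particular mapping is illustrative only.
    """
    return 1.0 - math.exp(-deviation / scale)
```

Under these assumptions, a path that coincides with the legal path has a deviation of 0 and maps to 0, while a path offset by one unit everywhere maps to roughly 0.63 when `scale` is 1.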

6. The identity recognition and early warning calculation system based on an AI large model according to claim 1, characterized in that the preset safety boundary is a multidimensional probability threshold vector configured for each node in the probability distribution map. The multidimensional probability threshold vector includes a location transfer probability threshold component and an abnormal behavior probability threshold component. The alarm partition topology map divides the target area into multiple independent physical early warning grids, and each physical early warning grid is configured with at least one front-end alarm terminal. The hierarchical early warning control calculation unit sequentially compares the probability values of each node in the probability distribution map with the multidimensional probability threshold vector of the corresponding node. When any component exceeds its threshold, the corresponding node is marked as a probability node that exceeds the preset safety boundary, and a query calculation on the alarm partition topology map is performed to determine the physical early warning grid to which the probability node belongs.
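As a non-limiting illustration, the per-node comparison and grid lookup of claim 6 could be sketched as follows. The function name `flag_nodes` and the dictionary layouts are hypothetical; the sketch assumes each node carries a pair (location transfer probability, abnormal behavior probability) and a matching pair of thresholds.

```python
from typing import Dict, Tuple

Pair = Tuple[float, float]  # (location transfer, abnormal behavior)


def flag_nodes(
    prob_map: Dict[str, Pair],
    thresholds: Dict[str, Pair],
    node_to_grid: Dict[str, str],
) -> Dict[str, str]:
    """Return {node: grid} for every node where any component exceeds its threshold."""
    flagged: Dict[str, str] = {}
    for node, (p_transfer, p_abnormal) in prob_map.items():
        t_transfer, t_abnormal = thresholds[node]
        if p_transfer > t_transfer or p_abnormal > t_abnormal:
            # Query the alarm partition topology (here a plain lookup table).
            flagged[node] = node_to_grid[node]
    return flagged
```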

7. The identity recognition and early warning calculation system based on an AI large model according to claim 2, characterized in that the identity classifier is a metric learning-based Siamese neural network computing architecture, which includes weight-shared feature encoding neural network branches and a Mahalanobis distance calculation layer. The feature encoding neural network branch performs dimensionality reduction mapping calculation on the input joint feature vector and outputs a target feature embedding vector of fixed dimension. The Mahalanobis distance calculation layer calculates the Mahalanobis distance between the target feature embedding vector and each benchmark feature embedding vector in the pre-stored registered personnel feature database. The calculated Mahalanobis distances are arranged in ascending order, and the smallest Mahalanobis distance, which ranks first, is taken as the highest classification confidence. The identity category is output according to the position in the registered personnel feature database that corresponds to the smallest Mahalanobis distance.
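As a non-limiting illustration, the Mahalanobis distance calculation layer and the ascending ranking of claim 7 could be sketched as follows. The feature encoding branch is omitted, and the embeddings are taken as given. The names `mahalanobis`, `nearest_identity`, and `inv_cov` are hypothetical; the sketch assumes the inverse covariance matrix is supplied by the caller and that the registered personnel feature database is a non-empty dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def mahalanobis(x: Sequence[float], y: Sequence[float], inv_cov: Matrix) -> float:
    """Compute sqrt((x - y)^T * inv_cov * (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in inv_cov]
    return math.sqrt(sum(d * p for d, p in zip(diff, projected)))


def nearest_identity(
    target: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    inv_cov: Matrix,
) -> Tuple[str, float]:
    """Rank registered embeddings by ascending distance and return the closest."""
    ranked = sorted(
        (mahalanobis(target, emb, inv_cov), name) for name, emb in gallery.items()
    )
    distance, name = ranked[0]
    return name, distance
```

With the identity matrix as `inv_cov`, the distance reduces to the Euclidean distance, which gives a quick sanity check for the sketch.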

8. The identity recognition and early warning calculation system based on an AI large model according to claim 4, characterized in that the specific method by which the multi-head cross-attention neural network layer performs cross-modal feature fusion calculation on the image patch sequence and the temporal change sequence is as follows: the temporal change sequence is encoded into a temporal position encoding matrix, and the image patch sequence is encoded into an image spatial position encoding matrix. The temporal position encoding matrix is used as the query matrix, and the image spatial position encoding matrix is used as the key matrix and the value matrix. The dot product attention score calculation of the query matrix and the key matrix is performed in parallel in multiple attention heads of the multi-head cross-attention neural network layer. A preset mask matrix is superimposed on the dot product attention scores, and feature filtering calculation is performed to filter out the time step features in the temporal change sequence that correspond to the target person being in an invisible state. The filtered features are then input into a fully connected neural network layer to perform dimension mapping calculation and output the fused feature vector.
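As a non-limiting illustration, the masked dot product attention of claim 8 could be sketched for a single attention head as follows. The sketch makes several simplifying assumptions that are not taken from the claim: only one head is shown, scores are scaled by the square root of the key dimension, the mask is applied by zeroing the output rows of time steps in which the target person is invisible, and the subsequent fully connected layer is omitted. The names `masked_cross_attention` and `visible` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(
    queries: Matrix, keys: Matrix, values: Matrix, visible: Sequence[bool]
) -> Matrix:
    """Single-head cross-attention: temporal queries attend to image keys/values.

    Rows for time steps flagged invisible are filtered out (set to zero).
    """
    dim = len(keys[0])
    width = len(values[0])
    output: Matrix = []
    for query, is_visible in zip(queries, visible):
        if not is_visible:
            output.append([0.0] * width)
            continue
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        output.append(
            [sum(w * value[j] for w, value in zip(weights, values)) for j in range(width)]
        )
    return output
```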

9. The identity recognition and early warning calculation system based on an AI large model according to claim 5, characterized in that the behavior determination function is a piecewise nonlinear mapping calculation function that defines three different numerical calculation intervals. When the number of backtracks is zero and the dwell time is less than a first preset time threshold, the piecewise nonlinear mapping calculation function maps the input parameters to a first calculation interval and outputs the probability of abnormal behavior within a first numerical range. When the number of backtracks exceeds a preset threshold and the dwell time is between the first preset time threshold and a second preset time threshold, the piecewise nonlinear mapping calculation function maps the input parameters to a second calculation interval and outputs the probability of abnormal behavior within a second numerical range, wherein the minimum value of the second numerical range is greater than the maximum value of the first numerical range.
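As a non-limiting illustration, the two intervals of the behavior determination function that claim 9 describes could be sketched as follows. All numeric values (time thresholds of 30 and 120, a backtrack limit of 2, output ranges of 0 to 0.3 and 0.4 to 0.7) and the quadratic and square-root shapes are assumptions chosen only so that the second range lies above the first. The claim does not describe the third interval, so the sketch returns `None` for inputs outside the two described cases rather than guessing. The name `abnormal_probability` is hypothetical.

```python
from typing import Optional


def abnormal_probability(
    backtracks: int,
    dwell: float,
    t1: float = 30.0,          # assumed first preset time threshold
    t2: float = 120.0,         # assumed second preset time threshold
    backtrack_limit: int = 2,  # assumed preset backtrack threshold
) -> Optional[float]:
    """Piecewise nonlinear mapping covering the two intervals stated in claim 9."""
    if backtracks == 0 and dwell < t1:
        # First interval: output in [0, 0.3).
        return 0.3 * (dwell / t1) ** 2
    if backtracks > backtrack_limit and t1 <= dwell <= t2:
        # Second interval: output in [0.4, 0.7], strictly above the first range.
        fraction = (dwell - t1) / (t2 - t1)
        return 0.4 + 0.3 * fraction ** 0.5
    return None  # third interval is not specified in the claim
```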

10. The identity recognition and early warning calculation system based on an AI large model according to claim 6, characterized in that the multidimensional probability threshold vector is calculated based on the regional functional attribute label of the physical early warning grid. For a physical early warning grid whose functional attribute label is a restricted area, a first location transfer probability threshold component and a first abnormal behavior probability threshold component are configured. For a physical early warning grid whose functional attribute label is a public area, a second location transfer probability threshold component and a second abnormal behavior probability threshold component are configured. The first location transfer probability threshold component and the first abnormal behavior probability threshold component are respectively less than the second location transfer probability threshold component and the second abnormal behavior probability threshold component. The control command sent from the hierarchical early warning control calculation unit to the front-end alarm terminal includes, for the corresponding probability node, the specific value of each component that exceeds its threshold and the alarm level code. The alarm level code is generated by a mapping calculation based on the degree to which the probability value exceeds the threshold. The front-end alarm terminal drives alarm hardware circuits of different power according to the alarm level code.
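As a non-limiting illustration, the area-dependent thresholds and the alarm level code of claim 10 could be sketched as follows. The threshold values, the three-level coding, and the cut points of 0.1 and 0.3 are assumptions; the claim only requires that restricted-area thresholds be lower than public-area thresholds and that the code reflect the degree of exceedance. The names `THRESHOLDS`, `alarm_level`, and `build_command` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed (location transfer, abnormal behavior) thresholds per area label.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "restricted": (0.3, 0.2),
    "public": (0.7, 0.6),
}


def alarm_level(probability: float, threshold: float) -> int:
    """Map the amount by which a probability exceeds its threshold to a level code."""
    excess = probability - threshold
    if excess <= 0:
        return 0
    if excess < 0.1:
        return 1
    if excess < 0.3:
        return 2
    return 3


def build_command(area_label: str, p_transfer: float, p_abnormal: float) -> Optional[dict]:
    """Build a control command, or return None when no component exceeds its threshold."""
    t_transfer, t_abnormal = THRESHOLDS[area_label]
    exceeded = {}
    if p_transfer > t_transfer:
        exceeded["transfer"] = p_transfer
    if p_abnormal > t_abnormal:
        exceeded["abnormal"] = p_abnormal
    if not exceeded:
        return None
    level = max(alarm_level(p_transfer, t_transfer), alarm_level(p_abnormal, t_abnormal))
    return {"exceeded": exceeded, "alarm_level_code": level}
```

Under these assumed values, the same pair of probabilities (0.5, 0.25) produces no command in a public area but a level 2 command in a restricted area, which mirrors the differentiated triggering the claim describes.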