Training method of camera scheduling model, and detection method and device of irregular event
By constructing a sample probability model with beta distribution and training a reinforcement learning model, the problem of low training efficiency of camera scheduling models was solved, and efficient violation event detection was achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SF TECH CO LTD
- Filing Date
- 2021-08-31
- Publication Date
- 2026-06-16
AI Technical Summary
The training efficiency of existing camera scheduling models is low, mainly due to limited computing resources, sparse historical data, and long violation detection time, which makes it impossible to effectively allocate resources for violation event detection.
By constructing a sample probability model based on beta distribution, training it using a reinforcement learning model, obtaining historical sample data from cameras, sampling and iterative training, a camera scheduling model is generated.
It improves the training efficiency of the camera scheduling model by enabling a large number of training samples to be acquired rapidly, addresses the problems of resource allocation and time-consuming detection, and achieves more efficient violation event detection.

Figure CN115731489B_ABST
Abstract
Description
Technical Field
[0001] This application mainly relates to the field of camera scheduling technology, and specifically to a training method for a camera scheduling model and to a method and apparatus for detecting violation events.
Background Technology
[0002] Reinforcement learning is widely applied in resource scheduling, primarily to solve NP-hard (non-deterministic polynomial-time hard) problems. It is particularly effective for dynamically changing problems because of its fast inference speed, high model robustness, and suitability for high-dimensional problems. The camera violation detection problem requires allocating the limited computing resources of a single AI video inference server across a large number of cameras for abnormal video behavior detection. Given the many cameras and the limited computing resources, the challenge is how to allocate these limited inference resources so as to detect as many abnormal behaviors as possible.
[0003] Camera scheduling for violation detection has the following characteristics:
[0004] 1. Violation detection is time-consuming. Because violation detection relies on GPU computing resources, it differs from conventional computation; acquiring historical data therefore takes a long time, which limits the sample size.
[0005] 2. The complete historical data distribution is unavailable. Because computing resources are limited and the cameras are numerous, it is impossible, even at full capacity, to obtain the distribution of violations across all cameras over 24 hours. The historical data obtained in practice is therefore sparse.
[0006] 3. The lack of available labels makes supervised learning unsuitable, because cameras are dynamically added and removed, leaving insufficient historical data.
[0007] Together, these characteristics mean limited computing resources, sparse historical data, and unacceptable time costs caused by slow violation-detection feedback, which leads to low training efficiency of the camera scheduling model.
[0008] Consequently, the training efficiency of existing camera scheduling models is relatively low.
Summary of the Invention
[0009] This application provides a training method for a camera scheduling model, a method and apparatus for detecting violations, aiming to solve the problem of low training efficiency of camera scheduling models in the prior art.
[0010] Firstly, this application provides a method for training a camera scheduling model, the method comprising:
[0011] acquiring historical sample data of the cameras;
[0012] constructing a first sample probability distribution based on the historical sample data;
[0013] sampling based on the first sample probability distribution to obtain first sampled action data; and
[0014] training a reinforcement learning model based on the first sampled action data to obtain the camera scheduling model.
[0015] Optionally, the first sample probability distribution includes the beta distribution of each camera and the beta distribution of each time period within a day, and the construction of the first sample probability distribution based on the historical sample data includes:
[0016] Based on the historical sample data, the violation frequency and non-violation frequency of each camera, as well as the violation frequency and non-violation frequency of each time period, are obtained;
[0017] The beta distribution of each camera is established based on the violation frequency and non-violation frequency of each camera. The two parameters of the beta distribution of the camera are the violation frequency and non-violation frequency of the camera, respectively.
[0018] The beta distribution of each time period is established based on the violation frequency and non-violation frequency of that time period. The two parameters of the beta distribution of each time period are the violation frequency and non-violation frequency of that time period, respectively.
[0019] Optionally, the reinforcement learning model includes an estimation network and a reality network, and training the reinforcement learning model based on the first sampled action data to obtain the camera scheduling model includes:
[0020] inputting the current state information corresponding to the first sampled action data into the estimation network to obtain estimated output action data;
[0021] updating the first sample probability distribution based on the estimated output action data to obtain an updated, second sample probability distribution;
[0022] sampling based on the second sample probability distribution to obtain second sampled action data; and
[0023] iteratively training the estimation network and the reality network based on the first sampled action data and the second sampled action data to obtain the camera scheduling model.
[0024] Optionally, iteratively training the estimation network and the reality network based on the first sampled action data and the second sampled action data to obtain the camera scheduling model includes:
[0025] obtaining the estimated Q-value output by the estimation network at each iteration;
[0026] inputting the next state information corresponding to the second sampled action data at each iteration into the reality network to obtain the target Q-value of the reality network; and
[0027] calculating a loss based on the estimated Q-value, a reward value, and the target Q-value, and iteratively training the estimation network and the reality network to obtain the camera scheduling model.
[0028] Optionally, before the loss is calculated based on the estimated Q-value, the reward value, and the target Q-value and the estimation network and the reality network are iteratively trained to obtain the camera scheduling model, the method further includes:
[0029] determining the reward value based on the similarity between the estimated output action data and the second sampled action data.
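The loss and reward in [0025]-[0029] can be illustrated with a minimal sketch, assuming a standard DQN-style temporal-difference target in which the reality network plays the role of the target network. The discount factor `gamma`, the mean-squared-error loss, and the Jaccard similarity used as the reward are assumptions for illustration only; the source specifies neither the loss form nor the similarity measure. Actions are represented here as sets of (camera, time period) pairs selected for inspection, and all names are hypothetical.

```python
from typing import Sequence

Combination = tuple[int, int]  # (camera id, time period index)


def jaccard_reward(estimated: set[Combination], sampled: set[Combination]) -> float:
    """Hypothetical reward: Jaccard similarity between the combinations chosen by
    the estimation network and those in the second sampled action data."""
    union = estimated | sampled
    if not union:
        return 1.0  # two empty actions are treated as identical
    return len(estimated & sampled) / len(union)


def td_loss(
    estimated_q: Sequence[float],
    rewards: Sequence[float],
    next_target_q: Sequence[Sequence[float]],
    gamma: float = 0.9,  # assumed discount factor, not given in the source
) -> float:
    """Mean squared TD error: (r + gamma * max_a' Q_target(s', a') - Q_est(s, a))^2."""
    if not estimated_q:
        raise ValueError("empty batch")
    if not (len(estimated_q) == len(rewards) == len(next_target_q)):
        raise ValueError("batch components must have equal length")
    total = 0.0
    for q, r, next_q in zip(estimated_q, rewards, next_target_q):
        target = r + gamma * max(next_q)  # target Q-value from the reality network
        total += (target - q) ** 2
    return total / len(estimated_q)


# Usage with made-up numbers (not from the source):
est = {(1, 3), (2, 3)}
smp = {(1, 3), (3, 1)}
r = jaccard_reward(est, smp)  # 1 shared pair out of 3 distinct pairs
print(td_loss([1.0], [r], [[0.5, 2.0]], gamma=0.5))
```

In a typical DQN only the estimation network is updated by gradient descent on such a loss, with its weights copied into the target network periodically; the source says only that both networks are trained iteratively, so the exact schedule is left open here.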
[0030] Optionally, sampling based on the first sample probability distribution to obtain the first sampled action data includes:
[0031] sampling the beta distribution of each camera to obtain a first violation probability parameter for each camera;
[0032] sampling the beta distribution of each time period to obtain a second violation probability parameter for each time period;
[0033] determining a third violation probability parameter of each camera in each time period based on the first violation probability parameter of that camera and the second violation probability parameter of that time period; and
[0034] sorting the third violation probability parameters from largest to smallest and determining the cameras and time periods corresponding to the top-ranked parameters as the camera-time-period combinations to be inspected, thus obtaining the first sampled action data.
[0035] Optionally, updating the first sample probability distribution based on the estimated output action data to obtain the updated, second sample probability distribution includes:
[0036] checking the historical sample data against the estimated output action data to obtain the violation and non-violation frequencies detected for each camera and for each time period;
[0037] updating the violation frequency and non-violation frequency of each camera based on the violation and non-violation frequencies detected for that camera, so as to update the beta distribution of each camera;
[0038] updating the violation frequency and non-violation frequency of each time period based on the violation and non-violation frequencies detected in that time period, so as to update the beta distribution of each time period; and
[0039] determining the updated beta distributions of the cameras and the updated beta distributions of the time periods as the second sample probability distribution.
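The update in [0036]-[0039] can be sketched as follows, under the assumption (not stated in the source) that "checking the historical sample data" means looking up whether each camera-time-period pair chosen by the estimation network contains a recorded violation: a recorded violation counts as one hit, and its absence as one miss, for both that camera and that time period. The dictionary layout and the function name are hypothetical.

```python
Params = dict[int, tuple[float, float]]  # key -> (violation freq, non-violation freq)


def update_from_action(
    camera_params: Params,
    period_params: Params,
    action: set[tuple[int, int]],
    violations: set[tuple[int, int]],
) -> tuple[Params, Params]:
    """Return updated copies of the per-camera and per-period beta parameters.

    action: (camera, period) pairs selected by the estimation network.
    violations: (camera, period) pairs with a violation in the historical data.
    """
    cameras = dict(camera_params)
    periods = dict(period_params)
    for camera, period in action:
        hit = 1 if (camera, period) in violations else 0
        miss = 1 - hit
        a, b = cameras[camera]
        cameras[camera] = (a + hit, b + miss)  # update this camera's beta parameters
        a, b = periods[period]
        periods[period] = (a + hit, b + miss)  # update this period's beta parameters
    return cameras, periods


cams, periods = update_from_action(
    {1: (2, 2), 2: (1, 3)},   # made-up camera counts
    {1: (0, 2), 3: (1, 1)},   # made-up period counts
    action={(1, 3), (2, 1)},  # chosen by the estimation network
    violations={(1, 3)},      # historical record shows a violation here only
)
print(cams, periods)
```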
[0040] Secondly, this application provides a method for detecting violation events, the detection method comprising:
[0041] obtaining current state information of each camera, wherein the current state information includes the number of violations and the number of non-violations of each camera in each time period;
[0042] inputting the current state information into the camera scheduling model to obtain camera scheduling action information, wherein the camera scheduling model is obtained by the training method of any embodiment of the first aspect;
[0043] determining, based on the camera scheduling action information, the camera-time-period combinations to be inspected, wherein each combination includes a camera to be inspected and the corresponding time period to be inspected; and
[0044] performing violation event detection on the information captured by the camera to be inspected during the time period to be inspected, to obtain a violation event detection result.
[0045] Thirdly, this application provides a training device for a camera scheduling model, the training device comprising:
[0046] an acquisition unit configured to acquire historical sample data of the cameras;
[0047] a distribution construction unit configured to construct a first sample probability distribution based on the historical sample data;
[0048] a sampling unit configured to sample based on the first sample probability distribution to obtain first sampled action data; and
[0049] a model training unit configured to train a reinforcement learning model based on the first sampled action data to obtain the camera scheduling model.
[0050] Optionally, the first sample probability distribution includes the beta distribution of each camera and the beta distribution of each time period within a day, and the distribution construction unit is used for:
[0051] Based on the historical sample data, the violation frequency and non-violation frequency of each camera, as well as the violation frequency and non-violation frequency of each time period, are obtained;
[0052] The beta distribution of each camera is established based on the violation frequency and non-violation frequency of each camera. The two parameters of the beta distribution of the camera are the violation frequency and non-violation frequency of the camera, respectively.
[0053] The beta distribution of each time period is established based on the violation frequency and non-violation frequency of that time period. The two parameters of the beta distribution of each time period are the violation frequency and non-violation frequency of that time period, respectively.
[0054] Optionally, the reinforcement learning model includes an estimation network and a reality network, and the model training unit is configured to:
[0055] input the current state information corresponding to the first sampled action data into the estimation network to obtain estimated output action data;
[0056] update the first sample probability distribution based on the estimated output action data to obtain an updated, second sample probability distribution;
[0057] sample based on the second sample probability distribution to obtain second sampled action data; and
[0058] iteratively train the estimation network and the reality network based on the first sampled action data and the second sampled action data to obtain the camera scheduling model.
[0059] Optionally, the model training unit is configured to:
[0060] obtain the estimated Q-value output by the estimation network at each iteration;
[0061] input the next state information corresponding to the second sampled action data at each iteration into the reality network to obtain the target Q-value of the reality network; and
[0062] calculate a loss based on the estimated Q-value, a reward value, and the target Q-value, and iteratively train the estimation network and the reality network to obtain the camera scheduling model.
[0063] Optionally, the model training unit is further configured to:
[0064] determine the reward value based on the similarity between the estimated output action data and the second sampled action data.
[0065] Optionally, the sampling unit is configured to:
[0066] sample the beta distribution of each camera to obtain a first violation probability parameter for each camera;
[0067] sample the beta distribution of each time period to obtain a second violation probability parameter for each time period;
[0068] determine a third violation probability parameter of each camera in each time period based on the first violation probability parameter of that camera and the second violation probability parameter of that time period; and
[0069] sort the third violation probability parameters from largest to smallest and determine the cameras and time periods corresponding to the top-ranked parameters as the camera-time-period combinations to be inspected, thus obtaining the first sampled action data.
[0070] Optionally, the model training unit is further configured to:
[0071] check the historical sample data against the estimated output action data to obtain the violation and non-violation frequencies detected for each camera and for each time period;
[0072] update the violation frequency and non-violation frequency of each camera based on the violation and non-violation frequencies detected for that camera, so as to update the beta distribution of each camera;
[0073] update the violation frequency and non-violation frequency of each time period based on the violation and non-violation frequencies detected in that time period, so as to update the beta distribution of each time period; and
[0074] determine the updated beta distributions of the cameras and the updated beta distributions of the time periods as the second sample probability distribution.
[0075] Fourthly, this application provides a detection device for violation events, the detection device comprising:
[0076] an acquisition unit configured to acquire the current state information of each camera, wherein the current state information includes the violation frequency and non-violation frequency of each camera in each time period;
[0077] a scheduling action unit configured to input the current state information into the camera scheduling model to obtain camera scheduling action information, wherein the camera scheduling model is obtained by the training method of any embodiment of the first aspect;
[0078] a determining unit configured to determine, based on the camera scheduling action information, the camera-time-period combinations to be inspected, wherein each combination includes a camera to be inspected and the corresponding time period to be inspected; and
[0079] a violation detection unit configured to perform violation detection on the information captured by the camera to be inspected during the time period to be inspected, to obtain a violation detection result.
[0080] Fifthly, this application provides a computer device, the computer device comprising:
[0081] One or more processors;
[0082] Memory; and
[0083] One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to implement the training method of the camera scheduling model according to any embodiment of the first aspect or the violation event detection method according to the second aspect.
[0084] Sixthly, this application provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the training method of the camera scheduling model according to any embodiment of the first aspect or the violation event detection method according to the second aspect.
[0085] This application provides a training method for a camera scheduling model, a method and apparatus for detecting violation events, and a device for training the camera scheduling model. The training method includes: acquiring historical sample data of the cameras; constructing a first sample probability distribution based on the historical sample data; sampling based on the first sample probability distribution to obtain first sampled action data; and training a reinforcement learning model based on the first sampled action data to obtain the camera scheduling model. This application uses historical sample data to construct a sample probability distribution and then samples from that distribution to train the reinforcement learning model. Because the inputs to the reinforcement learning model are generated from the constructed sample probability distribution rather than from actual environment responses, the time-consuming acquisition of real samples is avoided and a large number of samples can be obtained quickly for training, which improves the training efficiency of the camera scheduling model.
Attached Figure Description
[0086] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0087] Figure 1 is a schematic diagram of a scenario of the violation detection system provided in an embodiment of this application;
[0088] Figure 2 is a schematic flowchart of an embodiment of the training method for the camera scheduling model provided in this application;
[0089] Figure 3 is a schematic diagram of the beta distribution of each camera in an embodiment of this application;
[0090] Figure 4 is a schematic diagram of the beta distributions of the time periods in an embodiment of this application;
[0091] Figure 5 is a flowchart of an embodiment of S204 in this application;
[0092] Figure 6 is a schematic flowchart of an embodiment of the violation detection method provided in this application;
[0093] Figure 7 is a schematic diagram of an embodiment of the training device for the camera scheduling model provided in this application;
[0094] Figure 8 is a schematic diagram of an embodiment of the violation detection device provided in this application;
[0095] Figure 9 is a schematic diagram of an embodiment of the computer device provided in this application.
Detailed Implementation
[0096] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0097] In the description of this application, it should be understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships based on the orientation or positional relationships shown in the accompanying drawings, are used only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, features defined with "first" and "second" may explicitly or implicitly include one or more features. In the description of this application, "a plurality of" means two or more, unless otherwise explicitly specified.
[0098] In this application, the term "exemplary" is used to mean "used as an example, illustration, or description." Any embodiment described as "exemplary" in this application is not necessarily to be construed as more preferred or advantageous than other embodiments. The following description is provided to enable any person skilled in the art to make and use this application. Details are set forth in the following description for purposes of explanation. Those skilled in the art will recognize that this application can be practiced without these specific details. In other instances, well-known structures and processes are not described in detail to avoid obscuring the description of this application with unnecessary detail. Therefore, this application is not intended to be limited to the embodiments shown, but is to be accorded the broadest scope consistent with the principles and features disclosed in this application.
[0099] This application provides a training method for a camera scheduling model, a method for detecting violations, an apparatus, a computer device, and a storage medium, which will be described in detail below.
[0100] Referring to Figure 1, Figure 1 is a schematic diagram of a violation event detection system provided in an embodiment of this application. The violation event detection system may include a computer device 100, in which a training device for a camera scheduling model and/or a violation event detection device are integrated.
[0101] In this embodiment, the computer device 100 can be a standalone server, a server network, or a server cluster. For example, the computer device 100 described in this embodiment includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server composed of multiple servers. The cloud server is composed of a large number of computers or network servers based on cloud computing.
[0102] In this embodiment, the computer device 100 described above can be a general-purpose computer device or a special-purpose computer device. In specific implementations, the computer device 100 can be a desktop computer, a portable computer, a network server, a handheld computer (Personal Digital Assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, etc. This embodiment does not limit the type of computer device 100.
[0103] Those skilled in the art will understand that the application environment shown in Figure 1 is merely one application scenario of the solution in this application and does not limit the application scenarios of the solution. Other application environments may include more or fewer computer devices than shown in Figure 1; for example, only one computer device is shown in Figure 1. It is understood that the violation detection system may also include one or more other computer devices capable of processing data, which are not specifically limited here.
[0104] In addition, as shown in Figure 1, the violation detection system may also include a memory 200 for storing data.
[0105] It should be noted that the schematic diagram of the violation detection system shown in Figure 1 is merely an example. The violation detection system and scenario described in this application are intended to illustrate the technical solutions of this application more clearly and do not limit them. As those skilled in the art will appreciate, with the evolution of violation detection systems and the emergence of new business scenarios, the technical solutions provided in this application are equally applicable to similar technical problems.
[0106] First, this application provides a method for training a camera scheduling model, including: acquiring historical sample data of the cameras; constructing a first sample probability distribution based on the historical sample data; sampling based on the first sample probability distribution to obtain first sampled action data; and training a reinforcement learning model based on the first sampled action data to obtain a camera scheduling model.
[0107] Figure 2 is a flowchart of an embodiment of the training method for the camera scheduling model in this application. As shown in Figure 2, the training method for the camera scheduling model includes the following steps S201 to S204:
[0108] S201. Obtain historical sample data from the camera.
[0109] In this embodiment, the historical sample data refers to the detection results of violations in videos captured by cameras within a preset historical time period. The preset historical time period can be one month, one week, etc., and is not limited here. For example, in the video captured by camera 1 on February 10, violations were detected at 4 o'clock and 6 o'clock respectively.
[0110] S202. Construct the first sample probability distribution based on historical sample data.
[0111] In this embodiment of the application, the first sample probability distribution is a beta distribution.
[0112] The beta distribution is the conjugate prior of the Bernoulli and binomial distributions and has important applications in machine learning and mathematical statistics. In probability theory, the beta distribution (also written β distribution) is a family of continuous probability distributions defined on the interval (0, 1), with two positive shape parameters, α and β.
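[0113] Probability density of the beta distribution:
[0114] $$f(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},\qquad 0<x<1 \qquad (1)$$
[0115] where the coefficient B is:
[0116] $$B(\alpha,\beta)=\frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)} \qquad (2)$$
[0117] The Gamma function can be viewed as a generalization of the factorial to the real numbers:
[0118] $$\Gamma(z)=\int_{0}^{\infty}t^{z-1}e^{-t}\,dt,\qquad \Gamma(n)=(n-1)!\ \text{for positive integers } n \qquad (3)$$
[0119] Expectation of the beta distribution:
[0120] $$E[X]=\frac{\alpha}{\alpha+\beta} \qquad (4)$$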
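The density, coefficient, and expectation described above can be evaluated directly. The sketch below is a minimal illustration using only Python's standard library; the function names are hypothetical. It computes the coefficient B through `math.lgamma` (the logarithm of the Gamma function) rather than `math.gamma`, an implementation choice that avoids floating-point overflow when α and β are large historical counts.

```python
import math


def log_beta_coefficient(alpha: float, beta: float) -> float:
    """log B(alpha, beta) = log Gamma(alpha) + log Gamma(beta) - log Gamma(alpha + beta)."""
    return math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Beta density x^(alpha-1) * (1-x)^(beta-1) / B(alpha, beta) for 0 < x < 1."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie in the open interval (0, 1)")
    log_density = (
        (alpha - 1) * math.log(x)
        + (beta - 1) * math.log1p(-x)  # log(1 - x), accurate for small x
        - log_beta_coefficient(alpha, beta)
    )
    return math.exp(log_density)


def beta_mean(alpha: float, beta: float) -> float:
    """Expectation of Beta(alpha, beta): alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


# The Gamma function generalizes the factorial: Gamma(5) = 4! = 24.
assert math.isclose(math.gamma(5), math.factorial(4))
print(beta_pdf(0.5, 2, 2), beta_mean(2, 2))
```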
[0121] In one specific embodiment, the first sample probability distribution includes the beta distribution of each camera and the beta distribution of each time period within a day, and constructing the first sample probability distribution based on the historical sample data includes:
[0122] Based on historical sample data, the violation frequency and non-violation frequency of each camera, as well as the violation frequency and non-violation frequency of each time period, are obtained. A beta distribution for each camera is established based on the violation frequency and non-violation frequency, where the two parameters of the camera's beta distribution are the violation frequency and non-violation frequency, respectively. Similarly, a beta distribution for each time period is established based on the violation frequency and non-violation frequency, where the two parameters of the time period's beta distribution are the violation frequency and non-violation frequency, respectively.
[0123] Specifically, using limited historical sample data, the frequency of violations is counted in the time period dimension as α0, and the frequency of no violations is counted as β0, constructing an initial distribution Beta(α0, β0); the frequency of violations is counted in the camera dimension as α1, and the frequency of no violations is counted as β1, constructing an initial distribution Beta(α1, β1).
[0124] The beta distribution can be viewed as a probability distribution over probabilities. When the specific probability of an event occurring at a camera is unknown, it gives the relative likelihood of every possible value of that probability. The beta distribution is not a single fixed formula; it represents a family of distributions describing the probability that a certain event occurs. As new events occur, the distribution can be updated using formula (5).
[0125] $$\mathrm{Beta}(\alpha_0+\mathrm{hits},\ \beta_0+\mathrm{misses}) \qquad (5)$$
[0126] Here, hits is the frequency of newly generated violations, and misses is the frequency of newly generated non-violations.
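Formula (5) is the conjugate Bernoulli/binomial update: newly observed violations are added to the first parameter and newly observed non-violations to the second, after which the expectation α/(α+β) shifts toward the observed rate. A minimal sketch with a hypothetical class name and made-up counts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaCounts:
    """Parameters of Beta(alpha, beta) kept as violation / non-violation counts."""

    alpha: float  # violation frequency
    beta: float   # non-violation frequency

    def update(self, hits: int, misses: int) -> "BetaCounts":
        """Conjugate update Beta(alpha + hits, beta + misses)."""
        if hits < 0 or misses < 0:
            raise ValueError("hits and misses must be non-negative")
        return BetaCounts(self.alpha + hits, self.beta + misses)

    @property
    def mean(self) -> float:
        """Expected violation probability alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)


prior = BetaCounts(alpha=3, beta=7)         # made-up counts, mean 0.3
posterior = prior.update(hits=2, misses=1)  # Beta(5, 8), mean 5/13
print(posterior, round(posterior.mean, 3))
```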
[0127] For example, suppose there are N cameras in total and the historical sample data contains M historical violation events in total. Assigning each event to a specific camera can be viewed as a binomial process, so each camera can be assigned a beta distribution of the form Beta(α1, β1), where m1, m2, ..., mN are the violation frequencies of the cameras and M-m1, M-m2, ..., M-mN are their non-violation frequencies. The beta distributions for the N cameras are therefore as follows:
[0128] $$\mathrm{Beta}_1(m_1,\ M-m_1),\ \mathrm{Beta}_2(m_2,\ M-m_2),\ \ldots,\ \mathrm{Beta}_N(m_N,\ M-m_N)$$
[0129] Similarly, there are 48 time periods in total and M historical violations in total. Assigning each event to a specific time period can be viewed as a binomial process, so each time period can be assigned a beta distribution of the form Beta(α0, β0), where t1, t2, ..., t48 are the violation frequencies of the 48 time periods and M-t1, M-t2, ..., M-t48 are the frequencies with which no violation was detected in each time period. The beta distributions for the 48 time periods are therefore as follows:
[0130] $$\mathrm{Beta}_1(t_1,\ M-t_1),\ \mathrm{Beta}_2(t_2,\ M-t_2),\ \ldots,\ \mathrm{Beta}_{48}(t_{48},\ M-t_{48})$$
[0131] It should be noted that in this embodiment, a day is divided into 48 time periods. In other embodiments, a day may be divided into 24 time periods, 12 time periods, etc., depending on the specific circumstances. This application does not limit this.
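The construction in [0122]-[0131] can be sketched as follows: count each camera's violations m_i and each time period's violations t_j, then use (m_i, M - m_i) and (t_j, M - t_j) as beta parameters. The record format (camera_id, hour, minute), the function names, and the 1-based period indexing are assumptions for illustration; with 48 periods per day each period covers 30 minutes. A camera or period with no recorded violation yields a first parameter of 0, which is not a valid beta shape parameter; the source does not say how this case is handled, so the sketch exposes an optional, hypothetical pseudo-count `prior` whose default of 0 leaves the source's formula unchanged.

```python
from collections import Counter
from typing import Iterable

Event = tuple[int, int, int]  # (camera_id, hour, minute) of one recorded violation


def period_index(hour: int, minute: int, periods_per_day: int = 48) -> int:
    """Map a time of day to a 1-based period; 48 periods -> 30-minute periods."""
    minutes_per_period, remainder = divmod(24 * 60, periods_per_day)
    if remainder:
        raise ValueError("periods_per_day must divide 1440 minutes evenly")
    return (hour * 60 + minute) // minutes_per_period + 1


def build_beta_params(
    events: Iterable[Event],
    camera_ids: Iterable[int],
    periods_per_day: int = 48,
    prior: float = 0.0,  # hypothetical pseudo-count, not in the source
):
    """Return ({camera: (m_i, M - m_i)}, {period: (t_j, M - t_j)})."""
    events = list(events)
    total = len(events)  # M, the total number of historical violations
    per_camera = Counter(camera for camera, _, _ in events)
    per_period = Counter(period_index(h, m, periods_per_day) for _, h, m in events)
    camera_params = {
        c: (per_camera[c] + prior, total - per_camera[c] + prior) for c in camera_ids
    }
    period_params = {
        p: (per_period[p] + prior, total - per_period[p] + prior)
        for p in range(1, periods_per_day + 1)
    }
    return camera_params, period_params


# Camera 1's violations at 4 o'clock and 6 o'clock (the S201 example),
# plus two made-up events on cameras 2 and 3.
history = [(1, 4, 0), (1, 6, 0), (2, 4, 10), (3, 23, 45)]
cams, periods = build_beta_params(history, camera_ids=[1, 2, 3])
print(cams[1], periods[9])  # camera 1 and the 04:00-04:30 period
```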
[0132] S203. Sample based on the first sample probability distribution to obtain the first sampled action data.
[0133] In one specific embodiment, sampling based on the first sample probability distribution to obtain the first sampled action data includes:
[0134] (1) Sample the beta distribution of each camera to obtain the first violation probability parameter of each camera.
[0135] The first violation probability parameter represents the likelihood that a violation occurs at a camera and is given by the ordinate of that camera's beta distribution. For example, the first violation probability parameter of each camera can be obtained by sampling its beta distribution using the toolkit provided by math3.
[0136] As shown in Figure 3, the beta distributions of four cameras are sampled separately. The resulting first violation probability parameters, from largest to smallest, are: camera 3 (1.95) > camera 1 (1.45) > camera 2 (1.2) > camera 4 (0.8).
[0137] (2) Sample the beta distribution of each time period to obtain the second violation probability parameter of each time period.
[0138] The second violation probability parameter is the likelihood of a violation within a time period, and is the ordinate of the beta distribution for that time period.
[0139] As shown in Figure 4, given the beta distributions of three time periods, each is sampled separately. The resulting second violation probability parameters, from largest to smallest, are: time period 3 (15.7) > time period 1 (12.5) > time period 2 (2.2).
[0140] (3) Determine the third violation probability parameter of each camera in each time period based on the first violation probability parameter of each camera and the second violation probability parameter of each time period.
[0141] Specifically, the product of the first violation probability parameter and the second violation probability parameter is determined as the third violation probability parameter for each camera at each time period.
[0142] (4) The third violation probability parameters are sorted from largest to smallest, the cameras and time periods corresponding to the top-ranked parameters are determined as the camera-time-period combinations to be inspected, and the first sampling action data is thereby obtained.
[0143] Specifically, the third violation probability parameters of each camera in each time period are sorted from largest to smallest. The cameras and time periods corresponding to the top-ranked preset number of third violation probability parameters are obtained. The preset number of cameras and time periods corresponding to the third violation probability parameters are determined as the camera time period combination to be inspected, thus obtaining the first sampling action data.
[0144] The preset quantity can be determined based on the upper limit of computing resources. For example, if the upper limit of computing resources is 4, then the preset quantity is 4.
[0145] Referring to Table 1, assuming the maximum computing resource is 4, the camera time slot combinations to be inspected are: Camera 1, Time Slot 3; Camera 2, Time Slot 3; Camera 3, Time Slot 1; Camera 3, Time Slot 3. The camera time slot combinations to be inspected are marked with a first identifier, for example, a first identifier of "1", indicating that the camera will be inspected during that time slot. Other camera time slot combinations are marked with a second identifier, for example, a second identifier of "0", indicating that the camera will not be inspected during that time slot. This yields the camera time slot action table, i.e., the first sampling action data.
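Steps (1) to (4) can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: it uses the standard library's `random.betavariate` in place of the math3 toolkit, and it reads "the ordinate of the beta distribution" as the density evaluated at a random draw, which is an interpretation (a conventional Thompson-sampling scheduler would use the draw itself). All function names are hypothetical. The last lines replay the worked example from Figures 3 and 4 with a computing-resource limit of 4 and recover the same four combinations as Table 1.

```python
import math
import random

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, computed in log space for stability."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_norm)

def sample_ordinate(a: float, b: float, rng: random.Random) -> float:
    """Draw x ~ Beta(a, b) and return the density (ordinate) at x.

    Assumption: one reading of "the ordinate of the beta distribution".
    """
    if a <= 0 or b <= 0:
        raise ValueError("beta parameters must be positive")
    x = rng.betavariate(a, b)
    x = min(max(x, 1e-12), 1 - 1e-12)  # keep log() finite at the edges
    return beta_pdf(x, a, b)

def select_combinations(camera_scores, period_scores, budget):
    """Score each (camera, period) pair by the product of its camera and period
    parameters (the third parameter) and keep the top `budget` pairs."""
    products = {
        (cam, per): c_score * p_score
        for cam, c_score in camera_scores.items()
        for per, p_score in period_scores.items()
    }
    ranked = sorted(products, key=products.get, reverse=True)
    return ranked[:budget]

def action_table(cameras, periods, selected):
    """1 = inspect this camera in this period, 0 = skip (layout as in Table 1)."""
    chosen = set(selected)
    return {(c, p): int((c, p) in chosen) for c in cameras for p in periods}

# Worked example with the sampled values from Figures 3 and 4 (budget = 4).
camera_scores = {1: 1.45, 2: 1.2, 3: 1.95, 4: 0.8}
period_scores = {1: 12.5, 2: 2.2, 3: 15.7}
picked = select_combinations(camera_scores, period_scores, budget=4)
table = action_table(camera_scores, period_scores, picked)
print(picked)  # [(3, 3), (3, 1), (1, 3), (2, 3)]
```

The product ranking gives camera 3 / period 3 (about 30.6), camera 3 / period 1 (about 24.4), camera 1 / period 3 (about 22.8) and camera 2 / period 3 (about 18.8); the next candidate, camera 1 / period 1 (about 18.1), falls just outside the budget.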
[0146]
[0147] Table 1: First Sampling Action Data
[0148] S204. The reinforcement learning model is trained based on the first sampled action data to obtain the camera scheduling model.
[0149] In this embodiment, the reinforcement learning model may be a Q-learning model, a DQN model, or the like. A DQN-style model typically uses an evaluation network (Q_network_local, also referred to below as the valuation or estimation network) and a target network (Q_network_target, also referred to below as the reality or real network): the evaluation network is used for decision-making, and the target network is used for value evaluation. During learning, the evaluation network computes the predicted value with weights θ, and the target network computes the target value with weights θ⁻. The target network is kept frozen for a number of steps, after which the weights of the evaluation network are copied into it. Freezing the target network for a period and then refreshing it with the evaluation network's weights stabilizes training; this copy is performed at a preset frequency.
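The freeze-then-copy schedule can be illustrated with a small, framework-agnostic sketch. It is hypothetical throughout: weights are plain dictionaries, the gradient step stands in for whatever optimizer is actually used, and the class name and `sync_every` parameter (the preset frequency) are illustrative rather than taken from the source.

```python
import copy

class DoubleNetworkSketch:
    """Hypothetical holder for an evaluation (local) and a target network."""

    def __init__(self, weights, sync_every):
        self.local = dict(weights)                # θ: updated on every step
        self.target = copy.deepcopy(self.local)   # θ⁻: frozen between syncs
        self.sync_every = sync_every              # preset copy frequency
        self.steps = 0

    def train_step(self, gradient, lr=0.1):
        # Only the evaluation network's weights move on each step.
        for name, g in gradient.items():
            self.local[name] -= lr * g
        self.steps += 1
        # Every `sync_every` steps, copy θ into θ⁻ (a hard update).
        if self.steps % self.sync_every == 0:
            self.target = copy.deepcopy(self.local)
```

Between copies the target network returns the same values for the same input, which is what keeps the regression target from shifting on every step.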
[0150] As shown in Figure 5, in one specific embodiment, training the reinforcement learning model based on the first sampled action data to obtain the camera scheduling model may include:
[0151] S301. Input the current state information corresponding to the first sampled action data into the valuation network to obtain the valuation output action data.
[0152] Specifically, historical sample data is statistically analyzed at the time period and camera dimensions to determine the frequency of violations and non-violations for each camera in each time period, thus obtaining initialization status information. For example, the initialization status information is shown in Table 2.
[0153]
[0154] Table 2: Initialization Status Information
[0155] The first sampling action data a is used to check the historical sample data, yielding the violation frequency of each camera and of each time period. The initialization state information is then updated with these frequencies to obtain the current state information. For example, when the data of camera 1 in time period 1 on day Z of the historical sample data is checked: if a violation occurred, the violation frequency (hits) is 1 and the no-violation frequency (misses) is 0; if no violation occurred, the violation frequency (hits) is 0 and the no-violation frequency (misses) is 1. The initialization state information is updated accordingly to obtain the current state information s.
[0156] Input the current state information s corresponding to the first sampled action data a into the valuation network to obtain the valuation output action data X.
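The hit/miss bookkeeping in [0155] can be sketched as follows. It assumes, hypothetically, that the state is stored per (camera, time period) pair as `[hits, misses]` in the layout of Table 2, and that one day of history is a mapping from (camera, time period) to whether a violation was recorded. Only combinations flagged 1 in the action data are checked; all others leave the state unchanged.

```python
from collections import defaultdict

def update_state(state, action, history):
    """Check the pairs selected in `action` against one day of history.

    state:   dict (camera, period) -> [hits, misses]
    action:  dict (camera, period) -> 1 (inspect) or 0 (skip)
    history: dict (camera, period) -> True if a violation was recorded
    Returns a new state; the input state is not modified.
    """
    new_state = defaultdict(lambda: [0, 0], {k: list(v) for k, v in state.items()})
    for combo, flag in action.items():
        if not flag:
            continue  # not inspected: no new evidence for this pair
        if history.get(combo, False):
            new_state[combo][0] += 1  # violation found: hits += 1
        else:
            new_state[combo][1] += 1  # no violation: misses += 1
    return dict(new_state)
```

Returning a fresh dictionary keeps s and s′ as separate objects, which matters when both are later fed to the two networks.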
[0157] The policy output by the valuation network is an action, as shown in Table 3. The sum of all sampled actions is denoted P, which is 467 in this example. In the camera dimension, the hits_s_x corresponding to each camera can be obtained, with misses_s_x = P - hits_s_x, so that the N beta distributions in the camera dimension can be updated according to Equation 5. In the time dimension, the hits_t_x corresponding to each time period can be obtained, with misses_t_x = P - hits_t_x, so that the 48 beta distributions in the time dimension can be updated according to Equation 5.
[0158]
[0159]
[0160] Table 3: Valuation Output Action Data from the Valuation Network
[0161] S302. Update the first sample probability distribution based on the valuation output action data to obtain the updated, second sample probability distribution.
[0162] In one specific embodiment, the first sample probability distribution is updated based on the estimated output action data to obtain the updated second sample probability distribution, including:
[0163] (1) Based on the estimated output action data, the historical sample data is checked to obtain the inspection violation frequency of each camera and the inspection violation frequency of each time period.
[0164] (2) Update the violation frequency and non-violation frequency of each camera based on the inspected violation and non-violation frequencies of that camera, so as to update the beta distribution of each camera.
[0165] Specifically, the inspected violation frequency of each camera is added to that camera's violation frequency, and the inspected non-violation frequency of each camera is added to that camera's non-violation frequency. The beta distribution of each camera is then updated using its updated violation frequency and updated non-violation frequency.
[0166] (3) Update the violation frequency and non-violation frequency of each time period based on the inspected violation and non-violation frequencies of that time period, so as to update the beta distribution of each time period.
[0167] Specifically, the inspected violation frequency of each time period is added to that time period's violation frequency, and the inspected non-violation frequency of each time period is added to that time period's non-violation frequency. The beta distribution of each time period is then updated using its updated violation frequency and updated non-violation frequency.
[0168] (4) The updated beta distribution of each camera and the updated beta distribution of each time period are determined as the second sample probability distribution.
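Steps (1) to (4) amount to adding the inspected counts to the two parameters of each beta distribution. The sketch below follows the additive update described in [0165] and [0167]. It assumes, hypothetically, that inspection results are kept per (camera, time period) pair and then summed along the camera axis or the time-period axis; the names are illustrative.

```python
from typing import Dict, Tuple

Combo = Tuple[int, int]  # (camera id, time-period id)

def marginal_counts(per_combo: Dict[Combo, int], axis: int) -> Dict[int, int]:
    """Sum per-(camera, period) counts along one axis: 0 = camera, 1 = period."""
    totals: Dict[int, int] = {}
    for combo, count in per_combo.items():
        totals[combo[axis]] = totals.get(combo[axis], 0) + count
    return totals

def update_beta_params(
    params: Dict[int, Tuple[float, float]],
    inspected_hits: Dict[int, int],
    inspected_misses: Dict[int, int],
) -> Dict[int, Tuple[float, float]]:
    """Beta(α, β) -> Beta(α + inspected hits, β + inspected misses) per key."""
    return {
        key: (alpha + inspected_hits.get(key, 0), beta + inspected_misses.get(key, 0))
        for key, (alpha, beta) in params.items()
    }

# Illustrative inspection results for three (camera, period) pairs.
hits = {(1, 1): 1, (1, 2): 1, (2, 1): 0}
misses = {(1, 1): 0, (1, 2): 0, (2, 1): 1}
camera_params = update_beta_params(
    {1: (3, 7), 2: (1, 9)}, marginal_counts(hits, 0), marginal_counts(misses, 0)
)  # camera 1 gains two hits, camera 2 gains one miss
```

The same two functions, called with `axis=1`, update the time-period distributions, so the camera and time-period updates share one code path.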
[0169] S303. Sampling is performed based on the second sample probability distribution to obtain the second sampling action data.
[0170] The second sample probability distribution includes the updated beta distribution of each camera and the updated beta distribution for each time period.
[0171] (1) Sample the updated beta distribution of each camera to obtain the fourth violation probability parameter of each camera.
[0172] The fourth violation probability parameter represents the likelihood of a violation occurring at a camera and is the ordinate of that camera's updated beta distribution. For example, the beta distribution of each camera can be sampled separately using the toolkit provided by math3.
[0173] (2) Sample the updated beta distribution of each time period to obtain the fifth violation probability parameter of each time period.
[0174] The fifth violation probability parameter is the likelihood of a violation within a time period and is the ordinate of that time period's updated beta distribution.
[0175] (3) Determine the sixth violation probability parameter of each camera in each time period based on the fourth violation probability parameter of each camera and the fifth violation probability parameter of each time period.
[0176] Specifically, the product of the fourth and fifth violation probability parameters is determined as the sixth violation probability parameter for each camera at each time period.
[0177] (4) The sixth violation probability parameters are sorted from largest to smallest, the cameras and time periods corresponding to the top-ranked parameters are determined as the camera-time-period combinations to be inspected, and the second sampling action data is thereby obtained.
[0178] It should be noted that the first sampling action data is obtained by sampling the first sample probability distribution, and the second sampling action data is obtained by sampling the second sample probability distribution; the sampling process is the same in both cases.
[0179] S304. Based on the first sampled action data and the second sampled action data, the estimation network and the reality network are iteratively trained to obtain the camera scheduling model.
[0180] In this embodiment of the application, the estimation network and the reality network are iteratively trained based on the first sampled action data and the second sampled action data to obtain a camera scheduling model, including:
[0181] (1) Obtain the estimated Q value output by the valuation network at each iteration.
[0182] At each iteration, the current state information corresponding to the first sampled action data is input into the estimation network to obtain the estimated Q value, denoted as Q(s,a;θ).
[0183] (2) Input the next state information corresponding to the second sampling action data at each iteration into the real network to obtain the target Q value of the real network.
[0184] As with the first sampling action data, the second sampling action data is used to check the historical sample data, yielding the violation frequency of each camera and of each time period. The current state information is then updated with these frequencies to obtain the next state information. For example, when the data of camera 1 in time period 1 on day N of the historical sample data is checked: if a violation occurred, the violation frequency (hits) is 1 and the no-violation frequency (misses) is 0; if no violation occurred, the violation frequency (hits) is 0 and the no-violation frequency (misses) is 1. The current state information s is updated accordingly to obtain the next state information s′.
[0185] At each iteration, the next state information s′ corresponding to the second sampled action data a′ is input into the real network to obtain the target Q value, denoted Q(s′, a′; θ⁻).
[0186] (3) Calculate the loss based on the estimated Q value, reward value and target Q value and iteratively train the estimated network and the real network to obtain the camera scheduling model.
[0187] In a specific embodiment, the loss is calculated using the loss function shown in formula (6), and the network weights of the valuation network are updated according to the value function shown in formula (7). When the loss is less than a predetermined value, training is stopped, and the camera scheduling model is obtained.
[0188] $L(\theta) = \mathbb{E}\left[\left(\mathrm{TargetQ} - Q(s, a; \theta)\right)^2\right]$
[0189] $\mathrm{TargetQ} = r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \qquad (6)$
[0190]
[0191] Here, r is the reward value and γ is the discount factor.
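Formula (6) can be checked numerically with a short sketch. The Q values are plain lists standing in for network outputs, the batch mean replaces the expectation, and the numbers are illustrative only. The functions are hypothetical and omit the weight update of formula (7).

```python
from typing import Sequence

def td_target(reward: float, next_q_values: Sequence[float], gamma: float) -> float:
    """TargetQ = r + gamma * max over a' of Q(s', a'; θ⁻) from the target network."""
    return reward + gamma * max(next_q_values)

def dqn_loss(target_qs: Sequence[float], predicted_qs: Sequence[float]) -> float:
    """Batch estimate of L(θ) = E[(TargetQ - Q(s, a; θ))^2]."""
    if len(target_qs) != len(predicted_qs) or not predicted_qs:
        raise ValueError("need equal-length, non-empty batches")
    return sum((t - q) ** 2 for t, q in zip(target_qs, predicted_qs)) / len(predicted_qs)

def should_stop(loss: float, threshold: float) -> bool:
    """Training stops once the loss drops below a predetermined value."""
    return loss < threshold

# Illustrative numbers: r = 0.5, gamma = 0.9, target-network outputs for s'.
target = td_target(0.5, [1.0, 2.0, 0.5], gamma=0.9)  # 0.5 + 0.9 * 2.0 = 2.3
loss = dqn_loss([target], [2.0])                      # (2.3 - 2.0)^2 ≈ 0.09
```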
[0192] In one specific embodiment, the reward value is determined based on the similarity between the valuation output action data X of the valuation network and the second sampled action data Y, where Y = a′.
[0193] Specifically, the reward value is calculated according to formula (8).
[0194] $r = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \qquad (8)$
[0195] Here, X represents the output action data of the valuation network, Y represents the second sampled action data, Cov(X, Y) is the covariance between them, and Var(X) and Var(Y) are their respective variances. The correlation coefficient measures the similarity between the valuation network's output and the second sampled action data: the higher the similarity, the larger the coefficient, representing a higher feedback reward and thus a larger reward value r.
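A minimal sketch of this correlation-based reward, assuming X and Y are the two action tables flattened in the same (camera, time period) order into lists of 0/1 flags. The population form of covariance and variance is used (the normalization cancels in the ratio). Returning 0.0 when either table is constant is an assumption, since the coefficient is undefined in that case.

```python
import math

def correlation_reward(x, y):
    """Reward r = Cov(X, Y) / sqrt(Var(X) * Var(Y)) over flattened action tables."""
    if len(x) != len(y) or not x:
        raise ValueError("action tables must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant table as uncorrelated
    return cov / math.sqrt(var_x * var_y)
```

Identical tables give a reward of 1, fully opposite tables give -1, so the reward is bounded regardless of how many cameras and time periods are scheduled.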
[0196] Because the available historical data for each day is incomplete and biased, it cannot accurately reflect the actual distribution of violations across time periods and cameras. A constructed beta distribution is therefore used to compute reward values from the limited historical sample data available. By continuously updating the beta distributions and cross-computing the reward value from the next-state information obtained by distribution sampling and the output action data at each time step, the distribution of violations can be estimated as closely as possible.
[0197] The reward value is thus generated from the environmental response state produced by the valuation network's output together with the updated distribution, rather than from conventional real-environment feedback. Because historical sample data narrows the search space early in training, the valuation network's output at each iteration immediately guides the direction of the next search in that space, which accelerates the initial convergence of DQN learning.
[0198] Ultimately, because training does not perform inference on real video, the problem of excessively long video inference time in reinforcement learning for surveillance-camera violation detection is avoided. The limited video inference data provides prior knowledge for the reinforcement learning model and the beta distributions, and the beta distributions generate additional realistic data samples as their parameters are continuously adjusted; this accelerates learning and addresses the problems of long feedback cycles and long inference time.
[0199] As shown in Figure 6, this application further provides a method for detecting violations, the method including:
[0200] S401. Obtain the current status information of each camera, wherein the current status information includes the number of violations and the number of non-violations of each camera in each time period.
[0201] The current status information can be found in Table 2.
[0202] S402. Input the current status information into the camera scheduling model to obtain camera scheduling action information.
[0203] Here, the camera scheduling model may be any of the camera scheduling models described above.
[0204] S403. Determine the combination of camera time periods to be inspected based on camera scheduling action information. The combination of camera time periods to be inspected includes the camera to be inspected and the corresponding time period to be inspected.
[0205] The camera scheduling action information can be found in Table 3. The cameras and time periods to be inspected are determined as the camera-time period combinations to be inspected. Each camera-time period combination includes both the camera to be inspected and the time period to be inspected. For example, for camera 1 and time period 1, according to Table 3, camera 1 and time period 1 are a camera-time period combination to be inspected; for camera 1 and time period 2, according to Table 3, camera 1 and time period 2 are not a camera-time period combination to be inspected.
[0206] S404. Detect violations in the footage captured by the camera to be inspected during the inspection period and obtain the violation detection results.
[0207] Specifically, the video footage captured by the camera to be inspected during the corresponding inspection period is retrieved for violation detection, and the violation detection results are obtained.
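Steps S401 to S404 can be tied together as below. This is a hypothetical sketch: `scheduling_model` stands in for the trained camera scheduling model (returning an action table like Table 3), and `run_detector` stands in for the violation detector applied to one camera's footage in one time period; neither signature comes from the source.

```python
from typing import Callable, Dict, List, Tuple

Combo = Tuple[int, int]  # (camera id, time-period id)

def detect_violations(
    state: Dict[Combo, List[int]],
    scheduling_model: Callable[[Dict[Combo, List[int]]], Dict[Combo, int]],
    run_detector: Callable[[int, int], bool],
) -> Dict[Combo, bool]:
    """S401-S404: schedule, pick the flagged combinations, inspect only those."""
    actions = scheduling_model(state)                                      # S402
    to_inspect = [combo for combo, flag in actions.items() if flag == 1]   # S403
    # S404: run violation detection only on the scheduled footage.
    return {combo: run_detector(*combo) for combo in sorted(to_inspect)}
```

Only the combinations flagged 1 reach the detector, so the inference budget is spent exactly where the scheduling model places it.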
[0208] To better implement the training method of the camera scheduling model in the embodiments of this application, the embodiments of this application also provide, based on that training method, a training device for the camera scheduling model. As shown in Figure 7, the training device 700 for the camera scheduling model includes:
[0209] Acquisition unit 701 is used to acquire historical sample data from the camera;
[0210] Distribution building unit 702 is used to construct the first sample probability distribution based on historical sample data;
[0211] The sampling unit 703 is used to perform sampling based on the first sample probability distribution to obtain the first sampling action data;
[0212] The model training unit 704 is used to train the reinforcement learning model based on the first sampled action data to obtain the camera scheduling model.
[0213] Optionally, the first sample probability distribution includes the beta distribution of each camera and the beta distribution of each time period within a day. The distribution building unit 702 is used for:
[0214] Based on historical sample data, obtain the violation frequency and non-violation frequency of each camera, as well as the violation frequency and non-violation frequency of each time period;
[0215] The beta distribution of each camera is established based on the violation frequency and non-violation frequency of each camera. The two parameters of the beta distribution of the camera are the violation frequency and non-violation frequency of the camera, respectively.
[0216] The beta distribution for each time period is established based on the frequency of violations and the frequency of no violations. The two parameters of the beta distribution for each time period are the frequency of violations and the frequency of no violations.
[0217] Optionally, the reinforcement learning model includes a valuation network and a reality network, and the model training unit 704 is used for:
[0218] Input the current state information corresponding to the first sampled action data into the valuation network to obtain the valuation output action data;
[0219] The first sample probability distribution is updated based on the valuation output action data to obtain the updated, second sample probability distribution;
[0220] Sampling is performed based on the second sample probability distribution to obtain the second sampling action data;
[0221] The camera scheduling model is obtained by iteratively training the estimation network and the reality network based on the first sampled action data and the second sampled action data.
[0222] Optionally, the model training unit 704 is used for:
[0223] Obtain the estimated Q value output by the valuation network at each iteration;
[0224] Input the next state information corresponding to the second sampling action data at each iteration into the real network to obtain the target Q value of the real network;
[0225] The camera scheduling model is obtained by calculating the loss based on the estimated Q-value, reward value, and target Q-value, and iteratively training the estimated network and the real network.
[0226] Optionally, the model training unit 704 is used for:
[0227] The reward value is determined based on the similarity between the estimated output action data and the second sampled action data.
[0228] Optionally, the sampling unit 703 is used for:
[0229] The beta distribution of each camera is sampled to obtain the first violation probability parameter for each camera;
[0230] The beta distribution of each time period is sampled to obtain the second violation probability parameter for each time period;
[0231] The third violation probability parameter of each camera in each time period is determined based on the first violation probability parameter of each camera and the second violation probability parameter of each time period.
[0232] The third violation probability parameters are sorted from largest to smallest, and the cameras and time periods corresponding to the top-ranked parameters are determined as the camera-time-period combinations to be inspected, thus obtaining the first sampling action data.
[0233] Optionally, the model training unit 704 is used for:
[0234] Based on the estimated output action data, historical sample data is checked to obtain the frequency of violations and the frequency of non-violations detected for each camera, as well as the frequency of violations and the frequency of non-violations detected for each time period.
[0235] The violation frequency and non-violation frequency of each camera are updated based on the violation frequency and non-violation frequency detected by each camera, so as to update the beta distribution of each camera.
[0236] The violation frequency and non-violation frequency of each time period are updated based on the violation and non-violation frequencies detected in that time period, so as to update the beta distribution of each time period.
[0237] The updated beta distributions of each camera and the updated beta distributions of each time period are determined as the second sample probability distribution.
[0238] To better implement the violation detection method in the embodiments of this application, the embodiments of this application also provide, based on that method, a violation detection device. As shown in Figure 8, the violation detection device 800 includes:
[0239] The acquisition unit 801 is used to acquire the current status information of each camera, wherein the current status information includes the violation frequency and non-violation frequency of each camera in each time period;
[0240] The scheduling action unit 802 is used to input the current state information into the camera scheduling model to obtain camera scheduling action information; wherein, the camera scheduling model is any one of the camera scheduling models in the first aspect;
[0241] The determining unit 803 is used to determine the combination of time periods of the camera to be inspected based on the camera scheduling action information. The combination of time periods of the camera to be inspected includes the camera to be inspected and the corresponding time period to be inspected.
[0242] The violation detection unit 804 is used to detect violations in the information captured by the camera to be inspected during the inspection period and obtain the violation detection results.
[0243] This application also provides a computer device that integrates a training device for any of the camera scheduling models provided in this application and / or a detection device for violation events. The computer device includes:
[0244] One or more processors;
[0245] Memory; and
[0246] One or more applications, wherein the applications are stored in the memory and configured to be executed by the processor to implement the method for training a camera scheduling model or the method for detecting violations.
[0247] Figure 9 shows a structural schematic diagram of the computer device involved in the embodiments of this application. Specifically:
[0248] The computer device may include components such as a processor 901 with one or more processing cores, a memory 902 with one or more computer-readable storage media, a power supply 903, and an input unit 904. Those skilled in the art will understand that the computer device structure shown in the figures does not constitute a limitation on the computer device, and may include more or fewer components than shown, or combine certain components, or have different component arrangements. Wherein:
[0249] The processor 901 is the control center of the computer device. It connects various parts of the computer device via various interfaces and lines, and performs various functions and processes data by running or executing software programs and / or modules stored in the memory 902, and by calling data stored in the memory 902, thereby providing overall monitoring of the computer device. Optionally, the processor 901 may include one or more processing cores; the processor 901 may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. Preferably, the processor 901 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It is understood that the aforementioned modem processor may not be integrated into the processor 901.
[0250] The memory 902 can be used to store software programs and modules. The processor 901 executes various functional applications and data processing by running the software programs and modules stored in the memory 902. The memory 902 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function (such as sound playback function, image playback function, etc.), etc.; the data storage area may store data created according to the use of the computer device, etc. In addition, the memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 902 may also include a memory controller to provide the processor 901 with access to the memory 902.
[0251] The computer device also includes a power supply 903 that supplies power to the various components. Preferably, the power supply 903 can be logically connected to the processor 901 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 903 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
[0252] The computer device may also include an input unit 904, which can be used to receive input digital or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
[0253] Although not shown, the computer device may also include a display unit, etc., which will not be described in detail here. Specifically, in this embodiment, the processor 901 in the computer device loads the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions, and the processor 901 runs the application programs stored in the memory 902 to realize various functions, as follows:
[0254] Acquire historical sample data from the camera;
[0255] Construct the first sample probability distribution based on historical sample data;
[0256] Sampling is performed based on the probability distribution of the first sample to obtain the first sampling action data;
[0257] The camera scheduling model is obtained by training the reinforcement learning model based on the first sampled action data;
[0258] Alternatively, obtain the current status information of each camera, which includes the number of violations and the number of non-violations for each camera at each time period;
[0259] Input the current state information into the camera scheduling model to obtain camera scheduling action information; wherein, the camera scheduling model is any of the above-mentioned camera scheduling models.
[0260] The combination of time periods for cameras to be inspected is determined based on camera scheduling action information. The combination of time periods for cameras to be inspected includes the cameras to be inspected and the corresponding time periods to be inspected.
[0261] The system detects violations by analyzing the footage captured by the cameras during the inspection period and obtains the violation detection results.
[0262] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be performed by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
[0263] Therefore, embodiments of this application provide a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), a disk, or an optical disk, etc. A computer program is stored thereon, and the computer program is loaded by a processor to execute the steps in the training method of any of the camera scheduling models provided in embodiments of this application. For example, the computer program loaded by the processor can execute the following steps:
[0264] Acquire historical sample data from the camera;
[0265] Construct the first sample probability distribution based on historical sample data;
[0266] Sampling is performed based on the probability distribution of the first sample to obtain the first sampling action data;
[0267] The camera scheduling model is obtained by training the reinforcement learning model based on the first sampled action data;
[0268] Alternatively, obtain the current status information of each camera, which includes the number of violations and the number of non-violations for each camera at each time period;
[0269] Input the current state information into the camera scheduling model to obtain camera scheduling action information; wherein, the camera scheduling model is any of the above-mentioned camera scheduling models.
[0270] The combination of time periods for cameras to be inspected is determined based on camera scheduling action information. The combination of time periods for cameras to be inspected includes the cameras to be inspected and the corresponding time periods to be inspected.
[0271] The system detects violations by analyzing the footage captured by the cameras during the inspection period and obtains the violation detection results.
[0272] The descriptions of the above embodiments each emphasize different aspects. For any part not described in detail in one embodiment, refer to the relevant descriptions in the other embodiments; details are not repeated here.
[0273] In a specific implementation, each of the above units or structures may be implemented as an independent entity, or any of them may be combined and implemented as one or several entities. For the specific implementation of each unit or structure, refer to the foregoing method embodiments; details are not repeated here.
[0274] For the specific implementation of each of the above operations, refer to the foregoing embodiments; details are not repeated here.
[0275] The training method for a camera scheduling model, the violation detection method, and the corresponding apparatus, computer device, and storage medium provided in the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application; the descriptions of the above embodiments are intended only to help understand the methods and core ideas of this application. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the ideas of this application. Therefore, the content of this specification should not be construed as limiting this application.
Claims
1. A training method for a camera scheduling model, characterized in that the training method comprises: acquiring historical sample data from cameras; constructing a first sample probability distribution based on the historical sample data, the first sample probability distribution comprising a beta distribution for each camera and a beta distribution for each time period within a day, which comprises: obtaining, based on the historical sample data, the violation frequency and non-violation frequency of each camera and the violation frequency and non-violation frequency of each time period; establishing the beta distribution of each camera based on the violation frequency and non-violation frequency of that camera, wherein the two parameters of the camera's beta distribution are its violation frequency and non-violation frequency, respectively; and establishing the beta distribution of each time period based on the violation frequency and non-violation frequency of that time period, wherein the two parameters of the time period's beta distribution are its violation frequency and non-violation frequency, respectively; sampling based on the first sample probability distribution to obtain first sampled action data, which comprises: sampling the beta distribution of each camera to obtain a first violation probability parameter for each camera; sampling the beta distribution of each time period to obtain a second violation probability parameter for each time period; determining a third violation probability parameter for each camera in each time period based on the first violation probability parameter of the camera and the second violation probability parameter of the time period; and sorting the third violation probability parameters in descending order and determining the cameras and time periods corresponding to the first preset number of top-ranked parameters as the camera/time-period combinations to be inspected, thereby obtaining the first sampled action data; and training a reinforcement learning model based on the first sampled action data to obtain the camera scheduling model.
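Claim 1's sampling step resembles Thompson sampling over two independent sets of beta distributions. The Python sketch below illustrates one reading of it. Two assumptions are not stated in the claim and are labelled in the code: each count is incremented by 1 before use, because `random.betavariate` requires strictly positive parameters and a camera or period may have zero recorded violations; and the third violation probability parameter is taken as the product of the camera draw and the period draw, whereas the claim only says it is determined based on both. `k` plays the role of the first preset number, and the function name `sample_action` is hypothetical.

```python
import random
from typing import Dict, List, Tuple

Counts = Tuple[int, int]  # (violation frequency, non-violation frequency)


def sample_action(
    camera_counts: Dict[str, Counts],
    period_counts: Dict[int, Counts],
    k: int,
    rng: random.Random,
) -> List[Tuple[str, int]]:
    """Sample camera/time-period combinations to inspect from per-camera and per-period betas."""
    # First violation probability parameter: one beta draw per camera.
    # ASSUMPTION: +1 on both counts keeps the beta parameters positive when a count is 0.
    p_camera = {c: rng.betavariate(v + 1, n + 1) for c, (v, n) in camera_counts.items()}
    # Second violation probability parameter: one beta draw per time period.
    p_period = {t: rng.betavariate(v + 1, n + 1) for t, (v, n) in period_counts.items()}
    # Third parameter for every (camera, period) pair.
    # ASSUMPTION: product of the two draws; the claim only says "based on" both.
    p_pair = {(c, t): p_camera[c] * p_period[t] for c in p_camera for t in p_period}
    # Sort descending and keep the first k pairs (k = the "first preset number").
    ranked = sorted(p_pair, key=p_pair.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same sample
    cameras = {"cam1": (30, 70), "cam2": (2, 98)}
    periods = {8: (20, 30), 3: (1, 49)}
    print(sample_action(cameras, periods, k=2, rng=rng))
```

Because each call draws fresh samples, cameras and periods with few observations still get inspected occasionally, which is how the sampled actions can supply many varied training samples.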
2. The training method for the camera scheduling model according to claim 1, characterized in that the reinforcement learning model comprises an evaluation network and a reality network (that is, a target network), and training the reinforcement learning model based on the first sampled action data to obtain the camera scheduling model comprises: inputting the current state information corresponding to the first sampled action data into the evaluation network to obtain estimated output action data; updating the first sample probability distribution based on the estimated output action data to obtain an updated second sample probability distribution; sampling based on the second sample probability distribution to obtain second sampled action data; and iteratively training the evaluation network and the reality network based on the first sampled action data and the second sampled action data to obtain the camera scheduling model.
3. The training method for the camera scheduling model according to claim 2, characterized in that iteratively training the evaluation network and the reality network based on the first sampled action data and the second sampled action data to obtain the camera scheduling model comprises: obtaining the estimated Q-value output by the evaluation network at each iteration; inputting, at each iteration, the next state information corresponding to the second sampled action data into the reality network to obtain the target Q-value of the reality network; and calculating a loss based on the estimated Q-value, a reward value, and the target Q-value, and iteratively training the evaluation network and the reality network to obtain the camera scheduling model.
4. The training method for the camera scheduling model according to claim 3, characterized in that, before calculating the loss based on the estimated Q-value, the reward value, and the target Q-value and iteratively training the evaluation network and the reality network to obtain the camera scheduling model, the method further comprises: determining the reward value based on the similarity between the estimated output action data and the second sampled action data.
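Claims 3 and 4 follow the shape of a deep Q-network (DQN) update: an estimated Q-value from the evaluation network, a target Q-value from the reality (target) network on the next state, and a reward. The Python sketch below assumes the standard DQN target, reward plus a discount factor `gamma` times the maximum next-state Q-value, with a squared-error loss, and measures the claim 4 similarity as the Jaccard index between the two sets of (camera, time period) pairs. The discount value, the loss form, the similarity measure, and the function names are all assumptions; the claims specify none of them.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[str, int]  # (camera id, time period index)


def similarity_reward(estimated: Iterable[Pair], sampled: Iterable[Pair]) -> float:
    """Reward for claim 4. ASSUMPTION: Jaccard similarity of the two sets of pairs."""
    a, b = set(estimated), set(sampled)
    if not a and not b:
        return 1.0  # two empty actions are treated as identical
    return len(a & b) / len(a | b)


def td_loss(q_estimated: float, reward: float, q_next: Sequence[float], gamma: float = 0.9) -> float:
    """Squared TD error for claim 3. ASSUMPTION: standard DQN target and squared loss.

    q_estimated: evaluation-network Q-value for the action taken.
    q_next: reality (target) network Q-values for the next state.
    """
    target_q = reward + gamma * max(q_next)
    return (target_q - q_estimated) ** 2


if __name__ == "__main__":
    r = similarity_reward([("cam1", 8), ("cam2", 8)], [("cam1", 8), ("cam1", 9)])
    print(r)  # 1 shared pair out of 3 distinct pairs: 0.3333333333333333
    print(td_loss(q_estimated=0.5, reward=r, q_next=[0.2, 0.4]))
```

Under this reading, the reward is highest when the evaluation network picks the same combinations as the beta-distribution sampler, so the sampler acts as a teacher that can generate training signal without waiting for real detection outcomes.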
5. The training method for the camera scheduling model according to claim 2, characterized in that updating the first sample probability distribution based on the estimated output action data to obtain the updated second sample probability distribution comprises: checking the historical sample data based on the estimated output action data to obtain the violation frequency and non-violation frequency detected for each camera and the violation frequency and non-violation frequency detected for each time period; updating the violation frequency and non-violation frequency of each camera based on those detected for that camera, thereby updating the beta distribution of each camera; updating the violation frequency and non-violation frequency of each time period based on those detected for that time period, thereby updating the beta distribution of each time period; and determining the updated beta distributions of the cameras and the updated beta distributions of the time periods as the second sample probability distribution.
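Claim 5 does not say how the detected frequencies are folded into the existing ones. A natural reading, assumed here, is the conjugate beta-Bernoulli update: each inspected (camera, time period) pair that turns out to contain a violation adds 1 to the first beta parameter of both its camera and its time period, and each clean pair adds 1 to the second. In this Python sketch, `update_counts` is a hypothetical name and `history` is a hypothetical lookup standing in for checking the historical sample data.

```python
from typing import Dict, Iterable, Tuple

Counts = Tuple[int, int]  # (violation frequency, non-violation frequency)
Pair = Tuple[str, int]  # (camera id, time period index)


def update_counts(
    camera_counts: Dict[str, Counts],
    period_counts: Dict[int, Counts],
    inspected: Iterable[Pair],
    history: Dict[Pair, bool],  # hypothetical: True if the historical data shows a violation
) -> Tuple[Dict[str, Counts], Dict[int, Counts]]:
    """Return updated beta parameters; the input dictionaries are left unchanged.

    ASSUMPTION: additive (conjugate beta-Bernoulli) update, one count per inspected pair.
    """
    cameras = dict(camera_counts)
    periods = dict(period_counts)
    for camera, period in inspected:
        hit = int(history[(camera, period)])  # 1 = violation found, 0 = none
        v, n = cameras.get(camera, (0, 0))
        cameras[camera] = (v + hit, n + 1 - hit)
        v, n = periods.get(period, (0, 0))
        periods[period] = (v + hit, n + 1 - hit)
    return cameras, periods


if __name__ == "__main__":
    cams, pers = update_counts(
        {"cam1": (5, 15)}, {8: (3, 7)},
        inspected=[("cam1", 8), ("cam2", 8)],
        history={("cam1", 8): True, ("cam2", 8): False},
    )
    print(cams)  # {'cam1': (6, 15), 'cam2': (0, 1)}
    print(pers)  # {8: (4, 8)}
```

The returned counts can be passed straight back to the same per-camera and per-period beta sampling used for the first distribution to draw the second sampled action data.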
6. A violation detection method, characterized in that the detection method comprises: obtaining current state information of each camera, wherein the current state information includes the number of violations and the number of non-violations of each camera in each time period; inputting the current state information into a camera scheduling model to obtain camera scheduling action information, wherein the camera scheduling model is the camera scheduling model obtained by the training method according to any one of claims 1 to 5; determining, based on the camera scheduling action information, camera/time-period combinations to be inspected, wherein each combination includes a camera to be inspected and its corresponding time period to be inspected; and performing violation event detection on the information captured by each camera to be inspected during its inspection period to obtain violation event detection results.
7. A training device for a camera scheduling model, characterized in that the training device comprises: an acquisition unit configured to acquire historical sample data from cameras; a distribution construction unit configured to construct a first sample probability distribution based on the historical sample data, the first sample probability distribution comprising a beta distribution for each camera and a beta distribution for each time period within a day, the distribution construction unit being specifically configured to: obtain, based on the historical sample data, the violation frequency and non-violation frequency of each camera and the violation frequency and non-violation frequency of each time period; establish the beta distribution of each camera based on the violation frequency and non-violation frequency of that camera, wherein the two parameters of the camera's beta distribution are its violation frequency and non-violation frequency, respectively; and establish the beta distribution of each time period based on the violation frequency and non-violation frequency of that time period, wherein the two parameters of the time period's beta distribution are its violation frequency and non-violation frequency, respectively; a sampling unit configured to sample based on the first sample probability distribution to obtain first sampled action data, the sampling unit being specifically configured to: sample the beta distribution of each camera to obtain a first violation probability parameter for each camera; sample the beta distribution of each time period to obtain a second violation probability parameter for each time period; determine a third violation probability parameter for each camera in each time period based on the first violation probability parameter of the camera and the second violation probability parameter of the time period; and sort the third violation probability parameters in descending order and determine the cameras and time periods corresponding to the first preset number of top-ranked parameters as the camera/time-period combinations to be inspected, thereby obtaining the first sampled action data; and a model training unit configured to train a reinforcement learning model based on the first sampled action data to obtain the camera scheduling model.
8. A violation detection device, characterized in that the detection device comprises: an acquisition unit configured to acquire current state information of each camera, wherein the current state information includes the violation frequency and non-violation frequency of each camera in each time period; a scheduling action unit configured to input the current state information into a camera scheduling model to obtain camera scheduling action information, wherein the camera scheduling model is the camera scheduling model obtained by the training method according to any one of claims 1 to 5; a determining unit configured to determine, based on the camera scheduling action information, camera/time-period combinations to be inspected, wherein each combination includes a camera to be inspected and its corresponding time period to be inspected; and a violation detection unit configured to perform violation detection on the information captured by each camera to be inspected during its inspection period to obtain violation detection results.
9. A computer device, characterized in that the computer device comprises: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to implement the training method for the camera scheduling model according to any one of claims 1 to 5 or the violation detection method according to claim 6.
10. A computer-readable storage medium, characterized in that it stores a computer program, which is loaded by a processor to execute the steps of the training method for the camera scheduling model according to any one of claims 1 to 5 or the violation detection method according to claim 6.