
Video event recognition method based on deep residual long-short term memory network

A long-short-term memory and video event recognition technology, applied in character and pattern recognition, biological neural network models, computer components, etc. It addresses problems such as small inter-class distance between events, gradient disappearance, and changes in camera viewing angle, and achieves good generalization ability and discrimination, shortening the intra-class distance and improving discriminability.

Inactive Publication Date: 2018-11-06
SUZHOU UNIV +2

AI Technical Summary

Problems solved by technology

With the wide application of video surveillance in real life, surveillance video event recognition has received extensive attention and a series of research results have been achieved. However, event recognition in surveillance video still faces great challenges and difficulties: in natural scenes, factors such as complex video backgrounds, severe object occlusion in the event area, and changes in camera viewing angle lead to small inter-class distances and large intra-class distances.
[0003] In the existing technology, the traditional solutions to the difficulty of event recognition in surveillance video are methods based on the visual bag-of-words model and methods based on motion trajectories, but recognition accuracy is hard to improve further with such hand-crafted features. As deep learning became a research hotspot in artificial intelligence, it began to be applied to surveillance video event detection, behavior recognition, and related fields; a representative example is the two-stream CNN for behavior recognition, in which the spatial CNN uses the static frame information of the video and the temporal CNN uses its optical flow information. However, methods represented by the two-stream CNN exploit only the short-term dynamic features of the video, not its long-term dynamic features, so defects remain in video event recognition. The long-term recurrent convolutional network (LRCN) makes up for this defect: LRCN uses a CNN to extract features, which are then fed into an LSTM network to obtain the recognition result. The LSTM, or long short-term memory network, can recursively learn long-term dynamic features from an input sequence, so it can handle tasks with typical temporal structure, such as speech recognition and behavior recognition. The recognition ability of CNN and LSTM networks can be further improved through deep architectures; however, for both CNN and LSTM, the gradient disappearance problem arises as network depth increases, making deeper networks difficult to train.
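To make the LRCN idea above concrete, here is a minimal sketch in PyTorch (an assumption; the patent names no framework): a per-frame CNN extracts features, an LSTM recurses over them to capture long-term dynamics, and a linear head classifies the event. The ResNet-18 backbone, dimensions, and class count are illustrative, not taken from the source.

```python
# Minimal LRCN-style sketch: CNN per frame -> LSTM over time -> classifier.
# Backbone, feature dim, hidden dim, and class count are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class LRCN(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        cnn = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # drop the FC head
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        x = self.cnn(clips.flatten(0, 1))          # (B*T, feat_dim, 1, 1)
        x = x.flatten(1).view(b, t, -1)            # (B, T, feat_dim)
        out, _ = self.lstm(x)                      # recurse over the frame sequence
        return self.fc(out[:, -1])                 # classify from the last time step

logits = LRCN()(torch.randn(2, 8, 3, 112, 112))    # two clips of eight frames each
```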




Specific Embodiments

[0025] With reference to the accompanying drawings, an embodiment of the video event recognition method based on a deep residual long short-term memory network of the present invention comprises the following steps:

[0026] Step 1) Design of the spatio-temporal feature data connection unit: the spatio-temporal feature data are parsed synchronously by LSTM units to form a spatio-temporal feature data connection unit, DLSTM;

[0027] As shown in Figure 1, the specific steps include:

[0028] (1) Receive data: first, two LSTM units are used, denoted SLSTM and TLSTM respectively; SLSTM receives the feature h_SL from the spatial CNN network, and TLSTM receives the feature h_TL from the temporal CNN network;

[0029] (2) Activation function conversion: before receiving input, the LSTM units process the input data with a nonlinear activation function; using the ReLU activation function, the inputs to SLSTM and TLSTM are transformed by ReLU ...
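A hedged sketch of the DLSTM connection unit described in steps (1) and (2), again in PyTorch: SLSTM consumes ReLU-transformed spatial features h_SL, TLSTM consumes ReLU-transformed temporal features h_TL, and the two streams are parsed synchronously step by step. Concatenating the two hidden states as the fused output is an assumption, since the excerpt truncates before specifying the exact fusion.

```python
# Sketch of the DLSTM unit: two LSTM cells parse the spatial and temporal
# feature streams in lockstep. The concatenation fusion is an assumption.
import torch
import torch.nn as nn

class DLSTM(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.slstm = nn.LSTMCell(in_dim, hidden_dim)   # spatial-stream LSTM
        self.tlstm = nn.LSTMCell(in_dim, hidden_dim)   # temporal-stream LSTM
        self.relu = nn.ReLU()

    def forward(self, h_sl_seq, h_tl_seq):             # each: (B, T, in_dim)
        b, t, _ = h_sl_seq.shape
        hs = cs = ht = ct = None
        fused = []
        for step in range(t):                          # synchronous parsing
            s_in = self.relu(h_sl_seq[:, step])        # activation conversion
            t_in = self.relu(h_tl_seq[:, step])
            hs, cs = self.slstm(s_in, None if hs is None else (hs, cs))
            ht, ct = self.tlstm(t_in, None if ht is None else (ht, ct))
            fused.append(torch.cat([hs, ht], dim=1))   # assumed fusion
        return torch.stack(fused, dim=1)               # (B, T, 2*hidden_dim)
```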


Abstract

The invention discloses a video event recognition method based on a deep residual long-short term memory network, which comprises: 1) the design of a spatio-temporal feature data connection unit, in which spatio-temporal feature data are parsed synchronously by long short-term memory (LSTM) units to form a spatio-temporal feature data connection unit DLSTM (double-LSTM), highlighting the consistency of spatial and temporal information; 2) the design of a DU-DLSTM (dual unidirectional DLSTM) structure, which widens the network and enlarges the feature selection range; 3) the design of an RDU-DLSTM (residual dual unidirectional DLSTM) module, which overcomes gradient disappearance in deeper networks; and 4) the design of a 2C-softmax objective function, which shortens the intra-class distance while enlarging the inter-class distance. The method solves the gradient disappearance problem by constructing a deep residual network framework, and improves video event recognition accuracy through the consistent fusion of temporal-network and spatial-network features.
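The abstract states that the 2C-softmax objective shortens the intra-class distance while enlarging the inter-class distance, but gives no formula. The sketch below is therefore only an assumption of that goal: softmax cross-entropy (which separates classes) combined with a center-loss-style penalty that pulls each feature toward its class center. The name CenterSoftmaxLoss and the weight lam are hypothetical, not the patent's.

```python
# Assumed stand-in for the 2C-softmax goal: cross-entropy for inter-class
# separation plus a learnable-center penalty for intra-class compactness.
import torch
import torch.nn as nn

class CenterSoftmaxLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, lam=0.1):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.ce = nn.CrossEntropyLoss()
        self.lam = lam                                  # weight of the center term

    def forward(self, feats, logits, labels):
        ce = self.ce(logits, labels)                    # enlarges inter-class distance
        centers = self.centers[labels]                  # (B, feat_dim) class centers
        intra = (feats - centers).pow(2).sum(1).mean()  # shortens intra-class distance
        return ce + self.lam * intra
```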

Description

Technical Field

[0001] The invention relates to video event recognition technology, in particular to a video event recognition method based on a deep residual long-short-term memory network.

Background Technique

[0002] Video event recognition refers to recognizing the spatio-temporal visual patterns of events from videos. With the wide application of video surveillance in real life, surveillance video event recognition has received extensive attention and a series of research results have been achieved. However, event recognition in surveillance video still faces great challenges and difficulties: in natural scenes, factors such as complex video backgrounds, severe object occlusion in the event area, and changes in camera viewing angle lead to small inter-class distances and large intra-class distances.

[0003] In the existing technology, in order to solve the problem of difficult event recognition in surveillance video, the traditional solution...


Application Information

IPC(8): G06K9/00
CPC: G06N3/049; G06V20/44; G06V20/41; G06V20/46; G06N3/045
Inventors: 龚声蓉, 李永刚, 刘纯平, 季怡, 曹李军, 王朝晖
Owner: SUZHOU UNIV