The invention belongs to the field of behavior recognition and particularly relates to a behavior feature extraction method, system, and device based on hybrid learning in the time-space and frequency domains, and aims to solve the problem of low precision in skeleton-based behavior feature extraction. The method comprises the steps of: obtaining a skeleton-based video behavior sequence and extracting a time-space domain behavior feature map through a transformation network; inputting the time-space domain behavior feature map into a frequency domain attention network, performing frequency selection, inverting the result to the time-space domain, and adding the obtained behavior feature map to the time-space domain behavior feature map; synchronously performing local and non-local reasoning, followed by high-level local reasoning; and globally pooling the time-space domain behavior feature map obtained through reasoning to obtain the behavior feature vector of the video behavior sequence. The method can be applied to behavior classification, behavior detection, and the like. According to the method, effective frequency patterns are adaptively selected in the frequency domain, and a network with local affinity fields and non-local affinity fields is adopted in the time-space domain for space-time reasoning, so that local details and non-local semantic information can be mined synchronously and behavior recognition precision is effectively improved.
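
The frequency selection step and the final global pooling step can be illustrated with a minimal sketch. It is not the claimed implementation: the feature map layout (channels, frames, joints), the use of a real FFT along the frame axis only, the sigmoid gate, and the names frequency_attention, gate_logits, and global_pool are all assumptions made for illustration. In a trained network the gate scores would be learned; here they are passed in as a plain array. The local and non-local reasoning steps are not shown.

```python
import numpy as np


def frequency_attention(feat, gate_logits):
    """Hypothetical frequency-selection block with a residual connection.

    feat:        real array of shape (C, T, V) = (channels, frames, joints).
    gate_logits: real array of shape (C, T // 2 + 1); stands in for the
                 learnable attention scores (assumption, not from the source).
    """
    n_frames = feat.shape[1]
    # Transform each channel/joint trajectory to the frequency domain (time axis).
    spectrum = np.fft.rfft(feat, axis=1)                 # (C, T//2+1, V), complex
    # Soft frequency selection: a sigmoid gate in (0, 1) per channel and frequency bin.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    selected = spectrum * gate[:, :, None]
    # Invert back to the time-space domain.
    restored = np.fft.irfft(selected, n=n_frames, axis=1)
    # Add the frequency-filtered map to the original time-space map.
    return feat + restored


def global_pool(feat):
    """Average over frames and joints to get one feature value per channel."""
    return feat.mean(axis=(1, 2))


# Usage on random data standing in for a skeleton feature map.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 8, 5))     # 4 channels, 8 frames, 5 joints
logits = rng.standard_normal((4, 8 // 2 + 1))    # placeholder for learned scores
fused = frequency_attention(feature_map, logits)
vector = global_pool(fused)
print(fused.shape, vector.shape)                 # (4, 8, 5) (4,)
```

With a gate close to 1 for every bin, the block returns roughly twice the input; with a gate close to 0, it returns the input unchanged. Intermediate gate values attenuate individual frequency bins, which is the sense in which frequency patterns are selected adaptively in this sketch.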