The invention belongs to the field of behavior recognition, and particularly relates to a behavior feature extraction method, system, and device based on space-time-frequency domain hybrid learning. It aims to solve the problem of low skeleton-based behavior 
feature extraction precision. The method comprises the steps of: obtaining a skeleton-based video behavior sequence, and extracting a time-space domain behavior feature map through a transformation network; inputting the time-space domain behavior feature map into a frequency domain attention network, performing frequency selection, inverting the result back to the time-space domain, and adding the resulting behavior feature map to the original time-space domain behavior feature map; synchronously performing local and non-local reasoning, and performing high-level local reasoning; and globally 
pooling the time-space domain behavior feature map obtained through reasoning to obtain the behavior 
feature vector of the video behavior sequence. The method can be applied to behavior classification, behavior detection, and the like. According to the method, effective frequency modes are adaptively selected in the frequency domain, and a network with local affinity fields and non-local affinity fields is adopted in the time-space domain for space-time reasoning, so that local details and non-local 
semantic information can be synchronously mined, and therefore the 
behavior recognition precision is effectively improved.
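
The frequency selection, residual addition, and global pooling steps described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a real FFT along the time axis as the frequency transform, a sigmoid gate over frequency bins as the attention mechanism, and mean pooling as the global pooling; the source does not specify any of these. The names frequency_attention, global_pool, and gate_logits and the feature map layout (channels, frames, joints) are hypothetical.

```python
import numpy as np


def frequency_attention(x, gate_logits):
    """Frequency selection with a residual connection (illustrative only).

    x           : real feature map of shape (C, T, V) = (channels, frames, joints)
    gate_logits : hypothetical learnable logits of shape (C, T // 2 + 1),
                  one per channel and frequency bin (assumption)
    """
    # Transform to the frequency domain along the time axis (assumed transform).
    spectrum = np.fft.rfft(x, axis=1)                  # (C, T//2 + 1, V), complex
    # Soft frequency selection: sigmoid gate in (0, 1) per frequency bin.
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    selected = spectrum * gate[:, :, None]
    # Invert back to the time-space domain.
    restored = np.fft.irfft(selected, n=x.shape[1], axis=1)
    # Add the frequency-filtered map to the original time-space feature map.
    return x + restored


def global_pool(x):
    """Global average pooling over frames and joints, giving a (C,) vector."""
    return x.mean(axis=(1, 2))


# Toy input: 4 channels, 16 frames, 25 skeleton joints (arbitrary sizes).
rng = np.random.default_rng(0)
C, T, V = 4, 16, 25
features = rng.standard_normal((C, T, V))
logits = rng.standard_normal((C, T // 2 + 1))

fused = frequency_attention(features, logits)
behavior_vector = global_pool(fused)
print(behavior_vector.shape)
```

In a trained network the gate logits would be produced by learned layers rather than sampled at random, and the local and non-local reasoning stages would sit between the residual addition and the pooling step.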