The invention discloses an end-to-end voice keyword recognition method, together with a corresponding device and equipment. Following an end-to-end approach, the method uses a pre-built keyword recognition network that maps the input features directly to the recognition target. This makes the recognition process simpler and more efficient and avoids the compounding of adverse effects. It also makes the keyword recognition network easier to optimize globally and effectively reduces development cost, so the method has high practical value in real business scenarios. The invention also improves the strategy for acquiring recognition features, so that the features fully represent the pronunciation characteristics of the business scenario and capture more latent key information. In addition, the proposed keyword recognition network architecture exploits contextual information from an acoustic perspective. This fundamentally overcomes the shortcoming of existing schemes, which recognize keywords only from isolated pronunciation samples, and further markedly improves the effectiveness of locating keywords in audio.