The invention discloses a speech recognition method and system based on a triggered non-autoregressive model. The method comprises the following steps: S11, extracting an acoustic feature sequence; S12, generating a convolutional downsampling sequence; S13, generating an acoustic encoding state sequence; S14, calculating the probability distribution of the predicted tokens and the connectionist temporal classification (CTC) loss; S15, calculating the positions and the number of peaks; S16, calculating a cross-entropy loss with an acoustic decoder; S17, calculating gradients from the joint loss formed by the CTC loss and the cross-entropy loss, and performing back propagation; and S18, repeating steps S12 to S17 until training is complete. The system comprises an acoustic feature sequence extraction module, a convolutional downsampling module, an acoustic encoder, a CTC module, an acoustic decoder and a joint loss calculation module, connected in sequence. The CTC module comprises a linear transformation module, a CTC loss calculation module and a peak extraction module.
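The loss-related steps S14 to S17 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: vocabulary index 0 is assumed to be the CTC blank; a "peak" is assumed to be a frame where the frame-level most probable label is non-blank and differs from the previous frame's label (so the peak count equals the greedy CTC output length); the CTC weight of 0.3 in `joint_loss` is hypothetical; and nested lists stand in for tensors. The gradient computation and back propagation of S17 would in practice be handled by an autograd framework and are not shown.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol
NEG_INF = float("-inf")


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax, standing in for the linear layer + softmax (S14)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def logsumexp(xs: Sequence[float]) -> float:
    """log(sum(exp(x))) that tolerates all-(-inf) inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs: List[List[float]], target: Sequence[int], blank: int = BLANK) -> float:
    """CTC negative log-likelihood via the forward (alpha) recursion (S14)."""
    ext = [blank]
    for label in target:
        ext += [label, blank]  # interleave blanks: [b, y1, b, y2, ..., b]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same extended symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one symbol
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + frame[ext[s]]
        alpha = new
    tail = alpha[-2:] if S > 1 else alpha  # end on the last label or the trailing blank
    return -logsumexp(tail)


def find_peaks(log_probs: List[List[float]], blank: int = BLANK) -> List[int]:
    """Frame indices where a new non-blank label starts (S15, assumed peak rule)."""
    peaks, prev = [], blank
    for t, frame in enumerate(log_probs):
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank and best != prev:
            peaks.append(t)
        prev = best
    return peaks


def cross_entropy(dec_log_probs: List[List[float]], target: Sequence[int]) -> float:
    """Mean token-level cross entropy of the decoder's per-peak outputs (S16)."""
    return -sum(row[y] for row, y in zip(dec_log_probs, target)) / len(target)


def joint_loss(ctc: float, ce: float, ctc_weight: float = 0.3) -> float:
    """Weighted sum of the CTC and cross-entropy losses (S17); the weight is hypothetical."""
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```

For example, with two frames of uniform probability over {blank, 1} and target `[1]`, three alignments ("1 1", "blank 1", "1 blank") each have probability 1/4, so `ctc_loss` returns `-log(0.75)`. The number of peaks, `len(find_peaks(...))`, is what a triggered non-autoregressive decoder could use as its predicted output length, with the peak positions selecting which encoder states trigger each decoder query.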