The invention discloses a speech recognition method and system based on a
triggered non-autoregressive model. The method comprises the following steps:
S11, extracting an acoustic feature sequence; S12, generating a convolutional
downsampling sequence; S13, generating an acoustic encoding state sequence;
S14, calculating the probability distribution of the predicted tokens and the
connectionist temporal classification (CTC) loss; S15, calculating the
positions and the number of peaks; S16, calculating the cross-entropy loss
with an acoustic decoder; S17, calculating a gradient from the joint loss of
the CTC loss and the cross-entropy loss, and performing back propagation; and
S18, repeating steps S12 to S17 until training is completed. The system
comprises an acoustic feature sequence extraction module, a convolutional
downsampling module, an acoustic encoder, a CTC module, an acoustic decoder
and a joint loss calculation module, which are connected in sequence. The CTC
module comprises a linear transformation module, a CTC loss calculation
module and a peak extraction module.
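
Steps S15 and S17 can be illustrated with a minimal Python sketch. The abstract does not state how a peak is detected or how the two losses are weighted, so everything here is an assumption made for illustration: the function names `extract_peaks` and `joint_loss`, the blank index 0, the rule that a frame is a peak when its non-blank probability exceeds `threshold`, and the interpolation weight `alpha`.

```python
from typing import List, Tuple


def extract_peaks(
    posteriors: List[List[float]],
    blank_id: int = 0,
    threshold: float = 0.5,
) -> Tuple[List[int], int]:
    """Return (peak frame indices, peak count) from per-frame posteriors.

    Assumption: a frame counts as a peak when its total non-blank
    probability, 1 - p(blank), exceeds `threshold`.
    """
    positions = [
        t for t, frame in enumerate(posteriors)
        if 1.0 - frame[blank_id] > threshold
    ]
    return positions, len(positions)


def joint_loss(ctc_loss: float, ce_loss: float, alpha: float = 0.5) -> float:
    """Weighted sum of the two losses; `alpha` is a hypothetical weight."""
    return alpha * ctc_loss + (1.0 - alpha) * ce_loss


# Toy posteriors over 3 symbols (index 0 = blank) for 5 encoder frames.
toy = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.95, 0.03, 0.02],
    [0.10, 0.10, 0.80],
    [0.60, 0.20, 0.20],
]
positions, count = extract_peaks(toy)
```

With the toy input, frames 1 and 3 are the only ones whose non-blank probability exceeds 0.5, so `positions` is `[1, 3]` and `count` is 2. In a real training loop the two loss values would be tensors produced by the CTC module and the acoustic decoder, and the gradient of their weighted sum would be back-propagated.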