The invention provides a computer-aided translation method and system for voice sequences, together with a visual terminal, and belongs to the technical field of computer-aided translation. The method comprises the following steps: translating a voice sequence in a first (source) language into a text sequence in a second (target) language, displaying a target text string in a predetermined picture format, and, at the same time, broadcasting the output text sequence in the second language as speech. The system comprises a recording storage subsystem, a pause point detection subsystem, a sequence segmentation subsystem, a target voice string recognition subsystem, a voice-to-text subsystem, an abstract generation subsystem, a text translation subsystem, a display subsystem, a voice broadcast subsystem, a sequence merging subsystem, a target voice string database and a target text string picture database. The invention further provides a visual terminal for implementing the method. The method enables key target voice strings in a voice input sequence to be quickly recognized, translated and displayed, so that the translation results remain readable and relatively complete while the real-time performance of voice translation is preserved.
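The abstract does not state how the listed subsystems interact, so the following Python sketch is only one possible reading of the data flow, not the claimed implementation. It assumes that pause points are found as runs of low-energy frames, that the voice sequence is cut at those points, and that each segment is either matched against the target voice string database (and shown as a stored picture) or passed through voice-to-text and text translation before the results are merged in order. The function names, the energy threshold, the minimum pause length and the callables `recognize_target`, `transcribe` and `translate` are all hypothetical placeholders introduced for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def find_pause_points(energy: Sequence[float], threshold: float, min_len: int) -> List[int]:
    """Return the midpoint index of each run of at least `min_len` low-energy frames.

    Assumption: a "pause" is a run of frames whose energy is below `threshold`.
    """
    pauses: List[int] = []
    run_start: Optional[int] = None
    # A high-energy sentinel closes a low-energy run that reaches the end.
    for i, e in enumerate(list(energy) + [float("inf")]):
        if e < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start + i) // 2)
            run_start = None
    return pauses


def split_at(frames: Sequence[float], points: Sequence[int]) -> List[List[float]]:
    """Cut the frame sequence at the given pause points, dropping empty pieces."""
    bounds = [0, *points, len(frames)]
    return [list(frames[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]


def translate_sequence(
    segments: Sequence[Sequence[float]],
    recognize_target: Callable[[Sequence[float]], Optional[str]],  # hypothetical recognizer
    picture_db: Dict[str, str],                                    # key -> stored picture
    transcribe: Callable[[Sequence[float]], str],                  # hypothetical voice-to-text
    translate: Callable[[str], str],                               # hypothetical text translation
) -> List[Tuple[str, str]]:
    """Handle each segment in order and merge the outputs into one list."""
    outputs: List[Tuple[str, str]] = []
    for seg in segments:
        key = recognize_target(seg)
        if key is not None and key in picture_db:
            # Known target voice string: reuse the stored picture directly.
            outputs.append(("picture", picture_db[key]))
        else:
            # Otherwise fall back to voice-to-text followed by translation.
            outputs.append(("text", translate(transcribe(seg))))
    return outputs


# Usage with toy data and stub callables (all values are illustrative).
energy = [5, 6, 0, 0, 0, 7, 8, 0, 9]
points = find_pause_points(energy, threshold=1, min_len=2)
segments = split_at(energy, points)
result = translate_sequence(
    segments,
    recognize_target=lambda seg: "hello" if len(seg) == 3 else None,
    picture_db={"hello": "hello.png"},
    transcribe=lambda seg: f"{len(seg)} frames",
    translate=lambda text: text.upper(),
)
```

In this reading, segments that match an entry in the target voice string database skip transcription and translation entirely, which is one way the quick recognition and display of key strings described above could be obtained; the abstract generation and voice broadcast subsystems are omitted from the sketch.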