A multimedia entertainment device enables a user to control the sound elements of a music or sound program while a video correlated to those sound elements, such as a musical performance, is displayed. The user interacts with triggers, such as laser beams that the user's fingers can interrupt, to play music, such as particular instruments of a soundtrack. For instance, a music video or concert has a video track and a sound track. The video track is displayed, and the user controls audio playback of the sound track by interrupting the beams, each beam being associated with a different instrument. This allows the user to play the multimedia device along with a displayed video performance, in unison or synchronization with one or more musicians shown on a display, such as a TV, monitor or video projection system. The user's play may be scored as a function of how accurately the user engages the triggers in time with the displayed video image. For instance, a user can strum a trigger associated with a guitar program in unison with a guitarist on the display. The music created by the user interacting with multiple triggers is sympathetic and always synchronized to the video performance. If the user misses the timing of a note, the sound is not played. In another version, a video program, such as that associated with a video game, is displayed, and the user interacts by playing the triggers that control sound elements associated with the displayed video game.
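The timing behavior described above (a note sounds only when its trigger is engaged in time, and play is scored by accuracy) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the source: the functions `match_note` and `play_along`, the 0.15-second tolerance window, the representation of each instrument as a sorted list of note times, and the score defined as the fraction of notes hit. A matched note is emitted at its scheduled time rather than at the trigger time, which is one possible way to keep the output synchronized to the video.

```python
from bisect import bisect_left

# Hypothetical tolerance: how far (in seconds) a trigger may be from a note.
TOLERANCE_S = 0.15


def match_note(note_times, t, tolerance=TOLERANCE_S):
    """Return the index of the note nearest to time t, or None if too far."""
    i = bisect_left(note_times, t)
    # Only the notes just before and just after t can be the nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(note_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(note_times[j] - t))
    return best if abs(note_times[best] - t) <= tolerance else None


def play_along(tracks, events, tolerance=TOLERANCE_S):
    """Gate trigger events against the sound track and score the result.

    tracks: {instrument: sorted list of note times in seconds}
    events: [(instrument, trigger time in seconds)], one per beam interruption
    Returns (played, score), where played lists (instrument, note time).
    """
    hit = {name: set() for name in tracks}
    played = []
    for instrument, t in events:
        idx = match_note(tracks.get(instrument, []), t, tolerance)
        # A mistimed trigger plays nothing; a note is played at most once.
        if idx is None or idx in hit[instrument]:
            continue
        hit[instrument].add(idx)
        # Emit the note at its scheduled time so it stays in sync with video.
        played.append((instrument, tracks[instrument][idx]))
    total = sum(len(times) for times in tracks.values())
    score = sum(len(s) for s in hit.values()) / total if total else 0.0
    return played, score


if __name__ == "__main__":
    tracks = {"guitar": [1.0, 2.0, 3.0], "drums": [1.5]}
    events = [("guitar", 1.05), ("guitar", 2.4), ("drums", 1.45)]
    print(play_along(tracks, events))
```

In this sketch each beam maps to one instrument key in `tracks`. A real device would presumably read trigger times from beam sensors and note times from the sound track, and could weight the score by how close each trigger falls to its note instead of counting hits equally.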