The invention discloses a mobile multi-modal interaction method and device based on augmented reality. The method comprises the following steps: displaying a human-computer interaction interface by means of augmented reality, wherein the augmented reality scene comprises interaction information such as virtual objects; receiving an interaction instruction issued by a user through gesture and voice, interpreting the semantics of the different modalities through a multi-modal fusion method, and fusing the gesture and voice modal data to generate a multi-modal fused interaction instruction; and, after the user interaction instruction has been executed, returning the execution result to the augmented reality virtual scene and providing feedback through changes in the scene. The device of the invention comprises a gesture sensor, a PC (Personal Computer), a microphone, optical see-through augmented reality display equipment, and a WiFi (Wireless Fidelity) router. The mobile multi-modal interaction method and device embody a human-centered design philosophy; they are natural and intuitive, reduce the learning burden on the user, and improve interaction efficiency.
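
The fusion of gesture and voice data into a single interaction instruction can be illustrated with a minimal sketch. The abstract does not specify the fusion algorithm, so the example below assumes a simple time-window late fusion: the voice channel supplies the action, and the gesture channel supplies the target object when the utterance does not name one (for example, "delete this" while pointing). All class names, field names, and the 1.5-second window are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GestureEvent:
    timestamp: float        # seconds since session start
    kind: str               # e.g. "point", "grab" (hypothetical labels)
    target: Optional[str]   # id of the virtual object the gesture selects


@dataclass(frozen=True)
class VoiceEvent:
    timestamp: float        # seconds since session start
    action: str             # e.g. "delete", "move" (hypothetical labels)
    target: Optional[str]   # None when the utterance is deictic ("delete this")


@dataclass(frozen=True)
class FusedInstruction:
    action: str
    target: str


def fuse(
    gesture: Optional[GestureEvent],
    voice: Optional[VoiceEvent],
    max_gap: float = 1.5,   # assumed tolerance between the two modalities
) -> Optional[FusedInstruction]:
    """Combine one gesture and one voice event into a single instruction.

    Assumed rule set (not taken from the source):
    - the voice event always provides the action;
    - an explicitly spoken target takes precedence;
    - otherwise the gesture target is used, but only if both events
      occurred within `max_gap` seconds of each other.
    Returns None when no complete instruction can be formed.
    """
    if voice is None:
        return None
    if voice.target is not None:
        return FusedInstruction(voice.action, voice.target)
    if gesture is None or gesture.target is None:
        return None
    if abs(gesture.timestamp - voice.timestamp) > max_gap:
        return None
    return FusedInstruction(voice.action, gesture.target)


if __name__ == "__main__":
    pointing = GestureEvent(timestamp=10.2, kind="point", target="cube_1")
    spoken = VoiceEvent(timestamp=10.6, action="delete", target=None)
    print(fuse(pointing, spoken))
```

In this sketch, pointing at `cube_1` and saying "delete this" 0.4 seconds later yields a fused instruction with action `delete` and target `cube_1`; the same utterance spoken long after the gesture yields no instruction. A real system would likely weight recognizer confidence scores as well, which is omitted here.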