The present invention tightly integrates and leverages media capture systems to realize a ubiquitous, collaborative, multi-user environment, referred to as TalkingPaper, taking full advantage of what each medium offers and further enhancing their respective functions and utilities. The invention comprises a paper-and-pen-based knowledge capture subsystem A and a knowledge processing subsystem B. A user uses a subsystem A-compliant pen to sketch on a sheet of subsystem A-enabled paper. The sketching activity, speech, and gestures are recorded by the pen, which sends the captured data to a multi-threaded application server of subsystem B for further processing. The server converts and indexes the data to associate and synchronize each line stroke with its corresponding audio time frames and to enable interaction with the content. Thus, users can easily find, select, and replay a session with synchronized speech, text, and video to understand the rationale behind particular ideas or decisions.
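The stroke-to-audio indexing performed by the server may be sketched as follows. This is a minimal illustration only, assuming the pen and microphone share a common capture clock; the names `Stroke`, `AudioFrame`, and `index_strokes_to_audio` are hypothetical and do not denote the invention's actual interfaces:

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Stroke:
    stroke_id: int
    start_ms: int  # timestamp of the first pen sample in the stroke
    end_ms: int    # timestamp of the last pen sample in the stroke

@dataclass
class AudioFrame:
    frame_id: int
    start_ms: int  # frame start on the shared capture clock
    end_ms: int    # frame end on the shared capture clock

def index_strokes_to_audio(strokes: List[Stroke],
                           frames: List[AudioFrame]) -> Dict[int, List[int]]:
    """Associate each stroke with the audio frames overlapping its time span.

    Assumes `frames` is sorted by start_ms. The resulting index lets a
    replay client fetch the audio segment recorded while a stroke was drawn.
    """
    starts = [f.start_ms for f in frames]
    index: Dict[int, List[int]] = {}
    for s in strokes:
        # Locate the first frame that could overlap the stroke, then scan
        # forward while frames still begin before the stroke ends.
        i = max(bisect.bisect_right(starts, s.start_ms) - 1, 0)
        hits = []
        while i < len(frames) and frames[i].start_ms < s.end_ms:
            if frames[i].end_ms > s.start_ms:
                hits.append(frames[i].frame_id)
            i += 1
        index[s.stroke_id] = hits
    return index
```

A replay client would use such an index to map a stroke selected on the page to the speech recorded while it was drawn, supporting the find-select-replay interaction described above.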