Example implementations described herein are directed to systems and methods for document capture. Such implementations can involve detecting, from a plurality of frames of a recording of an application window that comprises a document, document content of the document and screen activity of the application window, and generating a web-based copy of the document based on the document content and the screen activity. Further example implementations can involve recording screen activity such as mouse cursor movements, text annotations, scrolling actions, and other activity, and then providing an application layer to replay the screen activity onto the captured document.
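
As an illustration only, the replay of recorded screen activity onto a captured document might be sketched as follows. The sketch assumes that the detected activity has already been reduced to a list of timestamped events expressed in document coordinates. The names `ScreenEvent`, `ReplayState`, and `replay`, and the event labels `"cursor"`, `"scroll"`, and `"annotate"`, are hypothetical and are not taken from the example implementations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScreenEvent:
    """One recorded screen activity (hypothetical structure)."""
    t: float          # seconds from the start of the recording
    kind: str         # "cursor", "scroll", or "annotate" (assumed labels)
    x: float = 0.0    # document-space x coordinate (assumption)
    y: float = 0.0    # document-space y coordinate, or scroll offset
    text: str = ""    # annotation text, used only by "annotate" events


@dataclass
class ReplayState:
    """What a replay layer would draw over the captured document."""
    cursor: Tuple[float, float] = (0.0, 0.0)
    scroll_y: float = 0.0
    annotations: List[Tuple[float, float, str]] = field(default_factory=list)


def replay(events: List[ScreenEvent], until: float) -> ReplayState:
    """Apply every event with timestamp <= `until`, in time order."""
    state = ReplayState()
    for ev in sorted(events, key=lambda e: e.t):
        if ev.t > until:
            break
        if ev.kind == "cursor":
            state.cursor = (ev.x, ev.y)
        elif ev.kind == "scroll":
            state.scroll_y = ev.y
        elif ev.kind == "annotate":
            state.annotations.append((ev.x, ev.y, ev.text))
        # Unknown event kinds are ignored in this sketch.
    return state
```

Under these assumptions, calling `replay(events, until=t)` for increasing values of `t` yields the cursor position, scroll offset, and annotations that an application layer could overlay on the web-based copy at each moment of playback.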