The present invention relates to an arrangement (and a method) allowing multi-modal access to content over a global data communications network, e.g. the Internet, comprising a mobile station (1) with a user agent, a proxy server (2), and a telephony platform (3). The mobile station (1) is a dual-mode station supporting concurrent voice and data sessions, the proxy server (2) comprises enhanced functionality for supporting voice browsing, and the telephony platform (3) comprises an Automatic Speech Recognizer (ASR) (31) and a block for converting text messages to speech. Said enhanced proxy server (2) interfaces with the Automatic Speech Recognizer (31) of the telephony platform (3), and key elements (e.g. text, words, phrases) are predefined and indicated in the (original) web content. When the enhanced proxy server (2) recognizes/extracts said key elements (using predefined rules), it triggers voice browsing, such that arbitrary web content (a page) can be accessed by voice commands without requiring conversion of the web content.
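
The extraction step performed by the enhanced proxy server (2) may be illustrated by the following minimal sketch in Python. It is not the claimed implementation. It assumes, purely for illustration, that key elements are indicated in the original web content by a hypothetical `data-voice-key` attribute on links, and that the Automatic Speech Recognizer (31) returns the recognized command as plain text. The names `KeyElementExtractor`, `build_grammar` and `resolve_utterance` are likewise hypothetical.

```python
from html.parser import HTMLParser


class KeyElementExtractor(HTMLParser):
    """Collect links marked with the hypothetical data-voice-key attribute."""

    def __init__(self):
        super().__init__()
        # Maps a spoken phrase (lowercase) to the link target it should open.
        self.grammar = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        phrase = attr.get("data-voice-key")  # assumed marker, not from the source
        href = attr.get("href")
        if tag == "a" and phrase and href:
            self.grammar[phrase.strip().lower()] = href


def build_grammar(page_html):
    """Predefined rule: every marked link becomes one voice command."""
    extractor = KeyElementExtractor()
    extractor.feed(page_html)
    extractor.close()
    return extractor.grammar


def resolve_utterance(grammar, recognized_text):
    """Return the link target for the recognized command, or None if no match."""
    return grammar.get(recognized_text.strip().lower())
```

In this sketch the web page itself is left unchanged: the proxy only reads the marked key elements, hands the resulting phrases to the recognizer as its vocabulary, and maps a recognized phrase back to the link to be fetched for the mobile station (1).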