A unified web-based voice messaging system provides voice application control between a web browser and an application server via a Hypertext Transfer Protocol (HTTP) connection on an Internet Protocol (IP) network. The web browser receives from the application server an HTML page having an XML element that defines data for an audio operation to be performed by an executable audio resource. The
application server executes the voice-enabled web application by runtime execution of Extensible Markup Language (XML) documents that define the voice-enabled web application to be executed. The application server includes a runtime environment that establishes an efficient, high-speed connection to a web server. The application
server, in response to receiving a request from a user, accesses a selected XML page that defines at least a part of the voice application to be executed for the user. The XML page may describe any one of a user interface, such as dynamic generation of a menu of options or a prompt for a password; an application logic operation; or a function capability, such as generating a function call to an external resource. The application server then parses the XML page and executes the operation described by the XML page, for example dynamically generating an HTML page having voice application control content, or fetching another XML page to continue application processing. In addition, the application server may access an XML page that stores application state information, enabling the application server to be state-aware relative to the user interaction. Hence, the XML page, which can be written using a conventional editor or word processor, defines the application to be executed by the application server within the runtime environment, enabling voice-enabled web applications to be generated and executed without the necessity of programming language environments.
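
The parse-and-execute cycle described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the element and attribute names (`page`, `option`, `audio-op`, `type`, `next`, `prompt`), the in-memory `PAGES` store, and the functions `render_menu` and `execute_page` are all hypothetical. Application state is held in a plain dictionary here for brevity, whereas the text describes it as stored in an XML page.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML pages keyed by name; a real server would load them from storage.
PAGES = {
    "main_menu": """
        <page type="menu" prompt="main_menu.wav">
          <option key="1" next="listen">Listen to messages</option>
          <option key="2" next="greeting">Change greeting</option>
        </page>""",
    "listen": """
        <page type="logic" next="main_menu"/>""",
}


def render_menu(page):
    """Build an HTML page that embeds an XML element carrying the audio data."""
    items = "".join(
        f'<li><a href="?page={escape(opt.get("next"), quote=True)}">'
        f'{escape(opt.get("key"))}. {escape(opt.text or "")}</a></li>'
        for opt in page.findall("option")
    )
    # Hypothetical element telling the browser-side audio resource what to play.
    audio = f'<audio-op play="{escape(page.get("prompt"), quote=True)}"/>'
    return f"<html><body>{audio}<ul>{items}</ul></body></html>"


def execute_page(name, state):
    """Parse the named XML page and carry out the operation it describes."""
    page = ET.fromstring(PAGES[name])
    state["last_page"] = name  # simplified stand-in for stored application state
    kind = page.get("type")
    if kind == "menu":
        # User-interface page: dynamically generate HTML for the browser.
        return render_menu(page)
    if kind == "logic":
        # Logic page: fetch another XML page to continue processing.
        return execute_page(page.get("next"), state)
    raise ValueError(f"unsupported page type: {kind!r}")
```

Calling `execute_page("listen", {})` in this sketch follows the logic page to `main_menu` and returns the generated HTML. Changing the application's behavior means editing the XML text in `PAGES`, not the dispatch code, which is the property the abstract emphasizes.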