A system and method of advertising using a multi-media application system is disclosed. The multi-media application relates to the delivery of multi-media messages using animated entities that audibly deliver messages created by a sender using text-to-speech technologies. The method provides targeted advertising based on information learned about both the sender of a multi-media message and the recipient of the multi-media message. The information may relate to an analysis of a text message created by the sender, emoticons chosen by the sender and inserted into the text of the message, the sender's choice of an animated entity, or other parameters such as the background music or template chosen by the sender. Advertising messages may be delivered before the recipient receives the multi-media message, during the recipient's reception of the multi-media message, or following that reception. A decision regarding whether to include an advertising message may be based on a text analysis or an analysis of the emoticons or other tags inserted into the text by the sender. Further, animated entities such as professionally designed face models, templates, additional emoticons, animations, or sound effects may also be purchased by the sender for use in creating a limited number of multi-media messages, for a limited amount of time, or for longer periods. The system comprises a server to handle the reception and processing of sender multi-media messages and client software for both creating multi-media messages and receiving multi-media messages.
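The ad-selection decision described above — choosing whether and what to advertise based on text analysis and emoticons inserted by the sender — could be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation; the category mappings, the function name `select_ad_category`, and the keyword-first precedence rule are all assumptions introduced here for illustration.

```python
from typing import Optional

# Hypothetical mapping from emoticon tags to advertising categories.
EMOTICON_CATEGORIES = {
    ":)": "entertainment",
    ":-(": "comfort",
    "<3": "gifts",
}

# Hypothetical keyword-to-category mapping for the text analysis step.
KEYWORD_CATEGORIES = {
    "birthday": "gifts",
    "vacation": "travel",
    "dinner": "restaurants",
}

def select_ad_category(message_text: str) -> Optional[str]:
    """Return an ad category for the sender's message, or None to
    deliver the multi-media message without an advertisement."""
    lowered = message_text.lower()
    # Text analysis: match known keywords in the sender's message first.
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in lowered:
            return category
    # Emoticon analysis: fall back to categories implied by emoticons.
    for emoticon, category in EMOTICON_CATEGORIES.items():
        if emoticon in message_text:
            return category
    return None  # no confident match: send no advertising message
```

In this sketch, a match on the message text takes precedence over the emoticon analysis, so `select_ad_category("Happy birthday! :)")` would return the keyword-derived category rather than the emoticon-derived one; a real system would presumably combine sender and recipient profiles with these signals rather than rely on static tables.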