The invention relates to natural language processing and in particular provides a method, a system, a medium and a device for implementing a text-to-speech service based on gRPC, with the aim of solving the poor flexibility and poor extensibility of existing text-to-speech methods. To this end, the method comprises the following steps: an interface protocol for the text-to-speech service is defined in advance; the ProtoBuf compiler is used to generate client-side code and server-side code from that protocol; ProtoBuf is used to serialize and deserialize the data; and a conversion request sent by the client is processed at the server, which returns the result to the client. Because the speech audio data are serialized with ProtoBuf, the network transmission efficiency of large audio files is improved. The core text-to-speech algorithm is exposed through the lightweight gRPC remote procedure call framework, and the deployment of both real-time online and non-real-time text-to-speech is handled by gRPC one-way streaming and non-streaming transmission respectively, so that the method offers relatively high flexibility, extensibility and concurrency.
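The two transmission modes mentioned above can be illustrated with a minimal sketch. It is not the claimed implementation and does not use gRPC or ProtoBuf itself: plain Python dataclasses stand in for the messages that the ProtoBuf compiler would generate, and ordinary functions stand in for the server-side handlers. All names (`ConvertRequest`, `AudioChunk`, `convert`, `convert_stream`, `synthesize`, `CHUNK_SIZE`) are hypothetical, and `synthesize` is a placeholder that merely encodes the text as bytes.

```python
from dataclasses import dataclass
from typing import Iterator

# Assumption: tiny chunk size so the demo output is easy to inspect.
CHUNK_SIZE = 4


@dataclass
class ConvertRequest:
    """Hypothetical stand-in for a generated request message."""
    text: str


@dataclass
class AudioChunk:
    """Hypothetical stand-in for a generated response message."""
    data: bytes
    is_last: bool


def synthesize(text: str) -> bytes:
    """Placeholder for the core text-to-speech algorithm (returns fake audio)."""
    return text.encode("utf-8")


def convert(request: ConvertRequest) -> AudioChunk:
    """Non-streaming mode: one request, one response carrying all the audio."""
    return AudioChunk(data=synthesize(request.text), is_last=True)


def convert_stream(request: ConvertRequest) -> Iterator[AudioChunk]:
    """One-way streaming mode: one request, audio returned chunk by chunk."""
    audio = synthesize(request.text)
    if not audio:
        # Empty input still produces a single terminating chunk.
        yield AudioChunk(data=b"", is_last=True)
        return
    for start in range(0, len(audio), CHUNK_SIZE):
        end = start + CHUNK_SIZE
        yield AudioChunk(data=audio[start:end], is_last=end >= len(audio))


if __name__ == "__main__":
    request = ConvertRequest(text="hello world")
    print("non-streaming:", len(convert(request).data), "bytes in one response")
    for chunk in convert_stream(request):
        print("streamed chunk:", chunk.data, "last" if chunk.is_last else "")
```

In the streaming mode the client can begin playback as soon as the first chunk arrives, which suits the real-time online case; the non-streaming mode returns the complete audio in a single response, which suits the non-real-time case.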