A conferencing system for the hearing and visually impaired
The conferencing system addresses the challenge of translating sign language to text and vice versa using deep learning, ensuring real-time communication and inclusivity for hearing and visually impaired users.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: TURKCELL TEKNOLOJI ARASTIRMA & GELISTIRME AS
- Filing Date: 2025-12-10
- Publication Date: 2026-06-18
Figure TR2025051637_18062026_PF_FP_ABST
Abstract
Description
[0001] A CONFERENCING SYSTEM FOR THE HEARING AND VISUALLY IMPAIRED
[0002] Technical Field
[0003] The present invention relates to a system which enables hearing or visually impaired individuals to effectively communicate with others who are hearing or visually impaired or who do not have any disability, on an international level or in their own country, through video conferencing and video chat.
[0004] Background of the Invention
[0005] Today, video conference management is becoming increasingly preferred by companies. With video conferencing sessions, participants can interact with other participants both in audio and video. Visually impaired and hearing-impaired participants experience difficulties in video conferencing sessions. In the state of the art, there are various solutions that use a cloud-based computing system, with more complex models and operations, to convey participant interactions in audio conversations to the hearing-impaired participant as sign language visuals.
[0006] However, the state of the art offers no solution that enables sign language, derived from the features extracted from the disabled individuals, to be translated into text or another language, and that converts both the sign language of the hearing-impaired user and the text or voice message content of other users into each other by using two-way communication.
[0007] The United States patent document no. US2024163390, an application included in the state of the art, discloses systems and methods for providing assistance to a visually impaired and/or hearing-impaired user to access a video conference. This invention comprises the steps of initiating a virtual meeting hosting a plurality of participants, detecting a presence of at least one impaired participant from the plurality of participants, loading one or more audio visual settings for providing assistance to the at least one impaired participant, and automatically providing adjustments for one or more audio visual elements provided to the at least one impaired participant. In this invention, hearing-impaired users may make use of an interpreter during a virtual conference who can translate from sign language to spoken language and back. In some embodiments of the invention, the video conferencing provider may comprise a motion detection engine that monitors the video stream received from client devices in the form of cameras for users in order to determine whether they are performing sign language or not. For example, a hand gesture detection engine may comprise pattern recognition, artificial intelligence, machine learning and combinations thereof.
[0008] Summary of the Invention
[0009] An object of the present invention is to realize a system for the hearing and visually impaired, which enables the sign languages of the features extracted from the disabled individuals to be translated into text or another language, and both the sign language of the hearing-impaired user and the text or voice message content of other users to be converted into each other by using two-way communication.
[0010] Detailed Description of the Invention
[0011] “A Conferencing System for the Hearing and Visually Impaired” realized to fulfil the objectives of the present invention is shown in the figure attached, in which:
[0012] Figure 1 is a schematic view of the inventive conferencing system for the hearing and visually impaired. The components illustrated in the figure are individually numbered, where the numbers refer to the following:
[0013] 1. System
[0014] 2. Electronic device
[0015] 3. Application
[0016] 4. Server
[0017] The inventive conferencing system (1) for the hearing and visually impaired, which enables hearing or visually impaired individuals to effectively communicate with others who are hearing or visually impaired or who do not have any disability, on an international level or in their own country, through video conferencing and video chat, comprises:
- a plurality of electronic devices (2) which are used by hearing or visually impaired users and are configured to enable users with disabilities to communicate with other persons;
- at least one application (3) which is run on the electronic device (2) and is configured to enable the hearing or visually impaired user to participate in at least one video conferencing session through the interface thereon; and
- at least one server (4) which establishes a connection with the application (3) and the electronic device (2) and is configured:
  - to access the video and audio video conferencing content stream initiated on the application (3);
  - to determine whether there is an electronic device (2) user using sign language in the respective video stream by analyzing the video stream it accesses in real time with deep learning algorithms thereon;
  - to convert the hand and arm gesture expressions displayed by the user of the electronic device (2) with hearing impairment into textual expression by analyzing the hand and arm gestures that constitute the sign language with deep learning supported image processing algorithms thereon when it detects the presence of at least one electronic device (2) user using sign language in the respective video stream;
  - to enable the said textual expressions to be displayed in a bubble on the interface window containing the image of the hearing-impaired user speaking in sign language in the said video conference by sending textual expressions to the application (3), thus, to enable other video conference participants without hearing impairment to understand what the said hearing-impaired user means;
  - to convert the voice data into text instantaneously by analyzing the said voice data of non-disabled participants / users transmitted from the application (3) with the deep learning algorithms thereon; and
  - to enable the transcription of the voice data into text to be displayed as subtitles on the interface window containing the images of the said participants / users.
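The routing that paragraph [0017] assigns to the server (4) can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Overlay` type, the `process_chunk` function and the three callables passed into it are hypothetical names introduced here, and the "models" are trivial stand-ins for the deep learning algorithms the description refers to.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Overlay:
    """What the application (3) is asked to draw on a participant's window."""
    participant_id: str
    kind: str  # "bubble" for sign-language text, "subtitle" for speech
    text: str


def process_chunk(
    participant_id: str,
    video_frames: list,
    audio: bytes,
    detect_signing: Callable[[list], bool],
    sign_to_text: Callable[[list], str],
    speech_to_text: Callable[[bytes], str],
) -> Optional[Overlay]:
    """Route one chunk of a participant's stream (hypothetical helper)."""
    # If signing is detected in the video, its text goes into a bubble.
    if video_frames and detect_signing(video_frames):
        return Overlay(participant_id, "bubble", sign_to_text(video_frames))
    # Otherwise spoken audio, if any, is transcribed into subtitles.
    if audio:
        return Overlay(participant_id, "subtitle", speech_to_text(audio))
    return None


# Stand-in "models" so the sketch runs; real ones would be trained networks.
def fake_detect(frames: list) -> bool:
    return "hand" in frames


def fake_sign_to_text(frames: list) -> str:
    return "hello"


def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


signer = process_chunk("u1", ["hand", "hand"], b"",
                       fake_detect, fake_sign_to_text, fake_speech_to_text)
speaker = process_chunk("u2", ["face"], b"good morning",
                        fake_detect, fake_sign_to_text, fake_speech_to_text)
```

Passing the models in as callables keeps the routing logic independent of whichever detection, recognition and transcription networks are actually deployed.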
[0018] The electronic device (2) included in the inventive system (1) is a device in the form of a mobile phone, tablet computer, computer or laptop computer configured to enable hearing-impaired, visually impaired or non-disabled users to participate in the same video conferencing session by running the application (3). The electronic device (2) is configured to transmit audio and video data of the users to the application (3) through the hardware thereon.
[0019] The application (3) included in the inventive system (1) is configured to operate according to the voice commands transmitted by the visually impaired user through the electronic device (2).
[0020] The server (4) included in the inventive system (1) is configured to access the voice message of the visually impaired user in the speech bubble in the application (3) when it detects that the visually impaired user has initiated an interaction in the video conference stream in the application (3) when it establishes a connection with the application (3) and to convert the voice message content into text by analyzing the voice message with cloud-based deep learning algorithms (speech to text). The server (4) is configured to determine the native language of all other users in the same video conferencing session through the application (3) after converting the voice message of the visually impaired user into text, to translate the content of the message of the visually impaired user into the native languages of the other users with deep learning algorithms, and to display the translated text content as bubbles on the applications (3) of the other users. The server (4) is configured to convert the message content converted into text of the visually impaired user into sign language visuals with deep learning algorithms and to send the sign language visuals in bubbles to the application (3) on the electronic device (2) of the hearing-impaired user when it detects that there is at least one hearing-impaired electronic device (2) user in the same video conferencing session after converting the voice message of the visually impaired user into text.
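The fan-out described in paragraph [0020] can be sketched as follows, under stated assumptions. The `Participant` type, `fan_out`, and the toy translation and sign-rendering functions are hypothetical; the description does not specify which language the sign language visuals are generated from, so this sketch assumes the recipient's native language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Participant:
    user_id: str
    native_language: str          # e.g. "tr", "de"
    hearing_impaired: bool = False


def fan_out(
    sender: Participant,
    text: str,
    participants: List[Participant],
    translate: Callable[[str, str, str], str],
    text_to_sign: Callable[[str, str], str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Decide what each other participant receives for one transcribed message."""
    deliveries: Dict[str, List[Tuple[str, str]]] = {}
    for p in participants:
        if p.user_id == sender.user_id:
            continue
        # Every other user gets the text in their own native language.
        localized = translate(text, sender.native_language, p.native_language)
        items = [("text_bubble", localized)]
        # Hearing-impaired users additionally get sign language visuals
        # (assumption: rendered from the text in the recipient's language).
        if p.hearing_impaired:
            items.append(("sign_bubble", text_to_sign(localized, p.native_language)))
        deliveries[p.user_id] = items
    return deliveries


# Toy stand-ins for the translation and sign-rendering models.
def toy_translate(text: str, src: str, dst: str) -> str:
    return text if src == dst else f"[{src}->{dst}] {text}"


def toy_text_to_sign(text: str, lang: str) -> str:
    return f"<sign:{lang}:{text}>"


speaker = Participant("a", "tr")
room = [speaker, Participant("b", "de"), Participant("c", "tr", hearing_impaired=True)]
out = fan_out(speaker, "merhaba", room, toy_translate, toy_text_to_sign)
```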
[0021] The server (4) included in another embodiment of the inventive system (1) is configured to convert the message content of the hearing-impaired person with Turkish native language into Turkish text content by processing it with cloud-based deep learning algorithms, to translate the Turkish text content into German with deep learning algorithms, and to convert the text content translated into German into German sign language visuals or animation by processing it with deep learning algorithms thereon when it detects that there is more than one hearing-impaired user with Turkish and German native language in the same video conferencing session by establishing a connection with the application (3).
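The three-stage chain of paragraph [0021] (Turkish sign language to Turkish text, Turkish text to German text, German text to German sign language visuals) can be pictured as a simple composition of stages. The sketch below is only illustrative: `chain`, the stage functions and the one-entry lookup tables are hypothetical placeholders for the deep learning models named in the description.

```python
from typing import Callable, List


def chain(*stages: Callable) -> Callable:
    """Compose stages left to right: the output of one feeds the next."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run


# One-entry lookup tables standing in for trained models.
SIGN_TO_TURKISH = {("wave",): "merhaba"}
TURKISH_TO_GERMAN = {"merhaba": "hallo"}


def sign_to_turkish_text(gestures: List[str]) -> str:
    return SIGN_TO_TURKISH[tuple(gestures)]


def turkish_to_german(text: str) -> str:
    return TURKISH_TO_GERMAN[text]


def german_text_to_sign(text: str) -> List[str]:
    # One placeholder "visual" identifier per German word.
    return [f"de-sign:{word}" for word in text.split()]


pipeline = chain(sign_to_turkish_text, turkish_to_german, german_text_to_sign)
result = pipeline(["wave"])
```

Keeping the stages separate means another language pair would only require swapping the middle and final stages.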
[0022] The server (4) included in the inventive system (1) is configured to extract the features of the disabled users in the form of body poses and hand poses by processing the captured image content of the disabled users with CNN deep learning algorithms by establishing a connection with the application (3). The server (4) is configured to use deep learning algorithms in the form of RNN and LSTM in detecting object movements in consecutive frames in the video conferencing stream transmitted through the application (3), and to enable instantaneous translation of the sign language by enabling video conferencing stream transmitted from the application (3) and user interactions in the respective stream to be analyzed instantaneously with the said deep learning algorithms. The server (4) is configured to enable the deep learning algorithms running thereon to be continuously updated with the feedback data transmitted by hearing and visually impaired users through the application (3), and to securely store the user information input on the application (3) in at least one database thereon.
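One way to picture the frame-by-frame analysis in paragraph [0022] is a sliding window: per-frame features (which a CNN would produce) are buffered, and a sequence model (an RNN or LSTM in the description) is run over the most recent consecutive frames. The sketch below shows only that buffering logic. `stream_predictions`, the window and stride values and both stand-in models are assumptions, not details taken from the application.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def stream_predictions(
    frames: Iterable,
    extract_features: Callable[[object], List[float]],
    classify_sequence: Callable[[List[List[float]]], object],
    window: int = 4,
    stride: int = 2,
) -> Iterator[object]:
    """Yield one prediction per full window of consecutive frame features."""
    buffer: deque = deque(maxlen=window)  # oldest features drop out automatically
    since_last = 0
    for frame in frames:
        buffer.append(extract_features(frame))  # CNN stand-in, one call per frame
        since_last += 1
        if len(buffer) == window and since_last >= stride:
            since_last = 0
            yield classify_sequence(list(buffer))  # RNN/LSTM stand-in


# Stand-ins: a "feature" is the frame index, a "prediction" is the window sum.
def toy_features(frame: int) -> List[float]:
    return [float(frame)]


def toy_sequence_model(features: List[List[float]]) -> float:
    return sum(f[0] for f in features)


predictions = list(stream_predictions(range(8), toy_features, toy_sequence_model))
```

With eight frames, a window of four and a stride of two, the sequence model runs on frames 0-3, 2-5 and 4-7, so consecutive predictions overlap by two frames.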
[0023] Industrial Application of the Invention
[0024] In the inventive system (1), there are both hearing and visually impaired users in a video conference stream on the application (3). The content that the hearing-impaired user expresses is shown instantaneously in the form of text to other non-disabled participant users through the application (3) by analyzing the hand and arm movements performed by the hearing-impaired user during the interaction in the said video stream with various deep learning algorithms on the server (4). When it is detected that a visually impaired user interacts in the said video conference stream, the audio content input by the visually impaired user is first converted into text by processing it with deep learning algorithms, and then hearing and visually impaired users are enabled to chat with each other instantaneously in the same video conference session by enabling the content converted into text to be translated into sign language.
[0025] With the inventive system (1), it is enabled to translate the sign languages of the features extracted from the disabled individuals into text or another language, and to convert both the sign language of the hearing-impaired user and the text or voice message content of other users into each other by using two-way communication.
[0026] The inventive system (1) operates within the scope of the Personal Data Protection Law (KVKK).
[0027] Within these basic concepts, it is possible to develop various embodiments of the inventive “A Conferencing System (1) for the Hearing and Visually Impaired”; the invention cannot be limited to the examples disclosed herein and is essentially as defined in the claims.
Claims
1. A conferencing system (1) for the hearing and visually impaired, which enables hearing or visually impaired individuals to effectively communicate with others who are hearing or visually impaired or who do not have any disability, on an international level or in their own country, through video conferencing and video chat; comprising a plurality of electronic devices (2) which are used by hearing or visually impaired users and are configured to enable users with disabilities to communicate with other persons; and characterized by at least one application (3) which is run on the electronic device (2) and is configured to enable the hearing or visually impaired user to participate in at least one video conferencing session through the interface thereon; and at least one server (4) which establishes a connection with the application (3) and the electronic device (2) and is configured to access the video and audio video conferencing content stream initiated on the application (3); to determine whether there is an electronic device (2) user using sign language in the respective video stream by analyzing the video stream it accesses in real time with deep learning algorithms thereon; to convert the hand and arm gesture expressions displayed by the user of the electronic device (2) with hearing impairment into textual expression by analyzing the hand and arm gestures that constitute the sign language with deep learning supported image processing algorithms thereon when it detects the presence of at least one electronic device (2) user using sign language in the respective video stream; to enable the said textual expressions to be displayed in a bubble on the interface window containing the image of the hearing-impaired user speaking in sign language in the said video conference by sending textual expressions to the application (3), thus, to enable other video conference participants without hearing impairment to understand what the said hearing-impaired user means; to convert the voice data into text instantaneously by analyzing the said voice data of non-disabled participants / users transmitted from the application (3) with the deep learning algorithms thereon; and to enable the transcription of the voice data into text to be displayed as subtitles on the interface window containing the images of the said participants / users.
2. A conferencing system (1) for the hearing and visually impaired according to Claim 1; characterized by the electronic device (2) which is a device in the form of a mobile phone, tablet computer, computer or laptop computer configured to enable hearing-impaired, visually impaired or non-disabled users to participate in the same video conferencing session by running the application (3).
3. A conferencing system (1) for the hearing and visually impaired according to Claim 1 or 2; characterized by the electronic device (2) which is configured to transmit audio and video data of the users to the application (3) through the hardware thereon.
4. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the application (3) which is configured to operate according to the voice commands transmitted by the visually impaired user through the electronic device (2).
5. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to access the voice message of the visually impaired user in the speech bubble in the application (3) when it detects that the visually impaired user has initiated an interaction in the video conference stream in the application (3) when it establishes a connection with the application (3) and to convert the voice message content into text by analyzing the voice message with cloud-based deep learning algorithms (speech to text).
6. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to determine the native language of all other users in the same video conferencing session through the application (3) after converting the voice message of the visually impaired user into text, to translate the content of the message of the visually impaired user into the native languages of the other users with deep learning algorithms, and to display the translated text content as bubbles on the applications (3) of the other users.
7. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to convert the message content converted into text of the visually impaired user into sign language visuals with deep learning algorithms and to send the sign language visuals in bubbles to the application (3) on the electronic device (2) of the hearing-impaired user when it detects that there is at least one hearing-impaired electronic device (2) user in the same video conferencing session after converting the voice message of the visually impaired user into text.
8. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to convert the message content of the hearing-impaired person with Turkish native language into Turkish text content by processing it with cloud-based deep learning algorithms, to translate the Turkish text content into German with deep learning algorithms, and to convert the text content translated into German into German sign language visuals or animation by processing it with deep learning algorithms thereon when it detects that there is more than one hearing-impaired user with Turkish and German native language in the same video conferencing session by establishing a connection with the application (3).
9. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to extract the features of the disabled users in the form of body poses and hand poses by processing the captured image content of the disabled users with CNN deep learning algorithms by establishing a connection with the application (3).
10. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to use deep learning algorithms in the form of RNN and LSTM in detecting object movements in consecutive frames in the video conferencing stream transmitted through the application (3), and to enable instantaneous translation of the sign language by enabling video conferencing stream transmitted from the application (3) and user interactions in the respective stream to be analyzed instantaneously with the said deep learning algorithms.
11. A conferencing system (1) for the hearing and visually impaired according to any one of the preceding claims; characterized by the server (4) which is configured to enable the deep learning algorithms running thereon to be continuously updated with the feedback data transmitted by hearing and visually impaired users through the application (3), and to securely store the user information input on the application (3) in at least one database thereon.